- Published 2019-01-25 (Hubei)
* Bakeoff 2007 – France Telecom R&D Center Beijing (法国电信北京研发中心)

Problems of NER with only local information

“Many empirical approaches…make decision only on local context for extract inference, which is based on the data independent assumption. But often this assumption does not hold because non-local dependencies are prevalent in natural language.”

Observations from experiments:
- Many seen named entities are missed.
- At least 10% of the unseen and missed named entities were labeled correctly at least once.

“If the context surrounding one occurrence of a token sequence is very indicative of it being an entity, then this should also influence the labeling of another occurrence of the same token sequence in a different context that is not indicative of entity”.

Local Features
- Unigram: C_n (n = -2, -1, 0, 1, 2)
- Bigram: C_n C_{n+1} (n = -2, -1, 0, 1) and C_{-1} C_1

0/1 Features
Assign 1 to every character labeled as part of an entity and 0 to every character labeled NONE in the training data. This greatly alleviates the class imbalance. Taking the Bakeoff 2006 MSRA NER training data as an example, labeling the corpus with 10 classes gives the following class distribution (%):
0.81 (B-PER), 1.70 (B-LOC), 0.95 (B-ORG), 0.81 (I-PER), 0.88 (I-LOC), 2.87 (I-ORG), 0.76 (E-PER), 1.42 (E-LOC), 0.94 (E-ORG), 88.86 (NONE)
Changing the label scheme to 2 labels (0/1) gives:
11.14 (entity), 88.86 (NONE)

Non-local Features
- Token-position features (NF1): the position information (start, middle and last) assigned to a token sequence that exactly matches an entry in the entity list. These features enable us to capture the dependencies between the identical candidate entities and their boundaries.
- Entity-majority features (NF2): the majority label assigned to a token sequence that exactly matches an entry in the entity list. These features enable us to capture the dependencies between the identical e
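The local feature templates listed under "Local Features" can be sketched as a small extraction function. This is only an illustration of the templates as stated on the slide: the function name `local_features`, the feature keys, the `/` separator and the `<PAD>` symbol for positions outside the sentence are all assumptions, not part of the original system.

```python
def local_features(chars, i, pad="<PAD>"):
    """Unigram and bigram features for the character at index i.

    Implements the slide's templates:
      unigram C_n for n in -2..2,
      bigram  C_n C_{n+1} for n in -2..1, plus C_{-1} C_1.
    Key names and the padding symbol are hypothetical.
    """
    def c(n):
        j = i + n
        return chars[j] if 0 <= j < len(chars) else pad

    feats = {}
    for n in (-2, -1, 0, 1, 2):
        feats[f"U[{n}]"] = c(n)
    for n in (-2, -1, 0, 1):
        feats[f"B[{n}]"] = c(n) + "/" + c(n + 1)
    # The skip bigram around the current character.
    feats["B[-1,1]"] = c(-1) + "/" + c(1)
    return feats
```

Calling `local_features(list("ABCDE"), 2)` yields five unigram and five bigram features centred on `C`.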
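The arithmetic behind the 0/1 label scheme can be checked directly: collapsing the nine entity classes into a single label should give 11.14%. The sketch below uses the percentages quoted on the slide; the helper names `to_binary` and `distribution` are assumptions introduced for illustration.

```python
from collections import Counter

def to_binary(labels, none_label="NONE"):
    """Map every entity label to 1 and the NONE label to 0."""
    return [0 if lab == none_label else 1 for lab in labels]

def distribution(labels):
    """Percentage share of each label in a label sequence."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {lab: 100.0 * n / total for lab, n in counts.items()}

# Class distribution (%) quoted for the Bakeoff 2006 MSRA NER training data.
ten_class = {
    "B-PER": 0.81, "B-LOC": 1.70, "B-ORG": 0.95,
    "I-PER": 0.81, "I-LOC": 0.88, "I-ORG": 2.87,
    "E-PER": 0.76, "E-LOC": 1.42, "E-ORG": 0.94,
    "NONE": 88.86,
}
# Share of the merged "entity" class under the 2-label scheme.
entity_share = round(sum(v for k, v in ten_class.items() if k != "NONE"), 2)
```

The nine entity shares sum to 11.14, matching the 2-label figure on the slide. The largest single entity class in the 10-label scheme is I-ORG at 2.87%.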
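One possible reading of the two non-local features is sketched below. The slide text is cut off, so the details are assumptions: the entity list is taken to be a set of (string, label) mentions, for example from a first labeling pass, matching is exact substring matching with longer entries taking precedence, and the names `majority_labels` and `nonlocal_features` are hypothetical.

```python
from collections import Counter

def majority_labels(labelled_mentions):
    """Map each entity string to its most frequent label (NF2-style)."""
    votes = {}
    for text, label in labelled_mentions:
        votes.setdefault(text, Counter())[label] += 1
    return {text: c.most_common(1)[0][0] for text, c in votes.items()}

def nonlocal_features(chars, entity_majority):
    """Per-character (position, majority label) pairs from exact matches.

    Position is "start", "middle" or "last" (NF1-style); characters
    outside any match get ("O", "NONE"). Assumed behaviour, not the
    original system's.
    """
    default = ("O", "NONE")
    feats = [default] * len(chars)
    text = "".join(chars)
    # Longest entries first so that longer matches take precedence.
    for ent in sorted(entity_majority, key=len, reverse=True):
        if not ent:
            continue
        start = text.find(ent)
        while start != -1:
            end = start + len(ent)
            if all(f == default for f in feats[start:end]):
                for k in range(start, end):
                    if k == start:
                        pos = "start"
                    elif k == end - 1:
                        pos = "last"
                    else:
                        pos = "middle"
                    feats[k] = (pos, entity_majority[ent])
            start = text.find(ent, start + 1)
    return feats
```

With this sketch, every occurrence of the same string receives the same majority label, which is how a confident context for one occurrence can influence another occurrence in a less indicative context.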