- Published 2017-01-12 in Tianjin
ArnetMiner: Extraction and Mining of Academic Social Networks
* We identify tokens using heuristics. There are five types of tokens: ‘standard word’, ‘non-standard word’, punctuation mark, space, and line break. Standard words are words in natural language. Non-standard words include several general ‘special words’, for example, email addresses, IP addresses, URLs, dates, and numbers. We identify non-standard words using regular expressions. Punctuation marks include the period, question mark, and exclamation mark. Words and punctuation marks that are joined together are separated into different tokens. Natural spaces and line breaks are each regarded as tokens as well.
* Addresses and affiliations usually contain many tokens. The dependencies between these tokens can help improve accuracy, and other approaches cannot exploit these dependencies.
* The simplified form is popular in bibliographic records.
* The distributions can typically be categorized into the following cases: (1) publications of different persons are clearly separated (“Hui Fang”, in Figure 5 (a)). Name disambiguation on this kind of data is solved well by our approach, and the number K can also be found accurately; (2) publications are mixed together but with a dominant author who writes most of the papers (e.g., “Bing Liu”, in Figure 5 (b)); our approach achieves an F1-score of 87.36% and finds a K that is close to the actual number; and (3) publications of different authors are mixed (e.g., “Jing Zhang” and “Yi Li”, in Figure 5 (c) and (d)). Our method obtains F1-measures of 91.25% and 82.11%. However, it is difficult to find the number K accurately in this case. For example, the number found by our approach for “Jing Zhang” is 14, but the correct number is 25.
* Two researchers want to write a paper. Every word in the paper is generated by the same process: first, an author is chosen to be responsible for generating the word; this author then generates a topic according to a probability distribution; and the topic generates the word and the conference according to a probability distribution. Repeating this, we generate the whole paper. However, not all of a paper's content is original; part of it comes from methods in the references. To relate the references to the paper's content, when generating the references we choose a topic for each reference and then generate that reference according to a probability distribution.
* Modeling the Academic Network and Appl
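The token heuristics described in the first note can be illustrated with a small regular-expression tokenizer. This is only a minimal sketch, not the authors' implementation: the notes name the categories of non-standard words but give no patterns, so every regular expression below is an assumption (for instance, dates are assumed to be ISO-style `YYYY-MM-DD`), and the name `tokenize`, the tuple layout, and the sample addresses are hypothetical.

```python
import re

# Assumed patterns for the 'special words'. The source lists the categories
# (email, IP address, URL, date, number) but not the expressions themselves.
NON_STANDARD = [
    ("email", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
    ("url", r"(?:https?://|www\.)\S*[^\s.?!]"),
    ("ip", r"\d{1,3}(?:\.\d{1,3}){3}"),
    ("date", r"\d{4}-\d{2}-\d{2}"),        # assumption: ISO-style dates only
    ("number", r"\d+(?:\.\d+)?\b"),
]
SUBTYPES = {name for name, _ in NON_STANDARD}

# Alternatives are tried in order, so non-standard words win over the
# generic rules. Punctuation is limited to the three marks named in the notes.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in NON_STANDARD)
    + r"|(?P<line_break>\r?\n)"
    + r"|(?P<space>[^\S\n]+)"              # assumption: a run of spaces is one token
    + r"|(?P<punctuation>[.?!])"
    + r"|(?P<standard_word>[^\s.?!]+)"
)


def tokenize(text):
    """Return (token_type, subtype, text) triples covering the whole input."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind in SUBTYPES:
            tokens.append(("non_standard_word", kind, match.group()))
        else:
            tokens.append((kind, None, match.group()))
    return tokens


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the source.
    for token in tokenize("Mail jie@example.org.\nSee www.example.org!"):
        print(token)
```

Because the non-standard patterns are tried first, the periods inside an email address or URL stay within that token, while a sentence-final period that is joined to a word is split off as its own punctuation token.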