Using Workflows to Explore and Optimise Named Entity
Recognition for Chemistry
BalaKrishna Kolluru (1)*, Lezan Hawizy (2), Peter Murray-Rust (2), Junichi Tsujii (1), Sophia Ananiadou (1)
(1) National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester, Manchester, United Kingdom
(2) Unilever Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
Abstract
Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation
or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via
reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established
named entity recogniser (in the chemistry domain), OSCAR, and studied the impact of each component on the net
performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework,
U-Compare. These workflows can be altered using the drag-and-drop mechanism of the graphical user interface of U-Compare.
These workflows also provide a platform to study the relationship between text mining components such as tokenisation
and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers).
Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to
slightly better performance than other approaches, in terms of named entity recognition (NER) accuracy. Poor tokenisation
translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus, loweri