GATE Feature Overview (External).ppt

About 27 pages (about 7,800 characters), published 2024-05-07 in Sichuan.

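Overview (1)
GATE is a General Architecture for Text Engineering, developed by the Natural Language Processing Research Group within the Department of Computer Science at the University of Sheffield.

Overview (2)
Language Resources (LRs) are data-only resources, such as documents and corpora.
Processing Resources (PRs) are resources whose character is principally programmatic or algorithmic, such as a tokeniser or a POS tagger.
Applications model a control strategy for the execution of PRs. There are two main types of pipeline: simple pipelines and corpus pipelines.

Overview (4)

Features
Tokeniser: splits text into tokens. Each Token annotation has the following attributes:
  kind: word, number, symbol, or punctuation (whitespace is annotated separately as SpaceToken)
  orth: upperInitial, allCaps, lowerCase, or mixedCaps
  length
  string
Sentence Splitter: splits text into sentences.

Features
Gazetteer (dictionary lookup): lists.def contains lines such as the following, each giving a list file followed by its majorType and minorType:
  country.lst:location:country
country.lst in turn contains entries such as:
  China
  Chine
  Chypre
  Colombia
  Colombie

Features
Part-of-Speech Tagger: assigns a part-of-speech tag to each token. It can make mistakes. For example, in
  I will study hard this year.
"hard" is tagged JJ (adjective), but it should be RB (adverb).

Features
Semantic Tagger: this is the NE Transducer, which performs named entity recognition.
Orthographic Coreference (OrthoMatcher): the OrthoMatcher module adds identity relations between named entities found by the semantic tagger, in order to perform coreference.
Pronominal Coreference: links person names with the pronouns that refer to them, for example:
  John Smith … he … him … John … he …

Features
Document Reset: removes all the annotation sets and their contents, apart from the one containing the document format analysis (Original markups).

Features
Verb Group Chunker: the rules cover finite (is investigating), non-finite (to investigate), participle (investigated), and special verb constructs (is going to investigate).
Noun Phrase Chunker: marks noun phrases in text.

Features
OntoText Gazetteer: produces results similar to the ANNIE Gazetteer, but uses a different algorithm.
Flexible Gazetteer: the Flexible Gazetteer provides users with the flexibility to choose their own customized input and an external gazetteer.
Gazetteer List Collector: adds entities of specified annotation types to the corresponding lists of a specified gazetteer, and generates a statistics file (entity name$count).

Features
TreeTagger: The TreeTagg…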
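The Token attributes listed for the Tokeniser (kind, orth, length, string, plus separate SpaceToken annotations) can be illustrated with a small Python sketch. This is not GATE's tokeniser, which is rule-driven and written in Java. It is a simplified approximation that makes three assumptions. First, text is split with a single hypothetical regular expression. Second, `kind` for non-letter, non-digit characters is decided from the Unicode category. Third, `orth` follows a plain case test. The names `tokenise`, `orth_of`, and `TOKEN_RE` and the dict layout of each annotation are illustrative only.

```python
import re
import unicodedata

# Hypothetical tokenisation pattern (an assumption, not GATE's rules):
# a run of letters, a run of digits, a run of whitespace, or any other single character.
TOKEN_RE = re.compile(
    r"(?P<word>[^\W\d_]+)|(?P<number>\d+)|(?P<space>\s+)|(?P<other>\S)"
)


def orth_of(word):
    """Approximate the four orth values for an alphabetic token."""
    if word.islower():
        return "lowerCase"
    if word.isupper() and len(word) > 1:
        return "allCaps"
    if word[0].isupper() and (len(word) == 1 or word[1:].islower()):
        return "upperInitial"
    return "mixedCaps"  # e.g. "iPhone", "McDonald", or scripts without case


def tokenise(text):
    """Return Token/SpaceToken annotations as dicts with character offsets."""
    annotations = []
    for m in TOKEN_RE.finditer(text):
        s, group = m.group(), m.lastgroup
        ann = {"start": m.start(), "end": m.end(), "string": s, "length": len(s)}
        if group == "space":
            ann["type"] = "SpaceToken"
        else:
            ann["type"] = "Token"
            if group == "other":
                # Unicode categories starting with "P" are punctuation; others count as symbols.
                group = "punctuation" if unicodedata.category(s).startswith("P") else "symbol"
            ann["kind"] = group
            if group == "word":
                ann["orth"] = orth_of(s)
        annotations.append(ann)
    return annotations


if __name__ == "__main__":
    for ann in tokenise("I will study hard this year."):
        if ann["type"] == "Token":
            print(ann["string"], ann["kind"], ann.get("orth"))
```

Keeping whitespace as its own annotation type, rather than as a fifth `kind`, mirrors the Token/SpaceToken split described above.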
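The lists.def and .lst arrangement described for the Gazetteer can be sketched in Python as follows. The sketch makes three assumptions. First, each lists.def line has the form file:majorType:minorType, and any further fields are ignored. Second, the .lst files are replaced by a hypothetical in-memory dictionary. Third, matching is done by a naive regular-expression scan instead of GATE's own algorithm. The output only mimics the Lookup annotations, carrying majorType and minorType features, that the ANNIE Gazetteer produces. The function names are illustrative.

```python
import re


def parse_lists_def(lines):
    """Map each list file to (majorType, minorType) from lines like 'country.lst:location:country'."""
    defs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(":")
        major = parts[1] if len(parts) > 1 else ""
        minor = parts[2] if len(parts) > 2 else ""
        defs[parts[0]] = (major, minor)
    return defs


def build_gazetteer(defs, list_contents):
    """Map every list entry to its types.

    list_contents is a hypothetical in-memory stand-in for the .lst files:
    {list file name: [entry, ...]}.
    """
    gazetteer = {}
    for list_file, (major, minor) in defs.items():
        for entry in list_contents.get(list_file, []):
            gazetteer[entry] = {"majorType": major, "minorType": minor}
    return gazetteer


def lookup(text, gazetteer):
    """Naive scan: report every entry found at word boundaries, sorted by start offset."""
    found = []
    for entry, features in gazetteer.items():
        pattern = r"(?<!\w)" + re.escape(entry) + r"(?!\w)"
        for m in re.finditer(pattern, text):
            found.append({"type": "Lookup", "start": m.start(), "end": m.end(),
                          "string": entry, **features})
    # Longer matches first when two start at the same offset.
    return sorted(found, key=lambda a: (a["start"], -a["end"]))
```

For example, with the country.lst entries above, `lookup("Trade between China and Colombia grew.", gaz)` reports "China" and "Colombia", both typed location/country. Overlapping matches are all reported, which a real gazetteer may handle differently.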
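The Gazetteer List Collector step can be approximated in a few lines of Python. It collects entities of selected annotation types into gazetteer lists and counts them in "entity name$count" form. Only that line format comes from the slide. The input shape, a sequence of (annotation type, entity string) pairs, is an assumption, and so are the `type_to_list` mapping and the name `collect_lists`. This is not GATE's implementation.

```python
from collections import Counter


def collect_lists(annotations, type_to_list):
    """Group entity strings into gazetteer lists and produce 'entity$count' lines per list.

    annotations: iterable of (annotation_type, entity_string) pairs.
    type_to_list: hypothetical mapping from annotation type to list file name.
    """
    counts = {name: Counter() for name in type_to_list.values()}
    for ann_type, entity in annotations:
        list_name = type_to_list.get(ann_type)
        if list_name is None:
            continue  # this annotation type was not selected for collection
        counts[list_name][entity] += 1
    lists = {name: sorted(c) for name, c in counts.items()}
    stats = {name: [f"{entity}${n}" for entity, n in sorted(c.items())]
             for name, c in counts.items()}
    return lists, stats
```

Counting per list rather than globally keeps the statistics for, say, person.lst separate from country.lst, even if the same string appears under two annotation types.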
