Unexpected Productions May Well be Errors

Tylman Ule
Seminar für Sprachwissenschaft, Universität Tübingen
ule@sfs.uni-tuebingen.de

Kiril Simov
Linguistic Modelling Laboratory, Bulgarian Academy of Sciences
kivs@

Abstract

We present a method for detecting annotation errors in treebanks. It assumes that errors are unexpected small tree fragments. We generate statistics over configurations of these fragments using a standard statistical test. We use the test result and the characteristics of their distributions as features to classify unseen configurations as likely errors via machine learning. Evaluation shows that the resulting list of error candidates is reliable, independent of corpus size, annotation quality, and target language.

Setting up language resources involves considerable effort, because human intervention is inevitable and costly. Human annotators are essential, because they usually outperform automatic methods in terms of annotation accuracy, but they still make their own kind of errors. In addition to genuine mistakes, they do not always behave identically each time when presented with the same infrequent problem. Thus one can expect a number of errors to be present [...]

[...] are especially useful when pattern-based approaches are not easily applicable, because patterns are difficult to find. We present such a non-symbolic method that attacks errors and inconsistencies in structural annotation, and that shows good performance across languages and annotation schemes. We detect errors and inconsistencies that appear as unexpected events in a corpus using a variant of Directed [...]
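The core idea stated in the abstract, that small tree fragments occurring less often than expected may be annotation errors, can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not name the statistical test, the fragment configurations, or the learned classifier. The sketch assumes bracketed trees as input, treats each production (mother label plus daughter label sequence) as the fragment, and swaps in a log-likelihood ratio (G²) test on a 2x2 contingency table. The function names `parse`, `productions`, `g2`, and `unexpected`, and the threshold 3.84 (the chi-square critical value at p = 0.05 with one degree of freedom), are hypothetical choices made for illustration.

```python
import math
from collections import Counter

def parse(s):
    """Parse a bracketed tree such as "(S (NP (DT the)))" into (label, children)."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                pos += 1              # skip the word under a preterminal
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def productions(tree):
    """Yield (mother, daughters) for every node that has labelled daughters."""
    label, children = tree
    if children:
        yield label, tuple(child[0] for child in children)
        for child in children:
            yield from productions(child)

def g2(k11, k12, k21, k22):
    """Log-likelihood ratio statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    cells = ((k11, k11 + k12, k11 + k21), (k12, k11 + k12, k12 + k22),
             (k21, k21 + k22, k11 + k21), (k22, k21 + k22, k12 + k22))
    # Empty cells contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o * n / (r * c)) for o, r, c in cells if o > 0)

def unexpected(trees, threshold=3.84):
    """Return productions seen significantly less often than expected
    if mother label and daughter sequence were independent (assumed criterion)."""
    pairs = Counter(p for t in trees for p in productions(parse(t)))
    mothers, daughters = Counter(), Counter()
    for (m, d), count in pairs.items():
        mothers[m] += count
        daughters[d] += count
    n = sum(pairs.values())
    flagged = []
    for (m, d), k11 in pairs.items():
        k12 = mothers[m] - k11        # same mother, other daughters
        k21 = daughters[d] - k11      # same daughters, other mother
        k22 = n - k11 - k12 - k21     # everything else
        expected = mothers[m] * daughters[d] / n
        score = g2(k11, k12, k21, k22)
        if k11 < expected and score > threshold:
            flagged.append((m, d, k11, score))
    return sorted(flagged, key=lambda item: -item[3])

# Toy corpus: twenty consistent trees and one with a mislabelled NP.
corpus = ["(S (NP (DT the) (NN dog)) (VP (VB runs)))"] * 20
corpus.append("(S (VP (DT the) (NN dog)) (VP (VB runs)))")
candidates = unexpected(corpus)
```

On this toy corpus the only candidate is the production with mother `VP` and daughters `DT NN`, which occurs once although both its mother label and its daughter sequence are frequent. A list like `candidates` would only be a starting point: the abstract describes feeding the test result and distributional characteristics into a learned classifier, which this sketch does not attempt.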
