Journal of Chinese Information Processing: SVM-Based Chinese Text Chunking (Chinese Information Processing Society of China)

Posted 2017-11-11, Tianjin

JOURNAL OF CHINESE INFORMATION PROCESSING, Vol. 18, No. 2
Article ID: 1003-0077(2004)02-0001-07

SVM-Based Chinese Text Chunking

LI Heng, ZHU Jingbo, YAO Tianshun
(Institute of Computer Software and Theory, Northeastern University, Shenyang, Liaoning 110004, China)

Abstract: Classification algorithms based on SVM (support vector machine) theory have gradually attracted the attention of researchers in China and abroad because of their well-developed theoretical foundation and good experimental results. Compared with other classification algorithms, SVM, which is based on the principle of structural risk minimization, shows better generalization ability in small-sample pattern recognition. Text chunking, a preprocessing stage for syntactic parsing, reduces the difficulty of parsing by dividing text into a set of non-overlapping segments. This paper treats Chinese chunk identification as a classification problem and solves it with SVM. Experimental results show that the SVM algorithm is effective for Chinese chunk identification: it achieves F = 88.67% on the Harbin Institute of Technology treebank corpus, and it is especially suitable when only limited annotated Chinese data is available.

Keywords: computer applications; Chinese information processing; support vector machine; structural risk minimization; text chunking
CLC number: TP391    Document code: A

SVM Based Chinese Text Chunking

LI Heng, ZHU Jingbo, YAO Tianshun
(Institute of Computer Software and Theory, Northeastern University, Shenyang, Liaoning 110004, China)

Abstract: The classification algorithm based on SVM (support vector machine) attracts more attention from researchers due to its sound theoretical properties and good empirical results. Compared with other classification algorithms, SVM, which is based on structural risk minimization, achieves high generalization performance with a small number of samples. Text chunking, as a preprocessing step for parsing, divides text into syntactically related non-overlapping groups of words (chunks), reducing the complexity of full parsing. In this paper, we treat Chinese text chunking as a classification problem and apply SVM to solve it. The chunking experiments were carried out on the HIT Chinese Treebank corpus. Experimental results show that it is an effective approach, achieving an F score of 88.67%, especially for
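The abstract frames chunking as a classification problem and reports a chunk-level F score. As a minimal sketch of that framing, the Python below encodes non-overlapping chunks as one label per token and scores predicted chunks against gold chunks. The B/I/O tag scheme, the chunk types NP and VP, the function names, and the use of the balanced F1 measure are all assumptions made for illustration; this excerpt does not state which encoding or F definition the authors used, and no SVM is trained here.

```python
from typing import List, Tuple

# A chunk is (start, end_exclusive, type). Hypothetical representation.
Chunk = Tuple[int, int, str]


def chunks_to_tags(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping chunks as one B/I/O label per token,
    so that chunking becomes a per-token classification task."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def tags_to_chunks(tags: List[str]) -> List[Chunk]:
    """Decode per-token labels (e.g. classifier output) back into chunks."""
    chunks: List[Chunk] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes the last chunk
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if start is not None and (tag == "O" or starts_new):
            chunks.append((start, i, label))
            start, label = None, None
        if tag != "O" and start is None:
            start, label = i, tag[2:]
    return chunks


def f_score(gold: List[Chunk], pred: List[Chunk]) -> float:
    """Balanced F1 over exact-match chunks (boundaries and type must agree)."""
    gold_set, pred_set = set(gold), set(pred)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy five-token sentence with made-up gold chunks and a made-up prediction.
gold_chunks = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")]
gold_tags = chunks_to_tags(5, gold_chunks)
pred_tags = ["B-NP", "I-NP", "B-VP", "B-NP", "O"]
pred_chunks = tags_to_chunks(pred_tags)
score = f_score(gold_chunks, pred_chunks)
```

In this toy case two of the three predicted chunks match the gold chunks exactly, so precision, recall, and F1 are all 2/3. In a full system, the per-token labels would come from a trained classifier such as an SVM rather than being written by hand.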
