- Posted on 2018-12-02 (Tianjin)
Exploring the Regular Sequential Patterns of the Constituents of the Chinese “Bi” (比) Sentence∗
朴敏浚 李强 袁毓林
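(Department of Chinese Language and Literature / Center for Chinese Linguistics / Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871)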
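Abstract: The “bi” (比) sentence expressing differential comparison (差比) is an important type of comparative sentence, and it is the main sentence type that any approach to extracting key elements from comparative sentences must handle. The key elements of this construction (SUB, BI, OBJ, ITM, DIM, RES, EXT) are semantically intertwined and can surface in a wide variety of sequential patterns. Targeting the extraction of key elements from Chinese “bi” sentences, we annotated the seven key elements in more than 460 “bi” sentences that express differential comparison. On this basis, we used the Apriori and PrefixSpan algorithms to discover association rules and sequential patterns among these elements, and we summarize six distributional regularities of the key elements of “bi” sentences. We further explain the motivations behind these six patterns, providing linguistic insight and a theoretical basis for feature selection on “bi” sentences.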
Keywords: “bi” sentence; key elements; association rules; sequential patterns; distributional regularities
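The abstract names Apriori for mining association rules among the annotated elements but does not give its settings. The following is a minimal sketch of textbook Apriori, not the authors' implementation. It assumes that each annotated sentence is treated as one transaction, namely the set of element labels it contains, and that `min_support` is an absolute sentence count. The four tagged sentences are a hypothetical toy example, not data from the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (an absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single labels.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count every candidate in one pass over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is in the table.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical toy annotations: the set of element labels found in each sentence.
    sentences = [
        {"SUB", "BI", "OBJ", "RES"},
        {"SUB", "BI", "OBJ", "RES", "EXT"},
        {"SUB", "BI", "OBJ", "DIM", "RES"},
        {"BI", "OBJ", "RES", "EXT"},
    ]
    frequent = apriori(sentences, min_support=2)
    for lhs, rhs, conf in association_rules(frequent, min_confidence=0.9):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

On this toy data, with `min_confidence=0.9`, the rule EXT → RES is kept with confidence 1.0, while RES → EXT (confidence 0.5) is filtered out. This shows that association rules are directional even though the itemsets themselves are unordered.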
A Study on the Sequential Patterns of Semantic Constituents of the Bi-Comparative Structure
Abstract: The Bi-structure, which highlights a contrasting characteristic between two elements, is the key comparative sentence structure in Chinese. It has therefore been the main target of keyword mining in Chinese comparative sentences. The structure consists of seven types of semantic items (SUB, BI, OBJ, ITM, DIM, RES, EXT), which can surface in a variety of sequential patterns. To provide useful information for the keyword extraction task on this comparative structure, this study first tags the seven semantic items in more than 460 sentences. Second, association rules and sequential patterns are extracted with the Apriori and PrefixSpan algorithms, and six rules of item distribution are established from them. Finally, the paper explains the motivations behind these six rules, offering a better understanding of the particular characteristics of the Chinese Bi-comparative structure and useful insight for its feature selection task.
Keywords: Bi-structure; keyword extraction; sequential pattern mining; distribution rule
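Apriori treats each sentence as an unordered set. PrefixSpan, the second algorithm named in the abstract, instead mines ordered subsequences, which is what the surface order of the seven items requires. The following is a minimal sketch of PrefixSpan restricted to sequences of single labels, not the authors' code. It assumes that each sentence is encoded as its element labels in surface order and that `min_support` is an absolute sentence count. The toy sequences are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return [(pattern, support)] for every ordered subsequence of labels that
    occurs in at least min_support sequences (absolute count)."""
    results = []

    def mine(prefix, projected):
        # projected holds, for each sequence containing prefix, the suffix that
        # follows the first occurrence of prefix.
        counts = {}
        for suffix in projected:
            for item in set(suffix):  # count a label at most once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results.append((pattern, support))
            # Project the database on the extended prefix (first occurrence of item).
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_projected)

    mine((), [tuple(s) for s in sequences])
    return results


if __name__ == "__main__":
    # Hypothetical toy sequences: element labels in surface order, one list per sentence.
    sentences = [
        ["SUB", "BI", "OBJ", "RES"],
        ["SUB", "BI", "OBJ", "RES", "EXT"],
        ["SUB", "BI", "OBJ", "DIM", "RES"],
        ["BI", "OBJ", "RES", "EXT"],
    ]
    for pattern, support in prefixspan(sentences, min_support=3):
        if len(pattern) >= 3:
            print(" ".join(pattern), support)
```

PrefixSpan grows only prefixes that are already frequent and searches progressively smaller projected suffixes, so it never enumerates candidate sequences as Apriori-style sequence miners do. Projecting on the first occurrence keeps the longest suffix, so no continuation is missed. On the toy data with `min_support=2`, (RES, EXT) is frequent but (EXT, RES) is not. An unordered itemset miner cannot capture this kind of order information.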
1 Introduction
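With the spread of the Internet, social networking services (SNS) have grown steadily more influential and have become platforms on which people exchange information. This has naturally motivated companies to analyze and explore big data from the Web. Among these data, online product reviews bear directly on sales, and comparative sentences in customer reviews in particular have become a research focus in information extraction (IE). For example, Huang Xiaojiang (2008) proposed a Chinese comparative-sentence identification model that uses class sequential rules (CSR) as features. Zhang Chen et al. (2013) used class sequential rules (CSR), semantic role information, statistical words, and other features,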