Vol. 44 No. 8
Computer Science (计算机科学), Volume 44, Issue 8
August 2017 COMPUTER SCIENCE Aug. 2017
A Chinese Word Sense Induction Model Integrating Distance Metric and Gaussian Mixture Model
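ZHANG Yi-hao, LIU Zhi, ZHU Chang-peng
(College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054)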
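Abstract Word sense induction is an important research topic in acquiring word sense knowledge, and
inducing word senses with clustering algorithms is currently the most widely used approach. By comparing
the respective strengths of the K-Means and EM clustering algorithms in their word sense induction models,
this paper proposes a new clustering algorithm that integrates a distance metric with a Gaussian mixture
model. The aim is to exploit the strength of one algorithm in distance measurement and of the other in
computing the data distribution, so that both the geometric properties and the normal distribution
information of the data contribute to word sense clustering, thereby improving the performance of the
word sense induction model. Experimental results show that the proposed hybrid clustering algorithm is
effective in improving the performance of the word sense induction model.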
Keywords Word sense induction, Distance metric, Gaussian mixture model, Hybrid clustering
CLC number TP391 Document code A DOI 10.11896/j.issn.1002-137X.2017.08.045
Chinese Word Sense Induction Model by Integrating Distance Metric and Gaussian Mixture Model
ZHANG Yi-hao LIU Zhi ZHU Chang-peng
(College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China)
Abstract Word sense induction is an important topic in word sense knowledge acquisition, and the most
widely used approach to it is based on cluster analysis. By comparing the K-Means and EM clustering
algorithms in their respective word sense induction models, we propose a new hybrid clustering algorithm
that integrates a distance metric with a Gaussian mixture model. It combines the strengths of the two
algorithms, in distance measurement and in data distribution computation respectively, to exploit both
the geometrical properties and the normal distribution information of the training data in cluster
analysis, and thereby improve the performance of the word sense induction model. Experimental results
show that the proposed hybrid clustering algorithm is effective in improving the performance of the
word sense induction model.
Keywords Word sense induction, Distance metric, Gaussian mixture model, Hybrid clustering
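
The abstract does not state how the distance metric and the Gaussian mixture model are fused. As a rough illustration only, the sketch below shows one common combination, assumed here and not taken from the paper: K-Means with Euclidean distance supplies initial centroids, and EM then fits a Gaussian mixture starting from them. The function names (`kmeans`, `gmm_from_kmeans`), the diagonal covariances, and the variance floor `min_var` are all hypothetical choices made for this example.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain K-Means with Euclidean distance; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


def log_gauss(p, mean, var):
    """Log density of a diagonal-covariance Gaussian at point p."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(p, mean, var))


def gmm_from_kmeans(points, k, em_iters=30, seed=0, min_var=1e-3):
    """EM for a diagonal GMM, initialized from K-Means (illustrative assumption)."""
    centroids, labels = kmeans(points, k, seed=seed)
    n, d = len(points), len(points[0])
    means = [list(c) for c in centroids]
    weights = [1.0 / k] * k
    variances = []
    for j in range(k):
        # Fall back to all points if a cluster ended up empty.
        members = [p for p, l in zip(points, labels) if l == j] or points
        variances.append([
            max(min_var,
                sum((p[i] - means[j][i]) ** 2 for p in members) / len(members))
            for i in range(d)])
    resp = []
    for _ in range(em_iters):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for p in points:
            logs = [math.log(weights[j]) + log_gauss(p, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against division by zero
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[i] for r, p in zip(resp, points)) / nj
                        for i in range(d)]
            variances[j] = [
                max(min_var,
                    sum(r[j] * (p[i] - means[j][i]) ** 2
                        for r, p in zip(resp, points)) / nj)
                for i in range(d)]
    return weights, means, variances, resp
```

In a word sense induction setting, `points` would be context feature vectors for the occurrences of one ambiguous word and `k` the assumed number of senses; each row of `resp` then gives a soft assignment of an occurrence to the induced senses.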