Similarity Learning of Symbolic Attribute Values and a Study of Attribute Importance (Applied Mathematics Thesis)

About 30,800 characters, about 36 pages; posted 2018-12-06 (Jiangsu)

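摘要 (Abstract)

In case-based reasoning classification systems (i.e., CBR classifiers), the similarity computed over attributes plays a decisive role in classification and decision-making results. The similarity measure of an attribute, in turn, depends on how the similarity between its individual values is computed. This thesis studies the similarity between values of symbolic attributes. The symbolic attributes considered here are those whose values are completely unordered; for example, the attribute “color” takes the values “red”, “yellow” and “blue”. Most studies assume that the similarity between such values can only be 0 or 1, which causes a loss of information. Existing work extended the similarity of such values from {0,1} to the interval [0,1] and learned it with a genetic algorithm (GA). However, when the number of attributes and the size of their value domains are large, the GA converges noticeably more slowly and classification accuracy suffers. To address this, this thesis proposes a similarity learning algorithm based on particle swarm optimization (PSO) to obtain the similarities of symbolic attribute values. Experiments show that the PSO-based algorithm converges faster than the GA and achieves higher accuracy. In addition, this study shows that the learned attribute-value similarities roughly reflect the importance of the attribute itself, and proposes an attribute importance measure. Experiments verify the feasibility of this measure. Finally, based on the concept of the relative positive region in rough set theory, this thesis also proposes a method for determining whether the attributes of a data set interact with one another.

Keywords: symbolic attribute; similarity; particle swarm optimization; attribute importance measure; rough set reduction

Abstract

In case-based reasoning classifiers (i.e., CBR classifiers), the similarity between features plays a decisive role in the results of classification and decision-making. The similarity measure for a feature depends on how the similarity between its individual values is calculated. This paper learns the similarity between values of symbolic features. The symbolic features considered here have completely unordered values; for example, the feature “color” takes the values “red”, “yellow” and “blue”. Most researchers consider that the similarity between such feature values can only be 0 or 1, an approach that leads to a loss of information. Existing work has extended these similarities from {0, 1} to [0, 1] and presented a GA-based approach for learning the similarity measure of symbolic feature values. However, when the number of features and feature values becomes large, the GA-based approach converges noticeably more slowly, and classification accuracy may also suffer. For this reason, this paper proposes a PSO-based method for learning the similarity measure of symbolic features. Experimental results show that the PSO-based algorithm converges much faster than the GA-based one and also improves accuracy. In addition, this paper further shows that the learned similarities of feature values can roughly reflect feature importance, and proposes a feature importance measure. Experiments proved the feasibility of this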