ISSN 1673-9418 CODEN JKYTA8
Journal of Frontiers of Computer Science and Technology
1673-9418/2012/06(10)-0865-12
DOI: 10.3778/j.issn.1673-9418.2012.10.001
E-mail: fcst@vip.
Tel: +86-10

Multi-Similarity Join Order Selection in Entity Database*

LIU Xueli+, WANG Hongzhi, LI Jianzhong, GAO Hong
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
+ Corresponding author: E-mail: xuei.hit@

LIU Xueli, WANG Hongzhi, LI Jianzhong, et al. Multi-similarity join order selection in entity database. Journal of Frontiers of Computer Science and Technology, 2012, 6(10): 865-876.

Abstract: Organizing and querying the entities described by relational tuples is an effective way to manage poor-quality data. Because an attribute of an entity may have more than one description, an entity-based similarity join must consider multiple values. Since the order of a multi-way join strongly affects join efficiency, this paper proposes a multi-join order selection algorithm that uses a Markov chain Monte Carlo (MCMC) method to estimate the size of entity similarity joins, and presents a cost model to optimize the join order over multiple entity relations. Experimental results show that the estimation algorithm performs well, especially when the relations are large.

Key words: multi-relation; entity; similarity join; Markov chain Monte Carlo (MCMC)

Abstract (translated from the Chinese): Organizing and querying tuples according to the entities they describe is an effective way to manage poor-quality data. Because the same attribute of the same entity may have several descriptive values, a join over an entity-based database is a similarity join that supports multiple values. Since the join order of a multi-table join has an important effect on join performance, this paper studies



