Query Processing of Chinese Keyword Top-N Based on Multi-Table Database (基于多表数据库中文关键词top-n查询处理)

  • About 49,000 characters
  • About 53 pages
  • Published 2018-05-18 in Shanghai

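Abstract (translated from the Chinese 摘要)

The theory and techniques of keyword queries have been studied in depth and applied widely in information retrieval and Web search engines. Traditional database management systems support only pattern matching, not free-form keyword queries. For this reason, keyword query processing over relational databases has become one of the most closely watched frontier research topics in recent years. Traditional relational database systems operate on a database through the Structured Query Language (SQL), which requires users to know both SQL and the database schema; this is difficult for ordinary users. In addition, traditional database systems can apply only simple sorting to the returned results, so it is hard for users to pick out the information they care about most. Current research on keyword queries mainly targets English keywords. This thesis therefore presents a method for processing Chinese keyword top-N queries over databases with multiple tables. The method creates an index table that stores the Chinese tuple characters extracted from the database together with their related information, then builds an index for quickly matching query keywords, and draws on IR similarity formulas to construct a ranking strategy suited to Chinese keyword queries. For a Chinese keyword query, the method uses the index to quickly match query characters against tuple characters and obtain the corresponding information. From this information it creates a candidate-tuple generation linked list and SQL query statements, obtains the candidate tuples and their similarity to the query, and finally returns the top-N results ordered by similarity. The method supports character-by-character search and the processing of queries that contain Chinese abbreviations. Finally, experiments on real data sets verify query response time and accuracy, and the experimental data show that the method is effective.

Keywords: relational database; Chinese keyword; index; ranking strategy

Abstract

The theories and techniques of keyword query have been extensively studied and applied in Information Retrieval and Web search engines. Traditional relational database management systems support pattern matching of tuples against query conditions; however, they do not support free-form keyword search. Research on processing keyword queries over relational databases has therefore intensified in recent years and has become an active research topic. Traditional relational database systems use SQL (Structured Query Language) to search the database and require users to know both the database schema and SQL, which is difficult for ordinary users. Additionally, the ranking functions for query results are simple in traditional relational database systems, so it is not easy for users to find the answers they want among too many results. At present, most research on keyword queries addresses English keyword search. In this paper, we provide a new method for processing Chinese keyword queries in a database system with multiple relations. This method creates an index table to store the Chinese tuple words and the related information coming from the database, and then constructs a procedure for calculating the similarity. Given a Chinese keyword query, using the index to match the query words and tuple words, we establish a linked list to generate identifier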
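The pipeline outlined in the abstract (a character-level index over several tables, an IR-style similarity score, and Top-N selection) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `build_index`, `top_n`, the in-memory `tables` layout, and the TF-IDF-like weighting are assumptions made for illustration. The abstract does not give the thesis's actual similarity formula, and the sketch omits the candidate-tuple linked list and SQL generation steps.

```python
import heapq
import math
from collections import Counter, defaultdict


def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def build_index(tables):
    """Build a character-level inverted index.

    tables: {table_name: {row_id: text}} (assumed layout, not the thesis's).
    Returns (index, n_tuples), where index maps each Chinese character to
    {(table_name, row_id): number of occurrences in that tuple}.
    """
    index = defaultdict(dict)
    n_tuples = 0
    for table, rows in tables.items():
        for row_id, text in rows.items():
            n_tuples += 1
            counts = Counter(ch for ch in text if is_cjk(ch))
            for ch, tf in counts.items():
                index[ch][(table, row_id)] = tf
    return index, n_tuples


def top_n(query, index, n_tuples, n):
    """Score tuples against the query characters and return the n best.

    The weight (1 + ln tf) * ln(1 + N / df) is an assumed TF-IDF variant.
    """
    scores = defaultdict(float)
    for ch in set(query):
        postings = index.get(ch, {})
        if not postings:
            continue  # character does not occur in any tuple
        idf = math.log(1 + n_tuples / len(postings))
        for tuple_id, tf in postings.items():
            scores[tuple_id] += (1 + math.log(tf)) * idf
    return heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    tables = {
        "papers": {1: "关系数据库的数据查询", 2: "搜索引擎技术"},
        "authors": {1: "数据库研究组"},
    }
    index, n_tuples = build_index(tables)
    for tuple_id, score in top_n("数据库", index, n_tuples, n=2):
        print(tuple_id, round(score, 3))
```

Because matching works on single characters, a shortened query such as "数库" still reaches tuples that contain "数据库", which is one plausible reading of how character-level search can support abbreviations.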
