

Advanced Topics in Data Mining: Sequential Patterns

Sequential Pattern Mining

Progress in bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as basket data.
A record in such data typically consists of the transaction date and the items bought in the transaction.
Very often, data records also contain a customer-id, particularly when the purchase has been made using a credit card or a frequent-buyer card.
Catalog companies also collect such data using the orders they receive.

An example of such a pattern is that customers typically rent “Star Wars (星際大戰)”, then “Empire Strikes Back (帝國大反擊)”, and then “Return of the Jedi (絕地大反攻)”.
These rentals need not be consecutive.
Customers who rent some other videos in between also support this sequential pattern.
Elements of a sequential pattern need not be simple items.
“Computer Science and Programming Language”, followed by “Data Structure”, followed by “System Programs and Operating Systems” is an example of a sequential pattern in which the elements are sets of items.

Given: Transaction Time, Customer Id, Items Bought

Definition

The length of a sequence is the number of itemsets in the sequence.
A sequence of length k is called a k-sequence.
The support for an itemset i is defined as the fraction of customers who bought the items in i in a single transaction.
The itemset i and the 1-sequence i have the same support.
An itemset with minimum support is called a large (frequent) itemset, or litemset.

AprioriAll Algorithm

Each itemset in a large sequence must have minimum support.
Any large sequence must be a list of litemsets.
Finding all sequential patterns in five phases:
1. Sort Phase
2. Litemset Phase
3. Transformation Phase
4. Sequence Phase
5. Maximal Phase

AprioriAll Algorithm: Sort Phase
AprioriAll Algorithm: Litemset Phase
AprioriAll Algorithm: Transformation Phase
AprioriAll Algorithm: Sequence Phase
Sequence Phase: C
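
The definitions above (customer sequences ordered by transaction time, patterns that tolerate gaps, and support counted per customer) can be sketched in Python as follows. This is only a minimal illustration, not the AprioriAll algorithm itself: the sample `transactions` and the helper names `customer_sequences`, `contains`, and `support` are assumptions made up for this example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical basket data: (customer_id, transaction_time, items_bought).
transactions = [
    (1, 1, {"Star Wars"}),
    (1, 2, {"Other Video"}),
    (1, 3, {"Empire Strikes Back"}),
    (1, 4, {"Return of the Jedi"}),
    (2, 1, {"Star Wars"}),
    (2, 2, {"Empire Strikes Back"}),
    (3, 1, {"Empire Strikes Back"}),
    (3, 2, {"Star Wars"}),
]


def customer_sequences(transactions):
    """Group transactions by customer and order them by time.

    This mirrors the idea of the sort phase: each customer ends up
    with one sequence of itemsets.
    """
    by_customer = defaultdict(list)
    ordered = sorted(transactions, key=lambda t: (t[0], t[1]))
    for customer_id, _time, items in ordered:
        by_customer[customer_id].append(frozenset(items))
    return dict(by_customer)


def contains(customer_seq, pattern):
    """Return True if every itemset of `pattern` is a subset of some
    transaction in `customer_seq`, in the same order. Other
    transactions may occur in between."""
    pos = 0
    for itemset in pattern:
        # Skip transactions that do not include this itemset.
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # The next itemset must match a later transaction.
    return True


def support(sequences, pattern):
    """Fraction of customers whose sequence contains `pattern`."""
    pattern = [frozenset(p) for p in pattern]
    hits = sum(1 for seq in sequences.values() if contains(seq, pattern))
    return hits / len(sequences)


seqs = customer_sequences(transactions)
print(support(seqs, [{"Star Wars"}, {"Empire Strikes Back"}]))
```

In this sample data, customer 1 still supports the pattern despite renting another video in between, while customer 3 does not because the rentals occur in the opposite order. Each customer counts at most once, however many times the pattern appears in that customer's sequence.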




