Exact Set Matching

Charles Yan, 2008

Goal
To find all occurrences in a text T of any pattern in a set of patterns P = {p_1, p_2, …, p_z}.
n: the total length of all the patterns in P.
m: the length of T.
k: the number of occurrences in T of the patterns from P.
Running time: O(n + zm), the cost of searching for each of the z patterns separately with a linear-time method, vs. O(n + m + k).

Keyword Tree: Definition
The keyword tree for a set P is a rooted directed tree K satisfying three conditions:
(1) each edge is labeled with one character;
(2) any two edges out of the same node have distinct labels; and
(3) every pattern p_i in P maps to some node v of K such that the characters on the path from the root of K to v exactly spell out p_i, and every leaf of K is mapped to by some pattern in P.

Keyword Tree: Example
P = {potato, poetry, pottery, science, school}

Keyword Tree: Construction
K_1: the tree that includes only pattern p_1.
K_i: the tree that includes patterns p_1, …, p_i.
Assuming a fixed-size alphabet, constructing K_i by adding p_i to K_{i-1} costs O(|p_i|). Thus the total construction time is O(n).

Keyword Tree: Naive Exact Set Matching
Start from each position l in T and follow the unique path from the root r of K that matches a substring of T starting at l. Running time: O(mn).

Keyword Tree: The Dictionary Problem
To find whether an input word is contained in the dictionary.
The words in the dictionary (P) are encoded in a keyword tree.
The problem reduces to whether the input word (T) completely matches some pattern in P.
This requires that the set of patterns is known in advance.

Keyword Tree: Speeding Up Exact Set Matching
(1) Shift the tree by more than one position.
(2) Skip comparisons that have already been made in previous steps.

Failure Link
v: a node in the keyword tree K.
L(v): the label of v, that is, the concatenation of the characters on the path from the root to v.
lp(v): the length of the longest proper suffix of the string L(v) that is a prefix of some pattern in P. Let this suffix be a.
Lemma. There is a unique node in the keyword tree that is labeled by the string a. Let this node be n_v.
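
The definitions above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions, not the implementation behind the slides: the names Node, build_keyword_tree, walk, contains, naive_search and lp are hypothetical, a dict stands in for the fixed-size alphabet, and lp is computed by brute force straight from its definition rather than by the faster method that failure links are meant to enable. The second pattern set, {potato, tattoo, at}, is also an assumption chosen so that lp is non-zero.

```python
class Node:
    """One node of the keyword tree; each outgoing edge carries one character."""
    def __init__(self):
        self.children = {}    # char -> Node, so edge labels out of a node are distinct
        self.pattern = None   # the pattern that maps to this node, if any


def build_keyword_tree(patterns):
    """Add the patterns one at a time (K_1, K_2, ...); adding p_i costs O(|p_i|)."""
    root = Node()
    for p in patterns:
        node = root
        for ch in p:
            node = node.children.setdefault(ch, Node())
        node.pattern = p
    return root


def walk(root, s):
    """Follow s from the root; return the node reached, or None if the path breaks."""
    node = root
    for ch in s:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def contains(root, word):
    """Dictionary problem: does word completely match some pattern?"""
    node = walk(root, word)
    return node is not None and node.pattern is not None


def naive_search(root, text):
    """Naive set matching: restart at every position l of the text."""
    hits = []
    for l in range(len(text)):
        node = root
        for j in range(l, len(text)):
            node = node.children.get(text[j])
            if node is None:
                break
            if node.pattern is not None:
                hits.append((l, node.pattern))
    return hits


def lp(root, label):
    """Brute-force lp(v) for the node labeled `label`: try proper suffixes, longest first."""
    for start in range(1, len(label) + 1):
        if walk(root, label[start:]) is not None:   # suffix spells a path from the root
            return len(label) - start
    return 0


K = build_keyword_tree(["potato", "poetry", "pottery", "science", "school"])
print(contains(K, "poetry"))                   # True
print(contains(K, "poet"))                     # False: a prefix, not a whole pattern
print(naive_search(K, "a potato at school"))   # [(2, 'potato'), (12, 'school')]

K2 = build_keyword_tree(["potato", "tattoo", "at"])   # illustrative set, not from the slides
print(lp(K2, "potat"))                         # 3: "tat" is a prefix of "tattoo"
```

Because walk follows at most one edge per character, a string that labels a node labels exactly one node, which is the observation behind the lemma: here n_v for L(v) = "potat" is the node reached by "tat".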
