Text Processing (TextProcessing.ppt)

  • About 8,470 words
  • About 28 pages
  • Published 2019-08-29 in Guangdong
Text Processing
Chapter 9: Text Processing

Outline and Reading
  • Strings and Pattern Matching (§9.1)
  • Tries (§9.2)
  • Text Compression (§9.3)
  • Optional: Text Similarity (§9.4). No slides.

Pattern Matching

Strings
  • A string is a sequence of characters.
  • Examples of strings: Java program, HTML document, DNA sequence, digitized image
  • An alphabet Σ is the set of possible characters for a family of strings.
  • Examples of alphabets: ASCII, Unicode, {0, 1}, {A, C, G, T}
  • Let P be a string of size m.
      • A substring P[i .. j] of P is the subsequence of P consisting of the characters with ranks between i and j.
      • A prefix of P is a substring of the type P[0 .. i].
      • A suffix of P is a substring of the type P[i .. m - 1].
  • Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P.
  • Applications: text editors, search engines, biological research

Brute-Force Algorithm
  • The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either a match is found or all placements of the pattern have been tried.
  • Brute-force pattern matching runs in time O(nm), where n is the size of T and m is the size of P.
  • Example of worst case: T = aaa … ah, P = aaah
      • may occur in images and DNA sequences
      • unlikely in English text

Boyer-Moore Heuristics
  • The Boyer-Moore pattern matching algorithm is based on two heuristics.
  • Looking-glass heuristic: compare P with a subsequence of T moving backwards.
  • Character-jump heuristic: when a mismatch occurs at T[i] = c,
      • if P contains c, shift P to align the last occurrence of c in P with T[i];
      • else, shift P to align P[0] with T[i + 1].
  • Example

The Boyer-Moore Algorithm
  • Example

Analysis
  • Boyer-Moore's algorithm runs in time O(nm + s), where s is the size of the alphabet Σ.
  • Example of worst case: T = aaa … a, P = baaa
  • The worst case may occur in images and DNA sequences but is unlikely in English text.
  • Boyer-Moore's algorithm is significantly faster than the brute-force algorithm on English text.

The KMP Algorithm - Motivation
  • Knuth-Morris-Pratt’s alg
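
The brute-force and Boyer-Moore descriptions above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function names `brute_force_match` and `boyer_moore_match` are hypothetical, and the Boyer-Moore version implements only the two heuristics named in the slides (looking-glass and character-jump), which is a simplified form of the full algorithm. Both functions return the index of the first match of P in T, or -1 if there is none.

```python
def brute_force_match(T, P):
    """Try every shift of P against T; O(nm) comparisons in the worst case."""
    n, m = len(T), len(P)
    for i in range(n - m + 1):          # each possible shift of P relative to T
        j = 0
        while j < m and T[i + j] == P[j]:
            j += 1
        if j == m:                      # all m characters matched
            return i
    return -1


def boyer_moore_match(T, P):
    """Simplified Boyer-Moore using only the character-jump heuristic."""
    n, m = len(T), len(P)
    if m == 0:
        return 0
    # last[c] = index of the last occurrence of c in P (absent means -1)
    last = {c: k for k, c in enumerate(P)}
    i = m - 1                           # current index into T
    j = m - 1                           # current index into P
    while i < n:
        if T[i] == P[j]:
            if j == 0:
                return i                # matched P back to its first character
            i -= 1                      # looking-glass: compare right to left
            j -= 1
        else:
            l = last.get(T[i], -1)
            # character-jump: align the last occurrence of T[i] in P with T[i],
            # or move P entirely past T[i] when P does not contain it
            i += m - min(j, l + 1)
            j = m - 1
    return -1
```

For the text `"abacaabadcabacabaabb"` and the pattern `"abacab"`, both functions return 10. On the worst-case inputs from the slides (a long run of `a` with pattern `aaah` or `baaa`), both perform on the order of nm character comparisons.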
