- 6
- 0
- 约8.47千字
- 约 28页
- 2019-08-29 发布于广东
- 举报
Text Processing Tries Chapter 9: Text Processing Outline and Reading Strings and Pattern Matching (§9.1) Tries (§9.2) Text Compression (§9.3) Optional: Text Similarity (§9.4). No Slides. Texts Pattern Matching Strings A string is a sequence of characters Examples of strings: Java program HTML document DNA sequence Digitized image An alphabet S is the set of possible characters for a family of strings Example of alphabets: ASCII Unicode {0, 1} {A, C, G, T} Let P be a string of size m A substring P[i .. j] of P is the subsequence of P consisting of the characters with ranks between i and j A prefix of P is a substring of the type P[0 .. i] A suffix of P is a substring of the type P[i ..m - 1] Given strings T (text) and P (pattern), the pattern matching problem consists of finding a substring of T equal to P Applications: Text editors Search engines Biological research Brute-Force Algorithm The brute-force pattern matching algorithm compares the pattern P with the text T for each possible shift of P relative to T, until either a match is found, or all placements of the pattern have been tried Brute-force pattern matching runs in time O(nm) Example of worst case: T = aaa … ah P = aaah may occur in images and DNA sequences unlikely in English text Boyer-Moore Heuristics The Boyer-Moore’s pattern matching algorithm is based on two heuristics Looking-glass heuristic: Compare P with a subsequence of T moving backwards Character-jump heuristic: When a mismatch occurs at T[i] = c If P contains c, shift P to align the last occurrence of c in P with T[i] Else, shift P to align P[0] with T[i + 1] Example The Boyer-Moore Algorithm Example Analysis Boyer-Moore’s algorithm runs in time O(nm + s) Example of worst case: T = aaa … a P = baaa The worst case may occur in images and DNA sequences but is unlikely in English text Boyer-Moore’s algorithm is significantly faster than the brute-force algorithm on English text The KMP Algorithm - Motivation Knuth-Morris-Pratt’s alg
您可能关注的文档
最近下载
- 高一物理期中考试试题及答案.docx VIP
- 基于大数据的心理健康评估.docx VIP
- 医疗影像智能诊断.docx VIP
- 陶瓷膜的制备与水处理.pptx VIP
- (高清版)-B-T 34590.6-2022 道路车辆 功能安全 第6部分:产品开发:软件层面.pdf VIP
- 智能医疗影像分析系统开发与应用.docx VIP
- Axio-Imager-M2显微镜使用手册.ppt VIP
- 2025至2030中国热电材料行业市场深度调研及竞争格局及有效策略与实施路径评估报告.docx VIP
- T_CSGPC 033-2024 陆上风电场设施变形测量技术规程.docx
- 93K测试机异常处理.docx VIP
原创力文档

文档评论(0)