- Published 2017-03-09 in Shanghai
Versatile Document Image Content Extraction (Lehigh)
Henry S. Baird, Michael A. Moll, Jean Nonnemaker, Matthew R. Casey, Don L. Delorenzo

Document Image Content Extraction Problem
- Given an image of a document
- Find regions containing handwriting, machine-print text, graphics, line-art, logos, photographs, noise, etc.

Difficulties
- Vast diversity of document types
- Arduous data collection
- How big is a representative training set?
- Expense of preparing correctly labeled "ground-truthed" samples
- Lack of consensus on how to evaluate performance

Our Research Goals
- Versatility First
  - Beware "brittle" or narrow approaches
  - Develop methods that work across broadest possible spectrum of document and image types
- Voracious Classifiers
  - Belief that accuracy of a classifier has more to do with training data than other considerations
  - Want to train on extremely large (and representative) data sets (in reasonable amounts of time)
- Extremely High Speed Classification
  - Ideally, perform nearly at I/O rates (as fast as images can be read)
  - Too ambitious?

Related Strategies (for the future)
- Amplification
  - Real ground-truthed training samples are hard to find, expensive to generate and difficult to ensure coverage
  - Want to use real samples as 'seeds' for massive synthetic generation of pseudo randomly perturbed samples for use in supplementary training
- Confidence Before Accuracy
  - Confidence is maybe more important than accuracy, since even modest accuracy (across all cases) can be useful
- Near-Infinite Space
  - Design for best performance in near future when main memory will be orders of magnitude larger and faster
- Data-Driven Design
  - Avoid arbitrary engineering decisions such as choice of features, instead allowing training data to determine this

Document Images
- Range of document and image types
- Color, grey-level, black and white
- Any size or resolution
- Lots of file formats (TIFF, JPEG, PNG, etc.)
- Pre-processing step of converting images into three channel color PNG file in HSL (Hue, Saturation, Luminance) co
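
The pre-processing idea in the Document Images slide (bringing color, grey-level, and black-and-white inputs into one three-channel HSL form) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `to_rgb`, `rgb_to_hsl8`, and `image_to_hsl` are hypothetical, the image is assumed to be a plain list of pixel rows already decoded from its file format, and the scaling of each HSL component to 0..255 is an assumption, since the slides do not say how the channels are stored.

```python
import colorsys


def to_rgb(pixel):
    """Expand one sample to an (r, g, b) triple in 0..255.

    Assumption: color pixels arrive as (r, g, b) tuples, while grey-level and
    black-and-white pixels arrive as a single integer intensity.
    """
    if isinstance(pixel, (tuple, list)):
        r, g, b = pixel[:3]
        return int(r), int(g), int(b)
    v = int(pixel)
    return v, v, v


def rgb_to_hsl8(r, g, b):
    """Convert 8-bit RGB to (hue, saturation, luminance), each scaled to 0..255."""
    # colorsys works on floats in [0, 1] and returns (hue, lightness, saturation),
    # so the last two values are swapped to get H, S, L order.
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 255), round(s * 255), round(l * 255)


def image_to_hsl(rows):
    """Map every pixel of a row-major image to an 8-bit HSL triple."""
    return [[rgb_to_hsl8(*to_rgb(p)) for p in row] for row in rows]


# A tiny mixed example: one color pixel and one grey-level pixel.
sample = [[(255, 255, 255), 0]]
hsl = image_to_hsl(sample)
```

Normalizing every input to the same three channels means a downstream classifier sees one pixel representation regardless of the source image type. Reading and writing the actual TIFF, JPEG, or PNG files would need an imaging library and is left out of this sketch.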