Research on Segmentation of Historical Chinese Books (古代汉字文献切分研究)

Computer Engineering and Applications (计算机工程与应用), 2013, 49(2)

NI Enzhi¹, JIANG Minjun², ZHOU Changle¹

1. Mind, Art and Computation Lab, School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
2. School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China

NI Enzhi, JIANG Minjun, ZHOU Changle. Research on segmentation of historical Chinese books. Computer Engineering and Applications, 2013, 49(2): 29-33.

Abstract: This paper proposes text line segmentation and character segmentation methods designed for the characteristics of historical Chinese documents. The line segmentation method analyzes the stroke projection and applies a recursive segmentation algorithm based on multiple projection thresholds and gap thresholds. The algorithm is robust to touching text lines and to skew, and it handles short text lines especially well. The character segmentation method has two steps. A rough segmentation first finds the approximate segmentation positions; a fine segmentation based on connected-component analysis and the judgment of adhesion points then refines them. This algorithm can extract characters even when they overlap or touch one another. Experimental results show that the methods perform well and are suitable for segmenting historical Chinese documents.

Key words: document image processing; Chinese character segmentation; ancient books digitalization

Chinese abstract: Targeting the characteristics of ancient Chinese documents, this paper proposes column segmentation and character segmentation methods suited to them. The column segmentation method analyzes the document's stroke projection directly, using a recursive segmentation algorithm based on hierarchical projection filtering and variable-length gap thresholds. Even when the gaps between columns are small, columns touch the ruling lines, or the document is somewhat skewed, the algorithm still extracts columns accurately, and it performs particularly well on short columns. The character segmentation method has two steps: a rough segmentation determines the approximate cut positions, and a fine segmentation based on connected-component analysis and adhesion-point judgment then refines them. The algorithm …
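The abstract only outlines the column (text-line) segmentation: the stroke projection is split recursively using several projection thresholds and gap thresholds. The Python sketch below illustrates that idea under explicit assumptions and is not the authors' implementation. Because the text runs in vertical columns, ink is projected onto the horizontal axis. The `levels` list of (projection threshold, minimum gap width) pairs, the `max_width` trigger for re-splitting, and all function names are hypothetical. A looser level (higher threshold, narrower gap) lets a pixel column holding a single touching pixel count as a gap, which is one simple way to separate columns that touch.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # binary page image: 1 = ink, 0 = background


def vertical_projection(img: Image, x0: int, x1: int) -> List[int]:
    """Ink count of every pixel column in [x0, x1); text runs top to bottom."""
    return [sum(row[x] for row in img) for x in range(x0, x1)]


def find_gaps(proj: Sequence[int], thresh: int, min_gap: int) -> List[Tuple[int, int]]:
    """Runs [start, end) where proj <= thresh for at least min_gap positions."""
    gaps, start = [], None
    for i, v in enumerate(list(proj) + [thresh + 1]):  # sentinel closes a trailing run
        if v <= thresh:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))
            start = None
    return gaps


def segment_columns(img: Image, x0: int, x1: int,
                    levels: List[Tuple[int, int]], max_width: int,
                    depth: int = 0) -> List[Tuple[int, int]]:
    """Split [x0, x1) into text columns.

    Hypothetical parameters: levels[d] = (projection threshold, minimum gap
    width) used at recursion depth d, ordered from strict to loose; a piece
    wider than max_width is split again with the next level.
    """
    proj = vertical_projection(img, x0, x1)
    thresh, min_gap = levels[depth]
    pieces, cur = [], 0  # indices relative to x0
    for a, b in find_gaps(proj, thresh, min_gap):
        if a > cur:
            pieces.append((cur, a))
        cur = b
    if cur < len(proj):
        pieces.append((cur, len(proj)))
    result = []
    for s, e in pieces:
        # Trim blank pixel columns left at the edges of a piece.
        while s < e and proj[s] == 0:
            s += 1
        while e > s and proj[e - 1] == 0:
            e -= 1
        if s == e:
            continue
        if e - s > max_width and depth + 1 < len(levels):
            result.extend(segment_columns(img, x0 + s, x0 + e, levels, max_width, depth + 1))
        else:
            result.append((x0 + s, x0 + e))
    return result


# Toy page: three 2-pixel-wide columns; the last two touch through one pixel.
H, W = 5, 16
page = [[0] * W for _ in range(H)]
for y in range(H):
    for x in (1, 2, 6, 7, 9, 10):
        page[y][x] = 1
page[2][8] = 1  # one touching pixel bridges the last two columns
print(segment_columns(page, 0, W, levels=[(0, 2), (1, 1)], max_width=3))
# → [(1, 3), (6, 8), (9, 11)]
```

Starting strict and loosening only for pieces that are still too wide avoids over-splitting a column whose strokes leave narrow internal gaps, while still separating touching neighbors. With `levels=[(0, 2)]` alone, the touching pair would stay merged as `(6, 11)`.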

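The two-step character segmentation can be sketched in the same hedged way. In this hypothetical version, the rough step places a cut about every `char_h` rows (an assumed expected character height), snapped to the least-inked row nearby, and the fine step assigns each 8-connected component to the rough cell that contains its vertical center. A character broken into pieces therefore stays whole, and two characters whose row ranges overlap keep separate boxes. The adhesion-point judgment the paper uses for touching characters is not modeled here; `rough_cuts`, `segment_characters`, and the window rule are illustrative assumptions.

```python
from bisect import bisect_right
from collections import deque
from typing import Dict, List, Tuple

Image = List[List[int]]  # one text column as a binary strip: 1 = ink, 0 = background


def rough_cuts(strip: Image, char_h: int) -> List[int]:
    """Rough step (hypothetical rule): place a cut roughly every char_h rows,
    snapped to the least-inked row within a +/- char_h // 4 window."""
    assert char_h >= 2, "char_h must be at least 2 so the scan always advances"
    proj = [sum(row) for row in strip]  # ink count per row
    h, half = len(proj), max(1, char_h // 4)
    cuts, y = [0], 0
    while y + char_h < h:
        lo, hi = y + char_h - half, min(h, y + char_h + half + 1)
        y = min(range(lo, hi), key=lambda r: proj[r])  # earliest minimum wins ties
        cuts.append(y)
    cuts.append(h)
    return cuts


def connected_components(strip: Image) -> List[List[Tuple[int, int]]]:
    """8-connected ink components, found by breadth-first search."""
    h, w = len(strip), len(strip[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not strip[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if strip[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps


def segment_characters(strip: Image, char_h: int) -> List[Tuple[int, int]]:
    """Fine step: assign each component to the rough cell containing its
    vertical center, then return the row extent [top, bottom) of each cell."""
    cuts = rough_cuts(strip, char_h)
    boxes: Dict[int, Tuple[int, int]] = {}
    for comp in connected_components(strip):
        top = min(y for y, _ in comp)
        bottom = max(y for y, _ in comp) + 1
        cell = bisect_right(cuts, (top + bottom) / 2) - 1  # cell i = [cuts[i], cuts[i+1])
        t, b = boxes.get(cell, (top, bottom))
        boxes[cell] = (min(t, top), max(b, bottom))
    return [boxes[c] for c in sorted(boxes)]


# Toy column: character 1 is a broken stroke; character 2 overlaps it in rows 5-6.
strip = [[0] * 4 for _ in range(12)]
for y in (0, 1, 2, 4, 5, 6):
    strip[y][0] = 1  # character 1: a stroke broken at row 3
for y in range(5, 12):
    strip[y][2] = strip[y][3] = 1  # character 2 starts at row 5
print(rough_cuts(strip, 6))          # → [0, 7, 12]
print(segment_characters(strip, 6))  # → [(0, 7), (5, 12)]
```

In the example, the projection-only cut at row 7 would clip the top of the second character; grouping by component recovers its full extent `[5, 12)` even though that box overlaps the first character's box.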