A Comparison of Different Approaches to Automatic Speech Segmentation

Kris Demuynck and Tom Laureys
K.U.Leuven ESAT/PSI
Kasteelpark Arenberg 10
B-3001 Leuven, Belgium
{kris.demuynck,tom.laureys}@esat.kuleuven.ac.be
http://www.esat.kuleuven.ac.be/~spch

Abstract. We compare different methods for obtaining accurate speech segmentations starting from the corresponding orthography. The complete segmentation process can be decomposed into two basic steps. First, a phonetic transcription is automatically produced with the help of large vocabulary continuous speech recognition (LVCSR). Then, the phonetic information and the speech signal serve as input to a speech segmentation tool. We compare two automatic approaches to segmentation, based on the Viterbi and the Forward-Backward algorithm respectively. Further, we develop different techniques to cope with biases between automatic and manual segmentations. Experiments were performed to evaluate the generation of phonetic transcriptions as well as the different speech segmentation methods.

1 Introduction

In this paper we investigate the development of an accurate speech segmentation system for the Spoken Dutch Corpus project. Speech segmentations, on phoneme (e.g. TIMIT) or word level (e.g. Switchboard, CGN), have become a standard annotation in speech corpora. Corpus users can benefit from the fact that the segmentation couples the speech signal to the other annotation layers (orthography, phonetics) by means of time stamps, thus providing easy access to audio fragments in the corpus. For the speech technologist segmentations are indispensable for the initial training of acoustic ASR models, the development of TTS systems and speech research in general.

Some speech corpora only provide automatic segmentations, obviously requiring an accurate segmentation algorithm. In other corpora speech segmentations are checked manually. The latter case requires a high-quality automatic segmentation system as well, since a bet