Uploaded 2018-08-17 from Fujian

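Acoustic Modeling Method Based on Convolutional Neural Networks Fusing Multi-Stream Features for Low-Resource Speech Recognition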

Abstract:

To address the insufficient training of Convolutional Neural Network (CNN) acoustic model parameters when training data are scarce in low-resource speech recognition tasks, a method is proposed that uses multi-stream features to improve the performance of low-resource CNN acoustic models. First, to exploit more acoustic features from the limited training data during low-resource acoustic modeling, several different types of features are extracted from the training data. Second, a convolutional subnetwork is built for each feature type, forming a parallel structure that regularizes the probability distributions of the multiple features. Then, fully connected layers are added on top of the parallel convolutional subnetworks to fuse the streams, yielding a new CNN acoustic model. Finally, a low-resource speech recognition system is built on this acoustic model.

Experimental results show that the parallel convolutional subnetworks make the different feature spaces more similar, and that the method improves recognition accuracy by 3.27% and 2.08% compared with the traditional multi-feature splicing approach and the single-feature baseline CNN system, respectively. When multilingual training is introduced, the method remains applicable, with relative improvements in recognition accuracy of 5.73% and 4.57%, respectively.

Key words: low-resource speech recognition; Convolutional Neural Network (CNN); feature normalization; multi-stream features

CLC number: TN912.34

Document code: A
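
The parallel structure described in the abstract can be illustrated with a small forward-pass sketch. The abstract gives no feature types, layer sizes, or training procedure, so everything concrete here is an assumption made for illustration: the class name `MultiStreamCNN`, the three stream dimensions (40, 13, 13), the filter count, kernel width, pooling size, hidden width, and number of output states are all hypothetical, and the weights are random rather than trained. The sketch shows only the data flow: one convolutional subnetwork per feature stream, concatenation of the subnetwork outputs, and fully connected layers on top.

```python
import math
import random


def conv1d_relu(x, kernels):
    """Valid 1-D convolution along the feature axis, followed by ReLU."""
    maps = []
    for k in kernels:
        width = len(k)
        maps.append([max(0.0, sum(k[j] * x[i + j] for j in range(width)))
                     for i in range(len(x) - width + 1)])
    return maps


def max_pool(maps, size):
    """Non-overlapping max pooling applied to each feature map."""
    return [[max(m[i:i + size]) for i in range(0, len(m) - size + 1, size)]
            for m in maps]


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class MultiStreamCNN:
    """Hypothetical model: one conv subnetwork per stream, fused by dense layers."""

    def __init__(self, stream_dims, n_filters=4, kernel_width=3, pool=2,
                 hidden=8, n_states=5, seed=0):
        rng = random.Random(seed)
        self.pool = pool
        # A separate set of filters for every feature stream (parallel subnetworks).
        self.kernels = [rand_matrix(rng, n_filters, kernel_width)
                        for _ in stream_dims]
        # Size of the concatenated subnetwork outputs.
        self.fused_dim = sum(n_filters * ((d - kernel_width + 1) // pool)
                             for d in stream_dims)
        self.w_hidden = rand_matrix(rng, hidden, self.fused_dim)
        self.b_hidden = [0.0] * hidden
        self.w_out = rand_matrix(rng, n_states, hidden)
        self.b_out = [0.0] * n_states

    def forward(self, streams):
        fused = []
        for x, kernels in zip(streams, self.kernels):
            pooled = max_pool(conv1d_relu(x, kernels), self.pool)
            fused.extend(v for fmap in pooled for v in fmap)
        hidden = [max(0.0, v)
                  for v in dense(fused, self.w_hidden, self.b_hidden)]
        return softmax(dense(hidden, self.w_out, self.b_out))


# One frame with three hypothetical feature streams of different dimensions.
dims = [40, 13, 13]
data_rng = random.Random(1)
frame = [[data_rng.gauss(0.0, 1.0) for _ in range(d)] for d in dims]
model = MultiStreamCNN(dims)
posteriors = model.forward(frame)
```

By contrast, the splicing baseline mentioned in the abstract would concatenate the raw feature vectors before a single network. In this sketch each stream instead passes through its own filters first, so the fully connected layers see per-stream representations rather than raw heterogeneous features.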