Published 2017-03-24 in Hubei
Building a Statistical Language Model Using CMUCLMTK
Required Software
You need to download and install cmuclmtk. See CMU Sphinx Downloads for details.
Text preparation
First of all, you need to clean up the text: expand abbreviations, convert numbers to words, and remove non-word items. For example, to clean a Wikipedia XML dump you can use special Python scripts. To clean HTML pages you can try /p/boilerpipe/, a nice package specifically created to extract text from HTML.
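The cleanup steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the scripts the tutorial refers to: the abbreviation table, the `number_to_words` helper, and the `normalize_line` function are all hypothetical names made up for this example, and the number expansion only covers integers from 0 to 999.

```python
import re

# Hypothetical abbreviation table; extend it for your own corpus.
ABBREVIATIONS = {"dr.": "doctor", "mr.": "mister", "st.": "street"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 (illustrative only)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    tail = "" if rest == 0 else " " + number_to_words(rest)
    return ONES[hundreds] + " hundred" + tail


def normalize_line(line):
    """Lowercase a line, expand abbreviations and short numbers, drop non-words."""
    tokens = []
    for token in line.lower().split():
        # Expand known abbreviations before punctuation is stripped.
        token = ABBREVIATIONS.get(token, token)
        # Spell out integers of up to three digits.
        bare = token.strip(".,;:!?\"()")
        if re.fullmatch(r"[0-9]{1,3}", bare):
            token = number_to_words(int(bare))
        tokens.append(token)
    text = " ".join(tokens)
    # Anything that is not a letter, apostrophe, or space is treated as a
    # non-word item and removed (this also drops longer numbers).
    text = re.sub(r"[^a-z' ]+", " ", text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_line("Dr. Smith has 42 cats!"))
```

A real corpus usually needs a larger abbreviation list and a fuller number expander (years, ordinals, decimals), so treat this as a starting point rather than a complete normalizer.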
For example on how to create lan