- 2017-08-22 发布于江苏
A Detailed Walkthrough of SRILM
Generating the N-gram Count File
ngram-count -text train.zh -order 5 -write train.count -unk
-text: the training corpus file
-order: the maximum n-gram order
-write: the output count file name
-unk: map out-of-vocabulary (OOV) words to <unk>
Contents of train.count: the first column is the n-gram itself (unigrams through 5-grams); the second column is its count in the training corpus.
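To make the count file concrete, here is a minimal Python sketch of what ngram-count does in this step: pad each sentence with <s> and </s>, then tally every 1..order gram. This is an illustration, not SRILM's exact implementation (SRILM has additional options controlling which grams are counted).

```python
from collections import Counter

def ngram_counts(sentences, order=5):
    """Count all 1..order grams, padding each sentence with <s> and </s>
    (a sketch of ngram-count -text ... -order ... -write behavior)."""
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

# Tiny demo corpus; real input would be train.zh, one sentence per line.
counts = ngram_counts(["a b a", "a b"], order=2)
for gram, c in sorted(counts.items()):
    # train.count-style lines: the n-gram, a tab, then its count
    print(" ".join(gram), c, sep="\t")
```

Each printed line mirrors one row of train.count: the gram in the first column, its training-corpus count in the second.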
Generating the N-gram Language model
ngram-count -read train.count -order 5 -lm train.lm -gt1min 3 -gt1max 7 -gt2min 3 -gt2max 7 -gt3min 3 -gt3max 7 -gt4min 3 -gt4max 7 -gt5min 3 -gt5max 7
-read: read the n-gram count file
-lm: the output LM file name
-gtNmin / -gtNmax (for N = 1..5): lower and upper count cutoffs for Good-Turing discounting of N-grams
Contents of train.lm (ARPA format): the first column is the log probability (base 10) of the n-gram; the third column, when present, is the log (base 10) of its backoff weight.
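A body line in the ARPA file can be pulled apart with a few lines of Python. This is a minimal sketch for a single n-gram line (it does not handle the \data\ header or \N-grams: section markers); the sample line and its numbers are made up for illustration.

```python
def parse_arpa_ngram_line(line):
    """Parse one body line of an ARPA LM file:
    <log10 prob> TAB <n-gram words> [TAB <log10 backoff weight>].
    The backoff weight column is absent for highest-order n-grams."""
    fields = line.strip().split("\t")
    logprob = float(fields[0])
    words = fields[1].split()
    backoff = float(fields[2]) if len(fields) > 2 else None
    return logprob, words, backoff

# Hypothetical unigram entry: log10 p("the") and its backoff weight
lp, words, bow = parse_arpa_ngram_line("-1.3010\tthe\t-0.30103")
```

Note that both columns are log base 10, matching the description above.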
Calculate the Test Data Perplexity
ngram -ppl test.zh -order 5 -lm train.lm
-ppl: compute the perplexity of the test data
The formulas for ppl and ppl1 as reported by ngram:
ppl  = 10^(-logprob / (N + S))
ppl1 = 10^(-logprob / N)
where logprob is the total log probability (base 10) of the test set, N is the number of words, and S is the number of sentences (ppl counts the end-of-sentence tokens; ppl1 does not).
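The two formulas can be checked directly in Python; the numbers in the example call are made up for illustration.

```python
def perplexities(logprob, num_words, num_sentences):
    """Compute ppl and ppl1 from ngram -ppl's summary quantities:
    ppl  = 10 ** (-logprob / (num_words + num_sentences))
    ppl1 = 10 ** (-logprob / num_words)
    logprob is the total log10 probability of the test set."""
    ppl = 10 ** (-logprob / (num_words + num_sentences))
    ppl1 = 10 ** (-logprob / num_words)
    return ppl, ppl1

# Hypothetical test set: total log10 prob -200 over 90 words, 10 sentences
ppl, ppl1 = perplexities(logprob=-200.0, num_words=90, num_sentences=10)
```

ppl1 is always at least as large as ppl for the same test set, since it divides the same log probability by fewer tokens.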
The steps above show SRILM's three core functions:
Generate the n-gram count file from the corpus
Train the language model from the n-gram count file
Calculate the test data perplexity using the trained language model
Perplexity when the smoothing method is Good-Turing:
ngram-count -read train.count -order 5 -lm train.lm -gt1min 3 -gt1max 7 -gt2min 3 -gt2max 7 -gt3min 3 -gt3max 7 -gt4min 3 -gt4max 7 -gt5min 3 -gt5max 7
Perplexity when the smoothing method is absolute discounting:
ngram-count -read train.count -order 5 -lm train.lm -cdiscount1 0.5 -cdiscount2 0.5 -cdiscount3 0.5 -cdiscount4 0.5 -cdiscount5 0.5
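To illustrate what the -cdiscountN 0.5 options mean, here is a minimal sketch of absolute discounting for the unigram case: subtract a fixed discount d from every observed count and give the freed probability mass to the whole vocabulary. This is an interpolated simplification for illustration; SRILM itself redistributes the freed mass through backoff to lower orders.

```python
from collections import Counter

def absolute_discount_prob(word, counts, total, vocab_size, d=0.5):
    """Unigram probability under absolute discounting with discount d
    (the role of -cdiscount1 0.5). Seen counts are reduced by d; the
    freed mass d * (distinct seen words) / total is spread uniformly
    over the vocabulary, so unseen words get nonzero probability."""
    seen = len(counts)                 # number of distinct seen words
    freed = d * seen / total           # probability mass freed by discounting
    if counts.get(word, 0) > 0:
        return (counts[word] - d) / total + freed / vocab_size
    return freed / vocab_size          # unseen words share the freed mass
```

Because exactly the discounted mass is redistributed, the probabilities over the full vocabulary still sum to one.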
Perplexity when the smoothing method is Witten-Bell discounting:
ngram-count -read train.count -order 5 -lm train.lm -wbdiscount1 -wbdiscount2 -wbdiscount3 -wbdiscount4 -wbdiscount5
Perplexity when the smoothing method is modified Kneser-Ney discounting:
ngram-count -read train.count -order 5 -lm train.lm -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4 -kndiscount5