SRILM: A Detailed Walkthrough

Generating the N-gram Count File

ngram-count -text train.zh -order 5 -write train.count -unk

-text: training corpus name
-order: maximal order of n-grams to count
-write: output count file name
-unk: mark out-of-vocabulary (OOV) words as <unk>

Contents of train.count: the first column holds the n-gram itself (unigrams, bigrams, trigrams, 4-grams, then 5-grams); the second column holds its count in the training corpus.

Generating the N-gram Language Model

ngram-count -read train.count -order 5 -lm train.lm -gt1min 3 -gt1max 7 -gt2min 3 -gt2max 7 -gt3min 3 -gt3max 7 -gt4min 3 -gt4max 7 -gt5min 3 -gt5max 7

-read: read the count file
-lm: output LM file name
-gtNmin/-gtNmax: count thresholds for Good-Turing discounting of the N-grams

Contents of train.lm (ARPA format): the first column is the log probability (base 10) of the n-gram in the middle column; the third column is the log (base 10) of its backoff weight.

Calculate the Test Data Perplexity

ngram -ppl test.zh -order 5 -lm train.lm

-ppl: compute the perplexity of the test data

ppl and ppl1 are computed as

ppl = 10^(-logprob / (W + S))
ppl1 = 10^(-logprob / W)

where logprob is the total log10 probability of the test set, W is the number of words, and S is the number of sentences. ppl therefore also counts the end-of-sentence tokens, one per sentence, while ppl1 normalizes by words alone. (A short Python sketch of this computation appears at the end of this note.)

The steps above show SRILM's core functions:

Generate the n-gram count file from the corpus.
Train the language model from the n-gram count file.
Calculate the test data perplexity using the trained language model.

Perplexity with Good-Turing smoothing:

ngram-count -read train.count -order 5 -lm train.lm -gt1min 3 -gt1max 7 -gt2min 3 -gt2max 7 -gt3min 3 -gt3max 7 -gt4min 3 -gt4max 7 -gt5min 3 -gt5max 7

Perplexity with absolute discounting:

ngram-count -read train.count -order 5 -lm train.lm -cdiscount1 0.5 -cdiscount2 0.5 -cdiscount3 0.5 -cdiscount4 0.5 -cdiscount5 0.5

Perplexity with Witten-Bell discounting:

ngram-count -read train.count -order 5 -lm train.lm -wbdiscount1 -wbdiscount2 -wbdiscount3 -wbdiscount4 -wbdiscount5

Perplexity with modified Kneser-Ney discounting:

ngram-count -read train.count -order 5 -lm train.lm -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4 -kndiscount5

In each case the language model is retrained from the same count file with the given discounting options; perplexity is then measured by rerunning the ngram -ppl command above.
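To run this four-way comparison end to end, the commands above can be wrapped in a small driver script. The sketch below is ours, not part of SRILM: it assumes the ngram-count and ngram binaries are on PATH, reuses train.count and test.zh from the examples above, and the dictionary keys and per-method LM file names are our own invention.

import subprocess

# Discounting options copied from the four commands above.
SMOOTHING = {
    "good-turing": sum([[f"-gt{n}min", "3", f"-gt{n}max", "7"]
                        for n in range(1, 6)], []),
    "absolute":    sum([[f"-cdiscount{n}", "0.5"] for n in range(1, 6)], []),
    "witten-bell": [f"-wbdiscount{n}" for n in range(1, 6)],
    "kneser-ney":  [f"-kndiscount{n}" for n in range(1, 6)],
}

for name, options in SMOOTHING.items():
    lm = f"train.{name}.lm"  # one LM file per smoothing method (our naming)
    # Retrain the 5-gram LM from the shared count file with this smoothing.
    subprocess.run(["ngram-count", "-read", "train.count", "-order", "5",
                    "-lm", lm] + options, check=True)
    # ngram prints logprob, ppl and ppl1 for the test set on stdout.
    report = subprocess.run(["ngram", "-ppl", "test.zh", "-order", "5",
                             "-lm", lm], capture_output=True, text=True,
                            check=True)
    print(name, report.stdout, sep="\n")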

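As promised in the perplexity section above, here is a minimal Python sketch of how ppl and ppl1 follow from the totals that ngram -ppl reports. The function name and the sample numbers are ours, for illustration only.

def perplexity(logprob, words, sentences):
    # logprob: total log10 probability of the test set, as reported by ngram.
    # ppl normalizes by words plus end-of-sentence tokens (one per sentence);
    # ppl1 normalizes by words alone.
    ppl = 10 ** (-logprob / (words + sentences))
    ppl1 = 10 ** (-logprob / words)
    return ppl, ppl1

# Illustrative numbers only: 1000 words, 100 sentences, logprob = -2000.
print(perplexity(-2000.0, 1000, 100))   # -> (~65.79, 100.0)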