Dialogue Generation using Reinforcement Learning and Neural Language Models

Marcella Cindy Prasetio
Department of Computer Science
Stanford University
mp21@stanford.edu

Mustafa Abdool
Department of Computer Science
Stanford University
moose878@stanford.edu

Carson Lam
Department of Biomedical Informatics
Stanford University
carsonl@stanford.edu

Neural machine translation (NMT) has demonstrated impressive results in language translation. The application of NMT to dialogue generation is still far from realistic, and this topic is a fascinating area of active research. Learned responses are either incoherent or generic, making for uninteresting dialogue that does not set the agent up for long-term, engaging conversation. The need for long-term planning has led NLP researchers to draw on principles of reinforcement learning. Here we examine recently published methods that refit NMT as a received-sequence-to-response-sequence (seq2seq) conversational agent. To encourage the agent to produce interesting, engaging dialogue, we update a seq2seq model with policy gradient methods of reinforcement learning. We study the effects of reward functions such as semantic coherence, information flow, and ease of answering on dialogue quality in a simulated environment, and we evaluate our model with quantitative metrics of linguistic diversity such as n-gram repetition counts. Finally, we show that NMT models trained on the Cornell Movie and Reddit datasets produce improved responses after the REINFORCE algorithm is applied.
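The abstract names n-gram repetition counts as a quantitative measure of linguistic diversity. The sketch below illustrates one plausible way to compute such a count, under assumptions that are not taken from the paper: responses are tokenized on whitespace, a repetition is any occurrence of an n-gram beyond its first, and the helper names (`ngrams`, `repeated_ngram_count`, `distinct_ratio`) are hypothetical. The authors' exact metric definition may differ.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngram_count(tokens, n):
    """Count n-gram occurrences beyond the first occurrence of each n-gram.

    Assumption: this is one possible reading of "n-gram repetition count".
    """
    counts = Counter(ngrams(tokens, n))
    return sum(c - 1 for c in counts.values())


def distinct_ratio(tokens, n):
    """Fraction of n-grams that are unique; 0.0 if the input is shorter than n."""
    grams = ngrams(tokens, n)
    return len(set(grams)) / len(grams) if grams else 0.0


# Hypothetical generic response of the kind the abstract calls uninteresting.
response = "i do not know i do not know".split()
print(repeated_ngram_count(response, 2))  # repeated bigrams
print(distinct_ratio(response, 1))        # share of unique unigrams
```

Under these assumptions, a lower repetition count and a higher distinct ratio would indicate a more varied response, which is the direction of improvement the abstract attributes to the reinforcement learning update.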
