Deep reinforcement learning from human preferences (ChatGPT topic materials compilation).pdf

  • About 63,100 characters
  • About 17 pages
  • Published 2026-03-26 in Zhejiang

Deep Reinforcement Learning from Human Preferences

Paul F Christiano (OpenAI, paul@), Jan Leike (DeepMind, leike@), Tom B Brown (nottombrown@)

Miljan Martic (DeepMind), Shane Legg (DeepMind), Dario Amodei (OpenAI)
