Scaling Laws for Reward Model Overoptimization (ChatGPT Topic Materials Compilation).pdf

  • About 62,700 characters
  • About 28 pages
  • Published 2026-03-26 in Zhejiang


Scaling Laws for Reward Model Overoptimization

Leo Gao, John Schulman, Jacob Hilton

OpenAI

Abstract

In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model
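The abstract opens with the usual RLHF setup. A reward model is first fit to human preference comparisons, and a policy is then optimized against that model. The sketch below illustrates only the first step. It fits a toy linear reward model with the pairwise (Bradley-Terry) logistic loss that is commonly used for preference data.

The following are hypothetical and chosen for illustration:

* the linear model
* the hand-made feature vectors
* the names `reward`, `pairwise_loss`, and `train`

This is an assumption-laden sketch, not the training procedure used in the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy setup (not the paper's method): each response is summarized
# by a small feature vector phi, and the reward model is linear: r = w . phi.
Pair = Tuple[List[float], List[float]]  # (preferred features, rejected features)


def reward(w: Sequence[float], phi: Sequence[float]) -> float:
    """Scalar reward assigned to one response by the linear reward model."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_log_sigmoid(z: float) -> float:
    """Stable -log(sigmoid(z)) = log(1 + exp(-z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_loss(w: Sequence[float], pairs: Sequence[Pair]) -> float:
    """Mean Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected))."""
    total = 0.0
    for chosen, rejected in pairs:
        total += neg_log_sigmoid(reward(w, chosen) - reward(w, rejected))
    return total / len(pairs)


def train(pairs: Sequence[Pair], dim: int, lr: float = 0.5, steps: int = 200) -> List[float]:
    """Fit the weights w by plain gradient descent on the mean pairwise loss."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Derivative of -log sigmoid(margin) w.r.t. margin is -(1 - sigmoid(margin)).
            coeff = -(1.0 - sigmoid(margin))
            for i in range(dim):
                grad[i] += coeff * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Two toy comparisons in which responses with a larger first feature are preferred.
    pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
    w = train(pairs, dim=2)
    print("weights:", w)
    print("loss:", pairwise_loss(w, pairs))
```

Plain Python keeps the sketch dependency-free. A practical reward model would typically be a neural network that scores a full prompt and response rather than a linear function of hand-made features.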
