Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, Andy Jones, Sam

arXiv:2209.07858v2 [cs.CL] 22 Nov 2022
