  • About 71,700 characters
  • About 14 pages
  • Published 2025-06-13 from Hunan

Unlocking Recursive Thinking of LLMs: Alignment via Refinement

Haoke Zhang♣,♠, Xiaobo Liang♣,♠, Cunxiang Wang♦,♥, Juntao Li♣,♠*, Min Zhang♣,♠

♣Soochow University, ♦Zhipu AI, ♥Tsinghua University

♠Key Laboratory of Data Intelligence and Advanced Computing, Soochow University

hkzhangnlp@,{xbliang,ljt}@

Abstract

The OpenAI o1-series models have demonstrated that leveraging long-form Chain of Thought (CoT) can substantially enhance performance. However, the recursive thinking capabilities of Large Language Models (LLMs) remain limited, particularly in the absence of expert-curated data for distillation. In this paper, we propose AvR: Alignment via Refinement, a novel method aimed at unlocking the potential of LLMs for recursive reasoning through long-form CoT. AvR introduces a refinement process that integrates criticism and improvement actions, guided by differentiable learning techniques to optimize refinement-aware rewards. As a result, the synthesized multi-round data can be organized as a long refinement thought, further enabling test-time scaling. Experimental results show that AvR significantly outperforms conventional preference optimization …

Figure 1: Reward assignment comparison between tra…

v1 [cs.CL] 6 Jun 2025
