- About 71,700 characters
- About 14 pages
- Published 2025-06-13 in Hunan
Unlocking Recursive Thinking of LLMs: Alignment via Refinement
Haoke Zhang♣,♠, Xiaobo Liang♣,♠, Cunxiang Wang♦,♥, Juntao Li♣,♠*, Min Zhang♣,♠
♣Soochow University, ♦Zhipu AI, ♥Tsinghua University
♠Key Laboratory of Data Intelligence and Advanced Computing, Soochow University
hkzhangnlp@, {xbliang,ljt}@
Abstract
The OpenAI o1-series models have demonstrated that leveraging long-form Chain of Thought (CoT) can substantially enhance performance. However, the recursive thinking capabilities of Large Language Models (LLMs) remain limited, particularly in the absence of expert-curated data for distillation. In this paper, we propose AvR: Alignment via Refinement, a novel method aimed at unlocking the potential of LLMs for recursive reasoning through long-form CoT. AvR introduces a refinement process that integrates criticism and improvement actions, guided by differentiable learning techniques to optimize refinement-aware rewards. As a result, the synthesized multi-round data can be organized as a long refinement thought, further enabling test-time scaling. Experimental results show that AvR significantly outperforms conventional preference optimizat

Figure 1: Reward assignment comparison between tra-

arXiv: …6009v1 [cs.CL] 6 Jun 2025