arXiv:2501.12599v1 [cs.AI] 22 Jan 2025
KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMS
TECHNICAL REPORT OF KIMI K1.5
Kimi Team
ABSTRACT
Language model pretraining with next token prediction has proved effective for scaling compute but is limited by the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simplistic, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities, e.g., 77.5 on AIME, 96.2 on MATH500, 94th percentile on Codeforces, and 74.9 on MathVista, matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results, e.g., 60.8 on AIME, 94.6 on MATH500, and 47.3 on LiveCodeBench, outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).
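To make the "no value function, no process reward model" claim concrete, the following is a minimal illustrative sketch, not the authors' released algorithm: a REINFORCE-style policy-gradient update that samples several chain-of-thought completions per prompt, scores each with a verifiable final-answer reward, and uses the group-mean reward as the baseline. The names sample_cot, logprob_of, and answer_is_correct are hypothetical placeholders standing in for a real sampler, policy log-probability, and rule-based verifier.

# Illustrative sketch only (assumed, not the paper's implementation): a
# value-function-free policy-gradient update over sampled long chains of thought.
import random
from typing import Callable, List

def group_baseline_advantages(rewards: List[float]) -> List[float]:
    """Use the mean reward of the sampled group as the baseline (no learned value network)."""
    mean_r = sum(rewards) / len(rewards)
    return [r - mean_r for r in rewards]

def policy_gradient_loss(
    prompt: str,
    sample_cot: Callable[[str], str],          # hypothetical: draws one CoT completion from the policy
    logprob_of: Callable[[str, str], float],   # hypothetical: log p(completion | prompt) under the policy
    answer_is_correct: Callable[[str], bool],  # hypothetical: rule-based verifier of the final answer
    num_samples: int = 8,
) -> float:
    """REINFORCE-style surrogate loss over a group of sampled chains of thought."""
    completions = [sample_cot(prompt) for _ in range(num_samples)]
    rewards = [1.0 if answer_is_correct(c) else 0.0 for c in completions]
    advantages = group_baseline_advantages(rewards)
    # Minimizing this surrogate increases the log-probability of above-average completions.
    return -sum(a * logprob_of(prompt, c) for a, c in zip(advantages, completions)) / num_samples

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    random.seed(0)
    loss = policy_gradient_loss(
        prompt="What is 2 + 2?",
        sample_cot=lambda p: random.choice(["2 + 2 = 4", "2 + 2 = 5"]),
        logprob_of=lambda p, c: -0.1 * len(c),
        answer_is_correct=lambda c: c.endswith("= 4"),
        num_samples=4,
    )
    print(f"surrogate loss: {loss:.3f}")

The design point the sketch illustrates is that with a verifiable outcome reward and a group-relative baseline, the update needs neither a critic network nor per-step process rewards.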
Figure 1: Kimi k1.5 long-CoT results compared with OpenAI o1, OpenAI o1-mini, QVQ-72B-Preview, and QwQ-32B Preview across Math, Code, and Vision benchmarks: AIME 2024 (Pass@1), MATH500 (EM), Codeforces (Percentile), LiveCodeBench v5 24.12-25.2 (Pass@1), MathVista (Pass@1), and MMMU (Pass@1). [Bar chart omitted.]