arXiv:2501.12599v1 [cs.AI] 22 Jan 2025

KIMI K1.5: SCALING REINFORCEMENT LEARNING WITH LLMS

TECHNICAL REPORT OF KIMI K1.5

Kimi Team

ABSTRACT

Language model pretraining with next token prediction has proved effective for scaling compute, but it is limited by the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simple, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities (e.g., 77.5 on AIME, 96.2 on MATH 500, 94th percentile on Codeforces, 74.9 on MathVista), matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results (e.g., 60.8 on AIME, 94.6 on MATH 500, 47.3 on LiveCodeBench), outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).
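The abstract's claim of an RL framework "without value functions or process reward models" can be made concrete with a minimal sketch. The snippet below is an illustrative assumption, not the report's exact objective: it shows the core of a value-free policy-gradient update in which k responses are sampled per prompt, each is scored by a single outcome reward, and the empirical mean reward serves as the baseline in place of a learned value network. All function names are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' exact method) of a
# value-free policy-gradient weighting: the advantage of each sampled
# response is its reward minus the mean reward of the group, so no
# value function, process reward model, or tree search is needed.
import numpy as np

def advantage_weights(rewards: np.ndarray) -> np.ndarray:
    """Per-sample weights for scaling log-probability gradients.

    The group mean acts as the baseline, replacing a learned critic.
    """
    baseline = rewards.mean()
    return rewards - baseline

# Example: 4 sampled chains of thought for one prompt, rewarded 1.0 if
# the final answer is correct and 0.0 otherwise (outcome reward only).
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(advantage_weights(rewards))  # -> [ 0.5 -0.5 -0.5  0.5]
```

In a full training loop, each weight would multiply the gradient of the log-probability of the corresponding response; correct samples are reinforced and incorrect ones suppressed, with the baseline keeping the update centered.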
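One simple way to read "long2short" is as distilling long-CoT behavior into a short-CoT model. The sketch below illustrates one plausible ingredient, shortest rejection sampling: among k long-CoT samples for a prompt, keep the shortest correct one as a fine-tuning target. The report describes several long2short techniques; this code, its types, and its names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of shortest rejection sampling for long2short training.
# Among k sampled responses, keep the shortest one that is also correct,
# so the short-CoT model is fine-tuned on concise yet valid reasoning.
# All names here are hypothetical.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Sample:
    text: str      # full chain of thought plus final answer
    correct: bool  # outcome reward from an answer checker

def shortest_correct(samples: List[Sample]) -> Optional[Sample]:
    """Return the shortest correct response, or None if all are wrong."""
    correct = [s for s in samples if s.correct]
    return min(correct, key=lambda s: len(s.text)) if correct else None

# Example: prefer the terse correct solution over a verbose one.
samples = [
    Sample("Long derivation ... final answer: 42", correct=True),
    Sample("final answer: 41", correct=False),
    Sample("Brief check, final answer: 42", correct=True),
]
best = shortest_correct(samples)
print(best.text if best else "no correct sample; resample")
```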

Figure 1: Kimi k1.5 long-CoT results. [Figure: bar chart comparing Kimi k1.5 long-CoT with OpenAI o1, OpenAI o1-mini, QVQ-72B-Preview, and QwQ-32B Preview across Math, Code, and Vision benchmarks: AIME 2024 (Pass@1), MATH 500 (EM), Codeforces (Percentile), LiveCodeBench v5 24.12-25.2 (Pass@1), MathVista (Pass@1), and MMMU (Pass@1).]
