
2025-01-26

Qwen2.5-1M Technical Report

An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, Junyang Lin, Kai Dang, Kexin Yang, Le Yu, Mei Li, Minmin Sun, Qin Zhu, Rui Men, Tao He, Weijia Xu, Wenbiao Yin, Wenyuan Yu, Xiafei Qiu, Xingzhang Ren, Xinlong Yang, Yong Li, Zhiying Xu, Zipeng Zhang∗

Qwen Team, Alibaba Group

Abstract

In this report, we introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs.

To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity

refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization.
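The idea behind chunked prefill mentioned above can be sketched as follows. This is a minimal illustration of the general technique, not the Qwen2.5-1M inference framework's actual API; `chunked_prefill` and `toy_forward` are hypothetical names, and the "cache" is a toy stand-in for a real KV cache. The point is that the prompt is consumed in fixed-size chunks, so peak activation memory during prefill scales with the chunk size rather than the full prompt length.

```python
def chunked_prefill(tokens, chunk_size, forward):
    """Run the prefill phase chunk by chunk.

    `forward(chunk, kv_cache)` represents one model pass: it processes the
    new chunk against the existing cache and returns the updated cache.
    """
    kv_cache = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        kv_cache = forward(chunk, kv_cache)
    return kv_cache

# Toy "forward" pass: the cache is simply the list of tokens seen so far.
def toy_forward(chunk, kv_cache):
    return kv_cache + list(chunk)

cache = chunked_prefill(list(range(10)), chunk_size=4, forward=toy_forward)
print(len(cache))  # 10: the final cache covers the entire prompt
```

In a real deployment the per-chunk pass would also be where a sparse attention pattern is applied, which is why the report pairs chunked prefill with sparse attention for long prompts.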
