Published 2025-02-09 in Beijing

DeepSeek-V3 Technical Report

DeepSeek-AI

research@

Abstract

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at /deepseek-ai/DeepSeek-V3.
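The "37B activated for each token" figure reflects how MoE layers route each token to only a few experts out of many. As a rough illustration only, the following is a minimal top-k routing sketch; the expert count, hidden size, and gating details here are hypothetical and do not reflect DeepSeek-V3's actual router (which additionally uses the auxiliary-loss-free balancing mentioned above).

```python
import math
import random

random.seed(0)

# Illustrative sizes only -- not DeepSeek-V3's real configuration.
N_EXPERTS = 8   # total routed experts in the layer
TOP_K = 2       # experts activated per token
D_MODEL = 16    # hidden dimension

def route(token, gate_w):
    """Return (expert indices, normalized weights) for the TOP_K best experts."""
    # Affinity score of the token with each expert's gating vector.
    scores = [sum(t * w for t, w in zip(token, row)) for row in gate_w]
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the selected experts only
    return top, weights

gate_w = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(N_EXPERTS)]
token = [random.gauss(0, 1) for _ in range(D_MODEL)]
experts, weights = route(token, gate_w)

# Only TOP_K of N_EXPERTS run for this token, so per-token compute tracks the
# activated parameters (37B in V3), not the 671B total.
```

The key property this sketch shows is that compute per token is proportional to `TOP_K`, while model capacity grows with `N_EXPERTS`.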

[Figure: benchmark accuracy comparison of DeepSeek-V3, DeepSeek-V2.5, Qwen2.5-72B-Inst, Llama-3.1-405B-Inst, GPT-4o-0513, and Claude-3.5-Sonnet-1022.]
