

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

MiniCPM Team

https://huggingface.co/openbmb/MiniCPM-SALA

/OpenBMB/MiniCPM

Abstract

The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture. While existing sparse and linear attention mechanisms attempt to mitigate these issues, they typically involve a trade-off between memory efficiency and model performance. This paper introduces MiniCPM-SALA, a hybrid architecture that integrates the high-fidelity long-context modeling of sparse attention (InfLLM-V2) with the global efficiency of linear attention (Lightning Attention). By employing a layer selection algorithm to integrate these mechanisms in a 1:3 ratio and utilizing a hybrid positional encoding (HyPE), the model maintains efficiency and performance for long-context tasks.
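
As a rough illustration of the 1:3 interleaving described above, the sketch below lays out a decoder stack in which every fourth layer uses sparse attention and the remaining layers use linear attention. This is a minimal sketch under stated assumptions: the 32-layer depth, the fixed periodic rule, and the function name select_layer_types are all illustrative, and the paper's actual layer selection algorithm presumably picks the sparse layers by some measured criterion rather than a fixed modulo pattern.

```python
# Hypothetical 1:3 sparse/linear layer interleave (not the paper's algorithm).
# Assumptions: 32 decoder layers; every 4th layer is sparse (InfLLM-V2-style),
# and the other three in each group of four are linear (Lightning-Attention-style).

SPARSE, LINEAR = "sparse", "linear"

def select_layer_types(num_layers: int = 32, period: int = 4) -> list[str]:
    """Assign an attention type to each layer at a 1:(period - 1) sparse:linear ratio."""
    return [SPARSE if i % period == 0 else LINEAR for i in range(num_layers)]

if __name__ == "__main__":
    layout = select_layer_types()
    print(layout[:8])  # ['sparse', 'linear', 'linear', 'linear', 'sparse', ...]
    print(layout.count(SPARSE), ":", layout.count(LINEAR))  # 8 : 24, i.e. 1:3
```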

Furthermore, we introduce a cost-effective continual training framework that transforms pre-trained Transformer-based models into hybrid models, reducing training costs by approximately 75% compared to training from scratch. Extensive experiments show that MiniCPM-SALA maintains general capabilities comparable to full-attention models while offering improved efficiency. On a single NVIDIA A6000D GPU, the model achieves inference speeds up to 3.5× faster than full-attention models at a sequence length of 256K tokens and supports
