- Published 2026-03-02 in Beijing
MiniCPM-SALA
MiniCPM-SALA: Hybridizing Sparse and Linear Attention
for Efficient Long-Context Modeling
MiniCPM Team
https://huggingface.co/openbmb/MiniCPM-SALA
/OpenBMB/MiniCPM
Abstract
The evolution of large language models (LLMs) towards applications with ultra-long
contexts faces challenges posed by the high computational and memory costs of the
Transformer architecture. While existing sparse and linear attention mechanisms attempt
to mitigate these issues, they typically involve a trade-off between memory efficiency and
model performance. This paper introduces MiniCPM-SALA, a hybrid architecture that
integrates the high-fidelity long-context modeling of sparse attention (InfLLM-V2) with the
global efficiency of linear attention (Lightning Attention). By employing a layer selection
algorithm to integrate these mechanisms in a 1:3 ratio and utilizing a hybrid positional
encoding (HyPE), the model maintains efficiency and performance for long-context tasks.
Furthermore, we introduce a cost-effective continual training framework that transforms
pre-trained Transformer-based models into hybrid models, which reduces training costs by
approximately 75% compared to training from scratch. Extensive experiments show that
MiniCPM-SALA maintains general capabilities comparable to full-attention models while
offering improved efficiency. On a single NVIDIA A6000D GPU, the model achieves
inference speeds up to 3.5× faster than full attention models at the sequence length of
256K tokens and supports