2601.07372v1 Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models.pdf

  • About 131,000 characters
  • About 33 pages
  • Published 2026-01-21 from Beijing

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Xin Cheng¹,², Wangding Zeng², Damai Dai², Qinyu Chen², Bingxuan Wang², Zhenda Xie², Kezhao Huang², Xingkai Yu², Zhewen Hao², Yukun Li², Han Zhang², Huishuai Zhang¹, Dongyan Zhao¹, Wenfeng Liang²

¹ Peking University  ² DeepSeek-AI

{zhanghuishuai,zhaody}@

{chengxin,zengwangding,damai.dai}@

Abstract


While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation problem, we uncover a U-shaped scaling law that optimizes the trade-off between neural computation (MoE) and static memory (Engram). Guided by this law, we scale Engram to 27B parameters, achieving superior performance over a strictly iso-parameter and iso-FLOPs MoE baseline. Most notably, while the memory module is expected to aid knowledge retrieval (e.g., MMLU +3.4; CMMLU +4.0), we observe even larger gains in general reasoning (e
