Deep Learning and its Application: ATTENTION MECHANISM
Transformer - Encoder
Transformer
- Transformer is the key component of BERT/GPT.
- Parallel computing.
- Replacing RNN/LSTM, becoming the most effective feature extractor.
Transformer
- Positional Encoding
- Residual Connection (Add & Norm)
- Encoder-Decoder Attention
Transformer
The attention mechanism alone cannot distinguish the position order of the input words:
The animals cross the street. || Cross the the street animals.
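To see why, note that scaled dot-product attention is permutation-equivariant: reordering the input tokens merely reorders the outputs, so without positional information the model cannot tell the two sentences apart. A minimal NumPy sketch (not from the lecture; identity Q/K/V projections are assumed for brevity):

    import numpy as np

    def self_attention(X):
        """Single-head scaled dot-product self-attention with identity projections."""
        d_k = X.shape[1]
        scores = X @ X.T / np.sqrt(d_k)                  # (n, n) similarity scores
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
        return w @ X                                     # weighted sum of values

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))     # 4 "tokens", 8-dim embeddings
    perm = [2, 0, 3, 1]             # a scrambled word order, as in the slide's example

    out = self_attention(X)
    out_perm = self_attention(X[perm])
    assert np.allclose(out[perm], out_perm)   # same outputs, just reordered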
Position Encoding

PE_{(pos,\,2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)
PE_{(pos,\,2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)

where pos is the position and i is the dimension. For any fixed offset k, PE_{pos+k} can be represented as a linear function of PE_{pos}, since by the angle-addition identities sin(pos + k) and cos(pos + k) are fixed linear combinations of sin(pos) and cos(pos).
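A minimal NumPy sketch of this encoding (max_len and d_model are illustrative values, not from the lecture):

    import numpy as np

    def positional_encoding(max_len, d_model):
        """Sinusoidal positional encoding from Vaswani et al. [1]."""
        pos = np.arange(max_len)[:, None]              # (max_len, 1)
        two_i = np.arange(0, d_model, 2)[None, :]      # even dimension indices 2i
        angles = pos / np.power(10000.0, two_i / d_model)
        pe = np.zeros((max_len, d_model))
        pe[:, 0::2] = np.sin(angles)                   # sine on even dimensions
        pe[:, 1::2] = np.cos(angles)                   # cosine on odd dimensions
        return pe

    pe = positional_encoding(max_len=50, d_model=16)
    print(pe.shape)   # (50, 16); each row is added to the embedding at that position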
Position Encoding
[Figure: Sine PE and Cosine PE visualizations]
Residuals
[Figure: residual connections around the Transformer sublayers]
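A minimal NumPy sketch of the "Add & Norm" step, output = LayerNorm(x + Sublayer(x)), here wrapped around a toy position-wise feed-forward sublayer; the shapes and random weights are illustrative assumptions, not lecture code:

    import numpy as np

    def layer_norm(x, eps=1e-5):
        """Normalize each token vector to zero mean and unit variance."""
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return (x - mean) / np.sqrt(var + eps)

    def feed_forward(x, W1, W2):
        """Position-wise feed-forward network: ReLU(x W1) W2 (biases omitted)."""
        return np.maximum(0.0, x @ W1) @ W2

    rng = np.random.default_rng(0)
    n, d_model, d_ff = 4, 8, 32
    x = rng.normal(size=(n, d_model))
    W1 = rng.normal(size=(d_model, d_ff))
    W2 = rng.normal(size=(d_ff, d_model))

    # Residual connection: the sublayer output is added back to its input,
    # then layer-normalized, which eases gradient flow through deep stacks.
    out = layer_norm(x + feed_forward(x, W1, W2))
    print(out.shape)   # (4, 8)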
Transformer - Decoder
Transformer
- The encoder's inputs flow through a self-attention layer.
- The outputs of the self-attention layer are fed to a feed-forward neural network.
- The decoder has both of those layers, but between them sits an attention layer that helps the decoder focus on relevant parts of the input sentence.
- Multi-Head Attention (a sketch follows below).
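A minimal NumPy sketch of multi-head attention as in [1]; the projection matrices are random placeholders and the head-splitting layout is one common convention, not the lecture's exact code:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
        """Self-attention with n_heads heads; X is (n, d_model)."""
        n, d_model = X.shape
        d_k = d_model // n_heads
        # Project, then split the model dimension into heads: (h, n, d_k).
        def split(W):
            return (X @ W).reshape(n, n_heads, d_k).transpose(1, 0, 2)
        q, k, v = split(Wq), split(Wk), split(Wv)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)   # (h, n, n)
        heads = softmax(scores) @ v                        # (h, n, d_k)
        concat = heads.transpose(1, 0, 2).reshape(n, d_model)
        return concat @ Wo                                 # final output projection

    rng = np.random.default_rng(0)
    n, d_model, h = 5, 16, 4
    X = rng.normal(size=(n, d_model))
    Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
    print(multi_head_attention(X, Wq, Wk, Wv, Wo, h).shape)   # (5, 16)

Splitting d_model across h heads keeps the total cost close to single-head attention while letting each head focus on different relationships between tokens.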
Transformer
- Encoder: attention between every two tokens.
- Decoder: attention only from earlier (before) tokens, via masking.
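A minimal NumPy sketch contrasting the two patterns; the causal flag is an illustrative name, with future positions set to -inf before the softmax so their weights become zero:

    import numpy as np

    def attention_weights(X, causal=False):
        """Self-attention weight matrix; causal=True masks future positions."""
        n, d_k = X.shape
        scores = X @ X.T / np.sqrt(d_k)
        if causal:
            mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # strictly upper triangle
            scores = np.where(mask, -np.inf, scores)
        e = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    print(np.round(attention_weights(X), 2))                # encoder: all pairs attend
    print(np.round(attention_weights(X, causal=True), 2))   # decoder: upper triangle is 0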
References
[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
[2] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[3] Serrano, Sofia, and Noah A. Smith. "Is attention interpretable?" Proceedings of ACL. 2019.