Understanding Deep Learning in One Day (《一天理解深度学习》.ppt)

About 59,600 characters
About 307 pages
Published 2020-01-22 from Tianjin

* Push-comment chain game (推文接龍)
* In Chinese-to-English or English-to-Chinese translation, neither side is necessarily the longer or the shorter one.
* More applications
* A woman is throwing a Frisbee in a park.
* Different techniques are used to deal with different problems. Combining different methods with one another!
* CNN is widely used. It is a very good example for designing your network.
* /docs/LDC93S4B/corpus.html
* Can be very large
* General idea
* Output y_i depends on x_1, x_2, ..., x_i. The same input can have totally different outputs.
* To determine y_i, you have to consider a lot ... You should see the whole sequence.
* A normal neuron has 1 input and 1 output. This one has 4 inputs and 1 output.
* Identity. C is a vector.
* Usually when people say RNN, they mean LSTM!
* General idea
* Not because you have a bug, haha.
* Even Adagrad cannot handle this problem; maybe RMSProp is better. Source: /proceedings/papers/v28/pascanu13.pdf. Large or small is fine. Changing quickly is bad.
* Gradient exploding / gradient vanishing. Echo state network - ...
* English is successful. ASRU'15: better than HMM + N-gram.
* In Chinese-to-English or English-to-Chinese translation, neither side is necessarily the longer or the shorter one.
* Running on GPU or CPU with Theano:
  THEANO_FLAGS=device=gpu0 python YourCode.py
  import os
  os.environ["THEANO_FLAGS"] = "device=cpu"  # set before importing theano
* The three steps can also apply to other machine learning methods.
* Other methods do not emphasize this.
* Hopefully, when the batch size is large enough ... Not that stable.
* Parametric ReLU (PReLU). Ref: Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. "Deep sparse rectifier neural networks." International Conference on Artificial Intelligence and Statistics, 2011.
* 3, 4, 5
* Advanced idea: η (eta). Can we give each parameter a different learning rate?
* Contrast. /mediawiki/images/6/6a/Adagrad.pdf /courses/cse547/15sp/slides/adagrad.pdf
* Momentum (Chinese: 動量)
* What can we see?
* In speech recognition: add noise, warping
* $z = \sum_i w_i x_i$
* 1, 2
  0, 0 -> 0
  1, 0 -> 1
  0, -1 -> -2
  1, -1 -> -1
  1/2, -1/2 -> -0.5
  Geometric mean?
* Rate = 0.5!
* For example, if we modify “1” to