Understanding Deep Learning in One Day (《一天理解深度学习》).ppt

* Comment-chain game (推文接龍).
* Chinese-to-English, English-to-Chinese: neither side is necessarily longer or shorter.
* More applications.
* A woman is throwing a Frisbee in a park.
* Different techniques are used to deal with different problems. Different methods can be combined with one another!
* CNN is widely used. A very good example for designing your network.
* /docs/LDC93S4B/corpus.html
* Can be very large.
* General idea.
* Output yi depends on x1, x2, …… xi. The same input can have totally different outputs.
* To determine yi, you have to consider a lot …… You should see the whole sequence.
* Normal neuron: 1 input, 1 output. This one: 4 inputs, 1 output.
* Identity. c is a vector.
* Usually when people say RNN, they mean LSTM!
* Not because you have a bug, haha.
* Even Adagrad cannot handle this problem; maybe RMSProp is better. Source: /proceedings/papers/v28/pascanu13.pdf. Large or small is fine. Changing quickly is bad.
* Gradient exploding / gradient vanishing. Echo state network - …..
* English is successful; ASRU’15: better than HMM + N-gram.
* Selecting the device for Theano. On the command line: THEANO_FLAGS=device=gpu0 python YourCode.py. From Python (before importing theano): import os; os.environ["THEANO_FLAGS"] = "device=cpu"
* The three steps can also be applied to other machine learning methods.
* Other methods do not emphasize this.
* Hopefully, when the batch size is large enough, … Not that stable.
* Parametric ReLU (PReLU). Ref: Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. International Conference on Artificial Intelligence and Statistics. 2011.
* 3, 4, 5
* Advanced idea (eta): can we give each parameter a different learning rate?
* Contrast (反差). /mediawiki/images/6/6a/Adagrad.pdf /courses/cse547/15sp/slides/adagrad.pdf
* Momentum (動量).
* What can we see?
* In speech recognition: add noise, warping.
* [Equation lost in extraction: a sum over products of two indexed terms.]
* 0,0 - 0; 1,0 - 1; 0,-1 - -2; 1,-1 - -1; ?, -1/2 - -0.5; 1,2. Geometric mean?
* Rate = 0.5!
* For example, if we modify “1” to
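The notes ask whether each parameter can be given its own learning rate and point to Adagrad material. A minimal sketch of that idea follows, assuming NumPy and a toy quadratic loss. The names `adagrad_minimize` and `toy_grad`, the step count, and the value of `eta` are illustrative choices, not taken from the slides.

```python
import numpy as np

def adagrad_minimize(grad_fn, w0, eta=0.5, steps=200, eps=1e-8):
    """Hypothetical helper: plain Adagrad on a parameter vector."""
    w = np.asarray(w0, dtype=float).copy()
    g2_sum = np.zeros_like(w)  # running sum of squared gradients, per parameter
    for _ in range(steps):
        g = grad_fn(w)
        g2_sum += g * g
        # Each parameter is scaled by its own gradient history,
        # so steep and shallow directions get different effective rates.
        w -= eta * g / (np.sqrt(g2_sum) + eps)
    return w

def toy_grad(w):
    # Gradient of the assumed toy loss L(w) = w[0]**2 + 100 * w[1]**2,
    # whose two directions differ in curvature by a factor of 100.
    return np.array([2.0 * w[0], 200.0 * w[1]])

def toy_loss(w):
    return w[0] ** 2 + 100.0 * w[1] ** 2

w_start = np.array([1.0, 1.0])
w_final = adagrad_minimize(toy_grad, w_start)
print(toy_loss(w_start), toy_loss(w_final))
```

With a single shared learning rate, the steep direction would force a small step size for both parameters. Dividing by the accumulated gradient magnitude lets both coordinates make comparable progress on this toy problem.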
