Speech recognition
Lecture 14: Neural Networks
Andrew Senior andrewsenior@
Google NYC
December 12, 2013
1 Introduction to neural networks
2 Neural networks for speech recognition
    Neural network features for speech recognition
    Hybrid neural networks
    History
    Variations
3 Language modelling
The perceptron
[Figure: a perceptron with inputs x1–x5, weights w1–w5, and a single output]
A perceptron is a linear classifier:

f(x) = 1 if w · x ≥ 0    (1)
     = 0 otherwise.      (2)
Add an extra “always one” input to provide an offset or “bias”. The
weights w can be learned for a given task with the Perceptron Algorithm.
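A minimal sketch of this decision rule in NumPy; the function name and the explicit bias augmentation are illustrative, not from the slides:

```python
import numpy as np

def perceptron_predict(w, x):
    """f(x) = 1 if w · x >= 0, else 0 (Eqs. 1-2).

    w has length n + 1; w[0] is the bias weight, matched by
    the "always one" input prepended to x below.
    """
    x_aug = np.concatenate(([1.0], x))   # add the always-one bias input
    return 1 if np.dot(w, x_aug) >= 0 else 0
```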
Perceptron algorithm (Rosenblatt, 1957)
Adapt the weights w, example by example:

1. Initialise the weights and the threshold.
2. For each example j in our training set D, perform the following steps
   over the input xj and desired output ŷj:
   a. Calculate the actual output:
      yj(t) = f[w(t) · xj] = f[w0(t) + w1(t)xj,1 + w2(t)xj,2 + · · · + wn(t)xj,n]
   b. Update the weights:
      wi(t + 1) = wi(t) + α(ŷj − yj(t)) xj,i,   for all nodes 0 ≤ i ≤ n.
3. Repeat step 2 until the iteration error (1/s) Σj |ŷj − yj(t)| is less than a
   user-specified error threshold γ, or a predetermined number of
   iterations has been completed.
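A runnable NumPy sketch of the steps above; the helper name, the learning rate value, and the AND-function demo are my own choices, not from the slides:

```python
import numpy as np

def train_perceptron(X, d, alpha=0.1, gamma=0.0, max_iters=100):
    """Rosenblatt's perceptron algorithm.

    X: (s, n) inputs; d: (s,) desired 0/1 outputs.
    Returns a weight vector of length n + 1 (index 0 is the bias).
    """
    s, n = X.shape
    X_aug = np.hstack([np.ones((s, 1)), X])      # "always one" bias input
    w = np.zeros(n + 1)                          # step 1: initialise weights
    for _ in range(max_iters):                   # step 3: repeat ...
        errors = 0.0
        for xj, dj in zip(X_aug, d):             # step 2: for each example j
            yj = 1 if np.dot(w, xj) >= 0 else 0  # 2a: actual output
            w += alpha * (dj - yj) * xj          # 2b: weight update
            errors += abs(dj - yj)
        if errors / s <= gamma:                  # iteration error below threshold
            break
    return w

# Example: the (linearly separable) AND function converges quickly.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)
w = train_perceptron(X, d)
```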
Nonlinear perceptrons
Introduce a nonlinearity:

    yi = σ( Σj wij xj )

Each unit is a simple nonlinear function of a linear combination of its
inputs.

Typically the logistic sigmoid:

    σ(z) = 1 / (1 + e^(−z))

or tanh:

    σ(z) = tanh z
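In code, one such unit is just an activation applied to a dot product; a minimal sketch, with helper names assumed rather than taken from the slides:

```python
import numpy as np

def logistic(z):
    """Logistic sigmoid: σ(z) = 1 / (1 + e^(−z))."""
    return 1.0 / (1.0 + np.exp(-z))

def unit_output(w, x, sigma=logistic):
    """One nonlinear unit: y = σ(Σ_j w_j x_j)."""
    return sigma(np.dot(w, x))

# The tanh variant just swaps the nonlinearity:
y = unit_output(np.array([0.5, -0.3]), np.array([1.0, 2.0]), sigma=np.tanh)
```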
Multilayer perceptrons
Extend the network to multiple layers
Now a hidden layer of nodes computes a function of the inputs, and
output nodes compute a function of the hidden nodes’ “activations”.
[Figure: a multilayer perceptron with an input layer (x1–x4), a hidden layer, and an output layer (y1–y3)]
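A sketch of the forward pass the figure depicts, assuming NumPy; bias terms are omitted and all names and shapes are illustrative:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_hidden, W_out):
    """The hidden layer computes a function of the inputs; the output
    nodes compute a function of the hidden activations."""
    h = logistic(W_hidden @ x)   # hidden activations
    y = logistic(W_out @ h)      # outputs from the activations
    return y

# Example shapes: 4 inputs, 5 hidden units, 3 outputs.
rng = np.random.default_rng(0)
y = mlp_forward(rng.normal(size=4),
                rng.normal(size=(5, 4)),
                rng.normal(size=(3, 5)))
```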
Cost function
Such networks can be optimized ("trained") to minimize a cost function
(also called a loss function or objective function).
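The slide does not yet pin down a specific cost; for concreteness, two standard choices, sketched under that caveat:

```python
import numpy as np

def squared_error(y, t):
    """Squared-error cost between network outputs y and targets t."""
    return 0.5 * np.sum((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    """Cross-entropy cost for sigmoid outputs y and 0/1 targets t."""
    y = np.clip(y, eps, 1.0 - eps)   # guard against log(0)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
```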