The replicated softmax model: how to modify an RBM to model word-count vectors

- Modification 1: Keep the binary hidden units, but use "softmax" visible units that represent a 1-of-N choice.
- Modification 2: Make each hidden unit use the same weights for all of the visible softmax units.
- Modification 3: Use as many softmax visible units as there are non-stop words in the document.
  - So it is actually a family of different-sized RBMs that share weights; it is not a single generative model.
- Modification 4: Multiply each hidden bias by the number of words in the document (not done in our earlier work).

The replicated softmax model is much better at modeling bags of words than LDA topic models (Salakhutdinov and Hinton, NIPS 2009). A code sketch of the model follows these notes.

The replicated softmax model

- [Figure omitted.] All the models in this family have 5 hidden units; the model shown is for an 8-word document.

Time series models

- Inference is difficult in directed models of time series if we use non-linear distributed representations in the hidden units.
  - It is hard to fit Dynamic Bayes Nets to high-dimensional sequences (e.g. motion-capture data).
- So people tend to avoid distributed representations and use much weaker methods (e.g. HMMs).

Time series models (continued)

- If we really need distributed representations (which we nearly always do), we can make inference much simpler by using three tricks:
  1. Use an RBM for the interactions between hidden and visible variables. This ensures that the main source of information wants the posterior to be factorial.
  2. Model short-range temporal information by allowing several previous frames to provide input to the hidden units and to the visible units.
  3. The result is a temporal module that can be stacked, so we can use greedy learning to learn deep models of temporal structure.

The conditional RBM model (a partially observed CRF)

- Start with a generic RBM.
- Add two types of conditioning connections: previous visible frames feed into the current hidden units and into the current visible units.
- Given the data, the hidden units at time t are conditionally independent.

The motions that I show you next will use two layers.
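To make the four modifications concrete, here is a minimal NumPy sketch of one contrastive-divergence step for a replicated softmax RBM. It is an illustration under my own assumptions, not code from the lecture: the function names, layer sizes, and learning rate are invented, and the bias updates are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

K, F = 2000, 50                          # hypothetical vocabulary / hidden sizes
W = 0.01 * rng.standard_normal((K, F))   # one weight matrix shared by all softmax units
a = np.zeros(F)                          # hidden biases
b = np.zeros(K)                          # visible (word) biases

def hidden_probs(v):
    """p(h_j = 1 | v) for a word-count vector v. The hidden bias is
    scaled by the document length D (Modification 4)."""
    D = v.sum()
    return sigmoid(v @ W + D * a)

def sample_visible(h, D):
    """Reconstruct a bag of words: draw D words from one shared softmax
    over the vocabulary (Modifications 1-3 collapse into this step)."""
    logits = b + W @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.multinomial(D, p).astype(float)

def cd1_step(v, lr=0.01):
    """One contrastive-divergence (CD-1) update on a single document."""
    global W
    D = int(v.sum())
    h0 = hidden_probs(v)                            # data-driven hidden probabilities
    h0_sample = (rng.random(F) < h0).astype(float)  # sample binary hidden states
    v1 = sample_visible(h0_sample, D)               # one-step reconstruction
    h1 = hidden_probs(v1)
    W += lr * (np.outer(v, h0) - np.outer(v1, h1))  # bias updates omitted for brevity
```

The weight sharing is what makes this a family of different-sized RBMs: a 10-word document and a 1000-word document use the same W, differing only in how many draws the multinomial makes and in the D * a bias term.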
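For the conditional RBM, the two types of conditioning connections are easiest to see as data-dependent biases. The sketch below assumes a common motion-capture formulation (real-valued visible units, binary hidden units, N past frames as context, unit-variance Gaussians); the names A and B and all sizes are illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, H, N = 49, 100, 3                         # hypothetical: visible dims, hidden units, past frames
W = 0.01 * rng.standard_normal((V, H))       # RBM weights between v_t and h_t
A = 0.01 * rng.standard_normal((N * V, H))   # conditioning: past frames -> hidden units
B = 0.01 * rng.standard_normal((N * V, V))   # conditioning: past frames -> visible units
a, b = np.zeros(H), np.zeros(V)

def dynamic_biases(history):
    """The two types of conditioning connections enter only through these
    data-dependent biases; history is the N past frames, concatenated."""
    return a + history @ A, b + history @ B

def hidden_probs(v_t, history):
    """Given the data (v_t and the past frames), the hidden units at time t
    are conditionally independent: inference is one feed-forward pass."""
    a_dyn, _ = dynamic_biases(history)
    return sigmoid(v_t @ W + a_dyn)

def visible_mean(h, history):
    """Mean of the (unit-variance Gaussian) visible units given the hidden
    states and the past frames."""
    _, b_dyn = dynamic_biases(history)
    return b_dyn + h @ W.T
```

Because inference stays a single feed-forward pass, the module can be stacked and trained greedily, one layer at a time, which is the point of the three tricks above.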