Diversifying Restricted Boltzmann Machine for Document Modeling
Pengtao Xie
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, 15213
pengtaox@
Yuntian Deng
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, 15213
yuntiand@
Eric P. Xing
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA, 15213
epxing@
ABSTRACT
The Restricted Boltzmann Machine (RBM) has shown great
effectiveness in document modeling. It uses hidden units
to discover latent topics and can learn compact semantic
representations of documents, which greatly facilitate
document retrieval, clustering and classification. The
popularity (or frequency) of topics in text corpora usually
follows a power-law distribution, where a few dominant topics
occur very frequently while most topics (in the long-tail
region) have low probabilities. Due to this imbalance, RBM
tends to learn multiple redundant hidden units to best
represent the dominant topics and to ignore those in the
long-tail region, which makes the learned representations
redundant and non-informative. To solve this problem, we
propose Diversified RBM (DRBM), which diversifies the hidden
units so that they cover not only the dominant topics but also
those in the long-tail region. We define a diversity metric
and use it as a regularizer to encourage the hidden units to
be diverse. Since the diversity metric is hard to optimize
directly, we instead optimize its lower bound and prove that
maximizing the lower bound with projected gradient ascent
can increase this diversity metric. Experiments on document
retrieval and clustering demonstrate that, with
diversification, the document modeling power of DRBM is
greatly improved.
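The abstract does not state how the diversity metric is defined. As a purely illustrative sketch, assume each hidden unit is represented by its weight vector and that diversity is scored as the mean pairwise angle between those vectors minus the variance of the angles. The function names `pairwise_angles` and `diversity_score`, the toy weight matrices, and this particular score are all assumptions made for illustration, not a definition taken from the text above.

```python
import math


def pairwise_angles(weights):
    """Return the non-obtuse angle (radians) between every pair of vectors."""
    angles = []
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            a, b = weights[i], weights[j]
            dot = sum(x * y for x, y in zip(a, b))
            norm_a = math.sqrt(sum(x * x for x in a))
            norm_b = math.sqrt(sum(y * y for y in b))
            # abs() folds obtuse angles back into [0, pi/2];
            # min() guards against rounding slightly above 1.0.
            cos = min(1.0, abs(dot) / (norm_a * norm_b))
            angles.append(math.acos(cos))
    return angles


def diversity_score(weights):
    """Hypothetical score: mean pairwise angle minus variance of the angles."""
    angles = pairwise_angles(weights)
    mean = sum(angles) / len(angles)
    var = sum((t - mean) ** 2 for t in angles) / len(angles)
    return mean - var


# Toy example (assumed data): three hidden units over a three-word vocabulary.
redundant = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [1.0, 0.05, 0.0]]
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(diversity_score(redundant) < diversity_score(diverse))
```

Under this assumed score, hidden units that all point toward the same dominant topic receive a low value, while units spread across different directions receive a high one. A regularizer of this kind would be added to the RBM training objective so that redundant units are penalized.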
Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications,
Data Mining
General Terms
Algorithms, Experiments
Keywords
Diversified Restricted Boltzmann Machine, Diversity,
Power-law Distribution, Document Modeling, Topic Modeling