BagBoo: Bagging the Gradient Boosting
3rd in RR (track I and II)
1st in nDCG (track II) and 2nd in nDCG (track I)
Dmitry Pavlov and Cliff Brunk
aka JOKER
aka team_404
Yandex Labs
{dmitry-pavlov,cliff}@yandex-team.ru
Yandex
• Yandex.ru
• Leading search engine in Russia, with 65%+ search market share
• Labs office in Palo Alto, CA
• We do a lot of technology innovation
• … and we are hiring!
• Come chat with us or send a message
Bagging (Breiman)
• Ensemble of models
– Sampling data
– Voting models
• Random Forest
– Models are functions of iid random vectors
– Assume Model = Tree WLOG from now on
• Nice properties
– Variance reduction
– Resistance to overfitting
– Efficient parallelizable computation
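
The bagging recipe above (sample the data, then vote) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes 1-D inputs, a one-split regression tree (stump) as the base model, bootstrap resampling of records, and averaging as the "vote". The names `fit_stump` and `bagging` are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on 1-D inputs by least squares."""
    best = None
    for t in sorted(set(xs))[1:]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant model
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def bagging(xs, ys, n_models, rng):
    """Train n_models stumps on bootstrap samples and average their predictions."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample records with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)  # "voting" as a plain average

# Toy usage: a step function on ten points (illustrative data only).
rng = random.Random(0)
xs = list(range(10))
ys = [0.0 if x < 5 else 1.0 for x in xs]
predict = bagging(xs, ys, n_models=25, rng=rng)
```

Each loop iteration depends only on its own bootstrap sample, which is what makes the models trainable in parallel.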
Gradient Boosting (Friedman)
• Ensemble of models
– Each successive model is learned to reduce the residual error of the models before it
– Weights in the linear combination are optimized
– Randomness/stochasticity, similar to Bagging
• Nice properties
– Bias reduction
• Hard to parallelize
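
A minimal sketch of the residual-fitting loop, assuming squared-error loss, 1-D inputs, and stumps as base models. As a simplification, a fixed `learning_rate` stands in for the optimized combination weights mentioned above; `fit_stump` and `boost` are hypothetical names, not taken from the slides.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on 1-D inputs by least squares."""
    best = None
    for t in sorted(set(xs))[1:]:  # thresholds that leave both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant model
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def boost(xs, ys, n_rounds, learning_rate=0.5):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = mean(ys)                      # start from the constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Round k needs the predictions from round k-1, so the loop is sequential.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Toy usage: a quadratic target on ten points (illustrative data only).
xs = list(range(10))
ys = [float(x * x) for x in xs]
model = boost(xs, ys, n_rounds=20)
```

The data dependency between rounds is the reason boosting is hard to parallelize, in contrast to bagging.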
BagBoo: combined Bagging and Boosting
• Combine the best of both worlds:
– Get a highly parallelizable algorithm with both bias-reduction and variance-reduction properties
1. Input: training data D; Nbag bagging iterations; Nboo boosting iterations per bag
2. Output: Random Forest of Nbag x Nboo trees
3. For i = 1 to Nbag do
       D[i] := SampleData(D) ;             # sample features and records
       BT[i] := BoostedTree(D[i], Nboo) ;  # Nboo boosted trees trained on D[i]
   EndFor
4. Return the additive model \sum_i { BT[i] }
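
The procedure above might be sketched in Python as below. Several details are assumptions rather than the authors' implementation: stumps as the base trees, squared-error loss, bootstrap sampling of records, a random feature subset per bag (`feat_frac`), and a fixed `learning_rate`. The sketch also averages the bags instead of summing them; the two differ only by the constant factor Nbag, so they rank items identically. All function and parameter names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(rows, ys, feats):
    """Least-squares one-split regression tree restricted to the features in feats."""
    best = None
    for f in feats:
        for t in sorted({r[f] for r in rows})[1:]:  # both sides stay non-empty
            left = [y for r, y in zip(rows, ys) if r[f] < t]
            right = [y for r, y in zip(rows, ys) if r[f] >= t]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant model
        m = mean(ys)
        return lambda r: m
    _, f, t, lm, rm = best
    return lambda r: lm if r[f] < t else rm

def boosted_tree(rows, ys, feats, n_boo, learning_rate=0.5):
    """BoostedTree(D[i], Nboo): n_boo stumps, each fit to the current residuals."""
    base = mean(ys)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_boo):
        stump = fit_stump(rows, [y - p for y, p in zip(ys, preds)], feats)
        stumps.append(stump)
        preds = [p + learning_rate * stump(r) for r, p in zip(rows, preds)]
    return lambda r: base + learning_rate * sum(s(r) for s in stumps)

def bagboo(rows, ys, n_bag, n_boo, feat_frac=0.5, seed=0):
    """Train n_bag independent boosted models on sampled data and combine them."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    k = max(1, int(feat_frac * d))
    models = []
    for _ in range(n_bag):  # iterations are independent, hence parallelizable
        idx = [rng.randrange(n) for _ in range(n)]   # SampleData: records
        feats = rng.sample(range(d), k)              # SampleData: features
        models.append(boosted_tree([rows[i] for i in idx],
                                   [ys[i] for i in idx], feats, n_boo))
    return lambda r: mean(m(r) for m in models)

# Toy usage: feature 0 is informative, feature 1 is noise (illustrative data only).
rows = [[x, (x * 7) % 5] for x in range(20)]
ys = [2.0 * x for x in range(20)]
model = bagboo(rows, ys, n_bag=10, n_boo=5)
```

Because the bags share no state, the outer loop could be distributed across workers, while the sequential dependency stays confined to the `n_boo` rounds inside each bag.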
BagBoo: highly parallelizable algorithm
• Accurate
– Excellent results in contests and on TREC benchmarks
• Fast
– Can train many trees fast
• Gotchas
– need to control learning rate
– winning the contest with many trees