ch04 Data Stream Mining 1: Overview.pptx

Mining Data Streams (Part 1)
Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University

Note to other teachers and users of these slides: We would be delighted if you found our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our website:

New Topic: Infinite Data

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets

Data Streams
In many data mining situations, we do not know the entire data set in advance.
Stream management is important when the input rate is controlled externally:
  Google queries
  Twitter or Facebook status updates
We can think of the data as infinite and non-stationary (the distribution changes over time).

The Stream Model
Input elements enter at a rapid rate, at one or more input ports (i.e., streams).
We call elements of the stream tuples.
The system cannot store the entire stream accessibly.
Q: How do you make critical calculations about the stream using a limited amount of (secondary) memory?

Sidenote: SGD is a Streaming Algorithm
Stochastic Gradient Descent (SGD) is an example of a stream algorithm.
In machine learning we call this online learning.
It allows for modeling problems where we have a continuous stream of data.
We want an algorithm to learn from it and slowly adapt to the changes in the data.
Idea: Do slow updates to the model.
  SGD (SVM, Perceptron) makes small updates.
  So: First train the classifier on training data.
  Then: For every example from the stream, we slightly update the model (using a small learning rate).

General Stream Processing Model
[Figure: streams entering over time (... 1, 5, 2, 7, 0, 9, 3; ... a, r, v, t, y, h, b; ... 0, 0, 1, 0, 1, 1, 0), each stream composed of elements/tuples, feed a Processor backed by Limited Working Storage and Archival Storage. The Processor answers Standing Queries and Ad-Hoc Queries and produces Output.]

Problems on Data Streams
Types of queries one wants to answer on a data stream (we'll do these today):
Sampling data from a stream
  Construct a random sample
Queries ov
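
The SGD sidenote above (train first, then nudge the model once per stream element) can be sketched roughly as follows. This is only an illustrative sketch, not code from the slides: it assumes a perceptron-style mistake-driven update, labels in {-1, +1}, and the hypothetical names predict, sgd_step, train_batch and update_from_stream, along with the particular learning rates shown.

```python
from typing import Iterable, Iterator, List, Tuple

# One stream element: (feature vector, label in {-1, +1}). Assumed format.
Example = Tuple[List[float], int]
Model = Tuple[List[float], float]  # (weights, bias)


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Classify x with the linear model (w, b)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def sgd_step(w: List[float], b: float, x: List[float], y: int, lr: float) -> Model:
    """One perceptron-style update: move the model only when x is misclassified."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if y * score <= 0:  # mistake (or zero margin): take a small step toward y
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b


def train_batch(data: List[Example], dim: int, lr: float = 0.1, epochs: int = 10) -> Model:
    """'First train the classifier on training data': several passes over a stored set."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            w, b = sgd_step(w, b, x, y, lr)
    return w, b


def update_from_stream(w: List[float], b: float, stream: Iterable[Example],
                       lr: float = 0.01) -> Iterator[Model]:
    """'Then: for every example from the stream, slightly update the model.'

    Each element is seen once and never stored; the current model is yielded
    after every element so a caller can query it at any time.
    """
    for x, y in stream:
        w, b = sgd_step(w, b, x, y, lr)
        yield w, b
```

A caller would first run train_batch on whatever labeled data is stored, then iterate over update_from_stream with a smaller learning rate, so that each arriving tuple shifts the model only slightly. The working memory stays constant in the length of the stream, which is the property the stream model asks for.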
