ch04 Data Stream Mining 1: Overview
Mining Data Streams (Part 1)
Mining of Massive Datasets
Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University

Note to other teachers and users of these slides: We would be delighted if you found this material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site:

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets

New Topic: Infinite Data

Data Streams
In many data mining situations, we do not know the entire data set in advance.
Stream management is important when the input rate is controlled externally:
  Google queries
  Twitter or Facebook status updates
We can think of the data as infinite and non-stationary (the distribution changes over time).

The Stream Model
Input elements enter at a rapid rate, at one or more input ports (i.e., streams).
We call elements of the stream tuples.
The system cannot store the entire stream accessibly.
Q: How do you make critical calculations about the stream using a limited amount of (secondary) memory?

Side note: SGD is a Streaming Alg.
Stochastic Gradient Descent (SGD) is an example of a stream algorithm.
In Machine Learning we call this: Online Learning.
  Allows for modeling problems where we have a continuous stream of data.
  We want an algorithm to learn from it and slowly adapt to the changes in data.
Idea: Do slow updates to the model.
  SGD (SVM, Perceptron) makes small updates.
  So: First train the classifier on training data.
  Then: For every example from the stream, we slightly update the model (using a small learning rate).

General Stream Processing Model
[Figure: Streams Entering over time (... 1, 5, 2, 7, 0, 9, 3; ... a, r, v, t, y, h, b; ... 0, 0, 1, 0, 1, 1, 0), each stream composed of elements/tuples; Processor; Limited Working Storage; Archival Storage; Standing Queries; Ad-Hoc Queries; Output.]

Problems on Data Streams
Types of queries one wants to answer on a data stream: (we’ll do these today)
Sampling data from a stream
  Construct a random sample
Queries ov
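As a rough illustration of the SGD side note above, the following Python sketch processes labeled examples one at a time and nudges a linear model after each one with a small learning rate. It is only a sketch under stated assumptions, not the method from the slides: the hinge-loss (SVM-style) update rule, the names `sgd_update`, `train_online`, `synthetic_stream`, and `accuracy`, the learning rate, and the synthetic data are all hypothetical choices made for illustration.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# An example is (feature vector, label), with label in {-1, +1}.
Example = Tuple[List[float], int]


def sgd_update(w: List[float], x: List[float], y: int,
               lr: float = 0.01, reg: float = 0.0) -> List[float]:
    """Return the weights after one hinge-loss SGD step on a single example.

    Assumption: a linear SVM-style objective; `lr` and `reg` are
    illustrative hyperparameters, not values taken from the slides.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Optional L2 shrinkage (a no-op when reg == 0).
    new_w = [wi * (1.0 - lr * reg) for wi in w]
    # Move toward the example only if it violates the margin.
    if margin < 1.0:
        new_w = [wi + lr * y * xi for wi, xi in zip(new_w, x)]
    return new_w


def train_online(w: List[float], stream: Iterable[Example],
                 lr: float = 0.01, reg: float = 0.0) -> List[float]:
    """Consume the stream once; each example is seen, used, and discarded."""
    for x, y in stream:
        w = sgd_update(w, x, y, lr, reg)
    return w


def synthetic_stream(n: int, seed: int = 0) -> Iterator[Example]:
    """Hypothetical stream: 2-D points plus a constant bias feature,
    labeled by the sign of x0 + x1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 1.0]
        y = 1 if x[0] + x[1] > 0 else -1
        yield x, y


def accuracy(w: List[float], examples: Iterable[Example]) -> float:
    """Fraction of examples whose predicted sign matches the label."""
    total = correct = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1 if score > 0 else -1
        correct += int(pred == y)
        total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    # "First train on training data", then keep updating from the stream.
    w = train_online([0.0, 0.0, 0.0], synthetic_stream(1000, seed=1), lr=0.05)
    w = train_online(w, synthetic_stream(4000, seed=2), lr=0.05)
    print("held-out accuracy:", accuracy(w, synthetic_stream(1000, seed=3)))
```

The model keeps only the weight vector in memory, so storage stays constant regardless of how many stream elements arrive, which is the property that makes this style of learning fit the stream model.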