The Path to Excellence for Big Data Analysts

Typical scenarios of data analysis
- Diagram labels: Data, Value, Data, Knowledge Discovery, Value, Infrastructure

A new worldview: an uncertain world

The measurement uncertainty of big data
- Nature: measurement uncertainty
- Science: big data hubris

Upgrading the data analysis methodology
- Hypotheses
- Collection
- Preparation
- Analytics
- Interpretation
- Evaluation

Hypotheses
- Mechanically mine correlations and hypotheses
- Intuition: practice on detective novels
- Reading: read widely
- Cross-disciplinary thinking: let ideas from different fields collide
- Embed yourself in the business units
- Prevent disconnects between data collection and analysis, and between the business and data analysis

Data! Data! Data!
- n = All!
- Enterprise Data Warehouse → Enterprise Data Hub / Data Lake
- External data sources
- Structured → semi-structured → unstructured
- Log analysis
- Text analysis
- Image/video
- Data with geo and temporal tags
- Networks and graphs

Data? Data? Data?
- n = All?
- More data vs. sampling
- "Raw data" is an oxymoron
- Signal and noise
- Sampling bias
- Data exchange and sharing
- Data rights, data pricing
- Data lifecycle management
- Provenance capture, representation, and querying
- Sometimes data are not assets, but costs

Data quality: the top priority
- Noisy, biased, and polluted data are unavoidable
- Goal: models = components for noise + relatively complex models for signal
- Cleansing, validation, …
- Can it start with a small subset? Can the process be automated?
- Work together with visualization and machine learning
- Curation, wrangling, …
- Automated learning to discover structure, resolve entities, and transform data

Data representation
- Reduce compute and communication complexity
- Sparse, compressed data structures
- Approximate computation
- Reduce statistical complexity
- Dimensionality reduction, clustering
- Sampling
- Non-random sampling, compressive sensing, …
- Choose the best representation for specific computational methods
- E.g. tables for data parallelism, networks/graphs for graph parallelism
- UIMA: Unstructured Information Management Architecture

Computational Science
- Source:

Check your own equipment
- ML Pipeline
- Scikit-learn style pipelines
- Embrace the cloud
- All models are wrong, but some are useful
- Hedgehog (one trick that works everywhere) vs. fox (a different key for each lock)
- Match model complexity to the problem: Occam's razor
- How can more data yield increasing marginal returns?
- The unreasonable effectiveness of data: …
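
The "More data vs. sampling" and "Sampling bias" items above can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not something taken from the slides: the population, its distribution, the collection cutoff of 45, and all variable names are hypothetical. It compares a small random sample with a much larger sample that was collected with a systematic bias.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 measurements centered on 50 (assumption).
population = [random.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Small but unbiased: 500 values drawn uniformly at random.
small_random = random.sample(population, 500)

# Large but biased: only values above 45 are ever collected
# (for example, a log that records only sufficiently active users).
large_biased = [x for x in population if x > 45][:50_000]

# Absolute error of each sample's mean against the population mean.
err_random = abs(statistics.fmean(small_random) - true_mean)
err_biased = abs(statistics.fmean(large_biased) - true_mean)

print(f"population mean:         {true_mean:.2f}")
print(f"small random sample err: {err_random:.2f} (n={len(small_random)})")
print(f"large biased sample err: {err_biased:.2f} (n={len(large_biased)})")
```

In this setup the biased sample is far larger than the random one, yet its estimate of the mean is further off, because collecting more data does not remove a systematic collection bias. That is one reading of the "n = All?" question.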