Cloud Computing Virtualization Technology: Cloud Computing Data Processing Technology (Hadoop)

  • About 32,500 words
  • About 79 pages
  • Published 2018-10-13 (Jiangsu)

Cloud Computing Virtualization Technology: Cloud Computing Data Processing Technology (Hadoop, MapReduce)
賴智錦 / 詹奇峰, Department of Electrical Engineering, National University of Kaohsiung, 2009/08/05

Cloud Computing Data Processing Technology

What is large data?
From the point of view of the infrastructure required to do analytics, data comes in three sizes:
  • Small data
  • Medium data
  • Large data

Small data:
Small data fits into the memory of a single machine.
Example: the dataset for the Netflix Prize is a small dataset. (The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences.)
The Netflix Prize dataset consists of over 100 million movie ratings from 480 thousand randomly chosen, anonymous Netflix customers who rated over 17 thousand movie titles.
This dataset is just 2 GB of data and fits into the memory of a laptop.

Medium data:
Medium data fits onto a single disk or disk array and can be managed by a database.
It is becoming common today for companies to create data warehouses of 1 to 10 TB or larger.

Large data:
Large data is so large that it is challenging to manage in a database, so specialized systems are used instead.
Scientific experiments, such as the Large Hadron Collider (LHC, the world's largest and highest-energy particle accelerator), produce large datasets.
Log files produced by Google, Yahoo, Microsoft, and similar companies are also examples of large datasets.

Large data sources:
Historically, most large datasets were produced by the scientific and defense communities.
Two things have changed:
  • Large datasets are now being produced by a third community: companies that provide internet services, such as search, online advertising, and social media.
  • The ability to analyze these datasets is critical for the advertising systems that produce the bulk of the revenue for these companies. This provides a metric by which to measure the effectiveness of analytic infrastructure and analytic models. Using this
