14. Big Data Overview: Oracle Big Data Solutions.ppt

* Larry Page and Sergey Brin wrote BigFiles, and GFS (Google File System) grew out of that. Next came MapReduce, which maps a problem across a cluster of worker nodes, then collects the results and aggregates (reduces) them; it was used to generate Google’s index of the WWW. Apache came out with Hadoop (used by Facebook, Yahoo, Amazon EC2/S3), an open source version that pairs HDFS with MapReduce. It runs batch processing jobs that go after distributed data and process it near the data (on the same node). It is not super fast (seconds vs. ms) and not good for interactive/analytic work (no updates, only appends). Google then came out with BigTable (compressed, high-performance data storage), used by Google Maps, Google Reader, Google Earth, YouTube, and Gmail. Apache added NoSQL DBs: Cassandra and HBase. The NoSQL onslaught of systems started (over 100 of them), including Oracle’s NoSQL (BerkeleyDB).

* The goal was to organize data without moving it: Hadoop HDFS and MapReduce (a cheaper way to access petabytes). HDFS can store any type of data or structure, but MapReduce works with key/value pairs. Acquire and store data: NoSQL (simple key-value storage) such as Amazon DynamoDB (hosted), Apache Cassandra, HBase, BigTable, MongoDB, and Oracle NoSQL (distributed key-value), or just use the original HDFS / GFS with MapReduce (many are EVENTUALLY consistent!). Analyze data: Google Dremel, Apache Hive Data Warehouse, Oracle Data Warehouse (OBIEE). 54% of companies doing Big Data say: “Projects are critical!”

* Many in the industry have considered ACID properties integral to databases. But these properties come from a transactional perspective and are not necessarily required to the fullest extent in analytical situations, as long as the end state remains consistent. By carefully dropping certain aspects of ACID support, such systems can be geared to handle Big Data, especially the simpler types of Big Data like web clicks.

* Notes: “As customers look to manage the huge explosion in data from new and evolving sources, such
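
The MapReduce flow described in the notes above (map a problem across workers, collect the key/value pairs, then reduce them) can be illustrated with a minimal single-process sketch. This is not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration, and the loop over documents stands in for what separate worker nodes would do in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) key/value pair for every word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = []
    for doc in documents:  # assumption: each document would go to its own worker
        pairs.extend(map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big table"]))
```

In a real cluster the map calls run on the nodes that already hold the data, which is the "process it near the data" point made in the notes; only the much smaller key/value pairs move across the network.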
