Data mining on relational databases: the traditional approach is facing challenges; common tuning techniques; the all-in-one appliance concept; big data storage and analysis
* The big data market is in a phase of explosive growth. National leaders attach great importance to big data, and big data industries are flourishing across the regions. Politburo member Wang Yang made big data a priority during his tenure in Guangdong, and internet industry leader Lei Jun has strongly advocated a national big data development plan. Five success factors for putting big data into practice: infrastructure, industry chain, talent, technology, and legislation. Big data infrastructure is the foundation and holds enormous business opportunities.
* User behavior analysis, anti-fraud, anti-money laundering
* Cluster filesystem: a distributed filesystem that is not a single server with a set of clients, but a cluster of servers that work together to provide high-performance service to their clients. To the clients the cluster is transparent; it is just "the filesystem", while the filesystem software distributes requests to the elements of the storage cluster. Examples include the HP (DEC) Tru64 cluster; Spinnaker, a clustered NAS (NFS) service; and Panasas ActiveScale.
* Parallel filesystem: a file system with support for parallel applications, in which all nodes may access the same files at the same time, reading and writing concurrently. Data for a single file is striped across multiple storage nodes to provide scalable performance to individual files. Examples include Panasas ActiveScale, Lustre, GPFS and Sistina. NFSv4.1 will feature an extension to the NFS standard that supports parallel I/O. These two definitions overlap.
* Level 4: distributed file systems, such as Locus, Sun Network File System (NFS) and CMU Andrew, where multiple users who are physically dispersed across a network of autonomous computers share the use of a common file system. New issues: location transparency (dynamically mapping file names to storage sites), availability, fault tolerance, replication, and consistency. Distributed storage is software for synchronizing files and directories locally and between many remote computers connected via LAN or Internet (wiki). Consistency, availability and performance tend to be mutually contradictory goals in a distributed system.
* An object is an ordered set of bytes (within the OSD) that is associated with a unique identifier.
* … Old problems, new requirements. Cloud storage is a model of networked online storage where data is


