MaxCompute索引优化实践分享.pptx

下载文档 降价啦

31
0
约2.94千字
约 20页
2018-05-23 发布于山东
举报
版权申诉
保障服务

MaxCompute索引优化实践分享.pptx

1、原创力文档（book118）网站文档一经付费（服务费），不意味着购买了该文档的版权，仅供个人/单位学习、研究之用，不得用于商业用途，未经授权，严禁复制、发行、汇编、翻译或者网络传播等，侵权必究。。
2、本站所有内容均由合作方或网友上传，本站不对文档的完整性、权威性及其观点立场正确性做任何保证或承诺！文档内容仅供研究参考，付费前请自行鉴别。如您付费，意味着您自己接受本站规则且自行承担风险，本站不退款、不进行额外附加服务；查看《如何避免下载的几个坑》。如果您已付费下载过本站文档，您可以点击这里二次下载。
3、如文档侵犯商业秘密、侵犯著作权、侵犯人身权等，请点击“版权申诉”（推荐），也可以打举报电话：400-050-0827(电话支持时间：9:00-18:30)。

MaxCompute索引优化实践分享阿里云高级专家戴谢宁 MaxCompute 2.0 MaxCompute的数据模型分区下没有定义数据组织方式。 MaxCompute 2.0 能否通过定义数据分片、排序和索引提高效率？ MaxCompute 2.0 哈希分片 – Hash Clustering CREATE TABLE table_name … CLUSTERED BY (col_name [, col_name, ...]) [SORTED BY (col_name [ASC | DESC] …)] INTO number_of_buckets BUCKETS MaxCompute 2.0 区域分片 – Range Clustering CREATE TABLE table_name … RANGE CLUSTERED BY (col_name [, col_name, ...]) [SORTED BY (col_name [ASC | DESC] …)] MaxCompute 2.0 基于索引的查询优化 SELECT … from table_name … WHERE id 3; MaxCompute 2.0 基于索引的查询优化 SELECT … from table_name … WHERE id 3; MaxCompute 2.0 基于索引的查询优化 SELECT … from table_name … WHERE id 3; MaxCompute 2.0 20 0 60 120 100 w/ clustering w/o clustering Query Execution Time (s) 1 0 7 6 w/ clustering w/o clustering CPU Usage (cores * minute) 1E+09 0 3 3E+09 40 2 2E+09 4E+09 6E+09 5 80 5E+09 4 8E+09 7E+09 w/ clustering w/o clustering IO Usage (bytes) MaxCompute 2.0 查询优化性能对比 TPC-H Q6 on 100GB dataset select sum(l_extendedprice * l_discount) as revenue from tpch_lineitem l where l_shipdate = 1994-01-01 and l_shipdate 1995-01-01 and l_discount = 0.05 and l_discount = 0.07 and l_quantity 24; Join优化 SELECT t1.id, t1.name, t2.name FROM t1, t2 WHERE t1.id = t2.id; MaxCompute 2.0 Join优化 MaxCompute 2.0 450 400 350 300 250 200 150 100 50 0 w/ clustering w/o clustering Query Execution Time (s) 25 20 15 10 5 0 w/ clustering w/o clustering CPU Usage (cores * minute) 45 40 35 30 25 20 15 10 5 0 w/ clustering w/o clustering Memory Usage (GB * minutes) MaxCompute 2.0 TPC-H Q4 select o_orderpriority, count(*) as order_count from tpch_orders o join (select distinct l_orderkey from (select * from tpch_lineitem where l_commitdate l_receiptdate) tab1) tab2 on tab2.l_orderkey = o.o_orderkey where o.o_orderdate = 1993-07-01 and o.o_orderdate 1993-10-01 group by o_orderpriority order by o_orderpriority limit 999999; • • • • • 基于用户ID查询数日内交易记录。扫描数据量非常大， ~3TB，~400亿条记录。数据选取率非常低。结果集通常为几十到近百条记