- 1、原创力文档(book118)网站文档一经付费(服务费),不意味着购买了该文档的版权,仅供个人/单位学习、研究之用,不得用于商业用途,未经授权,严禁复制、发行、汇编、翻译或者网络传播等,侵权必究。。
- 2、本站所有内容均由合作方或网友上传,本站不对文档的完整性、权威性及其观点立场正确性做任何保证或承诺!文档内容仅供研究参考,付费前请自行鉴别。如您付费,意味着您自己接受本站规则且自行承担风险,本站不退款、不进行额外附加服务;查看《如何避免下载的几个坑》。如果您已付费下载过本站文档,您可以点击 这里二次下载。
- 3、如文档侵犯商业秘密、侵犯著作权、侵犯人身权等,请点击“版权申诉”(推荐),也可以打举报电话:400-050-0827(电话支持时间:9:00-18:30)。
查看更多
MaxCompute索引优化实践分享
阿里云高级专家 戴谢宁
MaxCompute 2.0
MaxCompute的数据模型
分区下没有定义数据组织方式。
MaxCompute 2.0
能否通过定义数据分片、排序和索引
提高效率?
MaxCompute 2.0
哈希分片 – Hash Clustering
CREATE TABLE table_name
…
CLUSTERED BY (col_name [, col_name, ...])
[SORTED BY (col_name [ASC | DESC] …)]
INTO number_of_buckets BUCKETS
MaxCompute 2.0
区域分片 – Range Clustering
CREATE TABLE table_name
…
RANGE CLUSTERED BY (col_name [, col_name, ...])
[SORTED BY (col_name [ASC | DESC] …)]
MaxCompute 2.0
基于索引的查询优化
SELECT … from table_name
…
WHERE id 3;
MaxCompute 2.0
基于索引的查询优化
SELECT … from table_name
…
WHERE id 3;
MaxCompute 2.0
基于索引的查询优化
SELECT … from table_name
…
WHERE id 3;
MaxCompute 2.0
20
0
60
120
100
w/ clustering
w/o clustering
Query Execution Time (s)
1
0
7
6
w/ clustering
w/o clustering
CPU Usage (cores * minute)
1E+09
0
3
3E+09
40
2
2E+09
4E+09
6E+09
5
80
5E+09
4
8E+09
7E+09
w/ clustering
w/o clustering
IO Usage (bytes)
MaxCompute 2.0
查询优化性能对比
TPC-H Q6 on 100GB dataset
select sum(l_extendedprice * l_discount) as revenue
from tpch_lineitem l
where l_shipdate = 1994-01-01 and l_shipdate 1995-01-01
and l_discount = 0.05 and l_discount = 0.07
and l_quantity 24;
Join优化
SELECT t1.id, t1.name, t2.name
FROM t1, t2
WHERE t1.id = t2.id;
MaxCompute 2.0
Join优化
MaxCompute 2.0
450
400
350
300
250
200
150
100
50
0
w/ clustering
w/o clustering
Query Execution
Time (s)
25
20
15
10
5
0
w/ clustering
w/o clustering
CPU Usage (cores
* minute)
45
40
35
30
25
20
15
10
5
0
w/ clustering
w/o clustering
Memory Usage (GB *
minutes)
MaxCompute 2.0
TPC-H Q4
select o_orderpriority, count(*) as order_count from tpch_orders o join
(select distinct l_orderkey from
(select * from tpch_lineitem where l_commitdate l_receiptdate) tab1) tab2
on tab2.l_orderkey = o.o_orderkey
where o.o_orderdate = 1993-07-01 and o.o_orderdate 1993-10-01
group by o_orderpriority order by o_orderpriority limit 999999;
•
•
•
•
•
基于用户ID查询数日内交易记录。
扫描数据量非常大, ~3TB,~400亿条记录。
数据选取率非常低。结果集通常为几十到近百条记
文档评论(0)