-
Notifications
You must be signed in to change notification settings - Fork 38
大基数roaringbitmap值频繁重复detoast的优化
Chen Huajun edited this page Dec 18, 2022
·
2 revisions
表中roaringbitmap类型的字段大小超过2K时会以toast形式存储,SQL中使用这些roaringbitmap字段时需要进行detoast。 如果一个SQL中detoast roaringbitmap数据的次数非常多,可能非常影响SQL性能。 下面是一个例子。
create table label as
select grp1,rb_build_agg((random()*100000000)::int) member_ids
from generate_series(1,1000000),generate_series(1,10)grp1
group by grp1;
create table fact as
select id,(random()*10)::int grp1, random() value1, random() value2
from generate_series(1,10000) id;
select a.grp1,rb_cardinality(rb_build_agg(id)), sum(value1) sum_value1,sum(value2) sum_value2
from fact a join label b on(a.grp1 = b.grp1)
where rb_contains(b.member_ids,id)
group by a.grp1;
这个SQL执行时间是9秒
postgres=# explain analyze
select a.grp1,rb_cardinality(rb_build_agg(id)), sum(value1) sum_value1,sum(value2) sum_value2
from fact a join label b on(a.grp1 = b.grp1)
where rb_contains(b.member_ids,id)
group by a.grp1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=920.57..2259.19 rows=200 width=28) (actual time=935.632..8999.550 rows=10 loops=1)
Group Key: a.grp1
-> Merge Join (cost=920.57..2043.51 rows=21268 width=24) (actual time=53.328..8998.784 rows=88 loops=1)
Merge Cond: (b.grp1 = a.grp1)
Join Filter: rb_contains(b.member_ids, a.id)
Rows Removed by Join Filter: 9406
-> Sort (cost=88.17..91.35 rows=1270 width=36) (actual time=0.025..0.036 rows=10 loops=1)
Sort Key: b.grp1
Sort Method: quicksort Memory: 25kB
-> Seq Scan on label b (cost=0.00..22.70 rows=1270 width=36) (actual time=0.012..0.015 rows=10 loops=1)
-> Sort (cost=832.40..857.52 rows=10048 width=24) (actual time=5.266..23.894 rows=10000 loops=1)
Sort Key: a.grp1
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on fact a (cost=0.00..164.48 rows=10048 width=24) (actual time=0.014..2.643 rows=10000 loops=1)
Planning Time: 0.168 ms
Execution Time: 8999.643 ms
(16 rows)
通过perf分析,可以看出99%的时间都耗在对大基数roaringbitmap字段member_ids
的detoast上,detoast了1w次。
- 99.05% 0.00% postgres roaringbitmap.so [.] rb_exsit
rb_exsit
- pg_detoast_datum
- detoast_attr
- toast_fetch_datum
- 98.67% table_relation_fetch_toast_slice
+ 61.83% heap_fetch_toast_slice
36.83% __memcpy_ssse3_back
SQL中反复对同一个大基数的roaringbitmap值进行解序列化,可以在roaringbitmap函数内部将解序列化的结果缓存起来。
pg_roaringbitmap
的rb_cache2
分支中实现了这个原型。使用示例如下:
select a.grp1,rb_cardinality(rb_build_agg(id)), sum(value1) sum_value1,sum(value2) sum_value2
from fact a join label b on(a.grp1 = b.grp1)
where rb_cache_contains(b.member_ids,id,b.grp1::text)
group by a.grp1;
rb_cache_contains
中需要传入缓存的key,下次以相同的key调用这个函数时,直接使用第一次缓存的值。
这个SQL的执行时间是87毫秒
可以先detoast大基数roaringbitmap数据再通过CTE物化,将roaringbitmap字段值加载到内存里,避免重复detoast。
WITH b as MATERIALIZED(
select grp1, member_ids::bytea::roaringbitmap from label
)
select a.grp1,rb_cardinality(rb_build_agg(id)), sum(value1) sum_value1,sum(value2) sum_value2
from fact a join b on(a.grp1 = b.grp1)
where rb_contains(b.member_ids,id)
group by a.grp1;
这个SQL的执行时间是98毫秒。效果和方式一相同,但不需要修改插件代码。 优化后SQL的执行计划如下:
postgres=# explain analyze
WITH b as MATERIALIZED(
select grp1, member_ids::bytea::roaringbitmap from label
)
select a.grp1,rb_cardinality(rb_build_agg(id)), sum(value1) sum_value1,sum(value2) sum_value2
from fact a join b on(a.grp1 = b.grp1)
where rb_contains(b.member_ids,id)
group by a.grp1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=945.13..2274.57 rows=11 width=28) (actual time=81.166..93.566 rows=10 loops=1)
Group Key: a.grp1
CTE b
-> Seq Scan on label (cost=0.00..25.88 rows=1270 width=36) (actual time=2.997..22.095 rows=10 loops=1)
-> Merge Join (cost=919.26..2036.86 rows=21167 width=24) (actual time=79.159..93.398 rows=88 loops=1)
Merge Cond: (b.grp1 = a.grp1)
Join Filter: rb_contains(b.member_ids, a.id)
Rows Removed by Join Filter: 9406
-> Sort (cost=90.87..94.05 rows=1270 width=36) (actual time=73.392..77.978 rows=10 loops=1)
Sort Key: b.grp1
Sort Method: external merge Disk: 19600kB
-> CTE Scan on b (cost=0.00..25.40 rows=1270 width=36) (actual time=4.396..44.820 rows=10 loops=1)
-> Sort (cost=828.39..853.39 rows=10000 width=24) (actual time=5.556..7.226 rows=10000 loops=1)
Sort Key: a.grp1
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on fact a (cost=0.00..164.00 rows=10000 width=24) (actual time=0.029..2.741 rows=10000 loops=1)
Planning Time: 0.214 ms
Execution Time: 98.528 ms
(18 rows)
CTE中的member_ids::bytea::roaringbitmap
类型转换是必须的,这会促使CTE物化时进行detoast。
去掉类型转换,性能和原始的SQL一样差。
postgres=# explain analyze
WITH b as MATERIALIZED(
select grp1, member_ids from label
)
select a.grp1,rb_cardinality(rb_build_agg(id)), sum(value1) sum_value1,sum(value2) sum_value2
from fact a join b on(a.grp1 = b.grp1)
where rb_contains(b.member_ids,id)
group by a.grp1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=941.96..2271.39 rows=11 width=28) (actual time=739.516..7810.720 rows=10 loops=1)
Group Key: a.grp1
CTE b
-> Seq Scan on label (cost=0.00..22.70 rows=1270 width=36) (actual time=0.009..0.011 rows=10 loops=1)
-> Merge Join (cost=919.26..2036.86 rows=21167 width=24) (actual time=55.051..7809.806 rows=88 loops=1)
Merge Cond: (b.grp1 = a.grp1)
Join Filter: rb_contains(b.member_ids, a.id)
Rows Removed by Join Filter: 9406
-> Sort (cost=90.87..94.05 rows=1270 width=36) (actual time=0.022..0.028 rows=10 loops=1)
Sort Key: b.grp1
Sort Method: quicksort Memory: 25kB
-> CTE Scan on b (cost=0.00..25.40 rows=1270 width=36) (actual time=0.012..0.016 rows=10 loops=1)
-> Sort (cost=828.39..853.39 rows=10000 width=24) (actual time=4.950..20.384 rows=10000 loops=1)
Sort Key: a.grp1
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on fact a (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..2.527 rows=10000 loops=1)
Planning Time: 0.147 ms
Execution Time: 7810.785 ms
(18 rows)
- 通过CTE物化 +
::bytea::roaringbitmap
类型转换,可以有效避免大基数roaringbitmap值的重复detoast,提高SQL性能