當(dāng)前位置：維易PHP培訓(xùn)學(xué)院 > PostgreSQL數(shù)據(jù)庫(kù) > 內(nèi)容正文

PostgreSQL雕蟲小技，分組TOP性能提升44倍

作者：云棲社區(qū) 時(shí)間 2016-07-08

《PostgreSQL雕蟲小技，分組TOP性能提升44倍》要點(diǎn)：
本文介紹了PostgreSQL雕蟲小技，分組TOP性能提升44倍，希望對(duì)您有用。如果有疑問(wèn)，可以聯(lián)系我們。

相關(guān)主題：PostgreSQL教程

業(yè)務(wù)配景

按分組取出TOP值,是非經(jīng)常見的業(yè)務(wù)需求.

好比提取每位歌手的下載量TOP 10的曲目、提取每個(gè)城市納稅前10的人或企業(yè).

傳統(tǒng)辦法

傳統(tǒng)的辦法是使用窗口查詢,PostgreSQL是支持窗口查詢的.

例子

測(cè)試表和測(cè)試數(shù)據(jù),生成10000個(gè)分組,1000萬(wàn)條記錄.

postgres=# create table tbl(c1 int, c2 int, c3 int);
CREATE TABLE
postgres=# create index idx1 on tbl(c1,c2);
CREATE INDEX
postgres=# insert into tbl select mod(trunc(random()*10000)::int, 10000), trunc(random()*10000000) from generate_series(1,10000000);
INSERT 0 10000000

使用窗口查詢的執(zhí)行計(jì)劃

postgres=# explain select * from (select row_number() over(partition by c1 order by c2) as rn,* from tbl) t where t.rn<=10;
QUERY PLAN
----------------------------------------------------------------------------------------
Subquery Scan on t (cost=0.43..770563.03 rows=3333326 width=20)
Filter: (t.rn <= 10)
-> WindowAgg (cost=0.43..645563.31 rows=9999977 width=12)
-> Index Scan using idx1 on tbl (cost=0.43..470563.72 rows=9999977 width=12)
(4 rows)

使用窗口查詢的成果舉例

postgres=# select * from (select row_number() over(partition by c1 order by c2) as rn,* from tbl) t where t.rn<=10;
rn | c1 | c2 | c3
----+------+--------+----
1 | 0 | 1657 |
2 | 0 | 3351 |
3 | 0 | 6347 |
4 | 0 | 12688 |
5 | 0 | 16991 |
6 | 0 | 19584 |
7 | 0 | 24694 |
8 | 0 | 36646 |
9 | 0 | 40882 |
10 | 0 | 41599 |
1 | 1 | 14465 |
2 | 1 | 29032 |
3 | 1 | 39969 |
4 | 1 | 41094 |
5 | 1 | 69481 |
6 | 1 | 70919 |
7 | 1 | 75575 |
8 | 1 | 81102 |
9 | 1 | 87496 |
10 | 1 | 90603 |
......

使用窗口查詢的效率,20.1秒

postgres=# explain (analyze,verbose,costs,timing,buffers) select * from (select row_number() over(partition by c1 order by c2) as rn,* from tbl) t where t.rn<=10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on t (cost=0.43..770563.03 rows=3333326 width=20) (actual time=0.040..20813.469 rows=100000 loops=1)
Output: t.rn, t.c1, t.c2, t.c3
Filter: (t.rn <= 10)
Rows Removed by Filter: 9900000
Buffers: shared hit=10035535
-> WindowAgg (cost=0.43..645563.31 rows=9999977 width=12) (actual time=0.035..18268.027 rows=10000000 loops=1)
Output: row_number() OVER (?), tbl.c1, tbl.c2, tbl.c3
Buffers: shared hit=10035535
-> Index Scan using idx1 on public.tbl (cost=0.43..470563.72 rows=9999977 width=12) (actual time=0.026..11913.677 rows=10000000 loops=1)
Output: tbl.c1, tbl.c2, tbl.c3
Buffers: shared hit=10035535
Planning time: 0.110 ms
Execution time: 20833.747 ms
(13 rows)

蟲篆之技

如何優(yōu)化?

可以參考我之前寫的,使用遞歸查詢,優(yōu)化count distinct的辦法.

https://yq.aliyun.com/articles/39689

(點(diǎn)擊“閱讀原文”,方可點(diǎn)擊該連接進(jìn)入喲!)

本文同樣必要用到遞歸查詢,獲得分組ID：

postgres=# with recursive t1 as (
postgres(# (select min(c1) c1 from tbl )
postgres(# union all
postgres(# (select (select min(tbl.c1) c1 from tbl where tbl.c1>t.c1) c1 from t1 t where t.c1 is not null)
postgres(# )
postgres-# select * from t1;

寫成SRF函數(shù),如下：

postgres=# create or replace function f() returns setof tbl as $$
postgres$# declare
postgres$# v int;
postgres$# begin
postgres$# for v in with recursive t1 as (
postgres$# (select min(c1) c1 from tbl )
postgres$# union all
postgres$# (select (select min(tbl.c1) c1 from tbl where tbl.c1>t.c1) c1 from t1 t where t.c1 is not null)
postgres$# )
postgres$# select * from t1
postgres$# LOOP
postgres$# return query select * from tbl where c1=v order by c2 limit 10;
postgres$# END LOOP;
postgres$# return;
postgres$#
postgres$# end;
postgres$# $$ language plpgsql strict;
CREATE FUNCTION

優(yōu)化后的查詢成果例子

postgres=# select * from f();
c1 | c2 | c3
------+--------+----
0 | 1657 |
0 | 3351 |
0 | 6347 |
0 | 12688 |
0 | 16991 |
0 | 19584 |
0 | 24694 |
0 | 36646 |
0 | 40882 |
0 | 41599 |
1 | 14465 |
1 | 29032 |
1 | 39969 |
1 | 41094 |
1 | 69481 |
1 | 70919 |
1 | 75575 |
1 | 81102 |
1 | 87496 |
1 | 90603 |
......

優(yōu)化后,只必要464毫秒返回10000個(gè)分組的TOP 10.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from f();
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Function Scan on public.f (cost=0.25..10.25 rows=1000 width=12) (actual time=419.218..444.810 rows=100000 loops=1)
Output: c1, c2, c3
Function Call: f()
Buffers: shared hit=170407, temp read=221 written=220
Planning time: 0.037 ms
Execution time: 464.257 ms
(6 rows)

小結(jié)

1)傳統(tǒng)的辦法使用窗口查詢,輸出多個(gè)每個(gè)分組的TOP 10,需要掃描所有的記錄.效率較低.

2)由于分組不是非常多,只有10000個(gè),所以可以選擇使用遞歸的辦法,用上索引取TOP 10,速度非常快.

3)目前PostgreSQL的遞歸語(yǔ)法不支持遞歸的啟動(dòng)表寫在subquery里面,也不支持啟動(dòng)表在遞歸查詢中使用order by,所以不能直接使用遞歸得出結(jié)果,目前必要套一層函數(shù).

更多深度技術(shù)內(nèi)容,請(qǐng)存眷云棲社區(qū)微信公眾號(hào)：yunqiinsight.

《PostgreSQL雕蟲小技，分組TOP性能提升44倍》是否對(duì)您有啟發(fā)，歡迎查看更多與《PostgreSQL雕蟲小技，分組TOP性能提升44倍》相關(guān)教程，學(xué)精學(xué)透。維易PHP學(xué)院為您提供精彩教程。

轉(zhuǎn)載請(qǐng)注明本頁(yè)網(wǎng)址：
http://www.snjht.com/jiaocheng/11566.html

標(biāo)簽：

欧美97色伦欧美一区二区日韩,国产福利片在线观看,freexxx性欧美vide0高清,西西亚洲,日本欧美国产精品第一页久久,成人18免费软件

PHP教程

WEB前端開發(fā)

數(shù)據(jù)庫(kù)

WEB服務(wù)器

APP開發(fā)

LINUX學(xué)習(xí)

后端開發(fā)課程

前端開發(fā)課程

數(shù)據(jù)庫(kù)課程

PostgreSQL雕蟲小技，分組TOP性能提升44倍

同類教程排行

特輯教程