21-11-14
banq
在本文中,我们将了解什么是 SQL Seek搜索方法或 Keyset Pagination键集分页,以及为什么在浏览大型结果集时应该考虑它。
分页的目标是避免获取大量数据。
偏移OFFSET 分页
在讨论键集分页之前,让我们看看默认的 偏移OFFSET 分页在 SQL 中是如何工作的。
尽管关系数据库系统长期以来一直提供限制查询结果集的特定方法,但自 SQL:2008 以来,就有了标准的分页语法。
因此,限制给定结果集的记录数的TOP-N 查询可以使用该FETCH FIRST N ROWS ONLY指令,如以下示例所示:
SELECT id FROM post ORDER BY created_on DESC FETCH FIRST 50 ROWS ONLY |
并且,跳过前 M 条记录并获取接下来的 N 条记录的 NEXT-N 查询如下所示:
SELECT id FROM post ORDER BY created_on DESC OFFSET 150 ROWS FETCH NEXT 50 ROWS ONLY |
偏移分页索引
由于分页需要一个ORDER BY子句以保证一致的排序顺序,因此索引排序标准是很常见的。
在我们的例子中,我们需要在created_on列上创建以下索引:
CREATE INDEX idx_post_created_on ON post (created_on DESC) |
在执行 TOP-N 查询时,我们可以看到idx_post_created_on正在使用的,并且只扫描了 50 条记录:
SELECT id FROM post ORDER BY created_on DESC FETCH FIRST 50 ROWS ONLY Limit (cost=0.28..2.51 rows=50 width=16) (actual time=0.013..0.022 rows=50 loops=1) -> Index Scan using idx_post_created_on on post p (cost=0.28..223.28 rows=5000 width=16) (actual time=0.013..0.019 rows=50 loops=1) Planning time: 0.113 ms Execution time: 0.055 ms |
对于第二页,我们可以看到idx_post_created_on必须扫描 100 条记录,因为它需要跳过第一页中包含的前 50 行,以便加载此查询需要返回的接下来的 50 条记录:
SELECT id FROM post ORDER BY created_on DESC OFFSET 50 ROWS FETCH NEXT 50 ROWS ONLY Limit (cost=2.51..4.74 rows=50 width=16) (actual time=0.032..0.044 rows=50 loops=1) -> Index Scan using idx_post_created_on on post p (cost=0.28..223.28 rows=5000 width=16) (actual time=0.022..0.040 rows=100 loops=1) Planning time: 0.198 ms Execution time: 0.071 ms |
我们离第一页越远,为了跳过OFFSET子句指示的记录,idx_post_created_on索引需要扫描的记录就越多:
SELECT id FROM post ORDER BY created_on DESC OFFSET 4950 ROWS FETCH NEXT 50 ROWS ONLY Limit (cost=221.05..223.28 rows=50 width=16) (actual time=1.154..1.166 rows=50 loops=1) -> Index Scan using idx_post_created_on on post p (cost=0.28..223.28 rows=5000 width=16) (actual time=0.079..1.033 rows=5000 loops=1) Planning time: 1.629 ms Execution time: 1.190 ms |
请注意,扫描整个idx_post_created_on索引比扫描单个页面多 20 倍,这是初始 TOP-N 查询的情况。
SQL Seek 方法或键集分页
为了解决 OFFSET 分页中固有的索引扫描问题,我们可以使用 Seek 方法或键集分页 Keyset Pagination 技术。
TOP-N Keyset Pagination 查询如下所示:
SELECT id, created_on FROM post ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY |
请注意,我们需要在 ORDER BY 子句中包含id ,因为created_on列值不是唯一的。因此,加载下一页时我们需要传递最后处理的created_on和id。因此,这里查询投影也需要加载created_on列。
Next-N 查询将使用以前处理created_on和id列值来定位需要加载的记录下一个页面。
SELECT id, created_on FROM post WHERE (created_on, id) < ('2019-10-02 21:00:00.0', 4951) ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY |
(created_on, id) < ('2019-10-02 21:00:00.0', 4951) 行值表达式是等效于:
created_on < '2019-10-02 21:00:00.0' OR ( (created_on = '2019-10-02 21:00:00.0') AND (id < 4951) ) |
SQL Seek 方法或键集分页索引
因为 Seek 方法在子句中同时使用 thecreated_on和 the idcolumns ORDER BY,我们可以在这两列上创建索引idx_post_created_on:
CREATE INDEX idx_post_created_on ON post (created_on DESC, id DESC) |
现在,当执行 TOP-N Keyset Pagination 查询时,我们可以看到它使用了idx_post_created_on索引,并且只扫描了 50 条记录:
SELECT id, created_on FROM post ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY Limit (cost=0.28..1.91 rows=50 width=16) (actual time=0.104..0.110 rows=50 loops=1) -> Index Only Scan using idx_post_created_on_id on post (cost=0.28..163.28 rows=5000 width=16) (actual time=0.102..0.107 rows=50 loops=1) Heap Fetches: 0 Planning Time: 0.882 ms Execution Time: 0.129 ms |
Next-N Keyset Pagination 查询也使用idx_post_created_on索引,并且与 OFFSET分页Pagination 不同,这次只扫描了 50 行:
SELECT id, created_on FROM post WHERE (created_on, id) < ('2019-10-02 21:00:00.0', 4951) ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY Limit (cost=0.28..3.40 rows=50 width=32) (actual time=0.029..0.063 rows=50 loops=1) -> Index Scan using idx_post_created_on_id on post (cost=0.28..308.58 rows=4950 width=32) (actual time=0.027..0.057 rows=50 loops=1) Index Cond: ( created_on <= '2020-04-24 06:00:00'::timestamp without time zone ) Filter: ( ROW(created_on, (id)::numeric) < ROW('2020-04-24 06:00:00'::timestamp without time zone, '4951'::numeric) ) Rows Removed by Filter: 2 Heap Fetches: 52 Planning Time: 0.806 ms Execution Time: 0.158 ms |
而且,加载最后一页也会很快,因为 Keyset分页Pagination 不需要扫描整个索引来跳过 OFFSET 记录:
SELECT id, created_on FROM post WHERE (created_on, id) < ('2019-10-03 02:00:00.0', 51) ORDER BY created_on DESC, id DESC FETCH FIRST 50 ROWS ONLY Limit (cost=48.82..48.83 rows=1 width=16) (actual time=0.168..0.175 rows=50 loops=1) -> Sort (cost=48.82..48.83 rows=1 width=16) (actual time=0.166..0.170 rows=50 loops=1) Sort Key: created_on DESC, id DESC Sort Method: quicksort Memory: 27kB -> Bitmap Heap Scan on post (cost=4.76..48.81 rows=1 width=16) (actual time=0.071..0.085 rows=50 loops=1) Recheck Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone) Filter: ( (created_on < '2019-10-03 02:00:00'::timestamp without time zone) OR ( (created_on = '2019-10-03 02:00:00'::timestamp without time zone) AND (id < '51'::bigint) ) ) Rows Removed by Filter: 2 Heap Blocks: exact=1 -> Bitmap Index Scan on idx_post_created_on_id (cost=0.00..4.75 rows=63 width=0) (actual time=0.061..0.062 rows=52 loops=1) Index Cond: (created_on <= '2019-10-03 02:00:00'::timestamp without time zone) Planning Time: 0.676 ms Execution Time: 0.279 ms |