目录结构
注:提前言明 本文借鉴了以下博主、书籍或网站的内容,其列表如下:
1、参考书籍:《PostgreSQL数据库内核分析》
2、参考书籍:《数据库事务处理的艺术:事务管理与并发控制》
3、PostgreSQL数据库仓库链接,点击前往
4、日本著名PostgreSQL数据库专家 铃木启修 网站主页,点击前往
5、参考书籍:《PostgreSQL中文手册》
6、参考书籍:《PostgreSQL指南:内幕探索》,点击前往
1、本文内容全部来源于开源社区 GitHub和以上博主的贡献,本文也免费开源(可能会存在问题,评论区等待大佬们的指正)
2、本文目的:开源共享 抛砖引玉 一起学习
3、本文不提供任何资源 不存在任何交易 与任何组织和机构无关
4、大家可以根据需要自行 复制粘贴以及作为其他个人用途,但是不允许转载 不允许商用 (写作不易,还请见谅 💖)
5、本文内容基于PostgreSQL master源码开发而成
深入理解PostgreSQL数据库之 处理HAVING子句 的使用和实现
- 文章快速说明索引
- HAVING 子句使用
- HAVING 子句优化

文章快速说明索引
学习目标:
做数据库内核开发久了就会有一种 少年得志,年少轻狂 的错觉,然鹅细细一品觉得自己其实不算特别优秀 远远没有达到自己想要的。也许光鲜的表面掩盖了空洞的内在,每每想到于此,皆有夜半临渊如履薄冰之感。为了睡上几个踏实觉,即日起 暂缓其他基于PostgreSQL数据库的兼容功能开发,近段时间 将着重于学习分享Postgres的基础知识和实践内幕。
学习内容:(详见目录)
1、处理HAVING子句 的使用和实现
学习时间:
2025年02月25日 20:19:52
学习产出:
1、PostgreSQL数据库基础知识回顾 1个
2、CSDN 技术博客 1篇
3、PostgreSQL数据库内核深入学习
注:下面我们所有的学习环境是Centos8+PostgreSQL master+Oracle19C+MySQL8.0
postgres=# select version();
version
---------------------------------------------------------------------------------
PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.1.0, 64-bit
(1 row)
postgres=#
#-----------------------------------------------------------------------------#
SQL> select * from v$version;
BANNER Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
BANNER_FULL Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production Version 19.17.0.0.0
BANNER_LEGACY Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
CON_ID 0
#-----------------------------------------------------------------------------#
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.27 |
+-----------+
1 row in set (0.06 sec)
mysql>
HAVING 子句使用
一、在 PostgreSQL 中,HAVING 子句用于过滤分组后的结果,通常与 GROUP BY 一起使用。它的作用是对分组后的数据进行条件筛选,类似于 WHERE 子句,但 WHERE 是在分组前过滤行,而 HAVING 是在分组后过滤组。使用场景:
- 当你需要对分组后的结果进行条件筛选时,使用 HAVING。
- HAVING 通常与聚合函数(如 COUNT、SUM、AVG 等)一起使用。
postgres=# CREATE TABLE orders (
postgres(# order_id SERIAL PRIMARY KEY,
postgres(# customer_id INT NOT NULL,
postgres(# amount NUMERIC NOT NULL
postgres(# );
CREATE TABLE
postgres=#
postgres=# INSERT INTO orders (customer_id, amount) VALUES
postgres-# (101, 100),
postgres-# (102, 200),
postgres-# (101, 150),
postgres-# (103, 300),
postgres-# (102, 250),
postgres-# (101, 50),
postgres-# (103, 400);
INSERT 0 7
postgres=# table orders;
order_id | customer_id | amount
----------+-------------+--------
1 | 101 | 100
2 | 102 | 200
3 | 101 | 150
4 | 103 | 300
5 | 102 | 250
6 | 101 | 50
7 | 103 | 400
(7 rows)
postgres=#
postgres=# SELECT customer_id, SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# GROUP BY customer_id;
customer_id | total_amount | order_count
-------------+--------------+-------------
101 | 300 | 3
103 | 700 | 2
102 | 450 | 2
(3 rows)
postgres=#
-- 示例 1:筛选总金额大于 300 且 订单数大于 1 的客户
postgres=# SELECT customer_id, SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# GROUP BY customer_id
postgres-# HAVING SUM(amount) > 300 AND COUNT(order_id) > 1;
customer_id | total_amount | order_count
-------------+--------------+-------------
103 | 700 | 2
102 | 450 | 2
(2 rows)
-- 示例 2:筛选总金额大于 300 或 订单数大于 2 的客户
postgres=# SELECT customer_id, SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# GROUP BY customer_id
postgres-# HAVING SUM(amount) > 300 OR COUNT(order_id) > 2;
customer_id | total_amount | order_count
-------------+--------------+-------------
101 | 300 | 3
103 | 700 | 2
102 | 450 | 2
(3 rows)
-- 示例 3:筛选总金额大于 200 且 平均金额小于 250 的客户
postgres=# SELECT customer_id, SUM(amount) AS total_amount, AVG(amount) AS avg_amount
postgres-# FROM orders
postgres-# GROUP BY customer_id
postgres-# HAVING SUM(amount) > 200 AND AVG(amount) < 250;
customer_id | total_amount | avg_amount
-------------+--------------+----------------------
101 | 300 | 100.0000000000000000
102 | 450 | 225.0000000000000000
(2 rows)
postgres=#
二、在 PostgreSQL 中,HAVING 子句通常是和 GROUP BY 一起使用的,用于对分组后的结果进行过滤。但是,HAVING 也可以不搭配 GROUP BY 使用,不过这种情况非常少见,且只有在特定场景下才有意义。
postgres=# SELECT SUM(amount) AS total_amount FROM orders;
total_amount
--------------
1450
(1 row)
postgres=#
-- 查询所有订单的总金额是否大于 1000 如果总金额大于 1000,则返回结果:
postgres=# SELECT SUM(amount) AS total_amount
postgres-# FROM orders
postgres-# HAVING SUM(amount) > 1000;
total_amount
--------------
1450
(1 row)
-- 如果总金额不大于 1000,则不返回任何结果:
postgres=# SELECT SUM(amount) AS total_amount
FROM orders
HAVING SUM(amount) > 10000;
total_amount
--------------
(0 rows)
postgres=#
如果查询中没有使用聚合函数,直接使用 HAVING 会报错。例如:
postgres=# SELECT customer_id
postgres-# FROM orders
postgres-# HAVING customer_id = 101;
2025-02-20 07:05:27.035 PST [246800] ERROR: column "orders.customer_id" must appear in the GROUP BY clause or be used in an aggregate function at character 8
2025-02-20 07:05:27.035 PST [246800] STATEMENT: SELECT customer_id
FROM orders
HAVING customer_id = 101;
ERROR: column "orders.customer_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT customer_id
^
postgres=#
对比 HAVING 和 WHERE:WHERE 用于在分组前过滤行;HAVING 用于在分组后过滤组(或聚合结果):
postgres=# SELECT SUM(amount) AS total_amount, COUNT(order_id) AS order_count FROM orders;
total_amount | order_count
--------------+-------------
1450 | 7
(1 row)
postgres=#
-- 查询所有订单的总金额是否大于 1000,并且订单数大于 5:
postgres=# SELECT SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# HAVING SUM(amount) > 1000 AND COUNT(order_id) > 5;
total_amount | order_count
--------------+-------------
1450 | 7
(1 row)
postgres=#
HAVING 子句优化
[postgres@localhost:~/test/bin]$ cat /home/postgres/zhangshujie_op/1.sql
create table student(sno int primary key, sname varchar(10), ssex int);
create table course(cno int primary key, cname varchar(10), tno int);
create table score(sno int, cno int, degree int);
create table teacher(tno int primary key, tname varchar(10), tsex int);
insert into student values(5, 'zs', 1), (3, 'ls', 0), (2, 'ww', 1), (4, 'zl', 1), (1, 'lq', 0);
insert into course values(1, 'English', 2), (2, 'Math', 5), (3, 'Data', 3), (4, 'Design', 5), (5, 'Phys', 6);
insert into score values(2, 1, 60), (3, 2, 50), (1, 3, 80), (1, 5, 90), (4, 4, 85), (3, 3, 99), (5, 1, 78);
insert into teacher values(1, 'Jim', 1), (2, 'Tom', 0), (3, 'Lucy', 1), (4, 'Dadge', 0), (5, 'Benny', 1);
[postgres@localhost:~/test/bin]$
postgres=# \i /home/postgres/zhangshujie_op/1.sql
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 5
INSERT 0 7
INSERT 0 5
postgres=#
postgres=# \d+ score
Table "public.score"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
sno | integer | | | | plain | | |
cno | integer | | | | plain | | |
degree | integer | | | | plain | | |
Access method: heap
postgres=# table score ;
sno | cno | degree
-----+-----+--------
2 | 1 | 60
3 | 2 | 50
1 | 3 | 80
1 | 5 | 90
4 | 4 | 85
3 | 3 | 99
5 | 1 | 78
(7 rows)
postgres=#
-- sno > 0 条件过滤,中间结果如下:
postgres=# select sum(degree), sno, cno from score where sno > 0 group by sno, cno;
sum | sno | cno
-----+-----+-----
99 | 3 | 3
90 | 1 | 5
80 | 1 | 3
85 | 4 | 4
78 | 5 | 1
50 | 3 | 2
60 | 2 | 1
(7 rows)
-- sno > 0 and cno > 3 条件过滤,中间结果如下:
postgres=# select sum(degree), sno, cno from score where sno > 0 and cno > 3 group by sno, cno;
sum | sno | cno
-----+-----+-----
90 | 1 | 5
85 | 4 | 4
(2 rows)
postgres=#
-- 在下面情况下能够提高查询的效率,因为我们可以在分组和聚合之前减少数据量。
-- 可以假定是 7选1
postgres=# select sum(degree), sno, cno from score where sno > 0 group by sno, cno having sum(degree) >= 90 and cno > 3;
sum | sno | cno
-----+-----+-----
90 | 1 | 5
(1 row)
-- 二选一
postgres=# select sum(degree), sno, cno from score where sno > 0 and cno > 3 group by sno, cno having sum(degree) >= 90;
sum | sno | cno
-----+-----+-----
90 | 1 | 5
(1 row)
-- 两者的执行计划,如下:
postgres=# explain select sum(degree), sno, cno from score where sno > 0 group by sno, cno having sum(degree) >= 90 and cno > 3;
QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=42.30..44.06 rows=47 width=16)
Group Key: sno, cno
Filter: (sum(degree) >= 90)
-> Seq Scan on score (cost=0.00..40.60 rows=227 width=12)
Filter: ((sno > 0) AND (cno > 3))
(5 rows)
postgres=# explain select sum(degree), sno, cno from score where sno > 0 and cno > 3 group by sno, cno having sum(degree) >= 90;
QUERY PLAN
---------------------------------------------------------------
HashAggregate (cost=42.30..44.06 rows=47 width=16)
Group Key: sno, cno
Filter: (sum(degree) >= 90)
-> Seq Scan on score (cost=0.00..40.60 rows=227 width=12)
Filter: ((sno > 0) AND (cno > 3))
(5 rows)
postgres=#
如上,在 Having
子句中,有些约束条件是可以转变为过滤条件的,这里对 Having
子句中的约束条件进行了拆分。从上面的示例可以看出, cno>3
这个约束条件己经变成了 score 表扫描路径上的过滤条件,而SUM(degree) > 90
这个约束条件则保留在了原来的位置。
deepseek的解析,如下:
/*
从如上的示例和查询计划中,可以看出 PostgreSQL 在处理 `HAVING` 子句时,对约束条件进行了优化和拆分。
具体来说,PostgreSQL 会将 `HAVING` 子句中的部分约束条件提前到 `WHERE` 子句或表扫描阶段进行处理,以减少后续聚合操作的数据量。
以下是对该查询计划的分析:
### 查询语句
explain select sum(degree), sno, cno
from score
where sno > 0
group by sno, cno
having sum(degree) >= 90 and cno > 3;
### 查询计划分析
1. **Seq Scan on score**:
- 这是查询的第一步,PostgreSQL 对 `score` 表进行顺序扫描(Seq Scan)。
- 在扫描过程中,PostgreSQL 应用了 `WHERE` 子句中的过滤条件 `(sno > 0) AND (cno > 3)`,这意味着只有满足 `sno > 0` 且 `cno > 3` 的行才会被保留。
- 这一步的目的是减少数据量,避免在后续的聚合操作中处理不必要的数据。
2. **HashAggregate**:
- 在过滤后的数据上,PostgreSQL 使用 `HashAggregate` 进行聚合操作。
- 聚合的键是 `sno` 和 `cno`,即按照这两个字段进行分组。
- 在聚合过程中,PostgreSQL 计算每个组的 `sum(degree)`。
3. **Filter: (sum(degree) > 90)**:
- 在聚合完成后,PostgreSQL 应用 `HAVING` 子句中的条件 `sum(degree) > 90`,过滤掉不满足条件的组。
- 注意,`cno > 3` 这个条件已经在表扫描阶段被处理了,因此在 `HAVING` 子句中不再需要重复处理。
### 约束条件的拆分与优化
- **`cno > 3`**:
- 这个条件是一个简单的过滤条件,可以在表扫描阶段直接应用,因为它不依赖于聚合结果。
- PostgreSQL 将其提前到 `WHERE` 子句中,减少了后续聚合操作的数据量。
- **`sum(degree) > 90`**:
- 这个条件依赖于聚合函数的结果,因此必须在聚合完成后才能应用。
- PostgreSQL 将其保留在 `HAVING` 子句中,作为聚合后的过滤条件。
### 总结
- PostgreSQL 对 `HAVING` 子句中的约束条件进行了优化拆分,将可以在表扫描阶段处理的约束条件(如 `cno > 3`)提前到 `WHERE` 子句中,以减少数据量。
- 依赖于聚合结果的约束条件(如 `sum(degree) > 90`)则保留在 `HAVING` 子句中,作为聚合后的过滤条件。
- 这种优化策略可以提高查询性能,减少不必要的计算和内存消耗。
如果你对这部分源代码的逻辑感兴趣,可以进一步研究 PostgreSQL 的查询优化器(planner)和执行器(executor)的实现细节,特别是与 `HAVING` 子句和聚合操作相关的部分。
*/
这块内容在张树杰书中是3.6章,但是讲解比较简单。接下来我们今天学习一下这块的源码实现:
// src/backend/optimizer/plan/planner.c
/*--------------------
* subquery_planner
* Invokes the planner on a subquery. We recurse to here for each
* sub-SELECT found in the query tree.
* 在子查询上调用规划器。我们对查询树中找到的每个子 SELECT 递归到这里。
*
* glob is the global state for the current planner run.
* parse is the querytree produced by the parser & rewriter.
* parent_root is the immediate parent Query's info (NULL at the top level).
* hasRecursion is true if this is a recursive WITH query.
* tuple_fraction is the fraction of tuples we expect will be retrieved.
* tuple_fraction is interpreted as explained for grouping_planner, below.
* setops is used for set operation subqueries to provide the subquery with
* the context in which it's being used so that Paths correctly sorted for the
* set operation can be generated. NULL when not planning a set operation
* child, or when a child of a set op that isn't interested in sorted input.
*
* glob 是当前规划器运行的全局状态。
* parse 是解析器和重写器生成的查询树。
* parent_root 是直接父查询的信息(顶层为 NULL)。
* 如果这是递归 WITH 查询,则 hasRecursion 为真。
* tuple_fraction 是我们预期将检索到的元组的比例。tuple_fraction 的解释如下 grouping_planner 中所述。
* setops 用于集合操作子查询,为子查询提供其使用上下文,以便可以生成正确排序的集合操作路径。当不计划集合操作子项时,或者当集合操作的子项对排序输入不感兴趣时,为 NULL。
*
* Basically, this routine does the stuff that should only be done once
* per Query object. It then calls grouping_planner. At one time,
* grouping_planner could be invoked recursively on the same Query object;
* that's not currently true, but we keep the separation between the two
* routines anyway, in case we need it again someday.
* 基本上,此例程执行每个查询对象只应执行一次的操作。
* 然后它调用 grouping_planner。
* 曾经,grouping_planner 可以在同一个 Query 对象上递归调用;
* 目前还不是这样,但我们无论如何都会保持这两个例程之间的分离,以防将来有一天我们再次需要它。
*
* subquery_planner will be called recursively to handle sub-Query nodes
* found within the query's expressions and rangetable.
* subquery_planner 将以递归方式调用,以处理在查询的表达式和范围表中找到的子查询节点。
*
* Returns the PlannerInfo struct ("root") that contains all data generated
* while planning the subquery. In particular, the Path(s) attached to
* the (UPPERREL_FINAL, NULL) upperrel represent our conclusions about the
* cheapest way(s) to implement the query. The top level will select the
* best Path and pass it through createplan.c to produce a finished Plan.
*
* 返回 PlannerInfo 结构(“root”),其中包含在规划子查询时生成的所有数据。
* 特别是,附加到 (UPPERREL_FINAL, NULL) upperrel 的路径代表我们关于实现查询的最便宜方法的结论。
* 顶层将选择最佳路径并将其传递给 createplan.c 以生成完成的计划。
*--------------------
*/
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
bool hasRecursion, double tuple_fraction,
SetOperationStmt *setops)
{
...
parse->havingQual = preprocess_expression(root, parse->havingQual,
EXPRKIND_QUAL);
...
/*
* In some cases we may want to transfer a HAVING clause into WHERE. We
* cannot do so if the HAVING clause contains aggregates (obviously) or
* volatile functions (since a HAVING clause is supposed to be executed
* only once per group). We also can't do this if there are any nonempty
* grouping sets and the clause references any columns that are nullable
* by the grouping sets; moving such a clause into WHERE would potentially
* change the results. (If there are only empty grouping sets, then the
* HAVING clause must be degenerate as discussed below.)
* 在某些情况下,我们可能希望将 HAVING 子句转移到 WHERE 中。
* 如果 HAVING 子句包含聚合(显然)或易失性函数(因为 HAVING 子句应该每个组只执行一次),我们就不能这样做。
* 如果存在任何非空分组集并且子句引用分组集可空的任何列,我们也不能这样做;将这样的子句移入 WHERE 可能会改变结果。
* (如果只有空分组集,则 HAVING 子句必须退化,如下所述。)
*
* Also, it may be that the clause is so expensive to execute that we're
* better off doing it only once per group, despite the loss of
* selectivity. This is hard to estimate short of doing the entire
* planning process twice, so we use a heuristic: clauses containing
* subplans are left in HAVING. Otherwise, we move or copy the HAVING
* clause into WHERE, in hopes of eliminating tuples before aggregation
* instead of after.
* 此外,该子句的执行成本可能非常高,因此尽管选择性会下降,我们还是最好只对每个组执行一次。
* 除非将整个规划过程执行两次,否则很难估计,因此我们使用启发式方法:包含子计划的子句留在 HAVING 中。
* 否则,我们将 HAVING 子句移动或复制到 WHERE 中,希望在聚合之前而不是之后消除元组。
*
* If the query has explicit grouping then we can simply move such a
* clause into WHERE; any group that fails the clause will not be in the
* output because none of its tuples will reach the grouping or
* aggregation stage. Otherwise we must have a degenerate (variable-free)
* HAVING clause, which we put in WHERE so that query_planner() can use it
* in a gating Result node, but also keep in HAVING to ensure that we
* don't emit a bogus aggregated row. (This could be done better, but it
* seems not worth optimizing.)
* 如果查询具有显式分组,那么我们可以简单地将这样的子句移动到 WHERE 中;
* 任何不符合该子句的组都不会出现在输出中,因为它的任何元组都不会进入分组或聚合阶段。
* 否则,我们必须有一个退化的(无变量的)HAVING 子句,我们将其放入 WHERE 中,以便 query_planner() 可以在门控结果节点中使用它,但也保留在 HAVING 中以确保我们不会发出虚假的聚合行。
* (这可以做得更好,但似乎不值得优化。)
*
* Note that a HAVING clause may contain expressions that are not fully
* preprocessed. This can happen if these expressions are part of
* grouping items. In such cases, they are replaced with GROUP Vars in
* the parser and then replaced back after we've done with expression
* preprocessing on havingQual. This is not an issue if the clause
* remains in HAVING, because these expressions will be matched to lower
* target items in setrefs.c. However, if the clause is moved or copied
* into WHERE, we need to ensure that these expressions are fully
* preprocessed.
* 请注意,HAVING 子句可能包含未完全预处理的表达式。
* 如果这些表达式是分组项的一部分,则可能会发生这种情况。
* 在这种情况下,它们将在解析器中被 GROUP Vars 替换,然后在我们完成对 havingQual 的表达式预处理后再替换回来。
* 如果子句保留在 HAVING 中,这不是问题,因为这些表达式将与 setrefs.c 中的较低目标项匹配。
* 但是,如果子句被移动或复制到 WHERE,我们需要确保这些表达式经过完全预处理。
*
* Note that both havingQual and parse->jointree->quals are in
* implicitly-ANDed-list form at this point, even though they are declared
* as Node *.
* 请注意,此时 havingQual 和 parse->jointree->quals 都采用隐式 ANDed 列表形式,即使它们被声明为 Node *。
*/
newHaving = NIL;
foreach(l, (List *) parse->havingQual)
{
Node *havingclause = (Node *) lfirst(l);
if (contain_agg_clause(havingclause) ||
contain_volatile_functions(havingclause) ||
contain_subplans(havingclause) ||
(parse->groupClause && parse->groupingSets &&
bms_is_member(root->group_rtindex, pull_varnos(root, havingclause))))
{
/* keep it in HAVING */
newHaving = lappend(newHaving, havingclause);
}
else if (parse->groupClause)
{
Node *whereclause;
/* Preprocess the HAVING clause fully */
whereclause = preprocess_expression(root, havingclause,
EXPRKIND_QUAL);
/* ... and move it to WHERE */
parse->jointree->quals = (Node *)
list_concat((List *) parse->jointree->quals,
(List *) whereclause);
}
else
{
Node *whereclause;
/* Preprocess the HAVING clause fully */
whereclause = preprocess_expression(root, copyObject(havingclause),
EXPRKIND_QUAL);
/* ... and put a copy in WHERE */
parse->jointree->quals = (Node *)
list_concat((List *) parse->jointree->quals,
(List *) whereclause);
/* ... and also keep it in HAVING */
newHaving = lappend(newHaving, havingclause);
}
}
parse->havingQual = (Node *) newHaving;
...
}
在开始之前,先看一下相关的语法:
// src/backend/parser/gram.y
having_clause:
HAVING a_expr { $$ = $2; }
| /*EMPTY*/ { $$ = NULL; }
;
下面开始调试,如下:
subquery_planner(PlannerGlobal * glob, Query * parse, PlannerInfo * parent_root, _Bool hasRecursion, double tuple_fraction, SetOperationStmt * setops)
standard_planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams)
planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams)
pg_plan_query(Query * querytree, const char * query_string, int cursorOptions, ParamListInfo boundParams)
standard_ExplainOneQuery(Query * query, int cursorOptions, IntoClause * into, ExplainState * es, const char * queryString, ParamListInfo params, QueryEnvironment * queryEnv)
ExplainOneQuery(Query * query, int cursorOptions, IntoClause * into, ExplainState * es, ParseState * pstate, ParamListInfo params)
ExplainQuery(ParseState * pstate, ExplainStmt * stmt, ParamListInfo params, DestReceiver * dest)
standard_ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc)
ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc)
PortalRunUtility(Portal portal, PlannedStmt * pstmt, _Bool isTopLevel, _Bool setHoldSnapshot, DestReceiver * dest, QueryCompletion * qc)
FillPortalStore(Portal portal, _Bool isTopLevel)
PortalRun(Portal portal, long count, _Bool isTopLevel, DestReceiver * dest, DestReceiver * altdest, QueryCompletion * qc)
exec_simple_query(const char * query_string)
...
第一步:预处理
首先在如下的预处理中处理 AND 子句的参数,如下:
simplify_and_arguments(List * args, eval_const_expressions_context * context, _Bool * haveNull, _Bool * forceFalse)
eval_const_expressions_mutator(Node * node, eval_const_expressions_context * context)
eval_const_expressions(PlannerInfo * root, Node * node)
preprocess_expression(PlannerInfo * root, Node * expr, int kind)
subquery_planner(PlannerGlobal * glob, Query * parse, PlannerInfo * parent_root, _Bool hasRecursion, double tuple_fraction, SetOperationStmt * setops)
standard_planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams)
planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams)
...
于是新的expr就是一个只有两个参数的AND_EXPR
!在预处理的最后,需要额外的转换 如下:
...
/*
* If it's a qual or havingQual, convert it to implicit-AND format. (We
* don't want to do this before eval_const_expressions, since the latter
* would be unable to simplify a top-level AND correctly. Also,
* SS_process_sublinks expects explicit-AND format.)
*
* 如果是 qual 或 havingQual,则将其转换为隐式 AND 格式。
* (我们不想在 eval_const_expressions 之前执行此操作,因为后者无法正确简化顶级 AND。
* 此外,SS_process_sublinks 需要显式 AND 格式。)
*/
if (kind == EXPRKIND_QUAL)
expr = (Node *) make_ands_implicit((Expr *) expr);
return expr;
...
如上,也就是说在对parse->havingQual
进行预处理之后,原来的布尔表达式直接简化成了参数list!
第二步:开始处理having子句
如上,SUM(degree) >= 90
被继续留在了having子句中!因为这种情况下:当 HAVING 包含聚合函数、易变函数(volatile functions)或子查询等情况时,我们无法直接将其移到 WHERE,因为这样会改变查询的语义。
/*
包含聚合函数:如果 HAVING 子句中的条件包含聚合函数(比如 COUNT()、SUM() 等),则无法将其转移到 WHERE,因为聚合操作是在分组后的数据上执行的,WHERE 只能操作原始行。
包含易变函数(volatile functions):易变函数是指每次调用可能产生不同结果的函数(例如 random()),这些函数不能在 WHERE 中使用,因为它们的行为可能会随着查询的执行过程而变化,导致不正确的结果。
包含子查询(subplans):如果条件中包含子查询(subplans),就无法将该条件转移到 WHERE,因为子查询可能依赖于 HAVING 的分组信息。
涉及分组集的列且列为空:如果查询使用了分组集(groupingSets),且条件引用了这些分组集中的空值列(nullable columns),将条件从 HAVING 移到 WHERE 可能会改变查询的结果。分组集的列可能会在某些情况下为空,因此需要小心处理。
*/
如上,cno > 3
就可以被移动到where条件parse->jointree->quals
中!因为该 HAVING 子句没有包含上面提到的复杂操作,且存在显式的分组(groupClause),那么可以安全地将其移到 WHERE 子句中。
/*
有显式分组的查询(移到 WHERE 中): 如果查询存在显式分组,则可以将 HAVING 子句的条件移到 WHERE 中。
通过这种方式,可以在数据聚合前就进行过滤,从而减少处理的数据量。
对于每个条件,先使用 preprocess_expression 进行完全预处理,然后将条件移到 WHERE 子句。
*/
下面看一下这两个SQL,如下:
postgres=# \d+ test_having
Table "public.test_having"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------------+-----------+----------+---------+----------+-------------+--------------+-------------
a | integer | | | | plain | | |
b | integer | | | | plain | | |
c | character(8) | | | | extended | | |
d | character(1) | | | | extended | | |
Access method: heap
postgres=# table test_having;
a | b | c | d
---+---+----------+---
0 | 1 | XXXX | A
1 | 2 | AAAA | b
2 | 2 | AAAA | c
3 | 3 | BBBB | D
4 | 3 | BBBB | e
5 | 3 | bbbb | F
6 | 4 | cccc | g
7 | 4 | cccc | h
8 | 4 | CCCC | I
9 | 4 | CCCC | j
(10 rows)
postgres=#
postgres=# explain SELECT 1 AS one FROM test_having HAVING 1 < 2; ## sql 1
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
(1 row)
postgres=# explain SELECT 1 AS one FROM test_having HAVING 1 > 2; ## sql 2
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=4)
One-Time Filter: false
(2 rows)
postgres=#
如上单个表达式1 < 2
在make_ands_implicit
最后被处理成NIL list
。
同样 类似永真的例子 如下:
在simplify_and_arguments
简化之后,newargs
为空 于是就返回一个常量真,如下:
如上的AND_EXPR
表达式是永真的,首先被简化成一个常量真 最后同样被make_ands_implicit
最后处理成NIL list
。也就是parse->havingQual
经过预处理之后为NIL
。
下面看一下sql2的调试,如下:
如上单个表达式1 > 2
永假,无法被消除。于是parse->havingQual
就是一个含有永假的list1,后续处理 如下:
非上面两种情况,于是就既将 HAVING 条件移到 WHERE 也保留在 HAVING!