PostgreSQL的学习心得和知识总结（一百七十）|深入理解PostgreSQL数据库之处理HAVING子句的使用和实现

目录结构

注：提前言明本文借鉴了以下博主、书籍或网站的内容，其列表如下：

1、参考书籍：《PostgreSQL数据库内核分析》
2、参考书籍：《数据库事务处理的艺术：事务管理与并发控制》
3、PostgreSQL数据库仓库链接，点击前往
4、日本著名PostgreSQL数据库专家铃木启修网站主页，点击前往
5、参考书籍：《PostgreSQL中文手册》
6、参考书籍：《PostgreSQL指南：内幕探索》，点击前往

1、本文内容全部来源于开源社区 GitHub和以上博主的贡献，本文也免费开源（可能会存在问题，评论区等待大佬们的指正）
2、本文目的：开源共享抛砖引玉一起学习
3、本文不提供任何资源不存在任何交易与任何组织和机构无关
4、大家可以根据需要自行复制粘贴以及作为其他个人用途，但是不允许转载不允许商用（写作不易，还请见谅 💖）
5、本文内容基于PostgreSQL master源码开发而成

深入理解PostgreSQL数据库之处理HAVING子句的使用和实现

文章快速说明索引
HAVING 子句使用
HAVING 子句优化

文章快速说明索引

学习目标：

做数据库内核开发久了就会有一种少年得志，年少轻狂的错觉，然鹅细细一品觉得自己其实不算特别优秀远远没有达到自己想要的。也许光鲜的表面掩盖了空洞的内在，每每想到于此，皆有夜半临渊如履薄冰之感。为了睡上几个踏实觉，即日起暂缓其他基于PostgreSQL数据库的兼容功能开发，近段时间将着重于学习分享Postgres的基础知识和实践内幕。

学习内容：（详见目录）

1、处理HAVING子句的使用和实现

学习时间：

2025年02月25日 20:19:52

学习产出：

1、PostgreSQL数据库基础知识回顾 1个
2、CSDN 技术博客 1篇
3、PostgreSQL数据库内核深入学习

注：下面我们所有的学习环境是Centos8+PostgreSQL master+Oracle19C+MySQL8.0

postgres=# select version();
                                     version                                     
---------------------------------------------------------------------------------
 PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.1.0, 64-bit
(1 row)

postgres=#

#-----------------------------------------------------------------------------#

SQL> select * from v$version;          

BANNER        Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production	
BANNER_FULL	  Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production Version 19.17.0.0.0	
BANNER_LEGACY Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production	
CON_ID 0


#-----------------------------------------------------------------------------#

mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.27    |
+-----------+
1 row in set (0.06 sec)

mysql>

HAVING 子句使用

一、在 PostgreSQL 中，HAVING 子句用于过滤分组后的结果，通常与 GROUP BY 一起使用。它的作用是对分组后的数据进行条件筛选，类似于 WHERE 子句，但 WHERE 是在分组前过滤行，而 HAVING 是在分组后过滤组。使用场景：

当你需要对分组后的结果进行条件筛选时，使用 HAVING。
HAVING 通常与聚合函数（如 COUNT、SUM、AVG 等）一起使用。

postgres=# CREATE TABLE orders (
postgres(#     order_id SERIAL PRIMARY KEY,
postgres(#     customer_id INT NOT NULL,
postgres(#     amount NUMERIC NOT NULL
postgres(# );
CREATE TABLE
postgres=# 
postgres=# INSERT INTO orders (customer_id, amount) VALUES
postgres-# (101, 100),
postgres-# (102, 200),
postgres-# (101, 150),
postgres-# (103, 300),
postgres-# (102, 250),
postgres-# (101, 50),
postgres-# (103, 400);
INSERT 0 7
postgres=# table orders;
 order_id | customer_id | amount 
----------+-------------+--------
        1 |         101 |    100
        2 |         102 |    200
        3 |         101 |    150
        4 |         103 |    300
        5 |         102 |    250
        6 |         101 |     50
        7 |         103 |    400
(7 rows)

postgres=#
postgres=# SELECT customer_id, SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# GROUP BY customer_id;
 customer_id | total_amount | order_count 
-------------+--------------+-------------
         101 |          300 |           3
         103 |          700 |           2
         102 |          450 |           2
(3 rows)

postgres=#

-- 示例 1：筛选总金额大于 300 且 订单数大于 1 的客户
postgres=# SELECT customer_id, SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# GROUP BY customer_id
postgres-# HAVING SUM(amount) > 300 AND COUNT(order_id) > 1;
 customer_id | total_amount | order_count 
-------------+--------------+-------------
         103 |          700 |           2
         102 |          450 |           2
(2 rows)

-- 示例 2：筛选总金额大于 300 或 订单数大于 2 的客户
postgres=# SELECT customer_id, SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# GROUP BY customer_id
postgres-# HAVING SUM(amount) > 300 OR COUNT(order_id) > 2;
 customer_id | total_amount | order_count 
-------------+--------------+-------------
         101 |          300 |           3
         103 |          700 |           2
         102 |          450 |           2
(3 rows)

-- 示例 3：筛选总金额大于 200 且 平均金额小于 250 的客户
postgres=# SELECT customer_id, SUM(amount) AS total_amount, AVG(amount) AS avg_amount
postgres-# FROM orders
postgres-# GROUP BY customer_id
postgres-# HAVING SUM(amount) > 200 AND AVG(amount) < 250;
 customer_id | total_amount |      avg_amount      
-------------+--------------+----------------------
         101 |          300 | 100.0000000000000000
         102 |          450 | 225.0000000000000000
(2 rows)

postgres=#

二、在 PostgreSQL 中，HAVING 子句通常是和 GROUP BY 一起使用的，用于对分组后的结果进行过滤。但是，HAVING 也可以不搭配 GROUP BY 使用，不过这种情况非常少见，且只有在特定场景下才有意义。

postgres=# SELECT SUM(amount) AS total_amount FROM orders;
 total_amount 
--------------
         1450
(1 row)

postgres=#

-- 查询所有订单的总金额是否大于 1000 如果总金额大于 1000，则返回结果：
postgres=# SELECT SUM(amount) AS total_amount
postgres-# FROM orders
postgres-# HAVING SUM(amount) > 1000;
 total_amount 
--------------
         1450
(1 row)

-- 如果总金额不大于 1000，则不返回任何结果：
postgres=# SELECT SUM(amount) AS total_amount
FROM orders
HAVING SUM(amount) > 10000;
 total_amount 
--------------
(0 rows)

postgres=#

如果查询中没有使用聚合函数，直接使用 HAVING 会报错。例如：

postgres=# SELECT customer_id
postgres-# FROM orders
postgres-# HAVING customer_id = 101;
2025-02-20 07:05:27.035 PST [246800] ERROR:  column "orders.customer_id" must appear in the GROUP BY clause or be used in an aggregate function at character 8
2025-02-20 07:05:27.035 PST [246800] STATEMENT:  SELECT customer_id
        FROM orders
        HAVING customer_id = 101;
ERROR:  column "orders.customer_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT customer_id
               ^
postgres=#

对比 HAVING 和 WHERE：WHERE 用于在分组前过滤行；HAVING 用于在分组后过滤组（或聚合结果）：

postgres=# SELECT SUM(amount) AS total_amount, COUNT(order_id) AS order_count FROM orders;
 total_amount | order_count 
--------------+-------------
         1450 |           7
(1 row)

postgres=#

-- 查询所有订单的总金额是否大于 1000，并且订单数大于 5：
postgres=# SELECT SUM(amount) AS total_amount, COUNT(order_id) AS order_count
postgres-# FROM orders
postgres-# HAVING SUM(amount) > 1000 AND COUNT(order_id) > 5;
 total_amount | order_count 
--------------+-------------
         1450 |           7
(1 row)

postgres=#

HAVING 子句优化

[postgres@localhost:~/test/bin]$ cat /home/postgres/zhangshujie_op/1.sql
create table student(sno int primary key, sname varchar(10), ssex int);
create table course(cno int primary key, cname varchar(10), tno int);
create table score(sno int, cno int, degree int);
create table teacher(tno int primary key, tname varchar(10), tsex int);

insert into student values(5, 'zs', 1), (3, 'ls', 0), (2, 'ww', 1), (4, 'zl', 1), (1, 'lq', 0);
insert into course values(1, 'English', 2), (2, 'Math', 5), (3, 'Data', 3), (4, 'Design', 5), (5, 'Phys', 6);
insert into score values(2, 1, 60), (3, 2, 50), (1, 3, 80), (1, 5, 90), (4, 4, 85), (3, 3, 99), (5, 1, 78);
insert into teacher values(1, 'Jim', 1), (2, 'Tom', 0), (3, 'Lucy', 1), (4, 'Dadge', 0), (5, 'Benny', 1);
[postgres@localhost:~/test/bin]$

postgres=# \i /home/postgres/zhangshujie_op/1.sql 
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 5
INSERT 0 7
INSERT 0 5
postgres=# 
postgres=# \d+ score 
                                          Table "public.score"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description 
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 sno    | integer |           |          |         | plain   |             |              | 
 cno    | integer |           |          |         | plain   |             |              | 
 degree | integer |           |          |         | plain   |             |              | 
Access method: heap

postgres=# table score ;
 sno | cno | degree 
-----+-----+--------
   2 |   1 |     60
   3 |   2 |     50
   1 |   3 |     80
   1 |   5 |     90
   4 |   4 |     85
   3 |   3 |     99
   5 |   1 |     78
(7 rows)

postgres=#

-- sno > 0 条件过滤，中间结果如下：
postgres=# select sum(degree), sno, cno from score where sno > 0 group by sno, cno;
 sum | sno | cno 
-----+-----+-----
  99 |   3 |   3
  90 |   1 |   5
  80 |   1 |   3
  85 |   4 |   4
  78 |   5 |   1
  50 |   3 |   2
  60 |   2 |   1
(7 rows)

-- sno > 0 and cno > 3 条件过滤，中间结果如下：
postgres=# select sum(degree), sno, cno from score where sno > 0 and cno > 3 group by sno, cno;
 sum | sno | cno 
-----+-----+-----
  90 |   1 |   5
  85 |   4 |   4
(2 rows)

postgres=#

-- 在下面情况下能够提高查询的效率，因为我们可以在分组和聚合之前减少数据量。

-- 可以假定是 7选1
postgres=# select sum(degree), sno, cno from score where sno > 0 group by sno, cno having sum(degree) >= 90 and cno > 3;
 sum | sno | cno 
-----+-----+-----
  90 |   1 |   5
(1 row)

-- 二选一
postgres=# select sum(degree), sno, cno from score where sno > 0 and cno > 3 group by sno, cno having sum(degree) >= 90;
 sum | sno | cno 
-----+-----+-----
  90 |   1 |   5
(1 row)

-- 两者的执行计划，如下：
postgres=# explain select sum(degree), sno, cno from score where sno > 0 group by sno, cno having sum(degree) >= 90 and cno > 3;
                          QUERY PLAN                           
---------------------------------------------------------------
 HashAggregate  (cost=42.30..44.06 rows=47 width=16)
   Group Key: sno, cno
   Filter: (sum(degree) >= 90)
   ->  Seq Scan on score  (cost=0.00..40.60 rows=227 width=12)
         Filter: ((sno > 0) AND (cno > 3))
(5 rows)

postgres=# explain select sum(degree), sno, cno from score where sno > 0 and cno > 3 group by sno, cno having sum(degree) >= 90;
                          QUERY PLAN                           
---------------------------------------------------------------
 HashAggregate  (cost=42.30..44.06 rows=47 width=16)
   Group Key: sno, cno
   Filter: (sum(degree) >= 90)
   ->  Seq Scan on score  (cost=0.00..40.60 rows=227 width=12)
         Filter: ((sno > 0) AND (cno > 3))
(5 rows)

postgres=#

如上，在 Having 子句中，有些约束条件是可以转变为过滤条件的，这里对 Having 子句中的约束条件进行了拆分。从上面的示例可以看出， cno＞3 这个约束条件己经变成了 score 表扫描路径上的过滤条件，而SUM(degree) > 90这个约束条件则保留在了原来的位置。

deepseek的解析，如下：

/*
从如上的示例和查询计划中，可以看出 PostgreSQL 在处理 `HAVING` 子句时，对约束条件进行了优化和拆分。
具体来说，PostgreSQL 会将 `HAVING` 子句中的部分约束条件提前到 `WHERE` 子句或表扫描阶段进行处理，以减少后续聚合操作的数据量。
以下是对该查询计划的分析：

### 查询语句

explain select sum(degree), sno, cno 
from score 
where sno > 0 
group by sno, cno 
having sum(degree) >= 90 and cno > 3;

### 查询计划分析
1. **Seq Scan on score**：
   - 这是查询的第一步，PostgreSQL 对 `score` 表进行顺序扫描（Seq Scan）。
   - 在扫描过程中，PostgreSQL 应用了 `WHERE` 子句中的过滤条件 `(sno > 0) AND (cno > 3)`，这意味着只有满足 `sno > 0` 且 `cno > 3` 的行才会被保留。
   - 这一步的目的是减少数据量，避免在后续的聚合操作中处理不必要的数据。

2. **HashAggregate**：
   - 在过滤后的数据上，PostgreSQL 使用 `HashAggregate` 进行聚合操作。
   - 聚合的键是 `sno` 和 `cno`，即按照这两个字段进行分组。
   - 在聚合过程中，PostgreSQL 计算每个组的 `sum(degree)`。

3. **Filter: (sum(degree) > 90)**：
   - 在聚合完成后，PostgreSQL 应用 `HAVING` 子句中的条件 `sum(degree) > 90`，过滤掉不满足条件的组。
   - 注意，`cno > 3` 这个条件已经在表扫描阶段被处理了，因此在 `HAVING` 子句中不再需要重复处理。

### 约束条件的拆分与优化
- **`cno > 3`**：
  - 这个条件是一个简单的过滤条件，可以在表扫描阶段直接应用，因为它不依赖于聚合结果。
  - PostgreSQL 将其提前到 `WHERE` 子句中，减少了后续聚合操作的数据量。

- **`sum(degree) > 90`**：
  - 这个条件依赖于聚合函数的结果，因此必须在聚合完成后才能应用。
  - PostgreSQL 将其保留在 `HAVING` 子句中，作为聚合后的过滤条件。

### 总结
- PostgreSQL 对 `HAVING` 子句中的约束条件进行了优化拆分，将可以在表扫描阶段处理的约束条件（如 `cno > 3`）提前到 `WHERE` 子句中，以减少数据量。
- 依赖于聚合结果的约束条件（如 `sum(degree) > 90`）则保留在 `HAVING` 子句中，作为聚合后的过滤条件。
- 这种优化策略可以提高查询性能，减少不必要的计算和内存消耗。

如果你对这部分源代码的逻辑感兴趣，可以进一步研究 PostgreSQL 的查询优化器（planner）和执行器（executor）的实现细节，特别是与 `HAVING` 子句和聚合操作相关的部分。
*/

这块内容在张树杰书中是3.6章，但是讲解比较简单。接下来我们今天学习一下这块的源码实现：

// src/backend/optimizer/plan/planner.c

/*--------------------
 * subquery_planner
 *	  Invokes the planner on a subquery.  We recurse to here for each
 *	  sub-SELECT found in the query tree.
 *	  在子查询上调用规划器。我们对查询树中找到的每个子 SELECT 递归到这里。
 *
 * glob is the global state for the current planner run.
 * parse is the querytree produced by the parser & rewriter.
 * parent_root is the immediate parent Query's info (NULL at the top level).
 * hasRecursion is true if this is a recursive WITH query.
 * tuple_fraction is the fraction of tuples we expect will be retrieved.
 * tuple_fraction is interpreted as explained for grouping_planner, below.
 * setops is used for set operation subqueries to provide the subquery with
 * the context in which it's being used so that Paths correctly sorted for the
 * set operation can be generated.  NULL when not planning a set operation
 * child, or when a child of a set op that isn't interested in sorted input.
 * 
 * glob 是当前规划器运行的全局状态。
 * parse 是解析器和重写器生成的查询树。
 * parent_root 是直接父查询的信息（顶层为 NULL）。
 * 如果这是递归 WITH 查询，则 hasRecursion 为真。
 * tuple_fraction 是我们预期将检索到的元组的比例。tuple_fraction 的解释如下 grouping_planner 中所述。
 * setops 用于集合操作子查询，为子查询提供其使用上下文，以便可以生成正确排序的集合操作路径。当不计划集合操作子项时，或者当集合操作的子项对排序输入不感兴趣时，为 NULL。
 *
 * Basically, this routine does the stuff that should only be done once
 * per Query object.  It then calls grouping_planner.  At one time,
 * grouping_planner could be invoked recursively on the same Query object;
 * that's not currently true, but we keep the separation between the two
 * routines anyway, in case we need it again someday.
 * 基本上，此例程执行每个查询对象只应执行一次的操作。
 * 然后它调用 grouping_planner。
 * 曾经，grouping_planner 可以在同一个 Query 对象上递归调用；
 * 目前还不是这样，但我们无论如何都会保持这两个例程之间的分离，以防将来有一天我们再次需要它。
 *
 * subquery_planner will be called recursively to handle sub-Query nodes
 * found within the query's expressions and rangetable.
 * subquery_planner 将以递归方式调用，以处理在查询的表达式和范围表中找到的子查询节点。
 *
 * Returns the PlannerInfo struct ("root") that contains all data generated
 * while planning the subquery.  In particular, the Path(s) attached to
 * the (UPPERREL_FINAL, NULL) upperrel represent our conclusions about the
 * cheapest way(s) to implement the query.  The top level will select the
 * best Path and pass it through createplan.c to produce a finished Plan.
 * 
 * 返回 PlannerInfo 结构（“root”），其中包含在规划子查询时生成的所有数据。
 * 特别是，附加到 (UPPERREL_FINAL, NULL) upperrel 的路径代表我们关于实现查询的最便宜方法的结论。
 * 顶层将选择最佳路径并将其传递给 createplan.c 以生成完成的计划。
 *--------------------
 */
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse, PlannerInfo *parent_root,
				 bool hasRecursion, double tuple_fraction,
				 SetOperationStmt *setops)
{
	...
	parse->havingQual = preprocess_expression(root, parse->havingQual,
										 	  EXPRKIND_QUAL);
	...
	/*
	 * In some cases we may want to transfer a HAVING clause into WHERE. We
	 * cannot do so if the HAVING clause contains aggregates (obviously) or
	 * volatile functions (since a HAVING clause is supposed to be executed
	 * only once per group).  We also can't do this if there are any nonempty
	 * grouping sets and the clause references any columns that are nullable
	 * by the grouping sets; moving such a clause into WHERE would potentially
	 * change the results.  (If there are only empty grouping sets, then the
	 * HAVING clause must be degenerate as discussed below.)
	 * 在某些情况下，我们可能希望将 HAVING 子句转移到 WHERE 中。
	 * 如果 HAVING 子句包含聚合（显然）或易失性函数（因为 HAVING 子句应该每个组只执行一次），我们就不能这样做。
	 * 如果存在任何非空分组集并且子句引用分组集可空的任何列，我们也不能这样做；将这样的子句移入 WHERE 可能会改变结果。
	 * （如果只有空分组集，则 HAVING 子句必须退化，如下所述。）
	 *
	 * Also, it may be that the clause is so expensive to execute that we're
	 * better off doing it only once per group, despite the loss of
	 * selectivity.  This is hard to estimate short of doing the entire
	 * planning process twice, so we use a heuristic: clauses containing
	 * subplans are left in HAVING.  Otherwise, we move or copy the HAVING
	 * clause into WHERE, in hopes of eliminating tuples before aggregation
	 * instead of after.
	 * 此外，该子句的执行成本可能非常高，因此尽管选择性会下降，我们还是最好只对每个组执行一次。
	 * 除非将整个规划过程执行两次，否则很难估计，因此我们使用启发式方法：包含子计划的子句留在 HAVING 中。
	 * 否则，我们将 HAVING 子句移动或复制到 WHERE 中，希望在聚合之前而不是之后消除元组。
	 *
	 * If the query has explicit grouping then we can simply move such a
	 * clause into WHERE; any group that fails the clause will not be in the
	 * output because none of its tuples will reach the grouping or
	 * aggregation stage.  Otherwise we must have a degenerate (variable-free)
	 * HAVING clause, which we put in WHERE so that query_planner() can use it
	 * in a gating Result node, but also keep in HAVING to ensure that we
	 * don't emit a bogus aggregated row. (This could be done better, but it
	 * seems not worth optimizing.)
	 * 如果查询具有显式分组，那么我们可以简单地将这样的子句移动到 WHERE 中；
	 * 任何不符合该子句的组都不会出现在输出中，因为它的任何元组都不会进入分组或聚合阶段。
	 * 否则，我们必须有一个退化的（无变量的）HAVING 子句，我们将其放入 WHERE 中，以便 query_planner() 可以在门控结果节点中使用它，但也保留在 HAVING 中以确保我们不会发出虚假的聚合行。 
	 * （这可以做得更好，但似乎不值得优化。）
	 *
	 * Note that a HAVING clause may contain expressions that are not fully
	 * preprocessed.  This can happen if these expressions are part of
	 * grouping items.  In such cases, they are replaced with GROUP Vars in
	 * the parser and then replaced back after we've done with expression
	 * preprocessing on havingQual.  This is not an issue if the clause
	 * remains in HAVING, because these expressions will be matched to lower
	 * target items in setrefs.c.  However, if the clause is moved or copied
	 * into WHERE, we need to ensure that these expressions are fully
	 * preprocessed.
	 * 请注意，HAVING 子句可能包含未完全预处理的表达式。
	 * 如果这些表达式是分组项的一部分，则可能会发生这种情况。
	 * 在这种情况下，它们将在解析器中被 GROUP Vars 替换，然后在我们完成对 havingQual 的表达式预处理后再替换回来。
	 * 如果子句保留在 HAVING 中，这不是问题，因为这些表达式将与 setrefs.c 中的较低目标项匹配。
	 * 但是，如果子句被移动或复制到 WHERE，我们需要确保这些表达式经过完全预处理。
	 *
	 * Note that both havingQual and parse->jointree->quals are in
	 * implicitly-ANDed-list form at this point, even though they are declared
	 * as Node *.
	 * 请注意，此时 havingQual 和 parse->jointree->quals 都采用隐式 ANDed 列表形式，即使它们被声明为 Node *。
	 */
	newHaving = NIL;
	foreach(l, (List *) parse->havingQual)
	{
		Node	   *havingclause = (Node *) lfirst(l);

		if (contain_agg_clause(havingclause) ||
			contain_volatile_functions(havingclause) ||
			contain_subplans(havingclause) ||
			(parse->groupClause && parse->groupingSets &&
			 bms_is_member(root->group_rtindex, pull_varnos(root, havingclause))))
		{
			/* keep it in HAVING */
			newHaving = lappend(newHaving, havingclause);
		}
		else if (parse->groupClause)
		{
			Node	   *whereclause;

			/* Preprocess the HAVING clause fully */
			whereclause = preprocess_expression(root, havingclause,
												EXPRKIND_QUAL);
			/* ... and move it to WHERE */
			parse->jointree->quals = (Node *)
				list_concat((List *) parse->jointree->quals,
							(List *) whereclause);
		}
		else
		{
			Node	   *whereclause;

			/* Preprocess the HAVING clause fully */
			whereclause = preprocess_expression(root, copyObject(havingclause),
												EXPRKIND_QUAL);
			/* ... and put a copy in WHERE */
			parse->jointree->quals = (Node *)
				list_concat((List *) parse->jointree->quals,
							(List *) whereclause);
			/* ... and also keep it in HAVING */
			newHaving = lappend(newHaving, havingclause);
		}
	}
	parse->havingQual = (Node *) newHaving;
	...
}

在开始之前，先看一下相关的语法：

// src/backend/parser/gram.y

having_clause:
			HAVING a_expr							{ $$ = $2; }
			| /*EMPTY*/								{ $$ = NULL; }
		;

下面开始调试，如下：

subquery_planner(PlannerGlobal * glob, Query * parse, PlannerInfo * parent_root, _Bool hasRecursion, double tuple_fraction, SetOperationStmt * setops) 
standard_planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams)
planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams) 
pg_plan_query(Query * querytree, const char * query_string, int cursorOptions, ParamListInfo boundParams)
standard_ExplainOneQuery(Query * query, int cursorOptions, IntoClause * into, ExplainState * es, const char * queryString, ParamListInfo params, QueryEnvironment * queryEnv)
ExplainOneQuery(Query * query, int cursorOptions, IntoClause * into, ExplainState * es, ParseState * pstate, ParamListInfo params)
ExplainQuery(ParseState * pstate, ExplainStmt * stmt, ParamListInfo params, DestReceiver * dest) 
standard_ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc)
ProcessUtility(PlannedStmt * pstmt, const char * queryString, _Bool readOnlyTree, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment * queryEnv, DestReceiver * dest, QueryCompletion * qc)
PortalRunUtility(Portal portal, PlannedStmt * pstmt, _Bool isTopLevel, _Bool setHoldSnapshot, DestReceiver * dest, QueryCompletion * qc)
FillPortalStore(Portal portal, _Bool isTopLevel)
PortalRun(Portal portal, long count, _Bool isTopLevel, DestReceiver * dest, DestReceiver * altdest, QueryCompletion * qc)
exec_simple_query(const char * query_string)
...

第一步：预处理

在这里插入图片描述

首先在如下的预处理中处理 AND 子句的参数，如下：

simplify_and_arguments(List * args, eval_const_expressions_context * context, _Bool * haveNull, _Bool * forceFalse)
eval_const_expressions_mutator(Node * node, eval_const_expressions_context * context) 
eval_const_expressions(PlannerInfo * root, Node * node)
preprocess_expression(PlannerInfo * root, Node * expr, int kind)
subquery_planner(PlannerGlobal * glob, Query * parse, PlannerInfo * parent_root, _Bool hasRecursion, double tuple_fraction, SetOperationStmt * setops) 
standard_planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams) 
planner(Query * parse, const char * query_string, int cursorOptions, ParamListInfo boundParams) 
...

在这里插入图片描述

于是新的expr就是一个只有两个参数的AND_EXPR！在预处理的最后，需要额外的转换如下：

...
	/*
	 * If it's a qual or havingQual, convert it to implicit-AND format. (We
	 * don't want to do this before eval_const_expressions, since the latter
	 * would be unable to simplify a top-level AND correctly. Also,
	 * SS_process_sublinks expects explicit-AND format.)
	 * 
	 * 如果是 qual 或 havingQual，则将其转换为隐式 AND 格式。
	 * （我们不想在 eval_const_expressions 之前执行此操作，因为后者无法正确简化顶级 AND。
	 * 此外，SS_process_sublinks 需要显式 AND 格式。）
	 */
	if (kind == EXPRKIND_QUAL)
		expr = (Node *) make_ands_implicit((Expr *) expr);

	return expr;
...

在这里插入图片描述

如上，也就是说在对parse->havingQual进行预处理之后，原来的布尔表达式直接简化成了参数list！

第二步：开始处理having子句

在这里插入图片描述

如上，SUM(degree) >= 90被继续留在了having子句中！因为这种情况下：当 HAVING 包含聚合函数、易变函数（volatile functions）或子查询等情况时，我们无法直接将其移到 WHERE，因为这样会改变查询的语义。

/*
包含聚合函数：如果 HAVING 子句中的条件包含聚合函数（比如 COUNT()、SUM() 等），则无法将其转移到 WHERE，因为聚合操作是在分组后的数据上执行的，WHERE 只能操作原始行。

包含易变函数（volatile functions）：易变函数是指每次调用可能产生不同结果的函数（例如 random()），这些函数不能在 WHERE 中使用，因为它们的行为可能会随着查询的执行过程而变化，导致不正确的结果。

包含子查询（subplans）：如果条件中包含子查询（subplans），就无法将该条件转移到 WHERE，因为子查询可能依赖于 HAVING 的分组信息。

涉及分组集的列且列为空：如果查询使用了分组集（groupingSets），且条件引用了这些分组集中的空值列（nullable columns），将条件从 HAVING 移到 WHERE 可能会改变查询的结果。分组集的列可能会在某些情况下为空，因此需要小心处理。
*/

在这里插入图片描述

如上，cno > 3就可以被移动到where条件parse->jointree->quals中！因为该 HAVING 子句没有包含上面提到的复杂操作，且存在显式的分组（groupClause），那么可以安全地将其移到 WHERE 子句中。

/*
有显式分组的查询（移到 WHERE 中）： 如果查询存在显式分组，则可以将 HAVING 子句的条件移到 WHERE 中。
通过这种方式，可以在数据聚合前就进行过滤，从而减少处理的数据量。
对于每个条件，先使用 preprocess_expression 进行完全预处理，然后将条件移到 WHERE 子句。
*/

下面看一下这两个SQL，如下：

postgres=# \d+ test_having
                                          Table "public.test_having"
 Column |     Type     | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------+--------------+-----------+----------+---------+----------+-------------+--------------+-------------
 a      | integer      |           |          |         | plain    |             |              | 
 b      | integer      |           |          |         | plain    |             |              | 
 c      | character(8) |           |          |         | extended |             |              | 
 d      | character(1) |           |          |         | extended |             |              | 
Access method: heap

postgres=# table test_having;
 a | b |    c     | d 
---+---+----------+---
 0 | 1 | XXXX     | A
 1 | 2 | AAAA     | b
 2 | 2 | AAAA     | c
 3 | 3 | BBBB     | D
 4 | 3 | BBBB     | e
 5 | 3 | bbbb     | F
 6 | 4 | cccc     | g
 7 | 4 | cccc     | h
 8 | 4 | CCCC     | I
 9 | 4 | CCCC     | j
(10 rows)

postgres=#
postgres=# explain SELECT 1 AS one FROM test_having HAVING 1 < 2; ## sql 1
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
(1 row)

postgres=# explain SELECT 1 AS one FROM test_having HAVING 1 > 2; ## sql 2
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=4)
   One-Time Filter: false
(2 rows)

postgres=#