We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
FILTER and Conditional Aggregation
Standard GROUP BY with SUM() and COUNT() is powerful, but sometimes you need to aggregate different subsets of data in a single query. PostgreSQL's FILTER clause and conditional aggregation techniques let you compute multiple conditional summaries in one pass through the data.
The FILTER Clause
The FILTER clause restricts which rows are fed into an aggregate function:
aggregate_function(expression) FILTER (WHERE condition)
It is logically equivalent to aggregating only the rows that match the condition, but you can apply different filters to different aggregates in the same query.
Basic Example
SELECT
COUNT(*) AS total_sales,
COUNT(*) FILTER (WHERE region = 'North') AS north_sales,
COUNT(*) FILTER (WHERE region = 'South') AS south_sales,
COUNT(*) FILTER (WHERE region = 'East') AS east_sales,
COUNT(*) FILTER (WHERE region = 'West') AS west_sales
FROM sales;
This computes five counts in one table scan -- far more efficient than running five separate queries.
SUM with FILTER
SELECT
SUM(amount) AS total_revenue,
SUM(amount) FILTER (WHERE payment_method = 'credit_card') AS credit_card_revenue,
SUM(amount) FILTER (WHERE payment_method = 'cash') AS cash_revenue,
SUM(amount) FILTER (WHERE payment_method = 'bank_transfer') AS bank_transfer_revenue
FROM sales;
AVG with FILTER
SELECT
region,
AVG(amount) AS avg_all,
AVG(amount) FILTER (WHERE amount > 100) AS avg_large_sales,
AVG(amount) FILTER (WHERE amount <= 100) AS avg_small_sales
FROM sales
GROUP BY region
ORDER BY region;
FILTER with Date Conditions
SELECT
COUNT(*) FILTER (WHERE sale_date >= '2025-01-01' AND sale_date < '2025-04-01') AS q1_count,
COUNT(*) FILTER (WHERE sale_date >= '2025-04-01' AND sale_date < '2025-07-01') AS q2_count,
SUM(amount) FILTER (WHERE sale_date >= '2025-01-01' AND sale_date < '2025-04-01') AS q1_revenue,
SUM(amount) FILTER (WHERE sale_date >= '2025-04-01' AND sale_date < '2025-07-01') AS q2_revenue
FROM sales;
Conditional Aggregation with CASE
Before FILTER was added (PostgreSQL 9.4), the standard technique used CASE inside aggregate functions. You will still see this pattern in older code and in databases that do not support FILTER:
SELECT
COUNT(CASE WHEN region = 'North' THEN 1 END) AS north_sales,
COUNT(CASE WHEN region = 'South' THEN 1 END) AS south_sales,
SUM(CASE WHEN payment_method = 'credit_card' THEN amount ELSE 0 END) AS credit_revenue,
SUM(CASE WHEN payment_method = 'cash' THEN amount ELSE 0 END) AS cash_revenue
FROM sales;
FILTER vs. CASE Comparison
-- FILTER (cleaner)
COUNT(*) FILTER (WHERE region = 'North')
-- CASE equivalent (more verbose)
COUNT(CASE WHEN region = 'North' THEN 1 END)
-- FILTER with SUM
SUM(amount) FILTER (WHERE region = 'North')
-- CASE with SUM (need ELSE 0 to avoid NULL)
SUM(CASE WHEN region = 'North' THEN amount ELSE 0 END)
The FILTER syntax is more readable and less error-prone. Use it when you can.
Boolean Aggregates: bool_and() and bool_or()
These aggregates work on boolean expressions:
SELECT
department,
bool_and(is_active) AS all_active, -- TRUE if every employee is active
bool_or(is_active) AS any_active, -- TRUE if at least one is active
COUNT(*) FILTER (WHERE is_active) AS active_count
FROM employees
GROUP BY department;
| department | all_active | any_active | active_count |
|---|---|---|---|
| Engineering | true | true | 3 |
| Marketing | false | true | 1 |
| Sales | true | true | 2 |
Practical Uses
-- Check if all orders for each customer are completed
SELECT
customer_name,
bool_and(status = 'completed') AS all_completed,
bool_or(status = 'pending') AS has_pending
FROM orders
GROUP BY customer_name;
array_agg() -- Collect Values into Arrays
array_agg() gathers values from multiple rows into a PostgreSQL array:
SELECT
department,
array_agg(name) AS employees
FROM employees
GROUP BY department;
| department | employees |
|---|---|
| Engineering | {Alice,Carlos,Diana} |
| Marketing | {Bob,Eve} |
With DISTINCT and ORDER BY
SELECT
region,
array_agg(DISTINCT payment_method ORDER BY payment_method) AS payment_methods_used
FROM sales
GROUP BY region;
| region | payment_methods_used |
|---|---|
| North | {bank_transfer,cash,credit_card} |
| South | {cash,credit_card} |
Filtering NULL values
-- Exclude NULLs from the array
SELECT array_agg(name) FILTER (WHERE name IS NOT NULL)
FROM employees;
string_agg() -- Concatenate Values
string_agg() joins text values with a delimiter:
SELECT
department,
string_agg(name, ', ' ORDER BY name) AS employee_list
FROM employees
GROUP BY department;
| department | employee_list |
|---|---|
| Engineering | Alice, Carlos, Diana |
| Marketing | Bob, Eve |
Building CSV or Readable Lists
-- Comma-separated list of unique regions
SELECT string_agg(DISTINCT region, ', ' ORDER BY region) AS regions
FROM sales;
-- Result: 'East, North, South, West'
-- Newline-separated for reports
SELECT string_agg(name || ': $' || salary::TEXT, E'\n' ORDER BY salary DESC)
FROM employees;
string_agg with FILTER
SELECT
department,
string_agg(name, ', ') FILTER (WHERE is_active) AS active_employees,
string_agg(name, ', ') FILTER (WHERE NOT is_active) AS inactive_employees
FROM employees
GROUP BY department;
Combining Techniques
These techniques are most powerful when combined:
SELECT
region,
COUNT(*) AS total_sales,
SUM(amount) AS total_revenue,
ROUND(AVG(amount), 2) AS avg_sale,
COUNT(*) FILTER (WHERE amount > 200) AS large_sales,
ROUND(
100.0 * COUNT(*) FILTER (WHERE payment_method = 'credit_card') / COUNT(*),
1
) AS credit_card_pct,
string_agg(DISTINCT payment_method, ', ' ORDER BY payment_method) AS methods_used,
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale
FROM sales
GROUP BY region
ORDER BY total_revenue DESC;
Pivot-Style Reports
Use FILTER to create a pivot table without a PIVOT keyword:
SELECT
customer_name,
COUNT(*) FILTER (WHERE status = 'completed') AS completed,
COUNT(*) FILTER (WHERE status = 'pending') AS pending,
COUNT(*) FILTER (WHERE status = 'shipped') AS shipped,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled,
SUM(total_price) AS total_spent
FROM orders
GROUP BY customer_name
ORDER BY total_spent DESC;
Try It Yourself
Run these queries against the sandbox tables:
-
For each
payment_methodinsales, count how many sales had an amount over 100 and how many had an amount of 100 or less. Use FILTER. -
Create a summary of the
orderstable that shows, for eachstatus, the count, total value, and a comma-separated list of customer names. -
For each
regioninsales, calculate what percentage of total revenue came from each payment method. -
Use
array_agg()to get a list of all distinct products ordered by each customer in theorderstable.
Summary
| Technique | Syntax | Purpose |
|---|---|---|
| FILTER | agg(x) FILTER (WHERE cond) |
Conditional aggregation (clean syntax) |
| CASE aggregation | SUM(CASE WHEN ... THEN x END) |
Conditional aggregation (portable) |
| bool_and | bool_and(expr) |
TRUE if all rows are TRUE |
| bool_or | bool_or(expr) |
TRUE if any row is TRUE |
| array_agg | array_agg(x ORDER BY y) |
Collect values into an array |
| string_agg | string_agg(x, delim ORDER BY y) |
Concatenate values with a delimiter |
| DISTINCT in agg | array_agg(DISTINCT x) |
Remove duplicates within aggregation |