We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Basic CTEs (Common Table Expressions)
As your SQL queries grow more complex, they can become difficult to read and maintain. Common Table Expressions (CTEs) are a way to write cleaner, more modular SQL by breaking a complex query into named steps.
What Is a CTE?
A CTE is a temporary named result set that you define at the beginning of a query using the WITH keyword. You can then reference it by name in the main query, just like a table.
WITH cte_name AS (
SELECT ...
)
SELECT *
FROM cte_name;
Think of a CTE as creating a temporary "view" that only exists for the duration of that single query.
Your First CTE
Let's find high-earning employees and display them with their department info. Without a CTE, you might write a subquery in the FROM clause:
-- Without CTE (subquery in FROM)
SELECT *
FROM (
SELECT name, department, salary
FROM employees
WHERE salary > 75000
) AS high_earners;
With a CTE, this becomes more readable:
-- With CTE
WITH high_earners AS (
SELECT name, department, salary
FROM employees
WHERE salary > 75000
)
SELECT *
FROM high_earners
ORDER BY salary DESC;
The CTE version reads top-to-bottom: first define what "high_earners" means, then use it.
CTEs vs Subqueries
Readability
CTEs give names to intermediate results, making the query self-documenting:
-- Hard to read: nested subqueries
SELECT department, avg_salary
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM (
SELECT * FROM employees WHERE is_active = true
) active_emps
GROUP BY department
) dept_avgs
WHERE avg_salary > 70000;
-- Easy to read: sequential CTEs
WITH active_employees AS (
SELECT * FROM employees WHERE is_active = true
),
department_averages AS (
SELECT
department,
AVG(salary) AS avg_salary
FROM active_employees
GROUP BY department
)
SELECT department, avg_salary
FROM department_averages
WHERE avg_salary > 70000;
Reusability
A CTE can be referenced multiple times in the main query, avoiding duplication:
WITH dept_stats AS (
SELECT
department,
AVG(salary) AS avg_salary,
COUNT(*) AS emp_count
FROM employees
GROUP BY department
)
SELECT
d1.department,
d1.avg_salary,
d1.emp_count,
(SELECT AVG(avg_salary) FROM dept_stats) AS overall_dept_avg
FROM dept_stats d1
ORDER BY d1.avg_salary DESC;
The dept_stats CTE is used twice: once in the main FROM clause and once in a scalar subquery. Without the CTE, you would have to repeat the same GROUP BY query.
Multiple CTEs
You can define multiple CTEs in a single WITH clause, separated by commas. Each CTE can reference any CTE defined before it:
WITH
-- Step 1: Get active employees
active_employees AS (
SELECT id, name, department, salary
FROM employees
WHERE is_active = true
),
-- Step 2: Calculate department stats (references step 1)
department_stats AS (
SELECT
department,
COUNT(*) AS active_count,
ROUND(AVG(salary)) AS avg_salary,
MAX(salary) AS max_salary
FROM active_employees
GROUP BY department
),
-- Step 3: Find high-performing departments (references step 2)
high_performing_depts AS (
SELECT department
FROM department_stats
WHERE active_count >= 2 AND avg_salary > 70000
)
-- Final query: uses multiple CTEs
SELECT
ae.name,
ae.department,
ae.salary,
ds.avg_salary AS dept_avg
FROM active_employees ae
JOIN department_stats ds ON ae.department = ds.department
WHERE ae.department IN (SELECT department FROM high_performing_depts)
ORDER BY ae.department, ae.salary DESC;
This query is complex, but each step is clearly named and easy to understand on its own.
CTE with Orders and Order Items
Let's analyze order data by building up the query in steps:
WITH
-- Step 1: Calculate the total value and item count for each order
order_summaries AS (
SELECT
o.id AS order_id,
o.customer_name,
o.order_date,
o.status,
COUNT(oi.id) AS item_count,
SUM(oi.quantity * oi.unit_price) AS calculated_total
FROM orders o
LEFT JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, o.customer_name, o.order_date, o.status
),
-- Step 2: Calculate per-customer statistics
customer_stats AS (
SELECT
customer_name,
COUNT(*) AS total_orders,
ROUND(AVG(calculated_total), 2) AS avg_order_value,
MAX(calculated_total) AS largest_order
FROM order_summaries
GROUP BY customer_name
)
SELECT
cs.customer_name,
cs.total_orders,
cs.avg_order_value,
cs.largest_order
FROM customer_stats cs
ORDER BY cs.avg_order_value DESC;
CTEs for Filtering in Stages
CTEs excel when you need to apply filters in stages:
WITH
-- First, find orders from 2024
recent_orders AS (
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
),
-- Then find completed recent orders
completed_recent AS (
SELECT * FROM recent_orders
WHERE status = 'completed'
),
-- Then calculate stats
stats AS (
SELECT
customer_name,
COUNT(*) AS order_count,
SUM(total_price) AS total_spent
FROM completed_recent
GROUP BY customer_name
)
SELECT * FROM stats
WHERE total_spent > 50
ORDER BY total_spent DESC;
Each CTE narrows the data further, making the logic transparent.
CTE Naming Conventions
Use descriptive, clear names for your CTEs:
-- Good names:
WITH active_employees AS (...)
WITH monthly_revenue AS (...)
WITH top_customers AS (...)
-- Poor names:
WITH t1 AS (...)
WITH sub AS (...)
WITH data AS (...)
Good CTE names make the query readable even without looking at the CTE definitions.
Try It Yourself
-
Write a query with a CTE called
engineering_teamthat selects all employees in the Engineering department. Then, in the main query, find the highest and lowest salary in that team. -
Create a query with two CTEs: one that calculates the total quantity sold per product (from order_items), and another that joins this with orders to get customer information. Display the product_id, total quantity sold, and the number of distinct customers.
-
Write a query with three chained CTEs that: (a) finds all completed orders, (b) calculates total revenue per customer from those orders, (c) identifies customers who spent above average. Display the final list of above-average customers.