We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Correlated Subqueries
In the previous lessons, the subqueries we wrote were independent -- they could run on their own without any information from the outer query. A correlated subquery is different: it references columns from the outer query, making it dependent on the current row being processed.
Independent vs Correlated Subqueries
Independent (Non-Correlated)
-- The subquery does NOT reference the outer query
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- This subquery runs ONCE, independently
The inner query computes the overall average salary, returns a single number, and the outer query uses it. The subquery has no knowledge of which employee row is being processed.
Correlated
-- The subquery DOES reference the outer query (e.department)
SELECT name, department, salary
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e.department -- references outer query!
);
This finds employees who earn more than the average salary in their own department. The subquery must be re-evaluated for each row because the department changes from row to row.
How Correlated Subqueries Execute
Outer query processes row: Alice (Engineering, 95000)
Inner query runs: SELECT AVG(salary) FROM employees WHERE department = 'Engineering'
Result: 90750
Is 95000 > 90750? YES --> include Alice
Outer query processes row: Bob (Marketing, 65000)
Inner query runs: SELECT AVG(salary) FROM employees WHERE department = 'Marketing'
Result: 60000
Is 65000 > 60000? YES --> include Bob
Outer query processes row: Charlie (Engineering, 80000)
Inner query runs: SELECT AVG(salary) FROM employees WHERE department = 'Engineering'
Result: 90750
Is 80000 > 90750? NO --> exclude Charlie
... and so on for every row
Correlated Subqueries in the SELECT List
You can use correlated subqueries to compute a value for each row:
SELECT
name,
department,
salary,
(SELECT AVG(salary) FROM employees e2
WHERE e2.department = e.department) AS dept_avg_salary
FROM employees e
ORDER BY department, salary DESC;
This adds a column showing the average salary for each employee's department. The subquery runs once per row, computing the average for that row's specific department.
Counting Related Rows
SELECT
name,
department,
(SELECT COUNT(*) FROM employees e2
WHERE e2.department = e.department
AND e2.id != e.id) AS coworker_count
FROM employees e
ORDER BY coworker_count DESC;
For each employee, this counts how many other employees are in the same department (excluding themselves).
Correlated Subqueries with Orders
Find customers whose total order value exceeds 100:
SELECT DISTINCT customer_name
FROM orders o1
WHERE (
SELECT SUM(total_price)
FROM orders o2
WHERE o2.customer_name = o1.customer_name
) > 100;
Show each order alongside the customer's order count:
SELECT
id,
customer_name,
total_price,
(SELECT COUNT(*)
FROM orders o2
WHERE o2.customer_name = o1.customer_name) AS customer_order_count
FROM orders o1
ORDER BY customer_name, id;
Correlated Subqueries in WHERE with Comparison Operators
Finding the Highest Salary Per Department
SELECT name, department, salary
FROM employees e
WHERE salary = (
SELECT MAX(salary)
FROM employees e2
WHERE e2.department = e.department
);
This returns the highest-paid employee in each department. The subquery computes the MAX salary for the current employee's department.
Finding the Most Recent Order Per Customer
SELECT id, customer_name, order_date, total_price
FROM orders o1
WHERE order_date = (
SELECT MAX(order_date)
FROM orders o2
WHERE o2.customer_name = o1.customer_name
);
Correlated Subqueries vs JOINs
Many correlated subqueries can be rewritten as JOINs. The JOIN version is often more efficient because it avoids re-executing the subquery for every row.
Correlated Subquery Version
SELECT name, department, salary
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e.department
);
Equivalent JOIN Version
SELECT e.name, e.department, e.salary
FROM employees e
INNER JOIN (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) dept_avg
ON e.department = dept_avg.department
WHERE e.salary > dept_avg.avg_salary;
The JOIN version computes all department averages in a single pass, then joins the results. The correlated subquery computes the average repeatedly for each row.
When to Use Each
| Use Correlated Subquery When | Use JOIN When |
|---|---|
| Logic is clearer as a subquery | Performance matters on large tables |
| Checking existence (EXISTS) | You need columns from both "tables" |
| One-off calculations per row | The same aggregation is used multiple times |
| The result set is small | The result set is large |
Performance Considerations
Correlated subqueries can be slow on large tables because the inner query runs once per outer row. For a table with 1,000,000 rows, the inner query could execute 1,000,000 times.
PostgreSQL's query optimizer can sometimes transform correlated subqueries into JOINs internally, but it is not guaranteed. If you notice slow performance, consider rewriting as a JOIN.
Try It Yourself
-
For each employee, display their name, salary, and the average salary of their department. Use a correlated subquery in the SELECT list.
-
Find employees who have the lowest salary in their department using a correlated subquery.
-
Rewrite the query from exercise 2 using a JOIN with a derived table (subquery in the FROM clause) instead of a correlated subquery. Verify you get the same results.
-
For each order, show the order id, customer_name, total_price, and that customer's maximum order total_price using a correlated subquery.