Correlated Subqueries

In the previous lessons, the subqueries we wrote were independent -- they could run on their own without any information from the outer query. A correlated subquery is different: it references columns from the outer query, making it dependent on the current row being processed.

Independent vs Correlated Subqueries

Independent (Non-Correlated)

-- The subquery does NOT reference the outer query
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
--               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--               This subquery runs ONCE, independently

The inner query computes the overall average salary, returns a single number, and the outer query uses it. The subquery has no knowledge of which employee row is being processed.

Correlated

-- The subquery DOES reference the outer query (e.department)
SELECT name, department, salary
FROM employees e
WHERE salary > (
  SELECT AVG(salary)
  FROM employees e2
  WHERE e2.department = e.department  -- references outer query!
);

This finds employees who earn more than the average salary in their own department. The subquery must be re-evaluated for each row because the department changes from row to row.

How Correlated Subqueries Execute

Outer query processes row: Alice (Engineering, 95000)
  Inner query runs: SELECT AVG(salary) FROM employees WHERE department = 'Engineering'
  Result: 90750
  Is 95000 > 90750? YES --> include Alice

Outer query processes row: Bob (Marketing, 65000)
  Inner query runs: SELECT AVG(salary) FROM employees WHERE department = 'Marketing'
  Result: 60000
  Is 65000 > 60000? YES --> include Bob

Outer query processes row: Charlie (Engineering, 80000)
  Inner query runs: SELECT AVG(salary) FROM employees WHERE department = 'Engineering'
  Result: 90750
  Is 80000 > 90750? NO --> exclude Charlie

... and so on for every row

Correlated Subqueries in the SELECT List

You can use correlated subqueries to compute a value for each row:

SELECT
  name,
  department,
  salary,
  (SELECT AVG(salary) FROM employees e2
   WHERE e2.department = e.department) AS dept_avg_salary
FROM employees e
ORDER BY department, salary DESC;

This adds a column showing the average salary for each employee's department. The subquery runs once per row, computing the average for that row's specific department.

Counting Related Rows

SELECT
  name,
  department,
  (SELECT COUNT(*) FROM employees e2
   WHERE e2.department = e.department
     AND e2.id != e.id) AS coworker_count
FROM employees e
ORDER BY coworker_count DESC;

For each employee, this counts how many other employees are in the same department (excluding themselves).

Correlated Subqueries with Orders

Find customers whose total order value exceeds 100:

SELECT DISTINCT customer_name
FROM orders o1
WHERE (
  SELECT SUM(total_price)
  FROM orders o2
  WHERE o2.customer_name = o1.customer_name
) > 100;

Show each order alongside the customer's order count:

SELECT
  id,
  customer_name,
  total_price,
  (SELECT COUNT(*)
   FROM orders o2
   WHERE o2.customer_name = o1.customer_name) AS customer_order_count
FROM orders o1
ORDER BY customer_name, id;

Correlated Subqueries in WHERE with Comparison Operators

Finding the Highest Salary Per Department

SELECT name, department, salary
FROM employees e
WHERE salary = (
  SELECT MAX(salary)
  FROM employees e2
  WHERE e2.department = e.department
);

This returns the highest-paid employee in each department. The subquery computes the MAX salary for the current employee's department.

Finding the Most Recent Order Per Customer

SELECT id, customer_name, order_date, total_price
FROM orders o1
WHERE order_date = (
  SELECT MAX(order_date)
  FROM orders o2
  WHERE o2.customer_name = o1.customer_name
);

Correlated Subqueries vs JOINs

Many correlated subqueries can be rewritten as JOINs. The JOIN version is often more efficient because it avoids re-executing the subquery for every row.

Correlated Subquery Version

SELECT name, department, salary
FROM employees e
WHERE salary > (
  SELECT AVG(salary)
  FROM employees e2
  WHERE e2.department = e.department
);

Equivalent JOIN Version

SELECT e.name, e.department, e.salary
FROM employees e
INNER JOIN (
  SELECT department, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department
) dept_avg
  ON e.department = dept_avg.department
WHERE e.salary > dept_avg.avg_salary;

The JOIN version computes all department averages in a single pass, then joins the results. The correlated subquery computes the average repeatedly for each row.

When to Use Each

Use Correlated Subquery When Use JOIN When
Logic is clearer as a subquery Performance matters on large tables
Checking existence (EXISTS) You need columns from both "tables"
One-off calculations per row The same aggregation is used multiple times
The result set is small The result set is large

Performance Considerations

Correlated subqueries can be slow on large tables because the inner query runs once per outer row. For a table with 1,000,000 rows, the inner query could execute 1,000,000 times.

PostgreSQL's query optimizer can sometimes transform correlated subqueries into JOINs internally, but it is not guaranteed. If you notice slow performance, consider rewriting as a JOIN.

Try It Yourself

  1. For each employee, display their name, salary, and the average salary of their department. Use a correlated subquery in the SELECT list.

  2. Find employees who have the lowest salary in their department using a correlated subquery.

  3. Rewrite the query from exercise 2 using a JOIN with a derived table (subquery in the FROM clause) instead of a correlated subquery. Verify you get the same results.

  4. For each order, show the order id, customer_name, total_price, and that customer's maximum order total_price using a correlated subquery.

Take Quiz Next Lesson
SQL Editor Ctrl+Enter to run
Results
Run a query to see results here
Tables: