1/3 Next

B-Tree Index Basics

As your database tables grow from hundreds to millions of rows, query performance becomes critical. Indexes are the primary tool PostgreSQL provides to keep queries fast.

What Is an Index?

Think of an index like the index at the back of a textbook. Instead of reading every page to find a topic, you look up the topic in the index, find the page number, and go directly there.

A database index works similarly. Without an index, PostgreSQL must perform a sequential scan (Seq Scan) -- reading every single row in the table to find matching ones. With an index, PostgreSQL can jump directly to the relevant rows.

B-Tree: The Default Index Type

When you create an index in PostgreSQL without specifying a type, you get a B-tree (Balanced Tree) index. B-tree indexes are the workhorse of relational databases and are effective for:

Equality comparisons: WHERE region = 'North'
Range queries: WHERE amount > 100 AND amount < 500
Sorting: ORDER BY sale_date
Min/Max lookups: SELECT MIN(amount) FROM sales
Pattern matching with prefix: WHERE customer_name LIKE 'John%'

How B-Trees Work (Conceptual)

A B-tree organizes data in a balanced, sorted tree structure:

              [M]
            /     \
        [D, H]    [R, V]
       / |  \    / |   \
    [A-C][E-G][I-L][N-Q][S-U][W-Z]

The tree has multiple levels (root, internal nodes, leaf nodes)
Data is kept sorted, so PostgreSQL can binary-search through levels
All leaf nodes are at the same depth, giving predictable O(log n) performance
Leaf nodes contain pointers to the actual table rows

Creating Indexes

Basic Syntax

CREATE INDEX index_name ON table_name (column_name);

For example, to speed up queries filtering by region:

CREATE INDEX idx_sales_region ON sales (region);

Naming Conventions

A good naming convention makes indexes self-documenting:

idx_tablename_columnname

For multiple columns:

idx_tablename_col1_col2

Unique Indexes

A unique index ensures that no two rows have the same value in the indexed column:

CREATE UNIQUE INDEX idx_employees_email ON employees (email);

This both speeds up lookups and enforces a uniqueness constraint.

Index on Multiple Columns

CREATE INDEX idx_sales_region_date ON sales (region, sale_date);

Column order matters. This index is efficient for:

WHERE region = 'North' (uses the first column)
WHERE region = 'North' AND sale_date > '2025-06-01' (uses both columns)
ORDER BY region, sale_date (uses both columns)

But it is not efficient for:

WHERE sale_date > '2025-06-01' alone (the leading column is not used)

This is called the leftmost prefix rule.

When to Create Indexes

Indexes are beneficial when:

Columns appear frequently in WHERE clauses: WHERE region = 'North'
Columns are used in JOIN conditions: ON a.id = b.foreign_id
Columns are used in ORDER BY: ORDER BY sale_date
Columns have high cardinality: Many distinct values (like email) benefit more than low cardinality (like a boolean is_active column)

When NOT to Create Indexes

Indexes come with costs:

Small tables: If the table has fewer than a few hundred rows, a sequential scan is often faster than using an index.
Write-heavy columns: Each INSERT, UPDATE, or DELETE must also update every index on the table.
Low-selectivity columns: A column with only 2-3 distinct values (like payment_method with 'cash'/'card'/'transfer') may not benefit much.
Disk space: Indexes consume storage. Large tables with many indexes can use significant disk space.

Seeing Indexes in Action

Use EXPLAIN to see whether PostgreSQL uses an index:

-- Before creating an index (likely Seq Scan on a small table)
EXPLAIN SELECT * FROM sales WHERE region = 'North';

The output shows the query plan -- how PostgreSQL will execute the query:

Seq Scan on sales  (cost=0.00..1.62 rows=10 width=100)
  Filter: (region = 'North'::text)

After creating an index:

CREATE INDEX idx_sales_region ON sales (region);
EXPLAIN SELECT * FROM sales WHERE region = 'North';

On larger tables, this might change to:

Index Scan using idx_sales_region on sales  (cost=0.14..8.16 rows=10 width=100)
  Index Cond: (region = 'North'::text)

Note: With only 50 rows in the sales table, PostgreSQL may still choose a Seq Scan because the table is small enough that scanning it is faster than using an index. This is the query planner being smart -- an index lookup has overhead that only pays off with larger tables.

Managing Indexes

Listing Indexes

-- See all indexes on a table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'sales';

Dropping Indexes

DROP INDEX idx_sales_region;

Or safely, only if it exists:

DROP INDEX IF EXISTS idx_sales_region;

Creating Indexes Without Blocking

On production databases, creating an index locks the table against writes. Use CONCURRENTLY to avoid this:

CREATE INDEX CONCURRENTLY idx_sales_region ON sales (region);

This takes longer but allows reads and writes to continue.

Try It Yourself

Run EXPLAIN SELECT * FROM sales WHERE region = 'North'; to see the query plan. What type of scan does PostgreSQL use?
Create an index on the region column of the sales table. Then run the EXPLAIN again. Did the plan change? (On a 50-row table, it may not, and that is expected.)
Run EXPLAIN SELECT * FROM sales WHERE amount > 200 ORDER BY sale_date;. Think about which indexes could help this query.

Take Quiz Next Lesson

SQL Editor Ctrl+Enter to run

Results

Run a query to see results here

Tables: