We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
B-Tree Index Basics
As your database tables grow from hundreds to millions of rows, query performance becomes critical. Indexes are the primary tool PostgreSQL provides to keep queries fast.
What Is an Index?
Think of an index like the index at the back of a textbook. Instead of reading every page to find a topic, you look up the topic in the index, find the page number, and go directly there.
A database index works similarly. Without an index, PostgreSQL must perform a sequential scan (Seq Scan) -- reading every single row in the table to find matching ones. With an index, PostgreSQL can jump directly to the relevant rows.
B-Tree: The Default Index Type
When you create an index in PostgreSQL without specifying a type, you get a B-tree (Balanced Tree) index. B-tree indexes are the workhorse of relational databases and are effective for:
- Equality comparisons:
WHERE region = 'North' - Range queries:
WHERE amount > 100 AND amount < 500 - Sorting:
ORDER BY sale_date - Min/Max lookups:
SELECT MIN(amount) FROM sales - Pattern matching with prefix:
WHERE customer_name LIKE 'John%'
How B-Trees Work (Conceptual)
A B-tree organizes data in a balanced, sorted tree structure:
[M]
/ \
[D, H] [R, V]
/ | \ / | \
[A-C][E-G][I-L][N-Q][S-U][W-Z]
- The tree has multiple levels (root, internal nodes, leaf nodes)
- Data is kept sorted, so PostgreSQL can binary-search through levels
- All leaf nodes are at the same depth, giving predictable O(log n) performance
- Leaf nodes contain pointers to the actual table rows
Creating Indexes
Basic Syntax
CREATE INDEX index_name ON table_name (column_name);
For example, to speed up queries filtering by region:
CREATE INDEX idx_sales_region ON sales (region);
Naming Conventions
A good naming convention makes indexes self-documenting:
idx_tablename_columnname
For multiple columns:
idx_tablename_col1_col2
Unique Indexes
A unique index ensures that no two rows have the same value in the indexed column:
CREATE UNIQUE INDEX idx_employees_email ON employees (email);
This both speeds up lookups and enforces a uniqueness constraint.
Index on Multiple Columns
CREATE INDEX idx_sales_region_date ON sales (region, sale_date);
Column order matters. This index is efficient for:
WHERE region = 'North'(uses the first column)WHERE region = 'North' AND sale_date > '2025-06-01'(uses both columns)ORDER BY region, sale_date(uses both columns)
But it is not efficient for:
WHERE sale_date > '2025-06-01'alone (the leading column is not used)
This is called the leftmost prefix rule.
When to Create Indexes
Indexes are beneficial when:
- Columns appear frequently in WHERE clauses:
WHERE region = 'North' - Columns are used in JOIN conditions:
ON a.id = b.foreign_id - Columns are used in ORDER BY:
ORDER BY sale_date - Columns have high cardinality: Many distinct values (like email) benefit more than low cardinality (like a boolean
is_activecolumn)
When NOT to Create Indexes
Indexes come with costs:
- Small tables: If the table has fewer than a few hundred rows, a sequential scan is often faster than using an index.
- Write-heavy columns: Each INSERT, UPDATE, or DELETE must also update every index on the table.
- Low-selectivity columns: A column with only 2-3 distinct values (like
payment_methodwith 'cash'/'card'/'transfer') may not benefit much. - Disk space: Indexes consume storage. Large tables with many indexes can use significant disk space.
Seeing Indexes in Action
Use EXPLAIN to see whether PostgreSQL uses an index:
-- Before creating an index (likely Seq Scan on a small table)
EXPLAIN SELECT * FROM sales WHERE region = 'North';
The output shows the query plan -- how PostgreSQL will execute the query:
Seq Scan on sales (cost=0.00..1.62 rows=10 width=100)
Filter: (region = 'North'::text)
After creating an index:
CREATE INDEX idx_sales_region ON sales (region);
EXPLAIN SELECT * FROM sales WHERE region = 'North';
On larger tables, this might change to:
Index Scan using idx_sales_region on sales (cost=0.14..8.16 rows=10 width=100)
Index Cond: (region = 'North'::text)
Note: With only 50 rows in the sales table, PostgreSQL may still choose a Seq Scan because the table is small enough that scanning it is faster than using an index. This is the query planner being smart -- an index lookup has overhead that only pays off with larger tables.
Managing Indexes
Listing Indexes
-- See all indexes on a table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'sales';
Dropping Indexes
DROP INDEX idx_sales_region;
Or safely, only if it exists:
DROP INDEX IF EXISTS idx_sales_region;
Creating Indexes Without Blocking
On production databases, creating an index locks the table against writes. Use CONCURRENTLY to avoid this:
CREATE INDEX CONCURRENTLY idx_sales_region ON sales (region);
This takes longer but allows reads and writes to continue.
Try It Yourself
-
Run
EXPLAIN SELECT * FROM sales WHERE region = 'North';to see the query plan. What type of scan does PostgreSQL use? -
Create an index on the
regioncolumn of the sales table. Then run the EXPLAIN again. Did the plan change? (On a 50-row table, it may not, and that is expected.) -
Run
EXPLAIN SELECT * FROM sales WHERE amount > 200 ORDER BY sale_date;. Think about which indexes could help this query.