We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Ranking and Indexing Search
Finding matching documents is only half the job. Users expect results ranked by relevance, with highlighted snippets showing why each result matched. This lesson covers ranking functions, result highlighting, field weighting, and GIN indexes for fast full-text search.
Ranking with ts_rank()
ts_rank() scores how well a document matches a query. Higher scores mean better matches:
SELECT
title,
ts_rank(search_vector, to_tsquery('english', 'postgresql')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql')
ORDER BY rank DESC;
The score is influenced by:
- Term frequency -- how often the search terms appear in the document.
- Proximity -- how close the terms are to each other.
- Document length -- optionally normalized (see below).
Normalization Options
By default, longer documents score higher simply because they contain more words. You can normalize by document length:
SELECT
title,
ts_rank(search_vector, query, 32) AS rank -- 32 = divide by document length
FROM articles,
to_tsquery('english', 'performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
Normalization flags (can be combined by adding):
| Flag | Meaning |
|---|---|
| 0 | Default (no normalization) |
| 1 | Divide rank by 1 + log(document length) |
| 2 | Divide rank by document length |
| 4 | Divide by mean harmonic distance between extents |
| 8 | Divide by number of unique words in document |
| 16 | Divide by 1 + log(unique words) |
| 32 | Divide by itself + 1 |
The most common choice is 32 (self-normalization) or 2 (length normalization).
ts_rank_cd() -- Cover Density Ranking
ts_rank_cd() uses a different algorithm that considers how close matching terms are to each other. It tends to produce better results for multi-word queries:
SELECT
title,
ts_rank_cd(search_vector, plainto_tsquery('english', 'query optimization')) AS rank
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'query optimization')
ORDER BY rank DESC;
Cover density ranking rewards documents where the search terms appear near each other, which usually indicates higher relevance.
Highlighting with ts_headline()
ts_headline() returns a snippet of the original text with matching terms highlighted:
SELECT
title,
ts_headline(
'english',
body,
to_tsquery('english', 'index'),
'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15, MaxFragments=2'
) AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'index')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'index')) DESC;
ts_headline Options
| Option | Default | Description |
|---|---|---|
StartSel |
<b> |
Opening highlight marker |
StopSel |
</b> |
Closing highlight marker |
MaxWords |
35 | Maximum words in a fragment |
MinWords |
15 | Minimum words in a fragment |
MaxFragments |
0 | Number of fragments (0 = use whole document) |
ShortWord |
3 | Minimum word length to consider |
FragmentDelimiter |
" ... " |
Text between fragments |
HighlightAll |
FALSE | Highlight all occurrences |
Customized Highlighting
-- Markdown-style highlighting
SELECT
title,
ts_headline(
'english',
body,
plainto_tsquery('english', 'postgresql'),
'StartSel=**, StopSel=**, MaxFragments=1, MaxWords=25'
) AS snippet
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql');
Performance Note
ts_headline() is relatively expensive because it re-processes the original text. Always filter with @@ first to limit the number of rows that need highlighting:
-- Good: filter first, then highlight only matching rows
SELECT title, ts_headline('english', body, query) AS snippet
FROM articles, plainto_tsquery('english', 'search') AS query
WHERE search_vector @@ query;
-- Bad: computing headlines on every row
SELECT title, ts_headline('english', body, query) AS snippet
FROM articles, plainto_tsquery('english', 'search') AS query;
Field Weighting with setweight()
In most search applications, a match in the title should rank higher than a match in the body. Use setweight() to assign importance levels:
| Weight | Typical Use | Default Multiplier |
|---|---|---|
| A | Title, headings | 1.0 |
| B | Abstract, tags | 0.4 |
| C | Body text | 0.2 |
| D | Everything else | 0.1 |
Building a Weighted Tsvector
-- Combine weighted fields
SELECT
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', COALESCE(category, '')), 'B') ||
setweight(to_tsvector('english', body), 'C')
AS weighted_vector
FROM articles
LIMIT 1;
Searching with Weights
SELECT
title,
ts_rank(
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'C'),
query
) AS rank
FROM articles,
plainto_tsquery('english', 'postgresql') AS query
WHERE (
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body), 'C')
) @@ query
ORDER BY rank DESC;
A match in the title (weight A) now scores roughly 5x higher than a match in the body (weight C).
Pre-Computing Weighted Vectors
For production use, store the weighted vector in a column:
-- Update the search_vector to include weights
UPDATE articles
SET search_vector =
setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', COALESCE(category, '')), 'B') ||
setweight(to_tsvector('english', body), 'C');
Then queries become simple and fast:
SELECT title,
ts_rank(search_vector, query) AS rank
FROM articles,
plainto_tsquery('english', 'postgresql') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
Custom Weight Values
You can override the default weight multipliers:
-- {D, C, B, A} weights
SELECT ts_rank(
'{0.1, 0.2, 0.4, 1.0}', -- custom weights for D, C, B, A
search_vector,
to_tsquery('english', 'postgresql')
) AS rank
FROM articles;
GIN Indexes for Full-Text Search
Without an index, every full-text search query scans the entire table. For tables with thousands or millions of rows, you need a GIN index:
-- Create a GIN index on the pre-computed search_vector
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Index on an Expression
If you do not have a pre-computed column, you can index the expression directly:
CREATE INDEX idx_articles_body_search
ON articles
USING GIN (to_tsvector('english', body));
When using an expression index, your query must match the expression exactly:
-- This uses the index (matches the expression)
SELECT * FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');
-- This does NOT use the index (different expression)
SELECT * FROM articles
WHERE to_tsvector('simple', body) @@ to_tsquery('simple', 'search');
GIN vs. GiST
PostgreSQL offers two index types for full-text search:
| Feature | GIN | GiST |
|---|---|---|
| Build speed | Slower | Faster |
| Query speed | Faster | Slower |
| Index size | Larger | Smaller |
| Update speed | Slower (use fastupdate) | Faster |
| Best for | Read-heavy workloads | Write-heavy workloads |
GIN is the right choice for most applications. Use GiST only when you have extremely high write rates and can tolerate slower searches.
Checking Index Usage
Verify your index is being used with EXPLAIN:
EXPLAIN SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql');
Look for Bitmap Index Scan on idx_articles_search in the output.
Putting It All Together: A Search API
Here is a complete search query combining everything from this lesson:
SELECT
title,
category,
ts_rank(search_vector, query, 32) AS relevance,
ts_headline(
'english',
body,
query,
'StartSel=<b>, StopSel=</b>, MaxWords=30, MaxFragments=2'
) AS snippet
FROM
articles,
websearch_to_tsquery('english', 'postgresql performance') AS query
WHERE
search_vector @@ query
ORDER BY
relevance DESC
LIMIT 10;
This gives you:
- Matching articles filtered by
@@. - Relevance ranking with
ts_rank()(normalized). - Highlighted snippets with
ts_headline(). - Clean, user-friendly search syntax with
websearch_to_tsquery().
Combining Full-Text Search with Other Filters
Full-text search works naturally with other WHERE conditions:
-- Search within a specific category
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles,
plainto_tsquery('english', 'optimization') AS query
WHERE search_vector @@ query
AND category = 'Performance'
ORDER BY rank DESC;
-- Search recent articles only
SELECT title, published_at, ts_rank(search_vector, query) AS rank
FROM articles,
plainto_tsquery('english', 'postgresql') AS query
WHERE search_vector @@ query
AND published_at >= '2024-01-01'
ORDER BY rank DESC;
Try It Yourself
Run these queries against the articles table:
- Search for articles matching "performance" and rank them by relevance. Which article scores highest?
- Generate search snippets for articles matching "index" using
ts_headline(). Customize the highlight markers. - Compare
ts_rankandts_rank_cdscores for the query "query optimization". Do the rankings differ? - Build a weighted search vector that gives title matches 10x the importance of body matches, then search for "postgresql".
Summary
| Function | Purpose |
|---|---|
ts_rank(vector, query [, norm]) |
Score document relevance (frequency-based) |
ts_rank_cd(vector, query [, norm]) |
Score document relevance (cover density) |
ts_headline(config, text, query, opts) |
Generate highlighted snippet |
setweight(vector, 'A'|'B'|'C'|'D') |
Assign importance weight to lexemes |
CREATE INDEX ... USING GIN(vector) |
Create a fast full-text search index |
| Concept | Recommendation |
|---|---|
| Index type | GIN for most workloads, GiST for write-heavy |
| Normalization | Use flag 32 for general-purpose ranking |
| Field weighting | A for title, B for tags/category, C for body |
| User-facing search | Use websearch_to_tsquery() for forgiving syntax |
| Highlighting | Always filter with @@ before calling ts_headline() |