Ranking and Indexing Search

Finding matching documents is only half the job. Users expect results ranked by relevance, with highlighted snippets showing why each result matched. This lesson covers ranking functions, result highlighting, field weighting, and GIN indexes for fast full-text search.

Ranking with ts_rank()

ts_rank() scores how well a document matches a query. Higher scores mean better matches:

SELECT
  title,
  ts_rank(search_vector, to_tsquery('english', 'postgresql')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql')
ORDER BY rank DESC;

The score is influenced by:

Term frequency -- how often the search terms appear in the document.
Proximity -- how close the terms are to each other.
Document length -- optionally normalized (see below).

Normalization Options

By default, longer documents score higher simply because they contain more words. You can normalize by document length:

SELECT
  title,
  ts_rank(search_vector, query, 32) AS rank  -- 32 = divide by document length
FROM articles,
     to_tsquery('english', 'performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Normalization flags (can be combined by adding):

Flag	Meaning
0	Default (no normalization)
1	Divide rank by 1 + log(document length)
2	Divide rank by document length
4	Divide by mean harmonic distance between extents
8	Divide by number of unique words in document
16	Divide by 1 + log(unique words)
32	Divide by itself + 1

The most common choice is 32 (self-normalization) or 2 (length normalization).

ts_rank_cd() -- Cover Density Ranking

ts_rank_cd() uses a different algorithm that considers how close matching terms are to each other. It tends to produce better results for multi-word queries:

SELECT
  title,
  ts_rank_cd(search_vector, plainto_tsquery('english', 'query optimization')) AS rank
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'query optimization')
ORDER BY rank DESC;

Cover density ranking rewards documents where the search terms appear near each other, which usually indicates higher relevance.

Highlighting with ts_headline()

ts_headline() returns a snippet of the original text with matching terms highlighted:

SELECT
  title,
  ts_headline(
    'english',
    body,
    to_tsquery('english', 'index'),
    'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15, MaxFragments=2'
  ) AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'index')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'index')) DESC;

ts_headline Options

Option	Default	Description
`StartSel`	`<b>`	Opening highlight marker
`StopSel`	`</b>`	Closing highlight marker
`MaxWords`	35	Maximum words in a fragment
`MinWords`	15	Minimum words in a fragment
`MaxFragments`	0	Number of fragments (0 = use whole document)
`ShortWord`	3	Minimum word length to consider
`FragmentDelimiter`	`" ... "`	Text between fragments
`HighlightAll`	FALSE	Highlight all occurrences

Customized Highlighting

-- Markdown-style highlighting
SELECT
  title,
  ts_headline(
    'english',
    body,
    plainto_tsquery('english', 'postgresql'),
    'StartSel=**, StopSel=**, MaxFragments=1, MaxWords=25'
  ) AS snippet
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql');

Performance Note

ts_headline() is relatively expensive because it re-processes the original text. Always filter with @@ first to limit the number of rows that need highlighting:

-- Good: filter first, then highlight only matching rows
SELECT title, ts_headline('english', body, query) AS snippet
FROM articles, plainto_tsquery('english', 'search') AS query
WHERE search_vector @@ query;

-- Bad: computing headlines on every row
SELECT title, ts_headline('english', body, query) AS snippet
FROM articles, plainto_tsquery('english', 'search') AS query;

Field Weighting with setweight()

In most search applications, a match in the title should rank higher than a match in the body. Use setweight() to assign importance levels:

Weight	Typical Use	Default Multiplier
A	Title, headings	1.0
B	Abstract, tags	0.4
C	Body text	0.2
D	Everything else	0.1

Building a Weighted Tsvector

-- Combine weighted fields
SELECT
  setweight(to_tsvector('english', title), 'A') ||
  setweight(to_tsvector('english', COALESCE(category, '')), 'B') ||
  setweight(to_tsvector('english', body), 'C')
  AS weighted_vector
FROM articles
LIMIT 1;

Searching with Weights

SELECT
  title,
  ts_rank(
    setweight(to_tsvector('english', title), 'A') ||
    setweight(to_tsvector('english', body), 'C'),
    query
  ) AS rank
FROM articles,
     plainto_tsquery('english', 'postgresql') AS query
WHERE (
  setweight(to_tsvector('english', title), 'A') ||
  setweight(to_tsvector('english', body), 'C')
) @@ query
ORDER BY rank DESC;

A match in the title (weight A) now scores roughly 5x higher than a match in the body (weight C).

Pre-Computing Weighted Vectors

For production use, store the weighted vector in a column:

-- Update the search_vector to include weights
UPDATE articles
SET search_vector =
  setweight(to_tsvector('english', title), 'A') ||
  setweight(to_tsvector('english', COALESCE(category, '')), 'B') ||
  setweight(to_tsvector('english', body), 'C');

Then queries become simple and fast:

SELECT title,
       ts_rank(search_vector, query) AS rank
FROM articles,
     plainto_tsquery('english', 'postgresql') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Custom Weight Values

You can override the default weight multipliers:

-- {D, C, B, A} weights
SELECT ts_rank(
  '{0.1, 0.2, 0.4, 1.0}',  -- custom weights for D, C, B, A
  search_vector,
  to_tsquery('english', 'postgresql')
) AS rank
FROM articles;

GIN Indexes for Full-Text Search

Without an index, every full-text search query scans the entire table. For tables with thousands or millions of rows, you need a GIN index:

-- Create a GIN index on the pre-computed search_vector
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Index on an Expression

If you do not have a pre-computed column, you can index the expression directly:

CREATE INDEX idx_articles_body_search
ON articles
USING GIN (to_tsvector('english', body));

When using an expression index, your query must match the expression exactly:

-- This uses the index (matches the expression)
SELECT * FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');

-- This does NOT use the index (different expression)
SELECT * FROM articles
WHERE to_tsvector('simple', body) @@ to_tsquery('simple', 'search');

GIN vs. GiST

PostgreSQL offers two index types for full-text search:

Feature	GIN	GiST
Build speed	Slower	Faster
Query speed	Faster	Slower
Index size	Larger	Smaller
Update speed	Slower (use fastupdate)	Faster
Best for	Read-heavy workloads	Write-heavy workloads

GIN is the right choice for most applications. Use GiST only when you have extremely high write rates and can tolerate slower searches.

Checking Index Usage

Verify your index is being used with EXPLAIN:

EXPLAIN SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql');

Look for Bitmap Index Scan on idx_articles_search in the output.

Putting It All Together: A Search API

Here is a complete search query combining everything from this lesson:

SELECT
  title,
  category,
  ts_rank(search_vector, query, 32) AS relevance,
  ts_headline(
    'english',
    body,
    query,
    'StartSel=<b>, StopSel=</b>, MaxWords=30, MaxFragments=2'
  ) AS snippet
FROM
  articles,
  websearch_to_tsquery('english', 'postgresql performance') AS query
WHERE
  search_vector @@ query
ORDER BY
  relevance DESC
LIMIT 10;

This gives you:

Matching articles filtered by @@.
Relevance ranking with ts_rank() (normalized).
Highlighted snippets with ts_headline().
Clean, user-friendly search syntax with websearch_to_tsquery().

Combining Full-Text Search with Other Filters

Full-text search works naturally with other WHERE conditions:

-- Search within a specific category
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles,
     plainto_tsquery('english', 'optimization') AS query
WHERE search_vector @@ query
  AND category = 'Performance'
ORDER BY rank DESC;

-- Search recent articles only
SELECT title, published_at, ts_rank(search_vector, query) AS rank
FROM articles,
     plainto_tsquery('english', 'postgresql') AS query
WHERE search_vector @@ query
  AND published_at >= '2024-01-01'
ORDER BY rank DESC;

Try It Yourself

Run these queries against the articles table:

Search for articles matching "performance" and rank them by relevance. Which article scores highest?
Generate search snippets for articles matching "index" using ts_headline(). Customize the highlight markers.
Compare ts_rank and ts_rank_cd scores for the query "query optimization". Do the rankings differ?
Build a weighted search vector that gives title matches 10x the importance of body matches, then search for "postgresql".

Summary

Function	Purpose
`ts_rank(vector, query [, norm])`	Score document relevance (frequency-based)
`ts_rank_cd(vector, query [, norm])`	Score document relevance (cover density)
`ts_headline(config, text, query, opts)`	Generate highlighted snippet
`setweight(vector, 'A'\|'B'\|'C'\|'D')`	Assign importance weight to lexemes
`CREATE INDEX ... USING GIN(vector)`	Create a fast full-text search index

Concept	Recommendation
Index type	GIN for most workloads, GiST for write-heavy
Normalization	Use flag 32 for general-purpose ranking
Field weighting	A for title, B for tags/category, C for body
User-facing search	Use `websearch_to_tsquery()` for forgiving syntax
Highlighting	Always filter with `@@` before calling `ts_headline()`

Take Quiz Next Lesson

SQL Editor Ctrl+Enter to run

Results

Run a query to see results here

Tables: