Ranking and Indexing Search

Finding matching documents is only half the job. Users expect results ranked by relevance, with highlighted snippets showing why each result matched. This lesson covers ranking functions, result highlighting, field weighting, and GIN indexes for fast full-text search.

Ranking with ts_rank()

ts_rank() scores how well a document matches a query. Higher scores mean better matches:

SELECT
  title,
  ts_rank(search_vector, to_tsquery('english', 'postgresql')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql')
ORDER BY rank DESC;

The score is influenced by:

  • Term frequency -- how often the search terms appear in the document.
  • Proximity -- how close the terms are to each other.
  • Document length -- optionally normalized (see below).

Normalization Options

By default, longer documents score higher simply because they contain more words. You can normalize by document length:

SELECT
  title,
  ts_rank(search_vector, query, 32) AS rank  -- 32 = divide by document length
FROM articles,
     to_tsquery('english', 'performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Normalization flags (can be combined by adding):

Flag Meaning
0 Default (no normalization)
1 Divide rank by 1 + log(document length)
2 Divide rank by document length
4 Divide by mean harmonic distance between extents
8 Divide by number of unique words in document
16 Divide by 1 + log(unique words)
32 Divide by itself + 1

The most common choice is 32 (self-normalization) or 2 (length normalization).

ts_rank_cd() -- Cover Density Ranking

ts_rank_cd() uses a different algorithm that considers how close matching terms are to each other. It tends to produce better results for multi-word queries:

SELECT
  title,
  ts_rank_cd(search_vector, plainto_tsquery('english', 'query optimization')) AS rank
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'query optimization')
ORDER BY rank DESC;

Cover density ranking rewards documents where the search terms appear near each other, which usually indicates higher relevance.

Highlighting with ts_headline()

ts_headline() returns a snippet of the original text with matching terms highlighted:

SELECT
  title,
  ts_headline(
    'english',
    body,
    to_tsquery('english', 'index'),
    'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15, MaxFragments=2'
  ) AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'index')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'index')) DESC;

ts_headline Options

Option Default Description
StartSel <b> Opening highlight marker
StopSel </b> Closing highlight marker
MaxWords 35 Maximum words in a fragment
MinWords 15 Minimum words in a fragment
MaxFragments 0 Number of fragments (0 = use whole document)
ShortWord 3 Minimum word length to consider
FragmentDelimiter " ... " Text between fragments
HighlightAll FALSE Highlight all occurrences

Customized Highlighting

-- Markdown-style highlighting
SELECT
  title,
  ts_headline(
    'english',
    body,
    plainto_tsquery('english', 'postgresql'),
    'StartSel=**, StopSel=**, MaxFragments=1, MaxWords=25'
  ) AS snippet
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgresql');

Performance Note

ts_headline() is relatively expensive because it re-processes the original text. Always filter with @@ first to limit the number of rows that need highlighting:

-- Good: filter first, then highlight only matching rows
SELECT title, ts_headline('english', body, query) AS snippet
FROM articles, plainto_tsquery('english', 'search') AS query
WHERE search_vector @@ query;

-- Bad: computing headlines on every row
SELECT title, ts_headline('english', body, query) AS snippet
FROM articles, plainto_tsquery('english', 'search') AS query;

Field Weighting with setweight()

In most search applications, a match in the title should rank higher than a match in the body. Use setweight() to assign importance levels:

Weight Typical Use Default Multiplier
A Title, headings 1.0
B Abstract, tags 0.4
C Body text 0.2
D Everything else 0.1

Building a Weighted Tsvector

-- Combine weighted fields
SELECT
  setweight(to_tsvector('english', title), 'A') ||
  setweight(to_tsvector('english', COALESCE(category, '')), 'B') ||
  setweight(to_tsvector('english', body), 'C')
  AS weighted_vector
FROM articles
LIMIT 1;

Searching with Weights

SELECT
  title,
  ts_rank(
    setweight(to_tsvector('english', title), 'A') ||
    setweight(to_tsvector('english', body), 'C'),
    query
  ) AS rank
FROM articles,
     plainto_tsquery('english', 'postgresql') AS query
WHERE (
  setweight(to_tsvector('english', title), 'A') ||
  setweight(to_tsvector('english', body), 'C')
) @@ query
ORDER BY rank DESC;

A match in the title (weight A) now scores roughly 5x higher than a match in the body (weight C).

Pre-Computing Weighted Vectors

For production use, store the weighted vector in a column:

-- Update the search_vector to include weights
UPDATE articles
SET search_vector =
  setweight(to_tsvector('english', title), 'A') ||
  setweight(to_tsvector('english', COALESCE(category, '')), 'B') ||
  setweight(to_tsvector('english', body), 'C');

Then queries become simple and fast:

SELECT title,
       ts_rank(search_vector, query) AS rank
FROM articles,
     plainto_tsquery('english', 'postgresql') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;

Custom Weight Values

You can override the default weight multipliers:

-- {D, C, B, A} weights
SELECT ts_rank(
  '{0.1, 0.2, 0.4, 1.0}',  -- custom weights for D, C, B, A
  search_vector,
  to_tsquery('english', 'postgresql')
) AS rank
FROM articles;

GIN Indexes for Full-Text Search

Without an index, every full-text search query scans the entire table. For tables with thousands or millions of rows, you need a GIN index:

-- Create a GIN index on the pre-computed search_vector
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Index on an Expression

If you do not have a pre-computed column, you can index the expression directly:

CREATE INDEX idx_articles_body_search
ON articles
USING GIN (to_tsvector('english', body));

When using an expression index, your query must match the expression exactly:

-- This uses the index (matches the expression)
SELECT * FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search');

-- This does NOT use the index (different expression)
SELECT * FROM articles
WHERE to_tsvector('simple', body) @@ to_tsquery('simple', 'search');

GIN vs. GiST

PostgreSQL offers two index types for full-text search:

Feature GIN GiST
Build speed Slower Faster
Query speed Faster Slower
Index size Larger Smaller
Update speed Slower (use fastupdate) Faster
Best for Read-heavy workloads Write-heavy workloads

GIN is the right choice for most applications. Use GiST only when you have extremely high write rates and can tolerate slower searches.

Checking Index Usage

Verify your index is being used with EXPLAIN:

EXPLAIN SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql');

Look for Bitmap Index Scan on idx_articles_search in the output.

Putting It All Together: A Search API

Here is a complete search query combining everything from this lesson:

SELECT
  title,
  category,
  ts_rank(search_vector, query, 32) AS relevance,
  ts_headline(
    'english',
    body,
    query,
    'StartSel=<b>, StopSel=</b>, MaxWords=30, MaxFragments=2'
  ) AS snippet
FROM
  articles,
  websearch_to_tsquery('english', 'postgresql performance') AS query
WHERE
  search_vector @@ query
ORDER BY
  relevance DESC
LIMIT 10;

This gives you:

  • Matching articles filtered by @@.
  • Relevance ranking with ts_rank() (normalized).
  • Highlighted snippets with ts_headline().
  • Clean, user-friendly search syntax with websearch_to_tsquery().

Combining Full-Text Search with Other Filters

Full-text search works naturally with other WHERE conditions:

-- Search within a specific category
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles,
     plainto_tsquery('english', 'optimization') AS query
WHERE search_vector @@ query
  AND category = 'Performance'
ORDER BY rank DESC;

-- Search recent articles only
SELECT title, published_at, ts_rank(search_vector, query) AS rank
FROM articles,
     plainto_tsquery('english', 'postgresql') AS query
WHERE search_vector @@ query
  AND published_at >= '2024-01-01'
ORDER BY rank DESC;

Try It Yourself

Run these queries against the articles table:

  1. Search for articles matching "performance" and rank them by relevance. Which article scores highest?
  2. Generate search snippets for articles matching "index" using ts_headline(). Customize the highlight markers.
  3. Compare ts_rank and ts_rank_cd scores for the query "query optimization". Do the rankings differ?
  4. Build a weighted search vector that gives title matches 10x the importance of body matches, then search for "postgresql".

Summary

Function Purpose
ts_rank(vector, query [, norm]) Score document relevance (frequency-based)
ts_rank_cd(vector, query [, norm]) Score document relevance (cover density)
ts_headline(config, text, query, opts) Generate highlighted snippet
setweight(vector, 'A'|'B'|'C'|'D') Assign importance weight to lexemes
CREATE INDEX ... USING GIN(vector) Create a fast full-text search index
Concept Recommendation
Index type GIN for most workloads, GiST for write-heavy
Normalization Use flag 32 for general-purpose ranking
Field weighting A for title, B for tags/category, C for body
User-facing search Use websearch_to_tsquery() for forgiving syntax
Highlighting Always filter with @@ before calling ts_headline()
Take Quiz Next Lesson
SQL Editor Ctrl+Enter to run
Results
Run a query to see results here
Tables: