Tsvector and Tsquery

PostgreSQL includes a powerful full-text search engine right in the database. Unlike LIKE or regular expressions which match character patterns, full-text search understands language: it stems words, removes noise words, and ranks results by relevance.

Why Not Just Use LIKE?

-- LIKE has serious limitations:
SELECT * FROM articles WHERE body LIKE '%database%';

Problems with LIKE:

  • No stemming -- searching for "databases" will not match "database".
  • No stop words -- searching for "the" matches everything.
  • No ranking -- you cannot sort results by relevance.
  • Slow -- %word% cannot use a regular B-tree index; it scans every row.

Full-text search solves all of these problems.

Core Concepts

Full-text search revolves around two data types:

Type Purpose Example
tsvector A processed document -- sorted list of distinct lexemes (stems) with positions 'databas':1 'postgresql':2 'search':3
tsquery A search condition -- lexemes combined with boolean operators 'database' & 'search'

The @@ operator tests whether a tsvector matches a tsquery.

Creating Tsvectors with to_tsvector()

The to_tsvector() function converts plain text into a tsvector:

SELECT to_tsvector('english', 'PostgreSQL databases are powerful and fast');

Result:

'databas':2 'fast':6 'power':4 'postgresql':1

Notice what happened:

  • Stemming: "databases" became "databas", "powerful" became "power".
  • Stop words removed: "are" and "and" are gone.
  • Positions recorded: each lexeme has a position number.
  • Lowercased: "PostgreSQL" became "postgresql".

Text Search Configurations

The first argument is the configuration that controls processing:

-- English stemming (default in most installations)
SELECT to_tsvector('english', 'The running dogs quickly jumped');
-- Result: 'dog':3 'jump':5 'quick':4 'run':2

-- Simple: no stemming, just lowercase and split
SELECT to_tsvector('simple', 'The running dogs quickly jumped');
-- Result: 'dogs':3 'jumped':5 'quickly':4 'running':2 'the':1

Common configurations: english, simple, spanish, french, german, russian.

Creating Tsqueries

to_tsquery() -- Manual Control

You write the boolean operators yourself:

-- Single term
SELECT to_tsquery('english', 'database');
-- Result: 'databas'

-- AND: both must be present
SELECT to_tsquery('english', 'database & search');
-- Result: 'databas' & 'search'

-- OR: either can be present
SELECT to_tsquery('english', 'database | search');
-- Result: 'databas' | 'search'

-- NOT: first must be present, second must not
SELECT to_tsquery('english', 'database & !oracle');
-- Result: 'databas' & !'oracl'

-- Phrase (adjacent words): <-> means "followed by"
SELECT to_tsquery('english', 'full <-> text <-> search');
-- Result: 'full' <-> 'text' <-> 'search'

-- Within N words: <2> means "within 2 words"
SELECT to_tsquery('english', 'database <2> performance');

plainto_tsquery() -- Simple User Input

Converts plain text to a tsquery with & (AND) between words. Best for simple search boxes:

SELECT plainto_tsquery('english', 'database performance tuning');
-- Result: 'databas' & 'perform' & 'tune'

No special syntax is needed -- all words are ANDed together.

phraseto_tsquery() -- Phrase Search

Words must appear consecutively in order:

SELECT phraseto_tsquery('english', 'full text search');
-- Result: 'full' <-> 'text' <-> 'search'

websearch_to_tsquery() -- Google-like Syntax

Supports intuitive search syntax that users expect:

-- AND (default for adjacent words)
SELECT websearch_to_tsquery('english', 'database performance');
-- Result: 'databas' & 'perform'

-- OR
SELECT websearch_to_tsquery('english', 'database OR nosql');
-- Result: 'databas' | 'nosql'

-- Exclude with -
SELECT websearch_to_tsquery('english', 'database -oracle');
-- Result: 'databas' & !'oracl'

-- Exact phrase with quotes
SELECT websearch_to_tsquery('english', '"full text search"');
-- Result: 'full' <-> 'text' <-> 'search'

websearch_to_tsquery() is the best choice for user-facing search because it never raises a syntax error -- malformed input is handled gracefully.

The @@ Match Operator

Now combine the two sides:

-- Does this document match this query?
SELECT to_tsvector('english', 'PostgreSQL has excellent full text search')
       @@
       to_tsquery('english', 'search & text');
-- Result: true

SELECT to_tsvector('english', 'PostgreSQL has excellent full text search')
       @@
       to_tsquery('english', 'search & oracle');
-- Result: false

Searching the Articles Table

The articles table has a pre-computed search_vector column of type TSVECTOR. Let us search it:

-- Find articles mentioning "index"
SELECT title, category
FROM articles
WHERE search_vector @@ to_tsquery('english', 'index');

You can also build the tsvector on the fly from the body column:

-- Search the body text directly
SELECT title, category
FROM articles
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'query performance');

Using the pre-computed search_vector column is faster because it avoids re-processing the text on every query.

Searching Multiple Columns

Combine columns by concatenating them:

SELECT title, category
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
      @@ plainto_tsquery('english', 'postgresql');

Or concatenate the tsvectors with ||:

SELECT title, category
FROM articles
WHERE (to_tsvector('english', title) || to_tsvector('english', body))
      @@ plainto_tsquery('english', 'postgresql');

Practical Search Queries

-- Articles about either "join" or "subquery"
SELECT title, category
FROM articles
WHERE search_vector @@ to_tsquery('english', 'join | subquery');

-- Articles about "performance" but not "index"
SELECT title, category
FROM articles
WHERE search_vector @@ to_tsquery('english', 'performance & !index');

-- Articles matching a phrase
SELECT title, category
FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'query optimization');

-- Using websearch syntax from user input
SELECT title, category
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'PostgreSQL "window functions"');

Pre-Computing Tsvectors

For large tables, computing to_tsvector() on every query is expensive. The articles table already has a search_vector column, but here is how you would set that up:

-- Add a tsvector column
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;

-- Populate it
UPDATE articles
SET search_vector = to_tsvector('english', title || ' ' || body);

-- Keep it in sync with a trigger
CREATE FUNCTION fn_articles_search_vector()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
  NEW.search_vector := to_tsvector('english', NEW.title || ' ' || NEW.body);
  RETURN NEW;
END;
$$;

CREATE TRIGGER trg_articles_search_vector
BEFORE INSERT OR UPDATE OF title, body ON articles
FOR EACH ROW
EXECUTE FUNCTION fn_articles_search_vector();

Understanding Lexemes and Positions

You can inspect how PostgreSQL processes your text:

-- See all the lexemes and their positions
SELECT * FROM ts_debug('english', 'The quick brown foxes jumped over lazy dogs');

This shows each token, what dictionary handled it, and the resulting lexeme. It is invaluable for understanding why a search does or does not match.

-- See the tokens
SELECT alias, description, token, dictionaries, lexemes
FROM ts_debug('english', 'Running databases quickly');

Try It Yourself

Run these queries against the articles table in the sandbox:

  1. Find all articles that contain the word "PostgreSQL" using full-text search on the body column.
  2. Search for articles about "performance" OR "optimization" using the search_vector column.
  3. Try searching with websearch_to_tsquery using the search string: database -beginner.
  4. Compare the results of plainto_tsquery vs. phraseto_tsquery for "text search".

Summary

Function Purpose Example
to_tsvector(config, text) Convert text to searchable tsvector to_tsvector('english', 'Hello world')
to_tsquery(config, expr) Parse tsquery with manual operators to_tsquery('english', 'cat & dog')
plainto_tsquery(config, text) Plain text to AND-joined tsquery plainto_tsquery('english', 'cat dog')
phraseto_tsquery(config, text) Phrase (consecutive words) tsquery phraseto_tsquery('english', 'full text')
websearch_to_tsquery(config, text) Google-style search syntax websearch_to_tsquery('english', 'cat -dog')
@@ Match operator tsvector @@ tsquery
ts_debug(config, text) Inspect tokenization and stemming ts_debug('english', 'running')
Take Quiz Next Lesson
SQL Editor Ctrl+Enter to run
Results
Run a query to see results here
Tables: