We can't find the internet
Attempting to reconnect
Something went wrong!
Hang in there while we get back on track
Tsvector and Tsquery
PostgreSQL includes a powerful full-text search engine right in the database. Unlike LIKE or regular expressions which match character patterns, full-text search understands language: it stems words, removes noise words, and ranks results by relevance.
Why Not Just Use LIKE?
-- LIKE has serious limitations:
SELECT * FROM articles WHERE body LIKE '%database%';
Problems with LIKE:
- No stemming -- searching for "databases" will not match "database".
- No stop words -- searching for "the" matches everything.
- No ranking -- you cannot sort results by relevance.
- Slow --
%word%cannot use a regular B-tree index; it scans every row.
Full-text search solves all of these problems.
Core Concepts
Full-text search revolves around two data types:
| Type | Purpose | Example |
|---|---|---|
tsvector |
A processed document -- sorted list of distinct lexemes (stems) with positions | 'databas':1 'postgresql':2 'search':3 |
tsquery |
A search condition -- lexemes combined with boolean operators | 'database' & 'search' |
The @@ operator tests whether a tsvector matches a tsquery.
Creating Tsvectors with to_tsvector()
The to_tsvector() function converts plain text into a tsvector:
SELECT to_tsvector('english', 'PostgreSQL databases are powerful and fast');
Result:
'databas':2 'fast':6 'power':4 'postgresql':1
Notice what happened:
- Stemming: "databases" became "databas", "powerful" became "power".
- Stop words removed: "are" and "and" are gone.
- Positions recorded: each lexeme has a position number.
- Lowercased: "PostgreSQL" became "postgresql".
Text Search Configurations
The first argument is the configuration that controls processing:
-- English stemming (default in most installations)
SELECT to_tsvector('english', 'The running dogs quickly jumped');
-- Result: 'dog':3 'jump':5 'quick':4 'run':2
-- Simple: no stemming, just lowercase and split
SELECT to_tsvector('simple', 'The running dogs quickly jumped');
-- Result: 'dogs':3 'jumped':5 'quickly':4 'running':2 'the':1
Common configurations: english, simple, spanish, french, german, russian.
Creating Tsqueries
to_tsquery() -- Manual Control
You write the boolean operators yourself:
-- Single term
SELECT to_tsquery('english', 'database');
-- Result: 'databas'
-- AND: both must be present
SELECT to_tsquery('english', 'database & search');
-- Result: 'databas' & 'search'
-- OR: either can be present
SELECT to_tsquery('english', 'database | search');
-- Result: 'databas' | 'search'
-- NOT: first must be present, second must not
SELECT to_tsquery('english', 'database & !oracle');
-- Result: 'databas' & !'oracl'
-- Phrase (adjacent words): <-> means "followed by"
SELECT to_tsquery('english', 'full <-> text <-> search');
-- Result: 'full' <-> 'text' <-> 'search'
-- Within N words: <2> means "within 2 words"
SELECT to_tsquery('english', 'database <2> performance');
plainto_tsquery() -- Simple User Input
Converts plain text to a tsquery with & (AND) between words. Best for simple search boxes:
SELECT plainto_tsquery('english', 'database performance tuning');
-- Result: 'databas' & 'perform' & 'tune'
No special syntax is needed -- all words are ANDed together.
phraseto_tsquery() -- Phrase Search
Words must appear consecutively in order:
SELECT phraseto_tsquery('english', 'full text search');
-- Result: 'full' <-> 'text' <-> 'search'
websearch_to_tsquery() -- Google-like Syntax
Supports intuitive search syntax that users expect:
-- AND (default for adjacent words)
SELECT websearch_to_tsquery('english', 'database performance');
-- Result: 'databas' & 'perform'
-- OR
SELECT websearch_to_tsquery('english', 'database OR nosql');
-- Result: 'databas' | 'nosql'
-- Exclude with -
SELECT websearch_to_tsquery('english', 'database -oracle');
-- Result: 'databas' & !'oracl'
-- Exact phrase with quotes
SELECT websearch_to_tsquery('english', '"full text search"');
-- Result: 'full' <-> 'text' <-> 'search'
websearch_to_tsquery() is the best choice for user-facing search because it never raises a syntax error -- malformed input is handled gracefully.
The @@ Match Operator
Now combine the two sides:
-- Does this document match this query?
SELECT to_tsvector('english', 'PostgreSQL has excellent full text search')
@@
to_tsquery('english', 'search & text');
-- Result: true
SELECT to_tsvector('english', 'PostgreSQL has excellent full text search')
@@
to_tsquery('english', 'search & oracle');
-- Result: false
Searching the Articles Table
The articles table has a pre-computed search_vector column of type TSVECTOR. Let us search it:
-- Find articles mentioning "index"
SELECT title, category
FROM articles
WHERE search_vector @@ to_tsquery('english', 'index');
You can also build the tsvector on the fly from the body column:
-- Search the body text directly
SELECT title, category
FROM articles
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'query performance');
Using the pre-computed search_vector column is faster because it avoids re-processing the text on every query.
Searching Multiple Columns
Combine columns by concatenating them:
SELECT title, category
FROM articles
WHERE to_tsvector('english', title || ' ' || body)
@@ plainto_tsquery('english', 'postgresql');
Or concatenate the tsvectors with ||:
SELECT title, category
FROM articles
WHERE (to_tsvector('english', title) || to_tsvector('english', body))
@@ plainto_tsquery('english', 'postgresql');
Practical Search Queries
-- Articles about either "join" or "subquery"
SELECT title, category
FROM articles
WHERE search_vector @@ to_tsquery('english', 'join | subquery');
-- Articles about "performance" but not "index"
SELECT title, category
FROM articles
WHERE search_vector @@ to_tsquery('english', 'performance & !index');
-- Articles matching a phrase
SELECT title, category
FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'query optimization');
-- Using websearch syntax from user input
SELECT title, category
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'PostgreSQL "window functions"');
Pre-Computing Tsvectors
For large tables, computing to_tsvector() on every query is expensive. The articles table already has a search_vector column, but here is how you would set that up:
-- Add a tsvector column
ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;
-- Populate it
UPDATE articles
SET search_vector = to_tsvector('english', title || ' ' || body);
-- Keep it in sync with a trigger
CREATE FUNCTION fn_articles_search_vector()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
NEW.search_vector := to_tsvector('english', NEW.title || ' ' || NEW.body);
RETURN NEW;
END;
$$;
CREATE TRIGGER trg_articles_search_vector
BEFORE INSERT OR UPDATE OF title, body ON articles
FOR EACH ROW
EXECUTE FUNCTION fn_articles_search_vector();
Understanding Lexemes and Positions
You can inspect how PostgreSQL processes your text:
-- See all the lexemes and their positions
SELECT * FROM ts_debug('english', 'The quick brown foxes jumped over lazy dogs');
This shows each token, what dictionary handled it, and the resulting lexeme. It is invaluable for understanding why a search does or does not match.
-- See the tokens
SELECT alias, description, token, dictionaries, lexemes
FROM ts_debug('english', 'Running databases quickly');
Try It Yourself
Run these queries against the articles table in the sandbox:
- Find all articles that contain the word "PostgreSQL" using full-text search on the
bodycolumn. - Search for articles about "performance" OR "optimization" using the
search_vectorcolumn. - Try searching with
websearch_to_tsqueryusing the search string:database -beginner. - Compare the results of
plainto_tsqueryvs.phraseto_tsqueryfor "text search".
Summary
| Function | Purpose | Example |
|---|---|---|
to_tsvector(config, text) |
Convert text to searchable tsvector | to_tsvector('english', 'Hello world') |
to_tsquery(config, expr) |
Parse tsquery with manual operators | to_tsquery('english', 'cat & dog') |
plainto_tsquery(config, text) |
Plain text to AND-joined tsquery | plainto_tsquery('english', 'cat dog') |
phraseto_tsquery(config, text) |
Phrase (consecutive words) tsquery | phraseto_tsquery('english', 'full text') |
websearch_to_tsquery(config, text) |
Google-style search syntax | websearch_to_tsquery('english', 'cat -dog') |
@@ |
Match operator | tsvector @@ tsquery |
ts_debug(config, text) |
Inspect tokenization and stemming | ts_debug('english', 'running') |