wiki:AdvancedTopic

Version 2 (modified by 231550, 3 hours ago) ( diff )

--

Full Text Search in VitaDiet

Introduciton

Full Text Search (FTS) is a technique for searching natural language text in a database. Unlike a simple LIKE search which matches exact character patterns, FTS understands language, it normalizes words to their root form, ignores common words like "the" and "a", and ranks results by relevance.

For example a LIKE search for '%chicken%' will:

  • Scan every row in the table sequentially
  • Only match the exact string "chicken"
  • Return results in no particular order
  • Become slower as the table grows

A FTS search for 'chicken' will:

  • Use a GIN index to find matches instantly
  • Match "chicken", "chickens", "chickened"
  • Rank results by relevance
  • Perform consistently regardless of table size

For VitaDiet, FTS is used to power the food search feature, allowing users to search 350,000+ foods quickly and accurately.

How PostgreSQL FTS Works

PostgreSQL FTS is built around two core data types:

tsvector - a preprocessed representation of a text document. It stores the normalized lexemes (root forms of words) of a text column. For example:

SELECT to_tsvector('english', 'Grilled Chicken Breast');
Result: 'breast':3 'chicken':2 'grill':1

  • "Grilled" becomes "grill" (stemmed to root form)
  • Numbers indicate the position of each word
  • Stop words like "and", "the" are removed

tsquery - a processed search query. It converts user input into a format that can be matched against a tsvector:

SELECT plainto_tsquery('english', 'grilled chicken');
Result: 'grill' & 'chicken'

The @@ operator checks if a tsquery matches a tsvector:
SELECT to_tsvector('english', 'Grilled Chicken Breast') @@ plainto_tsquery('english', 'grilled chicken');
Result: true

Setup

The first step is adding a tsvector column to the Food table and populating it:

ALTER TABLE Food ADD COLUMN IF NOT EXISTS search_vector tsvector;

UPDATE Food
SET search_vector = to_tsvector('english', COALESCE(name, ));

CREATE INDEX idx_food_fts ON Food USING GIN(search_vector);

We use GIN (Generalized Inverted Index) because it is designed for composite values like tsvector where a single column contains multiple lexemes. GIN stores each lexeme separately and maps it to the rows it appears in, making it ideal for FTS. While the standard B-tree indexes work best for equality and range comparison.

Trigger - Keeping Search Data Up To Date

Since search_vector is a derived column based on name, it needs to be updated whenever a food's name changes. Without a trigger, the search index would become stale and return incorrect results.

CREATE FUNCTION update_food_search_vector()
RETURNS TRIGGER AS $$
BEGIN

NEW.search_vector := to_tsvector('english',COALESCE(NEW.name,));

END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_food_search_vector
BEFORE INSERT OR UPDATE OF name ON Food
FOR EACH ROW EXECUTE FUNCTION update_food_search_vector();

  • Activates BEFORE INSERT or UPDATE so the search_vector is set before the row is written to disk
  • Only fires on UPDATE OF name — if other columns like calories_per_100g are updated the trigger is skipped, avoiding unnecessary recomputation
  • COALESCE(NEW.name, ) handles null names gracefully by treating them as empty strings
  • NEW refers to the row being inserted or updated

The Search Function — search_foods

CREATE OR REPLACE FUNCTION search_foods(

p_query VARCHAR, p_limit INT DEFAULT 20, p_offset INT DEFAULT 0

) RETURNS TABLE (

food_id BIGINT, food_name VARCHAR, highlighted TEXT, category VARCHAR, calories INT, rank_score REAL, rank_normalized REAL

) AS $$ DECLARE

v_tsquery tsquery;

BEGIN

v_tsquery := plainto_tsquery('english',p_query);

RETURN QUERY SELECT

f.id, f.name, ts_headline(

'english', f.name,v_tsquery, 'StartSel=<b>, StopSel=</b>, MaxWords=10, MinWords=1'

) AS highlighted, c.name, f.calories_per_100g, ts_rank(f.search_vector, v_tsquery) AS rank_score, ts_rank_cd(f.search_vector, v_tsquery) AS rank_normalized

FROM Food f LEFT JOIN Category c ON f.category_id = c.category_id WHERE f.search_vector @@ v_tsquery ORDER BY rank_score DESC LIMIT p_limit OFFSET p_offset;

END; $$ LANGUAGE plpgsql;

Note: See TracWiki for help on using the wiki.