= Full Text Search in !VitaDiet = == Introduciton == Full Text Search (FTS) is a technique for searching natural language text in a database. Unlike a simple LIKE search which matches exact character patterns, FTS understands language, it normalizes words to their root form, ignores common words like "the" and "a", and ranks results by relevance. For example a LIKE search for '%chicken%' will: * Scan every row in the table sequentially * Only match the exact string "chicken" * Return results in no particular order * Become slower as the table grows A FTS search for 'chicken' will: [[BR]] * Use a GIN index to find matches instantly * Match "chicken", "chickens", "chickened" * Rank results by relevance * Perform consistently regardless of table size For !VitaDiet, FTS is used to power the food search feature, allowing users to search 350,000+ foods quickly and accurately. == How PostgreSQL FTS Works == PostgreSQL FTS is built around two core data types: **tsvector** - a preprocessed representation of a text document. It stores the normalized lexemes (root forms of words) of a text column. For example: SELECT to_tsvector('english', 'Grilled Chicken Breast'); [[BR]] Result: 'breast':3 'chicken':2 'grill':1 [[BR]] * "Grilled" becomes "grill" (stemmed to root form) [[BR]] * Numbers indicate the position of each word [[BR]] * Stop words like "and", "the" are removed **tsquery** - a processed search query. It converts user input into a format that can be matched against a tsvector: SELECT plainto_tsquery('english', 'grilled chicken'); [[BR]] Result: 'grill' & 'chicken' [[BR]] The @@ operator checks if a tsquery matches a tsvector: [[BR]] SELECT to_tsvector('english', 'Grilled Chicken Breast') @@ plainto_tsquery('english', 'grilled chicken'); [[BR]] Result: true [[BR]] == Setup == The first step is adding a **tsvector** column to the **Food** table and populating it: ALTER TABLE Food ADD COLUMN IF NOT EXISTS search_vector tsvector; UPDATE Food [[BR]] SET search_vector = to_tsvector('english', COALESCE(name, '')); CREATE INDEX idx_food_fts ON Food USING GIN(search_vector); We use GIN (Generalized Inverted Index) because it is designed for composite values like **tsvector** where a single column contains multiple lexemes. GIN stores each lexeme separately and maps it to the rows it appears in, making it ideal for FTS. While the standard B-tree indexes work best for equality and range comparison. == Trigger - Keeping Search Data Up To Date == Since **search_vector** is a derived column based on **name**, it needs to be updated whenever a food's name changes. Without a trigger, the search index would become stale and return incorrect results. CREATE FUNCTION update_food_search_vector() [[BR]] RETURNS TRIGGER AS $$ [[BR]] BEGIN [[BR]] NEW.search_vector := to_tsvector('english',COALESCE(NEW.name,'')); [[BR]] END; [[BR]] $$ LANGUAGE plpgsql; [[BR]] CREATE TRIGGER trg_food_search_vector [[BR]] BEFORE INSERT OR UPDATE OF name ON Food [[BR]] FOR EACH ROW EXECUTE FUNCTION update_food_search_vector(); * Activates BEFORE INSERT or UPDATE so the **search_vector** is set before the row is written to disk * Only fires on UPDATE OF name — if other columns like **calories_per_100g** are updated the trigger is skipped, avoiding unnecessary recomputation * COALESCE(NEW.name, '') handles null names gracefully by treating them as empty strings * NEW refers to the row being inserted or updated == The Search Function — search_foods == CREATE OR REPLACE FUNCTION search_foods( p_query VARCHAR, p_limit INT DEFAULT 20, p_offset INT DEFAULT 0 ) RETURNS TABLE ( food_id BIGINT, food_name VARCHAR, highlighted TEXT, category VARCHAR, calories INT, rank_score REAL, rank_normalized REAL ) AS $$ DECLARE v_tsquery tsquery; BEGIN v_tsquery := plainto_tsquery('english',p_query); RETURN QUERY SELECT f.id, f.name, ts_headline( 'english', f.name,v_tsquery, 'StartSel=, StopSel=, MaxWords=10, MinWords=1' ) AS highlighted, c.name, f.calories_per_100g, ts_rank(f.search_vector, v_tsquery) AS rank_score, ts_rank_cd(f.search_vector, v_tsquery) AS rank_normalized FROM Food f LEFT JOIN Category c ON f.category_id = c.category_id WHERE f.search_vector @@ v_tsquery ORDER BY rank_score DESC LIMIT p_limit OFFSET p_offset; END; $$ LANGUAGE plpgsql;