Changes between Version 1 and Version 2 of AdvancedTopic


Ignore:
Timestamp:
05/26/26 13:42:32 (5 hours ago)
Author:
231550
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • AdvancedTopic

    v1 v2  
    1 = Full Text Search in VitaDiet =
     1= Full Text Search in !VitaDiet =
    22
    33== Introduciton ==
    44
    5 Full Text Search (FTS) is a technique for searching natural language text in a database. Unlike a simple LIKE search which matches exact character patterns, FTS understands language — it normalizes words to their root form, ignores common words like "the" and "a", and ranks results by relevance.
     5Full Text Search (FTS) is a technique for searching natural language text in a database. Unlike a simple LIKE search which matches exact character patterns, FTS understands language, it normalizes words to their root form, ignores common words like "the" and "a", and ranks results by relevance.
    66
    77For example a LIKE search for '%chicken%' will:
    8 *Scan every row in the table sequentially
    9 *Only match the exact string "chicken"
    10 *Return results in no particular order
    11 *Become slower as the table grows
     8* Scan every row in the table sequentially
     9* Only match the exact string "chicken"
     10* Return results in no particular order
     11* Become slower as the table grows
    1212
    13 A FTS search for 'chicken' will:
    14 *Use a GIN index to find matches instantly
    15 *Match "chicken", "chickens", "chickened"
    16 *Rank results by relevance
    17 *Perform consistently regardless of table size
     13A FTS search for 'chicken' will: [[BR]]
     14* Use a GIN index to find matches instantly
     15* Match "chicken", "chickens", "chickened"
     16* Rank results by relevance
     17* Perform consistently regardless of table size
    1818
    19 For VitaDiet, FTS is used to power the food search feature, allowing users to search 350,000+ foods quickly and accurately.
     19For !VitaDiet, FTS is used to power the food search feature, allowing users to search 350,000+ foods quickly and accurately.
    2020
    2121== How PostgreSQL FTS Works ==
     
    2323PostgreSQL FTS is built around two core data types:
    2424
    25 tsvector — a preprocessed representation of a text document. It stores the normalized lexemes (root forms of words) of a text column. For example:
    26 SELECT to_tsvector('english', 'Grilled Chicken Breast');
    27 Result: 'breast':3 'chicken':2 'grill':1
    28 *"Grilled" becomes "grill" (stemmed to root form)
    29 *Numbers indicate the position of each word
    30 *Stop words like "and", "the" are removed
     25**tsvector** - a preprocessed representation of a text document. It stores the normalized lexemes (root forms of words) of a text column. For example:
    3126
    32 tsquery — a processed search query. It converts user input into a format that can be matched against a tsvector:
    33 SELECT plainto_tsquery('english', 'grilled chicken');
    34 Result: 'grill' & 'chicken'
    35 The @@ operator checks if a tsquery matches a tsvector:
    36 SELECT to_tsvector('english', 'Grilled Chicken Breast')
    37     @@ plainto_tsquery('english', 'grilled chicken');
    38 Result: true
     27SELECT to_tsvector('english', 'Grilled Chicken Breast'); [[BR]]
     28Result: 'breast':3 'chicken':2 'grill':1 [[BR]]
     29* "Grilled" becomes "grill" (stemmed to root form) [[BR]]
     30* Numbers indicate the position of each word [[BR]]
     31* Stop words like "and", "the" are removed
     32
     33**tsquery** - a processed search query. It converts user input into a format that can be matched against a tsvector:
     34
     35SELECT plainto_tsquery('english', 'grilled chicken'); [[BR]] 
     36Result: 'grill' & 'chicken' [[BR]]
     37
     38The @@ operator checks if a tsquery matches a tsvector: [[BR]] 
     39SELECT to_tsvector('english', 'Grilled Chicken Breast') @@ plainto_tsquery('english', 'grilled chicken'); [[BR]] 
     40Result: true [[BR]]
     41
     42== Setup ==
     43
     44The first step is adding a **tsvector** column to the **Food** table and populating it:
     45
     46ALTER TABLE Food ADD COLUMN IF NOT EXISTS search_vector tsvector;
     47
     48UPDATE Food [[BR]]
     49SET search_vector = to_tsvector('english', COALESCE(name, ''));
     50
     51CREATE INDEX idx_food_fts ON Food USING GIN(search_vector);
     52
     53We use GIN (Generalized Inverted Index) because it is designed for composite values like **tsvector** where a single column contains multiple lexemes. GIN stores each lexeme separately and maps it to the rows it appears in, making it ideal for FTS. While the standard B-tree indexes work best for equality and range comparison.
     54
     55== Trigger - Keeping Search Data Up To Date ==
     56
     57Since **search_vector** is a derived column based on **name**, it needs to be updated whenever a food's name changes. Without a trigger, the search index would become stale and return incorrect results.
     58
     59CREATE FUNCTION update_food_search_vector() [[BR]]
     60RETURNS TRIGGER AS $$ [[BR]]
     61BEGIN [[BR]]
     62    NEW.search_vector := to_tsvector('english',COALESCE(NEW.name,'')); [[BR]]
     63END; [[BR]]
     64$$ LANGUAGE plpgsql; [[BR]]
     65
     66CREATE TRIGGER trg_food_search_vector [[BR]]
     67BEFORE INSERT OR UPDATE OF name ON Food [[BR]]
     68FOR EACH ROW
     69EXECUTE FUNCTION update_food_search_vector();
     70
     71* Activates BEFORE INSERT or UPDATE so the **search_vector** is set before the row is written to disk
     72* Only fires on UPDATE OF name — if other columns like **calories_per_100g** are updated the trigger is skipped, avoiding unnecessary recomputation
     73* COALESCE(NEW.name, '') handles null names gracefully by treating them as empty strings
     74* NEW refers to the row being inserted or updated
     75
     76== The Search Function — search_foods ==
     77
     78CREATE OR REPLACE FUNCTION search_foods(
     79    p_query VARCHAR,
     80    p_limit INT DEFAULT 20,
     81    p_offset INT DEFAULT 0
     82)
     83RETURNS TABLE (
     84    food_id         BIGINT,
     85    food_name       VARCHAR,
     86    highlighted     TEXT,
     87    category        VARCHAR,
     88    calories        INT,
     89    rank_score      REAL,
     90    rank_normalized REAL
     91) AS $$
     92DECLARE
     93    v_tsquery tsquery;
     94BEGIN
     95    v_tsquery := plainto_tsquery('english',p_query);
     96
     97    RETURN QUERY
     98    SELECT
     99        f.id,
     100        f.name,
     101        ts_headline(
     102            'english', f.name,v_tsquery,
     103            'StartSel=<b>, StopSel=</b>, MaxWords=10, MinWords=1'
     104        ) AS highlighted,
     105        c.name,
     106        f.calories_per_100g,
     107        ts_rank(f.search_vector, v_tsquery) AS rank_score,
     108        ts_rank_cd(f.search_vector, v_tsquery) AS rank_normalized
     109    FROM Food f
     110    LEFT JOIN Category c ON f.category_id = c.category_id
     111    WHERE f.search_vector @@ v_tsquery
     112    ORDER BY rank_score DESC
     113    LIMIT p_limit
     114    OFFSET p_offset;
     115END;
     116$$ LANGUAGE plpgsql;