Changes between Version 1 and Version 2 of AdvancedTopics


Ignore:
Timestamp:
06/08/26 22:32:44 (13 days ago)
Author:
231035
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • AdvancedTopics

    v1 v2  
    11= Advanced Topics
     2== Vector Database and Property Recommendation System
     3A vector database is a database that stores data as mathematical vectors.
     4A vector is a list of numbers that represents the meaning of some text, image, product or object.
     5In our project, we use vectors to represent real estate properties.
     6[[br]]
     7Example:
     8"Modern apartment in Skopje with parking, Wi-Fi, balcony and 4.7 rating"
     9becomes:
     10
     11{{{[0.021, -0.112, 0.334, ..., 0.087]}}}
     12We need a vector database to build a recommendation system based on similarity.
     13
     14We added an embedding column to the properties table:
     15
     16{{{
     17ALTER TABLE properties
     18ADD COLUMN embedding vector(384);
     19}}}
     20
     21This column stores the semantic meaning of each property.We created a database view:{{{vw_property_embedding_text}}}
     22This view prepares one text description for every property.This text is then converted into an embedding.
     23[[br]]
     24[[br]]
     25The user recommendation profile represents the user’s preferences.
     26In our system, we do not ask the user to manually write preferences.
     27Instead, we learn preferences from the properties they have already booked.
     28[[br]]
     29We create a function that recalculates the user preference embedding.
     30{{{
     31CREATE OR REPLACE FUNCTION refresh_user_recommendation_profile(
     32    p_user_id BIGINT
     33)
     34RETURNS void
     35LANGUAGE plpgsql
     36AS $$
     37BEGIN
     38    INSERT INTO user_recommendation_profiles (
     39        user_id,
     40        preference_embedding,
     41        updated_at
     42    )
     43    SELECT
     44        u.user_id,
     45        AVG(p.embedding),
     46        CURRENT_TIMESTAMP
     47    FROM users u
     48    JOIN guests g
     49        ON g.user_id = u.user_id
     50    JOIN bookings b
     51        ON b.guest_id = g.guest_id
     52    JOIN rooms r
     53        ON r.room_id = b.room_id
     54    JOIN properties p
     55        ON p.property_id = r.property_id
     56    WHERE u.user_id = p_user_id
     57      AND p.embedding IS NOT NULL
     58      AND b.booking_status IN ('CONFIRMED', 'COMPLETED')
     59    GROUP BY u.user_id
     60    ON CONFLICT (user_id)
     61    DO UPDATE SET
     62        preference_embedding = EXCLUDED.preference_embedding,
     63        updated_at = CURRENT_TIMESTAMP;
     64END;
     65$$;
     66}}}
     67[[br]]
     68We use:
     69
     70<=>
     71
     72This is the cosine distance operator from pgvector.
     73[[br]]
     74The recommendation query returns properties that are most similar to the user profile.
     75Use:
     76{{{
     77SELECT *
     78FROM recommend_properties_for_user(1, 10);
     79}}}
     80This returns the top 10 recommended properties for user 1.
     81=== Index
     82To make vector search faster, we add an HNSW index:
     83{{{
     84CREATE INDEX IF NOT EXISTS idx_properties_embedding_hnsw
     85ON properties
     86USING hnsw (embedding vector_cosine_ops);
     87}}}