| | 2 | == Vector Database and Property Recommendation System |
| | 3 | A vector database is a database that stores data as mathematical vectors. |
| | 4 | A vector is a list of numbers that represents the meaning of some text, image, product or object. |
| | 5 | In our project, we use vectors to represent real estate properties. |
| | 6 | [[br]] |
| | 7 | Example: |
| | 8 | "Modern apartment in Skopje with parking, Wi-Fi, balcony and 4.7 rating" |
| | 9 | becomes: |
| | 10 | |
| | 11 | {{{[0.021, -0.112, 0.334, ..., 0.087]}}} |
| | 12 | We need a vector database to build a recommendation system based on similarity. |
| | 13 | |
| | 14 | We added an embedding column to the properties table: |
| | 15 | |
| | 16 | {{{ |
| | 17 | ALTER TABLE properties |
| | 18 | ADD COLUMN embedding vector(384); |
| | 19 | }}} |
| | 20 | |
| | 21 | This column stores the semantic meaning of each property.We created a database view:{{{vw_property_embedding_text}}} |
| | 22 | This view prepares one text description for every property.This text is then converted into an embedding. |
| | 23 | [[br]] |
| | 24 | [[br]] |
| | 25 | The user recommendation profile represents the user’s preferences. |
| | 26 | In our system, we do not ask the user to manually write preferences. |
| | 27 | Instead, we learn preferences from the properties they have already booked. |
| | 28 | [[br]] |
| | 29 | We create a function that recalculates the user preference embedding. |
| | 30 | {{{ |
| | 31 | CREATE OR REPLACE FUNCTION refresh_user_recommendation_profile( |
| | 32 | p_user_id BIGINT |
| | 33 | ) |
| | 34 | RETURNS void |
| | 35 | LANGUAGE plpgsql |
| | 36 | AS $$ |
| | 37 | BEGIN |
| | 38 | INSERT INTO user_recommendation_profiles ( |
| | 39 | user_id, |
| | 40 | preference_embedding, |
| | 41 | updated_at |
| | 42 | ) |
| | 43 | SELECT |
| | 44 | u.user_id, |
| | 45 | AVG(p.embedding), |
| | 46 | CURRENT_TIMESTAMP |
| | 47 | FROM users u |
| | 48 | JOIN guests g |
| | 49 | ON g.user_id = u.user_id |
| | 50 | JOIN bookings b |
| | 51 | ON b.guest_id = g.guest_id |
| | 52 | JOIN rooms r |
| | 53 | ON r.room_id = b.room_id |
| | 54 | JOIN properties p |
| | 55 | ON p.property_id = r.property_id |
| | 56 | WHERE u.user_id = p_user_id |
| | 57 | AND p.embedding IS NOT NULL |
| | 58 | AND b.booking_status IN ('CONFIRMED', 'COMPLETED') |
| | 59 | GROUP BY u.user_id |
| | 60 | ON CONFLICT (user_id) |
| | 61 | DO UPDATE SET |
| | 62 | preference_embedding = EXCLUDED.preference_embedding, |
| | 63 | updated_at = CURRENT_TIMESTAMP; |
| | 64 | END; |
| | 65 | $$; |
| | 66 | }}} |
| | 67 | [[br]] |
| | 68 | We use: |
| | 69 | |
| | 70 | <=> |
| | 71 | |
| | 72 | This is the cosine distance operator from pgvector. |
| | 73 | [[br]] |
| | 74 | The recommendation query returns properties that are most similar to the user profile. |
| | 75 | Use: |
| | 76 | {{{ |
| | 77 | SELECT * |
| | 78 | FROM recommend_properties_for_user(1, 10); |
| | 79 | }}} |
| | 80 | This returns the top 10 recommended properties for user 1. |
| | 81 | === Index |
| | 82 | To make vector search faster, we add an HNSW index: |
| | 83 | {{{ |
| | 84 | CREATE INDEX IF NOT EXISTS idx_properties_embedding_hnsw |
| | 85 | ON properties |
| | 86 | USING hnsw (embedding vector_cosine_ops); |
| | 87 | }}} |