= Advanced Topics == Vector Database and Property Recommendation System A vector database is a database that stores data as mathematical vectors. A vector is a list of numbers that represents the meaning of some text, image, product or object. In our project, we use vectors to represent real estate properties. [[br]] Example: "Modern apartment in Skopje with parking, Wi-Fi, balcony and 4.7 rating" becomes: {{{[0.021, -0.112, 0.334, ..., 0.087]}}} (python script that creates the embbedings [attachment:generate_property_embeddings.py]) We need a vector database to build a recommendation system based on similarity. We added an embedding column to the properties table: {{{ ALTER TABLE properties ADD COLUMN embedding vector(384); }}} This column stores the semantic meaning of each property. Additionaly we created a new table for this phase. {{{ CREATE TABLE user_recommendation_profiles ( user_id BIGINT PRIMARY KEY REFERENCES users(user_id) ON DELETE CASCADE ON UPDATE CASCADE, preference_embedding vector(384), updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ); }}} We created a database view:{{{vw_property_embedding_text}}} This view prepares one text description for every property. This text is then converted into an embedding. The view: {{{ CREATE OR REPLACE VIEW vw_property_embedding_text AS SELECT p.property_id, CONCAT( 'Title: ', p.title, '. ', 'Description: ', COALESCE(p.description, ''), '. ', 'Listing type: ', lt.type_name, '. ', 'City: ', a.city, '. ', 'Country: ', c.country_name, '. ', 'Base price: ', p.base_price, '. ', 'Max guests: ', p.max_guests, '. ', 'Property amenities: ', COALESCE(string_agg(DISTINCT pa_am.amenity_name, ', '), ''), '. ', 'Room types: ', COALESCE(string_agg(DISTINCT rt.type_name, ', '), ''), '. ', 'Room amenities: ', COALESCE(string_agg(DISTINCT ra_am.amenity_name, ', '), ''), '. ', 'Average rating: ', COALESCE(ROUND(AVG(rv.rating)::numeric, 2), 0), '.' ) AS embedding_text FROM properties p JOIN listing_types lt ON lt.listing_type_id = p.listing_type_id JOIN addresses a ON a.address_id = p.address_id JOIN countries c ON c.country_id = a.country_id LEFT JOIN property_amenities pa ON pa.property_id = p.property_id LEFT JOIN amenities pa_am ON pa_am.amenity_id = pa.amenity_id LEFT JOIN rooms r ON r.property_id = p.property_id LEFT JOIN room_types rt ON rt.room_type_id = r.room_type_id LEFT JOIN room_amenities ra ON ra.room_id = r.room_id LEFT JOIN amenities ra_am ON ra_am.amenity_id = ra.amenity_id LEFT JOIN reviews rv ON rv.property_id = p.property_id GROUP BY p.property_id, p.title, p.description, p.base_price, p.max_guests, lt.type_name, a.city, c.country_name; }}} [[br]] [[br]] The user recommendation profile represents the user’s preferences. In our system, we do not ask the user to manually write preferences. Instead, we learn preferences from the properties they have already booked. [[br]] We create a function that recalculates the user preference embedding. {{{ CREATE OR REPLACE FUNCTION refresh_user_recommendation_profile( p_user_id BIGINT ) RETURNS void LANGUAGE plpgsql AS $$ BEGIN INSERT INTO user_recommendation_profiles ( user_id, preference_embedding, updated_at ) SELECT u.user_id, AVG(p.embedding), CURRENT_TIMESTAMP FROM users u JOIN guests g ON g.user_id = u.user_id JOIN bookings b ON b.guest_id = g.guest_id JOIN rooms r ON r.room_id = b.room_id JOIN properties p ON p.property_id = r.property_id WHERE u.user_id = p_user_id AND p.embedding IS NOT NULL AND b.booking_status IN ('CONFIRMED', 'COMPLETED') GROUP BY u.user_id ON CONFLICT (user_id) DO UPDATE SET preference_embedding = EXCLUDED.preference_embedding, updated_at = CURRENT_TIMESTAMP; END; $$; }}} [[br]] We use: <=> This is the cosine distance operator from pgvector. [[br]] The recommendation query returns properties that are most similar to the user profile. Use: {{{ SELECT * FROM recommend_properties_for_user(1, 10); }}} This returns the top 10 recommended properties for user 1. {{{ CREATE OR REPLACE FUNCTION recommend_properties_for_user( p_user_id BIGINT, p_limit INT DEFAULT 50 ) RETURNS TABLE ( property_id BIGINT, title varchar, description text, base_price NUMERIC, max_guests INTEGER, city varchar, country_name varchar, distance DOUBLE PRECISION, recommendation_type text ) LANGUAGE plpgsql AS $$ BEGIN IF EXISTS ( SELECT 1 FROM user_recommendation_profiles urp WHERE urp.user_id = p_user_id AND urp.preference_embedding IS NOT NULL ) THEN RETURN QUERY SELECT p.property_id, p.title, p.description, p.base_price, p.max_guests, a.city, c.country_name, p.embedding <=> urp.preference_embedding AS distance, 'PERSONALIZED'::text AS recommendation_type FROM properties p JOIN addresses a ON a.address_id = p.address_id JOIN countries c ON c.country_id = a.country_id JOIN user_recommendation_profiles urp ON urp.user_id = p_user_id WHERE p.embedding IS NOT NULL AND urp.preference_embedding IS NOT NULL AND p.status = 'ACTIVE' AND p.property_id NOT IN ( SELECT DISTINCT r2.property_id FROM bookings b2 JOIN guests g2 ON g2.guest_id = b2.guest_id JOIN rooms r2 ON r2.room_id = b2.room_id WHERE g2.user_id = p_user_id ) ORDER BY distance ASC LIMIT p_limit; ELSE -- korisnikot nema prev bookings RETURN QUERY SELECT p.property_id, p.title, p.description, p.base_price, p.max_guests, a.city, c.country_name, NULL::DOUBLE PRECISION AS distance, 'FALLBACK_POPULAR'::text AS recommendation_type FROM properties p JOIN addresses a ON a.address_id = p.address_id JOIN countries c ON c.country_id = a.country_id LEFT JOIN reviews rv ON rv.property_id = p.property_id LEFT JOIN rooms r ON r.property_id = p.property_id LEFT JOIN bookings b ON b.room_id = r.room_id AND b.booking_status IN ('CONFIRMED', 'COMPLETED') WHERE p.status = 'ACTIVE' GROUP BY p.property_id, p.title, p.description, p.base_price, p.max_guests, a.city, c.country_name, p.created_at ORDER BY COUNT(b.booking_id) DESC, AVG(rv.rating) DESC NULLS LAST, p.created_at DESC LIMIT p_limit; END IF; END; $$; }}} === Index To make vector search faster, we add an HNSW index: {{{ CREATE INDEX IF NOT EXISTS idx_properties_embedding_hnsw ON properties USING hnsw (embedding vector_cosine_ops); }}} === Trigger for bookings {{{ CREATE OR REPLACE FUNCTION trg_refresh_profile_after_booking() RETURNS trigger LANGUAGE plpgsql AS $$ DECLARE v_user_id BIGINT; BEGIN SELECT g.user_id INTO v_user_id FROM guests g WHERE g.guest_id = NEW.guest_id; IF NEW.booking_status IN ('CONFIRMED', 'COMPLETED') THEN PERFORM refresh_user_recommendation_profile(v_user_id); END IF; RETURN NEW; END; $$; CREATE TRIGGER after_booking_refresh_recommendation_profile AFTER INSERT OR UPDATE OF booking_status ON bookings FOR EACH ROW EXECUTE FUNCTION trg_refresh_profile_after_booking(); }}}