= Recommendation System for worker feed with vector database =

For our recommendation system, we used a vector database so we can recommend task requests on the worker's feed so that they are closest to the tasks that the worker usually takes on.

For our recommendation system we will try to suggest task requests that are closest to the user's past completed tasks with a fallback option, that is, if the worker hasn't completed any tasks yet we will try to match the task requests closest to his experience description for the particular category. For this purpose we added a column in the `worker_category` table called `experience_description`. This is supposed to be the description that the worker has provided when signing up for the given category.

We will look at how both of the strategies are implemented.

== 1. Personalized Recommendations ==

This is the recommendation system that recommends open task requests to workers that have the most similar completed tasks. For this purpose we first embedded the open task requests and then as a starting point for our system we embedded three task requests that resulted in a completed task for each worker. We found this number of tasks enough as a starting point for our app. We built these embeddings with the script [attachment:embed_taskrequests.py] and we used a table to store these vectors:

{{{
#!sql
CREATE TABLE task_request_embeddings (
    task_request_id INT PRIMARY KEY
        REFERENCES TaskRequest(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE,

    embedding vector(384) NOT NULL,

    embedded_at TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP
);
}}}

Once we embedded all the task requests we needed — the completed ones so we can build the worker profiles and the open ones so we can recommend them — we continued with making the worker recommendation profiles. We did this with a Python script [attachment:embed_worker_profiles.py] and we stored them in the table:

{{{
#!sql
CREATE TABLE worker_recommendation_profiles (
    worker_id INT PRIMARY KEY
        REFERENCES Worker(id)
        ON DELETE CASCADE,
    preference_embedding vector(384),
    updated_at TIMESTAMP NOT NULL
        DEFAULT CURRENT_TIMESTAMP
);
}}}

With all this done, we are ready to compare the worker recommendation profile embeddings with the open task request ones and build the worker feed. We did this with the function `recommend_task_requests_for_worker`:

{{{
#!sql
CREATE OR REPLACE FUNCTION
recommend_task_requests_for_worker
(
    p_worker_id BIGINT,
    p_limit INT DEFAULT 50
)
RETURNS TABLE
(
    task_request_id BIGINT,
    distance DOUBLE PRECISION,
    recommendation_type TEXT
)
LANGUAGE plpgsql
AS $$
BEGIN
    IF EXISTS
    (
        SELECT 1
        FROM worker_recommendation_profiles wrp
        WHERE wrp.worker_id = p_worker_id
          AND wrp.preference_embedding IS NOT NULL
    )
    THEN
        RETURN QUERY
        SELECT
            tr.id::BIGINT,
            tre.embedding
            <=>
            wrp.preference_embedding
            AS distance,
            'PERSONALIZED'::TEXT
        FROM TaskRequest tr
        JOIN task_request_embeddings tre
            ON tre.task_request_id = tr.id
        JOIN worker_recommendation_profiles wrp
            ON wrp.worker_id = p_worker_id
        WHERE
            tre.embedding IS NOT NULL
            AND tr.id NOT IN
            (
                SELECT
                    o.task_request_id
                FROM Offer o
                WHERE
                    o.worker_id = p_worker_id
            )
         ORDER BY
             distance ASC
        LIMIT
            p_limit;
    ELSE
        RETURN QUERY
        SELECT
            tr.id::BIGINT,
            tre.embedding
            <=>
            wce.embedding
            AS distance,
            'CATEGORY_BASED'::TEXT
        FROM TaskRequest tr
        JOIN taskrequestvectorv3 tre
            ON tre.task_request_id = tr.id
        JOIN WorkerCategory wc
            ON wc.worker_id = p_worker_id
        JOIN workercategoryvectorv3 wce
            ON wce.worker_category_id = wc.id
        WHERE
            tre.embedding IS NOT NULL
            AND tr.id NOT IN
            (
                SELECT
                    o.task_request_id
                FROM Offer o
                WHERE
                    o.worker_id = p_worker_id
            )
        ORDER BY
            distance ASC
        LIMIT
            p_limit;
    END IF;
END;
$$;
}}}

For faster search we used the indexes:

{{{
#!sql
CREATE INDEX IF NOT EXISTS
idx_task_request_embeddings_hnsw
ON task_request_embeddings
USING hnsw
( embedding vector_cosine_ops );

CREATE INDEX IF NOT EXISTS
idx_worker_category_embeddings_hnsw
ON workercategoryvectorv3
USING hnsw
( embedding vector_cosine_ops );
}}}

We can now use:

{{{
#!sql
SELECT *
FROM recommend_task_requests_for_worker(19, 100);
}}}

to get the recommended 100 task requests for the worker with `worker_id = 19`.

We can see that in the `ELSE` block in `recommend_task_requests_for_worker` we have a different comparison than the one we explained. That is the one we use as a fallback, which we will discuss in section 2.

=== Keeping Worker Profiles Up to Date ===

Now we want to build the worker recommendation profiles as time passes and as workers complete tasks. As we said we started with 3, but that will change as the worker completes more tasks. For this purpose we will recalculate the worker recommendation profile every time the worker completes a task. For this purpose we define:

{{{
#!sql
CREATE OR REPLACE FUNCTION
recalculate_worker_profile(
    p_worker_id BIGINT
)
RETURNS void
LANGUAGE plpgsql
AS
$$
BEGIN
    INSERT INTO
    worker_recommendation_profiles
    (
        worker_id,
        preference_embedding,
        updated_at
    )
    SELECT
        o.worker_id,
        AVG(tre.embedding),
        CURRENT_TIMESTAMP
    FROM Task t
    JOIN Offer o
        ON o.id = t.offer_id
    JOIN task_request_embeddings tre
        ON tre.task_request_id =
           o.task_request_id
    WHERE
        o.worker_id = p_worker_id
        AND t.status = 'COMPLETED'
    GROUP BY
        o.worker_id
    ON CONFLICT (worker_id)
    DO UPDATE SET
        preference_embedding =
        EXCLUDED.preference_embedding,
        updated_at =
        CURRENT_TIMESTAMP;
END;
$$;

CREATE OR REPLACE FUNCTION
trg_recalculate_worker_profile()
RETURNS trigger
LANGUAGE plpgsql
AS
$$
DECLARE
    v_worker_id BIGINT;
BEGIN
    IF NEW.status = 'COMPLETED'
       AND (TG_OP = 'INSERT' OR OLD.status IS DISTINCT FROM NEW.status)
    THEN
        SELECT o.worker_id
        INTO v_worker_id
        FROM Offer o
        WHERE o.id = NEW.offer_id;
        PERFORM recalculate_worker_profile(v_worker_id);
    END IF;
    RETURN NEW;
END;
$$;

CREATE TRIGGER after_task_refresh_worker_profile
AFTER INSERT OR UPDATE OF status
ON Task
FOR EACH ROW
EXECUTE FUNCTION trg_recalculate_worker_profile();
}}}

With this we keep the worker's profile up to date with the worker's completed tasks.

For this approach to work, every new task request submitted by a client must be embedded before it can participate in the recommendation process.

== 2. Category Based Recommendation ==

If the worker has no completed tasks we will use the experience description that the worker provided when signing up for a category.

For this purpose we embed the `worker_category` experience description and we get one embedding per category per worker. We do this in the script [attachment:embed_worker_category_description.py]. We also embed the task requests with their description with the script [attachment:embed_task_requests_description.py] and do the matching this way. We do this in the `ELSE` block of the `recommend_task_requests_for_worker` function.

Similarly, new worker category descriptions must also be embedded before they can be used for category-based recommendations.

For saving the results from the scripts and for faster search we use these tables and indexes:

{{{
#!sql
CREATE TABLE worker_category_description_vector (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    worker_category_id INT NOT NULL UNIQUE,
    embedding vector(384) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (worker_category_id)
    REFERENCES WorkerCategory(id)
    ON DELETE CASCADE
);

CREATE TABLE task_request_description_vector (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    task_request_id INT NOT NULL UNIQUE,
    embedding vector(384) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (task_request_id)
    REFERENCES TaskRequest(id)
    ON DELETE CASCADE
);

CREATE INDEX idx_worker_category_embeddings_hnsw
ON WorkerCategoryVectorv3
USING hnsw (embedding vector_cosine_ops);

CREATE INDEX idx_task_request_desc_embeddings_hnsw
ON TaskRequestVectorv3
USING hnsw (embedding vector_cosine_ops);
}}}

= Worker Recommendation for Task Requests =

Besides recommending task requests to workers, the system also supports recommending suitable workers for a specific task request.

The function compares the embedding of the selected task request with the stored worker profile embeddings and ranks the workers according to their vector similarity. Workers who have already submitted an offer for the given task request are excluded from the results.

{{{#!sql
CREATE OR REPLACE FUNCTION
recommend_workers_for_task_request
(
    p_task_request_id BIGINT,
    p_limit INT DEFAULT 20
)
RETURNS TABLE
(
    worker_id BIGINT,
    distance DOUBLE PRECISION,
    recommendation_type TEXT
)
LANGUAGE plpgsql
AS
$$
BEGIN
    RETURN QUERY
    SELECT
        wrp.worker_id::BIGINT,
        tre.embedding
        <=>
        wrp.preference_embedding
        AS distance,
        'PERSONALIZED'::TEXT
    FROM task_request_embeddings tre
    JOIN worker_recommendation_profiles wrp
        ON wrp.preference_embedding IS NOT NULL
    WHERE
        tre.task_request_id = p_task_request_id
        AND wrp.worker_id NOT IN
        (
            SELECT
                o.worker_id
            FROM Offer o
            WHERE
                o.task_request_id = p_task_request_id
        )
    ORDER BY
        distance ASC
    LIMIT
        p_limit;
END;
$$;
}}}

{{{#!sql
SELECT *
FROM recommend_workers_for_task_request(150, 20);
}}}

The query returns the 20 workers whose previous completed tasks are the most semantically similar to the selected task request.