Changes between Version 1 and Version 2 of OtherTopics


Ignore:
Timestamp:
06/01/26 21:42:54 (3 weeks ago)
Author:
181201
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • OtherTopics

    v1 v2  
    33== Performance ==
    44
    5 === Adding 1000000 mock rows to 'bookings' table ===
     5=== 1. Adding 1000000 mock rows to 'bookings' table ===
    66
    77{{{
     
    101101Because the execution time has been massively lowered by bypassing the expensive sequential scan and memory sort, we '''keep''' this index.
    102102
    103 == Calculate average sitter rating ==
     103== 2. Calculate average sitter rating ==
    104104
    105105Benchmark query:
     
    193193
    194194There is practically no change in the execution time, and the query planner completely ignored the index because it determined that performing a Parallel Seq Scan and Parallel Hash Join was cheaper than doing individual index lookups on such a large dataset. Because the index is not utilized for this query and only adds complexity during `INSERT` and `UPDATE` operations, we will '''not''' keep this index.
     195
     196== 3. Pet Sitter Leaderboard, Analytics ==
     197
     198Benchmark query:
     199{{{
     200#!sql
     201EXPLAIN (ANALYZE, BUFFERS)
     202WITH params AS (
     203    SELECT
     204        (CURRENT_DATE - INTERVAL '1 year') AS start_date,
     205        CURRENT_DATE AS end_date
     206),
     207sitter_bookings AS (
     208    SELECT
     209        b.sitter_id,
     210        COUNT(*) AS total_bookings,
     211        COUNT(*) FILTER (WHERE b.status = 'Completed') AS completed_bookings,
     212        COUNT(*) FILTER (WHERE b.status = 'Canceled') AS canceled_bookings
     213    FROM project.bookings b
     214    JOIN params p ON b.date_from >= p.start_date AND b.date_from <= p.end_date
     215    GROUP BY b.sitter_id
     216),
     217sitter_reviews AS (
     218    SELECT
     219        b.sitter_id,
     220        AVG(r.rating)::numeric(10,2) AS avg_rating,
     221        COUNT(r.review_id) AS total_reviews
     222    FROM project.reviews r
     223    JOIN project.bookings b ON r.booking_id = b.booking_id
     224    GROUP BY b.sitter_id
     225)
     226SELECT
     227    u.username,
     228    u.first_name,
     229    u.last_name,
     230    COALESCE(sb.completed_bookings, 0) AS completed,
     231    COALESCE(sb.canceled_bookings, 0) AS canceled,
     232    COALESCE(sr.avg_rating, 0) AS rating,
     233    (
     234        COALESCE(sb.completed_bookings, 0) * 5
     235        + COALESCE(sr.avg_rating, 0) * 10
     236        - COALESCE(sb.canceled_bookings, 0) * 3
     237    ) AS activity_score,
     238    DENSE_RANK() OVER (
     239        ORDER BY (
     240            COALESCE(sb.completed_bookings, 0) * 5
     241            + COALESCE(sr.avg_rating, 0) * 10
     242            - COALESCE(sb.canceled_bookings, 0) * 3
     243        ) DESC
     244    ) AS leaderboard_rank
     245FROM project.pet_sitters ps
     246JOIN project.users u ON ps.user_id = u.user_id
     247LEFT JOIN sitter_bookings sb ON sb.sitter_id = ps.user_id
     248LEFT JOIN sitter_reviews sr ON sr.sitter_id = ps.user_id
     249ORDER BY leaderboard_rank
     250LIMIT 10;
     251}}}
     252
     253Execution without indexes:
     254
     255{{{
     256Limit  (cost=67216.49..67216.51 rows=10 width=115) (actual time=403.788..412.560 rows=4 loops=1)
     257  Buffers: shared hit=22278 read=25633
     258  ->  Sort  (cost=67216.49..67218.21 rows=690 width=115) (actual time=403.786..412.557 rows=4 loops=1)
     259        Sort Key: (dense_rank() OVER (?))
     260        Sort Method: quicksort  Memory: 25kB
     261        Buffers: shared hit=22278 read=25633
     262        ->  WindowAgg  (cost=67177.44..67201.58 rows=690 width=115) (actual time=403.716..412.514 rows=4 loops=1)
     263              Buffers: shared hit=22275 read=25633
     264              ->  Sort  (cost=67177.43..67179.15 rows=690 width=91) (actual time=403.675..412.445 rows=4 loops=1)
     265                    Sort Key: (((((COALESCE((count(*) FILTER (WHERE ((b.status)::text = 'Completed'::text))), '0'::bigint) * 5))::numeric + (COALESCE(((avg(r.rating))::numeric(10,2)), '0'::numeric) * '10'::numeric)) - ((COALESCE((count(*) FILTER (WHERE ((b.status)::text = 'Canceled'::text))), '0'::bigint) * 3))::numeric)) DESC
     266                    Sort Method: quicksort  Memory: 25kB
     267                    Buffers: shared hit=22275 read=25633
     268                    ->  Hash Join  (cost=67125.00..67144.89 rows=690 width=91) (actual time=403.585..412.387 rows=4 loops=1)
     269                          Hash Cond: ((ps.user_id)::text = (u.user_id)::text)
     270                          Buffers: shared hit=22272 read=25633
     271                          ->  Merge Left Join  (cost=67101.95..67107.94 rows=690 width=122) (actual time=403.503..412.297 rows=4 loops=1)
     272                                Merge Cond: ((ps.user_id)::text = (b_1.sitter_id)::text)
     273                                Buffers: shared hit=22272 read=25632
     274                                ->  Merge Left Join  (cost=34564.07..34567.93 rows=690 width=106) (actual time=212.894..213.031 rows=4 loops=1)
     275                                      Merge Cond: ((ps.user_id)::text = (b.sitter_id)::text)
     276                                      Buffers: shared hit=24 read=22225
     277                                      ->  Sort  (cost=49.44..51.16 rows=690 width=90) (actual time=0.049..0.051 rows=4 loops=1)
     278                                            Sort Key: ps.user_id
     279                                            Sort Method: quicksort  Memory: 25kB
     280                                            Buffers: shared read=1
     281                                            ->  Seq Scan on pet_sitters ps  (cost=0.00..16.90 rows=690 width=90) (actual time=0.021..0.022 rows=4 loops=1)
     282                                                  Buffers: shared read=1
     283                                      ->  Finalize GroupAggregate  (cost=34514.63..34515.02 rows=1 width=61) (actual time=212.840..212.970 rows=4 loops=1)
     284                                            Group Key: b.sitter_id
     285                                            Buffers: shared hit=24 read=22224
     286                                            ->  Gather Merge  (cost=34514.63..34514.99 rows=3 width=53) (actual time=212.822..212.945 rows=7 loops=1)
     287                                                  Workers Planned: 3
     288                                                  Workers Launched: 3
     289                                                  Buffers: shared hit=24 read=22224
     290                                                  ->  Sort  (cost=33514.59..33514.60 rows=1 width=53) (actual time=206.606..206.608 rows=2 loops=4)
     291                                                        Sort Key: b.sitter_id
     292                                                        Sort Method: quicksort  Memory: 25kB
     293                                                        Buffers: shared hit=24 read=22224
     294                                                        Worker 0:  Sort Method: quicksort  Memory: 25kB
     295                                                        Worker 1:  Sort Method: quicksort  Memory: 25kB
     296                                                        Worker 2:  Sort Method: quicksort  Memory: 25kB
     297                                                        ->  Partial HashAggregate  (cost=33514.57..33514.58 rows=1 width=53) (actual time=206.527..206.529 rows=2 loops=4)
     298                                                              Group Key: b.sitter_id
     299                                                              Batches: 1  Memory Usage: 24kB
     300                                                              Buffers: shared read=22224
     301                                                              Worker 0:  Batches: 1  Memory Usage: 24kB
     302                                                              Worker 1:  Batches: 1  Memory Usage: 24kB
     303                                                              Worker 2:  Batches: 1  Memory Usage: 24kB
     304                                                              ->  Parallel Seq Scan on bookings b  (cost=0.00..29482.22 rows=322588 width=45) (actual time=0.070..120.157 rows=250004 loops=4)
     305                                                                    Filter: ((date_from <= CURRENT_DATE) AND (date_from >= (CURRENT_DATE - '1 year'::interval)))
     306                                                                    Rows Removed by Filter: 1
     307                                                                    Buffers: shared read=22224
     308                                ->  Finalize GroupAggregate  (cost=32537.88..32538.27 rows=1 width=61) (actual time=190.603..199.257 rows=4 loops=1)
     309                                      Group Key: b_1.sitter_id
     310                                      Buffers: shared hit=22248 read=3407
     311                                      ->  Gather Merge  (cost=32537.88..32538.24 rows=3 width=69) (actual time=190.572..199.221 rows=7 loops=1)
     312                                            Workers Planned: 3
     313                                            Workers Launched: 3
     314                                            Buffers: shared hit=22248 read=3407
     315                                            ->  Sort  (cost=31537.84..31537.84 rows=1 width=69) (actual time=184.406..184.410 rows=2 loops=4)
     316                                                  Sort Key: b_1.sitter_id
     317                                                  Sort Method: quicksort  Memory: 25kB
     318                                                  Buffers: shared hit=22248 read=3407
     319                                                  Worker 0:  Sort Method: quicksort  Memory: 25kB
     320                                                  Worker 1:  Sort Method: quicksort  Memory: 25kB
     321                                                  Worker 2:  Sort Method: quicksort  Memory: 25kB
     322                                                  ->  Partial HashAggregate  (cost=31537.82..31537.83 rows=1 width=69) (actual time=184.336..184.340 rows=2 loops=4)
     323                                                        Group Key: b_1.sitter_id
     324                                                        Batches: 1  Memory Usage: 24kB
     325                                                        Buffers: shared hit=22224 read=3407
     326                                                        Worker 0:  Batches: 1  Memory Usage: 24kB
     327                                                        Worker 1:  Batches: 1  Memory Usage: 24kB
     328                                                        Worker 2:  Batches: 1  Memory Usage: 24kB
     329                                                        ->  Parallel Hash Join  (cost=4971.92..31268.59 rows=53846 width=41) (actual time=43.109..172.841 rows=41731 loops=4)
     330                                                              Hash Cond: ((b_1.booking_id)::text = (r.booking_id)::text)
     331                                                              Buffers: shared hit=22224 read=3407
     332                                                              ->  Parallel Seq Scan on bookings b_1  (cost=0.00..25449.88 rows=322588 width=74) (actual time=0.019..27.882 rows=250006 loops=4)
     333                                                                    Buffers: shared hit=22224
     334                                                              ->  Parallel Hash  (cost=4102.52..4102.52 rows=69552 width=41) (actual time=42.288..42.289 rows=41731 loops=4)
     335                                                                    Buckets: 262144  Batches: 1  Memory Usage: 15168kB
     336                                                                    Buffers: shared read=3407
     337                                                                    ->  Parallel Seq Scan on reviews r  (cost=0.00..4102.52 rows=69552 width=41) (actual time=0.063..16.909 rows=41731 loops=4)
     338                                                                          Buffers: shared read=3407
     339                          ->  Hash  (cost=15.80..15.80 rows=580 width=64) (actual time=0.049..0.050 rows=12 loops=1)
     340                                Buckets: 1024  Batches: 1  Memory Usage: 10kB
     341                                Buffers: shared read=1
     342                                ->  Seq Scan on users u  (cost=0.00..15.80 rows=580 width=64) (actual time=0.029..0.032 rows=12 loops=1)
     343                                      Buffers: shared read=1
     344Planning:
     345  Buffers: shared hit=402 read=29
     346Planning Time: 3.607 ms
     347Execution Time: 413.138 ms
     348}}}
     349
     350'''Average execution time (10 attempts):''' 410.225ms
     351
     352This execution time is perfectly acceptable for a heavy analytical reporting query. Because this query does not block user requests and is usually run as a background task or in an admin dashboard, we do not need to do more optimization with highly specific indexes that can slow down standard `INSERT` and `UPDATE` operations.
     353
     354== 4. Search for available sitters with highest rankings ==
     355
     356Benchmark query:
     357{{{
     358#!sql
     359EXPLAIN (ANALYZE, BUFFERS)
     360WITH requested_dates AS (
     361    SELECT
     362        (CURRENT_DATE + INTERVAL '10 days')::date AS req_start,
     363        (CURRENT_DATE + INTERVAL '15 days')::date AS req_end
     364),
     365unavailable_sitters AS (
     366    SELECT DISTINCT b.sitter_id
     367    FROM project.bookings b
     368    JOIN requested_dates rd ON true
     369    WHERE b.status IN ('Confirmed', 'Pending')
     370      AND b.date_from <= rd.req_end
     371      AND b.date_to >= rd.req_start
     372),
     373highly_rated_sitters AS (
     374    SELECT b.sitter_id
     375    FROM project.reviews r
     376    JOIN project.bookings b ON r.booking_id = b.booking_id
     377    GROUP BY b.sitter_id
     378    HAVING AVG(r.rating) >= 4.0
     379)
     380SELECT
     381    u.username,
     382    u.first_name,
     383    u.last_name,
     384    u.email
     385FROM project.pet_sitters ps
     386JOIN project.users u ON ps.user_id = u.user_id
     387JOIN highly_rated_sitters hrs ON hrs.sitter_id = ps.user_id
     388LEFT JOIN unavailable_sitters us ON us.sitter_id = ps.user_id
     389WHERE us.sitter_id IS NULL
     390LIMIT 20;
     391}}}
     392
     393Execution without indexes:
     394
     395{{{
     396Limit  (cost=66246.09..66295.07 rows=1 width=51) (actual time=279.739..287.004 rows=3 loops=1)
     397  Buffers: shared hit=47892 read=2
     398  ->  Nested Loop Anti Join  (cost=66246.09..66295.07 rows=1 width=51) (actual time=279.736..287.000 rows=3 loops=1)
     399        Join Filter: ((b_1.sitter_id)::text = (ps.user_id)::text)
     400        Buffers: shared hit=47892 read=2
     401        ->  Nested Loop  (cost=32537.88..32586.84 rows=1 width=141) (actual time=184.771..188.940 rows=3 loops=1)
     402              Join Filter: ((ps.user_id)::text = (u.user_id)::text)
     403              Rows Removed by Join Filter: 20
     404              Buffers: shared hit=25659 read=2
     405              ->  Nested Loop  (cost=32537.88..32563.79 rows=1 width=127) (actual time=184.747..188.897 rows=3 loops=1)
     406                    Join Filter: ((ps.user_id)::text = (b.sitter_id)::text)
     407                    Rows Removed by Join Filter: 3
     408                    Buffers: shared hit=25657 read=1
     409                    ->  Finalize GroupAggregate  (cost=32537.88..32538.27 rows=1 width=37) (actual time=184.705..188.834 rows=3 loops=1)
     410                          Group Key: b.sitter_id
     411                          Filter: (avg(r.rating) >= 4.0)
     412                          Rows Removed by Filter: 1
     413                          Buffers: shared hit=25655
     414                          ->  Gather Merge  (cost=32537.88..32538.24 rows=3 width=69) (actual time=184.688..188.798 rows=7 loops=1)
     415                                Workers Planned: 3
     416                                Workers Launched: 3
     417                                Buffers: shared hit=25655
     418                                ->  Sort  (cost=31537.84..31537.84 rows=1 width=69) (actual time=178.566..178.571 rows=2 loops=4)
     419                                      Sort Key: b.sitter_id
     420                                      Sort Method: quicksort  Memory: 25kB
     421                                      Buffers: shared hit=25655
     422                                      Worker 0:  Sort Method: quicksort  Memory: 25kB
     423                                      Worker 1:  Sort Method: quicksort  Memory: 25kB
     424                                      Worker 2:  Sort Method: quicksort  Memory: 25kB
     425                                      ->  Partial HashAggregate  (cost=31537.82..31537.83 rows=1 width=69) (actual time=178.495..178.500 rows=2 loops=4)
     426                                            Group Key: b.sitter_id
     427                                            Batches: 1  Memory Usage: 24kB
     428                                            Buffers: shared hit=25631
     429                                            Worker 0:  Batches: 1  Memory Usage: 24kB
     430                                            Worker 1:  Batches: 1  Memory Usage: 24kB
     431                                            Worker 2:  Batches: 1  Memory Usage: 24kB
     432                                            ->  Parallel Hash Join  (cost=4971.92..31268.59 rows=53846 width=41) (actual time=36.863..167.092 rows=41731 loops=4)
     433                                                  Hash Cond: ((b.booking_id)::text = (r.booking_id)::text)
     434                                                  Buffers: shared hit=25631
     435                                                  ->  Parallel Seq Scan on bookings b  (cost=0.00..25449.88 rows=322588 width=74) (actual time=0.019..27.928 rows=250006 loops=4)
     436                                                        Buffers: shared hit=22224
     437                                                  ->  Parallel Hash  (cost=4102.52..4102.52 rows=69552 width=41) (actual time=36.033..36.034 rows=41731 loops=4)
     438                                                        Buckets: 262144  Batches: 1  Memory Usage: 15200kB
     439                                                        Buffers: shared hit=3407
     440                                                        ->  Parallel Seq Scan on reviews r  (cost=0.00..4102.52 rows=69552 width=41) (actual time=0.021..11.538 rows=41731 loops=4)
     441                                                              Buffers: shared hit=3407
     442                    ->  Seq Scan on pet_sitters ps  (cost=0.00..16.90 rows=690 width=90) (actual time=0.014..0.014 rows=2 loops=3)
     443                          Buffers: shared hit=2 read=1
     444              ->  Seq Scan on users u  (cost=0.00..15.80 rows=580 width=88) (actual time=0.009..0.009 rows=8 loops=3)
     445                    Buffers: shared hit=2 read=1
     446        ->  Unique  (cost=33708.21..33708.22 rows=1 width=37) (actual time=31.654..32.684 rows=0 loops=3)
     447              Buffers: shared hit=22233
     448              ->  Sort  (cost=33708.21..33708.22 rows=1 width=37) (actual time=31.654..32.684 rows=0 loops=3)
     449                    Sort Key: b_1.sitter_id
     450                    Sort Method: quicksort  Memory: 25kB
     451                    Buffers: shared hit=22233
     452                    ->  Gather  (cost=1000.00..33708.20 rows=1 width=37) (actual time=94.936..98.026 rows=0 loops=1)
     453                          Workers Planned: 3
     454                          Workers Launched: 3
     455                          Buffers: shared hit=22233
     456                          ->  Parallel Seq Scan on bookings b_1  (cost=0.00..32708.10 rows=1 width=37) (actual time=89.286..89.286 rows=0 loops=4)
     457                                Filter: (((status)::text = ANY ('{Confirmed,Pending}'::text[])) AND (date_from <= ((CURRENT_DATE + '15 days'::interval))::date) AND (date_to >= ((CURRENT_DATE + '10 days'::interval))::date))
     458                                Rows Removed by Filter: 250006
     459                                Buffers: shared hit=22233
     460Planning:
     461  Buffers: shared hit=11 read=8
     462Planning Time: 1.269 ms
     463Execution Time: 287.147 ms
     464}}}
     465
     466'''Average execution time (10 attempts):''' 283.421ms
     467
     468We add this index to optimize the date overlapping:
     469{{{
     470#!sql
     471CREATE INDEX idx_bookings_sitter_status_dates
     472ON project.bookings (sitter_id, status, date_from, date_to);
     473}}}
     474
     475Execution with indexes:
     476
     477{{{
     478Limit  (cost=32538.32..48741.33 rows=1 width=51) (actual time=192.798..202.516 rows=3 loops=1)
     479  Buffers: shared hit=27463 read=901
     480  ->  Nested Loop Anti Join  (cost=32538.32..48741.33 rows=1 width=51) (actual time=192.796..202.512 rows=3 loops=1)
     481        Join Filter: ((b_1.sitter_id)::text = (ps.user_id)::text)
     482        Buffers: shared hit=27463 read=901
     483        ->  Nested Loop  (cost=32537.88..32586.84 rows=1 width=141) (actual time=180.755..185.162 rows=3 loops=1)
     484              Join Filter: ((ps.user_id)::text = (u.user_id)::text)
     485              Rows Removed by Join Filter: 20
     486              Buffers: shared hit=25661
     487              ->  Nested Loop  (cost=32537.88..32563.79 rows=1 width=127) (actual time=180.740..185.126 rows=3 loops=1)
     488                    Join Filter: ((ps.user_id)::text = (b.sitter_id)::text)
     489                    Rows Removed by Join Filter: 3
     490                    Buffers: shared hit=25658
     491                    ->  Finalize GroupAggregate  (cost=32537.88..32538.27 rows=1 width=37) (actual time=180.718..185.081 rows=3 loops=1)
     492                          Group Key: b.sitter_id
     493                          Filter: (avg(r.rating) >= 4.0)
     494                          Rows Removed by Filter: 1
     495                          Buffers: shared hit=25655
     496                          ->  Gather Merge  (cost=32537.88..32538.24 rows=3 width=69) (actual time=180.700..185.043 rows=7 loops=1)
     497                                Workers Planned: 3
     498                                Workers Launched: 3
     499                                Buffers: shared hit=25655
     500                                ->  Sort  (cost=31537.84..31537.84 rows=1 width=69) (actual time=174.919..174.925 rows=2 loops=4)
     501                                      Sort Key: b.sitter_id
     502                                      Sort Method: quicksort  Memory: 25kB
     503                                      Buffers: shared hit=25655
     504                                      Worker 0:  Sort Method: quicksort  Memory: 25kB
     505                                      Worker 1:  Sort Method: quicksort  Memory: 25kB
     506                                      Worker 2:  Sort Method: quicksort  Memory: 25kB
     507                                      ->  Partial HashAggregate  (cost=31537.82..31537.83 rows=1 width=69) (actual time=174.853..174.857 rows=2 loops=4)
     508                                            Group Key: b.sitter_id
     509                                            Batches: 1  Memory Usage: 24kB
     510                                            Buffers: shared hit=25631
     511                                            Worker 0:  Batches: 1  Memory Usage: 24kB
     512                                            Worker 1:  Batches: 1  Memory Usage: 24kB
     513                                            Worker 2:  Batches: 1  Memory Usage: 24kB
     514                                            ->  Parallel Hash Join  (cost=4971.92..31268.59 rows=53846 width=41) (actual time=36.095..163.581 rows=41731 loops=4)
     515                                                  Hash Cond: ((b.booking_id)::text = (r.booking_id)::text)
     516                                                  Buffers: shared hit=25631
     517                                                  ->  Parallel Seq Scan on bookings b  (cost=0.00..25449.88 rows=322588 width=74) (actual time=0.019..27.140 rows=250006 loops=4)
     518                                                        Buffers: shared hit=22224
     519                                                  ->  Parallel Hash  (cost=4102.52..4102.52 rows=69552 width=41) (actual time=35.280..35.281 rows=41731 loops=4)
     520                                                        Buckets: 262144  Batches: 1  Memory Usage: 15136kB
     521                                                        Buffers: shared hit=3407
     522                                                        ->  Parallel Seq Scan on reviews r  (cost=0.00..4102.52 rows=69552 width=41) (actual time=0.022..11.047 rows=41731 loops=4)
     523                                                              Buffers: shared hit=3407
     524                    ->  Seq Scan on pet_sitters ps  (cost=0.00..16.90 rows=690 width=90) (actual time=0.008..0.008 rows=2 loops=3)
     525                          Buffers: shared hit=3
     526              ->  Seq Scan on users u  (cost=0.00..15.80 rows=580 width=88) (actual time=0.006..0.006 rows=8 loops=3)
     527                    Buffers: shared hit=3
     528        ->  Unique  (cost=0.44..16154.48 rows=1 width=37) (actual time=5.776..5.776 rows=0 loops=3)
     529              Buffers: shared hit=1802 read=901
     530              ->  Index Only Scan using idx_bookings_sitter_status_dates on bookings b_1  (cost=0.44..16154.48 rows=1 width=37) (actual time=5.775..5.775 rows=0 loops=3)
     531                    Index Cond: ((status = ANY ('{Confirmed,Pending}'::text[])) AND (date_from <= ((CURRENT_DATE + '15 days'::interval))::date) AND (date_to >= ((CURRENT_DATE + '10 days'::interval))::date))
     532                    Heap Fetches: 0
     533                    Buffers: shared hit=1802 read=901
     534Planning:
     535  Buffers: shared hit=49 read=1
     536Planning Time: 1.463 ms
     537Execution Time: 202.649 ms
     538}}}
     539
     540'''Average execution time (10 attempts):''' 199.530ms
     541
     542With this index, we eliminated the expensive sequential scan in exchange for a highly efficient index scan. The query is still bottlenecked by `AVG(rating)`, this index provides a crucial performance improvement to the search feature, so we will '''keep''' it.
     543