wiki:OtherTopics

Version 1 (modified by 181201, 3 weeks ago) ( diff )

--

Other topics (Performance, Security, …)

Performance

Adding 1000000 mock rows to 'bookings' table

ALTER TABLE project.bookings DISABLE TRIGGER trg_booking_validation;

INSERT INTO project.bookings (status, date_from, date_to, address, owner_id, sitter_id)
SELECT
    (ARRAY['Pending', 'Confirmed', 'Completed', 'Canceled', 'Expired', 'Reviewed'])[floor(random()*6)+1] AS status,
    t.dt AS date_from,
    t.dt + (random() * INTERVAL '7 days') AS date_to,
    'Mock Address ' || gs AS address,
    o.user_id AS owner_id,
    s.user_id AS sitter_id
FROM generate_series(1, 1000000) gs
JOIN LATERAL (
    SELECT user_id FROM project.pet_owners ORDER BY random() LIMIT 1
) o ON true
JOIN LATERAL (
    SELECT user_id FROM project.pet_sitters ORDER BY random() LIMIT 1
) s ON true
JOIN LATERAL (
    SELECT CURRENT_DATE - (random() * INTERVAL '365 days') AS dt
) t ON true;

ALTER TABLE project.bookings ENABLE TRIGGER trg_booking_validation;

Benchmark query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM project.bookings 
WHERE sitter_id = (SELECT user_id FROM project.pet_sitters LIMIT 1)
ORDER BY date_from DESC;

Execution without indexes:

Gather Merge  (cost=81030.64..195549.38 rows=967764 width=146) (actual time=94.209..106.394 rows=2 loops=1)
  Workers Planned: 3
  Workers Launched: 3
  Buffers: shared hit=22336
  InitPlan 1
    ->  Limit  (cost=0.00..0.02 rows=1 width=90) (actual time=3.367..3.369 rows=1 loops=1)
          Buffers: shared hit=1
          ->  Seq Scan on pet_sitters  (cost=0.00..16.90 rows=690 width=90) (actual time=0.015..0.016 rows=1 loops=1)
                Buffers: shared hit=1
  ->  Sort  (cost=80030.57..80837.04 rows=322588 width=146) (actual time=60.212..60.213 rows=0 loops=4)
        Sort Key: bookings.date_from DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=22335
        Worker 0:  Sort Method: quicksort  Memory: 25kB
        Worker 1:  Sort Method: quicksort  Memory: 25kB
        Worker 2:  Sort Method: quicksort  Memory: 25kB
        ->  Parallel Seq Scan on bookings  (cost=0.00..26256.35 rows=322588 width=146) (actual time=37.611..60.082 rows=0 loops=4)
              Filter: ((sitter_id)::text = ((InitPlan 1).col1)::text)
              Rows Removed by Filter: 250005
              Buffers: shared hit=22224
Planning Time: 0.168 ms
JIT:
  Functions: 12
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.466 ms (Deform 0.675 ms), Inlining 0.000 ms, Optimization 1.430 ms, Emission 19.219 ms, Total 22.115 ms
Execution Time: 106.834 ms

Average execution time (10 attempts): 105.235ms

Next, we add this index:

CREATE INDEX idx_bookings_sitter_date 
ON project.bookings (sitter_id, date_from DESC);

Execution with indexes:

Index Scan using idx_bookings_sitter_date on bookings  (cost=0.45..72503.52 rows=1000022 width=146) (actual time=0.066..0.068 rows=2 loops=1)
  Index Cond: ((sitter_id)::text = ((InitPlan 1).col1)::text)
  Buffers: shared hit=2 read=3
  InitPlan 1
    ->  Limit  (cost=0.00..0.02 rows=1 width=90) (actual time=0.012..0.013 rows=1 loops=1)
          Buffers: shared hit=1
          ->  Seq Scan on pet_sitters  (cost=0.00..16.90 rows=690 width=90) (actual time=0.011..0.011 rows=1 loops=1)
                Buffers: shared hit=1
Planning:
  Buffers: shared hit=39 read=2
Planning Time: 0.618 ms
Execution Time: 0.092 ms

Average execution time (10 attempts): 0.087ms

Because the execution time has been massively lowered by bypassing the expensive sequential scan and memory sort, we keep this index.

Calculate average sitter rating

Benchmark query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT AVG(r.rating) 
FROM project.reviews r
JOIN project.bookings b ON r.booking_id = b.booking_id
WHERE b.sitter_id = (SELECT user_id FROM project.pet_sitters LIMIT 1);

Execution without indexes:

Finalize Aggregate  (cost=33210.02..33210.03 rows=1 width=32) (actual time=121.837..135.038 rows=1 loops=1)
  Buffers: shared hit=25818
  InitPlan 1
    ->  Limit  (cost=0.00..0.02 rows=1 width=90) (actual time=0.014..0.015 rows=1 loops=1)
          Buffers: shared hit=1
          ->  Seq Scan on pet_sitters  (cost=0.00..16.90 rows=690 width=90) (actual time=0.012..0.013 rows=1 loops=1)
                Buffers: shared hit=1
  ->  Gather  (cost=33209.67..33209.98 rows=3 width=32) (actual time=119.802..135.016 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        Buffers: shared hit=25818
        ->  Partial Aggregate  (cost=32209.67..32209.68 rows=1 width=32) (actual time=90.870..90.873 rows=1 loops=4)
              Buffers: shared hit=25817
              ->  Parallel Hash Join  (cost=4971.92..32075.06 rows=53846 width=4) (actual time=77.089..90.867 rows=0 loops=4)
                    Hash Cond: ((b.booking_id)::text = (r.booking_id)::text)
                    Buffers: shared hit=25817
                    ->  Parallel Seq Scan on bookings b  (cost=0.00..26256.35 rows=322588 width=37) (actual time=41.349..55.119 rows=0 loops=4)
                          Filter: ((sitter_id)::text = ((InitPlan 1).col1)::text)
                          Rows Removed by Filter: 250005
                          Buffers: shared hit=22224
                    ->  Parallel Hash  (cost=4102.52..4102.52 rows=69552 width=41) (actual time=34.636..34.637 rows=41731 loops=4)
                          Buckets: 262144  Batches: 1  Memory Usage: 15200kB
                          Buffers: shared hit=3407
                          ->  Parallel Seq Scan on reviews r  (cost=0.00..4102.52 rows=69552 width=41) (actual time=0.026..11.172 rows=41731 loops=4)
                                Buffers: shared hit=3407
Planning:
  Buffers: shared hit=16
Planning Time: 0.444 ms
Execution Time: 135.097 ms

Average execution time (10 attempts): 131.722ms

We add this index:

CREATE INDEX idx_reviews_booking_id 
ON project.reviews (booking_id);

Execution with indexes:

Finalize Aggregate  (cost=33210.02..33210.03 rows=1 width=32) (actual time=125.303..138.277 rows=1 loops=1)
  Buffers: shared hit=25818
  InitPlan 1
    ->  Limit  (cost=0.00..0.02 rows=1 width=90) (actual time=0.013..0.014 rows=1 loops=1)
          Buffers: shared hit=1
          ->  Seq Scan on pet_sitters  (cost=0.00..16.90 rows=690 width=90) (actual time=0.012..0.012 rows=1 loops=1)
                Buffers: shared hit=1
  ->  Gather  (cost=33209.67..33209.98 rows=3 width=32) (actual time=123.351..138.255 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        Buffers: shared hit=25818
        ->  Partial Aggregate  (cost=32209.67..32209.68 rows=1 width=32) (actual time=93.069..93.072 rows=1 loops=4)
              Buffers: shared hit=25817
              ->  Parallel Hash Join  (cost=4971.92..32075.06 rows=53846 width=4) (actual time=78.783..93.066 rows=0 loops=4)
                    Hash Cond: ((b.booking_id)::text = (r.booking_id)::text)
                    Buffers: shared hit=25817
                    ->  Parallel Seq Scan on bookings b  (cost=0.00..26256.35 rows=322588 width=37) (actual time=42.860..57.138 rows=0 loops=4)
                          Filter: ((sitter_id)::text = ((InitPlan 1).col1)::text)
                          Rows Removed by Filter: 250005
                          Buffers: shared hit=22224
                    ->  Parallel Hash  (cost=4102.52..4102.52 rows=69552 width=41) (actual time=34.827..34.828 rows=41731 loops=4)
                          Buckets: 262144  Batches: 1  Memory Usage: 15168kB
                          Buffers: shared hit=3407
                          ->  Parallel Seq Scan on reviews r  (cost=0.00..4102.52 rows=69552 width=41) (actual time=0.029..11.437 rows=41731 loops=4)
                                Buffers: shared hit=3407
Planning:
  Buffers: shared hit=16
Planning Time: 0.554 ms
Execution Time: 138.346 ms

Average execution time (10 attempts): 138.346ms

There is practically no change in the execution time, and the query planner completely ignored the index because it determined that performing a Parallel Seq Scan and Parallel Hash Join was cheaper than doing individual index lookups on such a large dataset. Because the index is not utilized for this query and only adds complexity during INSERT and UPDATE operations, we will not keep this index.

Note: See TracWiki for help on using the wiki.