| Version 1 (modified by , 3 weeks ago) ( diff ) |
|---|
Other topics (Performance, Security, …)
Performance
Adding 1000000 mock rows to 'bookings' table
ALTER TABLE project.bookings DISABLE TRIGGER trg_booking_validation;
INSERT INTO project.bookings (status, date_from, date_to, address, owner_id, sitter_id)
SELECT
(ARRAY['Pending', 'Confirmed', 'Completed', 'Canceled', 'Expired', 'Reviewed'])[floor(random()*6)+1] AS status,
t.dt AS date_from,
t.dt + (random() * INTERVAL '7 days') AS date_to,
'Mock Address ' || gs AS address,
o.user_id AS owner_id,
s.user_id AS sitter_id
FROM generate_series(1, 1000000) gs
JOIN LATERAL (
SELECT user_id FROM project.pet_owners ORDER BY random() LIMIT 1
) o ON true
JOIN LATERAL (
SELECT user_id FROM project.pet_sitters ORDER BY random() LIMIT 1
) s ON true
JOIN LATERAL (
SELECT CURRENT_DATE - (random() * INTERVAL '365 days') AS dt
) t ON true;
ALTER TABLE project.bookings ENABLE TRIGGER trg_booking_validation;
Benchmark query:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM project.bookings WHERE sitter_id = (SELECT user_id FROM project.pet_sitters LIMIT 1) ORDER BY date_from DESC;
Execution without indexes:
Gather Merge (cost=81030.64..195549.38 rows=967764 width=146) (actual time=94.209..106.394 rows=2 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=22336
InitPlan 1
-> Limit (cost=0.00..0.02 rows=1 width=90) (actual time=3.367..3.369 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on pet_sitters (cost=0.00..16.90 rows=690 width=90) (actual time=0.015..0.016 rows=1 loops=1)
Buffers: shared hit=1
-> Sort (cost=80030.57..80837.04 rows=322588 width=146) (actual time=60.212..60.213 rows=0 loops=4)
Sort Key: bookings.date_from DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=22335
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Worker 2: Sort Method: quicksort Memory: 25kB
-> Parallel Seq Scan on bookings (cost=0.00..26256.35 rows=322588 width=146) (actual time=37.611..60.082 rows=0 loops=4)
Filter: ((sitter_id)::text = ((InitPlan 1).col1)::text)
Rows Removed by Filter: 250005
Buffers: shared hit=22224
Planning Time: 0.168 ms
JIT:
Functions: 12
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.466 ms (Deform 0.675 ms), Inlining 0.000 ms, Optimization 1.430 ms, Emission 19.219 ms, Total 22.115 ms
Execution Time: 106.834 ms
Average execution time (10 attempts): 105.235ms
Next, we add this index:
CREATE INDEX idx_bookings_sitter_date ON project.bookings (sitter_id, date_from DESC);
Execution with indexes:
Index Scan using idx_bookings_sitter_date on bookings (cost=0.45..72503.52 rows=1000022 width=146) (actual time=0.066..0.068 rows=2 loops=1)
Index Cond: ((sitter_id)::text = ((InitPlan 1).col1)::text)
Buffers: shared hit=2 read=3
InitPlan 1
-> Limit (cost=0.00..0.02 rows=1 width=90) (actual time=0.012..0.013 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on pet_sitters (cost=0.00..16.90 rows=690 width=90) (actual time=0.011..0.011 rows=1 loops=1)
Buffers: shared hit=1
Planning:
Buffers: shared hit=39 read=2
Planning Time: 0.618 ms
Execution Time: 0.092 ms
Average execution time (10 attempts): 0.087ms
Because the execution time has been massively lowered by bypassing the expensive sequential scan and memory sort, we keep this index.
Calculate average sitter rating
Benchmark query:
EXPLAIN (ANALYZE, BUFFERS) SELECT AVG(r.rating) FROM project.reviews r JOIN project.bookings b ON r.booking_id = b.booking_id WHERE b.sitter_id = (SELECT user_id FROM project.pet_sitters LIMIT 1);
Execution without indexes:
Finalize Aggregate (cost=33210.02..33210.03 rows=1 width=32) (actual time=121.837..135.038 rows=1 loops=1)
Buffers: shared hit=25818
InitPlan 1
-> Limit (cost=0.00..0.02 rows=1 width=90) (actual time=0.014..0.015 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on pet_sitters (cost=0.00..16.90 rows=690 width=90) (actual time=0.012..0.013 rows=1 loops=1)
Buffers: shared hit=1
-> Gather (cost=33209.67..33209.98 rows=3 width=32) (actual time=119.802..135.016 rows=4 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=25818
-> Partial Aggregate (cost=32209.67..32209.68 rows=1 width=32) (actual time=90.870..90.873 rows=1 loops=4)
Buffers: shared hit=25817
-> Parallel Hash Join (cost=4971.92..32075.06 rows=53846 width=4) (actual time=77.089..90.867 rows=0 loops=4)
Hash Cond: ((b.booking_id)::text = (r.booking_id)::text)
Buffers: shared hit=25817
-> Parallel Seq Scan on bookings b (cost=0.00..26256.35 rows=322588 width=37) (actual time=41.349..55.119 rows=0 loops=4)
Filter: ((sitter_id)::text = ((InitPlan 1).col1)::text)
Rows Removed by Filter: 250005
Buffers: shared hit=22224
-> Parallel Hash (cost=4102.52..4102.52 rows=69552 width=41) (actual time=34.636..34.637 rows=41731 loops=4)
Buckets: 262144 Batches: 1 Memory Usage: 15200kB
Buffers: shared hit=3407
-> Parallel Seq Scan on reviews r (cost=0.00..4102.52 rows=69552 width=41) (actual time=0.026..11.172 rows=41731 loops=4)
Buffers: shared hit=3407
Planning:
Buffers: shared hit=16
Planning Time: 0.444 ms
Execution Time: 135.097 ms
Average execution time (10 attempts): 131.722ms
We add this index:
CREATE INDEX idx_reviews_booking_id ON project.reviews (booking_id);
Execution with indexes:
Finalize Aggregate (cost=33210.02..33210.03 rows=1 width=32) (actual time=125.303..138.277 rows=1 loops=1)
Buffers: shared hit=25818
InitPlan 1
-> Limit (cost=0.00..0.02 rows=1 width=90) (actual time=0.013..0.014 rows=1 loops=1)
Buffers: shared hit=1
-> Seq Scan on pet_sitters (cost=0.00..16.90 rows=690 width=90) (actual time=0.012..0.012 rows=1 loops=1)
Buffers: shared hit=1
-> Gather (cost=33209.67..33209.98 rows=3 width=32) (actual time=123.351..138.255 rows=4 loops=1)
Workers Planned: 3
Workers Launched: 3
Buffers: shared hit=25818
-> Partial Aggregate (cost=32209.67..32209.68 rows=1 width=32) (actual time=93.069..93.072 rows=1 loops=4)
Buffers: shared hit=25817
-> Parallel Hash Join (cost=4971.92..32075.06 rows=53846 width=4) (actual time=78.783..93.066 rows=0 loops=4)
Hash Cond: ((b.booking_id)::text = (r.booking_id)::text)
Buffers: shared hit=25817
-> Parallel Seq Scan on bookings b (cost=0.00..26256.35 rows=322588 width=37) (actual time=42.860..57.138 rows=0 loops=4)
Filter: ((sitter_id)::text = ((InitPlan 1).col1)::text)
Rows Removed by Filter: 250005
Buffers: shared hit=22224
-> Parallel Hash (cost=4102.52..4102.52 rows=69552 width=41) (actual time=34.827..34.828 rows=41731 loops=4)
Buckets: 262144 Batches: 1 Memory Usage: 15168kB
Buffers: shared hit=3407
-> Parallel Seq Scan on reviews r (cost=0.00..4102.52 rows=69552 width=41) (actual time=0.029..11.437 rows=41731 loops=4)
Buffers: shared hit=3407
Planning:
Buffers: shared hit=16
Planning Time: 0.554 ms
Execution Time: 138.346 ms
Average execution time (10 attempts): 138.346ms
There is practically no change in the execution time, and the query planner completely ignored the index because it determined that performing a Parallel Seq Scan and Parallel Hash Join was cheaper than doing individual index lookups on such a large dataset. Because the index is not utilized for this query and only adds complexity during INSERT and UPDATE operations, we will not keep this index.
