| 2 | | == SQL Performance |
| | 2 | == SQL Performance, adding Indexes |
| | 3 | To analyze the performance of our queries, we consider several different use cases. |
| | 4 | |
| | 5 | The testing approach is as follows: |
| | 6 | |
| | 7 | We first insert a large number of new records into the tables (around 1~2 million rows) so that using an index becomes meaningful and beneficial when applied correctly. It is clear that for a table with only a small number of records, indexing would not provide a significant advantage. |
| | 8 | |
| | 9 | Before creating the index, we execute the queries 10 times using {{{EXPLAIN ANALYZE}}}. We then calculate the average Execution Time and record the query plan, so we can later compare it with the new plan after the index is introduced. |
| | 10 | |
| | 11 | After adding the index, we run the same query again 10 times and compare the results in terms of execution time and query plan. |
| | 12 | === Scenario 1 |
| | 13 | ==== Making appointments table have 1milion rows |
| | 14 | {{{ |
| | 15 | ALTER TABLE appointments DISABLE TRIGGER trg_appointments_no_overlap; |
| | 16 | |
| | 17 | INSERT INTO appointments (clinic_id, animal_id, responsible_owner_id, status, date_time, notes) |
| | 18 | SELECT |
| | 19 | c.clinic_id, |
| | 20 | a.animal_id, |
| | 21 | a.owner_id, |
| | 22 | CASE |
| | 23 | WHEN dt < now() THEN |
| | 24 | (ARRAY['DONE','CANCELLED'])[floor(random()*2)+1] |
| | 25 | ELSE |
| | 26 | 'CONFIRMED' |
| | 27 | END, |
| | 28 | dt, |
| | 29 | 'Auto generated' |
| | 30 | FROM generate_series(1, 1000000) gs |
| | 31 | JOIN LATERAL ( |
| | 32 | SELECT clinic_id FROM vet_clinics ORDER BY random() LIMIT 1 |
| | 33 | ) c ON true |
| | 34 | JOIN LATERAL ( |
| | 35 | SELECT animal_id, owner_id FROM animals ORDER BY random() LIMIT 1 |
| | 36 | ) a ON true |
| | 37 | JOIN LATERAL ( |
| | 38 | SELECT NOW() - (random() * INTERVAL '90 days') AS dt |
| | 39 | ) t ON true; |
| | 40 | |
| | 41 | ALTER TABLE appointments ENABLE TRIGGER trg_appointments_no_overlap; |
| | 42 | }}} |
| | 43 | * I disabled the overlap trigger for easier testing, after I have inserted all the rows I enabled it again |
| | 44 | ==== Executing the query 10 times to see average execution time |
| | 45 | {{{ |
| | 46 | EXPLAIN ANALYZE |
| | 47 | SELECT * |
| | 48 | FROM appointments |
| | 49 | WHERE status = 'CONFIRMED' |
| | 50 | AND date_time < now() - interval '45 minutes'; |
| | 51 | }}} |
| | 52 | After the first execution |
| | 53 | {{{ |
| | 54 | Gather (cost=1000.00..21156.46 rows=1 width=62) (actual time=0.646..81.896 rows=1 loops=1) |
| | 55 | Workers Planned: 2 |
| | 56 | Workers Launched: 2 |
| | 57 | -> Parallel Seq Scan on appointments (cost=0.00..20156.36 rows=1 width=62) (actual time=44.874..69.423 rows=0 loops=3) |
| | 58 | Filter: (((status)::text = 'CONFIRMED'::text) AND (date_time < (now() - '00:45:00'::interval))) |
| | 59 | Rows Removed by Filter: 333334 |
| | 60 | Planning Time: 0.726 ms |
| | 61 | Execution Time: 81.930 ms |
| | 62 | }}} |
| | 63 | * Average execution time WITHOUT indexes |
| | 64 | {{{61.1925ms}}} |
| | 65 | {{{ |
| | 66 | Index Scan using idx_appointments_status_date_time on appointments (cost=0.43..7.88 rows=1 width=62) (actual time=0.089..0.090 rows=1 loops=1) |
| | 67 | Index Cond: (((status)::text = 'CONFIRMED'::text) AND (date_time < (now() - '00:45:00'::interval))) |
| | 68 | Planning Time: 1.020 ms |
| | 69 | Execution Time: 0.122 ms |
| | 70 | }}} |
| | 71 | * Average execution time WITH indexes |
| | 72 | {{{0.093ms}}} |
| | 73 | Because the execution time has been optimized with the use of index we will keep the index. |
| | 74 | === Scenario 2 |
| | 75 | ==== Making listings table have 2milion rows |
| | 76 | {{{ |
| | 77 | INSERT INTO listings (animal_id, owner_id, status, price, created_at) |
| | 78 | SELECT |
| | 79 | a.animal_id, |
| | 80 | a.owner_id, |
| | 81 | CASE |
| | 82 | WHEN random() < 0.7 THEN 'DRAFT' |
| | 83 | WHEN random() < 0.9 THEN 'ARCHIVED' |
| | 84 | ELSE 'SOLD' |
| | 85 | END, |
| | 86 | (random() * 1000 + 50)::numeric(10,2), |
| | 87 | NOW() - (random() * INTERVAL '120 days') |
| | 88 | FROM generate_series(1, 2000000) gs |
| | 89 | JOIN LATERAL ( |
| | 90 | SELECT animal_id, owner_id |
| | 91 | FROM animals |
| | 92 | ORDER BY random() |
| | 93 | LIMIT 1 |
| | 94 | ) a ON true; |
| | 95 | }}} |
| | 96 | ==== Executing the query 10 times to see average execution time |
| | 97 | {{{ |
| | 98 | EXPLAIN ANALYZE |
| | 99 | SELECT * |
| | 100 | FROM listings |
| | 101 | WHERE status = 'DRAFT' |
| | 102 | AND created_at < now() - interval '30 days'; |
| | 103 | }}} |
| | 104 | After the first execution |
| | 105 | {{{ |
| | 106 | Seq Scan on listings (cost=0.00..60619.18 rows=1058297 width=576) (actual time=0.014..453.562 rows=1051498 loops=1) |
| | 107 | Filter: (((status)::text = 'DRAFT'::text) AND (created_at < (now() - '30 days'::interval))) |
| | 108 | Rows Removed by Filter: 948511 |
| | 109 | Planning Time: 0.544 ms |
| | 110 | Execution Time: 487.292 ms |
| | 111 | }}} |
| | 112 | * Average execution time WITHOUT indexes |
| | 113 | {{{494.8984ms}}} |
| | 114 | {{{ |
| | 115 | Seq Scan on listings (cost=0.00..60619.18 rows=1058332 width=576) (actual time=0.044..473.306 rows=1051534 loops=1) |
| | 116 | Filter: (((status)::text = 'DRAFT'::text) AND (created_at < (now() - '30 days'::interval))) |
| | 117 | Rows Removed by Filter: 948475 |
| | 118 | Planning Time: 0.092 ms |
| | 119 | Execution Time: 507.535 ms |
| | 120 | |
| | 121 | }}} |
| | 122 | * Average execution time WITH indexes |
| | 123 | {{{500.327ms}}} |
| | 124 | There isn't a change in the execution time, the query doesn't even use the index because its's cheaper to just use seq scan, we won't be keeping this index. |
| | 125 | |