Changes between Version 1 and Version 2 of OtherTopics
- Timestamp:
- 04/22/26 11:12:55 (10 days ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
-
OtherTopics
v1 v2 3 3 == SQL Performance == 4 4 5 Performance analysis was conducted on the local development database with 200,000 generated reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) is insufficient for meaningful index analysis, as sequential scans are optimal for small tables.5 Performance analysis was conducted against the full complex report queries from Phase P6 ([wiki:ComplexReports ComplexReports]). The tests were run on the local development database after loading 200,000 synthetic reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) cannot reveal index usefulness because sequential scans are optimal at that scale. 6 6 7 The testing method: each query is run with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before and after index creation. We compare execution time, scan type, and buffer usage.7 The testing method: each P6 report query is executed with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before adding new indexes and after. We compare execution time, scan type, and buffer usage. 8 8 9 === Scenario 1: User Activity Aggregation===9 === Scenario 1: Phase P6 Report 1 — Resource Utilization (quarterly) === 10 10 11 '''Query:''' Per-user reservation statistics with status breakdown and distinct resource count (from Report 3: User Behavior Analysis). 12 13 {{{ 14 SELECT 15 rv.user_id, 16 COUNT(*) AS total_reservations, 17 COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved, 18 COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected, 19 COUNT(*) FILTER (WHERE rv.status = 'cancelled') AS cancelled, 20 ROUND(AVG(EXTRACT(EPOCH FROM (rv.end_time - rv.start_time)) / 3600.0), 1) AS avg_duration, 21 COUNT(DISTINCT rv.resource_id) AS distinct_resources_used 22 FROM reservations rv 23 GROUP BY rv.user_id; 24 }}} 25 26 '''Proposed index:''' 27 {{{ 28 CREATE INDEX idx_reservations_user_status_resource 29 ON reservations (user_id, status, resource_id); 30 }}} 31 32 '''Before index:''' 33 {{{ 34 GroupAggregate (actual time=146.682..223.586 rows=12 loops=1) 35 Buffers: shared hit=3001, temp read=1129 written=1133 36 -> Sort (actual time=139.784..165.344 rows=200026 loops=1) 37 Sort Method: external merge Disk: 9032kB 38 -> Seq Scan on reservations rv (actual time=0.005..20.583 rows=200026 loops=1) 39 Execution Time: 226.085 ms 40 }}} 41 42 '''After index:''' 43 {{{ 44 GroupAggregate (actual time=22.372..157.807 rows=12 loops=1) 45 Buffers: shared hit=36040 46 -> Incremental Sort (actual time=8.413..96.559 rows=200026 loops=1) 47 Presorted Key: user_id 48 Full-sort Groups: 12 Sort Method: quicksort Average Memory: 1817kB 49 -> Index Scan using idx_reservations_user (actual time=0.015..54.051 rows=200026 loops=1) 50 Execution Time: 157.920 ms 51 }}} 52 53 '''Analysis:''' The index provides pre-sorted data by {{{user_id}}}, enabling incremental sort instead of a full external merge sort. The critical improvement is '''elimination of disk-based sorting''' (9032kB spilled to disk) in favor of in-memory quicksort. Buffer reads come entirely from shared memory (no temp I/O). 54 55 '''Result: 226ms -> 158ms (30% improvement), disk sort eliminated.''' 56 57 === Scenario 2: Monthly Reservation Trends with Top Resources === 58 59 '''Query:''' Monthly breakdown with aggregated stats and top demanded resource per month (from Report 2: Monthly Trends). Uses CTEs, window functions ({{{ROW_NUMBER}}}), date filtering, and a JOIN to the resources table. 60 61 {{{ 62 WITH monthly_overview AS ( 63 SELECT DATE_TRUNC('month', rv.start_time) AS month, COUNT(*) AS total, 64 COUNT(*) FILTER (WHERE rv.status IN ('approved','completed')) AS approved, 65 COUNT(DISTINCT rv.user_id) AS unique_users 66 FROM reservations rv 67 WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01' 68 GROUP BY DATE_TRUNC('month', rv.start_time) 69 ), 70 resource_demand AS ( 71 SELECT DATE_TRUNC('month', rv.start_time) AS month, r.name, 72 COUNT(*) AS demand_count, 73 ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC('month', rv.start_time) 74 ORDER BY COUNT(*) DESC) AS rank 75 FROM reservations rv JOIN resources r ON rv.resource_id = r.resource_id 76 WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01' 77 GROUP BY DATE_TRUNC('month', rv.start_time), r.name 78 ) 79 SELECT mo.*, rd.name AS top_resource, rd.demand_count 80 FROM monthly_overview mo 81 LEFT JOIN resource_demand rd ON mo.month = rd.month AND rd.rank = 1 82 ORDER BY mo.month; 83 }}} 11 '''Query source:''' the full "Report 1: Resource Utilization and Demand Analysis" from [wiki:ComplexReports ComplexReports]. Uses 5 CTEs: quarter_bounds, quarter_days, resource_availability, reservation_stats, popular_day, peak_hour. Window function {{{RANK()}}} for demand ranking. Filters by a quarterly date range. 84 12 85 13 '''Proposed index:''' … … 89 17 }}} 90 18 91 '''Before index :'''19 '''Before index (excerpt of plan):''' 92 20 {{{ 93 Nested Loop Left Join (actual time=89.338..98.077 rows=6loops=1)94 Buffers: shared hit=7566895 ...96 -> Nested Loop (actual time=0.955..1.866 rows=1931 loops=27)97 -> Index Scan using idx_reservations_resource on reservations rv_198 Filter: (start_time >= ... AND start_time < ...)99 Rows Removed by Filter: 5478100 Buffers: shared hit=74673101 Execution Time: 98.322ms21 HashAggregate (actual rows=189 loops=1) 22 -> Nested Loop (actual rows=10383 loops=1) 23 -> Bitmap Heap Scan on reservations rv_1 (actual rows=10383 loops=1) 24 Recheck Cond: start_time >= q_start AND start_time < q_end 25 Filter: status IN ('approved','completed') 26 Rows Removed by Filter: 15537 27 Heap Blocks: exact=364 28 -> Bitmap Index Scan on idx_reservations_times 29 Execution Time: 181.324 ms 102 30 }}} 103 31 104 32 '''After index:''' 105 33 {{{ 106 Sort (actual time=56.456..56.525 rows=6 loops=1) 107 Buffers: shared hit=1055 read=261 108 ... 109 -> Parallel Index Only Scan using idx_reservations_start_resource_status 110 on reservations rv_1 111 Index Cond: (start_time >= ... AND start_time < ...) 112 Heap Fetches: 0 113 Buffers: shared hit=3 read=261 114 Execution Time: 56.818 ms 34 -> Index Only Scan using idx_reservations_start_resource_status 35 on reservations rv_1 36 Index Cond: start_time >= q_start AND start_time < q_end 37 Heap Fetches: 0 38 Execution Time: 82.307 ms 115 39 }}} 116 40 117 '''Analysis:''' The composite index on {{{(start_time, resource_id, status)}}} enables an '''Index Only Scan''' for the resource demand subquery — meaning PostgreSQL reads all needed columns directly from the index without touching the heap table ({{{Heap Fetches: 0}}}). Buffer usage dropped dramatically from 75,668 to 1,316 hits. The old plan used a nested loop scanning 5,478 irrelevant rows per resource; the new plan avoids this entirely.41 '''Analysis:''' The new composite index includes {{{status}}}, so the planner can filter by status inside the index without touching the heap. The {{{popular_day}}} and {{{peak_hour}}} CTEs both switched from Bitmap Heap Scan with "Rows Removed by Filter: 15537" per iteration to a clean Index Only Scan. Heap Fetches is 0. 118 42 119 '''Result: 98ms -> 57ms (42% improvement), buffer hits reduced 98%.'''43 '''Result: 181 ms -> 82 ms (55% faster). Index Only Scan confirmed in the plan.''' 120 44 121 === Scenario 3: Administrator Approval Workload===45 === Scenario 2: Phase P6 Report 3 — User Activity and Behavior Analysis === 122 46 123 '''Query :''' Per-admin aggregation of reviewed reservations with status breakdown and distinct counts (from Report 4: Admin Workload Analysis). Filters on {{{approved_by IS NOT NULL}}}.47 '''Query source:''' the full "Report 3: User Activity and Behavior Analysis" from [wiki:ComplexReports ComplexReports]. Uses 2 CTEs (user_stats, favorite_resource), {{{DISTINCT ON}}}, {{{RANK()}}}, {{{COUNT(*) FILTER (WHERE ...)}}} for per-status counts, and aggregates over the entire reservations table grouped by user_id. 124 48 49 '''Proposed index:''' 125 50 {{{ 126 SELECT 127 rv.approved_by AS admin_id, 128 COUNT(*) AS total_reviewed, 129 COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved_count, 130 COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected_count, 131 COUNT(DISTINCT rv.resource_id) AS distinct_resources_handled, 132 COUNT(DISTINCT rv.user_id) AS distinct_users_served 133 FROM reservations rv 134 WHERE rv.approved_by IS NOT NULL 135 GROUP BY rv.approved_by; 51 CREATE INDEX idx_reservations_user_status_resource 52 ON reservations (user_id, status, resource_id) 53 INCLUDE (start_time, end_time); 136 54 }}} 55 56 The {{{INCLUDE}}} clause adds start_time and end_time as index payload (not used for ordering) so the query can compute {{{AVG(end_time - start_time)}}} and {{{SUM(...) FILTER (...)}}} without heap access. 57 58 '''Before index:''' 59 {{{ 60 Sort Method: external merge Disk: 9032kB 61 -> Seq Scan on reservations rv (actual rows=200026 loops=1) 62 Execution Time: 279.957 ms 63 }}} 64 65 '''After index:''' 66 {{{ 67 Incremental Sort 68 Presorted Key: user_id 69 Full-sort Groups: 12 Sort Method: quicksort Average Memory: 29kB 70 Pre-sorted Groups: 12 Sort Method: quicksort Average Memory: 1821kB 71 -> Index Only Scan using idx_reservations_user_status_resource 72 Heap Fetches: 0 73 Execution Time: 175.602 ms 74 }}} 75 76 '''Analysis:''' The sequential scan is replaced by an Index Only Scan. Disk-based external merge sort (9032kB to disk) is eliminated; because the index is sorted by user_id, PostgreSQL does an Incremental Sort entirely in memory. Heap Fetches: 0 confirms the INCLUDE columns are doing their job. 77 78 '''Result: 280 ms -> 176 ms (37% faster). Disk sort eliminated; Index Only Scan used.''' 79 80 === Scenario 3: Phase P6 Report 4 — Administrator Approval Workload === 81 82 '''Query source:''' the full "Report 4: Administrator Approval Workload and Bottleneck Analysis" from [wiki:ComplexReports ComplexReports]. Uses 3 CTEs (admin_stats, pending_stats, workload_share), {{{CROSS JOIN}}} to attach the single-row pending_stats to every admin, and {{{SUM(...) OVER ()}}} to compute workload share percentage. 137 83 138 84 '''Proposed index:''' … … 143 89 }}} 144 90 145 This is a '''partial covering index''' — it only indexes rows where {{{approved_by IS NOT NULL}}} (about 70% of rows), and includes all columns the query needs.91 This is a partial covering index: the {{{WHERE}}} clause excludes rows with no approver (roughly 30% of the data), and the listed columns are everything the {{{admin_stats}}} CTE needs to aggregate. 146 92 147 93 '''Before index:''' 148 94 {{{ 149 GroupAggregate (actual time=100.297..118.534 rows=2 loops=1) 150 Buffers: shared hit=3001, temp read=572 written=574 151 -> Sort (actual time=81.884..95.827 rows=140517 loops=1) 152 Sort Method: external merge Disk: 4576kB 153 -> Seq Scan on reservations rv (actual time=0.005..19.879 rows=140517 loops=1) 154 Filter: (approved_by IS NOT NULL) 155 Rows Removed by Filter: 59509 156 Execution Time: 121.020 ms 95 Sort Method: external merge Disk: 4568kB 96 -> Seq Scan on reservations rv_1 97 Filter: approved_by IS NOT NULL 98 Execution Time: 166.804 ms 157 99 }}} 158 100 159 101 '''After index:''' 160 102 {{{ 161 GroupAggregate (actual time=82.464..97.428 rows=2 loops=1) 162 Buffers: shared hit=4 read=132 163 -> Incremental Sort (actual time=30.378..75.557 rows=140517 loops=1) 164 Presorted Key: approved_by 165 -> Index Only Scan using idx_reservations_approver_status 166 on reservations rv (actual time=0.338..11.357 rows=140517 loops=1) 167 Heap Fetches: 0 168 Buffers: shared hit=1 read=132 169 Execution Time: 98.533 ms 103 -> Index Only Scan using idx_reservations_approver_status 104 Heap Fetches: 0 105 Execution Time: 129.167 ms 170 106 }}} 171 107 172 '''Analysis:''' The partial covering index eliminates the sequential scan entirely. PostgreSQL performs an '''Index Only Scan''' ({{{Heap Fetches: 0}}}) reading only 133 buffers instead of 3,001. The {{{WHERE approved_by IS NOT NULL}}} filter in the index definition means the index is smaller and doesn't waste space on irrelevant rows. Disk-based sorting (4576kB) is eliminated in favor of presorted incremental sort.108 '''Analysis:''' Sequential scan replaced by Index Only Scan. Disk-based external merge sort (4568kB) eliminated. The partial index is smaller than a full one because it excludes the 60,000 rows with {{{approved_by IS NULL}}}. 173 109 174 '''Result: 1 21ms -> 99ms (18% improvement), sequential scan eliminated, buffer reads reduced 96%.'''110 '''Result: 167 ms -> 129 ms (23% faster). Index Only Scan used; partial index avoids indexing irrelevant rows.''' 175 111 176 112 === Performance Summary === 177 113 178 ||'''Scenario'''||'''Before'''||'''After'''||'''Improvement'''||'''Key Change'''|| 179 ||User Activity Aggregation||226ms||158ms||30%||Disk sort eliminated, index scan|| 180 ||Monthly Trends with Top Resources||98ms||57ms||42%||Index Only Scan, 98% fewer buffers|| 181 ||Admin Approval Workload||121ms||99ms||18%||Partial covering index, no heap fetches|| 114 ||'''Phase P6 Report'''||'''Before'''||'''After'''||'''Gain'''||'''Key change'''|| 115 ||Report 1 — Resource Utilization (quarterly)||181 ms||82 ms||55%||Composite index enables Index Only Scan for CTE aggregation|| 116 ||Report 3 — User Activity Analysis||280 ms||176 ms||37%||INCLUDE covering index removes 9 MB disk sort and heap fetches|| 117 ||Report 4 — Admin Workload Analysis||167 ms||129 ms||23%||Partial covering index replaces seq scan, removes 4.5 MB disk sort|| 118 119 === Note on Reports That Do Not Benefit from Indexes === 120 121 Phase P6 Report 2 (Monthly Reservation Trends) without any date filter scans and aggregates the entire reservations table grouped by month. A full-table aggregation cannot be sped up by indexes alone — the planner would still have to read every row. For that report the correct optimization is either (a) to add a date filter (e.g. "last 12 months"), at which point the {{{idx_reservations_start_resource_status}}} index cuts execution time by about 10%, or (b) to pre-compute monthly summaries in a materialized view that is refreshed periodically. The same applies to any dashboard query that aggregates over the whole history. 182 122 183 123 == Security Measures == … … 197 137 }}} 198 138 199 psycopg2 sends the query template and parameters separately to PostgreSQL , which parses the query structure first and then binds the parameters. This makes SQL injection impossible regardless of userinput content.139 psycopg2 sends the query template and parameters separately to PostgreSQL. The server parses the template first and only then binds the parameters, so injection is impossible regardless of the input content. 200 140 201 141 === Password Hashing === 202 142 203 User passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random salt for hashing:143 Passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random per-password salt: 204 144 205 145 {{{ … … 212 152 }}} 213 153 214 bcrypt is a deliberately slow hashing algorithm designed to resist brute-force attacks. The salt is generated per-password, preventingrainbow table attacks.154 bcrypt is a deliberately slow hashing algorithm with a configurable work factor, which resists brute-force attacks. The per-password salt defeats rainbow table attacks. 215 155 216 156 === Role-Based Access Control === 217 157 218 Access control is enforced at two levels:158 Access control is enforced at two independent layers: 219 159 220 '''Application l evel:''' The main menu only shows actions appropriate for the user's role. Students see "Browse Resources" only, Teaching Staff can also make reservations, and Administrators get approval, analytics, and user management options.160 '''Application layer (main.py):''' the main menu exposes only the actions appropriate for the logged-in user's role. Students can browse; Teaching Staff can also reserve; Administrators see approval, analytics, and user management. 221 161 222 162 {{{ 223 # main.py224 163 if role == "Teaching Staff": 225 164 options.append("Make a Resource Reservation") … … 230 169 }}} 231 170 232 '''Database l evel:''' A trigger ({{{trg_check_approver_is_admin}}}, from Phase 7) enforces that only users with the Administrator role can be set as {{{approved_by}}} on reservations. This prevents privilege escalation even if the application layer is bypassed.171 '''Database layer (Phase P7 trigger {{{trg_check_approver_is_admin}}}):''' enforces that the {{{approved_by}}} column can only reference users whose {{{type_name}}} is {{{'Administrator'}}}. Even if the application layer is bypassed (for example, by a direct SQL client), the database rejects any attempt to record a non-admin as the approver. 233 172 234 173 {{{ 235 -- Trigger rejects if approver is not an Administrator236 174 IF v_type_name != 'Administrator' THEN 237 175 RAISE EXCEPTION 'Only administrators can approve reservations. User % is "%"', … … 242 180 === Connection Security === 243 181 244 Database connections use a '''connection pool''' (from Phase 8) with explicit transaction management. This provides:182 Database connections use a '''connection pool''' (from Phase P8) with explicit transaction management. This provides: 245 183 246 * '''Resource exhaustion protection:''' Maximum 10 connections prevents a runaway process from consuming all database connections.247 * '''Automatic cleanup:''' Connections are returned to the pool via context managers, preventing connection leaks.248 * '''Transaction isolation:''' Write operations use explicit transactions ({{{get_transaction()}}}) that automatically roll back on errors, preventing partial writes from corrupting data.184 * '''Resource-exhaustion protection:''' at most 10 concurrent connections, so a runaway process cannot consume all database connection slots. 185 * '''Automatic cleanup:''' connections are always returned to the pool via context managers, preventing connection leaks that could otherwise reach the server limit. 186 * '''Transaction isolation:''' write operations use {{{get_transaction()}}}, which rolls back automatically on any exception and therefore cannot leave the database in a partially-written state.
