
Changes between Version 1 and Version 2 of OtherTopics

Timestamp: 04/22/26 11:12:55 (3 months ago)
Author: 221511
Comment: made the tests on the report queries from phase 6 and expanded the other explanations


OtherTopics

-              v1
+              v2
 == SQL Performance ==
 Performance analysis was conducted on the local development database with 200,000 generated reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) is insufficient for meaningful index analysis, as sequential scans are optimal for small tables.
+Performance analysis was conducted against the full complex report queries from Phase P6 ([wiki:ComplexReports ComplexReports]). The tests were run on the local development database after loading 200,000 synthetic reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) cannot reveal index usefulness because sequential scans are optimal at that scale.
 The testing method: each query is run with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before and after index creation. We compare execution time, scan type, and buffer usage.
+The testing method: each P6 report query is executed with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before and after the new indexes are created. We compare execution time, scan type, and buffer usage.
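The before/after comparison can be partly automated once the plan text has been captured. The following is a minimal sketch, not project code: the helper name `summarize_plan` is hypothetical, and the two sample strings are the Scenario 1 plans quoted on this page.

```python
import re

def summarize_plan(plan_text: str) -> dict:
    """Extract the compared signals from EXPLAIN (ANALYZE, BUFFERS) text output."""
    time_match = re.search(r"Execution Time: ([\d.]+) ms", plan_text)
    # "Index Only" must be tried before "Index" so the longer name wins.
    scans = re.findall(
        r"((?:Parallel )?(?:Seq|Index Only|Index|Bitmap Heap|Bitmap Index) Scan)",
        plan_text,
    )
    hits = [int(n) for n in re.findall(r"shared hit=(\d+)", plan_text)]
    return {
        "execution_ms": float(time_match.group(1)) if time_match else None,
        "scan_types": scans,
        "spills_to_disk": "external merge" in plan_text,
        "top_level_shared_hits": hits[0] if hits else None,
    }

BEFORE = """
 GroupAggregate  (actual time=146.682..223.586 rows=12 loops=1)
   Buffers: shared hit=3001, temp read=1129 written=1133
   ->  Sort  (actual time=139.784..165.344 rows=200026 loops=1)
         Sort Method: external merge  Disk: 9032kB
         ->  Seq Scan on reservations rv  (actual time=0.005..20.583 rows=200026 loops=1)
 Execution Time: 226.085 ms
"""

AFTER = """
 GroupAggregate  (actual time=22.372..157.807 rows=12 loops=1)
   Buffers: shared hit=36040
   ->  Incremental Sort  (actual time=8.413..96.559 rows=200026 loops=1)
         Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 1817kB
         ->  Index Scan using idx_reservations_user  (actual time=0.015..54.051 rows=200026 loops=1)
 Execution Time: 157.920 ms
"""

for label, plan in (("before", BEFORE), ("after", AFTER)):
    print(label, summarize_plan(plan))
```

Matching on the text format is fragile. If the plans are collected programmatically, {{{EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)}}} is probably a more robust source than regular expressions.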
 === Scenario 1: User Activity Aggregation ===
+=== Scenario 1: Phase P6 Report 1 — Resource Utilization (quarterly) ===
+'''Query:''' Per-user reservation statistics with status breakdown and distinct resource count (from Report 3: User Behavior Analysis).
+{{{
+SELECT
+    rv.user_id,
+    COUNT(*) AS total_reservations,
+    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved,
+    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected,
+    COUNT(*) FILTER (WHERE rv.status = 'cancelled') AS cancelled,
+    ROUND(AVG(EXTRACT(EPOCH FROM (rv.end_time - rv.start_time)) / 3600.0), 1) AS avg_duration,
+    COUNT(DISTINCT rv.resource_id) AS distinct_resources_used
+FROM reservations rv
+GROUP BY rv.user_id;
+}}}
+'''Proposed index:'''
+{{{
+CREATE INDEX idx_reservations_user_status_resource
+    ON reservations (user_id, status, resource_id);
+}}}
+'''Before index:'''
+{{{
+ GroupAggregate  (actual time=146.682..223.586 rows=12 loops=1)
+   Buffers: shared hit=3001, temp read=1129 written=1133
+   ->  Sort  (actual time=139.784..165.344 rows=200026 loops=1)
+         Sort Method: external merge  Disk: 9032kB
+         ->  Seq Scan on reservations rv  (actual time=0.005..20.583 rows=200026 loops=1)
+ Execution Time: 226.085 ms
+}}}
+'''After index:'''
+{{{
+ GroupAggregate  (actual time=22.372..157.807 rows=12 loops=1)
+   Buffers: shared hit=36040
+   ->  Incremental Sort  (actual time=8.413..96.559 rows=200026 loops=1)
+         Presorted Key: user_id
+         Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 1817kB
+         ->  Index Scan using idx_reservations_user  (actual time=0.015..54.051 rows=200026 loops=1)
+ Execution Time: 157.920 ms
+}}}
+'''Analysis:''' The index provides pre-sorted data by {{{user_id}}}, enabling incremental sort instead of a full external merge sort. The critical improvement is '''elimination of disk-based sorting''' (9032kB spilled to disk) in favor of in-memory quicksort. Buffer reads come entirely from shared memory (no temp I/O).
+'''Result: 226ms -> 158ms (30% improvement), disk sort eliminated.'''
+=== Scenario 2: Monthly Reservation Trends with Top Resources ===
+'''Query:''' Monthly breakdown with aggregated stats and top demanded resource per month (from Report 2: Monthly Trends). Uses CTEs, window functions ({{{ROW_NUMBER}}}), date filtering, and a JOIN to the resources table.
+{{{
+WITH monthly_overview AS (
+    SELECT DATE_TRUNC('month', rv.start_time) AS month, COUNT(*) AS total,
+           COUNT(*) FILTER (WHERE rv.status IN ('approved','completed')) AS approved,
+           COUNT(DISTINCT rv.user_id) AS unique_users
+    FROM reservations rv
+    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
+    GROUP BY DATE_TRUNC('month', rv.start_time)
+),
+resource_demand AS (
+    SELECT DATE_TRUNC('month', rv.start_time) AS month, r.name,
+           COUNT(*) AS demand_count,
+           ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC('month', rv.start_time)
+                              ORDER BY COUNT(*) DESC) AS rank
+    FROM reservations rv JOIN resources r ON rv.resource_id = r.resource_id
+    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
+    GROUP BY DATE_TRUNC('month', rv.start_time), r.name
+)
+SELECT mo.*, rd.name AS top_resource, rd.demand_count
+FROM monthly_overview mo
+LEFT JOIN resource_demand rd ON mo.month = rd.month AND rd.rank = 1
+ORDER BY mo.month;
+}}}
+'''Query source:''' the full "Report 1: Resource Utilization and Demand Analysis" from [wiki:ComplexReports ComplexReports]. Uses 6 CTEs (quarter_bounds, quarter_days, resource_availability, reservation_stats, popular_day, peak_hour) and the window function {{{RANK()}}} for demand ranking, and filters by a quarterly date range.
 '''Proposed index:'''
 …
 }}}
 '''Before index:'''
+'''Before index (excerpt of plan):'''
 {{{
  Nested Loop Left Join  (actual time=89.338..98.077 rows=6 loops=1)
    Buffers: shared hit=75668
    ...
    ->  Nested Loop  (actual time=0.955..1.866 rows=1931 loops=27)
          ->  Index Scan using idx_reservations_resource on reservations rv_1
                Filter: (start_time >= ... AND start_time < ...)
                Rows Removed by Filter: 5478
                Buffers: shared hit=74673
  Execution Time: 98.322 ms
+HashAggregate  (actual rows=189 loops=1)
+ ->  Nested Loop (actual rows=10383 loops=1)
+       ->  Bitmap Heap Scan on reservations rv_1 (actual rows=10383 loops=1)
+             Recheck Cond: start_time >= q_start AND start_time < q_end
+             Filter: status IN ('approved','completed')
+             Rows Removed by Filter: 15537
+             Heap Blocks: exact=364
+             ->  Bitmap Index Scan on idx_reservations_times
+Execution Time: 181.324 ms
 }}}
 '''After index:'''
 {{{
+ Sort  (actual time=56.456..56.525 rows=6 loops=1)
+   Buffers: shared hit=1055 read=261
+   ...
+   ->  Parallel Index Only Scan using idx_reservations_start_resource_status
+         on reservations rv_1
+         Index Cond: (start_time >= ... AND start_time < ...)
+         Heap Fetches: 0
+         Buffers: shared hit=3 read=261
+ Execution Time: 56.818 ms
+ ->  Index Only Scan using idx_reservations_start_resource_status
+       on reservations rv_1
+       Index Cond: start_time >= q_start AND start_time < q_end
+       Heap Fetches: 0
+Execution Time: 82.307 ms
 }}}
 '''Analysis:''' The composite index on {{{(start_time, resource_id, status)}}} enables an '''Index Only Scan''' for the resource demand subquery — meaning PostgreSQL reads all needed columns directly from the index without touching the heap table ({{{Heap Fetches: 0}}}). Buffer usage dropped dramatically from 75,668 to 1,316 hits. The old plan used a nested loop scanning 5,478 irrelevant rows per resource; the new plan avoids this entirely.
+'''Analysis:''' The new composite index includes {{{status}}}, so the planner can filter by status inside the index without touching the heap. The {{{popular_day}}} and {{{peak_hour}}} CTEs both switched from a Bitmap Heap Scan that discarded rows after fetching them ("Rows Removed by Filter: 15537" per iteration) to an Index Only Scan. {{{Heap Fetches: 0}}} confirms that no heap rows had to be fetched.
 '''Result: 98ms -> 57ms (42% improvement), buffer hits reduced 98%.'''
+'''Result: 181 ms -> 82 ms (55% faster). Index Only Scan confirmed in the plan.'''
 === Scenario 3: Administrator Approval Workload ===
+=== Scenario 2: Phase P6 Report 3 — User Activity and Behavior Analysis ===
 '''Query:''' Per-admin aggregation of reviewed reservations with status breakdown and distinct counts (from Report 4: Admin Workload Analysis). Filters on {{{approved_by IS NOT NULL}}}.
+'''Query source:''' the full "Report 3: User Activity and Behavior Analysis" from [wiki:ComplexReports ComplexReports]. Uses 2 CTEs (user_stats, favorite_resource), {{{DISTINCT ON}}}, {{{RANK()}}}, {{{COUNT(*) FILTER (WHERE ...)}}} for per-status counts, and aggregates over the entire reservations table grouped by user_id.
+'''Proposed index:'''
 {{{
+SELECT
+    rv.approved_by AS admin_id,
+    COUNT(*) AS total_reviewed,
+    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved_count,
+    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected_count,
+    COUNT(DISTINCT rv.resource_id) AS distinct_resources_handled,
+    COUNT(DISTINCT rv.user_id) AS distinct_users_served
+FROM reservations rv
+WHERE rv.approved_by IS NOT NULL
+GROUP BY rv.approved_by;
+CREATE INDEX idx_reservations_user_status_resource
+    ON reservations (user_id, status, resource_id)
+    INCLUDE (start_time, end_time);
 }}}
+The {{{INCLUDE}}} clause adds start_time and end_time as index payload (not used for ordering) so the query can compute {{{AVG(end_time - start_time)}}} and {{{SUM(...) FILTER (...)}}} without heap access.
+'''Before index:'''
+{{{
+Sort Method: external merge  Disk: 9032kB
+ ->  Seq Scan on reservations rv  (actual rows=200026 loops=1)
+Execution Time: 279.957 ms
+}}}
+'''After index:'''
+{{{
+Incremental Sort
+  Presorted Key: user_id
+  Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 29kB
+  Pre-sorted Groups: 12  Sort Method: quicksort  Average Memory: 1821kB
+  ->  Index Only Scan using idx_reservations_user_status_resource
+        Heap Fetches: 0
+Execution Time: 175.602 ms
+}}}
+'''Analysis:''' The sequential scan is replaced by an Index Only Scan. Disk-based external merge sort (9032kB to disk) is eliminated; because the index is sorted by user_id, PostgreSQL does an Incremental Sort entirely in memory. Heap Fetches: 0 confirms the INCLUDE columns are doing their job.
+'''Result: 280 ms -> 176 ms (37% faster). Disk sort eliminated; Index Only Scan used.'''
+=== Scenario 3: Phase P6 Report 4 — Administrator Approval Workload ===
+'''Query source:''' the full "Report 4: Administrator Approval Workload and Bottleneck Analysis" from [wiki:ComplexReports ComplexReports]. Uses 3 CTEs (admin_stats, pending_stats, workload_share), {{{CROSS JOIN}}} to attach the single-row pending_stats to every admin, and {{{SUM(...) OVER ()}}} to compute workload share percentage.
 '''Proposed index:'''
 …
 }}}
 This is a '''partial covering index''' — it only indexes rows where {{{approved_by IS NOT NULL}}} (about 70% of rows), and includes all columns the query needs.
+This is a partial covering index: the {{{WHERE}}} clause excludes rows with no approver (roughly 30% of the data), and the listed columns are everything the {{{admin_stats}}} CTE needs to aggregate.
 '''Before index:'''
 {{{
+ GroupAggregate  (actual time=100.297..118.534 rows=2 loops=1)
+   Buffers: shared hit=3001, temp read=572 written=574
+   ->  Sort  (actual time=81.884..95.827 rows=140517 loops=1)
+         Sort Method: external merge  Disk: 4576kB
+         ->  Seq Scan on reservations rv  (actual time=0.005..19.879 rows=140517 loops=1)
+               Filter: (approved_by IS NOT NULL)
+               Rows Removed by Filter: 59509
+ Execution Time: 121.020 ms
+Sort Method: external merge  Disk: 4568kB
+ ->  Seq Scan on reservations rv_1
+     Filter: approved_by IS NOT NULL
+Execution Time: 166.804 ms
 }}}
 '''After index:'''
 {{{
+ GroupAggregate  (actual time=82.464..97.428 rows=2 loops=1)
+   Buffers: shared hit=4 read=132
+   ->  Incremental Sort  (actual time=30.378..75.557 rows=140517 loops=1)
+         Presorted Key: approved_by
+         ->  Index Only Scan using idx_reservations_approver_status
+               on reservations rv  (actual time=0.338..11.357 rows=140517 loops=1)
+               Heap Fetches: 0
+               Buffers: shared hit=1 read=132
+ Execution Time: 98.533 ms
+ ->  Index Only Scan using idx_reservations_approver_status
+       Heap Fetches: 0
+Execution Time: 129.167 ms
 }}}
 '''Analysis:''' The partial covering index eliminates the sequential scan entirely. PostgreSQL performs an '''Index Only Scan''' ({{{Heap Fetches: 0}}}) reading only 133 buffers instead of 3,001. The {{{WHERE approved_by IS NOT NULL}}} filter in the index definition means the index is smaller and doesn't waste space on irrelevant rows. Disk-based sorting (4576kB) is eliminated in favor of presorted incremental sort.
+'''Analysis:''' The sequential scan is replaced by an Index Only Scan, and the disk-based external merge sort (4568kB) is eliminated. The partial index is smaller than a full one because it excludes the roughly 60,000 rows with {{{approved_by IS NULL}}} (59,509 in the test data).
 '''Result: 121ms -> 99ms (18% improvement), sequential scan eliminated, buffer reads reduced 96%.'''
+'''Result: 167 ms -> 129 ms (23% faster). Index Only Scan used; partial index avoids indexing irrelevant rows.'''
 === Performance Summary ===
+||'''Scenario'''||'''Before'''||'''After'''||'''Improvement'''||'''Key Change'''||
+||User Activity Aggregation||226ms||158ms||30%||Disk sort eliminated, index scan||
+||Monthly Trends with Top Resources||98ms||57ms||42%||Index Only Scan, 98% fewer buffers||
+||Admin Approval Workload||121ms||99ms||18%||Partial covering index, no heap fetches||
+||'''Phase P6 Report'''||'''Before'''||'''After'''||'''Gain'''||'''Key change'''||
+||Report 1 — Resource Utilization (quarterly)||181 ms||82 ms||55%||Composite index enables Index Only Scan for CTE aggregation||
+||Report 3 — User Activity Analysis||280 ms||176 ms||37%||INCLUDE covering index removes 9 MB disk sort and heap fetches||
+||Report 4 — Admin Workload Analysis||167 ms||129 ms||23%||Partial covering index replaces seq scan, removes 4.5 MB disk sort||
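The Gain column can be reproduced from the execution times in the plans. A minimal sketch, assuming gain is defined as the relative reduction in execution time; the function name `percent_gain` is illustrative:

```python
def percent_gain(before_ms: float, after_ms: float) -> int:
    """Relative reduction in execution time, rounded to a whole percent."""
    return round((before_ms - after_ms) / before_ms * 100)

# Execution times copied from the EXPLAIN ANALYZE output of each v2 scenario.
measurements = {
    "Report 1": (181.324, 82.307),
    "Report 3": (279.957, 175.602),
    "Report 4": (166.804, 129.167),
}

gains = {name: percent_gain(before, after)
         for name, (before, after) in measurements.items()}
print(gains)  # {'Report 1': 55, 'Report 3': 37, 'Report 4': 23}
```

Each figure comes from a single run. Averaging several runs per query would make the comparison less sensitive to cache state.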
+=== Note on Reports That Do Not Benefit from Indexes ===
+Phase P6 Report 2 (Monthly Reservation Trends), when run without a date filter, scans and aggregates the entire reservations table grouped by month. Indexes alone cannot speed up a full-table aggregation, because PostgreSQL still has to read every row. For that report the correct optimization is either (a) to add a date filter (e.g. "last 12 months"), at which point the {{{idx_reservations_start_resource_status}}} index cuts execution time by about 10%, or (b) to pre-compute monthly summaries in a materialized view that is refreshed periodically. The same applies to any dashboard query that aggregates over the whole history.
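Option (b) can be illustrated without a PostgreSQL server. The sketch below uses Python's built-in {{{sqlite3}}} module as a stand-in: SQLite has no materialized views, so a summary table that is rebuilt on demand plays that role. The table layout, the sample rows, and the function name `refresh_monthly_summary` are assumptions made for the example. In PostgreSQL the equivalent statements would be {{{CREATE MATERIALIZED VIEW}}} and {{{REFRESH MATERIALIZED VIEW}}}.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (start_time TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?)",
    [
        ("2025-01-10 09:00", "approved"),
        ("2025-01-20 14:00", "rejected"),
        ("2025-02-03 10:00", "completed"),
    ],
)

def refresh_monthly_summary(connection):
    """Rebuild the pre-computed summary from the full reservations table."""
    connection.execute("DROP TABLE IF EXISTS monthly_summary")
    connection.execute(
        """
        CREATE TABLE monthly_summary AS
        SELECT strftime('%Y-%m', start_time) AS month,
               COUNT(*) AS total,
               SUM(status IN ('approved', 'completed')) AS approved
        FROM reservations
        GROUP BY month
        """
    )

refresh_monthly_summary(conn)

# The dashboard reads the small summary instead of re-aggregating the history.
rows = conn.execute(
    "SELECT month, total, approved FROM monthly_summary ORDER BY month"
).fetchall()
print(rows)  # [('2025-01', 2, 1), ('2025-02', 1, 1)]
```

The trade-off is staleness: the summary reflects the table only as of the last refresh, so the refresh interval has to match how current the report needs to be.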
 == Security Measures ==
 …
 }}}
 psycopg2 sends the query template and parameters separately to PostgreSQL, which parses the query structure first and then binds the parameters. This makes SQL injection impossible regardless of user input content.
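+psycopg2 passes the query template and the parameters to {{{execute()}}} separately and binds them itself on the client side, escaping and quoting each value according to its Python type before the statement is sent to PostgreSQL. User input therefore always reaches the server as a literal value and is never interpreted as SQL syntax, whatever it contains.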
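The difference between building a query by string formatting and passing parameters separately can be shown with a small runnable sketch. It uses Python's built-in {{{sqlite3}}} module as a stand-in so that no database server is needed. SQLite uses {{{?}}} as the placeholder where psycopg2 uses {{{%s}}}, and the table and input values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'Student')")

malicious = "nobody' OR '1'='1"

# Unsafe: string formatting lets the input rewrite the WHERE clause.
unsafe_rows = conn.execute(
    f"SELECT username FROM users WHERE username = '{malicious}'"
).fetchall()

# Safe: the placeholder keeps the whole input as one value to compare against.
safe_rows = conn.execute(
    "SELECT username FROM users WHERE username = ?", (malicious,)
).fetchall()

print(unsafe_rows)  # [('alice',)]  the injected OR clause matched every row
print(safe_rows)    # []            no user has that literal name
```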
 === Password Hashing ===
 User passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random salt for hashing:
+Passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random per-password salt:
 {{{
 …
 }}}
 bcrypt is a deliberately slow hashing algorithm designed to resist brute-force attacks. The salt is generated per-password, preventing rainbow table attacks.
+bcrypt is a deliberately slow hashing algorithm with a configurable work factor, which resists brute-force attacks. The per-password salt defeats rainbow table attacks.
 === Role-Based Access Control ===
 Access control is enforced at two levels:
+Access control is enforced at two independent layers:
 '''Application level:''' The main menu only shows actions appropriate for the user's role. Students see "Browse Resources" only, Teaching Staff can also make reservations, and Administrators get approval, analytics, and user management options.
+'''Application layer (main.py):''' the main menu exposes only the actions appropriate for the logged-in user's role. Students can browse; Teaching Staff can also reserve; Administrators see approval, analytics, and user management.
 {{{
-# main.py
 if role == "Teaching Staff":
     options.append("Make a Resource Reservation")
 …
 }}}
 '''Database level:''' A trigger ({{{trg_check_approver_is_admin}}}, from Phase 7) enforces that only users with the Administrator role can be set as {{{approved_by}}} on reservations. This prevents privilege escalation even if the application layer is bypassed.
+'''Database layer (Phase P7 trigger {{{trg_check_approver_is_admin}}}):''' enforces that the {{{approved_by}}} column can only reference users whose {{{type_name}}} is {{{'Administrator'}}}. Even if the application layer is bypassed (for example, by a direct SQL client), the database rejects any attempt to record a non-admin as the approver.
 {{{
--- Trigger rejects if approver is not an Administrator
 IF v_type_name != 'Administrator' THEN
     RAISE EXCEPTION 'Only administrators can approve reservations. User % is "%"',
 …
 === Connection Security ===
 Database connections use a '''connection pool''' (from Phase 8) with explicit transaction management. This provides:
+Database connections use a '''connection pool''' (from Phase P8) with explicit transaction management. This provides:
  * '''Resource exhaustion protection:''' Maximum 10 connections prevents a runaway process from consuming all database connections.
  * '''Automatic cleanup:''' Connections are returned to the pool via context managers, preventing connection leaks.
  * '''Transaction isolation:''' Write operations use explicit transactions ({{{get_transaction()}}}) that automatically roll back on errors, preventing partial writes from corrupting data.
+ * '''Resource-exhaustion protection:''' at most 10 concurrent connections, so a runaway process cannot consume all database connection slots.
+ * '''Automatic cleanup:''' connections are always returned to the pool via context managers, preventing connection leaks that could otherwise reach the server limit.
+ * '''Transaction atomicity:''' write operations use {{{get_transaction()}}}, which rolls back automatically on any exception, so a failed operation cannot leave the database in a partially-written state.
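The rollback-on-exception behaviour can be sketched with a context manager. This is an illustration only: the project's real {{{get_transaction()}}} draws its connection from the pool and its signature may differ. Here the connection is passed in explicitly and {{{sqlite3}}} stands in for PostgreSQL so that the example runs on its own.

```python
import sqlite3
from contextlib import contextmanager

# isolation_level=None puts sqlite3 in autocommit mode so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE reservations (id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
)

@contextmanager
def get_transaction(connection):
    """Hypothetical stand-in: commit on success, roll back on any exception."""
    connection.execute("BEGIN")
    try:
        yield connection
        connection.execute("COMMIT")
    except Exception:
        connection.execute("ROLLBACK")
        raise

try:
    with get_transaction(conn) as tx:
        tx.execute("INSERT INTO reservations (status) VALUES ('pending')")
        # The second write violates NOT NULL and aborts the whole transaction.
        tx.execute("INSERT INTO reservations (status) VALUES (NULL)")
except sqlite3.IntegrityError:
    pass

remaining = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
print(remaining)  # 0: the first insert was rolled back together with the second
```

Re-raising after the rollback keeps the failure visible to the caller, so the application can report the error instead of silently losing the write.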