Context Navigation

← Previous Change
Wiki History
Next Change →

Changes between Initial Version and Version 1 of OtherTopics

Timestamp:: 04/05/26 20:17:36 (3 months ago)
Author:: 221511
Comment:: --

Legend:

: Unmodified
: Added
: Removed
: Modified

OtherTopics

               v1
+= Other Topics =
+== SQL Performance ==
+Performance analysis was conducted on the local development database with 200,000 generated reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) is insufficient for meaningful index analysis, as sequential scans are optimal for small tables.
+The testing method: each query is run with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before and after index creation. We compare execution time, scan type, and buffer usage.
+=== Scenario 1: User Activity Aggregation ===
+'''Query:''' Per-user reservation statistics with status breakdown and distinct resource count (from Report 3: User Behavior Analysis).
+{{{
+SELECT
+    rv.user_id,
+    COUNT(*) AS total_reservations,
+    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved,
+    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected,
+    COUNT(*) FILTER (WHERE rv.status = 'cancelled') AS cancelled,
+    ROUND(AVG(EXTRACT(EPOCH FROM (rv.end_time - rv.start_time)) / 3600.0), 1) AS avg_duration,
+    COUNT(DISTINCT rv.resource_id) AS distinct_resources_used
+FROM reservations rv
+GROUP BY rv.user_id;
+}}}
+'''Proposed index:'''
+{{{
+CREATE INDEX idx_reservations_user_status_resource
+    ON reservations (user_id, status, resource_id);
+}}}
+'''Before index:'''
+{{{
+ GroupAggregate  (actual time=146.682..223.586 rows=12 loops=1)
+   Buffers: shared hit=3001, temp read=1129 written=1133
+   ->  Sort  (actual time=139.784..165.344 rows=200026 loops=1)
+         Sort Method: external merge  Disk: 9032kB
+         ->  Seq Scan on reservations rv  (actual time=0.005..20.583 rows=200026 loops=1)
+ Execution Time: 226.085 ms
+}}}
+'''After index:'''
+{{{
+ GroupAggregate  (actual time=22.372..157.807 rows=12 loops=1)
+   Buffers: shared hit=36040
+   ->  Incremental Sort  (actual time=8.413..96.559 rows=200026 loops=1)
+         Presorted Key: user_id
+         Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 1817kB
+         ->  Index Scan using idx_reservations_user  (actual time=0.015..54.051 rows=200026 loops=1)
+ Execution Time: 157.920 ms
+}}}
+'''Analysis:''' The index provides pre-sorted data by {{{user_id}}}, enabling incremental sort instead of a full external merge sort. The critical improvement is '''elimination of disk-based sorting''' (9032kB spilled to disk) in favor of in-memory quicksort. Buffer reads come entirely from shared memory (no temp I/O).
+'''Result: 226ms -> 158ms (30% improvement), disk sort eliminated.'''
+=== Scenario 2: Monthly Reservation Trends with Top Resources ===
+'''Query:''' Monthly breakdown with aggregated stats and top demanded resource per month (from Report 2: Monthly Trends). Uses CTEs, window functions ({{{ROW_NUMBER}}}), date filtering, and a JOIN to the resources table.
+{{{
+WITH monthly_overview AS (
+    SELECT DATE_TRUNC('month', rv.start_time) AS month, COUNT(*) AS total,
+           COUNT(*) FILTER (WHERE rv.status IN ('approved','completed')) AS approved,
+           COUNT(DISTINCT rv.user_id) AS unique_users
+    FROM reservations rv
+    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
+    GROUP BY DATE_TRUNC('month', rv.start_time)
+),
+resource_demand AS (
+    SELECT DATE_TRUNC('month', rv.start_time) AS month, r.name,
+           COUNT(*) AS demand_count,
+           ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC('month', rv.start_time)
+                              ORDER BY COUNT(*) DESC) AS rank
+    FROM reservations rv JOIN resources r ON rv.resource_id = r.resource_id
+    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
+    GROUP BY DATE_TRUNC('month', rv.start_time), r.name
+)
+SELECT mo.*, rd.name AS top_resource, rd.demand_count
+FROM monthly_overview mo
+LEFT JOIN resource_demand rd ON mo.month = rd.month AND rd.rank = 1
+ORDER BY mo.month;
+}}}
+'''Proposed index:'''
+{{{
+CREATE INDEX idx_reservations_start_resource_status
+    ON reservations (start_time, resource_id, status);
+}}}
+'''Before index:'''
+{{{
+ Nested Loop Left Join  (actual time=89.338..98.077 rows=6 loops=1)
+   Buffers: shared hit=75668
+   ...
+   ->  Nested Loop  (actual time=0.955..1.866 rows=1931 loops=27)
+         ->  Index Scan using idx_reservations_resource on reservations rv_1
+               Filter: (start_time >= ... AND start_time < ...)
+               Rows Removed by Filter: 5478
+               Buffers: shared hit=74673
+ Execution Time: 98.322 ms
+}}}
+'''After index:'''
+{{{
+ Sort  (actual time=56.456..56.525 rows=6 loops=1)
+   Buffers: shared hit=1055 read=261
+   ...
+   ->  Parallel Index Only Scan using idx_reservations_start_resource_status
+         on reservations rv_1
+         Index Cond: (start_time >= ... AND start_time < ...)
+         Heap Fetches: 0
+         Buffers: shared hit=3 read=261
+ Execution Time: 56.818 ms
+}}}
+'''Analysis:''' The composite index on {{{(start_time, resource_id, status)}}} enables an '''Index Only Scan''' for the resource demand subquery — meaning PostgreSQL reads all needed columns directly from the index without touching the heap table ({{{Heap Fetches: 0}}}). Buffer usage dropped dramatically from 75,668 to 1,316 hits. The old plan used a nested loop scanning 5,478 irrelevant rows per resource; the new plan avoids this entirely.
+'''Result: 98ms -> 57ms (42% improvement), buffer hits reduced 98%.'''
+=== Scenario 3: Administrator Approval Workload ===
+'''Query:''' Per-admin aggregation of reviewed reservations with status breakdown and distinct counts (from Report 4: Admin Workload Analysis). Filters on {{{approved_by IS NOT NULL}}}.
+{{{
+SELECT
+    rv.approved_by AS admin_id,
+    COUNT(*) AS total_reviewed,
+    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved_count,
+    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected_count,
+    COUNT(DISTINCT rv.resource_id) AS distinct_resources_handled,
+    COUNT(DISTINCT rv.user_id) AS distinct_users_served
+FROM reservations rv
+WHERE rv.approved_by IS NOT NULL
+GROUP BY rv.approved_by;
+}}}
+'''Proposed index:'''
+{{{
+CREATE INDEX idx_reservations_approver_status
+    ON reservations (approved_by, status, resource_id, user_id)
+    WHERE approved_by IS NOT NULL;
+}}}
+This is a '''partial covering index''' — it only indexes rows where {{{approved_by IS NOT NULL}}} (about 70% of rows), and includes all columns the query needs.
+'''Before index:'''
+{{{
+ GroupAggregate  (actual time=100.297..118.534 rows=2 loops=1)
+   Buffers: shared hit=3001, temp read=572 written=574
+   ->  Sort  (actual time=81.884..95.827 rows=140517 loops=1)
+         Sort Method: external merge  Disk: 4576kB
+         ->  Seq Scan on reservations rv  (actual time=0.005..19.879 rows=140517 loops=1)
+               Filter: (approved_by IS NOT NULL)
+               Rows Removed by Filter: 59509
+ Execution Time: 121.020 ms
+}}}
+'''After index:'''
+{{{
+ GroupAggregate  (actual time=82.464..97.428 rows=2 loops=1)
+   Buffers: shared hit=4 read=132
+   ->  Incremental Sort  (actual time=30.378..75.557 rows=140517 loops=1)
+         Presorted Key: approved_by
+         ->  Index Only Scan using idx_reservations_approver_status
+               on reservations rv  (actual time=0.338..11.357 rows=140517 loops=1)
+               Heap Fetches: 0
+               Buffers: shared hit=1 read=132
+ Execution Time: 98.533 ms
+}}}
+'''Analysis:''' The partial covering index eliminates the sequential scan entirely. PostgreSQL performs an '''Index Only Scan''' ({{{Heap Fetches: 0}}}) reading only 133 buffers instead of 3,001. The {{{WHERE approved_by IS NOT NULL}}} filter in the index definition means the index is smaller and doesn't waste space on irrelevant rows. Disk-based sorting (4576kB) is eliminated in favor of presorted incremental sort.
+'''Result: 121ms -> 99ms (18% improvement), sequential scan eliminated, buffer reads reduced 96%.'''
+=== Performance Summary ===
+||'''Scenario'''||'''Before'''||'''After'''||'''Improvement'''||'''Key Change'''||
+||User Activity Aggregation||226ms||158ms||30%||Disk sort eliminated, index scan||
+||Monthly Trends with Top Resources||98ms||57ms||42%||Index Only Scan, 98% fewer buffers||
+||Admin Approval Workload||121ms||99ms||18%||Partial covering index, no heap fetches||
+== Security Measures ==
+=== SQL Injection Prevention ===
+All database queries in the prototype use '''parameterized queries''' via psycopg2's {{{%s}}} placeholders. User input is never concatenated into SQL strings.
+Example from {{{uc_reserve.py}}}:
+{{{
+cur.execute("""
+    INSERT INTO reservations
+        (start_time, end_time, status, purpose, created_at, user_id, resource_id)
+    VALUES (%s, %s, 'pending', %s, CURRENT_TIMESTAMP, %s, %s)
+    RETURNING reservation_id, status, created_at
+""", (start_dt, end_dt, purpose, user_id, rid))
+}}}
+psycopg2 sends the query template and parameters separately to PostgreSQL, which parses the query structure first and then binds the parameters. This makes SQL injection impossible regardless of user input content.
+=== Password Hashing ===
+User passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random salt for hashing:
+{{{
+# Registration (uc_users.py)
+hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
+# Login verification (main.py)
+if bcrypt.checkpw(password.encode(), row[3].encode()):
+    # authenticated
+}}}
+bcrypt is a deliberately slow hashing algorithm designed to resist brute-force attacks. The salt is generated per-password, preventing rainbow table attacks.
+=== Role-Based Access Control ===
+Access control is enforced at two levels:
+'''Application level:''' The main menu only shows actions appropriate for the user's role. Students see "Browse Resources" only, Teaching Staff can also make reservations, and Administrators get approval, analytics, and user management options.
+{{{
+# main.py
+if role == "Teaching Staff":
+    options.append("Make a Resource Reservation")
+if role == "Administrator":
+    options.append("Approve or Reject Reservations")
+    options.append("View Resource Usage Analytics")
+    options.append("Register a New User")
+}}}
+'''Database level:''' A trigger ({{{trg_check_approver_is_admin}}}, from Phase 7) enforces that only users with the Administrator role can be set as {{{approved_by}}} on reservations. This prevents privilege escalation even if the application layer is bypassed.
+{{{
+-- Trigger rejects if approver is not an Administrator
+IF v_type_name != 'Administrator' THEN
+    RAISE EXCEPTION 'Only administrators can approve reservations. User % is "%"',
+        NEW.approved_by, v_type_name;
+END IF;
+}}}
+=== Connection Security ===
+Database connections use a '''connection pool''' (from Phase 8) with explicit transaction management. This provides:
+ * '''Resource exhaustion protection:''' Maximum 10 connections prevents a runaway process from consuming all database connections.
+ * '''Automatic cleanup:''' Connections are returned to the pool via context managers, preventing connection leaks.
+ * '''Transaction isolation:''' Write operations use explicit transactions ({{{get_transaction()}}}) that automatically roll back on errors, preventing partial writes from corrupting data.