Changes between Initial Version and Version 1 of OtherTopics


Ignore:
Timestamp:
04/05/26 20:17:36 (6 days ago)
Author:
221511
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • OtherTopics

    v1 v1  
     1= Other Topics =
     2
     3== SQL Performance ==
     4
     5Performance analysis was conducted on the local development database with 200,000 generated reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) is insufficient for meaningful index analysis, as sequential scans are optimal for small tables.
     6
     7The testing method: each query is run with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before and after index creation. We compare execution time, scan type, and buffer usage.
     8
     9=== Scenario 1: User Activity Aggregation ===
     10
     11'''Query:''' Per-user reservation statistics with status breakdown and distinct resource count (from Report 3: User Behavior Analysis).
     12
     13{{{
     14SELECT
     15    rv.user_id,
     16    COUNT(*) AS total_reservations,
     17    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved,
     18    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected,
     19    COUNT(*) FILTER (WHERE rv.status = 'cancelled') AS cancelled,
     20    ROUND(AVG(EXTRACT(EPOCH FROM (rv.end_time - rv.start_time)) / 3600.0), 1) AS avg_duration,
     21    COUNT(DISTINCT rv.resource_id) AS distinct_resources_used
     22FROM reservations rv
     23GROUP BY rv.user_id;
     24}}}
     25
     26'''Proposed index:'''
     27{{{
     28CREATE INDEX idx_reservations_user_status_resource
     29    ON reservations (user_id, status, resource_id);
     30}}}
     31
     32'''Before index:'''
     33{{{
     34 GroupAggregate  (actual time=146.682..223.586 rows=12 loops=1)
     35   Buffers: shared hit=3001, temp read=1129 written=1133
     36   ->  Sort  (actual time=139.784..165.344 rows=200026 loops=1)
     37         Sort Method: external merge  Disk: 9032kB
     38         ->  Seq Scan on reservations rv  (actual time=0.005..20.583 rows=200026 loops=1)
     39 Execution Time: 226.085 ms
     40}}}
     41
     42'''After index:'''
     43{{{
     44 GroupAggregate  (actual time=22.372..157.807 rows=12 loops=1)
     45   Buffers: shared hit=36040
     46   ->  Incremental Sort  (actual time=8.413..96.559 rows=200026 loops=1)
     47         Presorted Key: user_id
     48         Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 1817kB
     49         ->  Index Scan using idx_reservations_user  (actual time=0.015..54.051 rows=200026 loops=1)
     50 Execution Time: 157.920 ms
     51}}}
     52
     53'''Analysis:''' The index provides pre-sorted data by {{{user_id}}}, enabling incremental sort instead of a full external merge sort. The critical improvement is '''elimination of disk-based sorting''' (9032kB spilled to disk) in favor of in-memory quicksort. Buffer reads come entirely from shared memory (no temp I/O).
     54
     55'''Result: 226ms -> 158ms (30% improvement), disk sort eliminated.'''
     56
     57=== Scenario 2: Monthly Reservation Trends with Top Resources ===
     58
     59'''Query:''' Monthly breakdown with aggregated stats and top demanded resource per month (from Report 2: Monthly Trends). Uses CTEs, window functions ({{{ROW_NUMBER}}}), date filtering, and a JOIN to the resources table.
     60
     61{{{
     62WITH monthly_overview AS (
     63    SELECT DATE_TRUNC('month', rv.start_time) AS month, COUNT(*) AS total,
     64           COUNT(*) FILTER (WHERE rv.status IN ('approved','completed')) AS approved,
     65           COUNT(DISTINCT rv.user_id) AS unique_users
     66    FROM reservations rv
     67    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
     68    GROUP BY DATE_TRUNC('month', rv.start_time)
     69),
     70resource_demand AS (
     71    SELECT DATE_TRUNC('month', rv.start_time) AS month, r.name,
     72           COUNT(*) AS demand_count,
     73           ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC('month', rv.start_time)
     74                              ORDER BY COUNT(*) DESC) AS rank
     75    FROM reservations rv JOIN resources r ON rv.resource_id = r.resource_id
     76    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
     77    GROUP BY DATE_TRUNC('month', rv.start_time), r.name
     78)
     79SELECT mo.*, rd.name AS top_resource, rd.demand_count
     80FROM monthly_overview mo
     81LEFT JOIN resource_demand rd ON mo.month = rd.month AND rd.rank = 1
     82ORDER BY mo.month;
     83}}}
     84
     85'''Proposed index:'''
     86{{{
     87CREATE INDEX idx_reservations_start_resource_status
     88    ON reservations (start_time, resource_id, status);
     89}}}
     90
     91'''Before index:'''
     92{{{
     93 Nested Loop Left Join  (actual time=89.338..98.077 rows=6 loops=1)
     94   Buffers: shared hit=75668
     95   ...
     96   ->  Nested Loop  (actual time=0.955..1.866 rows=1931 loops=27)
     97         ->  Index Scan using idx_reservations_resource on reservations rv_1
     98               Filter: (start_time >= ... AND start_time < ...)
     99               Rows Removed by Filter: 5478
     100               Buffers: shared hit=74673
     101 Execution Time: 98.322 ms
     102}}}
     103
     104'''After index:'''
     105{{{
     106 Sort  (actual time=56.456..56.525 rows=6 loops=1)
     107   Buffers: shared hit=1055 read=261
     108   ...
     109   ->  Parallel Index Only Scan using idx_reservations_start_resource_status
     110         on reservations rv_1
     111         Index Cond: (start_time >= ... AND start_time < ...)
     112         Heap Fetches: 0
     113         Buffers: shared hit=3 read=261
     114 Execution Time: 56.818 ms
     115}}}
     116
     117'''Analysis:''' The composite index on {{{(start_time, resource_id, status)}}} enables an '''Index Only Scan''' for the resource demand subquery — meaning PostgreSQL reads all needed columns directly from the index without touching the heap table ({{{Heap Fetches: 0}}}). Buffer usage dropped dramatically from 75,668 to 1,316 hits. The old plan used a nested loop scanning 5,478 irrelevant rows per resource; the new plan avoids this entirely.
     118
     119'''Result: 98ms -> 57ms (42% improvement), buffer hits reduced 98%.'''
     120
     121=== Scenario 3: Administrator Approval Workload ===
     122
     123'''Query:''' Per-admin aggregation of reviewed reservations with status breakdown and distinct counts (from Report 4: Admin Workload Analysis). Filters on {{{approved_by IS NOT NULL}}}.
     124
     125{{{
     126SELECT
     127    rv.approved_by AS admin_id,
     128    COUNT(*) AS total_reviewed,
     129    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved_count,
     130    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected_count,
     131    COUNT(DISTINCT rv.resource_id) AS distinct_resources_handled,
     132    COUNT(DISTINCT rv.user_id) AS distinct_users_served
     133FROM reservations rv
     134WHERE rv.approved_by IS NOT NULL
     135GROUP BY rv.approved_by;
     136}}}
     137
     138'''Proposed index:'''
     139{{{
     140CREATE INDEX idx_reservations_approver_status
     141    ON reservations (approved_by, status, resource_id, user_id)
     142    WHERE approved_by IS NOT NULL;
     143}}}
     144
     145This is a '''partial covering index''' — it only indexes rows where {{{approved_by IS NOT NULL}}} (about 70% of rows), and includes all columns the query needs.
     146
     147'''Before index:'''
     148{{{
     149 GroupAggregate  (actual time=100.297..118.534 rows=2 loops=1)
     150   Buffers: shared hit=3001, temp read=572 written=574
     151   ->  Sort  (actual time=81.884..95.827 rows=140517 loops=1)
     152         Sort Method: external merge  Disk: 4576kB
     153         ->  Seq Scan on reservations rv  (actual time=0.005..19.879 rows=140517 loops=1)
     154               Filter: (approved_by IS NOT NULL)
     155               Rows Removed by Filter: 59509
     156 Execution Time: 121.020 ms
     157}}}
     158
     159'''After index:'''
     160{{{
     161 GroupAggregate  (actual time=82.464..97.428 rows=2 loops=1)
     162   Buffers: shared hit=4 read=132
     163   ->  Incremental Sort  (actual time=30.378..75.557 rows=140517 loops=1)
     164         Presorted Key: approved_by
     165         ->  Index Only Scan using idx_reservations_approver_status
     166               on reservations rv  (actual time=0.338..11.357 rows=140517 loops=1)
     167               Heap Fetches: 0
     168               Buffers: shared hit=1 read=132
     169 Execution Time: 98.533 ms
     170}}}
     171
     172'''Analysis:''' The partial covering index eliminates the sequential scan entirely. PostgreSQL performs an '''Index Only Scan''' ({{{Heap Fetches: 0}}}) reading only 133 buffers instead of 3,001. The {{{WHERE approved_by IS NOT NULL}}} filter in the index definition means the index is smaller and doesn't waste space on irrelevant rows. Disk-based sorting (4576kB) is eliminated in favor of presorted incremental sort.
     173
     174'''Result: 121ms -> 99ms (18% improvement), sequential scan eliminated, buffer reads reduced 96%.'''
     175
     176=== Performance Summary ===
     177
     178||'''Scenario'''||'''Before'''||'''After'''||'''Improvement'''||'''Key Change'''||
     179||User Activity Aggregation||226ms||158ms||30%||Disk sort eliminated, index scan||
     180||Monthly Trends with Top Resources||98ms||57ms||42%||Index Only Scan, 98% fewer buffers||
     181||Admin Approval Workload||121ms||99ms||18%||Partial covering index, no heap fetches||
     182
     183== Security Measures ==
     184
     185=== SQL Injection Prevention ===
     186
     187All database queries in the prototype use '''parameterized queries''' via psycopg2's {{{%s}}} placeholders. User input is never concatenated into SQL strings.
     188
     189Example from {{{uc_reserve.py}}}:
     190{{{
     191cur.execute("""
     192    INSERT INTO reservations
     193        (start_time, end_time, status, purpose, created_at, user_id, resource_id)
     194    VALUES (%s, %s, 'pending', %s, CURRENT_TIMESTAMP, %s, %s)
     195    RETURNING reservation_id, status, created_at
     196""", (start_dt, end_dt, purpose, user_id, rid))
     197}}}
     198
     199psycopg2 sends the query template and parameters separately to PostgreSQL, which parses the query structure first and then binds the parameters. This makes SQL injection impossible regardless of user input content.
     200
     201=== Password Hashing ===
     202
     203User passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random salt for hashing:
     204
     205{{{
     206# Registration (uc_users.py)
     207hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()
     208
     209# Login verification (main.py)
     210if bcrypt.checkpw(password.encode(), row[3].encode()):
     211    # authenticated
     212}}}
     213
     214bcrypt is a deliberately slow hashing algorithm designed to resist brute-force attacks. The salt is generated per-password, preventing rainbow table attacks.
     215
     216=== Role-Based Access Control ===
     217
     218Access control is enforced at two levels:
     219
     220'''Application level:''' The main menu only shows actions appropriate for the user's role. Students see "Browse Resources" only, Teaching Staff can also make reservations, and Administrators get approval, analytics, and user management options.
     221
     222{{{
     223# main.py
     224if role == "Teaching Staff":
     225    options.append("Make a Resource Reservation")
     226if role == "Administrator":
     227    options.append("Approve or Reject Reservations")
     228    options.append("View Resource Usage Analytics")
     229    options.append("Register a New User")
     230}}}
     231
     232'''Database level:''' A trigger ({{{trg_check_approver_is_admin}}}, from Phase 7) enforces that only users with the Administrator role can be set as {{{approved_by}}} on reservations. This prevents privilege escalation even if the application layer is bypassed.
     233
     234{{{
     235-- Trigger rejects if approver is not an Administrator
     236IF v_type_name != 'Administrator' THEN
     237    RAISE EXCEPTION 'Only administrators can approve reservations. User % is "%"',
     238        NEW.approved_by, v_type_name;
     239END IF;
     240}}}
     241
     242=== Connection Security ===
     243
     244Database connections use a '''connection pool''' (from Phase 8) with explicit transaction management. This provides:
     245
     246 * '''Resource exhaustion protection:''' Maximum 10 connections prevents a runaway process from consuming all database connections.
     247 * '''Automatic cleanup:''' Connections are returned to the pool via context managers, preventing connection leaks.
     248 * '''Transaction isolation:''' Write operations use explicit transactions ({{{get_transaction()}}}) that automatically roll back on errors, preventing partial writes from corrupting data.