
Other Topics

SQL Performance

Performance analysis was conducted on the local development database with 200,000 generated reservations spread uniformly across two years (2024-2026), on top of the original 26-row seed (the plans therefore report 200,026 rows). The seed dataset alone is too small for meaningful index analysis, since sequential scans are optimal for small tables.
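
The generation script is not shown here; a minimal sketch of producing uniformly distributed start times over the two-year window might look like the following (generate_start_times is a hypothetical helper, not part of the prototype):

```python
import random
from datetime import datetime, timedelta

def generate_start_times(n, start, end, seed=42):
    """Return n reservation start times drawn uniformly from [start, end)."""
    rng = random.Random(seed)                 # fixed seed for reproducible test data
    span = (end - start).total_seconds()
    return [start + timedelta(seconds=rng.uniform(0, span)) for _ in range(n)]

times = generate_start_times(200_000, datetime(2024, 1, 1), datetime(2026, 1, 1))
```

Uniform spread matters for the experiments below: skewed timestamps would make the date-range scans in Scenario 2 unrepresentative.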

Testing method: each query is run with EXPLAIN (ANALYZE, BUFFERS) before and after index creation, comparing execution time, scan type, and buffer usage.
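
The measurement harness can be reduced to a small helper that wraps a query in the EXPLAIN form used throughout this page (a sketch; the explain function is hypothetical, and the commented cursor usage assumes the prototype's psycopg2 setup):

```python
def explain(sql: str) -> str:
    """Wrap a query so PostgreSQL returns its plan with timing and buffer stats."""
    return "EXPLAIN (ANALYZE, BUFFERS) " + sql.strip()

# Run once before and once after CREATE INDEX, then compare the plans:
#   cur.execute(explain(query))
#   plan = "\n".join(row[0] for row in cur.fetchall())
```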

Scenario 1: User Activity Aggregation

Query: Per-user reservation statistics with status breakdown and distinct resource count (from Report 3: User Behavior Analysis).

SELECT
    rv.user_id,
    COUNT(*) AS total_reservations,
    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved,
    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected,
    COUNT(*) FILTER (WHERE rv.status = 'cancelled') AS cancelled,
    ROUND(AVG(EXTRACT(EPOCH FROM (rv.end_time - rv.start_time)) / 3600.0), 1) AS avg_duration,
    COUNT(DISTINCT rv.resource_id) AS distinct_resources_used
FROM reservations rv
GROUP BY rv.user_id;

Proposed index:

CREATE INDEX idx_reservations_user_status_resource
    ON reservations (user_id, status, resource_id);

Before index:

 GroupAggregate  (actual time=146.682..223.586 rows=12 loops=1)
   Buffers: shared hit=3001, temp read=1129 written=1133
   ->  Sort  (actual time=139.784..165.344 rows=200026 loops=1)
         Sort Method: external merge  Disk: 9032kB
         ->  Seq Scan on reservations rv  (actual time=0.005..20.583 rows=200026 loops=1)
 Execution Time: 226.085 ms

After index:

 GroupAggregate  (actual time=22.372..157.807 rows=12 loops=1)
   Buffers: shared hit=36040
   ->  Incremental Sort  (actual time=8.413..96.559 rows=200026 loops=1)
         Presorted Key: user_id
         Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 1817kB
         ->  Index Scan using idx_reservations_user_status_resource  (actual time=0.015..54.051 rows=200026 loops=1)
 Execution Time: 157.920 ms

Analysis: The index delivers rows already sorted by user_id, enabling an incremental sort instead of a full external merge sort. The critical improvement is the elimination of disk-based sorting (9,032 kB previously spilled to disk) in favor of in-memory quicksort. Total buffer hits rise (36,040 vs. 3,001) because the index itself must be traversed, but all reads now come from shared memory with no temp I/O.

Result: 226ms -> 158ms (30% improvement), disk sort eliminated.

Scenario 2: Monthly Reservation Trends with Top Resources

Query: Monthly breakdown with aggregated stats and top demanded resource per month (from Report 2: Monthly Trends). Uses CTEs, window functions (ROW_NUMBER), date filtering, and a JOIN to the resources table.

WITH monthly_overview AS (
    SELECT DATE_TRUNC('month', rv.start_time) AS month, COUNT(*) AS total,
           COUNT(*) FILTER (WHERE rv.status IN ('approved','completed')) AS approved,
           COUNT(DISTINCT rv.user_id) AS unique_users
    FROM reservations rv
    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
    GROUP BY DATE_TRUNC('month', rv.start_time)
),
resource_demand AS (
    SELECT DATE_TRUNC('month', rv.start_time) AS month, r.name,
           COUNT(*) AS demand_count,
           ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC('month', rv.start_time)
                              ORDER BY COUNT(*) DESC) AS rank
    FROM reservations rv JOIN resources r ON rv.resource_id = r.resource_id
    WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
    GROUP BY DATE_TRUNC('month', rv.start_time), r.name
)
SELECT mo.*, rd.name AS top_resource, rd.demand_count
FROM monthly_overview mo
LEFT JOIN resource_demand rd ON mo.month = rd.month AND rd.rank = 1
ORDER BY mo.month;

Proposed index:

CREATE INDEX idx_reservations_start_resource_status
    ON reservations (start_time, resource_id, status);

Before index:

 Nested Loop Left Join  (actual time=89.338..98.077 rows=6 loops=1)
   Buffers: shared hit=75668
   ...
   ->  Nested Loop  (actual time=0.955..1.866 rows=1931 loops=27)
         ->  Index Scan using idx_reservations_resource on reservations rv_1
               Filter: (start_time >= ... AND start_time < ...)
               Rows Removed by Filter: 5478
               Buffers: shared hit=74673
 Execution Time: 98.322 ms

After index:

 Sort  (actual time=56.456..56.525 rows=6 loops=1)
   Buffers: shared hit=1055 read=261
   ...
   ->  Parallel Index Only Scan using idx_reservations_start_resource_status
         on reservations rv_1
         Index Cond: (start_time >= ... AND start_time < ...)
         Heap Fetches: 0
         Buffers: shared hit=3 read=261
 Execution Time: 56.818 ms

Analysis: The composite index on (start_time, resource_id, status) enables an Index Only Scan for the resource demand subquery: PostgreSQL reads all needed columns directly from the index without touching the heap table (Heap Fetches: 0). Buffer usage dropped dramatically, from 75,668 hits to 1,316 accesses (1,055 hits + 261 reads). The old plan's nested loop fetched and then filtered out 5,478 irrelevant rows on every loop; the new plan's index condition avoids this entirely.

Result: 98ms -> 57ms (42% improvement), buffer hits reduced 98%.

Scenario 3: Administrator Approval Workload

Query: Per-admin aggregation of reviewed reservations with status breakdown and distinct counts (from Report 4: Admin Workload Analysis). Filters on approved_by IS NOT NULL.

SELECT
    rv.approved_by AS admin_id,
    COUNT(*) AS total_reviewed,
    COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved_count,
    COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected_count,
    COUNT(DISTINCT rv.resource_id) AS distinct_resources_handled,
    COUNT(DISTINCT rv.user_id) AS distinct_users_served
FROM reservations rv
WHERE rv.approved_by IS NOT NULL
GROUP BY rv.approved_by;

Proposed index:

CREATE INDEX idx_reservations_approver_status
    ON reservations (approved_by, status, resource_id, user_id)
    WHERE approved_by IS NOT NULL;

This is a partial covering index — it only indexes rows where approved_by IS NOT NULL (about 70% of rows), and includes all columns the query needs.

Before index:

 GroupAggregate  (actual time=100.297..118.534 rows=2 loops=1)
   Buffers: shared hit=3001, temp read=572 written=574
   ->  Sort  (actual time=81.884..95.827 rows=140517 loops=1)
         Sort Method: external merge  Disk: 4576kB
         ->  Seq Scan on reservations rv  (actual time=0.005..19.879 rows=140517 loops=1)
               Filter: (approved_by IS NOT NULL)
               Rows Removed by Filter: 59509
 Execution Time: 121.020 ms

After index:

 GroupAggregate  (actual time=82.464..97.428 rows=2 loops=1)
   Buffers: shared hit=4 read=132
   ->  Incremental Sort  (actual time=30.378..75.557 rows=140517 loops=1)
         Presorted Key: approved_by
         ->  Index Only Scan using idx_reservations_approver_status
               on reservations rv  (actual time=0.338..11.357 rows=140517 loops=1)
               Heap Fetches: 0
               Buffers: shared hit=1 read=132
 Execution Time: 98.533 ms

Analysis: The partial covering index eliminates the sequential scan entirely. PostgreSQL performs an Index Only Scan (Heap Fetches: 0) reading only 133 buffers instead of 3,001. The WHERE approved_by IS NOT NULL filter in the index definition means the index is smaller and doesn't waste space on irrelevant rows. Disk-based sorting (4576kB) is eliminated in favor of presorted incremental sort.

Result: 121ms -> 99ms (18% improvement), sequential scan eliminated, buffer reads reduced 96%.

Performance Summary

Scenario                          | Before | After | Improvement | Key Change
----------------------------------|--------|-------|-------------|-----------------------------------------
User Activity Aggregation         | 226ms  | 158ms | 30%         | Disk sort eliminated, index scan
Monthly Trends with Top Resources | 98ms   | 57ms  | 42%         | Index Only Scan, 98% fewer buffers
Admin Approval Workload           | 121ms  | 99ms  | 18%         | Partial covering index, no heap fetches
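
The improvement percentages are simply the relative reduction in execution time, which can be verified with a one-liner:

```python
def improvement(before_ms: float, after_ms: float) -> int:
    """Percentage reduction in execution time, rounded to the nearest percent."""
    return round(100 * (before_ms - after_ms) / before_ms)

# Rounded timings from the three scenarios above.
results = {
    "User Activity Aggregation": (226, 158),
    "Monthly Trends with Top Resources": (98, 57),
    "Admin Approval Workload": (121, 99),
}
for name, (before, after) in results.items():
    print(f"{name}: {improvement(before, after)}%")
```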

Security Measures

SQL Injection Prevention

All database queries in the prototype use parameterized queries via psycopg2's %s placeholders. User input is never concatenated into SQL strings.

Example from uc_reserve.py:

cur.execute("""
    INSERT INTO reservations
        (start_time, end_time, status, purpose, created_at, user_id, resource_id)
    VALUES (%s, %s, 'pending', %s, CURRENT_TIMESTAMP, %s, %s)
    RETURNING reservation_id, status, created_at
""", (start_dt, end_dt, purpose, user_id, rid))

psycopg2 sends the query template and parameters separately to PostgreSQL, which parses the query structure first and then binds the parameters. Because the input is bound as data after the query structure is fixed, it can never alter that structure, which prevents SQL injection regardless of the input's content.
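
The contrast with the unsafe approach can be shown without a database (the table and column names here are illustrative, not from the schema):

```python
# Anti-pattern: string interpolation splices user input into the SQL text,
# so the payload becomes part of the query structure itself.
user_input = "x'; DROP TABLE reservations; --"
unsafe = f"SELECT * FROM users WHERE name = '{user_input}'"
# unsafe is now: SELECT * FROM users WHERE name = 'x'; DROP TABLE reservations; --'

# Safe pattern: the template and the value travel separately; PostgreSQL
# parses the template first, then binds the value, so it stays pure data.
safe_template = "SELECT * FROM users WHERE name = %s"
params = (user_input,)
```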

Password Hashing

User passwords are never stored in plain text. The prototype uses bcrypt with a random salt for hashing:

# Registration (uc_users.py)
hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt()).decode()

# Login verification (main.py)
if bcrypt.checkpw(password.encode(), row[3].encode()):
    # authenticated

bcrypt is a deliberately slow hashing algorithm designed to resist brute-force attacks. The salt is generated per-password, preventing rainbow table attacks.
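
The same salted, deliberately slow design can be sketched with the standard library's PBKDF2 (an illustration of the pattern only; the prototype itself uses bcrypt, and the helper names and iteration count here are assumptions):

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 600_000) -> bytes:
    """Derive a salted, slow hash; a fresh random salt defeats rainbow tables."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt + digest  # store the salt alongside the hash

def check_password(password: str, stored: bytes, iterations: int = 600_000) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

bcrypt additionally encodes its salt and cost factor into the stored string, which is why the application code above never handles the salt explicitly.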

Role-Based Access Control

Access control is enforced at two levels:

Application level: The main menu only shows actions appropriate for the user's role. Students see "Browse Resources" only, Teaching Staff can also make reservations, and Administrators get approval, analytics, and user management options.

# main.py
if role == "Teaching Staff":
    options.append("Make a Resource Reservation")
if role == "Administrator":
    options.append("Approve or Reject Reservations")
    options.append("View Resource Usage Analytics")
    options.append("Register a New User")
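
The role checks above can be lifted into a pure function so the menu logic is unit-testable in isolation (a sketch; build_menu is a hypothetical refactoring, not a function in main.py):

```python
def build_menu(role: str) -> list[str]:
    """Return the menu options visible to a given role, mirroring main.py."""
    options = ["Browse Resources"]           # every role can browse
    if role == "Teaching Staff":
        options.append("Make a Resource Reservation")
    if role == "Administrator":
        options += [
            "Approve or Reject Reservations",
            "View Resource Usage Analytics",
            "Register a New User",
        ]
    return options
```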

Database level: A trigger (trg_check_approver_is_admin, from Phase 7) enforces that only users with the Administrator role can be set as approved_by on reservations. This prevents privilege escalation even if the application layer is bypassed.

-- Trigger rejects if approver is not an Administrator
IF v_type_name != 'Administrator' THEN
    RAISE EXCEPTION 'Only administrators can approve reservations. User % is "%"',
        NEW.approved_by, v_type_name;
END IF;

Connection Security

Database connections use a connection pool (from Phase 8) with explicit transaction management. This provides:

  • Resource exhaustion protection: Maximum 10 connections prevents a runaway process from consuming all database connections.
  • Automatic cleanup: Connections are returned to the pool via context managers, preventing connection leaks.
  • Transaction isolation: Write operations use explicit transactions (get_transaction()) that automatically roll back on errors, preventing partial writes from corrupting data.
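
The pooling-with-context-managers pattern described above can be sketched generically (this is an illustration of the mechanism, not the Phase 8 implementation, which builds on psycopg2's pool; ConnectionPool and its factory argument are assumptions):

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Minimal fixed-size pool: pre-creates connections, hands them out on loan."""
    def __init__(self, factory, max_size=10):
        self._pool = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._pool.put(factory())

    @contextmanager
    def connection(self):
        conn = self._pool.get()      # blocks once all connections are in use
        try:
            yield conn
        finally:
            self._pool.put(conn)     # always returned, even on error: no leaks
```

The bounded queue is what provides the exhaustion protection: an eleventh caller simply waits rather than opening a new database connection.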
Last modified on 04/05/26 20:17:36