Changes between Version 1 and Version 2 of OtherTopics


Ignore:
Timestamp:
04/22/26 11:12:55 (10 days ago)
Author:
221511
Comment:

made the tests on the report queries from phase 6 and expanded the other explanations

Legend:

Unmodified
Added
Removed
Modified
  • OtherTopics

    v1 v2  
    33== SQL Performance ==
    44
    5 Performance analysis was conducted on the local development database with 200,000 generated reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) is insufficient for meaningful index analysis, as sequential scans are optimal for small tables.
     5Performance analysis was conducted against the full complex report queries from Phase P6 ([wiki:ComplexReports ComplexReports]). The tests were run on the local development database after loading 200,000 synthetic reservations spread uniformly across 2 years (2024-2026). The small seed dataset (26 rows) cannot reveal index usefulness because sequential scans are optimal at that scale.
    66
    7 The testing method: each query is run with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before and after index creation. We compare execution time, scan type, and buffer usage.
     7The testing method: each P6 report query is executed with {{{EXPLAIN (ANALYZE, BUFFERS)}}} before adding new indexes and after. We compare execution time, scan type, and buffer usage.
    88
    9 === Scenario 1: User Activity Aggregation ===
     9=== Scenario 1: Phase P6 Report 1 — Resource Utilization (quarterly) ===
    1010
    11 '''Query:''' Per-user reservation statistics with status breakdown and distinct resource count (from Report 3: User Behavior Analysis).
    12 
    13 {{{
    14 SELECT
    15     rv.user_id,
    16     COUNT(*) AS total_reservations,
    17     COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved,
    18     COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected,
    19     COUNT(*) FILTER (WHERE rv.status = 'cancelled') AS cancelled,
    20     ROUND(AVG(EXTRACT(EPOCH FROM (rv.end_time - rv.start_time)) / 3600.0), 1) AS avg_duration,
    21     COUNT(DISTINCT rv.resource_id) AS distinct_resources_used
    22 FROM reservations rv
    23 GROUP BY rv.user_id;
    24 }}}
    25 
    26 '''Proposed index:'''
    27 {{{
    28 CREATE INDEX idx_reservations_user_status_resource
    29     ON reservations (user_id, status, resource_id);
    30 }}}
    31 
    32 '''Before index:'''
    33 {{{
    34  GroupAggregate  (actual time=146.682..223.586 rows=12 loops=1)
    35    Buffers: shared hit=3001, temp read=1129 written=1133
    36    ->  Sort  (actual time=139.784..165.344 rows=200026 loops=1)
    37          Sort Method: external merge  Disk: 9032kB
    38          ->  Seq Scan on reservations rv  (actual time=0.005..20.583 rows=200026 loops=1)
    39  Execution Time: 226.085 ms
    40 }}}
    41 
    42 '''After index:'''
    43 {{{
    44  GroupAggregate  (actual time=22.372..157.807 rows=12 loops=1)
    45    Buffers: shared hit=36040
    46    ->  Incremental Sort  (actual time=8.413..96.559 rows=200026 loops=1)
    47          Presorted Key: user_id
    48          Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 1817kB
    49          ->  Index Scan using idx_reservations_user  (actual time=0.015..54.051 rows=200026 loops=1)
    50  Execution Time: 157.920 ms
    51 }}}
    52 
    53 '''Analysis:''' The index provides pre-sorted data by {{{user_id}}}, enabling incremental sort instead of a full external merge sort. The critical improvement is '''elimination of disk-based sorting''' (9032kB spilled to disk) in favor of in-memory quicksort. Buffer reads come entirely from shared memory (no temp I/O).
    54 
    55 '''Result: 226ms -> 158ms (30% improvement), disk sort eliminated.'''
    56 
    57 === Scenario 2: Monthly Reservation Trends with Top Resources ===
    58 
    59 '''Query:''' Monthly breakdown with aggregated stats and top demanded resource per month (from Report 2: Monthly Trends). Uses CTEs, window functions ({{{ROW_NUMBER}}}), date filtering, and a JOIN to the resources table.
    60 
    61 {{{
    62 WITH monthly_overview AS (
    63     SELECT DATE_TRUNC('month', rv.start_time) AS month, COUNT(*) AS total,
    64            COUNT(*) FILTER (WHERE rv.status IN ('approved','completed')) AS approved,
    65            COUNT(DISTINCT rv.user_id) AS unique_users
    66     FROM reservations rv
    67     WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
    68     GROUP BY DATE_TRUNC('month', rv.start_time)
    69 ),
    70 resource_demand AS (
    71     SELECT DATE_TRUNC('month', rv.start_time) AS month, r.name,
    72            COUNT(*) AS demand_count,
    73            ROW_NUMBER() OVER (PARTITION BY DATE_TRUNC('month', rv.start_time)
    74                               ORDER BY COUNT(*) DESC) AS rank
    75     FROM reservations rv JOIN resources r ON rv.resource_id = r.resource_id
    76     WHERE rv.start_time >= '2025-01-01' AND rv.start_time < '2025-07-01'
    77     GROUP BY DATE_TRUNC('month', rv.start_time), r.name
    78 )
    79 SELECT mo.*, rd.name AS top_resource, rd.demand_count
    80 FROM monthly_overview mo
    81 LEFT JOIN resource_demand rd ON mo.month = rd.month AND rd.rank = 1
    82 ORDER BY mo.month;
    83 }}}
     11'''Query source:''' the full "Report 1: Resource Utilization and Demand Analysis" from [wiki:ComplexReports ComplexReports]. Uses 5 CTEs: quarter_bounds, quarter_days, resource_availability, reservation_stats, popular_day, peak_hour. Window function {{{RANK()}}} for demand ranking. Filters by a quarterly date range.
    8412
    8513'''Proposed index:'''
     
    8917}}}
    9018
    91 '''Before index:'''
     19'''Before index (excerpt of plan):'''
    9220{{{
    93  Nested Loop Left Join  (actual time=89.338..98.077 rows=6 loops=1)
    94    Buffers: shared hit=75668
    95    ...
    96    ->  Nested Loop  (actual time=0.955..1.866 rows=1931 loops=27)
    97          ->  Index Scan using idx_reservations_resource on reservations rv_1
    98                Filter: (start_time >= ... AND start_time < ...)
    99                Rows Removed by Filter: 5478
    100                Buffers: shared hit=74673
    101  Execution Time: 98.322 ms
     21HashAggregate  (actual rows=189 loops=1)
     22 ->  Nested Loop (actual rows=10383 loops=1)
     23       ->  Bitmap Heap Scan on reservations rv_1 (actual rows=10383 loops=1)
     24             Recheck Cond: start_time >= q_start AND start_time < q_end
     25             Filter: status IN ('approved','completed')
     26             Rows Removed by Filter: 15537
     27             Heap Blocks: exact=364
     28             ->  Bitmap Index Scan on idx_reservations_times
     29Execution Time: 181.324 ms
    10230}}}
    10331
    10432'''After index:'''
    10533{{{
    106  Sort  (actual time=56.456..56.525 rows=6 loops=1)
    107    Buffers: shared hit=1055 read=261
    108    ...
    109    ->  Parallel Index Only Scan using idx_reservations_start_resource_status
    110          on reservations rv_1
    111          Index Cond: (start_time >= ... AND start_time < ...)
    112          Heap Fetches: 0
    113          Buffers: shared hit=3 read=261
    114  Execution Time: 56.818 ms
     34 ->  Index Only Scan using idx_reservations_start_resource_status
     35       on reservations rv_1
     36       Index Cond: start_time >= q_start AND start_time < q_end
     37       Heap Fetches: 0
     38Execution Time: 82.307 ms
    11539}}}
    11640
    117 '''Analysis:''' The composite index on {{{(start_time, resource_id, status)}}} enables an '''Index Only Scan''' for the resource demand subquery — meaning PostgreSQL reads all needed columns directly from the index without touching the heap table ({{{Heap Fetches: 0}}}). Buffer usage dropped dramatically from 75,668 to 1,316 hits. The old plan used a nested loop scanning 5,478 irrelevant rows per resource; the new plan avoids this entirely.
     41'''Analysis:''' The new composite index includes {{{status}}}, so the planner can filter by status inside the index without touching the heap. The {{{popular_day}}} and {{{peak_hour}}} CTEs both switched from Bitmap Heap Scan with "Rows Removed by Filter: 15537" per iteration to a clean Index Only Scan. Heap Fetches is 0.
    11842
    119 '''Result: 98ms -> 57ms (42% improvement), buffer hits reduced 98%.'''
     43'''Result: 181 ms -> 82 ms (55% faster). Index Only Scan confirmed in the plan.'''
    12044
    121 === Scenario 3: Administrator Approval Workload ===
     45=== Scenario 2: Phase P6 Report 3 — User Activity and Behavior Analysis ===
    12246
    123 '''Query:''' Per-admin aggregation of reviewed reservations with status breakdown and distinct counts (from Report 4: Admin Workload Analysis). Filters on {{{approved_by IS NOT NULL}}}.
     47'''Query source:''' the full "Report 3: User Activity and Behavior Analysis" from [wiki:ComplexReports ComplexReports]. Uses 2 CTEs (user_stats, favorite_resource), {{{DISTINCT ON}}}, {{{RANK()}}}, {{{COUNT(*) FILTER (WHERE ...)}}} for per-status counts, and aggregates over the entire reservations table grouped by user_id.
    12448
     49'''Proposed index:'''
    12550{{{
    126 SELECT
    127     rv.approved_by AS admin_id,
    128     COUNT(*) AS total_reviewed,
    129     COUNT(*) FILTER (WHERE rv.status IN ('approved', 'completed')) AS approved_count,
    130     COUNT(*) FILTER (WHERE rv.status = 'rejected') AS rejected_count,
    131     COUNT(DISTINCT rv.resource_id) AS distinct_resources_handled,
    132     COUNT(DISTINCT rv.user_id) AS distinct_users_served
    133 FROM reservations rv
    134 WHERE rv.approved_by IS NOT NULL
    135 GROUP BY rv.approved_by;
     51CREATE INDEX idx_reservations_user_status_resource
     52    ON reservations (user_id, status, resource_id)
     53    INCLUDE (start_time, end_time);
    13654}}}
     55
     56The {{{INCLUDE}}} clause adds start_time and end_time as index payload (not used for ordering) so the query can compute {{{AVG(end_time - start_time)}}} and {{{SUM(...) FILTER (...)}}} without heap access.
     57
     58'''Before index:'''
     59{{{
     60Sort Method: external merge  Disk: 9032kB
     61 ->  Seq Scan on reservations rv  (actual rows=200026 loops=1)
     62Execution Time: 279.957 ms
     63}}}
     64
     65'''After index:'''
     66{{{
     67Incremental Sort
     68  Presorted Key: user_id
     69  Full-sort Groups: 12  Sort Method: quicksort  Average Memory: 29kB
     70  Pre-sorted Groups: 12  Sort Method: quicksort  Average Memory: 1821kB
     71  ->  Index Only Scan using idx_reservations_user_status_resource
     72        Heap Fetches: 0
     73Execution Time: 175.602 ms
     74}}}
     75
     76'''Analysis:''' The sequential scan is replaced by an Index Only Scan. Disk-based external merge sort (9032kB to disk) is eliminated; because the index is sorted by user_id, PostgreSQL does an Incremental Sort entirely in memory. Heap Fetches: 0 confirms the INCLUDE columns are doing their job.
     77
     78'''Result: 280 ms -> 176 ms (37% faster). Disk sort eliminated; Index Only Scan used.'''
     79
     80=== Scenario 3: Phase P6 Report 4 — Administrator Approval Workload ===
     81
     82'''Query source:''' the full "Report 4: Administrator Approval Workload and Bottleneck Analysis" from [wiki:ComplexReports ComplexReports]. Uses 3 CTEs (admin_stats, pending_stats, workload_share), {{{CROSS JOIN}}} to attach the single-row pending_stats to every admin, and {{{SUM(...) OVER ()}}} to compute workload share percentage.
    13783
    13884'''Proposed index:'''
     
    14389}}}
    14490
    145 This is a '''partial covering index''' — it only indexes rows where {{{approved_by IS NOT NULL}}} (about 70% of rows), and includes all columns the query needs.
     91This is a partial covering index: the {{{WHERE}}} clause excludes rows with no approver (roughly 30% of the data), and the listed columns are everything the {{{admin_stats}}} CTE needs to aggregate.
    14692
    14793'''Before index:'''
    14894{{{
    149  GroupAggregate  (actual time=100.297..118.534 rows=2 loops=1)
    150    Buffers: shared hit=3001, temp read=572 written=574
    151    ->  Sort  (actual time=81.884..95.827 rows=140517 loops=1)
    152          Sort Method: external merge  Disk: 4576kB
    153          ->  Seq Scan on reservations rv  (actual time=0.005..19.879 rows=140517 loops=1)
    154                Filter: (approved_by IS NOT NULL)
    155                Rows Removed by Filter: 59509
    156  Execution Time: 121.020 ms
     95Sort Method: external merge  Disk: 4568kB
     96 ->  Seq Scan on reservations rv_1
     97     Filter: approved_by IS NOT NULL
     98Execution Time: 166.804 ms
    15799}}}
    158100
    159101'''After index:'''
    160102{{{
    161  GroupAggregate  (actual time=82.464..97.428 rows=2 loops=1)
    162    Buffers: shared hit=4 read=132
    163    ->  Incremental Sort  (actual time=30.378..75.557 rows=140517 loops=1)
    164          Presorted Key: approved_by
    165          ->  Index Only Scan using idx_reservations_approver_status
    166                on reservations rv  (actual time=0.338..11.357 rows=140517 loops=1)
    167                Heap Fetches: 0
    168                Buffers: shared hit=1 read=132
    169  Execution Time: 98.533 ms
     103 ->  Index Only Scan using idx_reservations_approver_status
     104       Heap Fetches: 0
     105Execution Time: 129.167 ms
    170106}}}
    171107
    172 '''Analysis:''' The partial covering index eliminates the sequential scan entirely. PostgreSQL performs an '''Index Only Scan''' ({{{Heap Fetches: 0}}}) reading only 133 buffers instead of 3,001. The {{{WHERE approved_by IS NOT NULL}}} filter in the index definition means the index is smaller and doesn't waste space on irrelevant rows. Disk-based sorting (4576kB) is eliminated in favor of presorted incremental sort.
     108'''Analysis:''' Sequential scan replaced by Index Only Scan. Disk-based external merge sort (4568kB) eliminated. The partial index is smaller than a full one because it excludes the 60,000 rows with {{{approved_by IS NULL}}}.
    173109
    174 '''Result: 121ms -> 99ms (18% improvement), sequential scan eliminated, buffer reads reduced 96%.'''
     110'''Result: 167 ms -> 129 ms (23% faster). Index Only Scan used; partial index avoids indexing irrelevant rows.'''
    175111
    176112=== Performance Summary ===
    177113
    178 ||'''Scenario'''||'''Before'''||'''After'''||'''Improvement'''||'''Key Change'''||
    179 ||User Activity Aggregation||226ms||158ms||30%||Disk sort eliminated, index scan||
    180 ||Monthly Trends with Top Resources||98ms||57ms||42%||Index Only Scan, 98% fewer buffers||
    181 ||Admin Approval Workload||121ms||99ms||18%||Partial covering index, no heap fetches||
     114||'''Phase P6 Report'''||'''Before'''||'''After'''||'''Gain'''||'''Key change'''||
     115||Report 1 — Resource Utilization (quarterly)||181 ms||82 ms||55%||Composite index enables Index Only Scan for CTE aggregation||
     116||Report 3 — User Activity Analysis||280 ms||176 ms||37%||INCLUDE covering index removes 9 MB disk sort and heap fetches||
     117||Report 4 — Admin Workload Analysis||167 ms||129 ms||23%||Partial covering index replaces seq scan, removes 4.5 MB disk sort||
     118
     119=== Note on Reports That Do Not Benefit from Indexes ===
     120
     121Phase P6 Report 2 (Monthly Reservation Trends) without any date filter scans and aggregates the entire reservations table grouped by month. A full-table aggregation cannot be sped up by indexes alone — the planner would still have to read every row. For that report the correct optimization is either (a) to add a date filter (e.g. "last 12 months"), at which point the {{{idx_reservations_start_resource_status}}} index cuts execution time by about 10%, or (b) to pre-compute monthly summaries in a materialized view that is refreshed periodically. The same applies to any dashboard query that aggregates over the whole history.
    182122
    183123== Security Measures ==
     
    197137}}}
    198138
    199 psycopg2 sends the query template and parameters separately to PostgreSQL, which parses the query structure first and then binds the parameters. This makes SQL injection impossible regardless of user input content.
     139psycopg2 sends the query template and parameters separately to PostgreSQL. The server parses the template first and only then binds the parameters, so injection is impossible regardless of the input content.
    200140
    201141=== Password Hashing ===
    202142
    203 User passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random salt for hashing:
     143Passwords are never stored in plain text. The prototype uses '''bcrypt''' with a random per-password salt:
    204144
    205145{{{
     
    212152}}}
    213153
    214 bcrypt is a deliberately slow hashing algorithm designed to resist brute-force attacks. The salt is generated per-password, preventing rainbow table attacks.
     154bcrypt is a deliberately slow hashing algorithm with a configurable work factor, which resists brute-force attacks. The per-password salt defeats rainbow table attacks.
    215155
    216156=== Role-Based Access Control ===
    217157
    218 Access control is enforced at two levels:
     158Access control is enforced at two independent layers:
    219159
    220 '''Application level:''' The main menu only shows actions appropriate for the user's role. Students see "Browse Resources" only, Teaching Staff can also make reservations, and Administrators get approval, analytics, and user management options.
     160'''Application layer (main.py):''' the main menu exposes only the actions appropriate for the logged-in user's role. Students can browse; Teaching Staff can also reserve; Administrators see approval, analytics, and user management.
    221161
    222162{{{
    223 # main.py
    224163if role == "Teaching Staff":
    225164    options.append("Make a Resource Reservation")
     
    230169}}}
    231170
    232 '''Database level:''' A trigger ({{{trg_check_approver_is_admin}}}, from Phase 7) enforces that only users with the Administrator role can be set as {{{approved_by}}} on reservations. This prevents privilege escalation even if the application layer is bypassed.
     171'''Database layer (Phase P7 trigger {{{trg_check_approver_is_admin}}}):''' enforces that the {{{approved_by}}} column can only reference users whose {{{type_name}}} is {{{'Administrator'}}}. Even if the application layer is bypassed (for example, by a direct SQL client), the database rejects any attempt to record a non-admin as the approver.
    233172
    234173{{{
    235 -- Trigger rejects if approver is not an Administrator
    236174IF v_type_name != 'Administrator' THEN
    237175    RAISE EXCEPTION 'Only administrators can approve reservations. User % is "%"',
     
    242180=== Connection Security ===
    243181
    244 Database connections use a '''connection pool''' (from Phase 8) with explicit transaction management. This provides:
     182Database connections use a '''connection pool''' (from Phase P8) with explicit transaction management. This provides:
    245183
    246  * '''Resource exhaustion protection:''' Maximum 10 connections prevents a runaway process from consuming all database connections.
    247  * '''Automatic cleanup:''' Connections are returned to the pool via context managers, preventing connection leaks.
    248  * '''Transaction isolation:''' Write operations use explicit transactions ({{{get_transaction()}}}) that automatically roll back on errors, preventing partial writes from corrupting data.
     184 * '''Resource-exhaustion protection:''' at most 10 concurrent connections, so a runaway process cannot consume all database connection slots.
     185 * '''Automatic cleanup:''' connections are always returned to the pool via context managers, preventing connection leaks that could otherwise reach the server limit.
     186 * '''Transaction isolation:''' write operations use {{{get_transaction()}}}, which rolls back automatically on any exception and therefore cannot leave the database in a partially-written state.