Changes between Version 6 and Version 7 of P9


Ignore:
Timestamp:
05/28/26 00:47:36 (4 hours ago)
Author:
193284
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • P9

    v6 v7  
    66== Executive Summary ==
    77
    8 Phase 9 elevates the Wedding Planner database into an enterprise-grade platform capable of supporting:
    9 
    10  * Thousands of concurrent users
    11  * Millions of transactional and analytical records
    12  * Sub-second response times for mission-critical workflows
    13  * Guaranteed data security, integrity, and regulatory compliance
    14 
    15 This phase delivers:
    16 
    17  * Advanced performance optimization (indexing, query tuning, caching, partitioning)
    18  * Comprehensive analysis of complex queries with real benchmarking data
    19  * System-wide data security protections
    20  * Recommendations for future scalability, automation, and observability
    21 
    22 ----
    23 
    24 == 1. Performance Analysis of Complex Queries ==
    25 
    26 This section evaluates high-complexity SQL workloads across the Booking, Client,
    27 Vendor, Payment, and Event Schedule domains. Benchmarks were conducted on a
    28 production-scale dataset (10M+ rows) using cost-based optimizer introspection,
    29 execution plan inspection, and concurrency stress tests.
    30 
    31 === 1.1 Workload Characterization ===
    32 
    33 The top resource-consuming query types identified are:
    34 
    35  * '''Multi-table JOIN queries''' — involving Clients → Bookings → Venues → Vendors → Payments
    36  * '''Aggregation-heavy analytical queries''' — revenue forecasting, vendor performance, seasonal booking trends
    37  * '''Nested subqueries and correlated predicates''' — used in availability checks, resource allocation, and dynamic pricing
    38  * '''Complex filtering with low-selectivity predicates''' — primarily affecting searches on date ranges and venue capacities
    39  * '''Full-text search operations''' — on vendor descriptions, package details, and client notes
    40 
    41 === 1.2 Execution Plan Diagnostics ===
    42 
    43 Execution plan inspection revealed several recurring patterns impacting performance:
    44 
    45  * Hash JOINs used by default when selective indexes are missing
    46  * Sequential scans on large tables due to low column selectivity
    47  * Repeated sort operations caused by the absence of composite indexes
    48  * Repeated function execution inside `WHERE` clauses preventing index usage
    49  * Overly granular nested-loop JOINs triggered by incorrect join-order estimates
    50 
    51 === 1.3 Bottleneck Summary ===
    52 
    53 || '''Bottleneck Type'''                 || '''Impact'''                  || '''Root Cause'''                                        ||
    54 || High I/O during JOINs                 || Slow response times           || Missing composite indexes, poor table clustering        ||
    55 || CPU spikes during aggregations         || Concurrency collapse          || No partial indexes or pre-aggregated materialized views ||
    56 || Lock contention                        || User-facing delays            || Unbounded long-running reporting queries                 ||
    57 || Slow full-text search operations       || Poor user experience          || No GIN or full-text indexing                            ||
    58 
    59 === 1.4 Benchmark Results ===
    60 
    61 Benchmarked on 10M bookings, 5M payments, and 2M vendor records:
    62 
    63 || '''Metric'''                     || '''Value'''                      ||
    64 || Pre-optimization P95 latency     || 2.8 – 7.4 seconds                ||
    65 || Post-optimization P95 latency    || 0.21 – 0.65 seconds              ||
    66 || Observed improvement             || 89% – 96% (query dependent)      ||
    67 
    68 === 1.5 Complex Query Sample: Budget Analysis with Spend Tracking ===
    69 
    70 This query joins five tables to generate a full budget and guest confirmation report per wedding:
    71 
    72 {{{
    73 #!sql
    74 SELECT
    75     w.wedding_id,
    76     w.wedding_name,
    77     COALESCE(SUM(b.budget_amount), 0)                     AS total_budget,
    78     COALESCE(SUM(e.actual_cost), 0)                       AS total_spent,
    79     COUNT(DISTINCT ev.event_id)                           AS total_events,
    80     COUNT(DISTINCT CASE WHEN r.status = 'Accepted'
    81         THEN g.guest_id END)                              AS confirmed_guests,
    82     ROUND(
    83         100.0 * COALESCE(SUM(e.actual_cost), 0)
    84               / NULLIF(SUM(b.budget_amount), 0), 2
    85     )                                                     AS spend_percentage
    86 FROM wedding         w
    87 LEFT JOIN budget     b  ON w.wedding_id  = b.wedding_id
    88 LEFT JOIN event      ev ON w.wedding_id  = ev.wedding_id
    89                         AND ev.status   != 'Cancelled'
    90 LEFT JOIN expense    e  ON ev.event_id   = e.event_id
    91                         AND e.status     = 'Approved'
    92 LEFT JOIN guest      g  ON w.wedding_id  = g.wedding_id
    93 LEFT JOIN event_rsvp r  ON g.guest_id    = r.guest_id
    94 GROUP BY w.wedding_id, w.wedding_name
    95 ORDER BY spend_percentage DESC;
    96 }}}
    97 
    98 ==== Explanation ====
    99 
    100  * `COALESCE(..., 0)` — prevents NULL values from breaking aggregation when no expenses exist
    101  * `NULLIF(SUM(b.budget_amount), 0)` — prevents division-by-zero errors when no budget is defined
    102  * Filter `ev.status != 'Cancelled'` is pushed into the JOIN condition to reduce row cardinality early
    103  * Filter `e.status = 'Approved'` ensures only confirmed expenses are counted toward the total
    104 
    105 ==== Verification with EXPLAIN ANALYZE ====
     8Phase 9 focuses on database performance, optimization, scalability, and security for the Wedding Planner Management System.
     9
     10The primary focus of this phase is the execution analysis and optimization of the complex analytical queries implemented in Phase 6.
     11
     12The analyzed analytical reports are:
     13* Budget vs Actual Expenditure Analysis
     14* Venue Capacity Utilization Analysis
     15* RSVP Conversion Rate Analysis
     16
     17This phase includes:
     18* EXPLAIN ANALYZE-based query analysis
     19* identification of execution bottlenecks
     20* indexing and optimization strategies
     21* materialized view and partitioning recommendations
     22* transaction and locking analysis
     23* database security mechanisms
     24* backup, audit logging, and data protection strategies
     25
     26The implementation demonstrates how PostgreSQL can support both transactional processing and analytical workloads within a scalable Wedding Planner Management System.
     27
     28== 1. Performance Analysis of Phase 6 Queries ==
     29
     30== 1.1 Query Workload Characteristics ==
     31
     32The analytical queries from Phase 6 are reporting-oriented workloads.
     33
     34These queries are significantly more expensive than standard CRUD operations because they combine data from multiple related tables before generating aggregated analytical metrics.
     35
     36|| Query || Main Operations || Potential Bottleneck ||
     37|| Budget Analysis || Multiple LEFT JOIN operations, SUM aggregation, temporal calculations || Aggregation and repeated booking joins ||
     38|| Venue Capacity || JOINs, COUNT(DISTINCT), CASE categorization || Attendance aggregation and GROUP BY ||
     39|| RSVP Conversion || Multiple LEFT JOINs, COUNT(DISTINCT), conditional aggregation || DISTINCT counting and aggregation ||
     40
     41The most expensive operations identified are:
     42* COUNT(DISTINCT ...)
     43* GROUP BY aggregation
     44* LEFT JOIN chains
     45* temporal cost calculations
     46* conditional CASE calculations
     47
     48== 1.2 General Execution Plan Expectations ==
     49
     50For the Phase 6 analytical reports, PostgreSQL is expected to use:
     51* Sequential Scan
     52* Index Scan
     53* Hash Join
     54* HashAggregate
     55* Sort
     56
     57The selected execution strategy depends on:
     58* table size
     59* index availability
     60* row selectivity
     61* join cardinality
     62* aggregation complexity
     63
     64The most important optimization factors are:
     65* indexes on foreign-key columns
     66* indexes on status columns
     67* optimization of GROUP BY operations
     68* reduction of unnecessary sequential scans
     69* efficient JOIN ordering
     70
     71== 1.3 Budget Analysis Query – Execution Analysis ==
     72
     73The Budget Analysis query evaluates the financial relationship between the planned wedding budget and the actual expenses generated by venue, photographer, and band bookings.
     74
     75This query is computationally expensive because it combines multiple booking-related tables and performs aggregation and temporal cost calculations.
     76
     77=== Query Characteristics ===
     78
     79The query includes:
     80* multiple LEFT JOIN operations
     81* SUM() aggregation
     82* temporal calculations using EXTRACT(EPOCH)
     83* GROUP BY aggregation
     84* COALESCE() handling of NULL values
     85
     86The query combines:
     87* wedding
     88* user
     89* venue_booking
     90* photographer_booking
     91* photographer
     92* band_booking
     93* band
     94
     95=== Performance-Sensitive Operations ===
     96
     97The most expensive operations identified are:
     98
     99* Multiple LEFT JOIN operations:
     100  * all weddings must remain in the result set even when certain bookings do not exist
     101
     102* Aggregation:
     103  * SUM() calculations process multiple booking records per wedding
     104
     105* Temporal Calculations:
     106  * EXTRACT(EPOCH FROM (...)) converts booking durations into hours
     107
     108* GROUP BY:
     109  * aggregation requires grouping all joined rows by wedding attributes
     110
     111=== EXPLAIN ANALYZE ===
    106112
    107113{{{
     
    110116SELECT
    111117    w.wedding_id,
    112     w.wedding_name,
    113     COALESCE(SUM(b.budget_amount), 0)                     AS total_budget,
    114     COALESCE(SUM(e.actual_cost), 0)                       AS total_spent,
    115     COUNT(DISTINCT ev.event_id)                           AS total_events,
    116     COUNT(DISTINCT CASE WHEN r.status = 'Accepted'
    117         THEN g.guest_id END)                              AS confirmed_guests,
    118     ROUND(
    119         100.0 * COALESCE(SUM(e.actual_cost), 0)
    120               / NULLIF(SUM(b.budget_amount), 0), 2
    121     )                                                     AS spend_percentage
    122 FROM wedding         w
    123 LEFT JOIN budget     b  ON w.wedding_id  = b.wedding_id
    124 LEFT JOIN event      ev ON w.wedding_id  = ev.wedding_id
    125                         AND ev.status   != 'Cancelled'
    126 LEFT JOIN expense    e  ON ev.event_id   = e.event_id
    127                         AND e.status     = 'Approved'
    128 LEFT JOIN guest      g  ON w.wedding_id  = g.wedding_id
    129 LEFT JOIN event_rsvp r  ON g.guest_id    = r.guest_id
    130 GROUP BY w.wedding_id, w.wedding_name
    131 ORDER BY spend_percentage DESC;
    132 }}}
    133 
    134 Expected output (with indexes applied):
    135 
    136 {{{
    137 HashAggregate  (cost=4821.30..4823.45 rows=215 width=72)
    138                (actual time=143.221..143.598 rows=215 loops=1)
    139   ->  Hash Left Join  (cost=... actual time=12.4..98.7 rows=51420 loops=1)
    140         Hash Cond: (g.guest_id = r.guest_id)
    141         ->  Index Scan using idx_guest_wedding on guest g
    142               (actual time=0.031..4.812 rows=12500 loops=1)
    143 Planning Time: 3.4 ms
    144 Execution Time: 143.9 ms
    145 }}}
    146 
    147 ==== Validation ====
    148 
    149 {{{
    150 #!sql
    151 -- Validate that no wedding has a spend_percentage above 100% without a reason
    152 SELECT wedding_id, spend_percentage
    153 FROM (
    154     SELECT
    155         w.wedding_id,
    156         ROUND(
    157             100.0 * COALESCE(SUM(e.actual_cost), 0)
    158                   / NULLIF(SUM(b.budget_amount), 0), 2
    159         ) AS spend_percentage
    160     FROM wedding w
    161     LEFT JOIN budget     b  ON w.wedding_id = b.wedding_id
    162     LEFT JOIN event      ev ON w.wedding_id = ev.wedding_id
    163     LEFT JOIN expense    e  ON ev.event_id  = e.event_id
    164                             AND e.status    = 'Approved'
    165     GROUP BY w.wedding_id
    166 ) sub
    167 WHERE spend_percentage > 100
    168 ORDER BY spend_percentage DESC;
    169 }}}
    170 
    171 This validation detects weddings exceeding their planned budget, which may indicate
    172 data entry errors or unapproved expenditures that require review.
    173 
    174 === 1.6 Recommended Query Optimization Techniques ===
    175 
    176  * Rewrite correlated subqueries as explicit JOINs or CTEs where possible
    177  * Enforce predicate pushdown and index-friendly expressions
    178  * Replace expensive full scans with materialized views for analytical workloads
    179  * Introduce result caching for deterministic queries (e.g., venue availability lookups)
    180  * Use `CUBE` or `ROLLUP` for multi-dimensional reporting queries
    181 
    182 ----
    183 
    184 == 2. Indexing Strategy & Optimization Framework ==
    185 
    186 A well-designed indexing strategy is essential for maintaining predictable performance
    187 under high data volumes and concurrent user load.
    188 
    189 === 2.1 Current Index Landscape ===
    190 
    191 A schema audit of the existing system revealed:
    192 
    193  * Correct primary keys defined on all major tables
    194  * Partial and inconsistent foreign-key indexing
    195  * No composite indexes covering common JOIN paths
    196  * No GIN or full-text indexes
    197  * No partitioning-aware indexes
    198  * No expression-based indexes
    199 
    200 === 2.2 Index Selectivity & Cardinality Analysis ===
    201 
    202 || '''Column'''       || '''Cardinality''' || '''Index Candidate?''' ||
    203 || `BookingDate`      || High              || Yes — B-tree           ||
    204 || `EventType`        || High              || Yes — composite        ||
    205 || `VenueID`          || High              || Yes — composite        ||
    206 || `ClientID`         || High              || Yes — composite        ||
    207 || `PaymentStatus`    || High              || Yes — partial          ||
    208 || `State`            || Low               || No — poor selectivity  ||
    209 || `PackageType`      || Low               || No — poor selectivity  ||
    210 || `Category`         || Low               || No — standalone B-tree ineffective ||
    211 
    212 === 2.3 Recommended Index Types & Structures ===
    213 
    214 ==== 2.3.1 Composite B-tree Indexes ====
    215 
    216 {{{
    217 #!sql
    218 -- Optimizes client-specific booking history queries
    219 CREATE INDEX idx_client_booking_date
    220 ON booking(client_id, booking_date);
    221 
    222 -- Optimizes venue availability queries sorted by event date
    223 CREATE INDEX idx_venue_event_date
    224 ON booking(venue_id, event_date);
    225 
    226 -- Optimizes vendor service filtering and lookup
    227 CREATE INDEX idx_vendor_service_category
    228 ON vendor(vendor_id, service_category);
    229 
    230 -- Optimizes payment reconciliation reports
    231 CREATE INDEX idx_payment_status_date
    232 ON payment(payment_status, payment_date);
    233 
    234 -- Optimizes multi-dimensional event search
    235 CREATE INDEX idx_event_type_region_date
    236 ON event(event_type, region, event_date);
    237 }}}
    238 
    239 ==== Explanation ====
    240 
    241 Composite indexes reduce sorting costs and accelerate JOINs by aligning the index
    242 structure with the actual column access patterns in real queries. Column order
    243 matters: the most selective or most frequently filtered column should come first.
    244 
    245 ==== Verification ====
    246 
    247 {{{
    248 #!sql
    249 -- Verify the index is being used by the query planner
    250 EXPLAIN ANALYZE
    251 SELECT booking_id, booking_date
    252 FROM booking
    253 WHERE client_id = 42
    254   AND booking_date >= '2025-01-01'
    255 ORDER BY booking_date;
    256 }}}
    257 
    258 Expected result: `Index Scan using idx_client_booking_date on booking`
    259 
    260 ==== Validation ====
    261 
    262 {{{
    263 #!sql
    264 -- Confirm index exists and is not bloated
    265 SELECT
    266     indexname,
    267     pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    268     idx_scan,
    269     idx_tup_read,
    270     idx_tup_fetch
    271 FROM pg_stat_user_indexes
    272 WHERE indexname = 'idx_client_booking_date';
    273 }}}
    274 
    275 A healthy index should show a non-zero `idx_scan` value within 24 hours of production
    276 traffic. An `idx_scan` of 0 after several days indicates the index is unused and
    277 should be dropped.
    278 
    279 ----
    280 
    281 ==== 2.3.2 Partial Indexes ====
    282 
    283 {{{
    284 #!sql
    285 -- Index only active vendors — reduces index size significantly
    286 CREATE INDEX idx_vendor_active
    287 ON vendor(vendor_id, service_category)
    288 WHERE status = 'Active';
    289 
    290 -- Index only pending payments — avoids indexing historical data
    291 CREATE INDEX idx_payment_pending
    292 ON payment(payment_date, client_id)
    293 WHERE payment_status = 'Pending';
    294 
    295 -- Index only bookings within the current fiscal year
    296 CREATE INDEX idx_booking_current_year
    297 ON booking(client_id, event_date)
    298 WHERE event_date >= DATE_TRUNC('year', CURRENT_DATE);
    299 }}}
    300 
    301 ==== Explanation ====
    302 
    303 Partial indexes cover only the rows that satisfy a specific WHERE condition.
    304 This reduces both the index size and the I/O cost of index maintenance, making
    305 them ideal for tables where only a fraction of rows are operationally active at
    306 any given time.
    307 
    308 ==== Verification ====
    309 
    310 {{{
    311 #!sql
    312 EXPLAIN ANALYZE
    313 SELECT vendor_id, service_category
    314 FROM vendor
    315 WHERE status = 'Active'
    316   AND service_category = 'Catering';
    317 }}}
    318 
    319 Expected result: `Index Scan using idx_vendor_active on vendor`
    320 
    321 ==== Validation ====
    322 
    323 {{{
    324 #!sql
    325 -- Confirm that the partial index covers fewer rows than the full table
    326 SELECT
    327     schemaname,
    328     tablename,
    329     indexname,
    330     pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    331 FROM pg_stat_user_indexes
    332 WHERE indexname IN (
    333     'idx_vendor_active',
    334     'idx_payment_pending',
    335     'idx_booking_current_year'
    336 )
    337 ORDER BY pg_relation_size(indexrelid) DESC;
    338 }}}
    339 
    340 ----
    341 
    342 ==== 2.3.3 Full-Text Search Indexing (GIN) ====
    343 
    344 {{{
    345 #!sql
    346 -- Add a tsvector column for full-text search on vendor descriptions
    347 ALTER TABLE vendor
    348 ADD COLUMN description_tsv tsvector
    349 GENERATED ALWAYS AS (
    350     to_tsvector('english', COALESCE(description, ''))
    351 ) STORED;
    352 
    353 -- Create a GIN index on the generated column
    354 CREATE INDEX idx_vendor_description_gin
    355 ON vendor USING GIN(description_tsv);
    356 }}}
    357 
    358 ==== Explanation ====
    359 
    360 GIN (Generalized Inverted Index) indexes are specifically designed for full-text
    361 search workloads. They store a mapping from each unique lexeme (word) to the rows
    362 that contain it, enabling full-text queries to execute in milliseconds rather
    363 than scanning entire columns.
    364 
    365 ==== Usage Sample ====
    366 
    367 {{{
    368 #!sql
    369 -- Search for vendors offering floral or decoration services
    370 SELECT vendor_id, name, description
    371 FROM vendor
    372 WHERE description_tsv @@ to_tsquery('english', 'floral | decoration')
    373 ORDER BY ts_rank(description_tsv, to_tsquery('english', 'floral | decoration')) DESC
    374 LIMIT 20;
    375 }}}
    376 
    377 ==== Verification ====
    378 
    379 {{{
    380 #!sql
    381 EXPLAIN ANALYZE
    382 SELECT vendor_id, name
    383 FROM vendor
    384 WHERE description_tsv @@ to_tsquery('english', 'floral | decoration');
    385 }}}
    386 
    387 Expected result: `Bitmap Index Scan using idx_vendor_description_gin on vendor`
    388 
    389 ==== Validation ====
    390 
    391 {{{
    392 #!sql
    393 -- Confirm the tsvector column is populated correctly
    394 SELECT vendor_id, description_tsv
    395 FROM vendor
    396 WHERE description ILIKE '%floral%'
    397 LIMIT 5;
    398 }}}
    399 
    400 All returned rows should have the lexeme `floral` present in the `description_tsv` column.
    401 
    402 ----
    403 
    404 ==== 2.3.4 Expression Indexes ====
    405 
    406 {{{
    407 #!sql
    408 -- Allows case-insensitive email lookup without full table scans
    409 CREATE INDEX idx_client_email_lower
    410 ON client(LOWER(email));
    411 
    412 -- Allows date-only filtering on a timestamp column
    413 CREATE INDEX idx_booking_date_only
    414 ON booking(DATE(booking_date));
    415 
    416 -- Handles NULL fallback contact lookups efficiently
    417 CREATE INDEX idx_contact_coalesce
    418 ON client(COALESCE(alternate_contact, primary_contact));
    419 }}}
    420 
    421 ==== Explanation ====
    422 
    423 Expression indexes store the result of a computed expression rather than a raw
    424 column value. They allow the query planner to use the index when the same
    425 expression appears in a `WHERE` clause, eliminating the need for function-level
    426 full scans.
    427 
    428 ==== Verification ====
    429 
    430 {{{
    431 #!sql
    432 -- Case-insensitive email search — should use the expression index
    433 EXPLAIN ANALYZE
    434 SELECT client_id, name
    435 FROM client
    436 WHERE LOWER(email) = 'ivan@example.com';
    437 }}}
    438 
    439 Expected result: `Index Scan using idx_client_email_lower on client`
    440 
    441 ==== Validation ====
    442 
    443 {{{
    444 #!sql
    445 -- Confirm no duplicate emails exist after normalization
    446 SELECT LOWER(email) AS normalized_email, COUNT(*) AS occurrences
    447 FROM client
    448 GROUP BY LOWER(email)
    449 HAVING COUNT(*) > 1
    450 ORDER BY occurrences DESC;
    451 }}}
    452 
    453 The result set should be empty. Any returned rows indicate duplicate email
    454 registrations that must be investigated and resolved.
    455 
    456 === 2.4 Automatic Index Maintenance Framework ===
    457 
    458 Implement the following maintenance procedures:
    459 
    460  * Scheduled index bloat detection using `pg_stat_user_indexes` and `pg_relation_size`
    461  * Automated `REINDEX CONCURRENTLY` based on fragmentation thresholds
    462  * Usage tracking to identify and drop unused indexes (`idx_scan = 0`)
    463  * Heatmap-based index popularity analytics for DBA review dashboards
    464 
    465 === 2.5 Index Implementation Priority ===
    466 
    467 || '''Index Name'''                  || '''Table / Columns'''                         || '''Type'''            || '''Priority''' ||
    468 || `idx_guest_wedding`               || `guest(wedding_id)`                           || Single-column         || CRITICAL       ||
    469 || `idx_event_active_timeline`       || `event(wedding_id, date, start_time)`         || Composite + Partial   || CRITICAL       ||
    470 || `idx_guest_qr`                    || `guest(qr_code)`                              || Unique                || CRITICAL       ||
    471 || `idx_rsvp_guest`                  || `event_rsvp(guest_id, status)`                || Composite             || HIGH           ||
    472 || `idx_budget_wedding`              || `budget(wedding_id)`                          || Single-column         || HIGH           ||
    473 || `idx_vendor_description_gin`      || `vendor(description_tsv)`                     || GIN / Full-text       || HIGH           ||
    474 || `idx_client_email_lower`          || `client(LOWER(email))`                        || Expression            || MEDIUM         ||
    475 || `idx_booking_current_year`        || `booking(client_id, event_date)`              || Partial               || MEDIUM         ||
    476 
    477 ----
    478 
    479 == 3. Caching, Partitioning & Storage Optimization ==
    480 
    481 === 3.1 Caching Layer ===
    482 
    483 Establish a multi-tier caching architecture:
    484 
    485  * '''Application-level cache''' (Redis / Memcached) — venue availability, vendor listings, package catalogs
    486  * '''Database query result cache''' — read-heavy dashboards with low update frequency
    487  * '''Materialized views''' — pre-aggregated financial and booking metrics refreshed on schedule
    488 
    489 ==== Sample: Materialized View for Monthly Revenue ====
    490 
    491 {{{
    492 #!sql
    493 CREATE MATERIALIZED VIEW mv_monthly_revenue AS
    494 SELECT
    495     DATE_TRUNC('month', p.payment_date)   AS revenue_month,
    496     v.service_category,
    497     COUNT(p.payment_id)                   AS total_payments,
    498     SUM(p.amount)                         AS total_revenue,
    499     AVG(p.amount)                         AS average_payment
    500 FROM payment p
    501 JOIN booking b ON p.booking_id  = b.booking_id
    502 JOIN vendor  v ON b.vendor_id   = v.vendor_id
    503 WHERE p.payment_status = 'Completed'
    504 GROUP BY DATE_TRUNC('month', p.payment_date), v.service_category
    505 ORDER BY revenue_month DESC;
    506 
    507 -- Create an index on the materialized view for fast dashboard queries
    508 CREATE INDEX idx_mv_revenue_month
    509 ON mv_monthly_revenue(revenue_month, service_category);
    510 }}}
    511 
    512 ==== Refreshing the Materialized View ====
    513 
    514 {{{
    515 #!sql
    516 -- Refresh without locking reads (safe for production)
    517 REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_revenue;
    518 }}}
    519 
    520 ==== Validation ====
    521 
    522 {{{
    523 #!sql
    524 -- Confirm data freshness after refresh
    525 SELECT MAX(revenue_month) AS last_updated_month
    526 FROM mv_monthly_revenue;
    527 }}}
    528 
    529 The result should reflect the most recently completed calendar month. If it is
    530 more than one month behind, review the scheduled refresh job.
    531 
    532 === 3.2 Horizontal Partitioning ===
    533 
    534 Partition large tables — `booking`, `payment`, `vendor` — by the most common
    535 access dimension:
    536 
    537 {{{
    538 #!sql
    539 -- Partition the booking table by year and quarter
    540 CREATE TABLE booking (
    541     booking_id     SERIAL,
    542     client_id      INTEGER      NOT NULL,
    543     venue_id       INTEGER      NOT NULL,
    544     booking_date   TIMESTAMP    NOT NULL,
    545     event_date     DATE         NOT NULL,
    546     status         VARCHAR(20)  NOT NULL,
    547     total_cost     NUMERIC(12,2),
    548     PRIMARY KEY (booking_id, event_date)
    549 ) PARTITION BY RANGE (event_date);
    550 
    551 CREATE TABLE booking_2024_q1 PARTITION OF booking
    552 FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
    553 
    554 CREATE TABLE booking_2024_q2 PARTITION OF booking
    555 FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
    556 
    557 CREATE TABLE booking_2025_q1 PARTITION OF booking
    558 FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
    559 }}}
    560 
    561 ==== Explanation ====
    562 
    563 Partitioning allows PostgreSQL to skip irrelevant partitions entirely (partition
    564 pruning), dramatically reducing I/O for date-range queries. Each partition can
    565 also be independently vacuumed, archived, or dropped without affecting others.
    566 
    567 ==== Verification ====
    568 
    569 {{{
    570 #!sql
    571 -- Confirm partition pruning is active for a date-range query
    572 EXPLAIN ANALYZE
    573 SELECT booking_id, client_id, event_date
    574 FROM booking
    575 WHERE event_date BETWEEN '2025-01-01' AND '2025-03-31';
    576 }}}
    577 
    578 Expected result: Only `booking_2025_q1` appears in the plan. All other partitions
    579 should be marked as pruned.
    580 
    581 ==== Validation ====
    582 
    583 {{{
    584 #!sql
    585 -- Confirm row distribution across partitions
    586 SELECT
    587     tableoid::regclass   AS partition_name,
    588     COUNT(*)             AS row_count
    589 FROM booking
    590 GROUP BY tableoid
    591 ORDER BY partition_name;
    592 }}}
    593 
    594 === 3.3 Storage-Level Optimizations ===
    595 
    596  * Enable column compression for archival partitions (`TOAST` settings)
    597  * Tune WAL settings (`wal_buffers`, `checkpoint_completion_target`) for high-insert workloads
    598  * Separate I/O tiers: NVMe for hot partitions, SSD for warm data, object storage for cold archives
    599 
    600 ----
    601 
    602 == 4. Concurrency, Transaction Management & Locking Strategy ==
    603 
    604 === 4.1 Transaction Isolation Levels ===
    605 
    606 || '''Isolation Level'''   || '''Recommended Use Case'''                              ||
    607 || `READ COMMITTED`        || Standard read and write operations (default)            ||
    608 || `REPEATABLE READ`       || Financial reconciliation, double-booking prevention     ||
    609 || `SERIALIZABLE`          || Business-critical workflows requiring strict consistency ||
    610 
    611 === 4.2 Deadlock Prevention ===
    612 
    613 Implement the following practices to eliminate deadlock risk:
    614 
    615  * Enforce a consistent ordering of table modifications across all transactions
    616  * Keep transactions as short-lived as possible
    617  * Ensure indexes exist on all foreign-key columns to prevent lock escalation
    618  * Route batch and reporting jobs through a dedicated queue or job scheduler
    619 
    620 ==== Sample: Safe Atomic Booking Reservation ====
    621 
    622 {{{
    623 #!sql
    624 BEGIN;
    625 
    626 -- Lock the venue row to prevent double-booking
    627 SELECT venue_id, capacity, status
    628 FROM venue
    629 WHERE venue_id = 12
    630 FOR UPDATE;
    631 
    632 -- Verify availability before inserting
    633 INSERT INTO booking (client_id, venue_id, event_date, status, total_cost)
    634 SELECT 88, 12, '2025-09-14', 'Confirmed', 4500.00
    635 WHERE NOT EXISTS (
    636     SELECT 1
    637     FROM booking
    638     WHERE venue_id   = 12
    639       AND event_date = '2025-09-14'
    640       AND status    != 'Cancelled'
    641 );
    642 
    643 COMMIT;
    644 }}}
    645 
    646 ==== Explanation ====
    647 
    648 `SELECT ... FOR UPDATE` acquires an exclusive row-level lock on the venue record,
    649 preventing any concurrent transaction from modifying or double-booking the same
    650 venue for the same date until this transaction commits or rolls back.
    651 
    652 ==== Validation ====
    653 
    654 {{{
    655 #!sql
    656 -- Confirm no double-bookings exist for the same venue and date
    657 SELECT venue_id, event_date, COUNT(*) AS booking_count
    658 FROM booking
    659 WHERE status != 'Cancelled'
    660 GROUP BY venue_id, event_date
    661 HAVING COUNT(*) > 1
    662 ORDER BY booking_count DESC;
    663 }}}
    664 
    665 The result set must always be empty in a correctly operating system. Any returned
    666 rows represent a critical data integrity violation requiring immediate investigation.
    667 
    668 === 4.3 Row-Level vs Advisory Locks ===
    669 
    670  * Use '''row-level locks''' (`FOR UPDATE`, `FOR SHARE`) for booking atomicity and inventory management
    671  * Use '''advisory locks''' (`pg_try_advisory_lock`) for complex multi-step workflows such as payment batching
    672 
    673 ----
    674 
    675 == 5. Performance Analysis with EXPLAIN / EXPLAIN ANALYZE ==
    676 
    677 === 5.1 Purpose ===
    678 
    679 || '''Command'''        || '''Behavior'''                                                          ||
    680 || `EXPLAIN`            || Shows the execution plan without running the query                      ||
    681 || `EXPLAIN ANALYZE`    || Executes the query and shows real timing, row counts, and loop data     ||
    682 || `EXPLAIN (BUFFERS)`  || Also reports cache hit / miss ratios for each plan node                 ||
    683 
    684 === 5.2 Scan Type Reference ===
    685 
    686 || '''Scan Type'''         || '''When It Occurs'''                                         ||
    687 || Seq Scan                || No usable index; entire table is read                        ||
    688 || Index Scan              || Index narrows row set; table is accessed for full row data   ||
    689 || Bitmap Index Scan       || Multiple index results are combined before table access      ||
    690 || Index Only Scan         || All required columns exist in the index; table is not read   ||
    691 
    692 === 5.3 Sample 1: Sequential Scan Without an Index ===
    693 
    694 {{{
    695 #!sql
    696 EXPLAIN
    697 SELECT *
    698 FROM event
    699 WHERE wedding_id = 3
    700   AND status IN ('Scheduled', 'Confirmed');
    701 }}}
    702 
    703 Typical output before optimization:
    704 
    705 {{{
    706 Seq Scan on event  (cost=0.00..4821.00 rows=12 width=96)
    707   Filter: ((wedding_id = 3) AND (status = ANY ('{Scheduled,Confirmed}'::text[])))
    708 }}}
    709 
    710 ==== Diagnosis ====
    711 
    712 A `Seq Scan` on a large `event` table indicates no index exists to support the
    713 predicate. With millions of rows, this translates directly to high I/O and
    714 slow response times. A composite partial index is required.
    715 
    716 === 5.4 Sample 2: Index Scan After Creating a Partial Composite Index ===
    717 
    718 {{{
    719 #!sql
    720 -- Create the partial composite index
    721 CREATE INDEX idx_event_active_timeline
    722 ON event(wedding_id, date, start_time)
    723 WHERE status IN ('Scheduled', 'Confirmed');
    724 
    725 -- Re-run EXPLAIN ANALYZE after index creation
     118
     119    u.first_name || ' ' || u.last_name
     120        AS organizer_name,
     121
     122    w.date,
     123
     124    w.budget,
     125
     126    COALESCE(SUM(vb.price), 0)
     127        AS venue_cost,
     128
     129    COALESCE(SUM(
     130        EXTRACT(EPOCH FROM (pb.end_time - pb.start_time))/3600
     131        * p.price_per_hour
     132    ), 0) AS photographer_cost,
     133
     134    COALESCE(SUM(
     135        EXTRACT(EPOCH FROM (bb.end_time - bb.start_time))/3600
     136        * b.price_per_hour
     137    ), 0) AS band_cost
     138
     139FROM wedding w
     140
     141LEFT JOIN "user" u
     142    ON w.user_id = u.user_id
     143
     144LEFT JOIN venue_booking vb
     145    ON w.wedding_id = vb.wedding_id
     146
     147LEFT JOIN photographer_booking pb
     148    ON w.wedding_id = pb.wedding_id
     149
     150LEFT JOIN photographer p
     151    ON pb.photographer_id = p.photographer_id
     152
     153LEFT JOIN band_booking bb
     154    ON w.wedding_id = bb.wedding_id
     155
     156LEFT JOIN band b
     157    ON bb.band_id = b.band_id
     158
     159GROUP BY
     160    w.wedding_id,
     161    u.first_name,
     162    u.last_name,
     163    w.date,
     164    w.budget
     165
     166ORDER BY w.wedding_id;
     167}}}
     168
     169=== Expected Execution Plan ===
     170
     171A typical execution plan for this query may include:
     172
     173{{{
     174HashAggregate
     175  -> Hash Left Join
     176       -> Hash Left Join
     177            -> Hash Left Join
     178                 -> Seq Scan on wedding
     179}}}
     180
     181=== Interpretation ===
     182
     183* Seq Scan on wedding:
     184  * PostgreSQL scans the wedding table as the base relation
     185
     186* Hash Left Join:
     187  * booking tables are joined using hash joins because of the multiple LEFT JOIN operations
     188
     189* HashAggregate:
     190  * aggregation is performed after all joins are completed
     191
     192The query execution cost increases proportionally with:
     193* number of weddings
     194* number of booking records
     195* number of vendors per wedding
     196
     197=== Recommended Indexes ===
     198
     199{{{
     200#!sql
     201CREATE INDEX idx_wedding_user
     202ON wedding(user_id);
     203
     204CREATE INDEX idx_venue_booking_wedding
     205ON venue_booking(wedding_id);
     206
     207CREATE INDEX idx_photographer_booking_wedding
     208ON photographer_booking(wedding_id);
     209
     210CREATE INDEX idx_photographer_booking_photographer
     211ON photographer_booking(photographer_id);
     212
     213CREATE INDEX idx_band_booking_wedding
     214ON band_booking(wedding_id);
     215
     216CREATE INDEX idx_band_booking_band
     217ON band_booking(band_id);
     218}}}
     219
     220=== Optimization Benefits ===
     221
     222The proposed indexes improve:
     223* JOIN performance
     224* row lookup speed
     225* aggregation preparation
     226* scalability for large booking datasets
     227
     228The indexes reduce:
     229* sequential scans
     230* unnecessary I/O operations
     231* execution latency for analytical reports
     232
     233=== Validation ===
     234
     235{{{
     236#!sql
    726237EXPLAIN ANALYZE
    727238SELECT *
    728 FROM event
    729 WHERE wedding_id = 3
    730   AND status IN ('Scheduled', 'Confirmed')
    731 ORDER BY date, start_time;
    732 }}}
    733 
    734 Expected output after optimization:
    735 
    736 {{{
    737 Index Scan using idx_event_active_timeline on event
    738   (cost=0.29..18.43 rows=12 width=96)
    739   (actual time=0.041..0.213 rows=12 loops=1)
    740   Index Cond: (wedding_id = 3)
    741 Planning Time:  1.2 ms
    742 Execution Time: 0.3 ms
    743 }}}
    744 
    745 ==== Validation ====
    746 
    747 {{{
    748 #!sql
    749 -- Confirm the index is being actively used in production
     239FROM venue_booking
     240WHERE wedding_id = 1;
     241}}}
     242
     243Expected result:
     244
     245{{{
     246Index Scan using idx_venue_booking_wedding on venue_booking
     247}}}
     248
     249=== Conclusion ===
     250
     251The Budget Analysis query represents one of the most computationally intensive analytical reports in the system because it combines multiple booking sources and performs aggregation across several relations simultaneously.
     252
     253Efficient indexing of foreign-key columns is essential for maintaining acceptable execution time as the number of weddings and bookings increases.
     254
     255== 1.4 Venue Capacity Query – Execution Analysis ==
     256
     257The Venue Capacity Utilization query analyzes the relationship between venue capacity and guest attendance for wedding events.
     258
     259This query combines venue, booking, wedding, event, and attendance data in order to calculate occupancy metrics and utilization categories.
     260
     261=== Query Characteristics ===
     262
     263The query includes:
     264* multiple INNER JOIN and LEFT JOIN operations
     265* COUNT(DISTINCT ...) aggregation
     266* CASE-based categorization
     267* GROUP BY aggregation
     268* occupancy percentage calculations
     269
     270The query combines:
     271* venue
     272* venue_booking
     273* wedding
     274* user
     275* event
     276* attendance
     277
     278=== Performance-Sensitive Operations ===
     279
     280The most expensive operations identified are:
     281
     282* COUNT(DISTINCT a.guest_id):
     283  * distinct counting requires additional aggregation work
     284
     285* GROUP BY:
     286  * all joined attendance records must be grouped by venue and wedding attributes
     287
     288* LEFT JOIN attendance:
     289  * attendance rows must remain optional to preserve events without attendance data
     290
     291* CASE categorization:
     292  * occupancy thresholds are evaluated during aggregation
     293
     294=== EXPLAIN ANALYZE ===
     295
     296{{{
     297#!sql
     298EXPLAIN ANALYZE
    750299SELECT
    751     indexrelname    AS index_name,
    752     idx_scan        AS times_used,
    753     idx_tup_read    AS tuples_read,
    754     idx_tup_fetch   AS tuples_fetched
    755 FROM pg_stat_user_indexes
    756 WHERE indexrelname = 'idx_event_active_timeline';
    757 }}}
    758 
    759 After deploying to production, `idx_scan` should increase steadily with each
    760 dashboard load or event timeline request. A value that remains at 0 indicates
    761 the index condition does not match the real query predicates.
    762 
    763 === 5.5 Sample 3: EXPLAIN ANALYZE on a JOIN with Aggregation ===
     300    v.venue_id,
     301    v.name AS venue_name,
     302    v.capacity AS venue_capacity,
     303
     304    w.wedding_id,
     305
     306    u.first_name || ' ' || u.last_name
     307        AS organizer_name,
     308
     309    w.date AS wedding_date,
     310
     311    COUNT(DISTINCT a.guest_id)
     312        AS confirmed_attendees,
     313
     314    COUNT(
     315        DISTINCT CASE
     316            WHEN a.status = 'ATTENDED'
     317            THEN a.guest_id
     318        END
     319    ) AS actual_attendance,
     320
     321    v.capacity - COUNT(DISTINCT a.guest_id)
     322        AS available_seats,
     323
     324    ROUND(
     325        (
     326            CAST(COUNT(DISTINCT a.guest_id) AS NUMERIC)
     327            / v.capacity
     328        ) * 100,
     329        2
     330    ) AS occupancy_rate_percent
     331
     332FROM venue v
     333
     334INNER JOIN venue_booking vb
     335    ON v.venue_id = vb.venue_id
     336
     337INNER JOIN wedding w
     338    ON vb.wedding_id = w.wedding_id
     339
     340INNER JOIN "user" u
     341    ON w.user_id = u.user_id
     342
     343LEFT JOIN event e
     344    ON w.wedding_id = e.wedding_id
     345
     346LEFT JOIN attendance a
     347    ON e.event_id = a.event_id
     348    AND a.status IN ('ATTENDED', 'CONFIRMED')
     349
     350GROUP BY
     351    v.venue_id,
     352    v.name,
     353    v.capacity,
     354    w.wedding_id,
     355    u.first_name,
     356    u.last_name,
     357    w.date
     358
     359ORDER BY
     360    v.venue_id,
     361    w.wedding_id;
     362}}}
     363
     364=== Expected Execution Plan ===
     365
     366A typical execution plan may include:
     367
     368{{{
     369HashAggregate
     370  -> Hash Left Join
     371       -> Hash Join
     372            -> Seq Scan on attendance
     373}}}
     374
     375=== Interpretation ===
     376
     377* Seq Scan on attendance:
     378  * attendance records are scanned before aggregation
     379
     380* Hash Join:
     381  * attendance records are matched with event and wedding relations
     382
     383* HashAggregate:
     384  * PostgreSQL groups attendance rows by venue and wedding information
     385
     386The execution cost depends mainly on:
     387* number of attendance records
     388* number of events
     389* number of guests per wedding
     390
     391=== Recommended Indexes ===
     392
     393{{{
     394#!sql
     395CREATE INDEX idx_venue_booking_venue
     396ON venue_booking(venue_id);
     397
     398CREATE INDEX idx_venue_booking_wedding
     399ON venue_booking(wedding_id);
     400
     401CREATE INDEX idx_event_wedding
     402ON event(wedding_id);
     403
     404CREATE INDEX idx_attendance_event
     405ON attendance(event_id);
     406
     407CREATE INDEX idx_attendance_status
     408ON attendance(status);
     409
     410CREATE INDEX idx_attendance_guest
     411ON attendance(guest_id);
     412}}}
     413
     414=== Optimization Benefits ===
     415
     416The proposed indexes improve:
     417* attendance lookup performance
     418* JOIN efficiency
     419* aggregation preparation
     420* occupancy calculation speed
     421
     422The indexes reduce:
     423* full table scans
     424* aggregation overhead
     425* JOIN latency
     426
     427=== Validation ===
     428
     429{{{
     430#!sql
     431EXPLAIN ANALYZE
     432SELECT *
     433FROM attendance
     434WHERE event_id = 1;
     435}}}
     436
     437Expected result:
     438
     439{{{
     440Index Scan using idx_attendance_event on attendance
     441}}}
     442
     443=== Conclusion ===
     444
     445The Venue Capacity query is aggregation-heavy because it calculates attendance and occupancy metrics across multiple joined relations.
     446
     447The most performance-sensitive component is the DISTINCT attendance aggregation.
     448
     449Proper indexing of attendance and event relations significantly improves report scalability and execution efficiency.
     450
     451== 1.5 RSVP Conversion Query – Execution Analysis ==
     452
     453The RSVP Conversion query analyzes how invited guests move through the RSVP process and how many confirmed guests actually attend the event.
     454
     455This query is important because it evaluates guest engagement and invitation effectiveness using RSVP and attendance data.
     456
     457=== Query Characteristics ===
     458
     459The query includes:
     460* multiple INNER JOIN and LEFT JOIN operations
     461* COUNT(DISTINCT ...) aggregation
     462* conditional aggregation with CASE WHEN
     463* NULLIF() division protection
     464* percentage calculations
     465* GROUP BY aggregation
     466
     467The query combines:
     468* wedding
     469* user
     470* event
     471* guest
     472* event_rsvp
     473* attendance
     474
     475=== Performance-Sensitive Operations ===
     476
     477The most expensive operations identified are:
     478
     479* COUNT(DISTINCT g.guest_id):
     480  * calculates total invitations
     481
     482* COUNT(DISTINCT r.response_id):
     483  * calculates RSVP responses
     484
     485* Conditional COUNT(DISTINCT ...):
     486  * calculates confirmed and declined RSVP responses
     487
     488* LEFT JOIN event_rsvp:
     489  * preserves guests even when they have not submitted an RSVP
     490
     491* LEFT JOIN attendance:
     492  * preserves invited guests even when attendance data does not exist
     493
     494=== EXPLAIN ANALYZE ===
    764495
    765496{{{
     
    768499SELECT
    769500    w.wedding_id,
    770     COUNT(g.guest_id)                                  AS total_guests,
    771     COUNT(r.rsvp_id) FILTER (WHERE r.status = 'Accepted') AS confirmed
    772 FROM wedding     w
    773 JOIN guest       g ON w.wedding_id = g.wedding_id
    774 LEFT JOIN event_rsvp r ON g.guest_id   = r.guest_id
    775 GROUP BY w.wedding_id;
    776 }}}
    777 
    778 Expected output:
    779 
    780 {{{
    781 HashAggregate  (cost=3241.10..3243.25 rows=215 width=24)
    782                (actual time=89.3..89.8 rows=215 loops=1)
    783   Group Key: w.wedding_id
    784   ->  Hash Left Join  (cost=... actual time=8.1..71.4 rows=48200 loops=1)
    785         Hash Cond: (g.guest_id = r.guest_id)
    786         ->  Index Scan using idx_guest_wedding on guest g
    787               (actual time=0.02..3.4 rows=12500 loops=1)
    788 Planning Time: 2.8 ms
    789 Execution Time: 90.1 ms
    790 }}}
    791 
    792 ==== Interpretation ====
    793 
    794  * `HashAggregate` — efficient grouping algorithm chosen by the planner
    795  * `Hash Left Join` — appropriate for large result sets without a nested-loop alternative
    796  * `Index Scan using idx_guest_wedding` — confirms the index is being used for the JOIN
    797 
    798 === 5.6 Benchmark Results Summary ===
    799 
    800 || '''SQL Operation'''          || '''Without Index'''  || '''With Index''' || '''Improvement''' ||
    801 || Guest lookup by wedding      || 120 ms               || 2 ms             || 98.3%             ||
    802 || Event timeline (active only) || 850 ms               || 35 ms            || 95.9%             ||
    803 || RSVP aggregation             || 2400 ms              || 180 ms           || 92.5%             ||
    804 || Budget analysis (4 joins)    || 3200 ms              || 145 ms           || 95.5%             ||
    805 || Full-text vendor search      || 4100 ms              || 18 ms            || 99.6%             ||
    806 
    807 ----
     501
     502    u.first_name || ' ' || u.last_name
     503        AS organizer_name,
     504
     505    w.date AS wedding_date,
     506
     507    e.event_id,
     508    e.event_type,
     509
     510    COUNT(DISTINCT g.guest_id)
     511        AS total_invitations,
     512
     513    COUNT(DISTINCT r.response_id)
     514        AS rsvp_responses,
     515
     516    COUNT(DISTINCT CASE
     517        WHEN r.status = 'CONFIRMED'
     518        THEN r.response_id
     519    END) AS confirmed_rsvps,
     520
     521    COUNT(DISTINCT CASE
     522        WHEN r.status = 'DECLINED'
     523        THEN r.response_id
     524    END) AS declined_rsvps,
     525
     526    COUNT(DISTINCT a.attendance_id)
     527        AS attendance_records,
     528
     529    COUNT(DISTINCT CASE
     530        WHEN a.status = 'ATTENDED'
     531        THEN a.attendance_id
     532    END) AS actual_attendees
     533
     534FROM wedding w
     535
     536INNER JOIN "user" u
     537    ON w.user_id = u.user_id
     538
     539INNER JOIN event e
     540    ON w.wedding_id = e.wedding_id
     541
     542LEFT JOIN guest g
     543    ON w.wedding_id = g.wedding_id
     544
     545LEFT JOIN event_rsvp r
     546    ON g.guest_id = r.guest_id
     547    AND e.event_id = r.event_id
     548
     549LEFT JOIN attendance a
     550    ON g.guest_id = a.guest_id
     551    AND e.event_id = a.event_id
     552
     553GROUP BY
     554    w.wedding_id,
     555    u.first_name,
     556    u.last_name,
     557    w.date,
     558    e.event_id,
     559    e.event_type
     560
     561ORDER BY
     562    w.wedding_id,
     563    e.event_id;
     564}}}
     565
     566=== Expected Execution Plan ===
     567
     568A typical execution plan may include:
     569
     570{{{
     571HashAggregate
     572  -> Hash Left Join
     573       -> Hash Left Join
     574            -> Hash Join
     575                 -> Seq Scan on guest
     576}}}
     577
     578=== Interpretation ===
     579
     580* Seq Scan on guest:
     581  * guest rows are scanned before RSVP and attendance matching
     582
     583* Hash Left Join:
     584  * guests are matched with RSVP and attendance records
     585
     586* HashAggregate:
     587  * PostgreSQL groups records by wedding and event before calculating conversion metrics
     588
     589The execution cost increases with:
     590* number of guests
     591* number of events per wedding
     592* number of RSVP records
     593* number of attendance records
     594
     595=== Recommended Indexes ===
     596
     597{{{
     598#!sql
     599CREATE INDEX idx_event_wedding
     600ON event(wedding_id);
     601
     602CREATE INDEX idx_guest_wedding
     603ON guest(wedding_id);
     604
     605CREATE INDEX idx_event_rsvp_guest
     606ON event_rsvp(guest_id);
     607
     608CREATE INDEX idx_event_rsvp_event
     609ON event_rsvp(event_id);
     610
     611CREATE INDEX idx_event_rsvp_status
     612ON event_rsvp(status);
     613
     614CREATE INDEX idx_attendance_guest
     615ON attendance(guest_id);
     616
     617CREATE INDEX idx_attendance_event
     618ON attendance(event_id);
     619
     620CREATE INDEX idx_attendance_status
     621ON attendance(status);
     622}}}
     623
     624=== Optimization Benefits ===
     625
     626The proposed indexes improve:
     627* guest lookup by wedding
     628* RSVP lookup by guest and event
     629* attendance lookup by guest and event
     630* filtering by RSVP and attendance status
     631* GROUP BY preparation
     632
     633The indexes reduce:
     634* unnecessary sequential scans
     635* large intermediate join results
     636* execution time for RSVP reporting
     637
     638=== Validation ===
     639
     640{{{
     641#!sql
     642EXPLAIN ANALYZE
     643SELECT *
     644FROM event_rsvp
     645WHERE guest_id = 1
     646  AND event_id = 1;
     647}}}
     648
     649Expected result:
     650
     651{{{
     652Index Scan using idx_event_rsvp_guest on event_rsvp
     653}}}
     654
     655=== Conclusion ===
     656
     657The RSVP Conversion query is one of the most aggregation-heavy Phase 6 reports because it combines invitation, RSVP, and attendance records into conversion metrics.
     658
     659The most expensive operations are COUNT(DISTINCT ...) and conditional aggregation.
     660
     661Indexing guest, RSVP, attendance, and event foreign-key columns directly improves execution performance and makes the report scalable for larger weddings with many guests.
     662
     663== 1.6 Performance Analysis Summary ==
     664
     665The Phase 6 analytical reports demonstrate significantly higher execution complexity compared to standard transactional queries.
     666
     667The main performance-intensive operations identified throughout the analysis are:
     668
     669* multiple JOIN operations
     670* LEFT JOIN preservation of incomplete relations
     671* GROUP BY aggregation
     672* COUNT(DISTINCT ...) calculations
     673* conditional aggregation using CASE
     674* temporal calculations using EXTRACT(EPOCH)
     675* percentage calculations using ROUND() and NULLIF()
     676
     677== 1.6.1 Main Bottlenecks ==
     678
     679|| Bottleneck || Impact ||
     680|| Multiple LEFT JOIN chains || Increased intermediate result size ||
     681|| COUNT(DISTINCT ...) || Expensive aggregation and sorting ||
     682|| GROUP BY over joined relations || Higher memory and CPU usage ||
     683|| Temporal calculations || Additional CPU processing ||
     684|| Missing indexes on foreign keys || Sequential scans and slow joins ||
     685
     686The most expensive queries are:
     687* RSVP Conversion Analysis
     688* Venue Capacity Utilization Analysis
     689
     690because they process:
     691* attendance records
     692* RSVP records
     693* DISTINCT aggregations
     694* multiple optional relations
     695
     696== 1.6.2 Most Important Indexes ==
     697
     698The following indexes provide the greatest performance improvements for the Phase 6 analytical workload:
     699
     700{{{
     701#!sql
     702CREATE INDEX idx_guest_wedding
     703ON guest(wedding_id);
     704
     705CREATE INDEX idx_event_wedding
     706ON event(wedding_id);
     707
     708CREATE INDEX idx_event_rsvp_guest
     709ON event_rsvp(guest_id);
     710
     711CREATE INDEX idx_attendance_event
     712ON attendance(event_id);
     713
     714CREATE INDEX idx_venue_booking_wedding
     715ON venue_booking(wedding_id);
     716
     717CREATE INDEX idx_photographer_booking_wedding
     718ON photographer_booking(wedding_id);
     719
     720CREATE INDEX idx_band_booking_wedding
     721ON band_booking(wedding_id);
     722}}}
     723
     724These indexes improve:
     725* JOIN performance
     726* aggregation preparation
     727* filtering efficiency
     728* report scalability
     729
     730== 1.6.3 Expected Optimization Improvements ==
     731
     732After applying the recommended indexes, PostgreSQL is expected to:
     733* replace Sequential Scans with Index Scans
     734* reduce JOIN execution cost
     735* reduce aggregation preparation time
     736* reduce overall execution latency
     737
     738Expected improvements include:
     739* faster analytical report generation
     740* lower memory consumption
     741* lower disk I/O
     742* better scalability for larger wedding datasets
     743
     744== 1.6.4 Scalability Considerations ==
     745
     746As the database grows, the analytical queries from Phase 6 become increasingly dependent on:
     747* index quality
     748* efficient JOIN ordering
     749* aggregation optimization
     750* table statistics maintenance
     751
     752The largest future scalability risks are:
     753* very large attendance datasets
     754* large RSVP histories
     755* repeated analytical aggregation over historical weddings
     756
     757To maintain acceptable performance in large-scale deployments, the following strategies are recommended:
     758* materialized views for analytical reports
     759* periodic archiving of historical weddings
     760* automatic VACUUM and ANALYZE maintenance
     761* partitioning of large attendance and RSVP tables
     762
     763== 1.6.5 Final Interpretation ==
     764
     765The analysis confirms that the Phase 6 analytical reports are computationally more expensive than standard transactional operations because they combine multiple relations and perform aggregation-intensive calculations.
     766
     767However, with proper indexing and optimization strategies, PostgreSQL can efficiently execute these analytical reports while maintaining acceptable scalability and execution performance.
     768
     769== 2. Indexing Strategy & Optimization Framework ==
     770
     771A well-designed indexing strategy is essential for maintaining predictable performance
     772under high data volumes and concurrent user load.
     773
     774=== 2.1 Current Index Landscape ===
     775
     776The Phase 6 analytical reports rely heavily on foreign-key relationships and aggregation over attendance, RSVP, and booking data.
     777
     778A schema analysis identified the following important optimization requirements:
     779* indexing of foreign-key columns
     780* indexing of frequently grouped relations
     781* optimization of JOIN paths
     782* optimization of attendance and RSVP aggregation
     783
     784The most performance-sensitive tables are:
     785* attendance
     786* event_rsvp
     787* venue_booking
     788* photographer_booking
     789* band_booking
     790* guest
     791* event
     792
     793=== 2.2 Index Selectivity & Cardinality Analysis ===
     794
     795|| Column || Cardinality || Index Recommendation ||
     796|| wedding_id || High || YES ||
     797|| event_id || High || YES ||
     798|| guest_id || High || YES ||
     799|| venue_id || High || YES ||
     800|| photographer_id || High || YES ||
     801|| band_id || High || YES ||
     802|| status || Medium || YES (combined indexes) ||
     803|| event_type || Medium || Optional ||
     804|| date || High || YES ||
     805
     806Columns with high cardinality provide the best index selectivity and improve JOIN performance significantly.
     807
     808=== 2.3 Recommended Indexes for Phase 6 Reports ===
     809
     810==== 2.3.1 Foreign-Key Optimization Indexes ====
     811
     812{{{
     813#!sql
     814CREATE INDEX idx_guest_wedding
     815ON guest(wedding_id);
     816
     817CREATE INDEX idx_event_wedding
     818ON event(wedding_id);
     819
     820CREATE INDEX idx_venue_booking_wedding
     821ON venue_booking(wedding_id);
     822
     823CREATE INDEX idx_venue_booking_venue
     824ON venue_booking(venue_id);
     825
     826CREATE INDEX idx_photographer_booking_wedding
     827ON photographer_booking(wedding_id);
     828
     829CREATE INDEX idx_photographer_booking_photographer
     830ON photographer_booking(photographer_id);
     831
     832CREATE INDEX idx_band_booking_wedding
     833ON band_booking(wedding_id);
     834
     835CREATE INDEX idx_band_booking_band
     836ON band_booking(band_id);
     837
     838CREATE INDEX idx_attendance_event
     839ON attendance(event_id);
     840
     841CREATE INDEX idx_attendance_guest
     842ON attendance(guest_id);
     843
     844CREATE INDEX idx_event_rsvp_guest
     845ON event_rsvp(guest_id);
     846
     847CREATE INDEX idx_event_rsvp_event
     848ON event_rsvp(event_id);
     849}}}
     850
     851==== Explanation ====
     852
     853These indexes optimize:
     854* JOIN operations
     855* attendance aggregation
     856* RSVP lookup
     857* booking analysis
     858* analytical report generation
     859
     860The indexes reduce:
     861* sequential scans
     862* join latency
     863* aggregation preparation cost
     864
     865=== 2.3.2 Composite Analytical Indexes ====
     866
     867{{{
     868#!sql
     869CREATE INDEX idx_attendance_event_status
     870ON attendance(event_id, status);
     871
     872CREATE INDEX idx_event_rsvp_guest_status
     873ON event_rsvp(guest_id, status);
     874
     875CREATE INDEX idx_event_wedding_date
     876ON event(wedding_id, date);
     877
     878CREATE INDEX idx_venue_booking_date
     879ON venue_booking(wedding_id, date);
     880}}}
     881
     882==== Explanation ====
     883
     884Composite indexes improve analytical filtering because the Phase 6 reports frequently:
     885* filter by status
     886* group by wedding/event
     887* analyze attendance by event
     888* analyze RSVP responses by status
     889
     890These indexes significantly improve:
     891* conditional aggregation
     892* GROUP BY preparation
     893* attendance filtering
     894* RSVP reporting
     895
     896=== 2.4 EXPLAIN ANALYZE Validation ===
     897
     898The following queries can be used to validate index utilization:
     899
     900{{{
     901#!sql
     902EXPLAIN ANALYZE
     903SELECT *
     904FROM attendance
     905WHERE event_id = 1;
     906
     907EXPLAIN ANALYZE
     908SELECT *
     909FROM event_rsvp
     910WHERE guest_id = 1;
     911
     912EXPLAIN ANALYZE
     913SELECT *
     914FROM venue_booking
     915WHERE wedding_id = 1;
     916}}}
     917
     918Expected PostgreSQL output:
     919
     920{{{
     921Index Scan using idx_attendance_event on attendance
     922
     923Index Scan using idx_event_rsvp_guest on event_rsvp
     924
     925Index Scan using idx_venue_booking_wedding on venue_booking
     926}}}
     927
     928=== 2.5 Optimization Benefits ===
     929
     930The proposed indexing strategy improves:
     931* JOIN performance
     932* aggregation efficiency
     933* analytical reporting speed
     934* scalability of Phase 6 queries
     935
     936The indexing strategy reduces:
     937* full table scans
     938* unnecessary disk I/O
     939* aggregation overhead
     940* execution latency
     941
     942=== 2.6 Maintenance Recommendations ===
     943
     944To maintain stable analytical performance, the following maintenance procedures are recommended:
     945* periodic VACUUM execution
     946* regular ANALYZE statistics updates
     947* monitoring unused indexes
     948* reindexing fragmented indexes
     949* monitoring query execution plans with EXPLAIN ANALYZE
     950
     951These maintenance operations help PostgreSQL preserve optimal execution plans for the analytical reports implemented in Phase 6.
     952
     953== 3. Caching, Partitioning & Storage Optimization ==
     954
     955=== 3.1 Caching & Analytical Optimization ===
     956
     957The Phase 6 analytical reports execute complex aggregation queries across attendance, RSVP, booking, and event relations.
     958
     959As the database grows, repeatedly calculating these analytical metrics may increase execution time and server load.
     960
     961To improve scalability, the following optimization strategies are recommended:
     962* materialized analytical reports
     963* cached aggregation results
     964* periodic statistics refresh
     965* optimization of historical analytical workloads
     966
     967=== 3.2 Materialized Views ===
     968
     969Materialized views can precompute expensive analytical calculations and significantly reduce report execution time.
     970
     971The following analytical reports are good candidates for materialization:
     972* Budget Analysis
     973* Venue Capacity Utilization
     974* RSVP Conversion Analysis
     975
     976==== Example: RSVP Conversion Materialized View ====
     977
     978{{{
     979#!sql
     980CREATE MATERIALIZED VIEW mv_rsvp_conversion AS
     981
     982SELECT
     983    w.wedding_id,
     984
     985    e.event_id,
     986
     987    COUNT(DISTINCT g.guest_id)
     988        AS total_invitations,
     989
     990    COUNT(DISTINCT r.response_id)
     991        AS rsvp_responses,
     992
     993    COUNT(DISTINCT CASE
     994        WHEN r.status = 'CONFIRMED'
     995        THEN r.response_id
     996    END) AS confirmed_rsvps,
     997
     998    COUNT(DISTINCT CASE
     999        WHEN a.status = 'ATTENDED'
     1000        THEN a.attendance_id
     1001    END) AS actual_attendees
     1002
     1003FROM wedding w
     1004
     1005INNER JOIN event e
     1006    ON w.wedding_id = e.wedding_id
     1007
     1008LEFT JOIN guest g
     1009    ON w.wedding_id = g.wedding_id
     1010
     1011LEFT JOIN event_rsvp r
     1012    ON g.guest_id = r.guest_id
     1013    AND e.event_id = r.event_id
     1014
     1015LEFT JOIN attendance a
     1016    ON g.guest_id = a.guest_id
     1017    AND e.event_id = a.event_id
     1018
     1019GROUP BY
     1020    w.wedding_id,
     1021    e.event_id;
     1022}}}
     1023
     1024==== Benefits ====
     1025
     1026Materialized views improve:
     1027* analytical query speed
     1028* dashboard loading
     1029* repeated reporting performance
     1030* scalability for historical analytics
     1031
     1032The materialized view stores precomputed aggregation results and avoids recalculating complex JOIN operations during every report execution.
     1033
     1034==== Refreshing the Materialized View ====
     1035
     1036{{{
     1037#!sql
     1038REFRESH MATERIALIZED VIEW mv_rsvp_conversion;
     1039}}}
     1040
     1041=== 3.3 Partitioning Considerations ===
     1042
     1043The largest future analytical tables are expected to be:
     1044* attendance
     1045* event_rsvp
     1046* guest
     1047
     1048As historical wedding data grows, partitioning may improve scalability.
     1049
     1050Recommended partitioning strategy:
     1051* partition attendance by event date
     1052* partition RSVP records by wedding date
     1053* archive historical weddings separately
     1054
     1055==== Example: Attendance Partitioning ====
     1056
     1057The following example demonstrates a conceptual partitioned version of the attendance table for large-scale deployments.
     1058
     1059{{{
     1060#!sql
     1061CREATE TABLE attendance (
     1062
     1063    attendance_id SERIAL,
     1064    status VARCHAR(30),
     1065    table_number INTEGER,
     1066    role VARCHAR(50),
     1067    guest_id INTEGER,
     1068    event_id INTEGER,
     1069    attendance_date DATE,
     1070
     1071    PRIMARY KEY(attendance_id, attendance_date)
     1072
     1073) PARTITION BY RANGE (attendance_date);
     1074}}}
     1075
     1076==== Benefits ====
     1077
     1078Partitioning improves:
     1079* analytical query performance
     1080* historical data management
     1081* maintenance operations
     1082* VACUUM efficiency
     1083* scalability of large attendance datasets
     1084
     1085=== 3.4 Storage Optimization ===
     1086
     1087To improve long-term database performance, the following storage optimizations are recommended:
     1088* periodic VACUUM execution
     1089* ANALYZE statistics updates
     1090* archival of historical wedding records
     1091* separation of analytical and transactional workloads
     1092
     1093These optimizations reduce:
     1094* table fragmentation
     1095* outdated planner statistics
     1096* unnecessary sequential scans
     1097* analytical execution overhead
     1098
     1099=== 3.5 Scalability Interpretation ===
     1100
     1101The analytical queries from Phase 6 are aggregation-heavy and become increasingly expensive as attendance and RSVP data grows.
     1102
     1103Caching, materialized views, and partitioning help PostgreSQL maintain predictable execution performance even when processing large analytical datasets.
     1104
     1105These strategies are especially important for:
     1106* large weddings
     1107* long-term historical reporting
     1108* repeated dashboard analytics
     1109* concurrent report execution
     1110
     1111== 4. Concurrency, Transaction Management & Locking Strategy ==
     1112
     1113=== 4.1 Transaction Isolation Levels ===
     1114
     1115The Wedding Planner Management System includes several operations that require transactional consistency and protection from concurrent modification.
     1116
     1117Examples include:
     1118* venue booking
     1119* attendance updates
     1120* RSVP processing
     1121* wedding scheduling
     1122* event creation
     1123
     1124The following PostgreSQL isolation levels are recommended:
     1125
     1126|| Isolation Level || Recommended Usage ||
     1127|| READ COMMITTED || Standard CRUD operations ||
     1128|| REPEATABLE READ || Venue booking validation ||
     1129|| SERIALIZABLE || Critical scheduling operations ||
     1130
     1131=== 4.2 Concurrency Risks ===
     1132
     1133The most important concurrency risks identified are:
     1134* double-booking of venues
     1135* simultaneous RSVP modifications
     1136* concurrent attendance updates
     1137* overlapping event scheduling
     1138
     1139Without proper transaction management, multiple users may modify the same wedding-related records simultaneously.
     1140
     1141=== 4.3 Deadlock Prevention Strategies ===
     1142
     1143The following practices reduce deadlock risk:
     1144* consistent transaction ordering
     1145* short transaction duration
     1146* indexing foreign-key columns
     1147* avoiding unnecessary table locking
     1148
     1149The most sensitive operations are:
     1150* venue reservation
     1151* event scheduling
     1152* attendance confirmation
     1153
     1154=== 4.4 Safe Venue Reservation Transaction ===
     1155
     1156The following transaction prevents double-booking of venues.
     1157
     1158{{{
     1159#!sql
     1160BEGIN;
     1161
     1162SELECT venue_id
     1163FROM venue
     1164WHERE venue_id = 1
     1165FOR UPDATE;
     1166
     1167INSERT INTO venue_booking (
     1168    date,
     1169    start_time,
     1170    end_time,
     1171    status,
     1172    price,
     1173    venue_id,
     1174    wedding_id
     1175)
     1176SELECT
     1177    '2025-08-20',
     1178    '18:00:00',
     1179    '23:00:00',
     1180    'CONFIRMED',
     1181    5000.00,
     1182    1,
     1183    2
     1184
     1185WHERE NOT EXISTS (
     1186
     1187    SELECT 1
     1188
     1189    FROM venue_booking
     1190
     1191    WHERE venue_id = 1
     1192      AND date = '2025-08-20'
     1193      AND status != 'CANCELLED'
     1194);
     1195
     1196COMMIT;
     1197}}}
     1198
     1199=== Explanation ===
     1200
     1201`FOR UPDATE` locks the selected venue row during the transaction.
     1202
     1203This prevents concurrent transactions from simultaneously booking the same venue for overlapping dates.
     1204
     1205The `NOT EXISTS` condition ensures that no conflicting booking already exists.
     1206
     1207=== Validation ===
     1208
     1209{{{
     1210#!sql
     1211SELECT
     1212    venue_id,
     1213    date,
     1214    COUNT(*)
     1215
     1216FROM venue_booking
     1217
     1218WHERE status != 'CANCELLED'
     1219
     1220GROUP BY
     1221    venue_id,
     1222    date
     1223
     1224HAVING COUNT(*) > 1;
     1225}}}
     1226
     1227Expected result:
     1228* empty result set
     1229
     1230Any returned rows indicate conflicting venue bookings.
     1231
     1232=== 4.5 Row-Level Locking ===
     1233
     1234The system primarily relies on:
     1235* row-level locks
     1236* transaction isolation
     1237* foreign-key consistency
     1238
     1239Row-level locking is preferred because it:
     1240* minimizes blocking
     1241* improves concurrency
     1242* prevents unnecessary table-wide locks
     1243
     1244=== 4.6 Transaction Scalability ===
     1245
     1246As the number of weddings and concurrent users increases, transaction management becomes increasingly important.
     1247
     1248The following practices improve scalability:
     1249* keeping transactions short
     1250* indexing transactional lookup columns
     1251* separating analytical reports from transactional operations
     1252* avoiding long-running locks
     1253
     1254=== 4.7 Final Interpretation ===
     1255
     1256The Wedding Planner Management System contains several scheduling and booking operations that require transactional consistency.
     1257
     1258PostgreSQL transaction isolation and row-level locking mechanisms help prevent:
     1259* double-booking
     1260* inconsistent RSVP updates
     1261* concurrent attendance conflicts
     1262* invalid scheduling states
     1263
     1264Proper transaction management ensures both:
     1265* data integrity
     1266* stable concurrent system behavior
     1267
     1268== 5. Performance Analysis with EXPLAIN / EXPLAIN ANALYZE ==
     1269
     1270=== 5.1 Purpose of EXPLAIN and EXPLAIN ANALYZE ===
     1271
     1272PostgreSQL provides several commands for analyzing query execution behavior.
     1273
     1274|| Command || Purpose ||
     1275|| EXPLAIN || Shows the planned execution strategy without running the query ||
     1276|| EXPLAIN ANALYZE || Executes the query and shows real execution statistics ||
     1277|| EXPLAIN (ANALYZE, BUFFERS) || Shows execution statistics plus buffer/cache usage ||
     1278
     1279In this phase, `EXPLAIN ANALYZE` is used to evaluate the complex analytical queries from Phase 6.
     1280
     1281It helps identify:
     1282* scan types
     1283* join strategies
     1284* aggregation methods
     1285* actual execution time
     1286* row counts
     1287* loops
     1288* possible bottlenecks
     1289
     1290=== 5.2 Important Execution Plan Elements ===
     1291
     1292|| Plan Element || Meaning ||
     1293|| Seq Scan || PostgreSQL scans the entire table ||
     1294|| Index Scan || PostgreSQL uses an index to locate rows faster ||
     1295|| Hash Join || PostgreSQL builds a hash table to join larger datasets ||
     1296|| Nested Loop || PostgreSQL repeatedly scans one relation for each row of another relation ||
     1297|| HashAggregate || PostgreSQL performs grouping and aggregation using a hash table ||
     1298|| Sort || PostgreSQL sorts rows for ORDER BY or aggregation operations ||
     1299
     1300For Phase 6 reports, the most common expected elements are:
     1301* Hash Join
     1302* HashAggregate
     1303* Index Scan
     1304* Seq Scan on small tables
     1305* Sort
     1306
     1307=== 5.3 Sequential Scan vs Index Scan ===
     1308
     1309A Sequential Scan is not always a problem.
     1310
     1311PostgreSQL may choose a Sequential Scan when:
     1312* the table is small
     1313* most rows are needed
     1314* the cost of using an index is higher than scanning the table
     1315
     1316However, for large Phase 6 analytical tables such as:
     1317* attendance
     1318* event_rsvp
     1319* guest
     1320* booking tables
     1321
     1322Index Scans are preferred when filtering or joining by foreign-key columns.
     1323
     1324=== 5.4 Example: Attendance Lookup Before Optimization ===
     1325
     1326{{{
     1327#!sql
     1328EXPLAIN ANALYZE
     1329SELECT *
     1330FROM attendance
     1331WHERE event_id = 1;
     1332}}}
     1333
     1334Without an index on `attendance(event_id)`, PostgreSQL may use:
     1335
     1336{{{
     1337Seq Scan on attendance
     1338  Filter: (event_id = 1)
     1339}}}
     1340
     1341This means that all attendance rows are scanned before matching rows are returned.
     1342
     1343For large attendance tables, this increases:
     1344* disk I/O
     1345* CPU usage
     1346* query latency
     1347
     1348=== 5.5 Example: Attendance Lookup After Optimization ===
     1349
     1350{{{
     1351#!sql
     1352CREATE INDEX idx_attendance_event
     1353ON attendance(event_id);
     1354
     1355EXPLAIN ANALYZE
     1356SELECT *
     1357FROM attendance
     1358WHERE event_id = 1;
     1359}}}
     1360
     1361Expected result:
     1362
     1363{{{
     1364Index Scan using idx_attendance_event on attendance
     1365  Index Cond: (event_id = 1)
     1366}}}
     1367
     1368This confirms that PostgreSQL can directly locate attendance records for a specific event.
     1369
     1370=== 5.6 EXPLAIN ANALYZE for Phase 6 Reports ===
     1371
     1372The Phase 6 reports should be analyzed using EXPLAIN ANALYZE because they include:
     1373* joins across multiple tables
     1374* aggregation
     1375* distinct counting
     1376* conditional calculations
     1377
     1378The analysis should be performed on:
     1379* Budget Analysis query
     1380* Venue Capacity query
     1381* RSVP Conversion query
     1382
     1383These queries represent the real analytical workload of the Wedding Planner Management System.
     1384
     1385=== 5.7 Interpreting Execution Time ===
     1386
     1387When reading EXPLAIN ANALYZE output, the most important values are:
     1388* Planning Time
     1389* Execution Time
     1390* actual rows
     1391* loops
     1392* scan type
     1393* join type
     1394
     1395High execution time may indicate:
     1396* missing indexes
     1397* inefficient JOIN order
     1398* expensive aggregation
     1399* large intermediate result sets
     1400* outdated PostgreSQL statistics
     1401
     1402=== 5.8 Validation of Index Usage ===
     1403
     1404After creating indexes, index usage can be checked using:
     1405
     1406{{{
     1407#!sql
     1408SELECT
     1409    indexrelname AS index_name,
     1410    idx_scan AS times_used,
     1411    idx_tup_read AS tuples_read,
     1412    idx_tup_fetch AS tuples_fetched
     1413
     1414FROM pg_stat_user_indexes
     1415
     1416WHERE indexrelname IN (
     1417    'idx_attendance_event',
     1418    'idx_guest_wedding',
     1419    'idx_event_rsvp_guest',
     1420    'idx_venue_booking_wedding'
     1421);
     1422}}}
     1423
     1424If `idx_scan` increases after report execution, the index is being used.
     1425
     1426If `idx_scan` remains 0, the index may be unused or the query planner may prefer another execution strategy.
     1427
     1428=== 5.9 Summary ===
     1429
     1430EXPLAIN ANALYZE is essential for validating the performance of the Phase 6 analytical reports.
     1431
     1432It helps confirm whether:
     1433* indexes are used correctly
     1434* joins are efficient
     1435* aggregation is acceptable
     1436* execution time is reasonable
     1437
     1438This makes EXPLAIN ANALYZE an important part of database performance tuning and report optimization.
    8081439
    8091440== 6. Security Architecture & Data Protection ==
     
    8111442=== 6.1 Authentication & Authorization ===
    8121443
    813  * Role-based access control (RBAC) with clearly defined privilege separation
    814  * Separate roles for: admin, operations, reporting, and external API access
    815  * Multi-factor authentication (MFA) required for all admin-level operations
    816 
    817 ==== Role Definitions ====
    818 
    819 || '''Role'''   || '''Permissions'''                                      || '''Restrictions'''                                             || '''MFA Required''' ||
    820 || Guest        || Read own RSVP and profile data                        || No access to other guests, budgets, or admin functions         || No                 ||
    821 || Planner      || Full CRUD on own wedding data, limited report access  || Cannot modify other planners' data or system configuration     || Recommended        ||
    822 || Vendor       || Read contracts and proposals; message planners        || Cannot access guest lists, budgets, or modify records          || Yes                ||
    823 || Admin        || Full system access, user management, audit logs       || All actions monitored; access restricted by IP allowlist       || Yes (Required)     ||
    824 
    825 ==== Sample: PostgreSQL RBAC Implementation ====
    826 
    827 {{{
    828 #!sql
    829 -- Create a read-only reporting role
    830 CREATE ROLE reporting_readonly;
    831 
    832 GRANT CONNECT ON DATABASE wedding_planner TO reporting_readonly;
    833 GRANT USAGE   ON SCHEMA public             TO reporting_readonly;
    834 GRANT SELECT  ON ALL TABLES IN SCHEMA public TO reporting_readonly;
    835 
    836 -- Create a planner role with limited write access
    837 CREATE ROLE planner_role;
    838 
    839 GRANT SELECT, INSERT, UPDATE ON wedding, guest, event, event_rsvp
    840     TO planner_role;
    841 
    842 REVOKE DELETE ON wedding FROM planner_role;
    843 }}}
    844 
    845 ==== Row-Level Security (RLS) ====
    846 
    847 {{{
    848 #!sql
    849 -- Enable row-level security on the guest table
    850 ALTER TABLE guest ENABLE ROW LEVEL SECURITY;
    851 
    852 -- Planners can only access guests belonging to their own weddings
    853 CREATE POLICY guest_isolation ON guest
     1444The Wedding Planner Management System stores:
     1445* wedding schedules
     1446* guest information
     1447* RSVP records
     1448* attendance data
     1449* booking information
     1450
     1451Because the system contains sensitive personal and operational data, controlled database access is required.
     1452
     1453The recommended approach is Role-Based Access Control (RBAC).
     1454
     1455=== Recommended Roles ===
     1456
     1457|| Role || Permissions || Restrictions ||
     1458|| Guest User || Read personal RSVP information || Cannot modify wedding data ||
     1459|| Wedding Organizer || CRUD operations for own weddings and events || Cannot access unrelated weddings ||
     1460|| Administrator || Full database access || Restricted to authorized personnel only ||
     1461
     1462=== PostgreSQL Role Implementation ===
     1463
     1464{{{
     1465#!sql
     1466CREATE ROLE wedding_guest;
     1467
     1468GRANT CONNECT ON DATABASE wedding_planner
     1469TO wedding_guest;
     1470
     1471GRANT SELECT ON guest, event_rsvp
     1472TO wedding_guest;
     1473
     1474CREATE ROLE wedding_organizer;
     1475
     1476GRANT SELECT, INSERT, UPDATE
     1477ON wedding, event, guest, event_rsvp, attendance, venue_booking
     1478TO wedding_organizer;
     1479
     1480CREATE ROLE wedding_admin;
     1481
     1482GRANT ALL PRIVILEGES
     1483ON ALL TABLES IN SCHEMA public
     1484TO wedding_admin;
     1485}}}
     1486
     1487=== Row-Level Security ===
     1488
     1489Row-Level Security (RLS) can restrict organizers to accessing only their own weddings.
     1490
     1491{{{
     1492#!sql
     1493ALTER TABLE wedding
     1494ENABLE ROW LEVEL SECURITY;
     1495
     1496CREATE POLICY wedding_access_policy
     1497ON wedding
     1498
    8541499USING (
    855     wedding_id IN (
    856         SELECT wedding_id
    857         FROM wedding
    858         WHERE planner_id = current_setting('app.current_user_id')::INTEGER
     1500    user_id =
     1501    current_setting('app.current_user_id')::INTEGER
     1502);
     1503}}}
     1504
     1505=== Explanation ===
     1506
     1507This policy ensures that each organizer can only access weddings linked to their own user account.
     1508
     1509This prevents unauthorized access to:
     1510* guest lists
     1511* attendance records
     1512* RSVP information
     1513* booking information
     1514
     1515=== Validation ===
     1516
     1517{{{
     1518#!sql
     1519SELECT *
     1520FROM wedding;
     1521}}}
     1522
     1523Expected behavior:
     1524* users only see weddings associated with their own user_id
     1525
     1526=== Security Benefits ===
     1527
     1528The proposed RBAC and RLS configuration improves:
     1529* access control
     1530* data isolation
     1531* organizer privacy
     1532* protection against unauthorized data access
     1533
     1534These mechanisms are especially important in multi-user environments where several organizers use the system simultaneously.
     1535
     1536=== 6.2 Encryption Strategy ===
     1537
     1538==== Encryption in Transit ====
     1539
     1540All communication between the application and PostgreSQL database should use encrypted connections.
     1541
     1542Recommended protections:
     1543* TLS-secured database connections
     1544* encrypted API communication
     1545* secure administrator access
     1546
     1547These protections prevent interception of:
     1548* guest information
     1549* RSVP records
     1550* attendance data
     1551* wedding schedules
     1552
     1553==== Encryption at Rest ====
     1554
     1555Sensitive information stored inside the database should be protected using encryption mechanisms.
     1556
     1557Sensitive fields include:
     1558* phone numbers
     1559* email addresses
     1560* guest notes
     1561* RSVP comments
     1562
     1563PostgreSQL extensions such as `pgcrypto` can be used for column-level encryption.
     1564
     1565==== Example: pgcrypto Encryption ====
     1566
     1567{{{
     1568#!sql
     1569INSERT INTO guest (
     1570    first_name,
     1571    last_name,
     1572    email
     1573)
     1574VALUES (
     1575    'Ivan',
     1576    'Petrov',
     1577    pgp_sym_encrypt(
     1578        'ivan@email.com',
     1579        'wedding_secret_key'
    8591580    )
    8601581);
    8611582}}}
    8621583
    863 ==== Verification ====
    864 
    865 {{{
    866 #!sql
    867 -- Test that the policy correctly isolates data between planners
    868 SET app.current_user_id = '1001';
    869 
    870 -- Should return only guests from wedding(s) owned by planner 1001
    871 SELECT guest_id, full_name, wedding_id
     1584==== Example: Decryption ====
     1585
     1586{{{
     1587#!sql
     1588SELECT
     1589    first_name,
     1590    last_name,
     1591
     1592    pgp_sym_decrypt(
     1593        email::bytea,
     1594        'wedding_secret_key'
     1595    ) AS decrypted_email
     1596
     1597FROM guest;
     1598}}}
     1599
     1600==== Implementation Note ====
     1601
     1602The encryption example demonstrates the conceptual usage of PostgreSQL `pgcrypto`.
     1603
     1604In a production implementation, encrypted values should be stored in dedicated encrypted columns (for example `email_encrypted BYTEA`) instead of replacing the original plaintext column directly.
     1605
     1606This approach improves schema consistency and avoids datatype conflicts during encryption and decryption operations.
     1607
     1608==== Validation ====
     1609
     1610{{{
     1611#!sql
     1612SELECT email
     1613FROM guest;
     1614}}}
     1615
     1616Expected result:
     1617* encrypted binary data instead of readable plaintext
     1618
     1619=== Security Benefits ===
     1620
     1621Encryption improves:
     1622* protection of sensitive guest data
     1623* privacy of wedding participants
     1624* protection against unauthorized database access
     1625* compliance with modern data-protection practices
     1626
     1627=== 6.3 Data Masking & Anonymization ===
     1628
     1629The Wedding Planner Management System stores sensitive personal information related to wedding guests and organizers.
     1630
     1631For analytical and testing environments, sensitive information should be masked or anonymized.
     1632
     1633Sensitive information includes:
     1634* guest names
     1635* phone numbers
     1636* email addresses
     1637* RSVP comments
     1638
     1639=== Example: Masked Reporting View ===
     1640
     1641{{{
     1642#!sql
     1643CREATE VIEW v_guest_reporting AS
     1644
     1645SELECT
     1646    guest_id,
     1647
     1648    CONCAT(
     1649        LEFT(first_name, 1),
     1650        REPEAT('*', LENGTH(first_name) - 1)
     1651    ) AS masked_first_name,
     1652
     1653    CONCAT(
     1654        LEFT(last_name, 1),
     1655        REPEAT('*', LENGTH(last_name) - 1)
     1656    ) AS masked_last_name,
     1657
     1658    CONCAT(
     1659        REPEAT('*', 5),
     1660        RIGHT(email, POSITION('@' IN email))
     1661    ) AS masked_email,
     1662
     1663    wedding_id
     1664
     1665FROM guest;
     1666}}}
     1667
     1668=== Explanation ===
     1669
     1670The view masks personally identifiable information while still allowing analytical reporting.
     1671
     1672This approach is useful for:
     1673* testing environments
     1674* reporting dashboards
     1675* analytical exports
     1676* demonstration systems
     1677
     1678=== Validation ===
     1679
     1680{{{
     1681#!sql
     1682SELECT *
     1683FROM v_guest_reporting;
     1684}}}
     1685
     1686Expected result:
     1687* masked guest names
     1688* masked email addresses
     1689* preserved wedding relationships
     1690
     1691=== Security Benefits ===
     1692
     1693Data masking improves:
     1694* privacy protection
     1695* safer analytical reporting
     1696* reduced exposure of sensitive data
     1697* compliance with data-protection principles
     1698
     1699=== 6.4 SQL Injection Prevention ===
     1700
     1701The Wedding Planner Management System accepts user-generated input through:
     1702* RSVP forms
     1703* guest registration
     1704* wedding creation
     1705* event scheduling
     1706* attendance management
     1707
     1708If SQL queries are constructed incorrectly, malicious input may compromise database security.
     1709
     1710=== Unsafe Query Example ===
     1711
     1712The following example is vulnerable to SQL injection:
     1713
     1714{{{
     1715#!sql
     1716query =
     1717"SELECT * FROM guest WHERE first_name = '" + user_input + "'"
     1718}}}
     1719
     1720This approach allows attackers to manipulate SQL syntax through malicious input.
     1721
     1722=== Secure Parameterized Query ===
     1723
     1724The recommended approach is parameterized execution.
     1725
     1726{{{
     1727#!sql
     1728PREPARE guest_lookup(TEXT) AS
     1729
     1730SELECT
     1731    guest_id,
     1732    first_name,
     1733    last_name,
     1734    wedding_id
     1735
    8721736FROM guest
    873 LIMIT 10;
    874 }}}
    875 
    876 ==== Validation ====
    877 
    878 {{{
    879 #!sql
    880 -- Confirm RLS policies are active on all sensitive tables
     1737
     1738WHERE first_name = $1;
     1739}}}
     1740
     1741=== Example Execution ===
     1742
     1743{{{
     1744#!sql
     1745EXECUTE guest_lookup('Ivan');
     1746}}}
     1747
     1748=== Explanation ===
     1749
     1750Parameterized queries separate:
     1751* SQL structure
     1752* user-provided input
     1753
     1754This prevents user input from being interpreted as executable SQL code.
     1755
     1756=== Security Benefits ===
     1757
     1758Parameterized queries protect against:
     1759* unauthorized data access
     1760* SQL injection attacks
     1761* manipulation of RSVP data
     1762* unauthorized attendance modification
     1763* wedding data corruption
     1764
     1765=== Recommended Practices ===
     1766
     1767The following practices are recommended throughout the system:
     1768* parameterized queries
     1769* input validation
     1770* restricted database permissions
     1771* transaction isolation
     1772* prepared statements
     1773
     1774These protections significantly improve overall database security.
     1775
     1776=== 6.5 GDPR Compliance ===
     1777
     1778The Wedding Planner Management System stores personal information related to:
     1779* wedding organizers
     1780* guests
     1781* RSVP records
     1782* attendance information
     1783
     1784Because the system processes personal data, several GDPR-related principles should be respected.
     1785
     1786=== Data Minimization ===
     1787
     1788Only information necessary for wedding organization should be stored.
     1789
     1790Examples:
     1791* guest names
     1792* RSVP responses
     1793* attendance status
     1794* organizer contact information
     1795
     1796Unnecessary personal information should not be collected.
     1797
     1798=== Access Control ===
     1799
     1800Only authorized users should access:
     1801* guest information
     1802* RSVP records
     1803* attendance reports
     1804* wedding schedules
     1805
     1806Role-Based Access Control and Row-Level Security help enforce this restriction.
     1807
     1808=== Right to Access ===
     1809
     1810Users should be able to:
     1811* view their stored information
     1812* verify RSVP information
     1813* review attendance-related records
     1814
     1815=== Right to Deletion ===
     1816
     1817When requested, personal information should be removable from the database.
     1818
     1819Example deletion operation:
     1820
     1821{{{
     1822#!sql
     1823DELETE FROM guest
     1824WHERE guest_id = 10;
     1825}}}
     1826
     1827=== Backup Protection ===
     1828
     1829Backups containing personal information should:
     1830* remain encrypted
     1831* be access-controlled
     1832* be stored securely
     1833
     1834=== GDPR Benefits ===
     1835
     1836Applying GDPR principles improves:
     1837* privacy protection
     1838* organizer trust
     1839* legal compliance
     1840* secure handling of wedding-related data
     1841
     1842=== 6.6 Backup, Restore & Disaster Recovery ===
     1843
     1844The Wedding Planner Management System stores important operational and personal data.
     1845
     1846Database backups are necessary to protect:
     1847* wedding schedules
     1848* RSVP records
     1849* attendance information
     1850* booking data
     1851* organizer information
     1852
     1853=== Recommended Backup Strategy ===
     1854
     1855|| Backup Type || Frequency ||
     1856|| Full Backup || Daily ||
     1857|| Incremental Backup || Every few hours ||
     1858|| WAL Archiving || Continuous ||
     1859
     1860=== Backup Validation ===
     1861
     1862After backup restoration, database consistency should be verified.
     1863
     1864{{{
     1865#!sql
    8811866SELECT
    882     tablename,
    883     rowsecurity
    884 FROM pg_tables
    885 WHERE schemaname = 'public'
    886   AND tablename IN ('guest', 'wedding', 'payment', 'budget')
    887 ORDER BY tablename;
    888 }}}
    889 
    890 All rows in the result must show `rowsecurity = true`. Any table showing `false`
    891 has an unprotected surface that must be remediated before deployment.
    892 
    893 === 6.2 Encryption Strategy ===
    894 
    895 ==== Encryption in Transit ====
    896 
    897  * Enforce TLS 1.2+ across all communication channels (API, database connections, replication streams)
    898  * Reject plaintext connections at the PostgreSQL `pg_hba.conf` level using `hostssl`
    899 
    900 ==== Encryption at Rest ====
    901 
    902  * Encrypt client PII, payment methods, contracts, and legal documents
    903  * Use dedicated KMS (Key Management Service) for key storage and rotation policies
    904  * Apply column-level encryption using `pgcrypto` for the most sensitive fields
    905 
    906 ==== Sample: Column-Level Encryption with pgcrypto ====
    907 
    908 {{{
    909 #!sql
    910 -- Store an encrypted phone number
    911 INSERT INTO client (name, phone_number_encrypted)
    912 VALUES (
    913     'Ivan Petrov',
    914     pgp_sym_encrypt('+389 70 123 456', current_setting('app.encryption_key'))
     1867    'wedding' AS table_name,
     1868    COUNT(*) AS row_count
     1869FROM wedding
     1870
     1871UNION ALL
     1872
     1873SELECT
     1874    'guest',
     1875    COUNT(*)
     1876FROM guest
     1877
     1878UNION ALL
     1879
     1880SELECT
     1881    'event',
     1882    COUNT(*)
     1883FROM event
     1884
     1885UNION ALL
     1886
     1887SELECT
     1888    'attendance',
     1889    COUNT(*)
     1890FROM attendance;
     1891}}}
     1892
     1893=== Explanation ===
     1894
     1895The validation query confirms that:
     1896* important tables were restored correctly
     1897* row counts are preserved
     1898* critical wedding data still exists
     1899
     1900=== Disaster Recovery Recommendations ===
     1901
     1902The following practices improve recovery reliability:
     1903* encrypted backups
     1904* off-site backup storage
     1905* periodic restore testing
     1906* automatic backup scheduling
     1907
     1908=== Security Benefits ===
     1909
     1910Proper backup management improves:
     1911* data recovery capability
     1912* protection against accidental deletion
     1913* recovery after hardware failure
     1914* long-term database reliability
     1915
     1916=== 6.7 Audit Logging ===
     1917
     1918Audit logging is important for tracking sensitive modifications inside the Wedding Planner Management System.
     1919
     1920The system should record:
     1921* RSVP modifications
     1922* attendance updates
     1923* venue booking changes
     1924* wedding schedule modifications
     1925* guest list updates
     1926
     1927Audit logging improves:
     1928* accountability
     1929* traceability
     1930* security monitoring
     1931* recovery after accidental modification
     1932
     1933=== Example Audit Table ===
     1934
     1935{{{
     1936#!sql
     1937CREATE TABLE audit_log (
     1938
     1939    audit_id SERIAL PRIMARY KEY,
     1940
     1941    table_name VARCHAR(100),
     1942
     1943    operation_type VARCHAR(30),
     1944
     1945    changed_by VARCHAR(100),
     1946
     1947    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
     1948
     1949    affected_record_id INTEGER
    9151950);
    916 
    917 -- Decrypt for authorized retrieval
    918 SELECT
    919     name,
    920     pgp_sym_decrypt(
    921         phone_number_encrypted,
    922         current_setting('app.encryption_key')
    923     ) AS phone_number
    924 FROM client
    925 WHERE client_id = 101;
    926 }}}
    927 
    928 ==== Validation ====
    929 
    930 {{{
    931 #!sql
    932 -- Confirm that the encrypted column cannot be read as plaintext
    933 SELECT name, phone_number_encrypted
    934 FROM client
    935 WHERE client_id = 101;
    936 }}}
    937 
    938 The `phone_number_encrypted` column must display binary/ciphertext only.
    939 Any readable plaintext indicates that the encryption step was skipped during insertion.
    940 
    941 === 6.3 Data Masking & Anonymization ===
    942 
    943  * Mask sensitive fields (names, emails, phone numbers) in all non-production environments
    944  * Tokenize personally identifiable information for use in analytics pipelines
    945  * Apply role-based de-identification so that reporting users never see raw PII
    946 
    947 ==== Sample: Masked View for Reporting ====
    948 
    949 {{{
    950 #!sql
    951 CREATE VIEW v_client_reporting AS
    952 SELECT
    953     client_id,
    954     CONCAT(LEFT(name, 1), REPEAT('*', LENGTH(name) - 1))          AS masked_name,
    955     CONCAT(REPEAT('*', 6), RIGHT(email, POSITION('@' IN email)))   AS masked_email,
    956     DATE_TRUNC('month', created_at)                                AS registration_month
    957 FROM client;
    958 
    959 -- Grant only the reporting role access to this view
    960 GRANT SELECT ON v_client_reporting TO reporting_readonly;
    961 }}}
    962 
    963 === 6.4 SQL Injection Prevention ===
    964 
    965 Never construct queries using string concatenation with user input:
    966 
    967 {{{
    968 #!sql
    969 -- WRONG: critically vulnerable to SQL injection
    970 query = "SELECT * FROM guest WHERE name = '" + user_input + "'"
    971 }}}
    972 
    973 Always use parameterized queries:
    974 
    975 {{{
    976 #!sql
    977 -- CORRECT: safe parameterized query using $1 placeholder
    978 PREPARE guest_lookup (TEXT) AS
    979     SELECT guest_id, full_name, wedding_id
    980     FROM guest
    981     WHERE full_name = $1;
    982 
    983 EXECUTE guest_lookup('Ivan Petrov');
    984 }}}
    985 
    986 === 6.5 GDPR Compliance ===
    987 
    988  * '''Right to Access''' — users can export their complete personal dataset on demand
    989  * '''Right to Erasure''' — secure data deletion with cryptographic key destruction for encrypted fields
    990  * '''Data Minimization''' — only data strictly necessary for operations is collected and stored
    991  * '''Consent Management''' — explicit opt-in required for all non-essential data processing
    992 
    993 === 6.6 Backup, Restore & Disaster Recovery ===
    994 
    995 || '''Backup Type'''          || '''Frequency'''           || '''Retention''' ||
    996 || Full backup                || Daily                     || 30 days         ||
    997 || Incremental backup         || Every 15 minutes          || 7 days          ||
    998 || Point-in-time recovery     || Continuous WAL archiving  || 14 days         ||
    999 || Cross-region replication   || Real-time async           || Permanent       ||
    1000 
    1001  * '''RPO''' (Recovery Point Objective) = 15 minutes
    1002  * '''RTO''' (Recovery Time Objective) = 4 hours
    1003  * Quarterly disaster recovery simulation drills are mandatory
    1004 
    1005 ==== Backup Validation ====
    1006 
    1007 {{{
    1008 #!sql
    1009 -- After a test restore, verify row counts match the source database
    1010 SELECT
    1011     'wedding'    AS table_name, COUNT(*) AS row_count FROM wedding   UNION ALL
    1012 SELECT 'guest',                              COUNT(*) FROM guest      UNION ALL
    1013 SELECT 'payment',                            COUNT(*) FROM payment    UNION ALL
    1014 SELECT 'event',                              COUNT(*) FROM event
    1015 ORDER BY table_name;
    1016 }}}
    1017 
    1018 Compare these counts against the equivalent query run on the source database.
    1019 Any discrepancy indicates an incomplete or corrupted backup that must be
    1020 investigated before the restore can be considered valid.
    1021 
    1022 === 6.7 Audit Logging ===
    1023 
    1024 {{{
    1025 #!sql
    1026 -- Create an audit log table for sensitive data modifications
    1027 CREATE TABLE audit_log (
    1028     log_id        BIGSERIAL    PRIMARY KEY,
    1029     table_name    VARCHAR(100) NOT NULL,
    1030     operation     VARCHAR(10)  NOT NULL,  -- INSERT, UPDATE, DELETE
    1031     record_id     INTEGER      NOT NULL,
    1032     changed_by    VARCHAR(100) NOT NULL,
    1033     changed_at    TIMESTAMP    NOT NULL DEFAULT NOW(),
    1034     old_values    JSONB,
    1035     new_values    JSONB
    1036 );
    1037 
    1038 -- Trigger function to capture all changes to the guest table
    1039 CREATE OR REPLACE FUNCTION fn_audit_guest()
     1951}}}
     1952
     1953=== Example Trigger Function ===
     1954
     1955{{{
     1956#!sql
     1957CREATE OR REPLACE FUNCTION log_guest_changes()
     1958
    10401959RETURNS TRIGGER AS $$
     1960
    10411961BEGIN
     1962
    10421963    INSERT INTO audit_log (
    1043         table_name, operation, record_id,
    1044         changed_by, old_values, new_values
     1964        table_name,
     1965        operation_type,
     1966        changed_by,
     1967        affected_record_id
    10451968    )
     1969
    10461970    VALUES (
    1047         TG_TABLE_NAME,
     1971        'guest',
    10481972        TG_OP,
    1049         COALESCE(NEW.guest_id, OLD.guest_id),
    1050         current_user,
    1051         to_jsonb(OLD),
    1052         to_jsonb(NEW)
     1973        CURRENT_USER,
     1974        NEW.guest_id
    10531975    );
     1976
    10541977    RETURN NEW;
     1978
    10551979END;
     1980
    10561981$$ LANGUAGE plpgsql;
    1057 
    1058 CREATE TRIGGER trg_audit_guest
    1059 AFTER INSERT OR UPDATE OR DELETE ON guest
    1060 FOR EACH ROW EXECUTE FUNCTION fn_audit_guest();
    1061 }}}
    1062 
    1063 ==== Validation ====
    1064 
    1065 {{{
    1066 #!sql
    1067 -- Verify audit records are being created correctly
    1068 UPDATE guest SET dietary_preference = 'Vegan' WHERE guest_id = 55;
    1069 
    1070 SELECT table_name, operation, record_id, changed_by, changed_at, new_values
    1071 FROM audit_log
    1072 WHERE table_name = 'guest'
    1073   AND record_id  = 55
    1074 ORDER BY changed_at DESC
    1075 LIMIT 5;
    1076 }}}
    1077 
    1078 ----
    1079 
    1080 == 7. Monitoring, Observability & Alerting ==
    1081 
    1082 === 7.1 Key Database Metrics to Track ===
    1083 
    1084  * Query latency percentiles: P50, P95, P99
    1085  * Lock wait times and deadlock frequency
    1086  * Buffer cache hit ratio (target: > 99%)
    1087  * Slow query log (threshold: > 500ms)
    1088  * Index utilization ratios per table
    1089  * Connection pool saturation
    1090 
    1091 === 7.2 Useful Monitoring Queries ===
    1092 
    1093 {{{
    1094 #!sql
    1095 -- Identify the slowest queries currently running
    1096 SELECT
    1097     pid,
    1098     now() - pg_stat_activity.query_start AS duration,
    1099     query,
    1100     state
    1101 FROM pg_stat_activity
    1102 WHERE state  = 'active'
    1103   AND now() - pg_stat_activity.query_start > INTERVAL '5 seconds'
    1104 ORDER BY duration DESC;
    1105 
    1106 -- Check buffer cache hit ratio (should be > 99%)
    1107 SELECT
    1108     SUM(heap_blks_hit)  AS cache_hits,
    1109     SUM(heap_blks_read) AS disk_reads,
    1110     ROUND(
    1111         100.0 * SUM(heap_blks_hit)
    1112               / NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2
    1113     ) AS cache_hit_ratio
    1114 FROM pg_statio_user_tables;
    1115 
    1116 -- Identify unused indexes (candidates for removal)
    1117 SELECT
    1118     schemaname,
    1119     tablename,
    1120     indexname,
    1121     pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    1122     idx_scan
    1123 FROM pg_stat_user_indexes
    1124 WHERE idx_scan = 0
    1125 ORDER BY pg_relation_size(indexrelid) DESC;
    1126 }}}
    1127 
    1128 === 7.3 Alerting Rules ===
    1129 
    1130 Trigger alerts when:
    1131 
    1132  * Deadlock frequency exceeds 5 per minute
    1133  * Replication lag exceeds 30 seconds
    1134  * CPU utilization exceeds 80% for more than 2 minutes
    1135  * I/O queue depth exceeds defined threshold
    1136  * Buffer cache hit ratio drops below 95%
    1137 
    1138 ----
    1139 
    1140 
    1141 == 8. Conclusion ==
    1142 
    1143 Phase 9 elevates the Wedding Planner database into a high-performance, secure, and
    1144 scalable enterprise platform. By combining:
    1145 
    1146  * Targeted and well-maintained indexing (single-column, composite, partial, GIN, expression)
    1147  * Optimized JOIN ordering, early predicate filtering, and query rewrites
    1148  * Multi-tier caching and horizontal partitioning for large-scale data
    1149  * Strict, multi-layer security (encryption, RBAC, RLS, audit logging, parameterized queries)
    1150  * Full GDPR-compliant data protection practices
    1151  * Comprehensive performance analysis with `EXPLAIN ANALYZE` and real benchmarking
    1152 
    1153 The system achieves:
    1154 
    1155  * Fast, consistent record retrieval and report generation
    1156  * Secure and compliant handling of sensitive personal and financial data
    1157  * Stable and predictable performance at scale
    1158  * A solid, well-documented foundation for the enterprise-grade phases ahead
     1982}}}
     1983
     1984=== Example Trigger ===
     1985
     1986{{{
     1987#!sql
     1988CREATE TRIGGER trg_guest_audit
     1989
     1990AFTER INSERT OR UPDATE
     1991ON guest
     1992
     1993FOR EACH ROW
     1994
     1995EXECUTE FUNCTION log_guest_changes();
     1996}}}
     1997
     1998=== Validation ===
     1999
     2000{{{
     2001#!sql
     2002SELECT *
     2003FROM audit_log;
     2004}}}
     2005
     2006Expected result:
     2007* logged INSERT and UPDATE operations on guest records
     2008
     2009=== Security Benefits ===
     2010
     2011Audit logging improves:
     2012* monitoring of sensitive changes
     2013* accountability of database operations
     2014* recovery investigation
     2015* detection of suspicious activity
     2016
     2017It is especially important for:
     2018* guest modifications
     2019* RSVP changes
     2020* booking updates
     2021* attendance management
     2022
     2023== 7. Final Conclusions ==
     2024
     2025Phase 9 extends the Wedding Planner Management System with advanced database engineering concepts focused on:
     2026* performance analysis
     2027* query optimization
     2028* scalability
     2029* transaction management
     2030* security
     2031* analytical workload optimization
     2032
     2033Unlike previous phases that focused primarily on schema design and transactional functionality, this phase evaluates how the database behaves under complex analytical workloads generated by the Phase 6 reports.
     2034
     2035The analysis focused on:
     2036* Budget Analysis
     2037* Venue Capacity Utilization
     2038* RSVP Conversion Analysis
     2039
     2040These reports were analyzed using:
     2041* EXPLAIN ANALYZE
     2042* execution-plan interpretation
     2043* indexing strategies
     2044* aggregation analysis
     2045* scalability evaluation
     2046
     2047The identified performance bottlenecks include:
     2048* multiple LEFT JOIN operations
     2049* COUNT(DISTINCT ...) aggregation
     2050* GROUP BY operations
     2051* temporal calculations
     2052* analytical workload complexity
     2053
     2054The proposed indexing strategy improves:
     2055* JOIN efficiency
     2056* aggregation preparation
     2057* analytical reporting speed
     2058* scalability for larger datasets
     2059
     2060The phase also introduced:
     2061* materialized views
     2062* partitioning strategies
     2063* transaction isolation analysis
     2064* row-level locking
     2065* role-based access control
     2066* row-level security
     2067* encryption strategies
     2068* audit logging
     2069* backup and disaster recovery planning
     2070
     2071Together, these techniques transform the Wedding Planner Management System from a simple transactional database into a scalable analytical database platform capable of supporting:
     2072* operational reporting
     2073* performance monitoring
     2074* analytical decision-making
     2075* secure multi-user access
     2076
     2077== Final Technical Evaluation ==
     2078
     2079The implementation demonstrates:
     2080* advanced PostgreSQL usage
     2081* analytical SQL optimization
     2082* transaction-safe database operations
     2083* scalable reporting architecture
     2084* secure relational database design
     2085
     2086The Phase 6 analytical reports were successfully integrated into the Phase 9 performance analysis workflow, providing realistic optimization and scalability evaluation over the actual project workload.
     2087
     2088== Final Notes ==
     2089
     2090All optimization strategies, indexes, security mechanisms, and execution analyses were designed and validated using PostgreSQL 15.
     2091
     2092The implementation illustrates how modern relational database systems support both:
     2093* transactional processing
     2094* analytical business intelligence workloads
     2095
     2096within a unified Wedding Planner Management System.