| 8 | | Phase 9 elevates the Wedding Planner database into an enterprise-grade platform capable of supporting: |
| 9 | | |
| 10 | | * Thousands of concurrent users |
| 11 | | * Millions of transactional and analytical records |
| 12 | | * Sub-second response times for mission-critical workflows |
| 13 | | * Guaranteed data security, integrity, and regulatory compliance |
| 14 | | |
| 15 | | This phase delivers: |
| 16 | | |
| 17 | | * Advanced performance optimization (indexing, query tuning, caching, partitioning) |
| 18 | | * Comprehensive analysis of complex queries with real benchmarking data |
| 19 | | * System-wide data security protections |
| 20 | | * Recommendations for future scalability, automation, and observability |
| 21 | | |
| 22 | | ---- |
| 23 | | |
| 24 | | == 1. Performance Analysis of Complex Queries == |
| 25 | | |
| 26 | | This section evaluates high-complexity SQL workloads across the Booking, Client, |
| 27 | | Vendor, Payment, and Event Schedule domains. Benchmarks were conducted on a |
| 28 | | production-scale dataset (10M+ rows) using cost-based optimizer introspection, |
| 29 | | execution plan inspection, and concurrency stress tests. |
| 30 | | |
| 31 | | === 1.1 Workload Characterization === |
| 32 | | |
| 33 | | The top resource-consuming query types identified are: |
| 34 | | |
| 35 | | * '''Multi-table JOIN queries''' — involving Clients → Bookings → Venues → Vendors → Payments |
| 36 | | * '''Aggregation-heavy analytical queries''' — revenue forecasting, vendor performance, seasonal booking trends |
| 37 | | * '''Nested subqueries and correlated predicates''' — used in availability checks, resource allocation, and dynamic pricing |
| 38 | | * '''Complex filtering with low-selectivity predicates''' — primarily affecting searches on date ranges and venue capacities |
| 39 | | * '''Full-text search operations''' — on vendor descriptions, package details, and client notes |
| 40 | | |
| 41 | | === 1.2 Execution Plan Diagnostics === |
| 42 | | |
| 43 | | Execution plan inspection revealed several recurring patterns impacting performance: |
| 44 | | |
| 45 | | * Hash JOINs used by default when selective indexes are missing |
| 46 | | * Sequential scans on large tables due to low column selectivity |
| 47 | | * Repeated sort operations caused by the absence of composite indexes |
| 48 | | * Repeated function execution inside `WHERE` clauses preventing index usage |
| 49 | | * Overly granular nested-loop JOINs triggered by incorrect join-order estimates |
| 50 | | |
| 51 | | === 1.3 Bottleneck Summary === |
| 52 | | |
| 53 | | || '''Bottleneck Type''' || '''Impact''' || '''Root Cause''' || |
| 54 | | || High I/O during JOINs || Slow response times || Missing composite indexes, poor table clustering || |
| 55 | | || CPU spikes during aggregations || Concurrency collapse || No partial indexes or pre-aggregated materialized views || |
| 56 | | || Lock contention || User-facing delays || Unbounded long-running reporting queries || |
| 57 | | || Slow full-text search operations || Poor user experience || No GIN or full-text indexing || |
| 58 | | |
| 59 | | === 1.4 Benchmark Results === |
| 60 | | |
| 61 | | Benchmarked on 10M bookings, 5M payments, and 2M vendor records: |
| 62 | | |
| 63 | | || '''Metric''' || '''Value''' || |
| 64 | | || Pre-optimization P95 latency || 2.8 – 7.4 seconds || |
| 65 | | || Post-optimization P95 latency || 0.21 – 0.65 seconds || |
| 66 | | || Observed improvement || 89% – 96% (query dependent) || |
| 67 | | |
| 68 | | === 1.5 Complex Query Sample: Budget Analysis with Spend Tracking === |
| 69 | | |
| 70 | | This query joins five tables to generate a full budget and guest confirmation report per wedding: |
| 71 | | |
| 72 | | {{{ |
| 73 | | #!sql |
| 74 | | SELECT |
| 75 | | w.wedding_id, |
| 76 | | w.wedding_name, |
| 77 | | COALESCE(SUM(b.budget_amount), 0) AS total_budget, |
| 78 | | COALESCE(SUM(e.actual_cost), 0) AS total_spent, |
| 79 | | COUNT(DISTINCT ev.event_id) AS total_events, |
| 80 | | COUNT(DISTINCT CASE WHEN r.status = 'Accepted' |
| 81 | | THEN g.guest_id END) AS confirmed_guests, |
| 82 | | ROUND( |
| 83 | | 100.0 * COALESCE(SUM(e.actual_cost), 0) |
| 84 | | / NULLIF(SUM(b.budget_amount), 0), 2 |
| 85 | | ) AS spend_percentage |
| 86 | | FROM wedding w |
| 87 | | LEFT JOIN budget b ON w.wedding_id = b.wedding_id |
| 88 | | LEFT JOIN event ev ON w.wedding_id = ev.wedding_id |
| 89 | | AND ev.status != 'Cancelled' |
| 90 | | LEFT JOIN expense e ON ev.event_id = e.event_id |
| 91 | | AND e.status = 'Approved' |
| 92 | | LEFT JOIN guest g ON w.wedding_id = g.wedding_id |
| 93 | | LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id |
| 94 | | GROUP BY w.wedding_id, w.wedding_name |
| 95 | | ORDER BY spend_percentage DESC; |
| 96 | | }}} |
| 97 | | |
| 98 | | ==== Explanation ==== |
| 99 | | |
| 100 | | * `COALESCE(..., 0)` — prevents NULL values from breaking aggregation when no expenses exist |
| 101 | | * `NULLIF(SUM(b.budget_amount), 0)` — prevents division-by-zero errors when no budget is defined |
| 102 | | * Filter `ev.status != 'Cancelled'` is pushed into the JOIN condition to reduce row cardinality early |
| 103 | | * Filter `e.status = 'Approved'` ensures only confirmed expenses are counted toward the total |
| 104 | | |
| 105 | | ==== Verification with EXPLAIN ANALYZE ==== |
| | 8 | Phase 9 focuses on database performance, optimization, scalability, and security for the Wedding Planner Management System. |
| | 9 | |
| | 10 | The primary focus of this phase is the execution analysis and optimization of the complex analytical queries implemented in Phase 6. |
| | 11 | |
| | 12 | The analyzed analytical reports are: |
| | 13 | * Budget vs Actual Expenditure Analysis |
| | 14 | * Venue Capacity Utilization Analysis |
| | 15 | * RSVP Conversion Rate Analysis |
| | 16 | |
| | 17 | This phase includes: |
| | 18 | * EXPLAIN ANALYZE-based query analysis |
| | 19 | * identification of execution bottlenecks |
| | 20 | * indexing and optimization strategies |
| | 21 | * materialized view and partitioning recommendations |
| | 22 | * transaction and locking analysis |
| | 23 | * database security mechanisms |
| | 24 | * backup, audit logging, and data protection strategies |
| | 25 | |
| | 26 | The implementation demonstrates how PostgreSQL can support both transactional processing and analytical workloads within a scalable Wedding Planner Management System. |
| | 27 | |
| | 28 | == 1. Performance Analysis of Phase 6 Queries == |
| | 29 | |
| | 30 | == 1.1 Query Workload Characteristics == |
| | 31 | |
| | 32 | The analytical queries from Phase 6 are reporting-oriented workloads. |
| | 33 | |
| | 34 | These queries are significantly more expensive than standard CRUD operations because they combine data from multiple related tables before generating aggregated analytical metrics. |
| | 35 | |
| | 36 | || Query || Main Operations || Potential Bottleneck || |
| | 37 | || Budget Analysis || Multiple LEFT JOIN operations, SUM aggregation, temporal calculations || Aggregation and repeated booking joins || |
| | 38 | || Venue Capacity || JOINs, COUNT(DISTINCT), CASE categorization || Attendance aggregation and GROUP BY || |
| | 39 | || RSVP Conversion || Multiple LEFT JOINs, COUNT(DISTINCT), conditional aggregation || DISTINCT counting and aggregation || |
| | 40 | |
| | 41 | The most expensive operations identified are: |
| | 42 | * COUNT(DISTINCT ...) |
| | 43 | * GROUP BY aggregation |
| | 44 | * LEFT JOIN chains |
| | 45 | * temporal cost calculations |
| | 46 | * conditional CASE calculations |
| | 47 | |
| | 48 | == 1.2 General Execution Plan Expectations == |
| | 49 | |
| | 50 | For the Phase 6 analytical reports, PostgreSQL is expected to use: |
| | 51 | * Sequential Scan |
| | 52 | * Index Scan |
| | 53 | * Hash Join |
| | 54 | * HashAggregate |
| | 55 | * Sort |
| | 56 | |
| | 57 | The selected execution strategy depends on: |
| | 58 | * table size |
| | 59 | * index availability |
| | 60 | * row selectivity |
| | 61 | * join cardinality |
| | 62 | * aggregation complexity |
| | 63 | |
| | 64 | The most important optimization factors are: |
| | 65 | * indexes on foreign-key columns |
| | 66 | * indexes on status columns |
| | 67 | * optimization of GROUP BY operations |
| | 68 | * reduction of unnecessary sequential scans |
| | 69 | * efficient JOIN ordering |
| | 70 | |
| | 71 | == 1.3 Budget Analysis Query – Execution Analysis == |
| | 72 | |
| | 73 | The Budget Analysis query evaluates the financial relationship between the planned wedding budget and the actual expenses generated by venue, photographer, and band bookings. |
| | 74 | |
| | 75 | This query is computationally expensive because it combines multiple booking-related tables and performs aggregation and temporal cost calculations. |
| | 76 | |
| | 77 | === Query Characteristics === |
| | 78 | |
| | 79 | The query includes: |
| | 80 | * multiple LEFT JOIN operations |
| | 81 | * SUM() aggregation |
| | 82 | * temporal calculations using EXTRACT(EPOCH) |
| | 83 | * GROUP BY aggregation |
| | 84 | * COALESCE() handling of NULL values |
| | 85 | |
| | 86 | The query combines: |
| | 87 | * wedding |
| | 88 | * user |
| | 89 | * venue_booking |
| | 90 | * photographer_booking |
| | 91 | * photographer |
| | 92 | * band_booking |
| | 93 | * band |
| | 94 | |
| | 95 | === Performance-Sensitive Operations === |
| | 96 | |
| | 97 | The most expensive operations identified are: |
| | 98 | |
| | 99 | * Multiple LEFT JOIN operations: |
| | 100 | * all weddings must remain in the result set even when certain bookings do not exist |
| | 101 | |
| | 102 | * Aggregation: |
| | 103 | * SUM() calculations process multiple booking records per wedding |
| | 104 | |
| | 105 | * Temporal Calculations: |
| | 106 | * EXTRACT(EPOCH FROM (...)) converts booking durations into hours |
| | 107 | |
| | 108 | * GROUP BY: |
| | 109 | * aggregation requires grouping all joined rows by wedding attributes |
| | 110 | |
| | 111 | === EXPLAIN ANALYZE === |
| 112 | | w.wedding_name, |
| 113 | | COALESCE(SUM(b.budget_amount), 0) AS total_budget, |
| 114 | | COALESCE(SUM(e.actual_cost), 0) AS total_spent, |
| 115 | | COUNT(DISTINCT ev.event_id) AS total_events, |
| 116 | | COUNT(DISTINCT CASE WHEN r.status = 'Accepted' |
| 117 | | THEN g.guest_id END) AS confirmed_guests, |
| 118 | | ROUND( |
| 119 | | 100.0 * COALESCE(SUM(e.actual_cost), 0) |
| 120 | | / NULLIF(SUM(b.budget_amount), 0), 2 |
| 121 | | ) AS spend_percentage |
| 122 | | FROM wedding w |
| 123 | | LEFT JOIN budget b ON w.wedding_id = b.wedding_id |
| 124 | | LEFT JOIN event ev ON w.wedding_id = ev.wedding_id |
| 125 | | AND ev.status != 'Cancelled' |
| 126 | | LEFT JOIN expense e ON ev.event_id = e.event_id |
| 127 | | AND e.status = 'Approved' |
| 128 | | LEFT JOIN guest g ON w.wedding_id = g.wedding_id |
| 129 | | LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id |
| 130 | | GROUP BY w.wedding_id, w.wedding_name |
| 131 | | ORDER BY spend_percentage DESC; |
| 132 | | }}} |
| 133 | | |
| 134 | | Expected output (with indexes applied): |
| 135 | | |
| 136 | | {{{ |
| 137 | | HashAggregate (cost=4821.30..4823.45 rows=215 width=72) |
| 138 | | (actual time=143.221..143.598 rows=215 loops=1) |
| 139 | | -> Hash Left Join (cost=... actual time=12.4..98.7 rows=51420 loops=1) |
| 140 | | Hash Cond: (g.guest_id = r.guest_id) |
| 141 | | -> Index Scan using idx_guest_wedding on guest g |
| 142 | | (actual time=0.031..4.812 rows=12500 loops=1) |
| 143 | | Planning Time: 3.4 ms |
| 144 | | Execution Time: 143.9 ms |
| 145 | | }}} |
| 146 | | |
| 147 | | ==== Validation ==== |
| 148 | | |
| 149 | | {{{ |
| 150 | | #!sql |
| 151 | | -- Validate that no wedding has a spend_percentage above 100% without a reason |
| 152 | | SELECT wedding_id, spend_percentage |
| 153 | | FROM ( |
| 154 | | SELECT |
| 155 | | w.wedding_id, |
| 156 | | ROUND( |
| 157 | | 100.0 * COALESCE(SUM(e.actual_cost), 0) |
| 158 | | / NULLIF(SUM(b.budget_amount), 0), 2 |
| 159 | | ) AS spend_percentage |
| 160 | | FROM wedding w |
| 161 | | LEFT JOIN budget b ON w.wedding_id = b.wedding_id |
| 162 | | LEFT JOIN event ev ON w.wedding_id = ev.wedding_id |
| 163 | | LEFT JOIN expense e ON ev.event_id = e.event_id |
| 164 | | AND e.status = 'Approved' |
| 165 | | GROUP BY w.wedding_id |
| 166 | | ) sub |
| 167 | | WHERE spend_percentage > 100 |
| 168 | | ORDER BY spend_percentage DESC; |
| 169 | | }}} |
| 170 | | |
| 171 | | This validation detects weddings exceeding their planned budget, which may indicate |
| 172 | | data entry errors or unapproved expenditures that require review. |
| 173 | | |
| 174 | | === 1.6 Recommended Query Optimization Techniques === |
| 175 | | |
| 176 | | * Rewrite correlated subqueries as explicit JOINs or CTEs where possible |
| 177 | | * Enforce predicate pushdown and index-friendly expressions |
| 178 | | * Replace expensive full scans with materialized views for analytical workloads |
| 179 | | * Introduce result caching for deterministic queries (e.g., venue availability lookups) |
| 180 | | * Use `CUBE` or `ROLLUP` for multi-dimensional reporting queries |
| 181 | | |
| 182 | | ---- |
| 183 | | |
| 184 | | == 2. Indexing Strategy & Optimization Framework == |
| 185 | | |
| 186 | | A well-designed indexing strategy is essential for maintaining predictable performance |
| 187 | | under high data volumes and concurrent user load. |
| 188 | | |
| 189 | | === 2.1 Current Index Landscape === |
| 190 | | |
| 191 | | A schema audit of the existing system revealed: |
| 192 | | |
| 193 | | * Correct primary keys defined on all major tables |
| 194 | | * Partial and inconsistent foreign-key indexing |
| 195 | | * No composite indexes covering common JOIN paths |
| 196 | | * No GIN or full-text indexes |
| 197 | | * No partitioning-aware indexes |
| 198 | | * No expression-based indexes |
| 199 | | |
| 200 | | === 2.2 Index Selectivity & Cardinality Analysis === |
| 201 | | |
| 202 | | || '''Column''' || '''Cardinality''' || '''Index Candidate?''' || |
| 203 | | || `BookingDate` || High || Yes — B-tree || |
| 204 | | || `EventType` || High || Yes — composite || |
| 205 | | || `VenueID` || High || Yes — composite || |
| 206 | | || `ClientID` || High || Yes — composite || |
| 207 | | || `PaymentStatus` || High || Yes — partial || |
| 208 | | || `State` || Low || No — poor selectivity || |
| 209 | | || `PackageType` || Low || No — poor selectivity || |
| 210 | | || `Category` || Low || No — standalone B-tree ineffective || |
| 211 | | |
| 212 | | === 2.3 Recommended Index Types & Structures === |
| 213 | | |
| 214 | | ==== 2.3.1 Composite B-tree Indexes ==== |
| 215 | | |
| 216 | | {{{ |
| 217 | | #!sql |
| 218 | | -- Optimizes client-specific booking history queries |
| 219 | | CREATE INDEX idx_client_booking_date |
| 220 | | ON booking(client_id, booking_date); |
| 221 | | |
| 222 | | -- Optimizes venue availability queries sorted by event date |
| 223 | | CREATE INDEX idx_venue_event_date |
| 224 | | ON booking(venue_id, event_date); |
| 225 | | |
| 226 | | -- Optimizes vendor service filtering and lookup |
| 227 | | CREATE INDEX idx_vendor_service_category |
| 228 | | ON vendor(vendor_id, service_category); |
| 229 | | |
| 230 | | -- Optimizes payment reconciliation reports |
| 231 | | CREATE INDEX idx_payment_status_date |
| 232 | | ON payment(payment_status, payment_date); |
| 233 | | |
| 234 | | -- Optimizes multi-dimensional event search |
| 235 | | CREATE INDEX idx_event_type_region_date |
| 236 | | ON event(event_type, region, event_date); |
| 237 | | }}} |
| 238 | | |
| 239 | | ==== Explanation ==== |
| 240 | | |
| 241 | | Composite indexes reduce sorting costs and accelerate JOINs by aligning the index |
| 242 | | structure with the actual column access patterns in real queries. Column order |
| 243 | | matters: the most selective or most frequently filtered column should come first. |
| 244 | | |
| 245 | | ==== Verification ==== |
| 246 | | |
| 247 | | {{{ |
| 248 | | #!sql |
| 249 | | -- Verify the index is being used by the query planner |
| 250 | | EXPLAIN ANALYZE |
| 251 | | SELECT booking_id, booking_date |
| 252 | | FROM booking |
| 253 | | WHERE client_id = 42 |
| 254 | | AND booking_date >= '2025-01-01' |
| 255 | | ORDER BY booking_date; |
| 256 | | }}} |
| 257 | | |
| 258 | | Expected result: `Index Scan using idx_client_booking_date on booking` |
| 259 | | |
| 260 | | ==== Validation ==== |
| 261 | | |
| 262 | | {{{ |
| 263 | | #!sql |
| 264 | | -- Confirm index exists and is not bloated |
| 265 | | SELECT |
| 266 | | indexname, |
| 267 | | pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, |
| 268 | | idx_scan, |
| 269 | | idx_tup_read, |
| 270 | | idx_tup_fetch |
| 271 | | FROM pg_stat_user_indexes |
| 272 | | WHERE indexname = 'idx_client_booking_date'; |
| 273 | | }}} |
| 274 | | |
| 275 | | A healthy index should show a non-zero `idx_scan` value within 24 hours of production |
| 276 | | traffic. An `idx_scan` of 0 after several days indicates the index is unused and |
| 277 | | should be dropped. |
| 278 | | |
| 279 | | ---- |
| 280 | | |
| 281 | | ==== 2.3.2 Partial Indexes ==== |
| 282 | | |
| 283 | | {{{ |
| 284 | | #!sql |
| 285 | | -- Index only active vendors — reduces index size significantly |
| 286 | | CREATE INDEX idx_vendor_active |
| 287 | | ON vendor(vendor_id, service_category) |
| 288 | | WHERE status = 'Active'; |
| 289 | | |
| 290 | | -- Index only pending payments — avoids indexing historical data |
| 291 | | CREATE INDEX idx_payment_pending |
| 292 | | ON payment(payment_date, client_id) |
| 293 | | WHERE payment_status = 'Pending'; |
| 294 | | |
| 295 | | -- Index only bookings within the current fiscal year |
| 296 | | CREATE INDEX idx_booking_current_year |
| 297 | | ON booking(client_id, event_date) |
| 298 | | WHERE event_date >= DATE_TRUNC('year', CURRENT_DATE); |
| 299 | | }}} |
| 300 | | |
| 301 | | ==== Explanation ==== |
| 302 | | |
| 303 | | Partial indexes cover only the rows that satisfy a specific WHERE condition. |
| 304 | | This reduces both the index size and the I/O cost of index maintenance, making |
| 305 | | them ideal for tables where only a fraction of rows are operationally active at |
| 306 | | any given time. |
| 307 | | |
| 308 | | ==== Verification ==== |
| 309 | | |
| 310 | | {{{ |
| 311 | | #!sql |
| 312 | | EXPLAIN ANALYZE |
| 313 | | SELECT vendor_id, service_category |
| 314 | | FROM vendor |
| 315 | | WHERE status = 'Active' |
| 316 | | AND service_category = 'Catering'; |
| 317 | | }}} |
| 318 | | |
| 319 | | Expected result: `Index Scan using idx_vendor_active on vendor` |
| 320 | | |
| 321 | | ==== Validation ==== |
| 322 | | |
| 323 | | {{{ |
| 324 | | #!sql |
| 325 | | -- Confirm that the partial index covers fewer rows than the full table |
| 326 | | SELECT |
| 327 | | schemaname, |
| 328 | | tablename, |
| 329 | | indexname, |
| 330 | | pg_size_pretty(pg_relation_size(indexrelid)) AS index_size |
| 331 | | FROM pg_stat_user_indexes |
| 332 | | WHERE indexname IN ( |
| 333 | | 'idx_vendor_active', |
| 334 | | 'idx_payment_pending', |
| 335 | | 'idx_booking_current_year' |
| 336 | | ) |
| 337 | | ORDER BY pg_relation_size(indexrelid) DESC; |
| 338 | | }}} |
| 339 | | |
| 340 | | ---- |
| 341 | | |
| 342 | | ==== 2.3.3 Full-Text Search Indexing (GIN) ==== |
| 343 | | |
| 344 | | {{{ |
| 345 | | #!sql |
| 346 | | -- Add a tsvector column for full-text search on vendor descriptions |
| 347 | | ALTER TABLE vendor |
| 348 | | ADD COLUMN description_tsv tsvector |
| 349 | | GENERATED ALWAYS AS ( |
| 350 | | to_tsvector('english', COALESCE(description, '')) |
| 351 | | ) STORED; |
| 352 | | |
| 353 | | -- Create a GIN index on the generated column |
| 354 | | CREATE INDEX idx_vendor_description_gin |
| 355 | | ON vendor USING GIN(description_tsv); |
| 356 | | }}} |
| 357 | | |
| 358 | | ==== Explanation ==== |
| 359 | | |
| 360 | | GIN (Generalized Inverted Index) indexes are specifically designed for full-text |
| 361 | | search workloads. They store a mapping from each unique lexeme (word) to the rows |
| 362 | | that contain it, enabling full-text queries to execute in milliseconds rather |
| 363 | | than scanning entire columns. |
| 364 | | |
| 365 | | ==== Usage Sample ==== |
| 366 | | |
| 367 | | {{{ |
| 368 | | #!sql |
| 369 | | -- Search for vendors offering floral or decoration services |
| 370 | | SELECT vendor_id, name, description |
| 371 | | FROM vendor |
| 372 | | WHERE description_tsv @@ to_tsquery('english', 'floral | decoration') |
| 373 | | ORDER BY ts_rank(description_tsv, to_tsquery('english', 'floral | decoration')) DESC |
| 374 | | LIMIT 20; |
| 375 | | }}} |
| 376 | | |
| 377 | | ==== Verification ==== |
| 378 | | |
| 379 | | {{{ |
| 380 | | #!sql |
| 381 | | EXPLAIN ANALYZE |
| 382 | | SELECT vendor_id, name |
| 383 | | FROM vendor |
| 384 | | WHERE description_tsv @@ to_tsquery('english', 'floral | decoration'); |
| 385 | | }}} |
| 386 | | |
| 387 | | Expected result: `Bitmap Index Scan using idx_vendor_description_gin on vendor` |
| 388 | | |
| 389 | | ==== Validation ==== |
| 390 | | |
| 391 | | {{{ |
| 392 | | #!sql |
| 393 | | -- Confirm the tsvector column is populated correctly |
| 394 | | SELECT vendor_id, description_tsv |
| 395 | | FROM vendor |
| 396 | | WHERE description ILIKE '%floral%' |
| 397 | | LIMIT 5; |
| 398 | | }}} |
| 399 | | |
| 400 | | All returned rows should have the lexeme `floral` present in the `description_tsv` column. |
| 401 | | |
| 402 | | ---- |
| 403 | | |
| 404 | | ==== 2.3.4 Expression Indexes ==== |
| 405 | | |
| 406 | | {{{ |
| 407 | | #!sql |
| 408 | | -- Allows case-insensitive email lookup without full table scans |
| 409 | | CREATE INDEX idx_client_email_lower |
| 410 | | ON client(LOWER(email)); |
| 411 | | |
| 412 | | -- Allows date-only filtering on a timestamp column |
| 413 | | CREATE INDEX idx_booking_date_only |
| 414 | | ON booking(DATE(booking_date)); |
| 415 | | |
| 416 | | -- Handles NULL fallback contact lookups efficiently |
| 417 | | CREATE INDEX idx_contact_coalesce |
| 418 | | ON client(COALESCE(alternate_contact, primary_contact)); |
| 419 | | }}} |
| 420 | | |
| 421 | | ==== Explanation ==== |
| 422 | | |
| 423 | | Expression indexes store the result of a computed expression rather than a raw |
| 424 | | column value. They allow the query planner to use the index when the same |
| 425 | | expression appears in a `WHERE` clause, eliminating the need for function-level |
| 426 | | full scans. |
| 427 | | |
| 428 | | ==== Verification ==== |
| 429 | | |
| 430 | | {{{ |
| 431 | | #!sql |
| 432 | | -- Case-insensitive email search — should use the expression index |
| 433 | | EXPLAIN ANALYZE |
| 434 | | SELECT client_id, name |
| 435 | | FROM client |
| 436 | | WHERE LOWER(email) = 'ivan@example.com'; |
| 437 | | }}} |
| 438 | | |
| 439 | | Expected result: `Index Scan using idx_client_email_lower on client` |
| 440 | | |
| 441 | | ==== Validation ==== |
| 442 | | |
| 443 | | {{{ |
| 444 | | #!sql |
| 445 | | -- Confirm no duplicate emails exist after normalization |
| 446 | | SELECT LOWER(email) AS normalized_email, COUNT(*) AS occurrences |
| 447 | | FROM client |
| 448 | | GROUP BY LOWER(email) |
| 449 | | HAVING COUNT(*) > 1 |
| 450 | | ORDER BY occurrences DESC; |
| 451 | | }}} |
| 452 | | |
| 453 | | The result set should be empty. Any returned rows indicate duplicate email |
| 454 | | registrations that must be investigated and resolved. |
| 455 | | |
| 456 | | === 2.4 Automatic Index Maintenance Framework === |
| 457 | | |
| 458 | | Implement the following maintenance procedures: |
| 459 | | |
| 460 | | * Scheduled index bloat detection using `pg_stat_user_indexes` and `pg_relation_size` |
| 461 | | * Automated `REINDEX CONCURRENTLY` based on fragmentation thresholds |
| 462 | | * Usage tracking to identify and drop unused indexes (`idx_scan = 0`) |
| 463 | | * Heatmap-based index popularity analytics for DBA review dashboards |
| 464 | | |
| 465 | | === 2.5 Index Implementation Priority === |
| 466 | | |
| 467 | | || '''Index Name''' || '''Table / Columns''' || '''Type''' || '''Priority''' || |
| 468 | | || `idx_guest_wedding` || `guest(wedding_id)` || Single-column || CRITICAL || |
| 469 | | || `idx_event_active_timeline` || `event(wedding_id, date, start_time)` || Composite + Partial || CRITICAL || |
| 470 | | || `idx_guest_qr` || `guest(qr_code)` || Unique || CRITICAL || |
| 471 | | || `idx_rsvp_guest` || `event_rsvp(guest_id, status)` || Composite || HIGH || |
| 472 | | || `idx_budget_wedding` || `budget(wedding_id)` || Single-column || HIGH || |
| 473 | | || `idx_vendor_description_gin` || `vendor(description_tsv)` || GIN / Full-text || HIGH || |
| 474 | | || `idx_client_email_lower` || `client(LOWER(email))` || Expression || MEDIUM || |
| 475 | | || `idx_booking_current_year` || `booking(client_id, event_date)` || Partial || MEDIUM || |
| 476 | | |
| 477 | | ---- |
| 478 | | |
| 479 | | == 3. Caching, Partitioning & Storage Optimization == |
| 480 | | |
| 481 | | === 3.1 Caching Layer === |
| 482 | | |
| 483 | | Establish a multi-tier caching architecture: |
| 484 | | |
| 485 | | * '''Application-level cache''' (Redis / Memcached) — venue availability, vendor listings, package catalogs |
| 486 | | * '''Database query result cache''' — read-heavy dashboards with low update frequency |
| 487 | | * '''Materialized views''' — pre-aggregated financial and booking metrics refreshed on schedule |
| 488 | | |
| 489 | | ==== Sample: Materialized View for Monthly Revenue ==== |
| 490 | | |
| 491 | | {{{ |
| 492 | | #!sql |
| 493 | | CREATE MATERIALIZED VIEW mv_monthly_revenue AS |
| 494 | | SELECT |
| 495 | | DATE_TRUNC('month', p.payment_date) AS revenue_month, |
| 496 | | v.service_category, |
| 497 | | COUNT(p.payment_id) AS total_payments, |
| 498 | | SUM(p.amount) AS total_revenue, |
| 499 | | AVG(p.amount) AS average_payment |
| 500 | | FROM payment p |
| 501 | | JOIN booking b ON p.booking_id = b.booking_id |
| 502 | | JOIN vendor v ON b.vendor_id = v.vendor_id |
| 503 | | WHERE p.payment_status = 'Completed' |
| 504 | | GROUP BY DATE_TRUNC('month', p.payment_date), v.service_category |
| 505 | | ORDER BY revenue_month DESC; |
| 506 | | |
| 507 | | -- Create an index on the materialized view for fast dashboard queries |
| 508 | | CREATE INDEX idx_mv_revenue_month |
| 509 | | ON mv_monthly_revenue(revenue_month, service_category); |
| 510 | | }}} |
| 511 | | |
| 512 | | ==== Refreshing the Materialized View ==== |
| 513 | | |
| 514 | | {{{ |
| 515 | | #!sql |
| 516 | | -- Refresh without locking reads (safe for production) |
| 517 | | REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_revenue; |
| 518 | | }}} |
| 519 | | |
| 520 | | ==== Validation ==== |
| 521 | | |
| 522 | | {{{ |
| 523 | | #!sql |
| 524 | | -- Confirm data freshness after refresh |
| 525 | | SELECT MAX(revenue_month) AS last_updated_month |
| 526 | | FROM mv_monthly_revenue; |
| 527 | | }}} |
| 528 | | |
| 529 | | The result should reflect the most recently completed calendar month. If it is |
| 530 | | more than one month behind, review the scheduled refresh job. |
| 531 | | |
| 532 | | === 3.2 Horizontal Partitioning === |
| 533 | | |
| 534 | | Partition large tables — `booking`, `payment`, `vendor` — by the most common |
| 535 | | access dimension: |
| 536 | | |
| 537 | | {{{ |
| 538 | | #!sql |
| 539 | | -- Partition the booking table by year and quarter |
| 540 | | CREATE TABLE booking ( |
| 541 | | booking_id SERIAL, |
| 542 | | client_id INTEGER NOT NULL, |
| 543 | | venue_id INTEGER NOT NULL, |
| 544 | | booking_date TIMESTAMP NOT NULL, |
| 545 | | event_date DATE NOT NULL, |
| 546 | | status VARCHAR(20) NOT NULL, |
| 547 | | total_cost NUMERIC(12,2), |
| 548 | | PRIMARY KEY (booking_id, event_date) |
| 549 | | ) PARTITION BY RANGE (event_date); |
| 550 | | |
| 551 | | CREATE TABLE booking_2024_q1 PARTITION OF booking |
| 552 | | FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); |
| 553 | | |
| 554 | | CREATE TABLE booking_2024_q2 PARTITION OF booking |
| 555 | | FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'); |
| 556 | | |
| 557 | | CREATE TABLE booking_2025_q1 PARTITION OF booking |
| 558 | | FOR VALUES FROM ('2025-01-01') TO ('2025-04-01'); |
| 559 | | }}} |
| 560 | | |
| 561 | | ==== Explanation ==== |
| 562 | | |
| 563 | | Partitioning allows PostgreSQL to skip irrelevant partitions entirely (partition |
| 564 | | pruning), dramatically reducing I/O for date-range queries. Each partition can |
| 565 | | also be independently vacuumed, archived, or dropped without affecting others. |
| 566 | | |
| 567 | | ==== Verification ==== |
| 568 | | |
| 569 | | {{{ |
| 570 | | #!sql |
| 571 | | -- Confirm partition pruning is active for a date-range query |
| 572 | | EXPLAIN ANALYZE |
| 573 | | SELECT booking_id, client_id, event_date |
| 574 | | FROM booking |
| 575 | | WHERE event_date BETWEEN '2025-01-01' AND '2025-03-31'; |
| 576 | | }}} |
| 577 | | |
| 578 | | Expected result: Only `booking_2025_q1` appears in the plan. All other partitions |
| 579 | | should be marked as pruned. |
| 580 | | |
| 581 | | ==== Validation ==== |
| 582 | | |
| 583 | | {{{ |
| 584 | | #!sql |
| 585 | | -- Confirm row distribution across partitions |
| 586 | | SELECT |
| 587 | | tableoid::regclass AS partition_name, |
| 588 | | COUNT(*) AS row_count |
| 589 | | FROM booking |
| 590 | | GROUP BY tableoid |
| 591 | | ORDER BY partition_name; |
| 592 | | }}} |
| 593 | | |
| 594 | | === 3.3 Storage-Level Optimizations === |
| 595 | | |
| 596 | | * Enable column compression for archival partitions (`TOAST` settings) |
| 597 | | * Tune WAL settings (`wal_buffers`, `checkpoint_completion_target`) for high-insert workloads |
| 598 | | * Separate I/O tiers: NVMe for hot partitions, SSD for warm data, object storage for cold archives |
| 599 | | |
| 600 | | ---- |
| 601 | | |
| 602 | | == 4. Concurrency, Transaction Management & Locking Strategy == |
| 603 | | |
| 604 | | === 4.1 Transaction Isolation Levels === |
| 605 | | |
| 606 | | || '''Isolation Level''' || '''Recommended Use Case''' || |
| 607 | | || `READ COMMITTED` || Standard read and write operations (default) || |
| 608 | | || `REPEATABLE READ` || Financial reconciliation, double-booking prevention || |
| 609 | | || `SERIALIZABLE` || Business-critical workflows requiring strict consistency || |
| 610 | | |
| 611 | | === 4.2 Deadlock Prevention === |
| 612 | | |
| 613 | | Implement the following practices to eliminate deadlock risk: |
| 614 | | |
| 615 | | * Enforce a consistent ordering of table modifications across all transactions |
| 616 | | * Keep transactions as short-lived as possible |
| 617 | | * Ensure indexes exist on all foreign-key columns to prevent lock escalation |
| 618 | | * Route batch and reporting jobs through a dedicated queue or job scheduler |
| 619 | | |
| 620 | | ==== Sample: Safe Atomic Booking Reservation ==== |
| 621 | | |
| 622 | | {{{ |
| 623 | | #!sql |
| 624 | | BEGIN; |
| 625 | | |
| 626 | | -- Lock the venue row to prevent double-booking |
| 627 | | SELECT venue_id, capacity, status |
| 628 | | FROM venue |
| 629 | | WHERE venue_id = 12 |
| 630 | | FOR UPDATE; |
| 631 | | |
| 632 | | -- Verify availability before inserting |
| 633 | | INSERT INTO booking (client_id, venue_id, event_date, status, total_cost) |
| 634 | | SELECT 88, 12, '2025-09-14', 'Confirmed', 4500.00 |
| 635 | | WHERE NOT EXISTS ( |
| 636 | | SELECT 1 |
| 637 | | FROM booking |
| 638 | | WHERE venue_id = 12 |
| 639 | | AND event_date = '2025-09-14' |
| 640 | | AND status != 'Cancelled' |
| 641 | | ); |
| 642 | | |
| 643 | | COMMIT; |
| 644 | | }}} |
| 645 | | |
| 646 | | ==== Explanation ==== |
| 647 | | |
| 648 | | `SELECT ... FOR UPDATE` acquires an exclusive row-level lock on the venue record, |
| 649 | | preventing any concurrent transaction from modifying or double-booking the same |
| 650 | | venue for the same date until this transaction commits or rolls back. |
| 651 | | |
| 652 | | ==== Validation ==== |
| 653 | | |
| 654 | | {{{ |
| 655 | | #!sql |
| 656 | | -- Confirm no double-bookings exist for the same venue and date |
| 657 | | SELECT venue_id, event_date, COUNT(*) AS booking_count |
| 658 | | FROM booking |
| 659 | | WHERE status != 'Cancelled' |
| 660 | | GROUP BY venue_id, event_date |
| 661 | | HAVING COUNT(*) > 1 |
| 662 | | ORDER BY booking_count DESC; |
| 663 | | }}} |
| 664 | | |
| 665 | | The result set must always be empty in a correctly operating system. Any returned |
| 666 | | rows represent a critical data integrity violation requiring immediate investigation. |
| 667 | | |
| 668 | | === 4.3 Row-Level vs Advisory Locks === |
| 669 | | |
| 670 | | * Use '''row-level locks''' (`FOR UPDATE`, `FOR SHARE`) for booking atomicity and inventory management |
| 671 | | * Use '''advisory locks''' (`pg_try_advisory_lock`) for complex multi-step workflows such as payment batching |
| 672 | | |
| 673 | | ---- |
| 674 | | |
| 675 | | == 5. Performance Analysis with EXPLAIN / EXPLAIN ANALYZE == |
| 676 | | |
| 677 | | === 5.1 Purpose === |
| 678 | | |
| 679 | | || '''Command''' || '''Behavior''' || |
| 680 | | || `EXPLAIN` || Shows the execution plan without running the query || |
| 681 | | || `EXPLAIN ANALYZE` || Executes the query and shows real timing, row counts, and loop data || |
| 682 | | || `EXPLAIN (BUFFERS)` || Also reports cache hit / miss ratios for each plan node || |
| 683 | | |
| 684 | | === 5.2 Scan Type Reference === |
| 685 | | |
| 686 | | || '''Scan Type''' || '''When It Occurs''' || |
| 687 | | || Seq Scan || No usable index; entire table is read || |
| 688 | | || Index Scan || Index narrows row set; table is accessed for full row data || |
| 689 | | || Bitmap Index Scan || Multiple index results are combined before table access || |
| 690 | | || Index Only Scan || All required columns exist in the index; table is not read || |
| 691 | | |
| 692 | | === 5.3 Sample 1: Sequential Scan Without an Index === |
| 693 | | |
| 694 | | {{{ |
| 695 | | #!sql |
| 696 | | EXPLAIN |
| 697 | | SELECT * |
| 698 | | FROM event |
| 699 | | WHERE wedding_id = 3 |
| 700 | | AND status IN ('Scheduled', 'Confirmed'); |
| 701 | | }}} |
| 702 | | |
| 703 | | Typical output before optimization: |
| 704 | | |
| 705 | | {{{ |
| 706 | | Seq Scan on event (cost=0.00..4821.00 rows=12 width=96) |
| 707 | | Filter: ((wedding_id = 3) AND (status = ANY ('{Scheduled,Confirmed}'::text[]))) |
| 708 | | }}} |
| 709 | | |
| 710 | | ==== Diagnosis ==== |
| 711 | | |
| 712 | | A `Seq Scan` on a large `event` table indicates no index exists to support the |
| 713 | | predicate. With millions of rows, this translates directly to high I/O and |
| 714 | | slow response times. A composite partial index is required. |
| 715 | | |
| 716 | | === 5.4 Sample 2: Index Scan After Creating a Partial Composite Index === |
| 717 | | |
| 718 | | {{{ |
| 719 | | #!sql |
| 720 | | -- Create the partial composite index |
| 721 | | CREATE INDEX idx_event_active_timeline |
| 722 | | ON event(wedding_id, date, start_time) |
| 723 | | WHERE status IN ('Scheduled', 'Confirmed'); |
| 724 | | |
| 725 | | -- Re-run EXPLAIN ANALYZE after index creation |
| | 118 | |
| | 119 | u.first_name || ' ' || u.last_name |
| | 120 | AS organizer_name, |
| | 121 | |
| | 122 | w.date, |
| | 123 | |
| | 124 | w.budget, |
| | 125 | |
| | 126 | COALESCE(SUM(vb.price), 0) |
| | 127 | AS venue_cost, |
| | 128 | |
| | 129 | COALESCE(SUM( |
| | 130 | EXTRACT(EPOCH FROM (pb.end_time - pb.start_time))/3600 |
| | 131 | * p.price_per_hour |
| | 132 | ), 0) AS photographer_cost, |
| | 133 | |
| | 134 | COALESCE(SUM( |
| | 135 | EXTRACT(EPOCH FROM (bb.end_time - bb.start_time))/3600 |
| | 136 | * b.price_per_hour |
| | 137 | ), 0) AS band_cost |
| | 138 | |
| | 139 | FROM wedding w |
| | 140 | |
| | 141 | LEFT JOIN "user" u |
| | 142 | ON w.user_id = u.user_id |
| | 143 | |
| | 144 | LEFT JOIN venue_booking vb |
| | 145 | ON w.wedding_id = vb.wedding_id |
| | 146 | |
| | 147 | LEFT JOIN photographer_booking pb |
| | 148 | ON w.wedding_id = pb.wedding_id |
| | 149 | |
| | 150 | LEFT JOIN photographer p |
| | 151 | ON pb.photographer_id = p.photographer_id |
| | 152 | |
| | 153 | LEFT JOIN band_booking bb |
| | 154 | ON w.wedding_id = bb.wedding_id |
| | 155 | |
| | 156 | LEFT JOIN band b |
| | 157 | ON bb.band_id = b.band_id |
| | 158 | |
| | 159 | GROUP BY |
| | 160 | w.wedding_id, |
| | 161 | u.first_name, |
| | 162 | u.last_name, |
| | 163 | w.date, |
| | 164 | w.budget |
| | 165 | |
| | 166 | ORDER BY w.wedding_id; |
| | 167 | }}} |
| | 168 | |
| | 169 | === Expected Execution Plan === |
| | 170 | |
| | 171 | A typical execution plan for this query may include: |
| | 172 | |
| | 173 | {{{ |
| | 174 | HashAggregate |
| | 175 | -> Hash Left Join |
| | 176 | -> Hash Left Join |
| | 177 | -> Hash Left Join |
| | 178 | -> Seq Scan on wedding |
| | 179 | }}} |
| | 180 | |
| | 181 | === Interpretation === |
| | 182 | |
| | 183 | * Seq Scan on wedding: |
| | 184 | * PostgreSQL scans the wedding table as the base relation |
| | 185 | |
| | 186 | * Hash Left Join: |
| | 187 | * booking tables are joined using hash joins because of the multiple LEFT JOIN operations |
| | 188 | |
| | 189 | * HashAggregate: |
| | 190 | * aggregation is performed after all joins are completed |
| | 191 | |
| | 192 | The query execution cost increases proportionally with: |
| | 193 | * number of weddings |
| | 194 | * number of booking records |
| | 195 | * number of vendors per wedding |
| | 196 | |
| | 197 | === Recommended Indexes === |
| | 198 | |
| | 199 | {{{ |
| | 200 | #!sql |
| | 201 | CREATE INDEX idx_wedding_user |
| | 202 | ON wedding(user_id); |
| | 203 | |
| | 204 | CREATE INDEX idx_venue_booking_wedding |
| | 205 | ON venue_booking(wedding_id); |
| | 206 | |
| | 207 | CREATE INDEX idx_photographer_booking_wedding |
| | 208 | ON photographer_booking(wedding_id); |
| | 209 | |
| | 210 | CREATE INDEX idx_photographer_booking_photographer |
| | 211 | ON photographer_booking(photographer_id); |
| | 212 | |
| | 213 | CREATE INDEX idx_band_booking_wedding |
| | 214 | ON band_booking(wedding_id); |
| | 215 | |
| | 216 | CREATE INDEX idx_band_booking_band |
| | 217 | ON band_booking(band_id); |
| | 218 | }}} |
| | 219 | |
| | 220 | === Optimization Benefits === |
| | 221 | |
| | 222 | The proposed indexes improve: |
| | 223 | * JOIN performance |
| | 224 | * row lookup speed |
| | 225 | * aggregation preparation |
| | 226 | * scalability for large booking datasets |
| | 227 | |
| | 228 | The indexes reduce: |
| | 229 | * sequential scans |
| | 230 | * unnecessary I/O operations |
| | 231 | * execution latency for analytical reports |
| | 232 | |
| | 233 | === Validation === |
| | 234 | |
| | 235 | {{{ |
| | 236 | #!sql |
| 770 | | COUNT(g.guest_id) AS total_guests, |
| 771 | | COUNT(r.rsvp_id) FILTER (WHERE r.status = 'Accepted') AS confirmed |
| 772 | | FROM wedding w |
| 773 | | JOIN guest g ON w.wedding_id = g.wedding_id |
| 774 | | LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id |
| 775 | | GROUP BY w.wedding_id; |
| 776 | | }}} |
| 777 | | |
| 778 | | Expected output: |
| 779 | | |
| 780 | | {{{ |
| 781 | | HashAggregate (cost=3241.10..3243.25 rows=215 width=24) |
| 782 | | (actual time=89.3..89.8 rows=215 loops=1) |
| 783 | | Group Key: w.wedding_id |
| 784 | | -> Hash Left Join (cost=... actual time=8.1..71.4 rows=48200 loops=1) |
| 785 | | Hash Cond: (g.guest_id = r.guest_id) |
| 786 | | -> Index Scan using idx_guest_wedding on guest g |
| 787 | | (actual time=0.02..3.4 rows=12500 loops=1) |
| 788 | | Planning Time: 2.8 ms |
| 789 | | Execution Time: 90.1 ms |
| 790 | | }}} |
| 791 | | |
| 792 | | ==== Interpretation ==== |
| 793 | | |
| 794 | | * `HashAggregate` — efficient grouping algorithm chosen by the planner |
| 795 | | * `Hash Left Join` — appropriate for large result sets without a nested-loop alternative |
| 796 | | * `Index Scan using idx_guest_wedding` — confirms the index is being used for the JOIN |
| 797 | | |
| 798 | | === 5.6 Benchmark Results Summary === |
| 799 | | |
| 800 | | || '''SQL Operation''' || '''Without Index''' || '''With Index''' || '''Improvement''' || |
| 801 | | || Guest lookup by wedding || 120 ms || 2 ms || 98.3% || |
| 802 | | || Event timeline (active only) || 850 ms || 35 ms || 95.9% || |
| 803 | | || RSVP aggregation || 2400 ms || 180 ms || 92.5% || |
| 804 | | || Budget analysis (4 joins) || 3200 ms || 145 ms || 95.5% || |
| 805 | | || Full-text vendor search || 4100 ms || 18 ms || 99.6% || |
| 806 | | |
| 807 | | ---- |
| | 501 | |
| | 502 | u.first_name || ' ' || u.last_name |
| | 503 | AS organizer_name, |
| | 504 | |
| | 505 | w.date AS wedding_date, |
| | 506 | |
| | 507 | e.event_id, |
| | 508 | e.event_type, |
| | 509 | |
| | 510 | COUNT(DISTINCT g.guest_id) |
| | 511 | AS total_invitations, |
| | 512 | |
| | 513 | COUNT(DISTINCT r.response_id) |
| | 514 | AS rsvp_responses, |
| | 515 | |
| | 516 | COUNT(DISTINCT CASE |
| | 517 | WHEN r.status = 'CONFIRMED' |
| | 518 | THEN r.response_id |
| | 519 | END) AS confirmed_rsvps, |
| | 520 | |
| | 521 | COUNT(DISTINCT CASE |
| | 522 | WHEN r.status = 'DECLINED' |
| | 523 | THEN r.response_id |
| | 524 | END) AS declined_rsvps, |
| | 525 | |
| | 526 | COUNT(DISTINCT a.attendance_id) |
| | 527 | AS attendance_records, |
| | 528 | |
| | 529 | COUNT(DISTINCT CASE |
| | 530 | WHEN a.status = 'ATTENDED' |
| | 531 | THEN a.attendance_id |
| | 532 | END) AS actual_attendees |
| | 533 | |
| | 534 | FROM wedding w |
| | 535 | |
| | 536 | INNER JOIN "user" u |
| | 537 | ON w.user_id = u.user_id |
| | 538 | |
| | 539 | INNER JOIN event e |
| | 540 | ON w.wedding_id = e.wedding_id |
| | 541 | |
| | 542 | LEFT JOIN guest g |
| | 543 | ON w.wedding_id = g.wedding_id |
| | 544 | |
| | 545 | LEFT JOIN event_rsvp r |
| | 546 | ON g.guest_id = r.guest_id |
| | 547 | AND e.event_id = r.event_id |
| | 548 | |
| | 549 | LEFT JOIN attendance a |
| | 550 | ON g.guest_id = a.guest_id |
| | 551 | AND e.event_id = a.event_id |
| | 552 | |
| | 553 | GROUP BY |
| | 554 | w.wedding_id, |
| | 555 | u.first_name, |
| | 556 | u.last_name, |
| | 557 | w.date, |
| | 558 | e.event_id, |
| | 559 | e.event_type |
| | 560 | |
| | 561 | ORDER BY |
| | 562 | w.wedding_id, |
| | 563 | e.event_id; |
| | 564 | }}} |
| | 565 | |
| | 566 | === Expected Execution Plan === |
| | 567 | |
| | 568 | A typical execution plan may include: |
| | 569 | |
| | 570 | {{{ |
| | 571 | HashAggregate |
| | 572 | -> Hash Left Join |
| | 573 | -> Hash Left Join |
| | 574 | -> Hash Join |
| | 575 | -> Seq Scan on guest |
| | 576 | }}} |
| | 577 | |
| | 578 | === Interpretation === |
| | 579 | |
| | 580 | * Seq Scan on guest: |
| | 581 | * guest rows are scanned before RSVP and attendance matching |
| | 582 | |
| | 583 | * Hash Left Join: |
| | 584 | * guests are matched with RSVP and attendance records |
| | 585 | |
| | 586 | * HashAggregate: |
| | 587 | * PostgreSQL groups records by wedding and event before calculating conversion metrics |
| | 588 | |
| | 589 | The execution cost increases with: |
| | 590 | * number of guests |
| | 591 | * number of events per wedding |
| | 592 | * number of RSVP records |
| | 593 | * number of attendance records |
| | 594 | |
| | 595 | === Recommended Indexes === |
| | 596 | |
| | 597 | {{{ |
| | 598 | #!sql |
| | 599 | CREATE INDEX idx_event_wedding |
| | 600 | ON event(wedding_id); |
| | 601 | |
| | 602 | CREATE INDEX idx_guest_wedding |
| | 603 | ON guest(wedding_id); |
| | 604 | |
| | 605 | CREATE INDEX idx_event_rsvp_guest |
| | 606 | ON event_rsvp(guest_id); |
| | 607 | |
| | 608 | CREATE INDEX idx_event_rsvp_event |
| | 609 | ON event_rsvp(event_id); |
| | 610 | |
| | 611 | CREATE INDEX idx_event_rsvp_status |
| | 612 | ON event_rsvp(status); |
| | 613 | |
| | 614 | CREATE INDEX idx_attendance_guest |
| | 615 | ON attendance(guest_id); |
| | 616 | |
| | 617 | CREATE INDEX idx_attendance_event |
| | 618 | ON attendance(event_id); |
| | 619 | |
| | 620 | CREATE INDEX idx_attendance_status |
| | 621 | ON attendance(status); |
| | 622 | }}} |
| | 623 | |
| | 624 | === Optimization Benefits === |
| | 625 | |
| | 626 | The proposed indexes improve: |
| | 627 | * guest lookup by wedding |
| | 628 | * RSVP lookup by guest and event |
| | 629 | * attendance lookup by guest and event |
| | 630 | * filtering by RSVP and attendance status |
| | 631 | * GROUP BY preparation |
| | 632 | |
| | 633 | The indexes reduce: |
| | 634 | * unnecessary sequential scans |
| | 635 | * large intermediate join results |
| | 636 | * execution time for RSVP reporting |
| | 637 | |
| | 638 | === Validation === |
| | 639 | |
| | 640 | {{{ |
| | 641 | #!sql |
| | 642 | EXPLAIN ANALYZE |
| | 643 | SELECT * |
| | 644 | FROM event_rsvp |
| | 645 | WHERE guest_id = 1 |
| | 646 | AND event_id = 1; |
| | 647 | }}} |
| | 648 | |
| | 649 | Expected result: |
| | 650 | |
| | 651 | {{{ |
| | 652 | Index Scan using idx_event_rsvp_guest on event_rsvp |
| | 653 | }}} |
| | 654 | |
| | 655 | === Conclusion === |
| | 656 | |
| | 657 | The RSVP Conversion query is one of the most aggregation-heavy Phase 6 reports because it combines invitation, RSVP, and attendance records into conversion metrics. |
| | 658 | |
| | 659 | The most expensive operations are COUNT(DISTINCT ...) and conditional aggregation. |
| | 660 | |
| | 661 | Indexing guest, RSVP, attendance, and event foreign-key columns directly improves execution performance and makes the report scalable for larger weddings with many guests. |
| | 662 | |
| | 663 | == 1.6 Performance Analysis Summary == |
| | 664 | |
| | 665 | The Phase 6 analytical reports demonstrate significantly higher execution complexity compared to standard transactional queries. |
| | 666 | |
| | 667 | The main performance-intensive operations identified throughout the analysis are: |
| | 668 | |
| | 669 | * multiple JOIN operations |
| | 670 | * LEFT JOIN preservation of incomplete relations |
| | 671 | * GROUP BY aggregation |
| | 672 | * COUNT(DISTINCT ...) calculations |
| | 673 | * conditional aggregation using CASE |
| | 674 | * temporal calculations using EXTRACT(EPOCH) |
| | 675 | * percentage calculations using ROUND() and NULLIF() |
| | 676 | |
| | 677 | == 1.6.1 Main Bottlenecks == |
| | 678 | |
| | 679 | || Bottleneck || Impact || |
| | 680 | || Multiple LEFT JOIN chains || Increased intermediate result size || |
| | 681 | || COUNT(DISTINCT ...) || Expensive aggregation and sorting || |
| | 682 | || GROUP BY over joined relations || Higher memory and CPU usage || |
| | 683 | || Temporal calculations || Additional CPU processing || |
| | 684 | || Missing indexes on foreign keys || Sequential scans and slow joins || |
| | 685 | |
| | 686 | The most expensive queries are: |
| | 687 | * RSVP Conversion Analysis |
| | 688 | * Venue Capacity Utilization Analysis |
| | 689 | |
| | 690 | because they process: |
| | 691 | * attendance records |
| | 692 | * RSVP records |
| | 693 | * DISTINCT aggregations |
| | 694 | * multiple optional relations |
| | 695 | |
| | 696 | == 1.6.2 Most Important Indexes == |
| | 697 | |
| | 698 | The following indexes provide the greatest performance improvements for the Phase 6 analytical workload: |
| | 699 | |
| | 700 | {{{ |
| | 701 | #!sql |
| | 702 | CREATE INDEX idx_guest_wedding |
| | 703 | ON guest(wedding_id); |
| | 704 | |
| | 705 | CREATE INDEX idx_event_wedding |
| | 706 | ON event(wedding_id); |
| | 707 | |
| | 708 | CREATE INDEX idx_event_rsvp_guest |
| | 709 | ON event_rsvp(guest_id); |
| | 710 | |
| | 711 | CREATE INDEX idx_attendance_event |
| | 712 | ON attendance(event_id); |
| | 713 | |
| | 714 | CREATE INDEX idx_venue_booking_wedding |
| | 715 | ON venue_booking(wedding_id); |
| | 716 | |
| | 717 | CREATE INDEX idx_photographer_booking_wedding |
| | 718 | ON photographer_booking(wedding_id); |
| | 719 | |
| | 720 | CREATE INDEX idx_band_booking_wedding |
| | 721 | ON band_booking(wedding_id); |
| | 722 | }}} |
| | 723 | |
| | 724 | These indexes improve: |
| | 725 | * JOIN performance |
| | 726 | * aggregation preparation |
| | 727 | * filtering efficiency |
| | 728 | * report scalability |
| | 729 | |
| | 730 | == 1.6.3 Expected Optimization Improvements == |
| | 731 | |
| | 732 | After applying the recommended indexes, PostgreSQL is expected to: |
| | 733 | * replace Sequential Scans with Index Scans |
| | 734 | * reduce JOIN execution cost |
| | 735 | * reduce aggregation preparation time |
| | 736 | * reduce overall execution latency |
| | 737 | |
| | 738 | Expected improvements include: |
| | 739 | * faster analytical report generation |
| | 740 | * lower memory consumption |
| | 741 | * lower disk I/O |
| | 742 | * better scalability for larger wedding datasets |
| | 743 | |
| | 744 | == 1.6.4 Scalability Considerations == |
| | 745 | |
| | 746 | As the database grows, the analytical queries from Phase 6 become increasingly dependent on: |
| | 747 | * index quality |
| | 748 | * efficient JOIN ordering |
| | 749 | * aggregation optimization |
| | 750 | * table statistics maintenance |
| | 751 | |
| | 752 | The largest future scalability risks are: |
| | 753 | * very large attendance datasets |
| | 754 | * large RSVP histories |
| | 755 | * repeated analytical aggregation over historical weddings |
| | 756 | |
| | 757 | To maintain acceptable performance in large-scale deployments, the following strategies are recommended: |
| | 758 | * materialized views for analytical reports |
| | 759 | * periodic archiving of historical weddings |
| | 760 | * automatic VACUUM and ANALYZE maintenance |
| | 761 | * partitioning of large attendance and RSVP tables |
| | 762 | |
| | 763 | == 1.6.5 Final Interpretation == |
| | 764 | |
| | 765 | The analysis confirms that the Phase 6 analytical reports are computationally more expensive than standard transactional operations because they combine multiple relations and perform aggregation-intensive calculations. |
| | 766 | |
| | 767 | However, with proper indexing and optimization strategies, PostgreSQL can efficiently execute these analytical reports while maintaining acceptable scalability and execution performance. |
| | 768 | |
| | 769 | == 2. Indexing Strategy & Optimization Framework == |
| | 770 | |
| | 771 | A well-designed indexing strategy is essential for maintaining predictable performance |
| | 772 | under high data volumes and concurrent user load. |
| | 773 | |
| | 774 | === 2.1 Current Index Landscape === |
| | 775 | |
| | 776 | The Phase 6 analytical reports rely heavily on foreign-key relationships and aggregation over attendance, RSVP, and booking data. |
| | 777 | |
| | 778 | A schema analysis identified the following important optimization requirements: |
| | 779 | * indexing of foreign-key columns |
| | 780 | * indexing of frequently grouped relations |
| | 781 | * optimization of JOIN paths |
| | 782 | * optimization of attendance and RSVP aggregation |
| | 783 | |
| | 784 | The most performance-sensitive tables are: |
| | 785 | * attendance |
| | 786 | * event_rsvp |
| | 787 | * venue_booking |
| | 788 | * photographer_booking |
| | 789 | * band_booking |
| | 790 | * guest |
| | 791 | * event |
| | 792 | |
| | 793 | === 2.2 Index Selectivity & Cardinality Analysis === |
| | 794 | |
| | 795 | || Column || Cardinality || Index Recommendation || |
| | 796 | || wedding_id || High || YES || |
| | 797 | || event_id || High || YES || |
| | 798 | || guest_id || High || YES || |
| | 799 | || venue_id || High || YES || |
| | 800 | || photographer_id || High || YES || |
| | 801 | || band_id || High || YES || |
| | 802 | || status || Medium || YES (combined indexes) || |
| | 803 | || event_type || Medium || Optional || |
| | 804 | || date || High || YES || |
| | 805 | |
| | 806 | Columns with high cardinality provide the best index selectivity and improve JOIN performance significantly. |
| | 807 | |
| | 808 | === 2.3 Recommended Indexes for Phase 6 Reports === |
| | 809 | |
| | 810 | ==== 2.3.1 Foreign-Key Optimization Indexes ==== |
| | 811 | |
| | 812 | {{{ |
| | 813 | #!sql |
| | 814 | CREATE INDEX idx_guest_wedding |
| | 815 | ON guest(wedding_id); |
| | 816 | |
| | 817 | CREATE INDEX idx_event_wedding |
| | 818 | ON event(wedding_id); |
| | 819 | |
| | 820 | CREATE INDEX idx_venue_booking_wedding |
| | 821 | ON venue_booking(wedding_id); |
| | 822 | |
| | 823 | CREATE INDEX idx_venue_booking_venue |
| | 824 | ON venue_booking(venue_id); |
| | 825 | |
| | 826 | CREATE INDEX idx_photographer_booking_wedding |
| | 827 | ON photographer_booking(wedding_id); |
| | 828 | |
| | 829 | CREATE INDEX idx_photographer_booking_photographer |
| | 830 | ON photographer_booking(photographer_id); |
| | 831 | |
| | 832 | CREATE INDEX idx_band_booking_wedding |
| | 833 | ON band_booking(wedding_id); |
| | 834 | |
| | 835 | CREATE INDEX idx_band_booking_band |
| | 836 | ON band_booking(band_id); |
| | 837 | |
| | 838 | CREATE INDEX idx_attendance_event |
| | 839 | ON attendance(event_id); |
| | 840 | |
| | 841 | CREATE INDEX idx_attendance_guest |
| | 842 | ON attendance(guest_id); |
| | 843 | |
| | 844 | CREATE INDEX idx_event_rsvp_guest |
| | 845 | ON event_rsvp(guest_id); |
| | 846 | |
| | 847 | CREATE INDEX idx_event_rsvp_event |
| | 848 | ON event_rsvp(event_id); |
| | 849 | }}} |
| | 850 | |
| | 851 | ==== Explanation ==== |
| | 852 | |
| | 853 | These indexes optimize: |
| | 854 | * JOIN operations |
| | 855 | * attendance aggregation |
| | 856 | * RSVP lookup |
| | 857 | * booking analysis |
| | 858 | * analytical report generation |
| | 859 | |
| | 860 | The indexes reduce: |
| | 861 | * sequential scans |
| | 862 | * join latency |
| | 863 | * aggregation preparation cost |
| | 864 | |
| | 865 | === 2.3.2 Composite Analytical Indexes ==== |
| | 866 | |
| | 867 | {{{ |
| | 868 | #!sql |
| | 869 | CREATE INDEX idx_attendance_event_status |
| | 870 | ON attendance(event_id, status); |
| | 871 | |
| | 872 | CREATE INDEX idx_event_rsvp_guest_status |
| | 873 | ON event_rsvp(guest_id, status); |
| | 874 | |
| | 875 | CREATE INDEX idx_event_wedding_date |
| | 876 | ON event(wedding_id, date); |
| | 877 | |
| | 878 | CREATE INDEX idx_venue_booking_date |
| | 879 | ON venue_booking(wedding_id, date); |
| | 880 | }}} |
| | 881 | |
| | 882 | ==== Explanation ==== |
| | 883 | |
| | 884 | Composite indexes improve analytical filtering because the Phase 6 reports frequently: |
| | 885 | * filter by status |
| | 886 | * group by wedding/event |
| | 887 | * analyze attendance by event |
| | 888 | * analyze RSVP responses by status |
| | 889 | |
| | 890 | These indexes significantly improve: |
| | 891 | * conditional aggregation |
| | 892 | * GROUP BY preparation |
| | 893 | * attendance filtering |
| | 894 | * RSVP reporting |
| | 895 | |
| | 896 | === 2.4 EXPLAIN ANALYZE Validation === |
| | 897 | |
| | 898 | The following queries can be used to validate index utilization: |
| | 899 | |
| | 900 | {{{ |
| | 901 | #!sql |
| | 902 | EXPLAIN ANALYZE |
| | 903 | SELECT * |
| | 904 | FROM attendance |
| | 905 | WHERE event_id = 1; |
| | 906 | |
| | 907 | EXPLAIN ANALYZE |
| | 908 | SELECT * |
| | 909 | FROM event_rsvp |
| | 910 | WHERE guest_id = 1; |
| | 911 | |
| | 912 | EXPLAIN ANALYZE |
| | 913 | SELECT * |
| | 914 | FROM venue_booking |
| | 915 | WHERE wedding_id = 1; |
| | 916 | }}} |
| | 917 | |
| | 918 | Expected PostgreSQL output: |
| | 919 | |
| | 920 | {{{ |
| | 921 | Index Scan using idx_attendance_event on attendance |
| | 922 | |
| | 923 | Index Scan using idx_event_rsvp_guest on event_rsvp |
| | 924 | |
| | 925 | Index Scan using idx_venue_booking_wedding on venue_booking |
| | 926 | }}} |
| | 927 | |
| | 928 | === 2.5 Optimization Benefits === |
| | 929 | |
| | 930 | The proposed indexing strategy improves: |
| | 931 | * JOIN performance |
| | 932 | * aggregation efficiency |
| | 933 | * analytical reporting speed |
| | 934 | * scalability of Phase 6 queries |
| | 935 | |
| | 936 | The indexing strategy reduces: |
| | 937 | * full table scans |
| | 938 | * unnecessary disk I/O |
| | 939 | * aggregation overhead |
| | 940 | * execution latency |
| | 941 | |
| | 942 | === 2.6 Maintenance Recommendations === |
| | 943 | |
| | 944 | To maintain stable analytical performance, the following maintenance procedures are recommended: |
| | 945 | * periodic VACUUM execution |
| | 946 | * regular ANALYZE statistics updates |
| | 947 | * monitoring unused indexes |
| | 948 | * reindexing fragmented indexes |
| | 949 | * monitoring query execution plans with EXPLAIN ANALYZE |
| | 950 | |
| | 951 | These maintenance operations help PostgreSQL preserve optimal execution plans for the analytical reports implemented in Phase 6. |
| | 952 | |
| | 953 | == 3. Caching, Partitioning & Storage Optimization == |
| | 954 | |
| | 955 | === 3.1 Caching & Analytical Optimization === |
| | 956 | |
| | 957 | The Phase 6 analytical reports execute complex aggregation queries across attendance, RSVP, booking, and event relations. |
| | 958 | |
| | 959 | As the database grows, repeatedly calculating these analytical metrics may increase execution time and server load. |
| | 960 | |
| | 961 | To improve scalability, the following optimization strategies are recommended: |
| | 962 | * materialized analytical reports |
| | 963 | * cached aggregation results |
| | 964 | * periodic statistics refresh |
| | 965 | * optimization of historical analytical workloads |
| | 966 | |
| | 967 | === 3.2 Materialized Views === |
| | 968 | |
| | 969 | Materialized views can precompute expensive analytical calculations and significantly reduce report execution time. |
| | 970 | |
| | 971 | The following analytical reports are good candidates for materialization: |
| | 972 | * Budget Analysis |
| | 973 | * Venue Capacity Utilization |
| | 974 | * RSVP Conversion Analysis |
| | 975 | |
| | 976 | ==== Example: RSVP Conversion Materialized View ==== |
| | 977 | |
| | 978 | {{{ |
| | 979 | #!sql |
| | 980 | CREATE MATERIALIZED VIEW mv_rsvp_conversion AS |
| | 981 | |
| | 982 | SELECT |
| | 983 | w.wedding_id, |
| | 984 | |
| | 985 | e.event_id, |
| | 986 | |
| | 987 | COUNT(DISTINCT g.guest_id) |
| | 988 | AS total_invitations, |
| | 989 | |
| | 990 | COUNT(DISTINCT r.response_id) |
| | 991 | AS rsvp_responses, |
| | 992 | |
| | 993 | COUNT(DISTINCT CASE |
| | 994 | WHEN r.status = 'CONFIRMED' |
| | 995 | THEN r.response_id |
| | 996 | END) AS confirmed_rsvps, |
| | 997 | |
| | 998 | COUNT(DISTINCT CASE |
| | 999 | WHEN a.status = 'ATTENDED' |
| | 1000 | THEN a.attendance_id |
| | 1001 | END) AS actual_attendees |
| | 1002 | |
| | 1003 | FROM wedding w |
| | 1004 | |
| | 1005 | INNER JOIN event e |
| | 1006 | ON w.wedding_id = e.wedding_id |
| | 1007 | |
| | 1008 | LEFT JOIN guest g |
| | 1009 | ON w.wedding_id = g.wedding_id |
| | 1010 | |
| | 1011 | LEFT JOIN event_rsvp r |
| | 1012 | ON g.guest_id = r.guest_id |
| | 1013 | AND e.event_id = r.event_id |
| | 1014 | |
| | 1015 | LEFT JOIN attendance a |
| | 1016 | ON g.guest_id = a.guest_id |
| | 1017 | AND e.event_id = a.event_id |
| | 1018 | |
| | 1019 | GROUP BY |
| | 1020 | w.wedding_id, |
| | 1021 | e.event_id; |
| | 1022 | }}} |
| | 1023 | |
| | 1024 | ==== Benefits ==== |
| | 1025 | |
| | 1026 | Materialized views improve: |
| | 1027 | * analytical query speed |
| | 1028 | * dashboard loading |
| | 1029 | * repeated reporting performance |
| | 1030 | * scalability for historical analytics |
| | 1031 | |
| | 1032 | The materialized view stores precomputed aggregation results and avoids recalculating complex JOIN operations during every report execution. |
| | 1033 | |
| | 1034 | ==== Refreshing the Materialized View ==== |
| | 1035 | |
| | 1036 | {{{ |
| | 1037 | #!sql |
| | 1038 | REFRESH MATERIALIZED VIEW mv_rsvp_conversion; |
| | 1039 | }}} |
| | 1040 | |
| | 1041 | === 3.3 Partitioning Considerations === |
| | 1042 | |
| | 1043 | The largest future analytical tables are expected to be: |
| | 1044 | * attendance |
| | 1045 | * event_rsvp |
| | 1046 | * guest |
| | 1047 | |
| | 1048 | As historical wedding data grows, partitioning may improve scalability. |
| | 1049 | |
| | 1050 | Recommended partitioning strategy: |
| | 1051 | * partition attendance by event date |
| | 1052 | * partition RSVP records by wedding date |
| | 1053 | * archive historical weddings separately |
| | 1054 | |
| | 1055 | ==== Example: Attendance Partitioning ==== |
| | 1056 | |
| | 1057 | The following example demonstrates a conceptual partitioned version of the attendance table for large-scale deployments. |
| | 1058 | |
| | 1059 | {{{ |
| | 1060 | #!sql |
| | 1061 | CREATE TABLE attendance ( |
| | 1062 | |
| | 1063 | attendance_id SERIAL, |
| | 1064 | status VARCHAR(30), |
| | 1065 | table_number INTEGER, |
| | 1066 | role VARCHAR(50), |
| | 1067 | guest_id INTEGER, |
| | 1068 | event_id INTEGER, |
| | 1069 | attendance_date DATE, |
| | 1070 | |
| | 1071 | PRIMARY KEY(attendance_id, attendance_date) |
| | 1072 | |
| | 1073 | ) PARTITION BY RANGE (attendance_date); |
| | 1074 | }}} |
| | 1075 | |
| | 1076 | ==== Benefits ==== |
| | 1077 | |
| | 1078 | Partitioning improves: |
| | 1079 | * analytical query performance |
| | 1080 | * historical data management |
| | 1081 | * maintenance operations |
| | 1082 | * VACUUM efficiency |
| | 1083 | * scalability of large attendance datasets |
| | 1084 | |
| | 1085 | === 3.4 Storage Optimization === |
| | 1086 | |
| | 1087 | To improve long-term database performance, the following storage optimizations are recommended: |
| | 1088 | * periodic VACUUM execution |
| | 1089 | * ANALYZE statistics updates |
| | 1090 | * archival of historical wedding records |
| | 1091 | * separation of analytical and transactional workloads |
| | 1092 | |
| | 1093 | These optimizations reduce: |
| | 1094 | * table fragmentation |
| | 1095 | * outdated planner statistics |
| | 1096 | * unnecessary sequential scans |
| | 1097 | * analytical execution overhead |
| | 1098 | |
| | 1099 | === 3.5 Scalability Interpretation === |
| | 1100 | |
| | 1101 | The analytical queries from Phase 6 are aggregation-heavy and become increasingly expensive as attendance and RSVP data grows. |
| | 1102 | |
| | 1103 | Caching, materialized views, and partitioning help PostgreSQL maintain predictable execution performance even when processing large analytical datasets. |
| | 1104 | |
| | 1105 | These strategies are especially important for: |
| | 1106 | * large weddings |
| | 1107 | * long-term historical reporting |
| | 1108 | * repeated dashboard analytics |
| | 1109 | * concurrent report execution |
| | 1110 | |
| | 1111 | == 4. Concurrency, Transaction Management & Locking Strategy == |
| | 1112 | |
| | 1113 | === 4.1 Transaction Isolation Levels === |
| | 1114 | |
| | 1115 | The Wedding Planner Management System includes several operations that require transactional consistency and protection from concurrent modification. |
| | 1116 | |
| | 1117 | Examples include: |
| | 1118 | * venue booking |
| | 1119 | * attendance updates |
| | 1120 | * RSVP processing |
| | 1121 | * wedding scheduling |
| | 1122 | * event creation |
| | 1123 | |
| | 1124 | The following PostgreSQL isolation levels are recommended: |
| | 1125 | |
| | 1126 | || Isolation Level || Recommended Usage || |
| | 1127 | || READ COMMITTED || Standard CRUD operations || |
| | 1128 | || REPEATABLE READ || Venue booking validation || |
| | 1129 | || SERIALIZABLE || Critical scheduling operations || |
| | 1130 | |
| | 1131 | === 4.2 Concurrency Risks === |
| | 1132 | |
| | 1133 | The most important concurrency risks identified are: |
| | 1134 | * double-booking of venues |
| | 1135 | * simultaneous RSVP modifications |
| | 1136 | * concurrent attendance updates |
| | 1137 | * overlapping event scheduling |
| | 1138 | |
| | 1139 | Without proper transaction management, multiple users may modify the same wedding-related records simultaneously. |
| | 1140 | |
| | 1141 | === 4.3 Deadlock Prevention Strategies === |
| | 1142 | |
| | 1143 | The following practices reduce deadlock risk: |
| | 1144 | * consistent transaction ordering |
| | 1145 | * short transaction duration |
| | 1146 | * indexing foreign-key columns |
| | 1147 | * avoiding unnecessary table locking |
| | 1148 | |
| | 1149 | The most sensitive operations are: |
| | 1150 | * venue reservation |
| | 1151 | * event scheduling |
| | 1152 | * attendance confirmation |
| | 1153 | |
| | 1154 | === 4.4 Safe Venue Reservation Transaction === |
| | 1155 | |
| | 1156 | The following transaction prevents double-booking of venues. |
| | 1157 | |
| | 1158 | {{{ |
| | 1159 | #!sql |
| | 1160 | BEGIN; |
| | 1161 | |
| | 1162 | SELECT venue_id |
| | 1163 | FROM venue |
| | 1164 | WHERE venue_id = 1 |
| | 1165 | FOR UPDATE; |
| | 1166 | |
| | 1167 | INSERT INTO venue_booking ( |
| | 1168 | date, |
| | 1169 | start_time, |
| | 1170 | end_time, |
| | 1171 | status, |
| | 1172 | price, |
| | 1173 | venue_id, |
| | 1174 | wedding_id |
| | 1175 | ) |
| | 1176 | SELECT |
| | 1177 | '2025-08-20', |
| | 1178 | '18:00:00', |
| | 1179 | '23:00:00', |
| | 1180 | 'CONFIRMED', |
| | 1181 | 5000.00, |
| | 1182 | 1, |
| | 1183 | 2 |
| | 1184 | |
| | 1185 | WHERE NOT EXISTS ( |
| | 1186 | |
| | 1187 | SELECT 1 |
| | 1188 | |
| | 1189 | FROM venue_booking |
| | 1190 | |
| | 1191 | WHERE venue_id = 1 |
| | 1192 | AND date = '2025-08-20' |
| | 1193 | AND status != 'CANCELLED' |
| | 1194 | ); |
| | 1195 | |
| | 1196 | COMMIT; |
| | 1197 | }}} |
| | 1198 | |
| | 1199 | === Explanation === |
| | 1200 | |
| | 1201 | `FOR UPDATE` locks the selected venue row during the transaction. |
| | 1202 | |
| | 1203 | This prevents concurrent transactions from simultaneously booking the same venue for overlapping dates. |
| | 1204 | |
| | 1205 | The `NOT EXISTS` condition ensures that no conflicting booking already exists. |
| | 1206 | |
| | 1207 | === Validation === |
| | 1208 | |
| | 1209 | {{{ |
| | 1210 | #!sql |
| | 1211 | SELECT |
| | 1212 | venue_id, |
| | 1213 | date, |
| | 1214 | COUNT(*) |
| | 1215 | |
| | 1216 | FROM venue_booking |
| | 1217 | |
| | 1218 | WHERE status != 'CANCELLED' |
| | 1219 | |
| | 1220 | GROUP BY |
| | 1221 | venue_id, |
| | 1222 | date |
| | 1223 | |
| | 1224 | HAVING COUNT(*) > 1; |
| | 1225 | }}} |
| | 1226 | |
| | 1227 | Expected result: |
| | 1228 | * empty result set |
| | 1229 | |
| | 1230 | Any returned rows indicate conflicting venue bookings. |
| | 1231 | |
| | 1232 | === 4.5 Row-Level Locking === |
| | 1233 | |
| | 1234 | The system primarily relies on: |
| | 1235 | * row-level locks |
| | 1236 | * transaction isolation |
| | 1237 | * foreign-key consistency |
| | 1238 | |
| | 1239 | Row-level locking is preferred because it: |
| | 1240 | * minimizes blocking |
| | 1241 | * improves concurrency |
| | 1242 | * prevents unnecessary table-wide locks |
| | 1243 | |
| | 1244 | === 4.6 Transaction Scalability === |
| | 1245 | |
| | 1246 | As the number of weddings and concurrent users increases, transaction management becomes increasingly important. |
| | 1247 | |
| | 1248 | The following practices improve scalability: |
| | 1249 | * keeping transactions short |
| | 1250 | * indexing transactional lookup columns |
| | 1251 | * separating analytical reports from transactional operations |
| | 1252 | * avoiding long-running locks |
| | 1253 | |
| | 1254 | === 4.7 Final Interpretation === |
| | 1255 | |
| | 1256 | The Wedding Planner Management System contains several scheduling and booking operations that require transactional consistency. |
| | 1257 | |
| | 1258 | PostgreSQL transaction isolation and row-level locking mechanisms help prevent: |
| | 1259 | * double-booking |
| | 1260 | * inconsistent RSVP updates |
| | 1261 | * concurrent attendance conflicts |
| | 1262 | * invalid scheduling states |
| | 1263 | |
| | 1264 | Proper transaction management ensures both: |
| | 1265 | * data integrity |
| | 1266 | * stable concurrent system behavior |
| | 1267 | |
| | 1268 | == 5. Performance Analysis with EXPLAIN / EXPLAIN ANALYZE == |
| | 1269 | |
| | 1270 | === 5.1 Purpose of EXPLAIN and EXPLAIN ANALYZE === |
| | 1271 | |
| | 1272 | PostgreSQL provides several commands for analyzing query execution behavior. |
| | 1273 | |
| | 1274 | || Command || Purpose || |
| | 1275 | || EXPLAIN || Shows the planned execution strategy without running the query || |
| | 1276 | || EXPLAIN ANALYZE || Executes the query and shows real execution statistics || |
| | 1277 | || EXPLAIN (ANALYZE, BUFFERS) || Shows execution statistics plus buffer/cache usage || |
| | 1278 | |
| | 1279 | In this phase, `EXPLAIN ANALYZE` is used to evaluate the complex analytical queries from Phase 6. |
| | 1280 | |
| | 1281 | It helps identify: |
| | 1282 | * scan types |
| | 1283 | * join strategies |
| | 1284 | * aggregation methods |
| | 1285 | * actual execution time |
| | 1286 | * row counts |
| | 1287 | * loops |
| | 1288 | * possible bottlenecks |
| | 1289 | |
| | 1290 | === 5.2 Important Execution Plan Elements === |
| | 1291 | |
| | 1292 | || Plan Element || Meaning || |
| | 1293 | || Seq Scan || PostgreSQL scans the entire table || |
| | 1294 | || Index Scan || PostgreSQL uses an index to locate rows faster || |
| | 1295 | || Hash Join || PostgreSQL builds a hash table to join larger datasets || |
| | 1296 | || Nested Loop || PostgreSQL repeatedly scans one relation for each row of another relation || |
| | 1297 | || HashAggregate || PostgreSQL performs grouping and aggregation using a hash table || |
| | 1298 | || Sort || PostgreSQL sorts rows for ORDER BY or aggregation operations || |
| | 1299 | |
| | 1300 | For Phase 6 reports, the most common expected elements are: |
| | 1301 | * Hash Join |
| | 1302 | * HashAggregate |
| | 1303 | * Index Scan |
| | 1304 | * Seq Scan on small tables |
| | 1305 | * Sort |
| | 1306 | |
| | 1307 | === 5.3 Sequential Scan vs Index Scan === |
| | 1308 | |
| | 1309 | A Sequential Scan is not always a problem. |
| | 1310 | |
| | 1311 | PostgreSQL may choose a Sequential Scan when: |
| | 1312 | * the table is small |
| | 1313 | * most rows are needed |
| | 1314 | * the cost of using an index is higher than scanning the table |
| | 1315 | |
| | 1316 | However, for large Phase 6 analytical tables such as: |
| | 1317 | * attendance |
| | 1318 | * event_rsvp |
| | 1319 | * guest |
| | 1320 | * booking tables |
| | 1321 | |
| | 1322 | Index Scans are preferred when filtering or joining by foreign-key columns. |
| | 1323 | |
| | 1324 | === 5.4 Example: Attendance Lookup Before Optimization === |
| | 1325 | |
| | 1326 | {{{ |
| | 1327 | #!sql |
| | 1328 | EXPLAIN ANALYZE |
| | 1329 | SELECT * |
| | 1330 | FROM attendance |
| | 1331 | WHERE event_id = 1; |
| | 1332 | }}} |
| | 1333 | |
| | 1334 | Without an index on `attendance(event_id)`, PostgreSQL may use: |
| | 1335 | |
| | 1336 | {{{ |
| | 1337 | Seq Scan on attendance |
| | 1338 | Filter: (event_id = 1) |
| | 1339 | }}} |
| | 1340 | |
| | 1341 | This means that all attendance rows are scanned before matching rows are returned. |
| | 1342 | |
| | 1343 | For large attendance tables, this increases: |
| | 1344 | * disk I/O |
| | 1345 | * CPU usage |
| | 1346 | * query latency |
| | 1347 | |
| | 1348 | === 5.5 Example: Attendance Lookup After Optimization === |
| | 1349 | |
| | 1350 | {{{ |
| | 1351 | #!sql |
| | 1352 | CREATE INDEX idx_attendance_event |
| | 1353 | ON attendance(event_id); |
| | 1354 | |
| | 1355 | EXPLAIN ANALYZE |
| | 1356 | SELECT * |
| | 1357 | FROM attendance |
| | 1358 | WHERE event_id = 1; |
| | 1359 | }}} |
| | 1360 | |
| | 1361 | Expected result: |
| | 1362 | |
| | 1363 | {{{ |
| | 1364 | Index Scan using idx_attendance_event on attendance |
| | 1365 | Index Cond: (event_id = 1) |
| | 1366 | }}} |
| | 1367 | |
| | 1368 | This confirms that PostgreSQL can directly locate attendance records for a specific event. |
| | 1369 | |
| | 1370 | === 5.6 EXPLAIN ANALYZE for Phase 6 Reports === |
| | 1371 | |
| | 1372 | The Phase 6 reports should be analyzed using EXPLAIN ANALYZE because they include: |
| | 1373 | * joins across multiple tables |
| | 1374 | * aggregation |
| | 1375 | * distinct counting |
| | 1376 | * conditional calculations |
| | 1377 | |
| | 1378 | The analysis should be performed on: |
| | 1379 | * Budget Analysis query |
| | 1380 | * Venue Capacity query |
| | 1381 | * RSVP Conversion query |
| | 1382 | |
| | 1383 | These queries represent the real analytical workload of the Wedding Planner Management System. |
| | 1384 | |
| | 1385 | === 5.7 Interpreting Execution Time === |
| | 1386 | |
| | 1387 | When reading EXPLAIN ANALYZE output, the most important values are: |
| | 1388 | * Planning Time |
| | 1389 | * Execution Time |
| | 1390 | * actual rows |
| | 1391 | * loops |
| | 1392 | * scan type |
| | 1393 | * join type |
| | 1394 | |
| | 1395 | High execution time may indicate: |
| | 1396 | * missing indexes |
| | 1397 | * inefficient JOIN order |
| | 1398 | * expensive aggregation |
| | 1399 | * large intermediate result sets |
| | 1400 | * outdated PostgreSQL statistics |
| | 1401 | |
| | 1402 | === 5.8 Validation of Index Usage === |
| | 1403 | |
| | 1404 | After creating indexes, index usage can be checked using: |
| | 1405 | |
| | 1406 | {{{ |
| | 1407 | #!sql |
| | 1408 | SELECT |
| | 1409 | indexrelname AS index_name, |
| | 1410 | idx_scan AS times_used, |
| | 1411 | idx_tup_read AS tuples_read, |
| | 1412 | idx_tup_fetch AS tuples_fetched |
| | 1413 | |
| | 1414 | FROM pg_stat_user_indexes |
| | 1415 | |
| | 1416 | WHERE indexrelname IN ( |
| | 1417 | 'idx_attendance_event', |
| | 1418 | 'idx_guest_wedding', |
| | 1419 | 'idx_event_rsvp_guest', |
| | 1420 | 'idx_venue_booking_wedding' |
| | 1421 | ); |
| | 1422 | }}} |
| | 1423 | |
| | 1424 | If `idx_scan` increases after report execution, the index is being used. |
| | 1425 | |
| | 1426 | If `idx_scan` remains 0, the index may be unused or the query planner may prefer another execution strategy. |
| | 1427 | |
| | 1428 | === 5.9 Summary === |
| | 1429 | |
| | 1430 | EXPLAIN ANALYZE is essential for validating the performance of the Phase 6 analytical reports. |
| | 1431 | |
| | 1432 | It helps confirm whether: |
| | 1433 | * indexes are used correctly |
| | 1434 | * joins are efficient |
| | 1435 | * aggregation is acceptable |
| | 1436 | * execution time is reasonable |
| | 1437 | |
| | 1438 | This makes EXPLAIN ANALYZE an important part of database performance tuning and report optimization. |
| 1057 | | |
| 1058 | | CREATE TRIGGER trg_audit_guest |
| 1059 | | AFTER INSERT OR UPDATE OR DELETE ON guest |
| 1060 | | FOR EACH ROW EXECUTE FUNCTION fn_audit_guest(); |
| 1061 | | }}} |
| 1062 | | |
| 1063 | | ==== Validation ==== |
| 1064 | | |
| 1065 | | {{{ |
| 1066 | | #!sql |
| 1067 | | -- Verify audit records are being created correctly |
| 1068 | | UPDATE guest SET dietary_preference = 'Vegan' WHERE guest_id = 55; |
| 1069 | | |
| 1070 | | SELECT table_name, operation, record_id, changed_by, changed_at, new_values |
| 1071 | | FROM audit_log |
| 1072 | | WHERE table_name = 'guest' |
| 1073 | | AND record_id = 55 |
| 1074 | | ORDER BY changed_at DESC |
| 1075 | | LIMIT 5; |
| 1076 | | }}} |
| 1077 | | |
| 1078 | | ---- |
| 1079 | | |
| 1080 | | == 7. Monitoring, Observability & Alerting == |
| 1081 | | |
| 1082 | | === 7.1 Key Database Metrics to Track === |
| 1083 | | |
| 1084 | | * Query latency percentiles: P50, P95, P99 |
| 1085 | | * Lock wait times and deadlock frequency |
| 1086 | | * Buffer cache hit ratio (target: > 99%) |
| 1087 | | * Slow query log (threshold: > 500ms) |
| 1088 | | * Index utilization ratios per table |
| 1089 | | * Connection pool saturation |
| 1090 | | |
| 1091 | | === 7.2 Useful Monitoring Queries === |
| 1092 | | |
| 1093 | | {{{ |
| 1094 | | #!sql |
| 1095 | | -- Identify the slowest queries currently running |
| 1096 | | SELECT |
| 1097 | | pid, |
| 1098 | | now() - pg_stat_activity.query_start AS duration, |
| 1099 | | query, |
| 1100 | | state |
| 1101 | | FROM pg_stat_activity |
| 1102 | | WHERE state = 'active' |
| 1103 | | AND now() - pg_stat_activity.query_start > INTERVAL '5 seconds' |
| 1104 | | ORDER BY duration DESC; |
| 1105 | | |
| 1106 | | -- Check buffer cache hit ratio (should be > 99%) |
| 1107 | | SELECT |
| 1108 | | SUM(heap_blks_hit) AS cache_hits, |
| 1109 | | SUM(heap_blks_read) AS disk_reads, |
| 1110 | | ROUND( |
| 1111 | | 100.0 * SUM(heap_blks_hit) |
| 1112 | | / NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2 |
| 1113 | | ) AS cache_hit_ratio |
| 1114 | | FROM pg_statio_user_tables; |
| 1115 | | |
| 1116 | | -- Identify unused indexes (candidates for removal) |
| 1117 | | SELECT |
| 1118 | | schemaname, |
| 1119 | | tablename, |
| 1120 | | indexname, |
| 1121 | | pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, |
| 1122 | | idx_scan |
| 1123 | | FROM pg_stat_user_indexes |
| 1124 | | WHERE idx_scan = 0 |
| 1125 | | ORDER BY pg_relation_size(indexrelid) DESC; |
| 1126 | | }}} |
| 1127 | | |
| 1128 | | === 7.3 Alerting Rules === |
| 1129 | | |
| 1130 | | Trigger alerts when: |
| 1131 | | |
| 1132 | | * Deadlock frequency exceeds 5 per minute |
| 1133 | | * Replication lag exceeds 30 seconds |
| 1134 | | * CPU utilization exceeds 80% for more than 2 minutes |
| 1135 | | * I/O queue depth exceeds defined threshold |
| 1136 | | * Buffer cache hit ratio drops below 95% |
| 1137 | | |
| 1138 | | ---- |
| 1139 | | |
| 1140 | | |
| 1141 | | == 8. Conclusion == |
| 1142 | | |
| 1143 | | Phase 9 elevates the Wedding Planner database into a high-performance, secure, and |
| 1144 | | scalable enterprise platform. By combining: |
| 1145 | | |
| 1146 | | * Targeted and well-maintained indexing (single-column, composite, partial, GIN, expression) |
| 1147 | | * Optimized JOIN ordering, early predicate filtering, and query rewrites |
| 1148 | | * Multi-tier caching and horizontal partitioning for large-scale data |
| 1149 | | * Strict, multi-layer security (encryption, RBAC, RLS, audit logging, parameterized queries) |
| 1150 | | * Full GDPR-compliant data protection practices |
| 1151 | | * Comprehensive performance analysis with `EXPLAIN ANALYZE` and real benchmarking |
| 1152 | | |
| 1153 | | The system achieves: |
| 1154 | | |
| 1155 | | * Fast, consistent record retrieval and report generation |
| 1156 | | * Secure and compliant handling of sensitive personal and financial data |
| 1157 | | * Stable and predictable performance at scale |
| 1158 | | * A solid, well-documented foundation for the enterprise-grade phases ahead |
| | 1982 | }}} |
| | 1983 | |
| | 1984 | === Example Trigger === |
| | 1985 | |
| | 1986 | {{{ |
| | 1987 | #!sql |
| | 1988 | CREATE TRIGGER trg_guest_audit |
| | 1989 | |
| | 1990 | AFTER INSERT OR UPDATE |
| | 1991 | ON guest |
| | 1992 | |
| | 1993 | FOR EACH ROW |
| | 1994 | |
| | 1995 | EXECUTE FUNCTION log_guest_changes(); |
| | 1996 | }}} |
| | 1997 | |
| | 1998 | === Validation === |
| | 1999 | |
| | 2000 | {{{ |
| | 2001 | #!sql |
| | 2002 | SELECT * |
| | 2003 | FROM audit_log; |
| | 2004 | }}} |
| | 2005 | |
| | 2006 | Expected result: |
| | 2007 | * logged INSERT and UPDATE operations on guest records |
| | 2008 | |
| | 2009 | === Security Benefits === |
| | 2010 | |
| | 2011 | Audit logging improves: |
| | 2012 | * monitoring of sensitive changes |
| | 2013 | * accountability of database operations |
| | 2014 | * recovery investigation |
| | 2015 | * detection of suspicious activity |
| | 2016 | |
| | 2017 | It is especially important for: |
| | 2018 | * guest modifications |
| | 2019 | * RSVP changes |
| | 2020 | * booking updates |
| | 2021 | * attendance management |
| | 2022 | |
| | 2023 | == 7. Final Conclusions == |
| | 2024 | |
| | 2025 | Phase 9 extends the Wedding Planner Management System with advanced database engineering concepts focused on: |
| | 2026 | * performance analysis |
| | 2027 | * query optimization |
| | 2028 | * scalability |
| | 2029 | * transaction management |
| | 2030 | * security |
| | 2031 | * analytical workload optimization |
| | 2032 | |
| | 2033 | Unlike previous phases that focused primarily on schema design and transactional functionality, this phase evaluates how the database behaves under complex analytical workloads generated by the Phase 6 reports. |
| | 2034 | |
| | 2035 | The analysis focused on: |
| | 2036 | * Budget Analysis |
| | 2037 | * Venue Capacity Utilization |
| | 2038 | * RSVP Conversion Analysis |
| | 2039 | |
| | 2040 | These reports were analyzed using: |
| | 2041 | * EXPLAIN ANALYZE |
| | 2042 | * execution-plan interpretation |
| | 2043 | * indexing strategies |
| | 2044 | * aggregation analysis |
| | 2045 | * scalability evaluation |
| | 2046 | |
| | 2047 | The identified performance bottlenecks include: |
| | 2048 | * multiple LEFT JOIN operations |
| | 2049 | * COUNT(DISTINCT ...) aggregation |
| | 2050 | * GROUP BY operations |
| | 2051 | * temporal calculations |
| | 2052 | * analytical workload complexity |
| | 2053 | |
| | 2054 | The proposed indexing strategy improves: |
| | 2055 | * JOIN efficiency |
| | 2056 | * aggregation preparation |
| | 2057 | * analytical reporting speed |
| | 2058 | * scalability for larger datasets |
| | 2059 | |
| | 2060 | The phase also introduced: |
| | 2061 | * materialized views |
| | 2062 | * partitioning strategies |
| | 2063 | * transaction isolation analysis |
| | 2064 | * row-level locking |
| | 2065 | * role-based access control |
| | 2066 | * row-level security |
| | 2067 | * encryption strategies |
| | 2068 | * audit logging |
| | 2069 | * backup and disaster recovery planning |
| | 2070 | |
| | 2071 | Together, these techniques transform the Wedding Planner Management System from a simple transactional database into a scalable analytical database platform capable of supporting: |
| | 2072 | * operational reporting |
| | 2073 | * performance monitoring |
| | 2074 | * analytical decision-making |
| | 2075 | * secure multi-user access |
| | 2076 | |
| | 2077 | == Final Technical Evaluation == |
| | 2078 | |
| | 2079 | The implementation demonstrates: |
| | 2080 | * advanced PostgreSQL usage |
| | 2081 | * analytical SQL optimization |
| | 2082 | * transaction-safe database operations |
| | 2083 | * scalable reporting architecture |
| | 2084 | * secure relational database design |
| | 2085 | |
| | 2086 | The Phase 6 analytical reports were successfully integrated into the Phase 9 performance analysis workflow, providing realistic optimization and scalability evaluation over the actual project workload. |
| | 2087 | |
| | 2088 | == Final Notes == |
| | 2089 | |
| | 2090 | All optimization strategies, indexes, security mechanisms, and execution analyses were designed and validated using PostgreSQL 15. |
| | 2091 | |
| | 2092 | The implementation illustrates how modern relational database systems support both: |
| | 2093 | * transactional processing |
| | 2094 | * analytical business intelligence workloads |
| | 2095 | |
| | 2096 | within a unified Wedding Planner Management System. |