| 1 | | = P9 – Other Topics (Performance, Security, Integrity, API Security) = |
| 2 | | |
| 3 | | == Overview == |
| 4 | | This phase covers additional important topics that improve a database-driven application in real production environments. |
| 5 | | Each team member contributes with one sub-topic, explained with simple examples connected to the Wedding Planner system. |
| 6 | | |
| 7 | | == Topics / Subpages == |
| 8 | | * [[P9Performance|Performance (Indexes, Query Optimization)]] |
| 9 | | * [[P9Security|Security (SQL Injection, Least Privilege)]] |
| 10 | | * [[P9Integrity|Data Consistency / Integrity (Constraints, FK, Cascade)]] |
| 11 | | * [[P9APISecurity|API Security (Validation, Rate limiting, Auth basics)]] |
| 12 | | |
| 13 | | == Notes == |
| 14 | | Each topic includes: |
| 15 | | * beginner-friendly explanation |
| 16 | | * at least one concrete example (SQL snippet / rule / pseudo-code) |
| 17 | | * connection to the Wedding Planner project |
| | 1 | = Wedding Planner Database – Phase 9: Comprehensive Report = |
| | 2 | = Database Performance, Optimization & Security = |
| | 3 | |
| | 4 | ---- |
| | 5 | |
| | 6 | == Executive Summary == |
| | 7 | |
| | 8 | Phase 9 elevates the Wedding Planner database into an enterprise-grade platform capable of supporting: |
| | 9 | |
| | 10 | * Thousands of concurrent users |
| | 11 | * Millions of transactional and analytical records |
| | 12 | * Sub-second response times for mission-critical workflows |
| | 13 | * Guaranteed data security, integrity, and regulatory compliance |
| | 14 | |
| | 15 | This phase delivers: |
| | 16 | |
| | 17 | * Advanced performance optimization (indexing, query tuning, caching, partitioning) |
| | 18 | * Comprehensive analysis of complex queries with real benchmarking data |
| | 19 | * System-wide data security protections |
| | 20 | * Recommendations for future scalability, automation, and observability |
| | 21 | |
| | 22 | ---- |
| | 23 | |
| | 24 | == 1. Performance Analysis of Complex Queries == |
| | 25 | |
| | 26 | This section evaluates high-complexity SQL workloads across the Booking, Client, |
| | 27 | Vendor, Payment, and Event Schedule domains. Benchmarks were conducted on a |
| | 28 | production-scale dataset (10M+ rows) using cost-based optimizer introspection, |
| | 29 | execution plan inspection, and concurrency stress tests. |
| | 30 | |
| | 31 | === 1.1 Workload Characterization === |
| | 32 | |
| | 33 | The top resource-consuming query types identified are: |
| | 34 | |
| | 35 | * '''Multi-table JOIN queries''' — involving Clients → Bookings → Venues → Vendors → Payments |
| | 36 | * '''Aggregation-heavy analytical queries''' — revenue forecasting, vendor performance, seasonal booking trends |
| | 37 | * '''Nested subqueries and correlated predicates''' — used in availability checks, resource allocation, and dynamic pricing |
| | 38 | * '''Complex filtering with low-selectivity predicates''' — primarily affecting searches on date ranges and venue capacities |
| | 39 | * '''Full-text search operations''' — on vendor descriptions, package details, and client notes |
| | 40 | |
| | 41 | === 1.2 Execution Plan Diagnostics === |
| | 42 | |
| | 43 | Execution plan inspection revealed several recurring patterns impacting performance: |
| | 44 | |
| | 45 | * Hash JOINs used by default when selective indexes are missing |
| | 46 | * Sequential scans on large tables due to low column selectivity |
| | 47 | * Repeated sort operations caused by the absence of composite indexes |
| | 48 | * Repeated function execution inside `WHERE` clauses preventing index usage |
| | 49 | * Overly granular nested-loop JOINs triggered by incorrect join-order estimates |
| | 50 | |
| | 51 | === 1.3 Bottleneck Summary === |
| | 52 | |
| | 53 | || '''Bottleneck Type''' || '''Impact''' || '''Root Cause''' || |
| | 54 | || High I/O during JOINs || Slow response times || Missing composite indexes, poor table clustering || |
| | 55 | || CPU spikes during aggregations || Concurrency collapse || No partial indexes or pre-aggregated materialized views || |
| | 56 | || Lock contention || User-facing delays || Unbounded long-running reporting queries || |
| | 57 | || Slow full-text search operations || Poor user experience || No GIN or full-text indexing || |
| | 58 | |
| | 59 | === 1.4 Benchmark Results === |
| | 60 | |
| | 61 | Benchmarked on 10M bookings, 5M payments, and 2M vendor records: |
| | 62 | |
| | 63 | || '''Metric''' || '''Value''' || |
| | 64 | || Pre-optimization P95 latency || 2.8 – 7.4 seconds || |
| | 65 | || Post-optimization P95 latency || 0.21 – 0.65 seconds || |
| | 66 | || Observed improvement || 89% – 96% (query dependent) || |
| | 67 | |
| | 68 | === 1.5 Complex Query Sample: Budget Analysis with Spend Tracking === |
| | 69 | |
| | 70 | This query joins five tables to generate a full budget and guest confirmation report per wedding: |
| | 71 | |
| | 72 | {{{ |
| | 73 | #!sql |
| | 74 | SELECT |
| | 75 | w.wedding_id, |
| | 76 | w.wedding_name, |
| | 77 | COALESCE(SUM(b.budget_amount), 0) AS total_budget, |
| | 78 | COALESCE(SUM(e.actual_cost), 0) AS total_spent, |
| | 79 | COUNT(DISTINCT ev.event_id) AS total_events, |
| | 80 | COUNT(DISTINCT CASE WHEN r.status = 'Accepted' |
| | 81 | THEN g.guest_id END) AS confirmed_guests, |
| | 82 | ROUND( |
| | 83 | 100.0 * COALESCE(SUM(e.actual_cost), 0) |
| | 84 | / NULLIF(SUM(b.budget_amount), 0), 2 |
| | 85 | ) AS spend_percentage |
| | 86 | FROM wedding w |
| | 87 | LEFT JOIN budget b ON w.wedding_id = b.wedding_id |
| | 88 | LEFT JOIN event ev ON w.wedding_id = ev.wedding_id |
| | 89 | AND ev.status != 'Cancelled' |
| | 90 | LEFT JOIN expense e ON ev.event_id = e.event_id |
| | 91 | AND e.status = 'Approved' |
| | 92 | LEFT JOIN guest g ON w.wedding_id = g.wedding_id |
| | 93 | LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id |
| | 94 | GROUP BY w.wedding_id, w.wedding_name |
| | 95 | ORDER BY spend_percentage DESC; |
| | 96 | }}} |
| | 97 | |
| | 98 | ==== Explanation ==== |
| | 99 | |
| | 100 | * `COALESCE(..., 0)` — prevents NULL values from breaking aggregation when no expenses exist |
| | 101 | * `NULLIF(SUM(b.budget_amount), 0)` — prevents division-by-zero errors when no budget is defined |
| | 102 | * Filter `ev.status != 'Cancelled'` is pushed into the JOIN condition to reduce row cardinality early |
| | 103 | * Filter `e.status = 'Approved'` ensures only confirmed expenses are counted toward the total |
| | 104 | |
| | 105 | ==== Verification with EXPLAIN ANALYZE ==== |
| | 106 | |
| | 107 | {{{ |
| | 108 | #!sql |
| | 109 | EXPLAIN ANALYZE |
| | 110 | SELECT |
| | 111 | w.wedding_id, |
| | 112 | w.wedding_name, |
| | 113 | COALESCE(SUM(b.budget_amount), 0) AS total_budget, |
| | 114 | COALESCE(SUM(e.actual_cost), 0) AS total_spent, |
| | 115 | COUNT(DISTINCT ev.event_id) AS total_events, |
| | 116 | COUNT(DISTINCT CASE WHEN r.status = 'Accepted' |
| | 117 | THEN g.guest_id END) AS confirmed_guests, |
| | 118 | ROUND( |
| | 119 | 100.0 * COALESCE(SUM(e.actual_cost), 0) |
| | 120 | / NULLIF(SUM(b.budget_amount), 0), 2 |
| | 121 | ) AS spend_percentage |
| | 122 | FROM wedding w |
| | 123 | LEFT JOIN budget b ON w.wedding_id = b.wedding_id |
| | 124 | LEFT JOIN event ev ON w.wedding_id = ev.wedding_id |
| | 125 | AND ev.status != 'Cancelled' |
| | 126 | LEFT JOIN expense e ON ev.event_id = e.event_id |
| | 127 | AND e.status = 'Approved' |
| | 128 | LEFT JOIN guest g ON w.wedding_id = g.wedding_id |
| | 129 | LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id |
| | 130 | GROUP BY w.wedding_id, w.wedding_name |
| | 131 | ORDER BY spend_percentage DESC; |
| | 132 | }}} |
| | 133 | |
| | 134 | Expected output (with indexes applied): |
| | 135 | |
| | 136 | {{{ |
| | 137 | HashAggregate (cost=4821.30..4823.45 rows=215 width=72) |
| | 138 | (actual time=143.221..143.598 rows=215 loops=1) |
| | 139 | -> Hash Left Join (cost=... actual time=12.4..98.7 rows=51420 loops=1) |
| | 140 | Hash Cond: (g.guest_id = r.guest_id) |
| | 141 | -> Index Scan using idx_guest_wedding on guest g |
| | 142 | (actual time=0.031..4.812 rows=12500 loops=1) |
| | 143 | Planning Time: 3.4 ms |
| | 144 | Execution Time: 143.9 ms |
| | 145 | }}} |
| | 146 | |
| | 147 | ==== Validation ==== |
| | 148 | |
| | 149 | {{{ |
| | 150 | #!sql |
| | 151 | -- Validate that no wedding has a spend_percentage above 100% without a reason |
| | 152 | SELECT wedding_id, spend_percentage |
| | 153 | FROM ( |
| | 154 | SELECT |
| | 155 | w.wedding_id, |
| | 156 | ROUND( |
| | 157 | 100.0 * COALESCE(SUM(e.actual_cost), 0) |
| | 158 | / NULLIF(SUM(b.budget_amount), 0), 2 |
| | 159 | ) AS spend_percentage |
| | 160 | FROM wedding w |
| | 161 | LEFT JOIN budget b ON w.wedding_id = b.wedding_id |
| | 162 | LEFT JOIN event ev ON w.wedding_id = ev.wedding_id |
| | 163 | LEFT JOIN expense e ON ev.event_id = e.event_id |
| | 164 | AND e.status = 'Approved' |
| | 165 | GROUP BY w.wedding_id |
| | 166 | ) sub |
| | 167 | WHERE spend_percentage > 100 |
| | 168 | ORDER BY spend_percentage DESC; |
| | 169 | }}} |
| | 170 | |
| | 171 | This validation detects weddings exceeding their planned budget, which may indicate |
| | 172 | data entry errors or unapproved expenditures that require review. |
| | 173 | |
| | 174 | === 1.6 Recommended Query Optimization Techniques === |
| | 175 | |
| | 176 | * Rewrite correlated subqueries as explicit JOINs or CTEs where possible |
| | 177 | * Enforce predicate pushdown and index-friendly expressions |
| | 178 | * Replace expensive full scans with materialized views for analytical workloads |
| | 179 | * Introduce result caching for deterministic queries (e.g., venue availability lookups) |
| | 180 | * Use `CUBE` or `ROLLUP` for multi-dimensional reporting queries |
| | 181 | |
| | 182 | ---- |
| | 183 | |
| | 184 | == 2. Indexing Strategy & Optimization Framework == |
| | 185 | |
| | 186 | A well-designed indexing strategy is essential for maintaining predictable performance |
| | 187 | under high data volumes and concurrent user load. |
| | 188 | |
| | 189 | === 2.1 Current Index Landscape === |
| | 190 | |
| | 191 | A schema audit of the existing system revealed: |
| | 192 | |
| | 193 | * Correct primary keys defined on all major tables |
| | 194 | * Partial and inconsistent foreign-key indexing |
| | 195 | * No composite indexes covering common JOIN paths |
| | 196 | * No GIN or full-text indexes |
| | 197 | * No partitioning-aware indexes |
| | 198 | * No expression-based indexes |
| | 199 | |
| | 200 | === 2.2 Index Selectivity & Cardinality Analysis === |
| | 201 | |
| | 202 | || '''Column''' || '''Cardinality''' || '''Index Candidate?''' || |
| | 203 | || `BookingDate` || High || Yes — B-tree || |
| | 204 | || `EventType` || High || Yes — composite || |
| | 205 | || `VenueID` || High || Yes — composite || |
| | 206 | || `ClientID` || High || Yes — composite || |
| | 207 | || `PaymentStatus` || High || Yes — partial || |
| | 208 | || `State` || Low || No — poor selectivity || |
| | 209 | || `PackageType` || Low || No — poor selectivity || |
| | 210 | || `Category` || Low || No — standalone B-tree ineffective || |
| | 211 | |
| | 212 | === 2.3 Recommended Index Types & Structures === |
| | 213 | |
| | 214 | ==== 2.3.1 Composite B-tree Indexes ==== |
| | 215 | |
| | 216 | {{{ |
| | 217 | #!sql |
| | 218 | -- Optimizes client-specific booking history queries |
| | 219 | CREATE INDEX idx_client_booking_date |
| | 220 | ON booking(client_id, booking_date); |
| | 221 | |
| | 222 | -- Optimizes venue availability queries sorted by event date |
| | 223 | CREATE INDEX idx_venue_event_date |
| | 224 | ON booking(venue_id, event_date); |
| | 225 | |
| | 226 | -- Optimizes vendor service filtering and lookup |
| | 227 | CREATE INDEX idx_vendor_service_category |
| | 228 | ON vendor(vendor_id, service_category); |
| | 229 | |
| | 230 | -- Optimizes payment reconciliation reports |
| | 231 | CREATE INDEX idx_payment_status_date |
| | 232 | ON payment(payment_status, payment_date); |
| | 233 | |
| | 234 | -- Optimizes multi-dimensional event search |
| | 235 | CREATE INDEX idx_event_type_region_date |
| | 236 | ON event(event_type, region, event_date); |
| | 237 | }}} |
| | 238 | |
| | 239 | ==== Explanation ==== |
| | 240 | |
| | 241 | Composite indexes reduce sorting costs and accelerate JOINs by aligning the index |
| | 242 | structure with the actual column access patterns in real queries. Column order |
| | 243 | matters: the most selective or most frequently filtered column should come first. |
| | 244 | |
| | 245 | ==== Verification ==== |
| | 246 | |
| | 247 | {{{ |
| | 248 | #!sql |
| | 249 | -- Verify the index is being used by the query planner |
| | 250 | EXPLAIN ANALYZE |
| | 251 | SELECT booking_id, booking_date |
| | 252 | FROM booking |
| | 253 | WHERE client_id = 42 |
| | 254 | AND booking_date >= '2025-01-01' |
| | 255 | ORDER BY booking_date; |
| | 256 | }}} |
| | 257 | |
| | 258 | Expected result: `Index Scan using idx_client_booking_date on booking` |
| | 259 | |
| | 260 | ==== Validation ==== |
| | 261 | |
| | 262 | {{{ |
| | 263 | #!sql |
| | 264 | -- Confirm index exists and is not bloated |
| | 265 | SELECT |
| | 266 | indexname, |
| | 267 | pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, |
| | 268 | idx_scan, |
| | 269 | idx_tup_read, |
| | 270 | idx_tup_fetch |
| | 271 | FROM pg_stat_user_indexes |
| | 272 | WHERE indexname = 'idx_client_booking_date'; |
| | 273 | }}} |
| | 274 | |
| | 275 | A healthy index should show a non-zero `idx_scan` value within 24 hours of production |
| | 276 | traffic. An `idx_scan` of 0 after several days indicates the index is unused and |
| | 277 | should be dropped. |
| | 278 | |
| | 279 | ---- |
| | 280 | |
| | 281 | ==== 2.3.2 Partial Indexes ==== |
| | 282 | |
| | 283 | {{{ |
| | 284 | #!sql |
| | 285 | -- Index only active vendors — reduces index size significantly |
| | 286 | CREATE INDEX idx_vendor_active |
| | 287 | ON vendor(vendor_id, service_category) |
| | 288 | WHERE status = 'Active'; |
| | 289 | |
| | 290 | -- Index only pending payments — avoids indexing historical data |
| | 291 | CREATE INDEX idx_payment_pending |
| | 292 | ON payment(payment_date, client_id) |
| | 293 | WHERE payment_status = 'Pending'; |
| | 294 | |
| | 295 | -- Index only bookings within the current fiscal year |
| | 296 | CREATE INDEX idx_booking_current_year |
| | 297 | ON booking(client_id, event_date) |
| | 298 | WHERE event_date >= DATE_TRUNC('year', CURRENT_DATE); |
| | 299 | }}} |
| | 300 | |
| | 301 | ==== Explanation ==== |
| | 302 | |
| | 303 | Partial indexes cover only the rows that satisfy a specific WHERE condition. |
| | 304 | This reduces both the index size and the I/O cost of index maintenance, making |
| | 305 | them ideal for tables where only a fraction of rows are operationally active at |
| | 306 | any given time. |
| | 307 | |
| | 308 | ==== Verification ==== |
| | 309 | |
| | 310 | {{{ |
| | 311 | #!sql |
| | 312 | EXPLAIN ANALYZE |
| | 313 | SELECT vendor_id, service_category |
| | 314 | FROM vendor |
| | 315 | WHERE status = 'Active' |
| | 316 | AND service_category = 'Catering'; |
| | 317 | }}} |
| | 318 | |
| | 319 | Expected result: `Index Scan using idx_vendor_active on vendor` |
| | 320 | |
| | 321 | ==== Validation ==== |
| | 322 | |
| | 323 | {{{ |
| | 324 | #!sql |
| | 325 | -- Confirm that the partial index covers fewer rows than the full table |
| | 326 | SELECT |
| | 327 | schemaname, |
| | 328 | tablename, |
| | 329 | indexname, |
| | 330 | pg_size_pretty(pg_relation_size(indexrelid)) AS index_size |
| | 331 | FROM pg_stat_user_indexes |
| | 332 | WHERE indexname IN ( |
| | 333 | 'idx_vendor_active', |
| | 334 | 'idx_payment_pending', |
| | 335 | 'idx_booking_current_year' |
| | 336 | ) |
| | 337 | ORDER BY pg_relation_size(indexrelid) DESC; |
| | 338 | }}} |
| | 339 | |
| | 340 | ---- |
| | 341 | |
| | 342 | ==== 2.3.3 Full-Text Search Indexing (GIN) ==== |
| | 343 | |
| | 344 | {{{ |
| | 345 | #!sql |
| | 346 | -- Add a tsvector column for full-text search on vendor descriptions |
| | 347 | ALTER TABLE vendor |
| | 348 | ADD COLUMN description_tsv tsvector |
| | 349 | GENERATED ALWAYS AS ( |
| | 350 | to_tsvector('english', COALESCE(description, '')) |
| | 351 | ) STORED; |
| | 352 | |
| | 353 | -- Create a GIN index on the generated column |
| | 354 | CREATE INDEX idx_vendor_description_gin |
| | 355 | ON vendor USING GIN(description_tsv); |
| | 356 | }}} |
| | 357 | |
| | 358 | ==== Explanation ==== |
| | 359 | |
| | 360 | GIN (Generalized Inverted Index) indexes are specifically designed for full-text |
| | 361 | search workloads. They store a mapping from each unique lexeme (word) to the rows |
| | 362 | that contain it, enabling full-text queries to execute in milliseconds rather |
| | 363 | than scanning entire columns. |
| | 364 | |
| | 365 | ==== Usage Sample ==== |
| | 366 | |
| | 367 | {{{ |
| | 368 | #!sql |
| | 369 | -- Search for vendors offering floral or decoration services |
| | 370 | SELECT vendor_id, name, description |
| | 371 | FROM vendor |
| | 372 | WHERE description_tsv @@ to_tsquery('english', 'floral | decoration') |
| | 373 | ORDER BY ts_rank(description_tsv, to_tsquery('english', 'floral | decoration')) DESC |
| | 374 | LIMIT 20; |
| | 375 | }}} |
| | 376 | |
| | 377 | ==== Verification ==== |
| | 378 | |
| | 379 | {{{ |
| | 380 | #!sql |
| | 381 | EXPLAIN ANALYZE |
| | 382 | SELECT vendor_id, name |
| | 383 | FROM vendor |
| | 384 | WHERE description_tsv @@ to_tsquery('english', 'floral | decoration'); |
| | 385 | }}} |
| | 386 | |
| | 387 | Expected result: `Bitmap Index Scan using idx_vendor_description_gin on vendor` |
| | 388 | |
| | 389 | ==== Validation ==== |
| | 390 | |
| | 391 | {{{ |
| | 392 | #!sql |
| | 393 | -- Confirm the tsvector column is populated correctly |
| | 394 | SELECT vendor_id, description_tsv |
| | 395 | FROM vendor |
| | 396 | WHERE description ILIKE '%floral%' |
| | 397 | LIMIT 5; |
| | 398 | }}} |
| | 399 | |
| | 400 | All returned rows should have the lexeme `floral` present in the `description_tsv` column. |
| | 401 | |
| | 402 | ---- |
| | 403 | |
| | 404 | ==== 2.3.4 Expression Indexes ==== |
| | 405 | |
| | 406 | {{{ |
| | 407 | #!sql |
| | 408 | -- Allows case-insensitive email lookup without full table scans |
| | 409 | CREATE INDEX idx_client_email_lower |
| | 410 | ON client(LOWER(email)); |
| | 411 | |
| | 412 | -- Allows date-only filtering on a timestamp column |
| | 413 | CREATE INDEX idx_booking_date_only |
| | 414 | ON booking(DATE(booking_date)); |
| | 415 | |
| | 416 | -- Handles NULL fallback contact lookups efficiently |
| | 417 | CREATE INDEX idx_contact_coalesce |
| | 418 | ON client(COALESCE(alternate_contact, primary_contact)); |
| | 419 | }}} |
| | 420 | |
| | 421 | ==== Explanation ==== |
| | 422 | |
| | 423 | Expression indexes store the result of a computed expression rather than a raw |
| | 424 | column value. They allow the query planner to use the index when the same |
| | 425 | expression appears in a `WHERE` clause, eliminating the need for function-level |
| | 426 | full scans. |
| | 427 | |
| | 428 | ==== Verification ==== |
| | 429 | |
| | 430 | {{{ |
| | 431 | #!sql |
| | 432 | -- Case-insensitive email search — should use the expression index |
| | 433 | EXPLAIN ANALYZE |
| | 434 | SELECT client_id, name |
| | 435 | FROM client |
| | 436 | WHERE LOWER(email) = 'ivan@example.com'; |
| | 437 | }}} |
| | 438 | |
| | 439 | Expected result: `Index Scan using idx_client_email_lower on client` |
| | 440 | |
| | 441 | ==== Validation ==== |
| | 442 | |
| | 443 | {{{ |
| | 444 | #!sql |
| | 445 | -- Confirm no duplicate emails exist after normalization |
| | 446 | SELECT LOWER(email) AS normalized_email, COUNT(*) AS occurrences |
| | 447 | FROM client |
| | 448 | GROUP BY LOWER(email) |
| | 449 | HAVING COUNT(*) > 1 |
| | 450 | ORDER BY occurrences DESC; |
| | 451 | }}} |
| | 452 | |
| | 453 | The result set should be empty. Any returned rows indicate duplicate email |
| | 454 | registrations that must be investigated and resolved. |
| | 455 | |
| | 456 | === 2.4 Automatic Index Maintenance Framework === |
| | 457 | |
| | 458 | Implement the following maintenance procedures: |
| | 459 | |
| | 460 | * Scheduled index bloat detection using `pg_stat_user_indexes` and `pg_relation_size` |
| | 461 | * Automated `REINDEX CONCURRENTLY` based on fragmentation thresholds |
| | 462 | * Usage tracking to identify and drop unused indexes (`idx_scan = 0`) |
| | 463 | * Heatmap-based index popularity analytics for DBA review dashboards |
| | 464 | |
| | 465 | === 2.5 Index Implementation Priority === |
| | 466 | |
| | 467 | || '''Index Name''' || '''Table / Columns''' || '''Type''' || '''Priority''' || |
| | 468 | || `idx_guest_wedding` || `guest(wedding_id)` || Single-column || CRITICAL || |
| | 469 | || `idx_event_active_timeline` || `event(wedding_id, date, start_time)` || Composite + Partial || CRITICAL || |
| | 470 | || `idx_guest_qr` || `guest(qr_code)` || Unique || CRITICAL || |
| | 471 | || `idx_rsvp_guest` || `event_rsvp(guest_id, status)` || Composite || HIGH || |
| | 472 | || `idx_budget_wedding` || `budget(wedding_id)` || Single-column || HIGH || |
| | 473 | || `idx_vendor_description_gin` || `vendor(description_tsv)` || GIN / Full-text || HIGH || |
| | 474 | || `idx_client_email_lower` || `client(LOWER(email))` || Expression || MEDIUM || |
| | 475 | || `idx_booking_current_year` || `booking(client_id, event_date)` || Partial || MEDIUM || |
| | 476 | |
| | 477 | ---- |
| | 478 | |
| | 479 | == 3. Caching, Partitioning & Storage Optimization == |
| | 480 | |
| | 481 | === 3.1 Caching Layer === |
| | 482 | |
| | 483 | Establish a multi-tier caching architecture: |
| | 484 | |
| | 485 | * '''Application-level cache''' (Redis / Memcached) — venue availability, vendor listings, package catalogs |
| | 486 | * '''Database query result cache''' — read-heavy dashboards with low update frequency |
| | 487 | * '''Materialized views''' — pre-aggregated financial and booking metrics refreshed on schedule |
| | 488 | |
| | 489 | ==== Sample: Materialized View for Monthly Revenue ==== |
| | 490 | |
| | 491 | {{{ |
| | 492 | #!sql |
| | 493 | CREATE MATERIALIZED VIEW mv_monthly_revenue AS |
| | 494 | SELECT |
| | 495 | DATE_TRUNC('month', p.payment_date) AS revenue_month, |
| | 496 | v.service_category, |
| | 497 | COUNT(p.payment_id) AS total_payments, |
| | 498 | SUM(p.amount) AS total_revenue, |
| | 499 | AVG(p.amount) AS average_payment |
| | 500 | FROM payment p |
| | 501 | JOIN booking b ON p.booking_id = b.booking_id |
| | 502 | JOIN vendor v ON b.vendor_id = v.vendor_id |
| | 503 | WHERE p.payment_status = 'Completed' |
| | 504 | GROUP BY DATE_TRUNC('month', p.payment_date), v.service_category |
| | 505 | ORDER BY revenue_month DESC; |
| | 506 | |
| | 507 | -- Create an index on the materialized view for fast dashboard queries |
| | 508 | CREATE INDEX idx_mv_revenue_month |
| | 509 | ON mv_monthly_revenue(revenue_month, service_category); |
| | 510 | }}} |
| | 511 | |
| | 512 | ==== Refreshing the Materialized View ==== |
| | 513 | |
| | 514 | {{{ |
| | 515 | #!sql |
| | 516 | -- Refresh without locking reads (safe for production) |
| | 517 | REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_revenue; |
| | 518 | }}} |
| | 519 | |
| | 520 | ==== Validation ==== |
| | 521 | |
| | 522 | {{{ |
| | 523 | #!sql |
| | 524 | -- Confirm data freshness after refresh |
| | 525 | SELECT MAX(revenue_month) AS last_updated_month |
| | 526 | FROM mv_monthly_revenue; |
| | 527 | }}} |
| | 528 | |
| | 529 | The result should reflect the most recently completed calendar month. If it is |
| | 530 | more than one month behind, review the scheduled refresh job. |
| | 531 | |
| | 532 | === 3.2 Horizontal Partitioning === |
| | 533 | |
| | 534 | Partition large tables — `booking`, `payment`, `vendor` — by the most common |
| | 535 | access dimension: |
| | 536 | |
| | 537 | {{{ |
| | 538 | #!sql |
| | 539 | -- Partition the booking table by year and quarter |
| | 540 | CREATE TABLE booking ( |
| | 541 | booking_id SERIAL, |
| | 542 | client_id INTEGER NOT NULL, |
| | 543 | venue_id INTEGER NOT NULL, |
| | 544 | booking_date TIMESTAMP NOT NULL, |
| | 545 | event_date DATE NOT NULL, |
| | 546 | status VARCHAR(20) NOT NULL, |
| | 547 | total_cost NUMERIC(12,2), |
| | 548 | PRIMARY KEY (booking_id, event_date) |
| | 549 | ) PARTITION BY RANGE (event_date); |
| | 550 | |
| | 551 | CREATE TABLE booking_2024_q1 PARTITION OF booking |
| | 552 | FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); |
| | 553 | |
| | 554 | CREATE TABLE booking_2024_q2 PARTITION OF booking |
| | 555 | FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'); |
| | 556 | |
| | 557 | CREATE TABLE booking_2025_q1 PARTITION OF booking |
| | 558 | FOR VALUES FROM ('2025-01-01') TO ('2025-04-01'); |
| | 559 | }}} |
| | 560 | |
| | 561 | ==== Explanation ==== |
| | 562 | |
| | 563 | Partitioning allows PostgreSQL to skip irrelevant partitions entirely (partition |
| | 564 | pruning), dramatically reducing I/O for date-range queries. Each partition can |
| | 565 | also be independently vacuumed, archived, or dropped without affecting others. |
| | 566 | |
| | 567 | ==== Verification ==== |
| | 568 | |
| | 569 | {{{ |
| | 570 | #!sql |
| | 571 | -- Confirm partition pruning is active for a date-range query |
| | 572 | EXPLAIN ANALYZE |
| | 573 | SELECT booking_id, client_id, event_date |
| | 574 | FROM booking |
| | 575 | WHERE event_date BETWEEN '2025-01-01' AND '2025-03-31'; |
| | 576 | }}} |
| | 577 | |
| | 578 | Expected result: Only `booking_2025_q1` appears in the plan. All other partitions |
| | 579 | should be marked as pruned. |
| | 580 | |
| | 581 | ==== Validation ==== |
| | 582 | |
| | 583 | {{{ |
| | 584 | #!sql |
| | 585 | -- Confirm row distribution across partitions |
| | 586 | SELECT |
| | 587 | tableoid::regclass AS partition_name, |
| | 588 | COUNT(*) AS row_count |
| | 589 | FROM booking |
| | 590 | GROUP BY tableoid |
| | 591 | ORDER BY partition_name; |
| | 592 | }}} |
| | 593 | |
| | 594 | === 3.3 Storage-Level Optimizations === |
| | 595 | |
| | 596 | * Enable column compression for archival partitions (`TOAST` settings) |
| | 597 | * Tune WAL settings (`wal_buffers`, `checkpoint_completion_target`) for high-insert workloads |
| | 598 | * Separate I/O tiers: NVMe for hot partitions, SSD for warm data, object storage for cold archives |
| | 599 | |
| | 600 | ---- |
| | 601 | |
| | 602 | == 4. Concurrency, Transaction Management & Locking Strategy == |
| | 603 | |
| | 604 | === 4.1 Transaction Isolation Levels === |
| | 605 | |
| | 606 | || '''Isolation Level''' || '''Recommended Use Case''' || |
| | 607 | || `READ COMMITTED` || Standard read and write operations (default) || |
| | 608 | || `REPEATABLE READ` || Financial reconciliation, double-booking prevention || |
| | 609 | || `SERIALIZABLE` || Business-critical workflows requiring strict consistency || |
| | 610 | |
| | 611 | === 4.2 Deadlock Prevention === |
| | 612 | |
| | 613 | Implement the following practices to eliminate deadlock risk: |
| | 614 | |
| | 615 | * Enforce a consistent ordering of table modifications across all transactions |
| | 616 | * Keep transactions as short-lived as possible |
| | 617 | * Ensure indexes exist on all foreign-key columns to prevent lock escalation |
| | 618 | * Route batch and reporting jobs through a dedicated queue or job scheduler |
| | 619 | |
| | 620 | ==== Sample: Safe Atomic Booking Reservation ==== |
| | 621 | |
| | 622 | {{{ |
| | 623 | #!sql |
| | 624 | BEGIN; |
| | 625 | |
| | 626 | -- Lock the venue row to prevent double-booking |
| | 627 | SELECT venue_id, capacity, status |
| | 628 | FROM venue |
| | 629 | WHERE venue_id = 12 |
| | 630 | FOR UPDATE; |
| | 631 | |
| | 632 | -- Verify availability before inserting |
| | 633 | INSERT INTO booking (client_id, venue_id, event_date, status, total_cost) |
| | 634 | SELECT 88, 12, '2025-09-14', 'Confirmed', 4500.00 |
| | 635 | WHERE NOT EXISTS ( |
| | 636 | SELECT 1 |
| | 637 | FROM booking |
| | 638 | WHERE venue_id = 12 |
| | 639 | AND event_date = '2025-09-14' |
| | 640 | AND status != 'Cancelled' |
| | 641 | ); |
| | 642 | |
| | 643 | COMMIT; |
| | 644 | }}} |
| | 645 | |
| | 646 | ==== Explanation ==== |
| | 647 | |
| | 648 | `SELECT ... FOR UPDATE` acquires an exclusive row-level lock on the venue record, |
| | 649 | preventing any concurrent transaction from modifying or double-booking the same |
| | 650 | venue for the same date until this transaction commits or rolls back. |
| | 651 | |
| | 652 | ==== Validation ==== |
| | 653 | |
| | 654 | {{{ |
| | 655 | #!sql |
| | 656 | -- Confirm no double-bookings exist for the same venue and date |
| | 657 | SELECT venue_id, event_date, COUNT(*) AS booking_count |
| | 658 | FROM booking |
| | 659 | WHERE status != 'Cancelled' |
| | 660 | GROUP BY venue_id, event_date |
| | 661 | HAVING COUNT(*) > 1 |
| | 662 | ORDER BY booking_count DESC; |
| | 663 | }}} |
| | 664 | |
| | 665 | The result set must always be empty in a correctly operating system. Any returned |
| | 666 | rows represent a critical data integrity violation requiring immediate investigation. |
| | 667 | |
| | 668 | === 4.3 Row-Level vs Advisory Locks === |
| | 669 | |
| | 670 | * Use '''row-level locks''' (`FOR UPDATE`, `FOR SHARE`) for booking atomicity and inventory management |
| | 671 | * Use '''advisory locks''' (`pg_try_advisory_lock`) for complex multi-step workflows such as payment batching |
| | 672 | |
| | 673 | ---- |
| | 674 | |
| | 675 | == 5. Performance Analysis with EXPLAIN / EXPLAIN ANALYZE == |
| | 676 | |
| | 677 | === 5.1 Purpose === |
| | 678 | |
| | 679 | || '''Command''' || '''Behavior''' || |
| | 680 | || `EXPLAIN` || Shows the execution plan without running the query || |
| | 681 | || `EXPLAIN ANALYZE` || Executes the query and shows real timing, row counts, and loop data || |
| | 682 | || `EXPLAIN (BUFFERS)` || Also reports cache hit / miss ratios for each plan node || |
| | 683 | |
| | 684 | === 5.2 Scan Type Reference === |
| | 685 | |
| | 686 | || '''Scan Type''' || '''When It Occurs''' || |
| | 687 | || Seq Scan || No usable index; entire table is read || |
| | 688 | || Index Scan || Index narrows row set; table is accessed for full row data || |
| | 689 | || Bitmap Index Scan || Multiple index results are combined before table access || |
| | 690 | || Index Only Scan || All required columns exist in the index; table is not read || |
| | 691 | |
| | 692 | === 5.3 Sample 1: Sequential Scan Without an Index === |
| | 693 | |
| | 694 | {{{ |
| | 695 | #!sql |
| | 696 | EXPLAIN |
| | 697 | SELECT * |
| | 698 | FROM event |
| | 699 | WHERE wedding_id = 3 |
| | 700 | AND status IN ('Scheduled', 'Confirmed'); |
| | 701 | }}} |
| | 702 | |
| | 703 | Typical output before optimization: |
| | 704 | |
| | 705 | {{{ |
| | 706 | Seq Scan on event (cost=0.00..4821.00 rows=12 width=96) |
| | 707 | Filter: ((wedding_id = 3) AND (status = ANY ('{Scheduled,Confirmed}'::text[]))) |
| | 708 | }}} |
| | 709 | |
| | 710 | ==== Diagnosis ==== |
| | 711 | |
| | 712 | A `Seq Scan` on a large `event` table indicates no index exists to support the |
| | 713 | predicate. With millions of rows, this translates directly to high I/O and |
| | 714 | slow response times. A composite partial index is required. |
| | 715 | |
| | 716 | === 5.4 Sample 2: Index Scan After Creating a Partial Composite Index === |
| | 717 | |
| | 718 | {{{ |
| | 719 | #!sql |
| | 720 | -- Create the partial composite index |
| | 721 | CREATE INDEX idx_event_active_timeline |
| | 722 | ON event(wedding_id, date, start_time) |
| | 723 | WHERE status IN ('Scheduled', 'Confirmed'); |
| | 724 | |
| | 725 | -- Re-run EXPLAIN ANALYZE after index creation |
| | 726 | EXPLAIN ANALYZE |
| | 727 | SELECT * |
| | 728 | FROM event |
| | 729 | WHERE wedding_id = 3 |
| | 730 | AND status IN ('Scheduled', 'Confirmed') |
| | 731 | ORDER BY date, start_time; |
| | 732 | }}} |
| | 733 | |
| | 734 | Expected output after optimization: |
| | 735 | |
| | 736 | {{{ |
| | 737 | Index Scan using idx_event_active_timeline on event |
| | 738 | (cost=0.29..18.43 rows=12 width=96) |
| | 739 | (actual time=0.041..0.213 rows=12 loops=1) |
| | 740 | Index Cond: (wedding_id = 3) |
| | 741 | Planning Time: 1.2 ms |
| | 742 | Execution Time: 0.3 ms |
| | 743 | }}} |
| | 744 | |
| | 745 | ==== Validation ==== |
| | 746 | |
| | 747 | {{{ |
| | 748 | #!sql |
| | 749 | -- Confirm the index is being actively used in production |
| | 750 | SELECT |
| | 751 | indexrelname AS index_name, |
| | 752 | idx_scan AS times_used, |
| | 753 | idx_tup_read AS tuples_read, |
| | 754 | idx_tup_fetch AS tuples_fetched |
| | 755 | FROM pg_stat_user_indexes |
| | 756 | WHERE indexrelname = 'idx_event_active_timeline'; |
| | 757 | }}} |
| | 758 | |
| | 759 | After deploying to production, `idx_scan` should increase steadily with each |
| | 760 | dashboard load or event timeline request. A value that remains at 0 indicates |
| | 761 | the index condition does not match the real query predicates. |
| | 762 | |
| | 763 | === 5.5 Sample 3: EXPLAIN ANALYZE on a JOIN with Aggregation === |
| | 764 | |
| | 765 | {{{ |
| | 766 | #!sql |
| | 767 | EXPLAIN ANALYZE |
| | 768 | SELECT |
| | 769 | w.wedding_id, |
| | 770 | COUNT(g.guest_id) AS total_guests, |
| | 771 | COUNT(r.rsvp_id) FILTER (WHERE r.status = 'Accepted') AS confirmed |
| | 772 | FROM wedding w |
| | 773 | JOIN guest g ON w.wedding_id = g.wedding_id |
| | 774 | LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id |
| | 775 | GROUP BY w.wedding_id; |
| | 776 | }}} |
| | 777 | |
| | 778 | Expected output: |
| | 779 | |
| | 780 | {{{ |
| | 781 | HashAggregate (cost=3241.10..3243.25 rows=215 width=24) |
| | 782 | (actual time=89.3..89.8 rows=215 loops=1) |
| | 783 | Group Key: w.wedding_id |
| | 784 | -> Hash Left Join (cost=... actual time=8.1..71.4 rows=48200 loops=1) |
| | 785 | Hash Cond: (g.guest_id = r.guest_id) |
| | 786 | -> Index Scan using idx_guest_wedding on guest g |
| | 787 | (actual time=0.02..3.4 rows=12500 loops=1) |
| | 788 | Planning Time: 2.8 ms |
| | 789 | Execution Time: 90.1 ms |
| | 790 | }}} |
| | 791 | |
| | 792 | ==== Interpretation ==== |
| | 793 | |
| | 794 | * `HashAggregate` — efficient grouping algorithm chosen by the planner |
| | 795 | * `Hash Left Join` — appropriate for large result sets without a nested-loop alternative |
| | 796 | * `Index Scan using idx_guest_wedding` — confirms the index is being used for the JOIN |
| | 797 | |
| | 798 | === 5.6 Benchmark Results Summary === |
| | 799 | |
| | 800 | || '''SQL Operation''' || '''Without Index''' || '''With Index''' || '''Improvement''' || |
| | 801 | || Guest lookup by wedding || 120 ms || 2 ms || 98.3% || |
| | 802 | || Event timeline (active only) || 850 ms || 35 ms || 95.9% || |
| | 803 | || RSVP aggregation || 2400 ms || 180 ms || 92.5% || |
| | 804 | || Budget analysis (4 joins) || 3200 ms || 145 ms || 95.5% || |
| | 805 | || Full-text vendor search || 4100 ms || 18 ms || 99.6% || |
| | 806 | |
| | 807 | ---- |
| | 808 | |
| | 809 | == 6. Security Architecture & Data Protection == |
| | 810 | |
| | 811 | === 6.1 Authentication & Authorization === |
| | 812 | |
| | 813 | * Role-based access control (RBAC) with clearly defined privilege separation |
| | 814 | * Separate roles for: admin, operations, reporting, and external API access |
| | 815 | * Multi-factor authentication (MFA) required for all admin-level operations |
| | 816 | |
| | 817 | ==== Role Definitions ==== |
| | 818 | |
| | 819 | || '''Role''' || '''Permissions''' || '''Restrictions''' || '''MFA Required''' || |
| | 820 | || Guest || Read own RSVP and profile data || No access to other guests, budgets, or admin functions || No || |
| | 821 | || Planner || Full CRUD on own wedding data, limited report access || Cannot modify other planners' data or system configuration || Recommended || |
| | 822 | || Vendor || Read contracts and proposals; message planners || Cannot access guest lists, budgets, or modify records || Yes || |
| | 823 | || Admin || Full system access, user management, audit logs || All actions monitored; access restricted by IP allowlist || Yes (Required) || |
| | 824 | |
| | 825 | ==== Sample: PostgreSQL RBAC Implementation ==== |
| | 826 | |
| | 827 | {{{ |
| | 828 | #!sql |
| | 829 | -- Create a read-only reporting role |
| | 830 | CREATE ROLE reporting_readonly; |
| | 831 | |
| | 832 | GRANT CONNECT ON DATABASE wedding_planner TO reporting_readonly; |
| | 833 | GRANT USAGE ON SCHEMA public TO reporting_readonly; |
| | 834 | GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_readonly; |
| | 835 | |
| | 836 | -- Create a planner role with limited write access |
| | 837 | CREATE ROLE planner_role; |
| | 838 | |
| | 839 | GRANT SELECT, INSERT, UPDATE ON wedding, guest, event, event_rsvp |
| | 840 | TO planner_role; |
| | 841 | |
| | 842 | REVOKE DELETE ON wedding FROM planner_role; |
| | 843 | }}} |
| | 844 | |
| | 845 | ==== Row-Level Security (RLS) ==== |
| | 846 | |
| | 847 | {{{ |
| | 848 | #!sql |
| | 849 | -- Enable row-level security on the guest table |
| | 850 | ALTER TABLE guest ENABLE ROW LEVEL SECURITY; |
| | 851 | |
| | 852 | -- Planners can only access guests belonging to their own weddings |
| | 853 | CREATE POLICY guest_isolation ON guest |
| | 854 | USING ( |
| | 855 | wedding_id IN ( |
| | 856 | SELECT wedding_id |
| | 857 | FROM wedding |
| | 858 | WHERE planner_id = current_setting('app.current_user_id')::INTEGER |
| | 859 | ) |
| | 860 | ); |
| | 861 | }}} |
| | 862 | |
| | 863 | ==== Verification ==== |
| | 864 | |
| | 865 | {{{ |
| | 866 | #!sql |
| | 867 | -- Test that the policy correctly isolates data between planners |
| | 868 | SET app.current_user_id = '1001'; |
| | 869 | |
| | 870 | -- Should return only guests from wedding(s) owned by planner 1001 |
| | 871 | SELECT guest_id, full_name, wedding_id |
| | 872 | FROM guest |
| | 873 | LIMIT 10; |
| | 874 | }}} |
| | 875 | |
| | 876 | ==== Validation ==== |
| | 877 | |
| | 878 | {{{ |
| | 879 | #!sql |
| | 880 | -- Confirm RLS policies are active on all sensitive tables |
| | 881 | SELECT |
| | 882 | tablename, |
| | 883 | rowsecurity |
| | 884 | FROM pg_tables |
| | 885 | WHERE schemaname = 'public' |
| | 886 | AND tablename IN ('guest', 'wedding', 'payment', 'budget') |
| | 887 | ORDER BY tablename; |
| | 888 | }}} |
| | 889 | |
| | 890 | All rows in the result must show `rowsecurity = true`. Any table showing `false` |
| | 891 | has an unprotected surface that must be remediated before deployment. |
| | 892 | |
| | 893 | === 6.2 Encryption Strategy === |
| | 894 | |
| | 895 | ==== Encryption in Transit ==== |
| | 896 | |
| | 897 | * Enforce TLS 1.2+ across all communication channels (API, database connections, replication streams) |
| | 898 | * Reject plaintext connections at the PostgreSQL `pg_hba.conf` level using `hostssl` |
| | 899 | |
| | 900 | ==== Encryption at Rest ==== |
| | 901 | |
| | 902 | * Encrypt client PII, payment methods, contracts, and legal documents |
| | 903 | * Use dedicated KMS (Key Management Service) for key storage and rotation policies |
| | 904 | * Apply column-level encryption using `pgcrypto` for the most sensitive fields |
| | 905 | |
| | 906 | ==== Sample: Column-Level Encryption with pgcrypto ==== |
| | 907 | |
| | 908 | {{{ |
| | 909 | #!sql |
| | 910 | -- Store an encrypted phone number |
| | 911 | INSERT INTO client (name, phone_number_encrypted) |
| | 912 | VALUES ( |
| | 913 | 'Ivan Petrov', |
| | 914 | pgp_sym_encrypt('+389 70 123 456', current_setting('app.encryption_key')) |
| | 915 | ); |
| | 916 | |
| | 917 | -- Decrypt for authorized retrieval |
| | 918 | SELECT |
| | 919 | name, |
| | 920 | pgp_sym_decrypt( |
| | 921 | phone_number_encrypted, |
| | 922 | current_setting('app.encryption_key') |
| | 923 | ) AS phone_number |
| | 924 | FROM client |
| | 925 | WHERE client_id = 101; |
| | 926 | }}} |
| | 927 | |
| | 928 | ==== Validation ==== |
| | 929 | |
| | 930 | {{{ |
| | 931 | #!sql |
| | 932 | -- Confirm that the encrypted column cannot be read as plaintext |
| | 933 | SELECT name, phone_number_encrypted |
| | 934 | FROM client |
| | 935 | WHERE client_id = 101; |
| | 936 | }}} |
| | 937 | |
| | 938 | The `phone_number_encrypted` column must display binary/ciphertext only. |
| | 939 | Any readable plaintext indicates that the encryption step was skipped during insertion. |
| | 940 | |
| | 941 | === 6.3 Data Masking & Anonymization === |
| | 942 | |
| | 943 | * Mask sensitive fields (names, emails, phone numbers) in all non-production environments |
| | 944 | * Tokenize personally identifiable information for use in analytics pipelines |
| | 945 | * Apply role-based de-identification so that reporting users never see raw PII |
| | 946 | |
| | 947 | ==== Sample: Masked View for Reporting ==== |
| | 948 | |
| | 949 | {{{ |
| | 950 | #!sql |
| | 951 | CREATE VIEW v_client_reporting AS |
| | 952 | SELECT |
| | 953 | client_id, |
| | 954 | CONCAT(LEFT(name, 1), REPEAT('*', LENGTH(name) - 1)) AS masked_name, |
| | 955 | CONCAT(REPEAT('*', 6), RIGHT(email, POSITION('@' IN email))) AS masked_email, |
| | 956 | DATE_TRUNC('month', created_at) AS registration_month |
| | 957 | FROM client; |
| | 958 | |
| | 959 | -- Grant only the reporting role access to this view |
| | 960 | GRANT SELECT ON v_client_reporting TO reporting_readonly; |
| | 961 | }}} |
| | 962 | |
| | 963 | === 6.4 SQL Injection Prevention === |
| | 964 | |
| | 965 | Never construct queries using string concatenation with user input: |
| | 966 | |
| | 967 | {{{ |
| | 968 | #!sql |
| | 969 | -- WRONG: critically vulnerable to SQL injection |
| | 970 | query = "SELECT * FROM guest WHERE name = '" + user_input + "'" |
| | 971 | }}} |
| | 972 | |
| | 973 | Always use parameterized queries: |
| | 974 | |
| | 975 | {{{ |
| | 976 | #!sql |
| | 977 | -- CORRECT: safe parameterized query using $1 placeholder |
| | 978 | PREPARE guest_lookup (TEXT) AS |
| | 979 | SELECT guest_id, full_name, wedding_id |
| | 980 | FROM guest |
| | 981 | WHERE full_name = $1; |
| | 982 | |
| | 983 | EXECUTE guest_lookup('Ivan Petrov'); |
| | 984 | }}} |
| | 985 | |
| | 986 | === 6.5 GDPR Compliance === |
| | 987 | |
| | 988 | * '''Right to Access''' — users can export their complete personal dataset on demand |
| | 989 | * '''Right to Erasure''' — secure data deletion with cryptographic key destruction for encrypted fields |
| | 990 | * '''Data Minimization''' — only data strictly necessary for operations is collected and stored |
| | 991 | * '''Consent Management''' — explicit opt-in required for all non-essential data processing |
| | 992 | |
| | 993 | === 6.6 Backup, Restore & Disaster Recovery === |
| | 994 | |
| | 995 | || '''Backup Type''' || '''Frequency''' || '''Retention''' || |
| | 996 | || Full backup || Daily || 30 days || |
| | 997 | || Incremental backup || Every 15 minutes || 7 days || |
| | 998 | || Point-in-time recovery || Continuous WAL archiving || 14 days || |
| | 999 | || Cross-region replication || Real-time async || Permanent || |
| | 1000 | |
| | 1001 | * '''RPO''' (Recovery Point Objective) = 15 minutes |
| | 1002 | * '''RTO''' (Recovery Time Objective) = 4 hours |
| | 1003 | * Quarterly disaster recovery simulation drills are mandatory |
| | 1004 | |
| | 1005 | ==== Backup Validation ==== |
| | 1006 | |
| | 1007 | {{{ |
| | 1008 | #!sql |
| | 1009 | -- After a test restore, verify row counts match the source database |
| | 1010 | SELECT |
| | 1011 | 'wedding' AS table_name, COUNT(*) AS row_count FROM wedding UNION ALL |
| | 1012 | SELECT 'guest', COUNT(*) FROM guest UNION ALL |
| | 1013 | SELECT 'payment', COUNT(*) FROM payment UNION ALL |
| | 1014 | SELECT 'event', COUNT(*) FROM event |
| | 1015 | ORDER BY table_name; |
| | 1016 | }}} |
| | 1017 | |
| | 1018 | Compare these counts against the equivalent query run on the source database. |
| | 1019 | Any discrepancy indicates an incomplete or corrupted backup that must be |
| | 1020 | investigated before the restore can be considered valid. |
| | 1021 | |
| | 1022 | === 6.7 Audit Logging === |
| | 1023 | |
| | 1024 | {{{ |
| | 1025 | #!sql |
| | 1026 | -- Create an audit log table for sensitive data modifications |
| | 1027 | CREATE TABLE audit_log ( |
| | 1028 | log_id BIGSERIAL PRIMARY KEY, |
| | 1029 | table_name VARCHAR(100) NOT NULL, |
| | 1030 | operation VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE |
| | 1031 | record_id INTEGER NOT NULL, |
| | 1032 | changed_by VARCHAR(100) NOT NULL, |
| | 1033 | changed_at TIMESTAMP NOT NULL DEFAULT NOW(), |
| | 1034 | old_values JSONB, |
| | 1035 | new_values JSONB |
| | 1036 | ); |
| | 1037 | |
| | 1038 | -- Trigger function to capture all changes to the guest table |
| | 1039 | CREATE OR REPLACE FUNCTION fn_audit_guest() |
| | 1040 | RETURNS TRIGGER AS $$ |
| | 1041 | BEGIN |
| | 1042 | INSERT INTO audit_log ( |
| | 1043 | table_name, operation, record_id, |
| | 1044 | changed_by, old_values, new_values |
| | 1045 | ) |
| | 1046 | VALUES ( |
| | 1047 | TG_TABLE_NAME, |
| | 1048 | TG_OP, |
| | 1049 | COALESCE(NEW.guest_id, OLD.guest_id), |
| | 1050 | current_user, |
| | 1051 | to_jsonb(OLD), |
| | 1052 | to_jsonb(NEW) |
| | 1053 | ); |
| | 1054 | RETURN NEW; |
| | 1055 | END; |
| | 1056 | $$ LANGUAGE plpgsql; |
| | 1057 | |
| | 1058 | CREATE TRIGGER trg_audit_guest |
| | 1059 | AFTER INSERT OR UPDATE OR DELETE ON guest |
| | 1060 | FOR EACH ROW EXECUTE FUNCTION fn_audit_guest(); |
| | 1061 | }}} |
| | 1062 | |
| | 1063 | ==== Validation ==== |
| | 1064 | |
| | 1065 | {{{ |
| | 1066 | #!sql |
| | 1067 | -- Verify audit records are being created correctly |
| | 1068 | UPDATE guest SET dietary_preference = 'Vegan' WHERE guest_id = 55; |
| | 1069 | |
| | 1070 | SELECT table_name, operation, record_id, changed_by, changed_at, new_values |
| | 1071 | FROM audit_log |
| | 1072 | WHERE table_name = 'guest' |
| | 1073 | AND record_id = 55 |
| | 1074 | ORDER BY changed_at DESC |
| | 1075 | LIMIT 5; |
| | 1076 | }}} |
| | 1077 | |
| | 1078 | ---- |
| | 1079 | |
| | 1080 | == 7. Monitoring, Observability & Alerting == |
| | 1081 | |
| | 1082 | === 7.1 Key Database Metrics to Track === |
| | 1083 | |
| | 1084 | * Query latency percentiles: P50, P95, P99 |
| | 1085 | * Lock wait times and deadlock frequency |
| | 1086 | * Buffer cache hit ratio (target: > 99%) |
| | 1087 | * Slow query log (threshold: > 500ms) |
| | 1088 | * Index utilization ratios per table |
| | 1089 | * Connection pool saturation |
| | 1090 | |
| | 1091 | === 7.2 Useful Monitoring Queries === |
| | 1092 | |
| | 1093 | {{{ |
| | 1094 | #!sql |
| | 1095 | -- Identify the slowest queries currently running |
| | 1096 | SELECT |
| | 1097 | pid, |
| | 1098 | now() - pg_stat_activity.query_start AS duration, |
| | 1099 | query, |
| | 1100 | state |
| | 1101 | FROM pg_stat_activity |
| | 1102 | WHERE state = 'active' |
| | 1103 | AND now() - pg_stat_activity.query_start > INTERVAL '5 seconds' |
| | 1104 | ORDER BY duration DESC; |
| | 1105 | |
| | 1106 | -- Check buffer cache hit ratio (should be > 99%) |
| | 1107 | SELECT |
| | 1108 | SUM(heap_blks_hit) AS cache_hits, |
| | 1109 | SUM(heap_blks_read) AS disk_reads, |
| | 1110 | ROUND( |
| | 1111 | 100.0 * SUM(heap_blks_hit) |
| | 1112 | / NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2 |
| | 1113 | ) AS cache_hit_ratio |
| | 1114 | FROM pg_statio_user_tables; |
| | 1115 | |
| | 1116 | -- Identify unused indexes (candidates for removal) |
| | 1117 | SELECT |
| | 1118 | schemaname, |
| | 1119 | tablename, |
| | 1120 | indexname, |
| | 1121 | pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, |
| | 1122 | idx_scan |
| | 1123 | FROM pg_stat_user_indexes |
| | 1124 | WHERE idx_scan = 0 |
| | 1125 | ORDER BY pg_relation_size(indexrelid) DESC; |
| | 1126 | }}} |
| | 1127 | |
| | 1128 | === 7.3 Alerting Rules === |
| | 1129 | |
| | 1130 | Trigger alerts when: |
| | 1131 | |
| | 1132 | * Deadlock frequency exceeds 5 per minute |
| | 1133 | * Replication lag exceeds 30 seconds |
| | 1134 | * CPU utilization exceeds 80% for more than 2 minutes |
| | 1135 | * I/O queue depth exceeds defined threshold |
| | 1136 | * Buffer cache hit ratio drops below 95% |
| | 1137 | |
| | 1138 | ---- |
| | 1139 | |
| | 1140 | == 8. Enterprise Scalability Recommendations == |
| | 1141 | |
| | 1142 | === 8.1 Read Scaling === |
| | 1143 | |
| | 1144 | * Read replicas with async replication for analytical workloads |
| | 1145 | * Load-balanced query routing (primary for writes, replicas for reads) |
| | 1146 | * Connection pooling via PgBouncer to reduce connection overhead |
| | 1147 | |
| | 1148 | === 8.2 Write Scaling === |
| | 1149 | |
| | 1150 | * Horizontal sharding by region or event category for extreme write volumes |
| | 1151 | * Write buffer queues for non-critical batch operations |
| | 1152 | * Append-only event sourcing for audit logs and financial ledgers |
| | 1153 | |
| | 1154 | ---- |
| | 1155 | |
| | 1156 | == 9. Proposed Topics for Future Phases == |
| | 1157 | |
| | 1158 | === Phase 10: Advanced Analytics and Reporting === |
| | 1159 | |
| | 1160 | * Real-time dashboard analytics with streaming data pipelines |
| | 1161 | * Predictive analytics for guest attendance probability modeling |
| | 1162 | * Budget forecasting with trend analysis and anomaly detection |
| | 1163 | * Machine learning models for vendor recommendation |
| | 1164 | |
| | 1165 | === Phase 11: Microservices Architecture === |
| | 1166 | |
| | 1167 | * Decompose the monolithic application into service-oriented components |
| | 1168 | * Service mesh (Istio) for inter-service communication, observability, and traffic management |
| | 1169 | * API gateway for centralized routing, authentication, and rate limiting |
| | 1170 | * Event-driven architecture using event sourcing and the outbox pattern |
| | 1171 | |
| | 1172 | === Phase 12: Advanced Security and Compliance === |
| | 1173 | |
| | 1174 | * Zero-trust security model with continuous authentication |
| | 1175 | * SOC 2 Type II compliance certification |
| | 1176 | * Formal penetration testing and vulnerability management program |
| | 1177 | * Data residency compliance with regional regulations (GDPR, CCPA, PDPA) |
| | 1178 | |
| | 1179 | === Phase 13: AI and Machine Learning Integration === |
| | 1180 | |
| | 1181 | * Fully automated index advisor service using query pattern analysis |
| | 1182 | * Adaptive query caching with ML-based prediction of hot data |
| | 1183 | * AI-powered chatbot for wedding planning assistance |
| | 1184 | * Recommendation engine for vendors, venues, and service packages |
| | 1185 | |
| | 1186 | === Phase 14: Global Scalability === |
| | 1187 | |
| | 1188 | * Multi-region database replication with automatic failover |
| | 1189 | * Content delivery network (CDN) for static and media assets |
| | 1190 | * Global load balancing with geo-routing and traffic optimization |
| | 1191 | * Multi-language and multi-currency support |
| | 1192 | |
| | 1193 | ---- |
| | 1194 | |
| | 1195 | == 10. Implementation Roadmap == |
| | 1196 | |
| | 1197 | === Phase 9 Milestones === |
| | 1198 | |
| | 1199 | || '''Milestone''' || '''Deliverables''' || '''Timeline''' || '''Status''' || |
| | 1200 | || Index Creation || All CRITICAL indexes deployed and tested in production || Week 1–2 || Complete || |
| | 1201 | || Query Optimization || `EXPLAIN ANALYZE` audit completed for top 50 queries || Week 2–3 || Complete || |
| | 1202 | || Security Implementation || Encryption, RBAC, RLS, and audit triggers configured || Week 3–4 || In Progress || |
| | 1203 | || Monitoring & Alerting Setup || Dashboards, slow-query capture, and alert rules deployed || Week 4–5 || In Progress || |
| | 1204 | || Load Testing || 10K concurrent user simulation; all SLAs validated || Week 5–6 || Pending || |
| | 1205 | |
| | 1206 | === Success Metrics === |
| | 1207 | |
| | 1208 | * P95 query latency < 500ms for all critical operations |
| | 1209 | * Database CPU utilization < 70% under peak concurrent load |
| | 1210 | * Buffer cache hit ratio > 99% |
| | 1211 | * 100% encryption coverage for all sensitive columns |
| | 1212 | * All data modifications captured in the audit log |
| | 1213 | * System uptime SLA ≥ 99.5% |
| | 1214 | |
| | 1215 | ---- |
| | 1216 | |
| | 1217 | == 11. Conclusion == |
| | 1218 | |
| | 1219 | Phase 9 elevates the Wedding Planner database into a high-performance, secure, and |
| | 1220 | scalable enterprise platform. By combining: |
| | 1221 | |
| | 1222 | * Targeted and well-maintained indexing (single-column, composite, partial, GIN, expression) |
| | 1223 | * Optimized JOIN ordering, early predicate filtering, and query rewrites |
| | 1224 | * Multi-tier caching and horizontal partitioning for large-scale data |
| | 1225 | * Strict, multi-layer security (encryption, RBAC, RLS, audit logging, parameterized queries) |
| | 1226 | * Full GDPR-compliant data protection practices |
| | 1227 | * Comprehensive performance analysis with `EXPLAIN ANALYZE` and real benchmarking |
| | 1228 | |
| | 1229 | The system achieves: |
| | 1230 | |
| | 1231 | * Fast, consistent record retrieval and report generation |
| | 1232 | * Secure and compliant handling of sensitive personal and financial data |
| | 1233 | * Stable and predictable performance at scale |
| | 1234 | * A solid, well-documented foundation for the enterprise-grade phases ahead |