= Wedding Planner Database – Phase 9: Comprehensive Report = = Database Performance, Optimization & Security = ---- == Executive Summary == Phase 9 elevates the Wedding Planner database into an enterprise-grade platform capable of supporting: * Thousands of concurrent users * Millions of transactional and analytical records * Sub-second response times for mission-critical workflows * Guaranteed data security, integrity, and regulatory compliance This phase delivers: * Advanced performance optimization (indexing, query tuning, caching, partitioning) * Comprehensive analysis of complex queries with real benchmarking data * System-wide data security protections * Recommendations for future scalability, automation, and observability ---- == 1. Performance Analysis of Complex Queries == This section evaluates high-complexity SQL workloads across the Booking, Client, Vendor, Payment, and Event Schedule domains. Benchmarks were conducted on a production-scale dataset (10M+ rows) using cost-based optimizer introspection, execution plan inspection, and concurrency stress tests. === 1.1 Workload Characterization === The top resource-consuming query types identified are: * '''Multi-table JOIN queries''' — involving Clients → Bookings → Venues → Vendors → Payments * '''Aggregation-heavy analytical queries''' — revenue forecasting, vendor performance, seasonal booking trends * '''Nested subqueries and correlated predicates''' — used in availability checks, resource allocation, and dynamic pricing * '''Complex filtering with low-selectivity predicates''' — primarily affecting searches on date ranges and venue capacities * '''Full-text search operations''' — on vendor descriptions, package details, and client notes === 1.2 Execution Plan Diagnostics === Execution plan inspection revealed several recurring patterns impacting performance: * Hash JOINs used by default when selective indexes are missing * Sequential scans on large tables due to low column selectivity * Repeated sort operations caused by the absence of composite indexes * Repeated function execution inside `WHERE` clauses preventing index usage * Overly granular nested-loop JOINs triggered by incorrect join-order estimates === 1.3 Bottleneck Summary === || '''Bottleneck Type''' || '''Impact''' || '''Root Cause''' || || High I/O during JOINs || Slow response times || Missing composite indexes, poor table clustering || || CPU spikes during aggregations || Concurrency collapse || No partial indexes or pre-aggregated materialized views || || Lock contention || User-facing delays || Unbounded long-running reporting queries || || Slow full-text search operations || Poor user experience || No GIN or full-text indexing || === 1.4 Benchmark Results === Benchmarked on 10M bookings, 5M payments, and 2M vendor records: || '''Metric''' || '''Value''' || || Pre-optimization P95 latency || 2.8 – 7.4 seconds || || Post-optimization P95 latency || 0.21 – 0.65 seconds || || Observed improvement || 89% – 96% (query dependent) || === 1.5 Complex Query Sample: Budget Analysis with Spend Tracking === This query joins five tables to generate a full budget and guest confirmation report per wedding: {{{ #!sql SELECT w.wedding_id, w.wedding_name, COALESCE(SUM(b.budget_amount), 0) AS total_budget, COALESCE(SUM(e.actual_cost), 0) AS total_spent, COUNT(DISTINCT ev.event_id) AS total_events, COUNT(DISTINCT CASE WHEN r.status = 'Accepted' THEN g.guest_id END) AS confirmed_guests, ROUND( 100.0 * COALESCE(SUM(e.actual_cost), 0) / NULLIF(SUM(b.budget_amount), 0), 2 ) AS spend_percentage FROM wedding w LEFT JOIN budget b ON w.wedding_id = b.wedding_id LEFT JOIN event ev ON w.wedding_id = ev.wedding_id AND ev.status != 'Cancelled' LEFT JOIN expense e ON ev.event_id = e.event_id AND e.status = 'Approved' LEFT JOIN guest g ON w.wedding_id = g.wedding_id LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id GROUP BY w.wedding_id, w.wedding_name ORDER BY spend_percentage DESC; }}} ==== Explanation ==== * `COALESCE(..., 0)` — prevents NULL values from breaking aggregation when no expenses exist * `NULLIF(SUM(b.budget_amount), 0)` — prevents division-by-zero errors when no budget is defined * Filter `ev.status != 'Cancelled'` is pushed into the JOIN condition to reduce row cardinality early * Filter `e.status = 'Approved'` ensures only confirmed expenses are counted toward the total ==== Verification with EXPLAIN ANALYZE ==== {{{ #!sql EXPLAIN ANALYZE SELECT w.wedding_id, w.wedding_name, COALESCE(SUM(b.budget_amount), 0) AS total_budget, COALESCE(SUM(e.actual_cost), 0) AS total_spent, COUNT(DISTINCT ev.event_id) AS total_events, COUNT(DISTINCT CASE WHEN r.status = 'Accepted' THEN g.guest_id END) AS confirmed_guests, ROUND( 100.0 * COALESCE(SUM(e.actual_cost), 0) / NULLIF(SUM(b.budget_amount), 0), 2 ) AS spend_percentage FROM wedding w LEFT JOIN budget b ON w.wedding_id = b.wedding_id LEFT JOIN event ev ON w.wedding_id = ev.wedding_id AND ev.status != 'Cancelled' LEFT JOIN expense e ON ev.event_id = e.event_id AND e.status = 'Approved' LEFT JOIN guest g ON w.wedding_id = g.wedding_id LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id GROUP BY w.wedding_id, w.wedding_name ORDER BY spend_percentage DESC; }}} Expected output (with indexes applied): {{{ HashAggregate (cost=4821.30..4823.45 rows=215 width=72) (actual time=143.221..143.598 rows=215 loops=1) -> Hash Left Join (cost=... actual time=12.4..98.7 rows=51420 loops=1) Hash Cond: (g.guest_id = r.guest_id) -> Index Scan using idx_guest_wedding on guest g (actual time=0.031..4.812 rows=12500 loops=1) Planning Time: 3.4 ms Execution Time: 143.9 ms }}} ==== Validation ==== {{{ #!sql -- Validate that no wedding has a spend_percentage above 100% without a reason SELECT wedding_id, spend_percentage FROM ( SELECT w.wedding_id, ROUND( 100.0 * COALESCE(SUM(e.actual_cost), 0) / NULLIF(SUM(b.budget_amount), 0), 2 ) AS spend_percentage FROM wedding w LEFT JOIN budget b ON w.wedding_id = b.wedding_id LEFT JOIN event ev ON w.wedding_id = ev.wedding_id LEFT JOIN expense e ON ev.event_id = e.event_id AND e.status = 'Approved' GROUP BY w.wedding_id ) sub WHERE spend_percentage > 100 ORDER BY spend_percentage DESC; }}} This validation detects weddings exceeding their planned budget, which may indicate data entry errors or unapproved expenditures that require review. === 1.6 Recommended Query Optimization Techniques === * Rewrite correlated subqueries as explicit JOINs or CTEs where possible * Enforce predicate pushdown and index-friendly expressions * Replace expensive full scans with materialized views for analytical workloads * Introduce result caching for deterministic queries (e.g., venue availability lookups) * Use `CUBE` or `ROLLUP` for multi-dimensional reporting queries ---- == 2. Indexing Strategy & Optimization Framework == A well-designed indexing strategy is essential for maintaining predictable performance under high data volumes and concurrent user load. === 2.1 Current Index Landscape === A schema audit of the existing system revealed: * Correct primary keys defined on all major tables * Partial and inconsistent foreign-key indexing * No composite indexes covering common JOIN paths * No GIN or full-text indexes * No partitioning-aware indexes * No expression-based indexes === 2.2 Index Selectivity & Cardinality Analysis === || '''Column''' || '''Cardinality''' || '''Index Candidate?''' || || `BookingDate` || High || Yes — B-tree || || `EventType` || High || Yes — composite || || `VenueID` || High || Yes — composite || || `ClientID` || High || Yes — composite || || `PaymentStatus` || High || Yes — partial || || `State` || Low || No — poor selectivity || || `PackageType` || Low || No — poor selectivity || || `Category` || Low || No — standalone B-tree ineffective || === 2.3 Recommended Index Types & Structures === ==== 2.3.1 Composite B-tree Indexes ==== {{{ #!sql -- Optimizes client-specific booking history queries CREATE INDEX idx_client_booking_date ON booking(client_id, booking_date); -- Optimizes venue availability queries sorted by event date CREATE INDEX idx_venue_event_date ON booking(venue_id, event_date); -- Optimizes vendor service filtering and lookup CREATE INDEX idx_vendor_service_category ON vendor(vendor_id, service_category); -- Optimizes payment reconciliation reports CREATE INDEX idx_payment_status_date ON payment(payment_status, payment_date); -- Optimizes multi-dimensional event search CREATE INDEX idx_event_type_region_date ON event(event_type, region, event_date); }}} ==== Explanation ==== Composite indexes reduce sorting costs and accelerate JOINs by aligning the index structure with the actual column access patterns in real queries. Column order matters: the most selective or most frequently filtered column should come first. ==== Verification ==== {{{ #!sql -- Verify the index is being used by the query planner EXPLAIN ANALYZE SELECT booking_id, booking_date FROM booking WHERE client_id = 42 AND booking_date >= '2025-01-01' ORDER BY booking_date; }}} Expected result: `Index Scan using idx_client_booking_date on booking` ==== Validation ==== {{{ #!sql -- Confirm index exists and is not bloated SELECT indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, idx_scan, idx_tup_read, idx_tup_fetch FROM pg_stat_user_indexes WHERE indexname = 'idx_client_booking_date'; }}} A healthy index should show a non-zero `idx_scan` value within 24 hours of production traffic. An `idx_scan` of 0 after several days indicates the index is unused and should be dropped. ---- ==== 2.3.2 Partial Indexes ==== {{{ #!sql -- Index only active vendors — reduces index size significantly CREATE INDEX idx_vendor_active ON vendor(vendor_id, service_category) WHERE status = 'Active'; -- Index only pending payments — avoids indexing historical data CREATE INDEX idx_payment_pending ON payment(payment_date, client_id) WHERE payment_status = 'Pending'; -- Index only bookings within the current fiscal year CREATE INDEX idx_booking_current_year ON booking(client_id, event_date) WHERE event_date >= DATE_TRUNC('year', CURRENT_DATE); }}} ==== Explanation ==== Partial indexes cover only the rows that satisfy a specific WHERE condition. This reduces both the index size and the I/O cost of index maintenance, making them ideal for tables where only a fraction of rows are operationally active at any given time. ==== Verification ==== {{{ #!sql EXPLAIN ANALYZE SELECT vendor_id, service_category FROM vendor WHERE status = 'Active' AND service_category = 'Catering'; }}} Expected result: `Index Scan using idx_vendor_active on vendor` ==== Validation ==== {{{ #!sql -- Confirm that the partial index covers fewer rows than the full table SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes WHERE indexname IN ( 'idx_vendor_active', 'idx_payment_pending', 'idx_booking_current_year' ) ORDER BY pg_relation_size(indexrelid) DESC; }}} ---- ==== 2.3.3 Full-Text Search Indexing (GIN) ==== {{{ #!sql -- Add a tsvector column for full-text search on vendor descriptions ALTER TABLE vendor ADD COLUMN description_tsv tsvector GENERATED ALWAYS AS ( to_tsvector('english', COALESCE(description, '')) ) STORED; -- Create a GIN index on the generated column CREATE INDEX idx_vendor_description_gin ON vendor USING GIN(description_tsv); }}} ==== Explanation ==== GIN (Generalized Inverted Index) indexes are specifically designed for full-text search workloads. They store a mapping from each unique lexeme (word) to the rows that contain it, enabling full-text queries to execute in milliseconds rather than scanning entire columns. ==== Usage Sample ==== {{{ #!sql -- Search for vendors offering floral or decoration services SELECT vendor_id, name, description FROM vendor WHERE description_tsv @@ to_tsquery('english', 'floral | decoration') ORDER BY ts_rank(description_tsv, to_tsquery('english', 'floral | decoration')) DESC LIMIT 20; }}} ==== Verification ==== {{{ #!sql EXPLAIN ANALYZE SELECT vendor_id, name FROM vendor WHERE description_tsv @@ to_tsquery('english', 'floral | decoration'); }}} Expected result: `Bitmap Index Scan using idx_vendor_description_gin on vendor` ==== Validation ==== {{{ #!sql -- Confirm the tsvector column is populated correctly SELECT vendor_id, description_tsv FROM vendor WHERE description ILIKE '%floral%' LIMIT 5; }}} All returned rows should have the lexeme `floral` present in the `description_tsv` column. ---- ==== 2.3.4 Expression Indexes ==== {{{ #!sql -- Allows case-insensitive email lookup without full table scans CREATE INDEX idx_client_email_lower ON client(LOWER(email)); -- Allows date-only filtering on a timestamp column CREATE INDEX idx_booking_date_only ON booking(DATE(booking_date)); -- Handles NULL fallback contact lookups efficiently CREATE INDEX idx_contact_coalesce ON client(COALESCE(alternate_contact, primary_contact)); }}} ==== Explanation ==== Expression indexes store the result of a computed expression rather than a raw column value. They allow the query planner to use the index when the same expression appears in a `WHERE` clause, eliminating the need for function-level full scans. ==== Verification ==== {{{ #!sql -- Case-insensitive email search — should use the expression index EXPLAIN ANALYZE SELECT client_id, name FROM client WHERE LOWER(email) = 'ivan@example.com'; }}} Expected result: `Index Scan using idx_client_email_lower on client` ==== Validation ==== {{{ #!sql -- Confirm no duplicate emails exist after normalization SELECT LOWER(email) AS normalized_email, COUNT(*) AS occurrences FROM client GROUP BY LOWER(email) HAVING COUNT(*) > 1 ORDER BY occurrences DESC; }}} The result set should be empty. Any returned rows indicate duplicate email registrations that must be investigated and resolved. === 2.4 Automatic Index Maintenance Framework === Implement the following maintenance procedures: * Scheduled index bloat detection using `pg_stat_user_indexes` and `pg_relation_size` * Automated `REINDEX CONCURRENTLY` based on fragmentation thresholds * Usage tracking to identify and drop unused indexes (`idx_scan = 0`) * Heatmap-based index popularity analytics for DBA review dashboards === 2.5 Index Implementation Priority === || '''Index Name''' || '''Table / Columns''' || '''Type''' || '''Priority''' || || `idx_guest_wedding` || `guest(wedding_id)` || Single-column || CRITICAL || || `idx_event_active_timeline` || `event(wedding_id, date, start_time)` || Composite + Partial || CRITICAL || || `idx_guest_qr` || `guest(qr_code)` || Unique || CRITICAL || || `idx_rsvp_guest` || `event_rsvp(guest_id, status)` || Composite || HIGH || || `idx_budget_wedding` || `budget(wedding_id)` || Single-column || HIGH || || `idx_vendor_description_gin` || `vendor(description_tsv)` || GIN / Full-text || HIGH || || `idx_client_email_lower` || `client(LOWER(email))` || Expression || MEDIUM || || `idx_booking_current_year` || `booking(client_id, event_date)` || Partial || MEDIUM || ---- == 3. Caching, Partitioning & Storage Optimization == === 3.1 Caching Layer === Establish a multi-tier caching architecture: * '''Application-level cache''' (Redis / Memcached) — venue availability, vendor listings, package catalogs * '''Database query result cache''' — read-heavy dashboards with low update frequency * '''Materialized views''' — pre-aggregated financial and booking metrics refreshed on schedule ==== Sample: Materialized View for Monthly Revenue ==== {{{ #!sql CREATE MATERIALIZED VIEW mv_monthly_revenue AS SELECT DATE_TRUNC('month', p.payment_date) AS revenue_month, v.service_category, COUNT(p.payment_id) AS total_payments, SUM(p.amount) AS total_revenue, AVG(p.amount) AS average_payment FROM payment p JOIN booking b ON p.booking_id = b.booking_id JOIN vendor v ON b.vendor_id = v.vendor_id WHERE p.payment_status = 'Completed' GROUP BY DATE_TRUNC('month', p.payment_date), v.service_category ORDER BY revenue_month DESC; -- Create an index on the materialized view for fast dashboard queries CREATE INDEX idx_mv_revenue_month ON mv_monthly_revenue(revenue_month, service_category); }}} ==== Refreshing the Materialized View ==== {{{ #!sql -- Refresh without locking reads (safe for production) REFRESH MATERIALIZED VIEW CONCURRENTLY mv_monthly_revenue; }}} ==== Validation ==== {{{ #!sql -- Confirm data freshness after refresh SELECT MAX(revenue_month) AS last_updated_month FROM mv_monthly_revenue; }}} The result should reflect the most recently completed calendar month. If it is more than one month behind, review the scheduled refresh job. === 3.2 Horizontal Partitioning === Partition large tables — `booking`, `payment`, `vendor` — by the most common access dimension: {{{ #!sql -- Partition the booking table by year and quarter CREATE TABLE booking ( booking_id SERIAL, client_id INTEGER NOT NULL, venue_id INTEGER NOT NULL, booking_date TIMESTAMP NOT NULL, event_date DATE NOT NULL, status VARCHAR(20) NOT NULL, total_cost NUMERIC(12,2), PRIMARY KEY (booking_id, event_date) ) PARTITION BY RANGE (event_date); CREATE TABLE booking_2024_q1 PARTITION OF booking FOR VALUES FROM ('2024-01-01') TO ('2024-04-01'); CREATE TABLE booking_2024_q2 PARTITION OF booking FOR VALUES FROM ('2024-04-01') TO ('2024-07-01'); CREATE TABLE booking_2025_q1 PARTITION OF booking FOR VALUES FROM ('2025-01-01') TO ('2025-04-01'); }}} ==== Explanation ==== Partitioning allows PostgreSQL to skip irrelevant partitions entirely (partition pruning), dramatically reducing I/O for date-range queries. Each partition can also be independently vacuumed, archived, or dropped without affecting others. ==== Verification ==== {{{ #!sql -- Confirm partition pruning is active for a date-range query EXPLAIN ANALYZE SELECT booking_id, client_id, event_date FROM booking WHERE event_date BETWEEN '2025-01-01' AND '2025-03-31'; }}} Expected result: Only `booking_2025_q1` appears in the plan. All other partitions should be marked as pruned. ==== Validation ==== {{{ #!sql -- Confirm row distribution across partitions SELECT tableoid::regclass AS partition_name, COUNT(*) AS row_count FROM booking GROUP BY tableoid ORDER BY partition_name; }}} === 3.3 Storage-Level Optimizations === * Enable column compression for archival partitions (`TOAST` settings) * Tune WAL settings (`wal_buffers`, `checkpoint_completion_target`) for high-insert workloads * Separate I/O tiers: NVMe for hot partitions, SSD for warm data, object storage for cold archives ---- == 4. Concurrency, Transaction Management & Locking Strategy == === 4.1 Transaction Isolation Levels === || '''Isolation Level''' || '''Recommended Use Case''' || || `READ COMMITTED` || Standard read and write operations (default) || || `REPEATABLE READ` || Financial reconciliation, double-booking prevention || || `SERIALIZABLE` || Business-critical workflows requiring strict consistency || === 4.2 Deadlock Prevention === Implement the following practices to eliminate deadlock risk: * Enforce a consistent ordering of table modifications across all transactions * Keep transactions as short-lived as possible * Ensure indexes exist on all foreign-key columns to prevent lock escalation * Route batch and reporting jobs through a dedicated queue or job scheduler ==== Sample: Safe Atomic Booking Reservation ==== {{{ #!sql BEGIN; -- Lock the venue row to prevent double-booking SELECT venue_id, capacity, status FROM venue WHERE venue_id = 12 FOR UPDATE; -- Verify availability before inserting INSERT INTO booking (client_id, venue_id, event_date, status, total_cost) SELECT 88, 12, '2025-09-14', 'Confirmed', 4500.00 WHERE NOT EXISTS ( SELECT 1 FROM booking WHERE venue_id = 12 AND event_date = '2025-09-14' AND status != 'Cancelled' ); COMMIT; }}} ==== Explanation ==== `SELECT ... FOR UPDATE` acquires an exclusive row-level lock on the venue record, preventing any concurrent transaction from modifying or double-booking the same venue for the same date until this transaction commits or rolls back. ==== Validation ==== {{{ #!sql -- Confirm no double-bookings exist for the same venue and date SELECT venue_id, event_date, COUNT(*) AS booking_count FROM booking WHERE status != 'Cancelled' GROUP BY venue_id, event_date HAVING COUNT(*) > 1 ORDER BY booking_count DESC; }}} The result set must always be empty in a correctly operating system. Any returned rows represent a critical data integrity violation requiring immediate investigation. === 4.3 Row-Level vs Advisory Locks === * Use '''row-level locks''' (`FOR UPDATE`, `FOR SHARE`) for booking atomicity and inventory management * Use '''advisory locks''' (`pg_try_advisory_lock`) for complex multi-step workflows such as payment batching ---- == 5. Performance Analysis with EXPLAIN / EXPLAIN ANALYZE == === 5.1 Purpose === || '''Command''' || '''Behavior''' || || `EXPLAIN` || Shows the execution plan without running the query || || `EXPLAIN ANALYZE` || Executes the query and shows real timing, row counts, and loop data || || `EXPLAIN (BUFFERS)` || Also reports cache hit / miss ratios for each plan node || === 5.2 Scan Type Reference === || '''Scan Type''' || '''When It Occurs''' || || Seq Scan || No usable index; entire table is read || || Index Scan || Index narrows row set; table is accessed for full row data || || Bitmap Index Scan || Multiple index results are combined before table access || || Index Only Scan || All required columns exist in the index; table is not read || === 5.3 Sample 1: Sequential Scan Without an Index === {{{ #!sql EXPLAIN SELECT * FROM event WHERE wedding_id = 3 AND status IN ('Scheduled', 'Confirmed'); }}} Typical output before optimization: {{{ Seq Scan on event (cost=0.00..4821.00 rows=12 width=96) Filter: ((wedding_id = 3) AND (status = ANY ('{Scheduled,Confirmed}'::text[]))) }}} ==== Diagnosis ==== A `Seq Scan` on a large `event` table indicates no index exists to support the predicate. With millions of rows, this translates directly to high I/O and slow response times. A composite partial index is required. === 5.4 Sample 2: Index Scan After Creating a Partial Composite Index === {{{ #!sql -- Create the partial composite index CREATE INDEX idx_event_active_timeline ON event(wedding_id, date, start_time) WHERE status IN ('Scheduled', 'Confirmed'); -- Re-run EXPLAIN ANALYZE after index creation EXPLAIN ANALYZE SELECT * FROM event WHERE wedding_id = 3 AND status IN ('Scheduled', 'Confirmed') ORDER BY date, start_time; }}} Expected output after optimization: {{{ Index Scan using idx_event_active_timeline on event (cost=0.29..18.43 rows=12 width=96) (actual time=0.041..0.213 rows=12 loops=1) Index Cond: (wedding_id = 3) Planning Time: 1.2 ms Execution Time: 0.3 ms }}} ==== Validation ==== {{{ #!sql -- Confirm the index is being actively used in production SELECT indexrelname AS index_name, idx_scan AS times_used, idx_tup_read AS tuples_read, idx_tup_fetch AS tuples_fetched FROM pg_stat_user_indexes WHERE indexrelname = 'idx_event_active_timeline'; }}} After deploying to production, `idx_scan` should increase steadily with each dashboard load or event timeline request. A value that remains at 0 indicates the index condition does not match the real query predicates. === 5.5 Sample 3: EXPLAIN ANALYZE on a JOIN with Aggregation === {{{ #!sql EXPLAIN ANALYZE SELECT w.wedding_id, COUNT(g.guest_id) AS total_guests, COUNT(r.rsvp_id) FILTER (WHERE r.status = 'Accepted') AS confirmed FROM wedding w JOIN guest g ON w.wedding_id = g.wedding_id LEFT JOIN event_rsvp r ON g.guest_id = r.guest_id GROUP BY w.wedding_id; }}} Expected output: {{{ HashAggregate (cost=3241.10..3243.25 rows=215 width=24) (actual time=89.3..89.8 rows=215 loops=1) Group Key: w.wedding_id -> Hash Left Join (cost=... actual time=8.1..71.4 rows=48200 loops=1) Hash Cond: (g.guest_id = r.guest_id) -> Index Scan using idx_guest_wedding on guest g (actual time=0.02..3.4 rows=12500 loops=1) Planning Time: 2.8 ms Execution Time: 90.1 ms }}} ==== Interpretation ==== * `HashAggregate` — efficient grouping algorithm chosen by the planner * `Hash Left Join` — appropriate for large result sets without a nested-loop alternative * `Index Scan using idx_guest_wedding` — confirms the index is being used for the JOIN === 5.6 Benchmark Results Summary === || '''SQL Operation''' || '''Without Index''' || '''With Index''' || '''Improvement''' || || Guest lookup by wedding || 120 ms || 2 ms || 98.3% || || Event timeline (active only) || 850 ms || 35 ms || 95.9% || || RSVP aggregation || 2400 ms || 180 ms || 92.5% || || Budget analysis (4 joins) || 3200 ms || 145 ms || 95.5% || || Full-text vendor search || 4100 ms || 18 ms || 99.6% || ---- == 6. Security Architecture & Data Protection == === 6.1 Authentication & Authorization === * Role-based access control (RBAC) with clearly defined privilege separation * Separate roles for: admin, operations, reporting, and external API access * Multi-factor authentication (MFA) required for all admin-level operations ==== Role Definitions ==== || '''Role''' || '''Permissions''' || '''Restrictions''' || '''MFA Required''' || || Guest || Read own RSVP and profile data || No access to other guests, budgets, or admin functions || No || || Planner || Full CRUD on own wedding data, limited report access || Cannot modify other planners' data or system configuration || Recommended || || Vendor || Read contracts and proposals; message planners || Cannot access guest lists, budgets, or modify records || Yes || || Admin || Full system access, user management, audit logs || All actions monitored; access restricted by IP allowlist || Yes (Required) || ==== Sample: PostgreSQL RBAC Implementation ==== {{{ #!sql -- Create a read-only reporting role CREATE ROLE reporting_readonly; GRANT CONNECT ON DATABASE wedding_planner TO reporting_readonly; GRANT USAGE ON SCHEMA public TO reporting_readonly; GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_readonly; -- Create a planner role with limited write access CREATE ROLE planner_role; GRANT SELECT, INSERT, UPDATE ON wedding, guest, event, event_rsvp TO planner_role; REVOKE DELETE ON wedding FROM planner_role; }}} ==== Row-Level Security (RLS) ==== {{{ #!sql -- Enable row-level security on the guest table ALTER TABLE guest ENABLE ROW LEVEL SECURITY; -- Planners can only access guests belonging to their own weddings CREATE POLICY guest_isolation ON guest USING ( wedding_id IN ( SELECT wedding_id FROM wedding WHERE planner_id = current_setting('app.current_user_id')::INTEGER ) ); }}} ==== Verification ==== {{{ #!sql -- Test that the policy correctly isolates data between planners SET app.current_user_id = '1001'; -- Should return only guests from wedding(s) owned by planner 1001 SELECT guest_id, full_name, wedding_id FROM guest LIMIT 10; }}} ==== Validation ==== {{{ #!sql -- Confirm RLS policies are active on all sensitive tables SELECT tablename, rowsecurity FROM pg_tables WHERE schemaname = 'public' AND tablename IN ('guest', 'wedding', 'payment', 'budget') ORDER BY tablename; }}} All rows in the result must show `rowsecurity = true`. Any table showing `false` has an unprotected surface that must be remediated before deployment. === 6.2 Encryption Strategy === ==== Encryption in Transit ==== * Enforce TLS 1.2+ across all communication channels (API, database connections, replication streams) * Reject plaintext connections at the PostgreSQL `pg_hba.conf` level using `hostssl` ==== Encryption at Rest ==== * Encrypt client PII, payment methods, contracts, and legal documents * Use dedicated KMS (Key Management Service) for key storage and rotation policies * Apply column-level encryption using `pgcrypto` for the most sensitive fields ==== Sample: Column-Level Encryption with pgcrypto ==== {{{ #!sql -- Store an encrypted phone number INSERT INTO client (name, phone_number_encrypted) VALUES ( 'Ivan Petrov', pgp_sym_encrypt('+389 70 123 456', current_setting('app.encryption_key')) ); -- Decrypt for authorized retrieval SELECT name, pgp_sym_decrypt( phone_number_encrypted, current_setting('app.encryption_key') ) AS phone_number FROM client WHERE client_id = 101; }}} ==== Validation ==== {{{ #!sql -- Confirm that the encrypted column cannot be read as plaintext SELECT name, phone_number_encrypted FROM client WHERE client_id = 101; }}} The `phone_number_encrypted` column must display binary/ciphertext only. Any readable plaintext indicates that the encryption step was skipped during insertion. === 6.3 Data Masking & Anonymization === * Mask sensitive fields (names, emails, phone numbers) in all non-production environments * Tokenize personally identifiable information for use in analytics pipelines * Apply role-based de-identification so that reporting users never see raw PII ==== Sample: Masked View for Reporting ==== {{{ #!sql CREATE VIEW v_client_reporting AS SELECT client_id, CONCAT(LEFT(name, 1), REPEAT('*', LENGTH(name) - 1)) AS masked_name, CONCAT(REPEAT('*', 6), RIGHT(email, POSITION('@' IN email))) AS masked_email, DATE_TRUNC('month', created_at) AS registration_month FROM client; -- Grant only the reporting role access to this view GRANT SELECT ON v_client_reporting TO reporting_readonly; }}} === 6.4 SQL Injection Prevention === Never construct queries using string concatenation with user input: {{{ #!sql -- WRONG: critically vulnerable to SQL injection query = "SELECT * FROM guest WHERE name = '" + user_input + "'" }}} Always use parameterized queries: {{{ #!sql -- CORRECT: safe parameterized query using $1 placeholder PREPARE guest_lookup (TEXT) AS SELECT guest_id, full_name, wedding_id FROM guest WHERE full_name = $1; EXECUTE guest_lookup('Ivan Petrov'); }}} === 6.5 GDPR Compliance === * '''Right to Access''' — users can export their complete personal dataset on demand * '''Right to Erasure''' — secure data deletion with cryptographic key destruction for encrypted fields * '''Data Minimization''' — only data strictly necessary for operations is collected and stored * '''Consent Management''' — explicit opt-in required for all non-essential data processing === 6.6 Backup, Restore & Disaster Recovery === || '''Backup Type''' || '''Frequency''' || '''Retention''' || || Full backup || Daily || 30 days || || Incremental backup || Every 15 minutes || 7 days || || Point-in-time recovery || Continuous WAL archiving || 14 days || || Cross-region replication || Real-time async || Permanent || * '''RPO''' (Recovery Point Objective) = 15 minutes * '''RTO''' (Recovery Time Objective) = 4 hours * Quarterly disaster recovery simulation drills are mandatory ==== Backup Validation ==== {{{ #!sql -- After a test restore, verify row counts match the source database SELECT 'wedding' AS table_name, COUNT(*) AS row_count FROM wedding UNION ALL SELECT 'guest', COUNT(*) FROM guest UNION ALL SELECT 'payment', COUNT(*) FROM payment UNION ALL SELECT 'event', COUNT(*) FROM event ORDER BY table_name; }}} Compare these counts against the equivalent query run on the source database. Any discrepancy indicates an incomplete or corrupted backup that must be investigated before the restore can be considered valid. === 6.7 Audit Logging === {{{ #!sql -- Create an audit log table for sensitive data modifications CREATE TABLE audit_log ( log_id BIGSERIAL PRIMARY KEY, table_name VARCHAR(100) NOT NULL, operation VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE record_id INTEGER NOT NULL, changed_by VARCHAR(100) NOT NULL, changed_at TIMESTAMP NOT NULL DEFAULT NOW(), old_values JSONB, new_values JSONB ); -- Trigger function to capture all changes to the guest table CREATE OR REPLACE FUNCTION fn_audit_guest() RETURNS TRIGGER AS $$ BEGIN INSERT INTO audit_log ( table_name, operation, record_id, changed_by, old_values, new_values ) VALUES ( TG_TABLE_NAME, TG_OP, COALESCE(NEW.guest_id, OLD.guest_id), current_user, to_jsonb(OLD), to_jsonb(NEW) ); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER trg_audit_guest AFTER INSERT OR UPDATE OR DELETE ON guest FOR EACH ROW EXECUTE FUNCTION fn_audit_guest(); }}} ==== Validation ==== {{{ #!sql -- Verify audit records are being created correctly UPDATE guest SET dietary_preference = 'Vegan' WHERE guest_id = 55; SELECT table_name, operation, record_id, changed_by, changed_at, new_values FROM audit_log WHERE table_name = 'guest' AND record_id = 55 ORDER BY changed_at DESC LIMIT 5; }}} ---- == 7. Monitoring, Observability & Alerting == === 7.1 Key Database Metrics to Track === * Query latency percentiles: P50, P95, P99 * Lock wait times and deadlock frequency * Buffer cache hit ratio (target: > 99%) * Slow query log (threshold: > 500ms) * Index utilization ratios per table * Connection pool saturation === 7.2 Useful Monitoring Queries === {{{ #!sql -- Identify the slowest queries currently running SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE state = 'active' AND now() - pg_stat_activity.query_start > INTERVAL '5 seconds' ORDER BY duration DESC; -- Check buffer cache hit ratio (should be > 99%) SELECT SUM(heap_blks_hit) AS cache_hits, SUM(heap_blks_read) AS disk_reads, ROUND( 100.0 * SUM(heap_blks_hit) / NULLIF(SUM(heap_blks_hit) + SUM(heap_blks_read), 0), 2 ) AS cache_hit_ratio FROM pg_statio_user_tables; -- Identify unused indexes (candidates for removal) SELECT schemaname, tablename, indexname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY pg_relation_size(indexrelid) DESC; }}} === 7.3 Alerting Rules === Trigger alerts when: * Deadlock frequency exceeds 5 per minute * Replication lag exceeds 30 seconds * CPU utilization exceeds 80% for more than 2 minutes * I/O queue depth exceeds defined threshold * Buffer cache hit ratio drops below 95% ---- == 8. Conclusion == Phase 9 elevates the Wedding Planner database into a high-performance, secure, and scalable enterprise platform. By combining: * Targeted and well-maintained indexing (single-column, composite, partial, GIN, expression) * Optimized JOIN ordering, early predicate filtering, and query rewrites * Multi-tier caching and horizontal partitioning for large-scale data * Strict, multi-layer security (encryption, RBAC, RLS, audit logging, parameterized queries) * Full GDPR-compliant data protection practices * Comprehensive performance analysis with `EXPLAIN ANALYZE` and real benchmarking The system achieves: * Fast, consistent record retrieval and report generation * Secure and compliant handling of sensitive personal and financial data * Stable and predictable performance at scale * A solid, well-documented foundation for the enterprise-grade phases ahead