= Wedding Planner Database – Phase 9: Comprehensive Report =
= Database Performance, Optimization & Security =

----

== Executive Summary ==

Phase 9 focuses on database performance, optimization, scalability, and security for the Wedding Planner Management System.

The primary focus of this phase is the execution analysis and optimization of the complex analytical queries implemented in Phase 6.

The analyzed analytical reports are:
* Budget vs Actual Expenditure Analysis
* Venue Capacity Utilization Analysis
* RSVP Conversion Rate Analysis

This phase includes:
* EXPLAIN ANALYZE-based query analysis
* identification of execution bottlenecks
* indexing and optimization strategies
* materialized view and partitioning recommendations
* transaction and locking analysis
* database security mechanisms
* backup, audit logging, and data protection strategies

The implementation demonstrates how PostgreSQL can support both transactional processing and analytical workloads within a scalable Wedding Planner Management System.

== 1. Performance Analysis of Phase 6 Queries ==

== 1.1 Query Workload Characteristics ==

The analytical queries from Phase 6 are reporting-oriented workloads.

These queries are significantly more expensive than standard CRUD operations because they combine data from multiple related tables before generating aggregated analytical metrics.

|| Query || Main Operations || Potential Bottleneck ||
|| Budget Analysis || Multiple LEFT JOIN operations, SUM aggregation, temporal calculations || Aggregation and repeated booking joins ||
|| Venue Capacity || JOINs, COUNT(DISTINCT), CASE categorization || Attendance aggregation and GROUP BY ||
|| RSVP Conversion || Multiple LEFT JOINs, COUNT(DISTINCT), conditional aggregation || DISTINCT counting and aggregation ||

The most expensive operations identified are:
* COUNT(DISTINCT ...)
* GROUP BY aggregation
* LEFT JOIN chains
* temporal cost calculations
* conditional CASE calculations

== 1.2 General Execution Plan Expectations ==

For the Phase 6 analytical reports, PostgreSQL is expected to use:
* Sequential Scan
* Index Scan
* Hash Join
* !HashAggregate
* Sort

The selected execution strategy depends on:
* table size
* index availability
* row selectivity
* join cardinality
* aggregation complexity

The most important optimization factors are:
* indexes on foreign-key columns
* indexes on status columns
* optimization of GROUP BY operations
* reduction of unnecessary sequential scans
* efficient JOIN ordering

== 1.3 Budget Analysis Query – Execution Analysis ==

The Budget Analysis query evaluates the financial relationship between the planned wedding budget and the actual expenses generated by venue, photographer, and band bookings.

This query is computationally expensive because it combines multiple booking-related tables and performs aggregation and temporal cost calculations.

=== Query Characteristics ===

The query includes:
* multiple LEFT JOIN operations
* SUM() aggregation
* temporal calculations using EXTRACT(EPOCH)
* GROUP BY aggregation
* COALESCE() handling of NULL values

The query combines:
* wedding
* user
* venue_booking
* photographer_booking
* photographer
* band_booking
* band

=== Performance-Sensitive Operations ===

The most expensive operations identified are:

* Multiple LEFT JOIN operations:
  * all weddings must remain in the result set even when certain bookings do not exist

* Aggregation:
  * SUM() calculations process multiple booking records per wedding

* Temporal Calculations:
  * EXTRACT(EPOCH FROM (...)) converts booking durations into hours

* GROUP BY:
  * aggregation requires grouping all joined rows by wedding attributes

=== EXPLAIN ANALYZE ===

{{{
#!sql
EXPLAIN ANALYZE
SELECT
    w.wedding_id,

    u.first_name || ' ' || u.last_name
        AS organizer_name,

    w.date,

    w.budget,

    COALESCE(SUM(vb.price), 0)
        AS venue_cost,

    COALESCE(SUM(
        EXTRACT(EPOCH FROM (pb.end_time - pb.start_time))/3600
        * p.price_per_hour
    ), 0) AS photographer_cost,

    COALESCE(SUM(
        EXTRACT(EPOCH FROM (bb.end_time - bb.start_time))/3600
        * b.price_per_hour
    ), 0) AS band_cost

FROM wedding w

LEFT JOIN "user" u
    ON w.user_id = u.user_id

LEFT JOIN venue_booking vb
    ON w.wedding_id = vb.wedding_id

LEFT JOIN photographer_booking pb
    ON w.wedding_id = pb.wedding_id

LEFT JOIN photographer p
    ON pb.photographer_id = p.photographer_id

LEFT JOIN band_booking bb
    ON w.wedding_id = bb.wedding_id

LEFT JOIN band b
    ON bb.band_id = b.band_id

GROUP BY
    w.wedding_id,
    u.first_name,
    u.last_name,
    w.date,
    w.budget

ORDER BY w.wedding_id;
}}}

=== Expected Execution Plan ===

A typical execution plan for this query may include:

{{{
HashAggregate
  -> Hash Left Join
       -> Hash Left Join
            -> Hash Left Join
                 -> Seq Scan on wedding
}}}

=== Interpretation ===

* Seq Scan on wedding:
  * PostgreSQL scans the wedding table as the base relation

* Hash Left Join:
  * booking tables are joined using hash joins because of the multiple LEFT JOIN operations

* HashAggregate:
  * aggregation is performed after all joins are completed

The query execution cost increases proportionally with:
* number of weddings
* number of booking records
* number of vendors per wedding

=== Recommended Indexes ===

{{{
#!sql
CREATE INDEX idx_wedding_user
ON wedding(user_id);

CREATE INDEX idx_venue_booking_wedding
ON venue_booking(wedding_id);

CREATE INDEX idx_photographer_booking_wedding
ON photographer_booking(wedding_id);

CREATE INDEX idx_photographer_booking_photographer
ON photographer_booking(photographer_id);

CREATE INDEX idx_band_booking_wedding
ON band_booking(wedding_id);

CREATE INDEX idx_band_booking_band
ON band_booking(band_id);
}}}

=== Optimization Benefits ===

The proposed indexes improve:
* JOIN performance
* row lookup speed
* aggregation preparation
* scalability for large booking datasets

The indexes reduce:
* sequential scans
* unnecessary I/O operations
* execution latency for analytical reports

=== Validation ===

{{{
#!sql
EXPLAIN ANALYZE
SELECT *
FROM venue_booking
WHERE wedding_id = 1;
}}}

Expected result:

{{{
Index Scan using idx_venue_booking_wedding on venue_booking
}}}

=== Conclusion ===

The Budget Analysis query represents one of the most computationally intensive analytical reports in the system because it combines multiple booking sources and performs aggregation across several relations simultaneously.

Efficient indexing of foreign-key columns is essential for maintaining acceptable execution time as the number of weddings and bookings increases.

== 1.4 Venue Capacity Query – Execution Analysis ==

The Venue Capacity Utilization query analyzes the relationship between venue capacity and guest attendance for wedding events.

This query combines venue, booking, wedding, event, and attendance data in order to calculate occupancy metrics and utilization categories.

=== Query Characteristics ===

The query includes:
* multiple INNER JOIN and LEFT JOIN operations
* COUNT(DISTINCT ...) aggregation
* CASE-based categorization
* GROUP BY aggregation
* occupancy percentage calculations

The query combines:
* venue
* venue_booking
* wedding
* user
* event
* attendance

=== Performance-Sensitive Operations ===

The most expensive operations identified are:

* COUNT(DISTINCT a.guest_id):
  * distinct counting requires additional aggregation work

* GROUP BY:
  * all joined attendance records must be grouped by venue and wedding attributes

* LEFT JOIN attendance:
  * attendance rows must remain optional to preserve events without attendance data

* CASE categorization:
  * occupancy thresholds are evaluated during aggregation

=== EXPLAIN ANALYZE ===

{{{
#!sql
EXPLAIN ANALYZE
SELECT
    v.venue_id,
    v.name AS venue_name,
    v.capacity AS venue_capacity,

    w.wedding_id,

    u.first_name || ' ' || u.last_name
        AS organizer_name,

    w.date AS wedding_date,

    COUNT(DISTINCT a.guest_id)
        AS confirmed_attendees,

    COUNT(
        DISTINCT CASE
            WHEN a.status = 'ATTENDED'
            THEN a.guest_id
        END
    ) AS actual_attendance,

    v.capacity - COUNT(DISTINCT a.guest_id)
        AS available_seats,

    ROUND(
        (
            CAST(COUNT(DISTINCT a.guest_id) AS NUMERIC)
            / v.capacity
        ) * 100,
        2
    ) AS occupancy_rate_percent

FROM venue v

INNER JOIN venue_booking vb
    ON v.venue_id = vb.venue_id

INNER JOIN wedding w
    ON vb.wedding_id = w.wedding_id

INNER JOIN "user" u
    ON w.user_id = u.user_id

LEFT JOIN event e
    ON w.wedding_id = e.wedding_id

LEFT JOIN attendance a
    ON e.event_id = a.event_id
    AND a.status IN ('ATTENDED', 'CONFIRMED')

GROUP BY
    v.venue_id,
    v.name,
    v.capacity,
    w.wedding_id,
    u.first_name,
    u.last_name,
    w.date

ORDER BY
    v.venue_id,
    w.wedding_id;
}}}

=== Expected Execution Plan ===

A typical execution plan may include:

{{{
HashAggregate
  -> Hash Left Join
       -> Hash Join
            -> Seq Scan on attendance
}}}

=== Interpretation ===

* Seq Scan on attendance:
  * attendance records are scanned before aggregation

* Hash Join:
  * attendance records are matched with event and wedding relations

* HashAggregate:
  * PostgreSQL groups attendance rows by venue and wedding information

The execution cost depends mainly on:
* number of attendance records
* number of events
* number of guests per wedding

=== Recommended Indexes ===

{{{
#!sql
CREATE INDEX idx_venue_booking_venue
ON venue_booking(venue_id);

CREATE INDEX idx_venue_booking_wedding
ON venue_booking(wedding_id);

CREATE INDEX idx_event_wedding
ON event(wedding_id);

CREATE INDEX idx_attendance_event
ON attendance(event_id);

CREATE INDEX idx_attendance_status
ON attendance(status);

CREATE INDEX idx_attendance_guest
ON attendance(guest_id);
}}}

=== Optimization Benefits ===

The proposed indexes improve:
* attendance lookup performance
* JOIN efficiency
* aggregation preparation
* occupancy calculation speed

The indexes reduce:
* full table scans
* aggregation overhead
* JOIN latency

=== Validation ===

{{{
#!sql
EXPLAIN ANALYZE
SELECT *
FROM attendance
WHERE event_id = 1;
}}}

Expected result:

{{{
Index Scan using idx_attendance_event on attendance
}}}

=== Conclusion ===

The Venue Capacity query is aggregation-heavy because it calculates attendance and occupancy metrics across multiple joined relations.

The most performance-sensitive component is the DISTINCT attendance aggregation.

Proper indexing of attendance and event relations significantly improves report scalability and execution efficiency.

== 1.5 RSVP Conversion Query – Execution Analysis ==

The RSVP Conversion query analyzes how invited guests move through the RSVP process and how many confirmed guests actually attend the event.

This query is important because it evaluates guest engagement and invitation effectiveness using RSVP and attendance data.

=== Query Characteristics ===

The query includes:
* multiple INNER JOIN and LEFT JOIN operations
* COUNT(DISTINCT ...) aggregation
* conditional aggregation with CASE WHEN
* NULLIF() division protection
* percentage calculations
* GROUP BY aggregation

The query combines:
* wedding
* user
* event
* guest
* event_rsvp
* attendance

=== Performance-Sensitive Operations ===

The most expensive operations identified are:

* COUNT(DISTINCT g.guest_id):
  * calculates total invitations

* COUNT(DISTINCT r.response_id):
  * calculates RSVP responses

* Conditional COUNT(DISTINCT ...):
  * calculates confirmed and declined RSVP responses

* LEFT JOIN event_rsvp:
  * preserves guests even when they have not submitted an RSVP

* LEFT JOIN attendance:
  * preserves invited guests even when attendance data does not exist

=== EXPLAIN ANALYZE ===

{{{
#!sql
EXPLAIN ANALYZE
SELECT
    w.wedding_id,

    u.first_name || ' ' || u.last_name
        AS organizer_name,

    w.date AS wedding_date,

    e.event_id,
    e.event_type,

    COUNT(DISTINCT g.guest_id)
        AS total_invitations,

    COUNT(DISTINCT r.response_id)
        AS rsvp_responses,

    COUNT(DISTINCT CASE
        WHEN r.status = 'CONFIRMED'
        THEN r.response_id
    END) AS confirmed_rsvps,

    COUNT(DISTINCT CASE
        WHEN r.status = 'DECLINED'
        THEN r.response_id
    END) AS declined_rsvps,

    COUNT(DISTINCT a.attendance_id)
        AS attendance_records,

    COUNT(DISTINCT CASE
        WHEN a.status = 'ATTENDED'
        THEN a.attendance_id
    END) AS actual_attendees

FROM wedding w

INNER JOIN "user" u
    ON w.user_id = u.user_id

INNER JOIN event e
    ON w.wedding_id = e.wedding_id

LEFT JOIN guest g
    ON w.wedding_id = g.wedding_id

LEFT JOIN event_rsvp r
    ON g.guest_id = r.guest_id
    AND e.event_id = r.event_id

LEFT JOIN attendance a
    ON g.guest_id = a.guest_id
    AND e.event_id = a.event_id

GROUP BY
    w.wedding_id,
    u.first_name,
    u.last_name,
    w.date,
    e.event_id,
    e.event_type

ORDER BY
    w.wedding_id,
    e.event_id;
}}}

=== Expected Execution Plan ===

A typical execution plan may include:

{{{
HashAggregate
  -> Hash Left Join
       -> Hash Left Join
            -> Hash Join
                 -> Seq Scan on guest
}}}

=== Interpretation ===

* Seq Scan on guest:
  * guest rows are scanned before RSVP and attendance matching

* Hash Left Join:
  * guests are matched with RSVP and attendance records

* HashAggregate:
  * PostgreSQL groups records by wedding and event before calculating conversion metrics

The execution cost increases with:
* number of guests
* number of events per wedding
* number of RSVP records
* number of attendance records

=== Recommended Indexes ===

{{{
#!sql
CREATE INDEX idx_event_wedding
ON event(wedding_id);

CREATE INDEX idx_guest_wedding
ON guest(wedding_id);

CREATE INDEX idx_event_rsvp_guest
ON event_rsvp(guest_id);

CREATE INDEX idx_event_rsvp_event
ON event_rsvp(event_id);

CREATE INDEX idx_event_rsvp_status
ON event_rsvp(status);

CREATE INDEX idx_attendance_guest
ON attendance(guest_id);

CREATE INDEX idx_attendance_event
ON attendance(event_id);

CREATE INDEX idx_attendance_status
ON attendance(status);
}}}

=== Optimization Benefits ===

The proposed indexes improve:
* guest lookup by wedding
* RSVP lookup by guest and event
* attendance lookup by guest and event
* filtering by RSVP and attendance status
* GROUP BY preparation

The indexes reduce:
* unnecessary sequential scans
* large intermediate join results
* execution time for RSVP reporting

=== Validation ===

{{{
#!sql
EXPLAIN ANALYZE
SELECT *
FROM event_rsvp
WHERE guest_id = 1
  AND event_id = 1;
}}}

Expected result:

{{{
Index Scan using idx_event_rsvp_guest on event_rsvp
}}}

=== Conclusion ===

The RSVP Conversion query is one of the most aggregation-heavy Phase 6 reports because it combines invitation, RSVP, and attendance records into conversion metrics.

The most expensive operations are COUNT(DISTINCT ...) and conditional aggregation.

Indexing guest, RSVP, attendance, and event foreign-key columns directly improves execution performance and makes the report scalable for larger weddings with many guests.

== 1.6 Performance Analysis Summary ==

The Phase 6 analytical reports demonstrate significantly higher execution complexity compared to standard transactional queries.

The main performance-intensive operations identified throughout the analysis are:

* multiple JOIN operations
* LEFT JOIN preservation of incomplete relations
* GROUP BY aggregation
* COUNT(DISTINCT ...) calculations
* conditional aggregation using CASE
* temporal calculations using EXTRACT(EPOCH)
* percentage calculations using ROUND() and NULLIF()

== 1.6.1 Main Bottlenecks ==

|| Bottleneck || Impact ||
|| Multiple LEFT JOIN chains || Increased intermediate result size ||
|| COUNT(DISTINCT ...) || Expensive aggregation and sorting ||
|| GROUP BY over joined relations || Higher memory and CPU usage ||
|| Temporal calculations || Additional CPU processing ||
|| Missing indexes on foreign keys || Sequential scans and slow joins ||

The most expensive queries are:
* RSVP Conversion Analysis
* Venue Capacity Utilization Analysis

because they process:
* attendance records
* RSVP records
* DISTINCT aggregations
* multiple optional relations

== 1.6.2 Most Important Indexes ==

The following indexes provide the greatest performance improvements for the Phase 6 analytical workload:

{{{
#!sql
CREATE INDEX idx_guest_wedding
ON guest(wedding_id);

CREATE INDEX idx_event_wedding
ON event(wedding_id);

CREATE INDEX idx_event_rsvp_guest
ON event_rsvp(guest_id);

CREATE INDEX idx_attendance_event
ON attendance(event_id);

CREATE INDEX idx_venue_booking_wedding
ON venue_booking(wedding_id);

CREATE INDEX idx_photographer_booking_wedding
ON photographer_booking(wedding_id);

CREATE INDEX idx_band_booking_wedding
ON band_booking(wedding_id);
}}}

These indexes improve:
* JOIN performance
* aggregation preparation
* filtering efficiency
* report scalability

== 1.6.3 Expected Optimization Improvements ==

After applying the recommended indexes, PostgreSQL is expected to:
* replace Sequential Scans with Index Scans
* reduce JOIN execution cost
* reduce aggregation preparation time
* reduce overall execution latency

Expected improvements include:
* faster analytical report generation
* lower memory consumption
* lower disk I/O
* better scalability for larger wedding datasets

== 1.6.4 Scalability Considerations ==

As the database grows, the analytical queries from Phase 6 become increasingly dependent on:
* index quality
* efficient JOIN ordering
* aggregation optimization
* table statistics maintenance

The largest future scalability risks are:
* very large attendance datasets
* large RSVP histories
* repeated analytical aggregation over historical weddings

To maintain acceptable performance in large-scale deployments, the following strategies are recommended:
* materialized views for analytical reports
* periodic archiving of historical weddings
* automatic VACUUM and ANALYZE maintenance
* partitioning of large attendance and RSVP tables

== 1.6.5 Final Interpretation ==

The analysis confirms that the Phase 6 analytical reports are computationally more expensive than standard transactional operations because they combine multiple relations and perform aggregation-intensive calculations.

However, with proper indexing and optimization strategies, PostgreSQL can efficiently execute these analytical reports while maintaining acceptable scalability and execution performance.

== 2. Indexing Strategy & Optimization Framework ==

A well-designed indexing strategy is essential for maintaining predictable performance
under high data volumes and concurrent user load.

=== 2.1 Current Index Landscape ===

The Phase 6 analytical reports rely heavily on foreign-key relationships and aggregation over attendance, RSVP, and booking data.

A schema analysis identified the following important optimization requirements:
* indexing of foreign-key columns
* indexing of frequently grouped relations
* optimization of JOIN paths
* optimization of attendance and RSVP aggregation

The most performance-sensitive tables are:
* attendance
* event_rsvp
* venue_booking
* photographer_booking
* band_booking
* guest
* event

=== 2.2 Index Selectivity & Cardinality Analysis ===

|| Column || Cardinality || Index Recommendation ||
|| wedding_id || High || YES ||
|| event_id || High || YES ||
|| guest_id || High || YES ||
|| venue_id || High || YES ||
|| photographer_id || High || YES ||
|| band_id || High || YES ||
|| status || Medium || YES (combined indexes) ||
|| event_type || Medium || Optional ||
|| date || High || YES ||

Columns with high cardinality provide the best index selectivity and improve JOIN performance significantly.

=== 2.3 Recommended Indexes for Phase 6 Reports ===

==== 2.3.1 Foreign-Key Optimization Indexes ====

{{{
#!sql
CREATE INDEX idx_guest_wedding
ON guest(wedding_id);

CREATE INDEX idx_event_wedding
ON event(wedding_id);

CREATE INDEX idx_venue_booking_wedding
ON venue_booking(wedding_id);

CREATE INDEX idx_venue_booking_venue
ON venue_booking(venue_id);

CREATE INDEX idx_photographer_booking_wedding
ON photographer_booking(wedding_id);

CREATE INDEX idx_photographer_booking_photographer
ON photographer_booking(photographer_id);

CREATE INDEX idx_band_booking_wedding
ON band_booking(wedding_id);

CREATE INDEX idx_band_booking_band
ON band_booking(band_id);

CREATE INDEX idx_attendance_event
ON attendance(event_id);

CREATE INDEX idx_attendance_guest
ON attendance(guest_id);

CREATE INDEX idx_event_rsvp_guest
ON event_rsvp(guest_id);

CREATE INDEX idx_event_rsvp_event
ON event_rsvp(event_id);
}}}

==== Explanation ====

These indexes optimize:
* JOIN operations
* attendance aggregation
* RSVP lookup
* booking analysis
* analytical report generation

The indexes reduce:
* sequential scans
* join latency
* aggregation preparation cost

=== 2.3.2 Composite Analytical Indexes ====

{{{
#!sql
CREATE INDEX idx_attendance_event_status
ON attendance(event_id, status);

CREATE INDEX idx_event_rsvp_guest_status
ON event_rsvp(guest_id, status);

CREATE INDEX idx_event_wedding_date
ON event(wedding_id, date);

CREATE INDEX idx_venue_booking_date
ON venue_booking(wedding_id, date);
}}}

==== Explanation ====

Composite indexes improve analytical filtering because the Phase 6 reports frequently:
* filter by status
* group by wedding/event
* analyze attendance by event
* analyze RSVP responses by status

These indexes significantly improve:
* conditional aggregation
* GROUP BY preparation
* attendance filtering
* RSVP reporting

=== 2.4 EXPLAIN ANALYZE Validation ===

The following queries can be used to validate index utilization:

{{{
#!sql
EXPLAIN ANALYZE
SELECT *
FROM attendance
WHERE event_id = 1;

EXPLAIN ANALYZE
SELECT *
FROM event_rsvp
WHERE guest_id = 1;

EXPLAIN ANALYZE
SELECT *
FROM venue_booking
WHERE wedding_id = 1;
}}}

Expected PostgreSQL output:

{{{
Index Scan using idx_attendance_event on attendance

Index Scan using idx_event_rsvp_guest on event_rsvp

Index Scan using idx_venue_booking_wedding on venue_booking
}}}

=== 2.5 Optimization Benefits ===

The proposed indexing strategy improves:
* JOIN performance
* aggregation efficiency
* analytical reporting speed
* scalability of Phase 6 queries

The indexing strategy reduces:
* full table scans
* unnecessary disk I/O
* aggregation overhead
* execution latency

=== 2.6 Maintenance Recommendations ===

To maintain stable analytical performance, the following maintenance procedures are recommended:
* periodic VACUUM execution
* regular ANALYZE statistics updates
* monitoring unused indexes
* reindexing fragmented indexes
* monitoring query execution plans with EXPLAIN ANALYZE

These maintenance operations help PostgreSQL preserve optimal execution plans for the analytical reports implemented in Phase 6.

== 3. Caching, Partitioning & Storage Optimization ==

=== 3.1 Caching & Analytical Optimization ===

The Phase 6 analytical reports execute complex aggregation queries across attendance, RSVP, booking, and event relations.

As the database grows, repeatedly calculating these analytical metrics may increase execution time and server load.

To improve scalability, the following optimization strategies are recommended:
* materialized analytical reports
* cached aggregation results
* periodic statistics refresh
* optimization of historical analytical workloads

=== 3.2 Materialized Views ===

Materialized views can precompute expensive analytical calculations and significantly reduce report execution time.

The following analytical reports are good candidates for materialization:
* Budget Analysis
* Venue Capacity Utilization
* RSVP Conversion Analysis

==== Example: RSVP Conversion Materialized View ====

{{{
#!sql
CREATE MATERIALIZED VIEW mv_rsvp_conversion AS

SELECT
    w.wedding_id,

    e.event_id,

    COUNT(DISTINCT g.guest_id)
        AS total_invitations,

    COUNT(DISTINCT r.response_id)
        AS rsvp_responses,

    COUNT(DISTINCT CASE
        WHEN r.status = 'CONFIRMED'
        THEN r.response_id
    END) AS confirmed_rsvps,

    COUNT(DISTINCT CASE
        WHEN a.status = 'ATTENDED'
        THEN a.attendance_id
    END) AS actual_attendees

FROM wedding w

INNER JOIN event e
    ON w.wedding_id = e.wedding_id

LEFT JOIN guest g
    ON w.wedding_id = g.wedding_id

LEFT JOIN event_rsvp r
    ON g.guest_id = r.guest_id
    AND e.event_id = r.event_id

LEFT JOIN attendance a
    ON g.guest_id = a.guest_id
    AND e.event_id = a.event_id

GROUP BY
    w.wedding_id,
    e.event_id;
}}}

==== Benefits ====

Materialized views improve:
* analytical query speed
* dashboard loading
* repeated reporting performance
* scalability for historical analytics

The materialized view stores precomputed aggregation results and avoids recalculating complex JOIN operations during every report execution.

==== Refreshing the Materialized View ====

{{{
#!sql
REFRESH MATERIALIZED VIEW mv_rsvp_conversion;
}}}

=== 3.3 Partitioning Considerations ===

The largest future analytical tables are expected to be:
* attendance
* event_rsvp
* guest

As historical wedding data grows, partitioning may improve scalability.

Recommended partitioning strategy:
* partition attendance by event date
* partition RSVP records by wedding date
* archive historical weddings separately

==== Example: Attendance Partitioning ====

The following example demonstrates a conceptual partitioned version of the attendance table for large-scale deployments.

{{{
#!sql
CREATE TABLE attendance (

    attendance_id SERIAL,
    status VARCHAR(30),
    table_number INTEGER,
    role VARCHAR(50),
    guest_id INTEGER,
    event_id INTEGER,
    attendance_date DATE,

    PRIMARY KEY(attendance_id, attendance_date)

) PARTITION BY RANGE (attendance_date);
}}}

==== Benefits ====

Partitioning improves:
* analytical query performance
* historical data management
* maintenance operations
* VACUUM efficiency
* scalability of large attendance datasets

=== 3.4 Storage Optimization ===

To improve long-term database performance, the following storage optimizations are recommended:
* periodic VACUUM execution
* ANALYZE statistics updates
* archival of historical wedding records
* separation of analytical and transactional workloads

These optimizations reduce:
* table fragmentation
* outdated planner statistics
* unnecessary sequential scans
* analytical execution overhead

=== 3.5 Scalability Interpretation ===

The analytical queries from Phase 6 are aggregation-heavy and become increasingly expensive as attendance and RSVP data grows.

Caching, materialized views, and partitioning help PostgreSQL maintain predictable execution performance even when processing large analytical datasets.

These strategies are especially important for:
* large weddings
* long-term historical reporting
* repeated dashboard analytics
* concurrent report execution

== 4. Concurrency, Transaction Management & Locking Strategy ==

=== 4.1 Transaction Isolation Levels ===

The Wedding Planner Management System includes several operations that require transactional consistency and protection from concurrent modification.

Examples include:
* venue booking
* attendance updates
* RSVP processing
* wedding scheduling
* event creation

The following PostgreSQL isolation levels are recommended:

|| Isolation Level || Recommended Usage ||
|| READ COMMITTED || Standard CRUD operations ||
|| REPEATABLE READ || Venue booking validation ||
|| SERIALIZABLE || Critical scheduling operations ||

=== 4.2 Concurrency Risks ===

The most important concurrency risks identified are:
* double-booking of venues
* simultaneous RSVP modifications
* concurrent attendance updates
* overlapping event scheduling

Without proper transaction management, multiple users may modify the same wedding-related records simultaneously.

=== 4.3 Deadlock Prevention Strategies ===

The following practices reduce deadlock risk:
* consistent transaction ordering
* short transaction duration
* indexing foreign-key columns
* avoiding unnecessary table locking

The most sensitive operations are:
* venue reservation
* event scheduling
* attendance confirmation

=== 4.4 Safe Venue Reservation Transaction ===

The following transaction prevents double-booking of venues.

{{{
#!sql
BEGIN;

SELECT venue_id
FROM venue
WHERE venue_id = 1
FOR UPDATE;

INSERT INTO venue_booking (
    date,
    start_time,
    end_time,
    status,
    price,
    venue_id,
    wedding_id
)
SELECT
    '2025-08-20',
    '18:00:00',
    '23:00:00',
    'CONFIRMED',
    5000.00,
    1,
    2

WHERE NOT EXISTS (

    SELECT 1

    FROM venue_booking

    WHERE venue_id = 1
      AND date = '2025-08-20'
      AND status != 'CANCELLED'
);

COMMIT;
}}}

=== Explanation ===

`FOR UPDATE` locks the selected venue row during the transaction.

This prevents concurrent transactions from simultaneously booking the same venue for overlapping dates.

The `NOT EXISTS` condition ensures that no conflicting booking already exists.

=== Validation ===

{{{
#!sql
SELECT
    venue_id,
    date,
    COUNT(*)

FROM venue_booking

WHERE status != 'CANCELLED'

GROUP BY
    venue_id,
    date

HAVING COUNT(*) > 1;
}}}

Expected result:
* empty result set

Any returned rows indicate conflicting venue bookings.

=== 4.5 Row-Level Locking ===

The system primarily relies on:
* row-level locks
* transaction isolation
* foreign-key consistency

Row-level locking is preferred because it:
* minimizes blocking
* improves concurrency
* prevents unnecessary table-wide locks

=== 4.6 Transaction Scalability ===

As the number of weddings and concurrent users increases, transaction management becomes increasingly important.

The following practices improve scalability:
* keeping transactions short
* indexing transactional lookup columns
* separating analytical reports from transactional operations
* avoiding long-running locks

=== 4.7 Final Interpretation ===

The Wedding Planner Management System contains several scheduling and booking operations that require transactional consistency.

PostgreSQL transaction isolation and row-level locking mechanisms help prevent:
* double-booking
* inconsistent RSVP updates
* concurrent attendance conflicts
* invalid scheduling states

Proper transaction management ensures both:
* data integrity
* stable concurrent system behavior

== 5. Performance Analysis with EXPLAIN / EXPLAIN ANALYZE ==

=== 5.1 Purpose of EXPLAIN and EXPLAIN ANALYZE ===

PostgreSQL provides several commands for analyzing query execution behavior.

|| Command || Purpose ||
|| EXPLAIN || Shows the planned execution strategy without running the query ||
|| EXPLAIN ANALYZE || Executes the query and shows real execution statistics ||
|| EXPLAIN (ANALYZE, BUFFERS) || Shows execution statistics plus buffer/cache usage ||

In this phase, `EXPLAIN ANALYZE` is used to evaluate the complex analytical queries from Phase 6.

It helps identify:
* scan types
* join strategies
* aggregation methods
* actual execution time
* row counts
* loops
* possible bottlenecks

=== 5.2 Important Execution Plan Elements ===

|| Plan Element || Meaning ||
|| Seq Scan || PostgreSQL scans the entire table ||
|| Index Scan || PostgreSQL uses an index to locate rows faster ||
|| Hash Join || PostgreSQL builds a hash table to join larger datasets ||
|| Nested Loop || PostgreSQL repeatedly scans one relation for each row of another relation ||
|| HashAggregate || PostgreSQL performs grouping and aggregation using a hash table ||
|| Sort || PostgreSQL sorts rows for ORDER BY or aggregation operations ||

For Phase 6 reports, the most common expected elements are:
* Hash Join
* HashAggregate
* Index Scan
* Seq Scan on small tables
* Sort

=== 5.3 Sequential Scan vs Index Scan ===

A Sequential Scan is not always a problem.

PostgreSQL may choose a Sequential Scan when:
* the table is small
* most rows are needed
* the cost of using an index is higher than scanning the table

However, for large Phase 6 analytical tables such as:
* attendance
* event_rsvp
* guest
* booking tables

Index Scans are preferred when filtering or joining by foreign-key columns.

=== 5.4 Example: Attendance Lookup Before Optimization ===

{{{
#!sql
EXPLAIN ANALYZE
SELECT *
FROM attendance
WHERE event_id = 1;
}}}

Without an index on `attendance(event_id)`, PostgreSQL may use:

{{{
Seq Scan on attendance
  Filter: (event_id = 1)
}}}

This means that all attendance rows are scanned before matching rows are returned.

For large attendance tables, this increases:
* disk I/O
* CPU usage
* query latency

=== 5.5 Example: Attendance Lookup After Optimization ===

{{{
#!sql
CREATE INDEX idx_attendance_event
ON attendance(event_id);

EXPLAIN ANALYZE
SELECT *
FROM attendance
WHERE event_id = 1;
}}}

Expected result:

{{{
Index Scan using idx_attendance_event on attendance
  Index Cond: (event_id = 1)
}}}

This confirms that PostgreSQL can directly locate attendance records for a specific event.

=== 5.6 EXPLAIN ANALYZE for Phase 6 Reports ===

The Phase 6 reports should be analyzed using EXPLAIN ANALYZE because they include:
* joins across multiple tables
* aggregation
* distinct counting
* conditional calculations

The analysis should be performed on:
* Budget Analysis query
* Venue Capacity query
* RSVP Conversion query

These queries represent the real analytical workload of the Wedding Planner Management System.

=== 5.7 Interpreting Execution Time ===

When reading EXPLAIN ANALYZE output, the most important values are:
* Planning Time
* Execution Time
* actual rows
* loops
* scan type
* join type

High execution time may indicate:
* missing indexes
* inefficient JOIN order
* expensive aggregation
* large intermediate result sets
* outdated PostgreSQL statistics

=== 5.8 Validation of Index Usage ===

After creating indexes, index usage can be checked using:

{{{
#!sql
SELECT
    indexrelname AS index_name,
    idx_scan AS times_used,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched

FROM pg_stat_user_indexes

WHERE indexrelname IN (
    'idx_attendance_event',
    'idx_guest_wedding',
    'idx_event_rsvp_guest',
    'idx_venue_booking_wedding'
);
}}}

If `idx_scan` increases after report execution, the index is being used.

If `idx_scan` remains 0, the index may be unused or the query planner may prefer another execution strategy.

=== 5.9 Summary ===

EXPLAIN ANALYZE is essential for validating the performance of the Phase 6 analytical reports.

It helps confirm whether:
* indexes are used correctly
* joins are efficient
* aggregation is acceptable
* execution time is reasonable

This makes EXPLAIN ANALYZE an important part of database performance tuning and report optimization.

== 6. Security Architecture & Data Protection ==

=== 6.1 Authentication & Authorization ===

The Wedding Planner Management System stores:
* wedding schedules
* guest information
* RSVP records
* attendance data
* booking information

Because the system contains sensitive personal and operational data, controlled database access is required.

The recommended approach is Role-Based Access Control (RBAC).

=== Recommended Roles ===

|| Role || Permissions || Restrictions ||
|| Guest User || Read personal RSVP information || Cannot modify wedding data ||
|| Wedding Organizer || CRUD operations for own weddings and events || Cannot access unrelated weddings ||
|| Administrator || Full database access || Restricted to authorized personnel only ||

=== PostgreSQL Role Implementation ===

{{{
#!sql
CREATE ROLE wedding_guest;

GRANT CONNECT ON DATABASE wedding_planner
TO wedding_guest;

GRANT SELECT ON guest, event_rsvp
TO wedding_guest;

CREATE ROLE wedding_organizer;

GRANT SELECT, INSERT, UPDATE
ON wedding, event, guest, event_rsvp, attendance, venue_booking
TO wedding_organizer;

CREATE ROLE wedding_admin;

GRANT ALL PRIVILEGES
ON ALL TABLES IN SCHEMA public
TO wedding_admin;
}}}

=== Row-Level Security ===

Row-Level Security (RLS) can restrict organizers to accessing only their own weddings.

{{{
#!sql
ALTER TABLE wedding
ENABLE ROW LEVEL SECURITY;

CREATE POLICY wedding_access_policy
ON wedding

USING (
    user_id =
    current_setting('app.current_user_id')::INTEGER
);
}}}

=== Explanation ===

This policy ensures that each organizer can only access weddings linked to their own user account.

This prevents unauthorized access to:
* guest lists
* attendance records
* RSVP information
* booking information

=== Validation ===

{{{
#!sql
SELECT *
FROM wedding;
}}}

Expected behavior:
* users only see weddings associated with their own user_id

=== Security Benefits ===

The proposed RBAC and RLS configuration improves:
* access control
* data isolation
* organizer privacy
* protection against unauthorized data access

These mechanisms are especially important in multi-user environments where several organizers use the system simultaneously.

=== 6.2 Encryption Strategy ===

==== Encryption in Transit ====

All communication between the application and PostgreSQL database should use encrypted connections.

Recommended protections:
* TLS-secured database connections
* encrypted API communication
* secure administrator access

These protections prevent interception of:
* guest information
* RSVP records
* attendance data
* wedding schedules

==== Encryption at Rest ====

Sensitive information stored inside the database should be protected using encryption mechanisms.

Sensitive fields include:
* phone numbers
* email addresses
* guest notes
* RSVP comments

PostgreSQL extensions such as `pgcrypto` can be used for column-level encryption.

==== Example: pgcrypto Encryption ====

{{{
#!sql
INSERT INTO guest (
    first_name,
    last_name,
    email
)
VALUES (
    'Ivan',
    'Petrov',
    pgp_sym_encrypt(
        'ivan@email.com',
        'wedding_secret_key'
    )
);
}}}

==== Example: Decryption ====

{{{
#!sql
SELECT
    first_name,
    last_name,

    pgp_sym_decrypt(
        email::bytea,
        'wedding_secret_key'
    ) AS decrypted_email

FROM guest;
}}}

==== Implementation Note ====

The encryption example demonstrates the conceptual usage of PostgreSQL `pgcrypto`.

In a production implementation, encrypted values should be stored in dedicated encrypted columns (for example `email_encrypted BYTEA`) instead of replacing the original plaintext column directly.

This approach improves schema consistency and avoids datatype conflicts during encryption and decryption operations.

==== Validation ====

{{{
#!sql
SELECT email
FROM guest;
}}}

Expected result:
* encrypted binary data instead of readable plaintext

=== Security Benefits ===

Encryption improves:
* protection of sensitive guest data
* privacy of wedding participants
* protection against unauthorized database access
* compliance with modern data-protection practices

=== 6.3 Data Masking & Anonymization ===

The Wedding Planner Management System stores sensitive personal information related to wedding guests and organizers.

For analytical and testing environments, sensitive information should be masked or anonymized.

Sensitive information includes:
* guest names
* phone numbers
* email addresses
* RSVP comments

=== Example: Masked Reporting View ===

{{{
#!sql
CREATE VIEW v_guest_reporting AS

SELECT
    guest_id,

    CONCAT(
        LEFT(first_name, 1),
        REPEAT('*', LENGTH(first_name) - 1)
    ) AS masked_first_name,

    CONCAT(
        LEFT(last_name, 1),
        REPEAT('*', LENGTH(last_name) - 1)
    ) AS masked_last_name,

    CONCAT(
        REPEAT('*', 5),
        RIGHT(email, POSITION('@' IN email))
    ) AS masked_email,

    wedding_id

FROM guest;
}}}

=== Explanation ===

The view masks personally identifiable information while still allowing analytical reporting.

This approach is useful for:
* testing environments
* reporting dashboards
* analytical exports
* demonstration systems

=== Validation ===

{{{
#!sql
SELECT *
FROM v_guest_reporting;
}}}

Expected result:
* masked guest names
* masked email addresses
* preserved wedding relationships

=== Security Benefits ===

Data masking improves:
* privacy protection
* safer analytical reporting
* reduced exposure of sensitive data
* compliance with data-protection principles

=== 6.4 SQL Injection Prevention ===

The Wedding Planner Management System accepts user-generated input through:
* RSVP forms
* guest registration
* wedding creation
* event scheduling
* attendance management

If SQL queries are constructed incorrectly, malicious input may compromise database security.

=== Unsafe Query Example ===

The following example is vulnerable to SQL injection:

{{{
#!sql
query =
"SELECT * FROM guest WHERE first_name = '" + user_input + "'"
}}}

This approach allows attackers to manipulate SQL syntax through malicious input.

=== Secure Parameterized Query ===

The recommended approach is parameterized execution.

{{{
#!sql
PREPARE guest_lookup(TEXT) AS

SELECT
    guest_id,
    first_name,
    last_name,
    wedding_id

FROM guest

WHERE first_name = $1;
}}}

=== Example Execution ===

{{{
#!sql
EXECUTE guest_lookup('Ivan');
}}}

=== Explanation ===

Parameterized queries separate:
* SQL structure
* user-provided input

This prevents user input from being interpreted as executable SQL code.

=== Security Benefits ===

Parameterized queries protect against:
* unauthorized data access
* SQL injection attacks
* manipulation of RSVP data
* unauthorized attendance modification
* wedding data corruption

=== Recommended Practices ===

The following practices are recommended throughout the system:
* parameterized queries
* input validation
* restricted database permissions
* transaction isolation
* prepared statements

These protections significantly improve overall database security.

=== 6.5 GDPR Compliance ===

The Wedding Planner Management System stores personal information related to:
* wedding organizers
* guests
* RSVP records
* attendance information

Because the system processes personal data, several GDPR-related principles should be respected.

=== Data Minimization ===

Only information necessary for wedding organization should be stored.

Examples:
* guest names
* RSVP responses
* attendance status
* organizer contact information

Unnecessary personal information should not be collected.

=== Access Control ===

Only authorized users should access:
* guest information
* RSVP records
* attendance reports
* wedding schedules

Role-Based Access Control and Row-Level Security help enforce this restriction.

=== Right to Access ===

Users should be able to:
* view their stored information
* verify RSVP information
* review attendance-related records

=== Right to Deletion ===

When requested, personal information should be removable from the database.

Example deletion operation:

{{{
#!sql
DELETE FROM guest
WHERE guest_id = 10;
}}}

=== Backup Protection ===

Backups containing personal information should:
* remain encrypted
* be access-controlled
* be stored securely

=== GDPR Benefits ===

Applying GDPR principles improves:
* privacy protection
* organizer trust
* legal compliance
* secure handling of wedding-related data

=== 6.6 Backup, Restore & Disaster Recovery ===

The Wedding Planner Management System stores important operational and personal data.

Database backups are necessary to protect:
* wedding schedules
* RSVP records
* attendance information
* booking data
* organizer information

=== Recommended Backup Strategy ===

|| Backup Type || Frequency ||
|| Full Backup || Daily ||
|| Incremental Backup || Every few hours ||
|| WAL Archiving || Continuous ||

=== Backup Validation ===

After backup restoration, database consistency should be verified.

{{{
#!sql
SELECT
    'wedding' AS table_name,
    COUNT(*) AS row_count
FROM wedding

UNION ALL

SELECT
    'guest',
    COUNT(*)
FROM guest

UNION ALL

SELECT
    'event',
    COUNT(*)
FROM event

UNION ALL

SELECT
    'attendance',
    COUNT(*)
FROM attendance;
}}}

=== Explanation ===

The validation query confirms that:
* important tables were restored correctly
* row counts are preserved
* critical wedding data still exists

=== Disaster Recovery Recommendations ===

The following practices improve recovery reliability:
* encrypted backups
* off-site backup storage
* periodic restore testing
* automatic backup scheduling

=== Security Benefits ===

Proper backup management improves:
* data recovery capability
* protection against accidental deletion
* recovery after hardware failure
* long-term database reliability

=== 6.7 Audit Logging ===

Audit logging is important for tracking sensitive modifications inside the Wedding Planner Management System.

The system should record:
* RSVP modifications
* attendance updates
* venue booking changes
* wedding schedule modifications
* guest list updates

Audit logging improves:
* accountability
* traceability
* security monitoring
* recovery after accidental modification

=== Example Audit Table ===

{{{
#!sql
CREATE TABLE audit_log (

    audit_id SERIAL PRIMARY KEY,

    table_name VARCHAR(100),

    operation_type VARCHAR(30),

    changed_by VARCHAR(100),

    changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

    affected_record_id INTEGER
);
}}}

=== Example Trigger Function ===

{{{
#!sql
CREATE OR REPLACE FUNCTION log_guest_changes()

RETURNS TRIGGER AS $$

BEGIN

    INSERT INTO audit_log (
        table_name,
        operation_type,
        changed_by,
        affected_record_id
    )

    VALUES (
        'guest',
        TG_OP,
        CURRENT_USER,
        NEW.guest_id
    );

    RETURN NEW;

END;

$$ LANGUAGE plpgsql;
}}}

=== Example Trigger ===

{{{
#!sql
CREATE TRIGGER trg_guest_audit

AFTER INSERT OR UPDATE
ON guest

FOR EACH ROW

EXECUTE FUNCTION log_guest_changes();
}}}

=== Validation ===

{{{
#!sql
SELECT *
FROM audit_log;
}}}

Expected result:
* logged INSERT and UPDATE operations on guest records

=== Security Benefits ===

Audit logging improves:
* monitoring of sensitive changes
* accountability of database operations
* recovery investigation
* detection of suspicious activity

It is especially important for:
* guest modifications
* RSVP changes
* booking updates
* attendance management

== 7. Final Conclusions ==

Phase 9 extends the Wedding Planner Management System with advanced database engineering concepts focused on:
* performance analysis
* query optimization
* scalability
* transaction management
* security
* analytical workload optimization

Unlike previous phases that focused primarily on schema design and transactional functionality, this phase evaluates how the database behaves under complex analytical workloads generated by the Phase 6 reports.

The analysis focused on:
* Budget Analysis
* Venue Capacity Utilization
* RSVP Conversion Analysis

These reports were analyzed using:
* EXPLAIN ANALYZE
* execution-plan interpretation
* indexing strategies
* aggregation analysis
* scalability evaluation

The identified performance bottlenecks include:
* multiple LEFT JOIN operations
* COUNT(DISTINCT ...) aggregation
* GROUP BY operations
* temporal calculations
* analytical workload complexity

The proposed indexing strategy improves:
* JOIN efficiency
* aggregation preparation
* analytical reporting speed
* scalability for larger datasets

The phase also introduced:
* materialized views
* partitioning strategies
* transaction isolation analysis
* row-level locking
* role-based access control
* row-level security
* encryption strategies
* audit logging
* backup and disaster recovery planning

Together, these techniques transform the Wedding Planner Management System from a simple transactional database into a scalable analytical database platform capable of supporting:
* operational reporting
* performance monitoring
* analytical decision-making
* secure multi-user access

== Final Technical Evaluation ==

The implementation demonstrates:
* advanced PostgreSQL usage
* analytical SQL optimization
* transaction-safe database operations
* scalable reporting architecture
* secure relational database design

The Phase 6 analytical reports were successfully integrated into the Phase 9 performance analysis workflow, providing realistic optimization and scalability evaluation over the actual project workload.

== Final Notes ==

All optimization strategies, indexes, security mechanisms, and execution analyses were designed and validated using PostgreSQL 15.

The implementation illustrates how modern relational database systems support both:
* transactional processing
* analytical business intelligence workloads

within a unified Wedding Planner Management System.