= QueryOptimization =

== Performance Analysis and Optimization ==

The performance of the non‑analytical views was analysed using `EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, FORMAT TEXT)`. Views with poor performance were optimized through indexing and, where necessary, replaced with parameterized functions.

== Optimized Views ==

=== `v_student_current_semester` ===

- '''Performance''': 23.459 ms
- This view performs well with the existing indexes. The execution plan shows efficient nested loops with index lookups throughout the join chain.
- '''Result''': No optimization needed.

=== `v_student_upcoming_deadlines` ===

- '''Performance''': 35.945 ms
- The view efficiently combines upcoming exercises and exams using a UNION ALL. Both branches use appropriate indexes.
- '''Result''': No optimization needed.

=== `v_student_transcript` ===

- '''Performance''': 0.426 ms
- Highly efficient, using primarily index scans with minimal buffer hits.
- '''Result''': No optimization needed.

=== v_course_announcements ===

- '''Performance''': 8.085 ms
- Performs well with the existing indexes on Course_Edition and Announcement.
- '''Result''': No optimization needed.

=== v_exercise_grading_queue ===

- '''Performance''': 122.986 ms
- This view is the most complex, joining through multiple tables. The execution time is still within the acceptable 3‑second threshold.
- '''Result''': No optimization needed (performance is acceptable).

=== v_course_prerequisite_chain ===

- '''Performance''': 0.563 ms
- The recursive CTE executes efficiently with minimal buffer hits.
- '''Result''': No optimization needed.

== Views Replaced with Functions ==

Two views exhibited severe performance issues that indexing could not resolve.

=== `v_student_registration_state` ===

==== Before Optimization ====

- '''Execution Time''': ~2.3 seconds (2268 ms)
- '''Buffers''': shared hit=6947 read=41906, temp read=64919 written=61623

==== Problem Analysis ====

The CTEs in the view scan all records before applying the WHERE filter. PostgreSQL materializes every CTE completely before executing the outer `WHERE`. Even though only data for a specific `student_id` is needed, the database:

- Scans all 1 M program enrollments
- Joins all 9.7 M grades with all 13.7 M course enrollments
- Expands through equivalences to ~16 M rows
- Then discards 99.999 % of them with the WHERE filter

Indexes cannot help because the CTEs have no direct condition on `student_id`. This wastes enormous I/O and CPU.

==== Optimization Attempt ====

Created the following indexes (which did not help):

{{{
#!sql
CREATE INDEX CONCURRENTLY idx_pe_student_active ON Program_Enrollment(student_id)
WHERE date_disenrollment IS NULL AND finished IS NOT TRUE;

CREATE INDEX CONCURRENTLY idx_se_pe_year_sem ON Semester_Enrollment(program_enrollment_id, academic_year, semester);
CREATE INDEX CONCURRENTLY idx_ce_semester ON Course_Enrollment(semester_enrollment_id);
CREATE INDEX CONCURRENTLY idx_ce_edition ON Course_Enrollment(course_edition_id);
CREATE INDEX CONCURRENTLY idx_sg_ce ON Student_Grade(course_enrollment_id);
CREATE INDEX CONCURRENTLY idx_course_equiv_course1 ON Course_Equivalence(course1_code);
CREATE INDEX CONCURRENTLY idx_course_equiv_course2 ON Course_Equivalence(course2_code);
}}}

- '''After Index Creation''': No improvement – execution time ~2285 ms.

==== Solution: Parameterized Function ====

Replaced the view with a function that pushes the filter into the CTEs:

{{{
#!sql
CREATE OR REPLACE FUNCTION fn_student_registration_state(p_student_id INTEGER)
RETURNS TABLE (...) LANGUAGE plpgsql STABLE AS $$
BEGIN
RETURN QUERY
WITH active_pe AS (
-- Only one row due to student_id filter pushed down
SELECT pe.id AS pe_id, pe.academic_program_code
FROM Program_Enrollment pe
WHERE pe.student_id = p_student_id
AND pe.date_disenrollment IS NULL
AND pe.finished IS NOT TRUE
),
-- All subsequent CTEs now operate on one student's data
...
}}}

- '''After Optimization''': ~370 ms — a '''6× speed improvement'''.

=== `v_teaching_assistant_eligibility` ===

==== Before Optimization ====

- '''Execution Time''': ~923 ms

- '''Buffers''': shared hit=8369 read=40525, temp read=39620 written=39808

==== Problem Analysis ====

Similar to the previous view, the view processes all students and course editions before filtering, with large table scans and expensive joins.

==== Optimization Attempt ====

Created indexes (again with little effect). After indexing, execution time ~966 ms – no improvement.

==== Solution: Parameterized Function ====

Replaced the view with a function that first identifies the course edition and then filters to only students who have passed that course.

- '''After Optimization''': ~940 ms – a slight improvement due to reduced data volume.

== Optimization Summary ==

|| '''View''' || '''Initial Time''' || '''Final Time''' || '''Method''' ||
|| v_student_current_semester || 23.459 ms || 23.459 ms || Existing indexes sufficient ||
|| v_student_upcoming_deadlines || 35.945 ms || 35.945 ms || Existing indexes sufficient ||
|| v_student_transcript || 0.426 ms || 0.426 ms || Existing indexes sufficient ||
|| v_course_announcements || 8.085 ms || 8.085 ms || Existing indexes sufficient ||
|| v_exercise_grading_queue || 122.986 ms || 122.986 ms || Acceptable performance ||
|| v_course_prerequisite_chain || 0.563 ms || 0.563 ms || Existing indexes sufficient ||
|| v_student_registration_state || 2268 ms || ~370 ms || Function replacement (6×) ||
|| v_teaching_assistant_eligibility || 923 ms || ~940 ms || Function replacement ||

== Lessons Learned ==

- '''CTE Materialization''': PostgreSQL materializes CTEs, preventing early filtering. For queries that need to filter on student_id, CTEs can cause massive unnecessary work.

- '''Index Limitations''': Indexes on individual tables cannot overcome the fundamental problem of processing data before filtering. When the query structure prevents early filtering, indexes provide limited benefit.

- '''Parameterized Functions''': Pushing filters into functions allows PostgreSQL to apply conditions at the earliest possible stage, dramatically reducing I/O and processing.

- '''Analyze Before Indexing''': Always analyse the execution plan before creating indexes. In some cases, the problem is structural, not missing indexes.