wiki:QueryOptimization

Version 3 (modified by 231082, 4 days ago) ( diff )

--

QueryOptimization

Performance Analysis and Optimization

The performance of the non‑analytical views was analysed using EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, FORMAT TEXT). Views with poor performance were optimized through indexing and, where necessary, replaced with parameterized functions.

Optimized Views

v_student_current_semester

  • Performance: 23.459 ms
  • This view performs well with the existing indexes. The execution plan shows efficient nested loops with index lookups throughout the join chain.
  • Result: No optimization needed.

v_student_upcoming_deadlines

  • Performance: 35.945 ms
  • The view efficiently combines upcoming exercises and exams using a UNION ALL. Both branches use appropriate indexes.
  • Result: No optimization needed.

v_student_transcript

  • Performance: 0.426 ms
  • Highly efficient, using primarily index scans with minimal buffer hits.
  • Result: No optimization needed.

v_course_announcements

  • Performance: 8.085 ms
  • Performs well with the existing indexes on Course_Edition and Announcement.
  • Result: No optimization needed.

v_exercise_grading_queue

  • Performance: 122.986 ms
  • This view is the most complex, joining through multiple tables. The execution time is still within the acceptable 3‑second threshold.
  • Result: No optimization needed (performance is acceptable).

v_course_prerequisite_chain

  • Performance: 0.563 ms
  • The recursive CTE executes efficiently with minimal buffer hits.
  • Result: No optimization needed.

Views Replaced with Functions

Two views exhibited severe performance issues that indexing could not resolve.

v_student_registration_state

Before Optimization

  • Execution Time: ~2.3 seconds (2268 ms)
  • Buffers: shared hit=6947 read=41906, temp read=64919 written=61623

Problem Analysis

The CTEs in the view scan all records before applying the WHERE filter. PostgreSQL materializes every CTE completely before executing the outer WHERE. Even though only data for a specific student_id is needed, the database:

  • Scans all 1 M program enrollments
  • Joins all 9.7 M grades with all 13.7 M course enrollments
  • Expands through equivalences to ~16 M rows
  • Then discards 99.999 % of them with the WHERE filter

Indexes cannot help because the CTEs have no direct condition on student_id. This wastes enormous I/O and CPU.

Optimization Attempt

Created the following indexes (which did not help):

CREATE INDEX CONCURRENTLY idx_pe_student_active ON Program_Enrollment(student_id)
WHERE date_disenrollment IS NULL AND finished IS NOT TRUE;

CREATE INDEX CONCURRENTLY idx_se_pe_year_sem ON Semester_Enrollment(program_enrollment_id, academic_year, semester);
CREATE INDEX CONCURRENTLY idx_ce_semester ON Course_Enrollment(semester_enrollment_id);
CREATE INDEX CONCURRENTLY idx_ce_edition ON Course_Enrollment(course_edition_id);
CREATE INDEX CONCURRENTLY idx_sg_ce ON Student_Grade(course_enrollment_id);
CREATE INDEX CONCURRENTLY idx_course_equiv_course1 ON Course_Equivalence(course1_code);
CREATE INDEX CONCURRENTLY idx_course_equiv_course2 ON Course_Equivalence(course2_code);
  • After Index Creation: No improvement – execution time ~2285 ms.

Solution: Parameterized Function

Replaced the view with a function that pushes the filter into the CTEs:

CREATE OR REPLACE FUNCTION fn_student_registration_state(p_student_id INTEGER)
RETURNS TABLE (...) LANGUAGE plpgsql STABLE AS $$
BEGIN
RETURN QUERY
WITH active_pe AS (
-- Only one row due to student_id filter pushed down
SELECT pe.id AS pe_id, pe.academic_program_code
FROM Program_Enrollment pe
WHERE pe.student_id = p_student_id
AND pe.date_disenrollment IS NULL
AND pe.finished IS NOT TRUE
),
-- All subsequent CTEs now operate on one student's data
...
  • After Optimization: ~370 ms — a 6× speed improvement.

v_teaching_assistant_eligibility

Before Optimization

  • Execution Time: ~923 ms
  • Buffers: shared hit=8369 read=40525, temp read=39620 written=39808

Problem Analysis

Similar to the previous view, the view processes all students and course editions before filtering, with large table scans and expensive joins.

Optimization Attempt

Created indexes (again with little effect). After indexing, execution time ~966 ms – no improvement.

Solution: Parameterized Function

Replaced the view with a function that first identifies the course edition and then filters to only students who have passed that course.

CREATE OR REPLACE FUNCTION fn_teaching_assistant_eligibility(p_edition_id INTEGER)
    RETURNS TABLE (...) LANGUAGE plpgsql STABLE AS $$
DECLARE
    var_course_code TEXT;
    var_academic_year SMALLINT;
    var_semester SMALLINT;
BEGIN
    SELECT ed.course_code, ed.academic_year, ed.semester
    INTO var_course_code, var_academic_year, var_semester
    FROM Course_Edition ed
    WHERE ed.id = p_edition_id;
    
    -- Only process students who passed this specific course
    ...
  • After Optimization: ~383 ms – a significant improvement (approximately 2.4× faster than the view).

Optimization Summary

View Initial Time Final Time Method
v_student_current_semester 23.459 ms 23.459 ms Existing indexes sufficient
v_student_upcoming_deadlines 35.945 ms 35.945 ms Existing indexes sufficient
v_student_transcript 0.426 ms 0.426 ms Existing indexes sufficient
v_course_announcements 8.085 ms 8.085 ms Existing indexes sufficient
v_exercise_grading_queue 122.986 ms 122.986 ms Acceptable performance
v_course_prerequisite_chain 0.563 ms 0.563 ms Existing indexes sufficient
v_student_registration_state 2268 ms ~370 ms Function replacement (6×)
v_teaching_assistant_eligibility 923 ms ~383 ms Function replacement (2.4×)

Lessons Learned

  • CTE Materialization: PostgreSQL materializes CTEs, preventing early filtering. For queries that need to filter on student_id, CTEs can cause massive unnecessary work.
  • Index Limitations: Indexes on individual tables cannot overcome the fundamental problem of processing data before filtering. When the query structure prevents early filtering, indexes provide limited benefit.
  • Parameterized Functions: Pushing filters into functions allows PostgreSQL to apply conditions at the earliest possible stage, dramatically reducing I/O and processing.
  • Analyze Before Indexing: Always analyse the execution plan before creating indexes. In some cases, the problem is structural, not missing indexes.

Attachments (2)

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.