wiki:QueryOptimization

Context Navigation

Version 3 (modified by 231082, 4 days ago) ( diff )
--

QueryOptimization

Performance Analysis and Optimization

The performance of the non‑analytical views was analysed using EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, FORMAT TEXT). Views with poor performance were optimized through indexing and, where necessary, replaced with parameterized functions.

Optimized Views

`v_student_current_semester`

Performance: 23.459 ms
This view performs well with the existing indexes. The execution plan shows efficient nested loops with index lookups throughout the join chain.
Result: No optimization needed.

`v_student_upcoming_deadlines`

Performance: 35.945 ms
The view efficiently combines upcoming exercises and exams using a UNION ALL. Both branches use appropriate indexes.
Result: No optimization needed.

`v_student_transcript`

Performance: 0.426 ms
Highly efficient, using primarily index scans with minimal buffer hits.
Result: No optimization needed.

v_course_announcements

Performance: 8.085 ms
Performs well with the existing indexes on Course_Edition and Announcement.
Result: No optimization needed.

v_exercise_grading_queue

Performance: 122.986 ms
This view is the most complex, joining through multiple tables. The execution time is still within the acceptable 3‑second threshold.
Result: No optimization needed (performance is acceptable).

v_course_prerequisite_chain

Performance: 0.563 ms
The recursive CTE executes efficiently with minimal buffer hits.
Result: No optimization needed.

Views Replaced with Functions

Two views exhibited severe performance issues that indexing could not resolve.

`v_student_registration_state`

Before Optimization

Execution Time: ~2.3 seconds (2268 ms)
Buffers: shared hit=6947 read=41906, temp read=64919 written=61623

Problem Analysis

The CTEs in the view scan all records before applying the WHERE filter. PostgreSQL materializes every CTE completely before executing the outer WHERE. Even though only data for a specific student_id is needed, the database:

Scans all 1 M program enrollments
Joins all 9.7 M grades with all 13.7 M course enrollments
Expands through equivalences to ~16 M rows
Then discards 99.999 % of them with the WHERE filter

Indexes cannot help because the CTEs have no direct condition on student_id. This wastes enormous I/O and CPU.

Optimization Attempt

Created the following indexes (which did not help):

CREATE INDEX CONCURRENTLY idx_pe_student_active ON Program_Enrollment(student_id)
WHERE date_disenrollment IS NULL AND finished IS NOT TRUE;

CREATE INDEX CONCURRENTLY idx_se_pe_year_sem ON Semester_Enrollment(program_enrollment_id, academic_year, semester);
CREATE INDEX CONCURRENTLY idx_ce_semester ON Course_Enrollment(semester_enrollment_id);
CREATE INDEX CONCURRENTLY idx_ce_edition ON Course_Enrollment(course_edition_id);
CREATE INDEX CONCURRENTLY idx_sg_ce ON Student_Grade(course_enrollment_id);
CREATE INDEX CONCURRENTLY idx_course_equiv_course1 ON Course_Equivalence(course1_code);
CREATE INDEX CONCURRENTLY idx_course_equiv_course2 ON Course_Equivalence(course2_code);

After Index Creation: No improvement – execution time ~2285 ms.

Solution: Parameterized Function

Replaced the view with a function that pushes the filter into the CTEs:

CREATE OR REPLACE FUNCTION fn_student_registration_state(p_student_id INTEGER)
RETURNS TABLE (...) LANGUAGE plpgsql STABLE AS $$
BEGIN
RETURN QUERY
WITH active_pe AS (
-- Only one row due to student_id filter pushed down
SELECT pe.id AS pe_id, pe.academic_program_code
FROM Program_Enrollment pe
WHERE pe.student_id = p_student_id
AND pe.date_disenrollment IS NULL
AND pe.finished IS NOT TRUE
),
-- All subsequent CTEs now operate on one student's data
...

After Optimization: ~370 ms — a 6× speed improvement.

`v_teaching_assistant_eligibility`

Before Optimization

Execution Time: ~923 ms
Buffers: shared hit=8369 read=40525, temp read=39620 written=39808

Problem Analysis

Similar to the previous view, the view processes all students and course editions before filtering, with large table scans and expensive joins.

Optimization Attempt

Created indexes (again with little effect). After indexing, execution time ~966 ms – no improvement.

Solution: Parameterized Function

Replaced the view with a function that first identifies the course edition and then filters to only students who have passed that course.

CREATE OR REPLACE FUNCTION fn_teaching_assistant_eligibility(p_edition_id INTEGER)
    RETURNS TABLE (...) LANGUAGE plpgsql STABLE AS $$
DECLARE
    var_course_code TEXT;
    var_academic_year SMALLINT;
    var_semester SMALLINT;
BEGIN
    SELECT ed.course_code, ed.academic_year, ed.semester
    INTO var_course_code, var_academic_year, var_semester
    FROM Course_Edition ed
    WHERE ed.id = p_edition_id;
    
    -- Only process students who passed this specific course
    ...

After Optimization: ~383 ms – a significant improvement (approximately 2.4× faster than the view).

Optimization Summary

View	Initial Time	Final Time	Method
`v_student_current_semester`	23.459 ms	23.459 ms	Existing indexes sufficient
`v_student_upcoming_deadlines`	35.945 ms	35.945 ms	Existing indexes sufficient
`v_student_transcript`	0.426 ms	0.426 ms	Existing indexes sufficient
`v_course_announcements`	8.085 ms	8.085 ms	Existing indexes sufficient
`v_exercise_grading_queue`	122.986 ms	122.986 ms	Acceptable performance
`v_course_prerequisite_chain`	0.563 ms	0.563 ms	Existing indexes sufficient
`v_student_registration_state`	2268 ms	~370 ms	Function replacement (6×)
`v_teaching_assistant_eligibility`	923 ms	~383 ms	Function replacement (2.4×)

Lessons Learned

CTE Materialization: PostgreSQL materializes CTEs, preventing early filtering. For queries that need to filter on student_id, CTEs can cause massive unnecessary work.

Index Limitations: Indexes on individual tables cannot overcome the fundamental problem of processing data before filtering. When the query structure prevents early filtering, indexes provide limited benefit.

Parameterized Functions: Pushing filters into functions allows PostgreSQL to apply conditions at the earliest possible stage, dramatically reducing I/O and processing.

Analyze Before Indexing: Always analyse the execution plan before creating indexes. In some cases, the problem is structural, not missing indexes.

Attachments (2)

testing.sql (3.0 KB ) - added by 231082 4 days ago. Code for prototyping optimization.
meta-functions.sql (9.7 KB ) - added by 231082 4 days ago. Functional optimization of the two VIEWs.

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text