Changes between Version 1 and Version 2 of QueryOptimization


Ignore:
Timestamp:
07/01/26 18:03:44 (4 days ago)
Author:
231082
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • QueryOptimization

    v1 v2  
    1 = Query Optimization =
     1= QueryOptimization =
    22
    3 == Overview ==
     3== Performance Analysis and Optimization ==
    44
    5 This phase focuses on the analysis and optimization of database queries for the Courses Management System (CMS). The goal is to ensure that critical reporting and administrative tasks execute efficiently, even under high load. We distinguish between **Analytical Views** (using aggregate functions) and **Non-Analytical Views** (no aggregate functions allowed per course requirements).
     5The performance of the non‑analytical views was analysed using `EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, FORMAT TEXT)`. Views with poor performance were optimized through indexing and, where necessary, replaced with parameterized functions.
    66
    7 The optimization process followed a structured approach:
    8 1.  **Analysis**: Identified the primary filters and usage scenarios for each view.
    9 2.  **Performance Testing**: Measured initial execution time using `EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, FORMAT TEXT)`.
    10 3.  **Optimization**: Added necessary indexes or restructured queries.
    11 4.  **Re-testing**: Measured final execution time.
    12 5.  **Decision**: Determined if optimization was necessary based on whether the time was acceptable (< 3s for non-analytical, < 3s for analytical).
     7== Optimized Views ==
    138
    14 == View 1: v_student_current_semester ==
     9=== `v_student_current_semester` ===
    1510
    16 1.  **Primary Filter**: The view is filtered by `student_id` (e.g., `WHERE student_id = 12345`).
    17 2.  **Usage Scenario**: Used to display current semester courses for a specific student. This is critical for the application as it's used for course selection and grade viewing.
    18 3.  **Initial Execution Time**: 0.94 ms.
    19 4.  **Slow Operations**: None identified.
    20 5.  **Time with Indexes**: N/A (No slow operations).
    21 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     11- '''Performance''': 23.459 ms
     12- This view performs well with the existing indexes. The execution plan shows efficient nested loops with index lookups throughout the join chain.
     13- '''Result''': No optimization needed.
    2214
    23 == View 2: v_student_upcoming_deadlines ==
     15=== `v_student_upcoming_deadlines` ===
    2416
    25 1.  **Primary Filter**: The view is filtered by `student_id` (e.g., `WHERE student_id = 12345`).
    26 2.  **Usage Scenario**: Used to display upcoming exercise and exam deadlines for a specific student. Critical for student dashboard.
    27 3.  **Initial Execution Time**: 8.87 ms.
    28 4.  **Slow Operations**: None identified.
    29 5.  **Time with Indexes**: N/A (No slow operations).
    30 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     17- '''Performance''': 35.945 ms
     18- The view efficiently combines upcoming exercises and exams using a UNION ALL. Both branches use appropriate indexes.
     19- '''Result''': No optimization needed.
    3120
    32 == View 3: v_student_transcript ==
     21=== `v_student_transcript` ===
    3322
    34 1.  **Primary Filter**: The view is filtered by `student_id` (e.g., `WHERE student_id = 12345`).
    35 2.  **Usage Scenario**: Used to display the complete academic transcript for a student. Critical for student and teacher reporting.
    36 3.  **Initial Execution Time**: 0.09 ms.
    37 4.  **Slow Operations**: None identified.
    38 5.  **Time with Indexes**: N/A (No slow operations).
    39 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     23- '''Performance''': 0.426 ms
     24- Highly efficient, using primarily index scans with minimal buffer hits.
     25- '''Result''': No optimization needed.
    4026
    41 == View 4: v_course_announcements ==
     27=== v_course_announcements ===
    4228
    43 1.  **Primary Filter**: The view is filtered by `edition_id` (e.g., `WHERE edition_id = 12345`).
    44 2.  **Usage Scenario**: Used to display course announcements. Critical for teacher and student communication.
    45 3.  **Initial Execution Time**: 1.07 ms.
    46 4.  **Slow Operations**: None identified.
    47 5.  **Time with Indexes**: N/A (No slow operations).
    48 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     29- '''Performance''': 8.085 ms
     30- Performs well with the existing indexes on Course_Edition and Announcement.
     31- '''Result''': No optimization needed.
    4932
    50 == View 5: v_exercise_grading_queue ==
     33=== v_exercise_grading_queue ===
    5134
    52 1.  **Primary Filter**: The view is filtered by `exercise_id` (e.g., `WHERE exercise_id = 12345`).
    53 2.  **Usage Scenario**: Used to display the grading queue for a specific exercise. Critical for teacher grading workflow.
    54 3.  **Initial Execution Time**: 92.67 ms.
    55 4.  **Slow Operations**: Full scan on `exercise_submission` (2.4M rows) with parallel execution workers.
    56 5.  **Time with Indexes**: N/A (No slow operations).
    57 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     35- '''Performance''': 122.986 ms
     36- This view is the most complex, joining through multiple tables. The execution time is still within the acceptable 3‑second threshold.
     37- '''Result''': No optimization needed (performance is acceptable).
    5838
    59 == View 6: v_course_prerequisite_chain ==
     39=== v_course_prerequisite_chain ===
    6040
    61 1.  **Primary Filter**: The view is filtered by `course_code` (e.g., `WHERE course_code = 'F23L3S138'`).
    62 2.  **Usage Scenario**: Used to display the prerequisite chain for a course. Critical for student enrollment eligibility.
    63 3.  **Initial Execution Time**: 0.94 ms.
    64 4.  **Slow Operations**: None identified.
    65 5.  **Time with Indexes**: N/A (No slow operations).
    66 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     41- '''Performance''': 0.563 ms
     42- The recursive CTE executes efficiently with minimal buffer hits.
     43- '''Result''': No optimization needed.
    6744
    68 == View 7: v_student_registration_state ==
     45== Views Replaced with Functions ==
    6946
    70 1.  **Primary Filter**: The view is filtered by `student_id` (e.g., `WHERE student_id = 12345`).
    71 2.  **Usage Scenario**: Used to check if a student can enroll in a course. Critical for enrollment system.
    72 3.  **Initial Execution Time**: 2.18 s.
    73 4.  **Slow Operations**: Full scan on `Program_Enrollment`, `Course_Enrollment`, `Student_Grade`, and `Course_Equivalence`. The CTEs inside the View materialized all data before applying the WHERE filter, wasting I/O and CPU.
    74 5.  **Time with Indexes**: 2.20 s (Indexes did not help due to CTE materialization issue).
    75 6.  **Optimization Decision**: The View was replaced by `fn_student_registration_state` function. The function pushes the filter down immediately, avoiding full table scans. Execution time reduced to ~400ms with proper filtering.
     47Two views exhibited severe performance issues that indexing could not resolve.
    7648
    77 == View 8: v_teaching_assistant_eligibility ==
     49=== `v_student_registration_state` ===
    7850
    79 1.  **Primary Filter**: The view is filtered by `edition_id` (e.g., `WHERE edition_id = 13139`).
    80 2.  **Usage Scenario**: Used to check if a student is eligible to be a Teaching Assistant. Critical for TA assignment process.
    81 3.  **Initial Execution Time**: 848 ms.
    82 4.  **Slow Operations**: Full scan on `Course_Enrollment`, `Teaches`, `Student_Grade`, and `Teaching_Assistant`. The CTEs inside the View materialized all data before applying the WHERE filter.
    83 5.  **Time with Indexes**: 841 ms (Indexes did not help due to CTE materialization issue).
    84 6.  **Optimization Decision**: The View was replaced by `fn_teaching_assistant_eligibility` function. The function pushes the filter down immediately, avoiding full table scans. Execution time reduced to ~470ms with proper filtering.
     51==== Before Optimization ====
    8552
    86 == Analytical Views (Extras) ==
     53- '''Execution Time''': ~2.3 seconds (2268 ms)
     54- '''Buffers''': shared hit=6947 read=41906, temp read=64919 written=61623
    8755
    88 The following views use aggregate functions and are included as extras. They execute in reasonable time (under 3s).
     56==== Problem Analysis ====
    8957
    90 == View 9: v_exam_results_summary ==
     58The CTEs in the view scan all records before applying the WHERE filter. PostgreSQL materializes every CTE completely before executing the outer `WHERE`. Even though only data for a specific `student_id` is needed, the database:
    9159
    92 1.  **Primary Filter**: The view is filtered by `exam_id` (e.g., `WHERE exam_id = 12345`).
    93 2.  **Usage Scenario**: Used to display exam results summary. Critical for teacher reporting.
    94 3.  **Initial Execution Time**: 15 ms.
    95 4.  **Slow Operations**: None identified.
    96 5.  **Time with Indexes**: N/A (No slow operations).
    97 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     60- Scans all 1 M program enrollments
     61- Joins all 9.7 M grades with all 13.7 M course enrollments
     62- Expands through equivalences to ~16 M rows
     63- Then discards 99.999 % of them with the WHERE filter
    9864
    99 == View 10: v_teacher_dashboard ==
     65Indexes cannot help because the CTEs have no direct condition on `student_id`. This wastes enormous I/O and CPU.
    10066
    101 1.  **Primary Filter**: The view is filtered by `teacher_id` (e.g., `WHERE teacher_id = 12345`).
    102 2.  **Usage Scenario**: Used to display teacher dashboard. Critical for teacher workflow.
    103 3.  **Initial Execution Time**: 8 ms.
    104 4.  **Slow Operations**: None identified.
    105 5.  **Time with Indexes**: N/A (No slow operations).
    106 6.  **Optimization Decision**: No optimization needed. Execution time is acceptable.
     67==== Optimization Attempt ====
    10768
    108 == Indexes Created During Optimization ==
     69Created the following indexes (which did not help):
    10970
    110 ||= Index Name =||= Table =||= Columns =||= Purpose =||
    111 ||= `idx_pe_student_active` =|| `Program_Enrollment` || `student_id` || Filter for active enrollments ||
    112 ||= `idx_se_pe_year_sem` =|| `Semester_Enrollment` || `program_enrollment_id, academic_year, semester` || Filter for semester enrollments ||
    113 ||= `idx_course_equiv_course1` =|| `Course_Equivalence` || `course1_code` || Filter for course equivalences ||
    114 ||= `idx_sg_ce` =|| `Student_Grade` =|| `course_enrollment_id` || Filter for grades ||
    115 ||= `idx_teaches_teacher_edition` =|| `Teaches` || `teacher_id, course_edition_id` || Filter for teaching assignments ||
    116 ||= `idx_exercise_edition` =|| `Exercise` || `course_edition_id` || Filter for exercises ||
    117 ||= `idx_exam_edition` =|| `Exam` || `course_edition_id` || Filter for exams ||
     71{{{
     72#!sql
     73CREATE INDEX CONCURRENTLY idx_pe_student_active ON Program_Enrollment(student_id)
     74WHERE date_disenrollment IS NULL AND finished IS NOT TRUE;
    11875
    119 == Index Limitations and Function Replacement ==
     76CREATE INDEX CONCURRENTLY idx_se_pe_year_sem ON Semester_Enrollment(program_enrollment_id, academic_year, semester);
     77CREATE INDEX CONCURRENTLY idx_ce_semester ON Course_Enrollment(semester_enrollment_id);
     78CREATE INDEX CONCURRENTLY idx_ce_edition ON Course_Enrollment(course_edition_id);
     79CREATE INDEX CONCURRENTLY idx_sg_ce ON Student_Grade(course_enrollment_id);
     80CREATE INDEX CONCURRENTLY idx_course_equiv_course1 ON Course_Equivalence(course1_code);
     81CREATE INDEX CONCURRENTLY idx_course_equiv_course2 ON Course_Equivalence(course2_code);
     82}}}
    12083
    121 Despite creating indexes, the `v_student_registration_state` and `v_teaching_assistant_eligibility` views still did not reduce query execution time significantly because:
    122 1.  The CTEs inside the views did not have direct conditions on the indexed columns.
    123 2.  PostgreSQL materialized every CTE completely before executing the outer WHERE filter.
     84- '''After Index Creation''': No improvement – execution time ~2285 ms.
    12485
    125 **Solution**: Replaced the views with functions that push the filters down immediately.
     86==== Solution: Parameterized Function ====
    12687
    127 == Execution Plans Analysis ==
     88Replaced the view with a function that pushes the filter into the CTEs:
    12889
    129 Execution plans were acquired using `EXPLAIN (ANALYZE, BUFFERS, VERBOSE, TIMING, FORMAT TEXT)`.
     90{{{
     91#!sql
     92CREATE OR REPLACE FUNCTION fn_student_registration_state(p_student_id INTEGER)
     93RETURNS TABLE (...) LANGUAGE plpgsql STABLE AS $$
     94BEGIN
     95RETURN QUERY
     96WITH active_pe AS (
     97-- Only one row due to student_id filter pushed down
     98SELECT pe.id AS pe_id, pe.academic_program_code
     99FROM Program_Enrollment pe
     100WHERE pe.student_id = p_student_id
     101AND pe.date_disenrollment IS NULL
     102AND pe.finished IS NOT TRUE
     103),
     104-- All subsequent CTEs now operate on one student's data
     105...
     106}}}
    130107
    131 **Before Optimization (Views):**
    132 *   **Seq Scan** on `Program_Enrollment` followed by full join operations.
    133 *   **Full Table Scan** on `Student_Grade` and `Course_Enrollment`.
    134 *   High buffer usage due to materialized CTEs.
     108- '''After Optimization''': ~370 ms — a '''6× speed improvement'''.
    135109
    136 **After Optimization (Functions):**
    137 *   **Index Scan** on `Program_Enrollment` immediately filtered by `student_id`.
    138 *   Reduced buffer usage and faster I/O.
    139 *   Lower CPU usage due to fewer unnecessary joins.
     110=== `v_teaching_assistant_eligibility` ===
    140111
    141 == Database Size Impact ==
     112==== Before Optimization ====
    142113
    143 Optimization efforts focus on the 13GB total database size. By reducing unnecessary data movement (via function filtering) and utilizing specific indexes, we ensure that critical queries like `v_student_current_semester` and `v_upcoming_deadlines` run efficiently without triggering full table scans.
     114- '''Execution Time''': ~923 ms
    144115
    145 == Summary ==
     116- '''Buffers''': shared hit=8369 read=40525, temp read=39620 written=39808
    146117
    147 ||= View =||= Initial Time =||= Optimized Time =||= Optimization Needed =||
    148 ||= v_student_current_semester =|| 0.94 ms || 0.94 ms || No ||
    149 ||= v_student_upcoming_deadlines =|| 8.87 ms || 8.87 ms || No ||
    150 ||= v_student_transcript =|| 0.09 ms || 0.09 ms || No ||
    151 ||= v_course_announcements =|| 1.07 ms || 1.07 ms || No ||
    152 ||= v_exercise_grading_queue =|| 92.67 ms || 92.67 ms || No ||
    153 ||= v_course_prerequisite_chain =|| 0.94 ms || 0.94 ms || No ||
    154 ||= v_student_registration_state =|| 2.18 s || 0.40 s || Yes (Function) ||
    155 ||= v_teaching_assistant_eligibility =|| 848 ms || 470 ms || Yes (Function) ||
    156 ||= v_exam_results_summary =|| 15 ms || 15 ms || No ||
    157 ||= v_teacher_dashboard =|| 8 ms || 8 ms || No ||
     118==== Problem Analysis ====
    158119
    159 All critical views now execute within acceptable time limits, ensuring the application functions smoothly.
     120Similar to the previous view, the view processes all students and course editions before filtering, with large table scans and expensive joins.
    160121
    161 == Function Performance ==
     122==== Optimization Attempt ====
    162123
    163 The following functions were created to optimize critical queries:
     124Created indexes (again with little effect). After indexing, execution time ~966 ms – no improvement.
    164125
    165 ||= Function =||= Primary Filter =||= Usage Scenario =||= Execution Time =||
    166 ||= `fn_student_registration_state` =||`student_id` || Check if a student can enroll in a course || ~400ms ||
    167 ||= `fn_teaching_assistant_eligibility` =|| `edition_id` || Check if a student is eligible to be a TA || ~470ms ||
     126==== Solution: Parameterized Function ====
    168127
    169 == No Additional Indices Retained ==
     128Replaced the view with a function that first identifies the course edition and then filters to only students who have passed that course.
    170129
    171 No additional indices were retained because they did not serve any purpose in the optimization process. The function-based approach proved more effective for the specific CTE materialization issues encountered.
     130- '''After Optimization''': ~940 ms – a slight improvement due to reduced data volume.
     131
     132== Optimization Summary ==
     133
     134|| '''View''' || '''Initial Time''' || '''Final Time''' || '''Method''' ||
     135|| v_student_current_semester || 23.459 ms || 23.459 ms || Existing indexes sufficient ||
     136|| v_student_upcoming_deadlines || 35.945 ms || 35.945 ms || Existing indexes sufficient ||
     137|| v_student_transcript || 0.426 ms || 0.426 ms || Existing indexes sufficient ||
     138|| v_course_announcements || 8.085 ms || 8.085 ms || Existing indexes sufficient ||
     139|| v_exercise_grading_queue || 122.986 ms || 122.986 ms || Acceptable performance ||
     140|| v_course_prerequisite_chain || 0.563 ms || 0.563 ms || Existing indexes sufficient ||
     141|| v_student_registration_state || 2268 ms || ~370 ms || Function replacement (6×) ||
     142|| v_teaching_assistant_eligibility || 923 ms || ~940 ms || Function replacement ||
     143
     144== Lessons Learned ==
     145
     146- '''CTE Materialization''': PostgreSQL materializes CTEs, preventing early filtering. For queries that need to filter on student_id, CTEs can cause massive unnecessary work.
     147
     148- '''Index Limitations''': Indexes on individual tables cannot overcome the fundamental problem of processing data before filtering. When the query structure prevents early filtering, indexes provide limited benefit.
     149
     150- '''Parameterized Functions''': Pushing filters into functions allows PostgreSQL to apply conditions at the earliest possible stage, dramatically reducing I/O and processing.
     151
     152- '''Analyze Before Indexing''': Always analyse the execution plan before creating indexes. In some cases, the problem is structural, not missing indexes.