wiki:OtherDevelopments

Version 11 (modified by 221296, 6 days ago) ( diff )

--

SQL Performance Analysis of Complex Queries

The performance analysis in this phase focuses on the two most complex analytical queries developed in Phase 6.

Instead of analyzing simple CRUD operations, we analyzed the most computationally expensive reports in the system. The goal was to identify the most expensive JOIN operations, grouping operations and date-based aggregations, and evaluate whether indexes could improve their execution performance.

For each query, we analyzed:

JOIN conditions GROUP BY operations aggregate functions date-based filtering and grouping foreign key relationships

The queries were tested using PostgreSQL EXPLAIN ANALYZE before and after index creation.

Scenario 1: Yearly analysis of overall platform activity and student success across enrollments, quiz attempts, completions, certificates, and course performance

WITH course_year_performance AS (
    SELECT
        EXTRACT(YEAR FROM qa.attempt_date) AS year,
        c.course_id,
        ROUND(AVG(qa.score), 2) AS final_grade_per_course
    FROM quiz_attempt qa
    JOIN quiz q
        ON q.quiz_id = qa.quiz_id
    JOIN lesson l
        ON l.lesson_id = q.lesson_id
    JOIN module m
        ON m.module_id = l.module_id
    JOIN course c
        ON c.course_id = m.course_id
    GROUP BY
        EXTRACT(YEAR FROM qa.attempt_date),
        c.course_id
),
year_course_summary AS (
    SELECT
        year,
        ROUND(AVG(final_grade_per_course), 2) AS final_year_success
    FROM course_year_performance
    GROUP BY year
)
SELECT
    EXTRACT(YEAR FROM e.enroll_date) AS year,
    COUNT(DISTINCT c.instructor_id) AS total_instructor_activity,
    COUNT(DISTINCT e.user_id) AS total_students,
    COUNT(DISTINCT e.enrollment_id) AS total_enrollments,
    COUNT(DISTINCT qa.attempt_id) AS total_attempts,
    COUNT(DISTINCT CASE
        WHEN e.completion_status = 'COMPLETED' THEN e.enrollment_id
    END) AS completed_enrollments,
    COUNT(DISTINCT cert.certificate_id) AS total_certificates,
    ROUND(AVG(e.progress_percentage), 2) AS avg_student_success_within_courses,
    ROUND(AVG(qa.score), 2) AS avg_final_grade_per_course, ycs.final_year_success
FROM enrollment e
JOIN course c
    ON c.course_id = e.course_id
LEFT JOIN certificate cert
    ON cert.enrollment_id = e.enrollment_id
LEFT JOIN module m
    ON m.course_id = c.course_id
LEFT JOIN lesson l
    ON l.module_id = m.module_id
LEFT JOIN quiz q
    ON q.lesson_id = l.lesson_id
LEFT JOIN quiz_attempt qa
    ON qa.quiz_id = q.quiz_id
    AND qa.user_id = e.user_id
LEFT JOIN year_course_summary ycs
    ON ycs.year = EXTRACT(YEAR FROM e.enroll_date)
GROUP BY
    EXTRACT(YEAR FROM e.enroll_date), ycs.final_year_success
ORDER BY year;

This query provides a yearly analytical overview of the platform.

It combines information from:

courses enrollments certificates modules lessons quizzes quiz attempts

The report calculates:

total instructor activity total students total enrollments total quiz attempts completed enrollments total certificates average student success within courses average final grade per course final yearly success indicator

The query performs multiple JOIN operations across the learning hierarchy:

Course → Module → Lesson → Quiz → QuizAttempt
Enrollment → Certificate
Enrollment → QuizAttempt

The most expensive operations are the joins between:

cert.enrollment_id = e.enrollment_id
m.course_id = c.course_id
l.module_id = m.module_id
q.lesson_id = l.lesson_id
qa.quiz_id = q.quiz_id
qa.user_id = e.user_id

To improve performance, indexes were added on the columns used in the JOIN conditions.

Indexes:

CREATE INDEX IF NOT EXISTS idx_certificate_enrollment
ON certificate(enrollment_id);

CREATE INDEX IF NOT EXISTS idx_module_course
ON module(course_id);

CREATE INDEX IF NOT EXISTS idx_lesson_module
ON lesson(module_id);

CREATE INDEX IF NOT EXISTS idx_quiz_lesson
ON quiz(lesson_id);

CREATE INDEX IF NOT EXISTS idx_quiz_attempt_quiz_user
ON quiz_attempt(quiz_id, user_id);

CREATE INDEX IF NOT EXISTS idx_enrollment_date
ON enrollment(enroll_date);

Explanation:

The first five indexes reduce the cost of JOIN operations by allowing PostgreSQL to directly locate related records instead of scanning entire tables.

The index on enroll_date improves the yearly grouping operation because the report groups data by year.

Result without indexes:

GroupAggregate  (cost=80709.32..144979.11 rows=73000 width=176) (actual time=164.281..170.938 rows=2 loops=1)
  Group Key: (EXTRACT(year FROM e.enroll_date)), ycs.final_year_success
  ->  Incremental Sort  (cost=80709.32..128051.98 rows=555805 width=103) (actual time=148.388..157.374 rows=20006 loops=1)
        Sort Key: (EXTRACT(year FROM e.enroll_date)), ycs.final_year_success, c.instructor_id
        Presorted Key: (EXTRACT(year FROM e.enroll_date))
        Full-sort Groups: 2  Sort Method: quicksort  Average Memory: 29kB  Peak Memory: 29kB
        Pre-sorted Groups: 2  Sort Method: quicksort  Average Memory: 1054kB  Peak Memory: 1054kB
        ->  Merge Left Join  (cost=80598.37..91715.97 rows=555805 width=103) (actual time=137.946..145.502 rows=20006 loops=1)
              Merge Cond: ((EXTRACT(year FROM e.enroll_date)) = ycs.year)
              ->  Sort  (cost=79099.25..80488.76 rows=555805 width=43) (actual time=108.940..110.102 rows=20006 loops=1)
                    Sort Key: (EXTRACT(year FROM e.enroll_date))
                    Sort Method: quicksort  Memory: 2019kB
                    ->  Hash Right Join  (cost=1576.68..8962.73 rows=555805 width=43) (actual time=91.417..97.214 rows=20006 loops=1)
                          Hash Cond: ((qa.quiz_id = q.quiz_id) AND (qa.user_id = e.user_id))
                          ->  Seq Scan on quiz_attempt qa  (cost=0.00..328.00 rows=20000 width=16) (actual time=0.016..1.444 rows=20000 loops=1)
                          ->  Hash  (cost=805.50..805.50 rows=51412 width=39) (actual time=86.618..86.623 rows=20006 loops=1)
                                Buckets: 65536  Batches: 1  Memory Usage: 1841kB
                                ->  Hash Right Join  (cost=235.03..805.50 rows=51412 width=39) (actual time=76.887..80.470 rows=20006 loops=1)
                                      Hash Cond: (m.course_id = c.course_id)
                                      ->  Hash Right Join  (cost=29.45..67.86 rows=1850 width=8) (actual time=0.114..0.139 rows=18 loops=1)
                                            Hash Cond: (l.module_id = m.module_id)
                                            ->  Hash Right Join  (cost=14.72..48.18 rows=1850 width=8) (actual time=0.054..0.070 rows=18 loops=1)
                                                  Hash Cond: (q.lesson_id = l.lesson_id)
                                                  ->  Seq Scan on quiz q  (cost=0.00..28.50 rows=1850 width=8) (actual time=0.007..0.011 rows=18 loops=1)
                                                  ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.027..0.027 rows=18 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on lesson l  (cost=0.00..12.10 rows=210 width=8) (actual time=0.015..0.017 rows=18 loops=1)
                                            ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.040..0.040 rows=10 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                  ->  Seq Scan on module m  (cost=0.00..12.10 rows=210 width=8) (actual time=0.029..0.031 rows=10 loops=1)
                                      ->  Hash  (cost=143.05..143.05 rows=5002 width=39) (actual time=76.677..76.679 rows=5002 loops=1)
                                            Buckets: 8192  Batches: 1  Memory Usage: 397kB
                                            ->  Hash Left Join  (cost=29.45..143.05 rows=5002 width=39) (actual time=72.865..75.398 rows=5002 loops=1)
                                                  Hash Cond: (e.enrollment_id = cert.enrollment_id)
                                                  ->  Hash Join  (cost=14.05..114.51 rows=5002 width=35) (actual time=72.822..74.641 rows=5002 loops=1)
                                                        Hash Cond: (e.course_id = c.course_id)
                                                        ->  Seq Scan on enrollment e  (cost=0.00..87.02 rows=5002 width=31) (actual time=0.014..0.477 rows=5002 loops=1)
                                                        ->  Hash  (cost=11.80..11.80 rows=180 width=8) (actual time=72.758..72.758 rows=5 loops=1)
                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                              ->  Seq Scan on course c  (cost=0.00..11.80 rows=180 width=8) (actual time=72.722..72.724 rows=5 loops=1)
                                                  ->  Hash  (cost=12.40..12.40 rows=240 width=8) (actual time=0.026..0.026 rows=1 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on certificate cert  (cost=0.00..12.40 rows=240 width=8) (actual time=0.019..0.019 rows=1 loops=1)
              ->  Sort  (cost=1499.12..1499.62 rows=200 width=64) (actual time=28.978..28.981 rows=2 loops=1)
                    Sort Key: ycs.year
                    Sort Method: quicksort  Memory: 25kB
                    ->  Subquery Scan on ycs  (cost=1486.47..1491.47 rows=200 width=64) (actual time=28.966..28.973 rows=2 loops=1)
                          ->  HashAggregate  (cost=1486.47..1489.47 rows=200 width=64) (actual time=28.960..28.966 rows=2 loops=1)
                                Group Key: (EXTRACT(year FROM qa_1.attempt_date))
                                Batches: 1  Memory Usage: 40kB
                                ->  HashAggregate  (cost=836.47..1186.47 rows=20000 width=68) (actual time=28.889..28.938 rows=2 loops=1)
                                      Group Key: EXTRACT(year FROM qa_1.attempt_date), c_1.course_id
                                      Batches: 1  Memory Usage: 793kB
                                      ->  Hash Join  (cost=95.12..686.47 rows=20000 width=40) (actual time=0.189..22.495 rows=20000 loops=1)
                                            Hash Cond: (m_1.course_id = c_1.course_id)
                                            ->  Hash Join  (cost=81.07..568.81 rows=20000 width=12) (actual time=0.142..15.695 rows=20000 loops=1)
                                                  Hash Cond: (l_1.module_id = m_1.module_id)
                                                  ->  Hash Join  (cost=66.35..500.53 rows=20000 width=12) (actual time=0.095..11.438 rows=20000 loops=1)
                                                        Hash Cond: (q_1.lesson_id = l_1.lesson_id)
                                                        ->  Hash Join  (cost=51.62..432.25 rows=20000 width=12) (actual time=0.053..7.286 rows=20000 loops=1)
                                                              Hash Cond: (qa_1.quiz_id = q_1.quiz_id)
                                                              ->  Seq Scan on quiz_attempt qa_1  (cost=0.00..328.00 rows=20000 width=12) (actual time=0.009..1.902 rows=20000 loops=1)
                                                              ->  Hash  (cost=28.50..28.50 rows=1850 width=8) (actual time=0.024..0.024 rows=18 loops=1)
                                                                    Buckets: 2048  Batches: 1  Memory Usage: 17kB
                                                                    ->  Seq Scan on quiz q_1  (cost=0.00..28.50 rows=1850 width=8) (actual time=0.012..0.015 rows=18 loops=1)
                                                        ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.023..0.023 rows=18 loops=1)
                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                              ->  Seq Scan on lesson l_1  (cost=0.00..12.10 rows=210 width=8) (actual time=0.012..0.014 rows=18 loops=1)
                                                  ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.028..0.029 rows=10 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on module m_1  (cost=0.00..12.10 rows=210 width=8) (actual time=0.018..0.020 rows=10 loops=1)
                                            ->  Hash  (cost=11.80..11.80 rows=180 width=4) (actual time=0.025..0.026 rows=5 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                  ->  Seq Scan on course c_1  (cost=0.00..11.80 rows=180 width=4) (actual time=0.017..0.018 rows=5 loops=1)
Planning Time: 2.538 ms
JIT:
  Functions: 112
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 8.302 ms (Deform 4.108 ms), Inlining 0.000 ms, Optimization 3.069 ms, Emission 69.974 ms, Total 81.346 ms
Execution Time: 944.483 ms

# # Before adding indexes, PostgreSQL used sequential scans on the main joined tables, especially on quiz_attempt, which contains 20000 test records. The query execution time was 944.483 ms..

Result with indexes:

GroupAggregate  (cost=80709.32..144979.11 rows=73000 width=176) (actual time=157.669..164.135 rows=2 loops=1)
  Group Key: (EXTRACT(year FROM e.enroll_date)), ycs.final_year_success
  ->  Incremental Sort  (cost=80709.32..128051.98 rows=555805 width=103) (actual time=141.941..150.977 rows=20006 loops=1)
        Sort Key: (EXTRACT(year FROM e.enroll_date)), ycs.final_year_success, c.instructor_id
        Presorted Key: (EXTRACT(year FROM e.enroll_date))
        Full-sort Groups: 2  Sort Method: quicksort  Average Memory: 29kB  Peak Memory: 29kB
        Pre-sorted Groups: 2  Sort Method: quicksort  Average Memory: 1054kB  Peak Memory: 1054kB
        ->  Merge Left Join  (cost=80598.37..91715.97 rows=555805 width=103) (actual time=131.481..139.062 rows=20006 loops=1)
              Merge Cond: ((EXTRACT(year FROM e.enroll_date)) = ycs.year)
              ->  Sort  (cost=79099.25..80488.76 rows=555805 width=43) (actual time=104.761..105.930 rows=20006 loops=1)
                    Sort Key: (EXTRACT(year FROM e.enroll_date))
                    Sort Method: quicksort  Memory: 2019kB
                    ->  Hash Right Join  (cost=1576.68..8962.73 rows=555805 width=43) (actual time=87.109..92.938 rows=20006 loops=1)
                          Hash Cond: ((qa.quiz_id = q.quiz_id) AND (qa.user_id = e.user_id))
                          ->  Seq Scan on quiz_attempt qa  (cost=0.00..328.00 rows=20000 width=16) (actual time=0.012..1.530 rows=20000 loops=1)
                          ->  Hash  (cost=805.50..805.50 rows=51412 width=39) (actual time=82.485..82.490 rows=20006 loops=1)
                                Buckets: 65536  Batches: 1  Memory Usage: 1841kB
                                ->  Hash Right Join  (cost=235.03..805.50 rows=51412 width=39) (actual time=72.927..76.507 rows=20006 loops=1)
                                      Hash Cond: (m.course_id = c.course_id)
                                      ->  Hash Right Join  (cost=29.45..67.86 rows=1850 width=8) (actual time=0.106..0.131 rows=18 loops=1)
                                            Hash Cond: (l.module_id = m.module_id)
                                            ->  Hash Right Join  (cost=14.72..48.18 rows=1850 width=8) (actual time=0.053..0.069 rows=18 loops=1)
                                                  Hash Cond: (q.lesson_id = l.lesson_id)
                                                  ->  Seq Scan on quiz q  (cost=0.00..28.50 rows=1850 width=8) (actual time=0.006..0.010 rows=18 loops=1)
                                                  ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.025..0.026 rows=18 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on lesson l  (cost=0.00..12.10 rows=210 width=8) (actual time=0.014..0.016 rows=18 loops=1)
                                            ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.034..0.034 rows=10 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                  ->  Seq Scan on module m  (cost=0.00..12.10 rows=210 width=8) (actual time=0.023..0.025 rows=10 loops=1)
                                      ->  Hash  (cost=143.05..143.05 rows=5002 width=39) (actual time=72.755..72.757 rows=5002 loops=1)
                                            Buckets: 8192  Batches: 1  Memory Usage: 397kB
                                            ->  Hash Left Join  (cost=29.45..143.05 rows=5002 width=39) (actual time=68.918..71.468 rows=5002 loops=1)
                                                  Hash Cond: (e.enrollment_id = cert.enrollment_id)
                                                  ->  Hash Join  (cost=14.05..114.51 rows=5002 width=35) (actual time=68.863..70.694 rows=5002 loops=1)
                                                        Hash Cond: (e.course_id = c.course_id)
                                                        ->  Seq Scan on enrollment e  (cost=0.00..87.02 rows=5002 width=31) (actual time=0.013..0.508 rows=5002 loops=1)
                                                        ->  Hash  (cost=11.80..11.80 rows=180 width=8) (actual time=68.810..68.810 rows=5 loops=1)
                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                              ->  Seq Scan on course c  (cost=0.00..11.80 rows=180 width=8) (actual time=68.775..68.777 rows=5 loops=1)
                                                  ->  Hash  (cost=12.40..12.40 rows=240 width=8) (actual time=0.037..0.038 rows=1 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on certificate cert  (cost=0.00..12.40 rows=240 width=8) (actual time=0.030..0.031 rows=1 loops=1)
              ->  Sort  (cost=1499.12..1499.62 rows=200 width=64) (actual time=26.690..26.694 rows=2 loops=1)
                    Sort Key: ycs.year
                    Sort Method: quicksort  Memory: 25kB
                    ->  Subquery Scan on ycs  (cost=1486.47..1491.47 rows=200 width=64) (actual time=26.678..26.684 rows=2 loops=1)
                          ->  HashAggregate  (cost=1486.47..1489.47 rows=200 width=64) (actual time=26.673..26.678 rows=2 loops=1)
                                Group Key: (EXTRACT(year FROM qa_1.attempt_date))
                                Batches: 1  Memory Usage: 40kB
                                ->  HashAggregate  (cost=836.47..1186.47 rows=20000 width=68) (actual time=26.606..26.652 rows=2 loops=1)
                                      Group Key: EXTRACT(year FROM qa_1.attempt_date), c_1.course_id
                                      Batches: 1  Memory Usage: 793kB
                                      ->  Hash Join  (cost=95.12..686.47 rows=20000 width=40) (actual time=0.181..20.697 rows=20000 loops=1)
                                            Hash Cond: (m_1.course_id = c_1.course_id)
                                            ->  Hash Join  (cost=81.07..568.81 rows=20000 width=12) (actual time=0.135..14.468 rows=20000 loops=1)
                                                  Hash Cond: (l_1.module_id = m_1.module_id)
                                                  ->  Hash Join  (cost=66.35..500.53 rows=20000 width=12) (actual time=0.094..10.598 rows=20000 loops=1)
                                                        Hash Cond: (q_1.lesson_id = l_1.lesson_id)
                                                        ->  Hash Join  (cost=51.62..432.25 rows=20000 width=12) (actual time=0.052..6.728 rows=20000 loops=1)
                                                              Hash Cond: (qa_1.quiz_id = q_1.quiz_id)
                                                              ->  Seq Scan on quiz_attempt qa_1  (cost=0.00..328.00 rows=20000 width=12) (actual time=0.005..1.791 rows=20000 loops=1)
                                                              ->  Hash  (cost=28.50..28.50 rows=1850 width=8) (actual time=0.024..0.025 rows=18 loops=1)
                                                                    Buckets: 2048  Batches: 1  Memory Usage: 17kB
                                                                    ->  Seq Scan on quiz q_1  (cost=0.00..28.50 rows=1850 width=8) (actual time=0.013..0.015 rows=18 loops=1)
                                                        ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.023..0.023 rows=18 loops=1)
                                                              Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                              ->  Seq Scan on lesson l_1  (cost=0.00..12.10 rows=210 width=8) (actual time=0.012..0.014 rows=18 loops=1)
                                                  ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.023..0.023 rows=10 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Seq Scan on module m_1  (cost=0.00..12.10 rows=210 width=8) (actual time=0.013..0.015 rows=10 loops=1)
                                            ->  Hash  (cost=11.80..11.80 rows=180 width=4) (actual time=0.026..0.026 rows=5 loops=1)
                                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                  ->  Seq Scan on course c_1  (cost=0.00..11.80 rows=180 width=4) (actual time=0.017..0.018 rows=5 loops=1)
Planning Time: 1.811 ms
JIT:
  Functions: 112
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 5.875 ms (Deform 2.882 ms), Inlining 0.000 ms, Optimization 2.428 ms, Emission 66.682 ms, Total 74.985 ms
Execution Time: 171.114 ms

# After adding indexes, the execution time decreased from 944.483 ms to 171.114 ms. This shows that the added indexes significantly improved the execution of the yearly analysis query. The improvement is especially visible because the query joins multiple tables and processes 20000 quiz_attempt records.

Scenario 2: Monthly analysis of course activity, completions, instructor involvement, and student success within a given year

WITH student_course_month_performance AS (
    SELECT
        EXTRACT(YEAR FROM qa.attempt_date) AS year,
        EXTRACT(MONTH FROM qa.attempt_date) AS month,
        c.course_id,
        qa.user_id,
        ROUND(AVG(qa.score), 2) AS avg_student_score
    FROM quiz_attempt qa
    JOIN quiz q
        ON q.quiz_id = qa.quiz_id
    JOIN lesson l
        ON l.lesson_id = q.lesson_id
    JOIN module m
        ON m.module_id = l.module_id
    JOIN course c
        ON c.course_id = m.course_id
    GROUP BY
        EXTRACT(YEAR FROM qa.attempt_date),
        EXTRACT(MONTH FROM qa.attempt_date),
        c.course_id,
        qa.user_id
),
course_month_performance AS (
    SELECT
        year,
        month,
        course_id,
        ROUND(AVG(avg_student_score), 2) AS avg_course_success
    FROM student_course_month_performance
    GROUP BY
        year,
        month,
        course_id
),
month_performance_summary AS (
    SELECT
        year,
        month,
        ROUND(AVG(avg_course_success), 2) AS final_month_success
    FROM course_month_performance
    GROUP BY
        year,
        month
)
SELECT
    EXTRACT(YEAR FROM e.enroll_date) AS year,
    EXTRACT(MONTH FROM e.enroll_date) AS month,
    COUNT(DISTINCT e.enrollment_id) AS total_enrollments,
    COUNT(DISTINCT CASE
        WHEN e.completion_status = 'COMPLETED' THEN e.enrollment_id
    END) AS completed_enrollments,
    COUNT(DISTINCT c.instructor_id) AS active_instructors,
    ROUND(AVG(scmp.avg_student_score), 2) AS avg_student_success,
    ROUND(AVG(cmp.avg_course_success), 2) AS avg_course_success,
    mps.final_month_success
FROM enrollment e
JOIN course c
    ON c.course_id = e.course_id
LEFT JOIN student_course_month_performance scmp
    ON scmp.year = EXTRACT(YEAR FROM e.enroll_date)
    AND scmp.month = EXTRACT(MONTH FROM e.enroll_date)
    AND scmp.course_id = e.course_id
    AND scmp.user_id = e.user_id
LEFT JOIN course_month_performance cmp
    ON cmp.year = EXTRACT(YEAR FROM e.enroll_date)
    AND cmp.month = EXTRACT(MONTH FROM e.enroll_date)
    AND cmp.course_id = e.course_id
LEFT JOIN month_performance_summary mps
    ON mps.year = EXTRACT(YEAR FROM e.enroll_date)
    AND mps.month = EXTRACT(MONTH FROM e.enroll_date)
GROUP BY
    EXTRACT(YEAR FROM e.enroll_date),
    EXTRACT(MONTH FROM e.enroll_date),
    mps.final_month_success
ORDER BY
    year,
    month;

This query provides a month-by-month analytical overview of the platform for a selected year.

The report combines:

enrollment activity course completions instructor participation quiz performance student success metrics

The query performs multiple JOIN operations through the complete learning structure:

QuizAttempt → Quiz → Lesson → Module → Course
Enrollment → Course
Enrollment → User

The most important JOIN conditions are:

qa.quiz_id = q.quiz_id
q.lesson_id = l.lesson_id
l.module_id = m.module_id
m.course_id = c.course_id
e.course_id = c.course_id

Additionally, the report groups data by:

YEAR(enroll_date)
MONTH(enroll_date)

YEAR(attempt_date)
MONTH(attempt_date)

To optimize these operations, the following indexes were added:

CREATE INDEX IF NOT EXISTS idx_module_course
ON module(course_id);

CREATE INDEX IF NOT EXISTS idx_lesson_module
ON lesson(module_id);

CREATE INDEX IF NOT EXISTS idx_quiz_lesson
ON quiz(lesson_id);

CREATE INDEX IF NOT EXISTS idx_quiz_attempt_quiz
ON quiz_attempt(quiz_id);

CREATE INDEX IF NOT EXISTS idx_enrollment_course
ON enrollment(course_id);

CREATE INDEX IF NOT EXISTS idx_enrollment_date
ON enrollment(enroll_date);

CREATE INDEX IF NOT EXISTS idx_quiz_attempt_date
ON quiz_attempt(attempt_date);

Explanation:

Indexes on foreign key columns improve JOIN performance, while indexes on date columns improve monthly grouping and chronological analysis.

These indexes reduce the amount of data PostgreSQL needs to scan when generating monthly reports.

Result without indexes:

GroupAggregate  (cost=4486.03..5392.22 rows=5002 width=184) (actual time=52.268..79.158 rows=13 loops=1)
  Group Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date)), mps.final_month_success
  CTE student_course_month_performance
    ->  HashAggregate  (cost=986.47..1386.47 rows=20000 width=104) (actual time=38.750..38.848 rows=13 loops=1)
          Group Key: EXTRACT(year FROM qa.attempt_date), EXTRACT(month FROM qa.attempt_date), c_1.course_id, qa.user_id
          Batches: 1  Memory Usage: 793kB
          ->  Hash Join  (cost=95.12..736.47 rows=20000 width=76) (actual time=0.233..29.694 rows=20000 loops=1)
                Hash Cond: (m.course_id = c_1.course_id)
                ->  Hash Join  (cost=81.07..568.81 rows=20000 width=16) (actual time=0.183..17.984 rows=20000 loops=1)
                      Hash Cond: (l.module_id = m.module_id)
                      ->  Hash Join  (cost=66.35..500.53 rows=20000 width=16) (actual time=0.128..13.219 rows=20000 loops=1)
                            Hash Cond: (q.lesson_id = l.lesson_id)
                            ->  Hash Join  (cost=51.62..432.25 rows=20000 width=16) (actual time=0.074..8.352 rows=20000 loops=1)
                                  Hash Cond: (qa.quiz_id = q.quiz_id)
                                  ->  Seq Scan on quiz_attempt qa  (cost=0.00..328.00 rows=20000 width=16) (actual time=0.013..2.193 rows=20000 loops=1)
                                  ->  Hash  (cost=28.50..28.50 rows=1850 width=8) (actual time=0.028..0.029 rows=18 loops=1)
                                        Buckets: 2048  Batches: 1  Memory Usage: 17kB
                                        ->  Seq Scan on quiz q  (cost=0.00..28.50 rows=1850 width=8) (actual time=0.018..0.020 rows=18 loops=1)
                            ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.028..0.028 rows=18 loops=1)
                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                  ->  Seq Scan on lesson l  (cost=0.00..12.10 rows=210 width=8) (actual time=0.011..0.014 rows=18 loops=1)
                      ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.028..0.028 rows=10 loops=1)
                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                            ->  Seq Scan on module m  (cost=0.00..12.10 rows=210 width=8) (actual time=0.019..0.021 rows=10 loops=1)
                ->  Hash  (cost=11.80..11.80 rows=180 width=4) (actual time=0.027..0.028 rows=5 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                      ->  Seq Scan on course c_1  (cost=0.00..11.80 rows=180 width=4) (actual time=0.013..0.014 rows=5 loops=1)
  CTE course_month_performance
    ->  HashAggregate  (cost=600.00..630.00 rows=2000 width=100) (actual time=38.873..38.896 rows=13 loops=1)
          Group Key: student_course_month_performance.year, student_course_month_performance.month, student_course_month_performance.course_id
          Batches: 1  Memory Usage: 121kB
          ->  CTE Scan on student_course_month_performance  (cost=0.00..400.00 rows=20000 width=100) (actual time=38.759..38.858 rows=13 loops=1)
  ->  Incremental Sort  (cost=2469.56..3138.16 rows=5002 width=179) (actual time=49.907..76.256 rows=5002 loops=1)
        Sort Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date)), mps.final_month_success, e.enrollment_id
        Presorted Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date))
        Full-sort Groups: 13  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
        Pre-sorted Groups: 13  Sort Method: quicksort  Average Memory: 35kB  Peak Memory: 35kB
        ->  Merge Left Join  (cost=2467.91..2973.88 rows=5002 width=179) (actual time=48.700..61.930 rows=5002 loops=1)
              Merge Cond: ((EXTRACT(year FROM e.enroll_date) = cmp.year) AND (EXTRACT(month FROM e.enroll_date) = cmp.month))
              Join Filter: (cmp.course_id = e.course_id)
              Rows Removed by Join Filter: 5002
              ->  Merge Left Join  (cost=2318.26..2721.07 rows=5002 width=91) (actual time=48.654..55.635 rows=5002 loops=1)
                    Merge Cond: ((EXTRACT(year FROM e.enroll_date) = scmp.year) AND (EXTRACT(month FROM e.enroll_date) = scmp.month))
                    Join Filter: ((scmp.course_id = e.course_id) AND (scmp.user_id = e.user_id))
                    Rows Removed by Join Filter: 5002
                    ->  Merge Left Join  (cost=489.49..554.76 rows=5002 width=63) (actual time=48.610..50.944 rows=5002 loops=1)
                          Merge Cond: (((EXTRACT(year FROM e.enroll_date)) = mps.year) AND ((EXTRACT(month FROM e.enroll_date)) = mps.month))
                          ->  Sort  (cost=421.84..434.35 rows=5002 width=31) (actual time=9.656..10.054 rows=5002 loops=1)
                                Sort Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date))
                                Sort Method: quicksort  Memory: 505kB
                                ->  Hash Join  (cost=14.05..114.51 rows=5002 width=31) (actual time=0.084..3.772 rows=5002 loops=1)
                                      Hash Cond: (e.course_id = c.course_id)
                                      ->  Seq Scan on enrollment e  (cost=0.00..87.02 rows=5002 width=27) (actual time=0.015..0.505 rows=5002 loops=1)
                                      ->  Hash  (cost=11.80..11.80 rows=180 width=8) (actual time=0.033..0.033 rows=5 loops=1)
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                            ->  Seq Scan on course c  (cost=0.00..11.80 rows=180 width=8) (actual time=0.017..0.019 rows=5 loops=1)
                          ->  Sort  (cost=67.64..68.14 rows=200 width=96) (actual time=38.949..38.952 rows=13 loops=1)
                                Sort Key: mps.year, mps.month
                                Sort Method: quicksort  Memory: 25kB
                                ->  Subquery Scan on mps  (cost=55.00..60.00 rows=200 width=96) (actual time=38.921..38.932 rows=13 loops=1)
                                      ->  HashAggregate  (cost=55.00..58.00 rows=200 width=96) (actual time=38.921..38.929 rows=13 loops=1)
                                            Group Key: course_month_performance.year, course_month_performance.month
                                            Batches: 1  Memory Usage: 40kB
                                            ->  CTE Scan on course_month_performance  (cost=0.00..40.00 rows=2000 width=96) (actual time=38.875..38.903 rows=13 loops=1)
                    ->  Sort  (cost=1828.77..1878.77 rows=20000 width=104) (actual time=0.038..0.328 rows=4794 loops=1)
                          Sort Key: scmp.year, scmp.month
                          Sort Method: quicksort  Memory: 25kB
                          ->  CTE Scan on student_course_month_performance scmp  (cost=0.00..400.00 rows=20000 width=104) (actual time=0.002..0.004 rows=13 loops=1)
              ->  Sort  (cost=149.66..154.66 rows=2000 width=100) (actual time=0.040..0.330 rows=4794 loops=1)
                    Sort Key: cmp.year, cmp.month
                    Sort Method: quicksort  Memory: 25kB
                    ->  CTE Scan on course_month_performance cmp  (cost=0.00..40.00 rows=2000 width=100) (actual time=0.001..0.003 rows=13 loops=1)
Planning Time: 3.274 ms
Execution Time: 80.672 ms

Result њith indexes:

GroupAggregate  (cost=4486.03..5392.22 rows=5002 width=184) (actual time=51.214..77.962 rows=13 loops=1)
  Group Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date)), mps.final_month_success
  CTE student_course_month_performance
    ->  HashAggregate  (cost=986.47..1386.47 rows=20000 width=104) (actual time=38.401..38.500 rows=13 loops=1)
          Group Key: EXTRACT(year FROM qa.attempt_date), EXTRACT(month FROM qa.attempt_date), c_1.course_id, qa.user_id
          Batches: 1  Memory Usage: 793kB
          ->  Hash Join  (cost=95.12..736.47 rows=20000 width=76) (actual time=0.105..29.696 rows=20000 loops=1)
                Hash Cond: (m.course_id = c_1.course_id)
                ->  Hash Join  (cost=81.07..568.81 rows=20000 width=16) (actual time=0.079..17.990 rows=20000 loops=1)
                      Hash Cond: (l.module_id = m.module_id)
                      ->  Hash Join  (cost=66.35..500.53 rows=20000 width=16) (actual time=0.058..13.125 rows=20000 loops=1)
                            Hash Cond: (q.lesson_id = l.lesson_id)
                            ->  Hash Join  (cost=51.62..432.25 rows=20000 width=16) (actual time=0.031..8.272 rows=20000 loops=1)
                                  Hash Cond: (qa.quiz_id = q.quiz_id)
                                  ->  Seq Scan on quiz_attempt qa  (cost=0.00..328.00 rows=20000 width=16) (actual time=0.007..1.975 rows=20000 loops=1)
                                  ->  Hash  (cost=28.50..28.50 rows=1850 width=8) (actual time=0.018..0.019 rows=18 loops=1)
                                        Buckets: 2048  Batches: 1  Memory Usage: 17kB
                                        ->  Seq Scan on quiz q  (cost=0.00..28.50 rows=1850 width=8) (actual time=0.007..0.010 rows=18 loops=1)
                            ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.018..0.018 rows=18 loops=1)
                                  Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                  ->  Seq Scan on lesson l  (cost=0.00..12.10 rows=210 width=8) (actual time=0.007..0.009 rows=18 loops=1)
                      ->  Hash  (cost=12.10..12.10 rows=210 width=8) (actual time=0.012..0.012 rows=10 loops=1)
                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                            ->  Seq Scan on module m  (cost=0.00..12.10 rows=210 width=8) (actual time=0.008..0.009 rows=10 loops=1)
                ->  Hash  (cost=11.80..11.80 rows=180 width=4) (actual time=0.012..0.012 rows=5 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                      ->  Seq Scan on course c_1  (cost=0.00..11.80 rows=180 width=4) (actual time=0.009..0.010 rows=5 loops=1)
  CTE course_month_performance
    ->  HashAggregate  (cost=600.00..630.00 rows=2000 width=100) (actual time=38.529..38.551 rows=13 loops=1)
          Group Key: student_course_month_performance.year, student_course_month_performance.month, student_course_month_performance.course_id
          Batches: 1  Memory Usage: 121kB
          ->  CTE Scan on student_course_month_performance  (cost=0.00..400.00 rows=20000 width=100) (actual time=38.403..38.505 rows=13 loops=1)
  ->  Incremental Sort  (cost=2469.56..3138.16 rows=5002 width=179) (actual time=48.839..75.086 rows=5002 loops=1)
        Sort Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date)), mps.final_month_success, e.enrollment_id
        Presorted Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date))
        Full-sort Groups: 13  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
        Pre-sorted Groups: 13  Sort Method: quicksort  Average Memory: 35kB  Peak Memory: 35kB
        ->  Merge Left Join  (cost=2467.91..2973.88 rows=5002 width=179) (actual time=47.704..60.866 rows=5002 loops=1)
              Merge Cond: ((EXTRACT(year FROM e.enroll_date) = cmp.year) AND (EXTRACT(month FROM e.enroll_date) = cmp.month))
              Join Filter: (cmp.course_id = e.course_id)
              Rows Removed by Join Filter: 5002
              ->  Merge Left Join  (cost=2318.26..2721.07 rows=5002 width=91) (actual time=47.674..54.628 rows=5002 loops=1)
                    Merge Cond: ((EXTRACT(year FROM e.enroll_date) = scmp.year) AND (EXTRACT(month FROM e.enroll_date) = scmp.month))
                    Join Filter: ((scmp.course_id = e.course_id) AND (scmp.user_id = e.user_id))
                    Rows Removed by Join Filter: 5002
                    ->  Merge Left Join  (cost=489.49..554.76 rows=5002 width=63) (actual time=47.638..49.989 rows=5002 loops=1)
                          Merge Cond: (((EXTRACT(year FROM e.enroll_date)) = mps.year) AND ((EXTRACT(month FROM e.enroll_date)) = mps.month))
                          ->  Sort  (cost=421.84..434.35 rows=5002 width=31) (actual time=9.036..9.445 rows=5002 loops=1)
                                Sort Key: (EXTRACT(year FROM e.enroll_date)), (EXTRACT(month FROM e.enroll_date))
                                Sort Method: quicksort  Memory: 505kB
                                ->  Hash Join  (cost=14.05..114.51 rows=5002 width=31) (actual time=0.045..3.475 rows=5002 loops=1)
                                      Hash Cond: (e.course_id = c.course_id)
                                      ->  Seq Scan on enrollment e  (cost=0.00..87.02 rows=5002 width=27) (actual time=0.013..0.440 rows=5002 loops=1)
                                      ->  Hash  (cost=11.80..11.80 rows=180 width=8) (actual time=0.016..0.016 rows=5 loops=1)
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                            ->  Seq Scan on course c  (cost=0.00..11.80 rows=180 width=8) (actual time=0.010..0.011 rows=5 loops=1)
                          ->  Sort  (cost=67.64..68.14 rows=200 width=96) (actual time=38.598..38.601 rows=13 loops=1)
                                Sort Key: mps.year, mps.month
                                Sort Method: quicksort  Memory: 25kB
                                ->  Subquery Scan on mps  (cost=55.00..60.00 rows=200 width=96) (actual time=38.569..38.580 rows=13 loops=1)
                                      ->  HashAggregate  (cost=55.00..58.00 rows=200 width=96) (actual time=38.568..38.577 rows=13 loops=1)
                                            Group Key: course_month_performance.year, course_month_performance.month
                                            Batches: 1  Memory Usage: 40kB
                                            ->  CTE Scan on course_month_performance  (cost=0.00..40.00 rows=2000 width=96) (actual time=38.531..38.558 rows=13 loops=1)
                    ->  Sort  (cost=1828.77..1878.77 rows=20000 width=104) (actual time=0.030..0.352 rows=4794 loops=1)
                          Sort Key: scmp.year, scmp.month
                          Sort Method: quicksort  Memory: 25kB
                          ->  CTE Scan on student_course_month_performance scmp  (cost=0.00..400.00 rows=20000 width=104) (actual time=0.002..0.004 rows=13 loops=1)
              ->  Sort  (cost=149.66..154.66 rows=2000 width=100) (actual time=0.025..0.333 rows=4794 loops=1)
                    Sort Key: cmp.year, cmp.month
                    Sort Method: quicksort  Memory: 25kB
                    ->  CTE Scan on course_month_performance cmp  (cost=0.00..40.00 rows=2000 width=100) (actual time=0.001..0.003 rows=13 loops=1)
Planning Time: 1.730 ms
Execution Time: 78.569 ms

# # After adding indexes, the execution time decreased from 80.672 ms to 78.569 ms. The improvement is smaller compared to Scenario 1, because this query performs monthly aggregation and still processes a large part of the dataset. However, the indexes slightly reduced the execution time and planning time.

Security and Protection Measures

Spring Security Authentication

The application uses Spring Security for authentication and authorization.

Unauthenticated users can access public pages such as:

home page login page registration page static resources such as CSS and JavaScript files

All other requests require authentication.

Example configuration:

@Bean
public SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception {

http
    .csrf(Customizer.withDefaults())
    .authorizeHttpRequests(auth -> auth
        .requestMatchers("/", "/login", "/register", "/css/**", "/js/**").permitAll()
        .anyRequest().authenticated()
    )
    .formLogin(form -> form
        .defaultSuccessUrl("/", true)
        .permitAll()
    )
    .logout(logout -> logout
        .logoutUrl("/logout")
        .logoutSuccessUrl("/login?logout")
        .permitAll()
    );

return http.build();

}

Explanation:

Spring Security protects the application by requiring users to log in before accessing protected resources.

The configuration also defines login and logout behavior.

Password Hashing with BCrypt

User passwords are not stored in plain text. The application uses BCrypt for password hashing.

Example:

@Bean
public PasswordEncoder passwordEncoder() {
return new BCryptPasswordEncoder();
}

During login, the raw password entered by the user is compared with the stored hashed password.

passwordEncoder.matches(rawPassword, storedHashedPassword);

Explanation:

This protects user accounts because even if someone gets access to the database, the original passwords are not directly visible.

Role-Based Access Control

The database contains a role attribute in the user_entity table.

role VARCHAR(20) NOT NULL CHECK (role IN ('USER', 'ADMIN', 'INSTRUCTOR'))

The system supports three roles:

USER — can browse courses, enroll in courses and access learning content INSTRUCTOR — can manage courses, modules, lessons and quizzes ADMIN — can manage categories, subscription plans and support tickets

Explanation:

The CHECK constraint prevents invalid roles from being inserted into the database.

The separate tables users, administrators and instructors also support the supertype/subtype structure of the system.

SQL Injection Prevention

SQL Injection is prevented by using Spring Data JPA and parameterized queries instead of directly concatenating user input into SQL strings.

Safe example:

@Query("SELECT u FROM UserEntity u WHERE u.email = ")
Optional findByEmail(@Param("email") String email);

Unsafe example:

String query = "SELECT * FROM user_entity WHERE email = '" + email + "'";

Explanation:

The safe approach treats user input as data, not as executable SQL code.

This prevents malicious input from changing the meaning of the SQL query.

Database Constraints and Data Integrity

The database schema contains constraints that protect the data from invalid states.

Examples:

email VARCHAR(150) UNIQUE NOT NULL

This prevents duplicate user accounts with the same email.

progress_percentage INT CHECK (progress_percentage BETWEEN 0 AND 100)

This prevents invalid course progress values.

certificate_code VARCHAR(100) UNIQUE NOT NULL

This ensures that every certificate has a unique certificate code.

enrollment_id INT UNIQUE NOT NULL

This ensures that one enrollment can have only one certificate.

Foreign keys are also used to protect relationships between tables:

FOREIGN KEY (user_id) REFERENCES users(id)
FOREIGN KEY (course_id) REFERENCES course(course_id)
FOREIGN KEY (enrollment_id) REFERENCES enrollment(enrollment_id)

Explanation:

These constraints prevent orphan records and keep the database consistent.

Transaction Management

Some operations must be executed as one atomic unit.

For example, when a user enrolls in a course, the system may need to:

check whether the user has an active subscription check whether the user is already enrolled insert a new enrollment record update related data

Example:

@Transactional
public void enrollUserInCourse(Long userId, Long courseId) {
// check active subscription
// check existing enrollment
// create new enrollment
}

Explanation:

If one step fails, the whole transaction is rolled back.

This protects the database from partial or inconsistent changes.

Database Connection Configuration

The application uses PostgreSQL as the database system.

Example configuration:

spring.application.name=db
spring.datasource.url=jdbc:postgresql://localhost:3307/db_202526z_va_prj_olpms
spring.datasource.username=USERNAME
spring.datasource.password=PASSWORD
spring.datasource.driver-class-name=org.postgresql.Driver

spring.jpa.hibernate.ddl-auto=validate
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQLDialect

Explanation:

Real database credentials should not be published on the wiki page. Because of that, the username and password are replaced with placeholders.

Spring Boot uses HikariCP as the default database connection pool implementation. Connection pooling improves performance by reusing already opened database connections instead of opening a new connection for every request.

Note: See TracWiki for help on using the wiki.