| | 72 | |
| | 73 | |
| | 74 | == Data Population == |
| | 75 | |
| | 76 | === Data Generation Approach === |
| | 77 | |
| | 78 | The database was populated using a combination of: |
| | 79 | |
| | 80 | - '''Local Large Language Models with Tool Calling''' – CSV data was generated programmatically, producing realistic academic data including member names, course codes, enrollment patterns, and grade distributions. |
| | 81 | |
| | 82 | - '''Python’s faker Library''' – Supplemented data generation for realistic names, dates, and other attributes where a more controlled generator was needed. |
| | 83 | |
| | 84 | === Database Size === |
| | 85 | |
| | 86 | The final database contains approximately 13 GB of data across 30 tables. The largest tables are: |
| | 87 | |
| | 88 | || '''Table''' || '''Row Count''' || '''Total Size''' || '''Data Size''' || '''Index Size''' || |
| | 89 | || student_answer || 26,574,600 || 4,230 MB || 3,194 MB || 1,036 MB || |
| | 90 | || member_message || 19,999,100 || 3,113 MB || 2,512 MB || 602 MB || |
| | 91 | || exam_attempt || 8,858,161 || 853 MB || 509 MB || 345 MB || |
| | 92 | || survey_response || 8,447,940 || 1,079 MB || 551 MB || 528 MB || |
| | 93 | || exercise_submission || 7,353,124 || 1,617 MB || 1,327 MB || 290 MB || |
| | 94 | || semester_enrollment || 4,387,348 || 373 MB || 185 MB || 188 MB || |
| | 95 | || course_enrollment || 2,214,813 || 217 MB || 94 MB || 123 MB || |
| | 96 | || student_grade || 1,087,533 || 77 MB || 46 MB || 31 MB || |
| | 97 | || member || 1,050,000 || 173 MB || 105 MB || 69 MB || |
| | 98 | || program_enrollment || 1,049,789 || 118 MB || 55 MB || 62 MB || |
| | 99 | || student || 1,000,000 || 110 MB || 42 MB || 67 MB || |
| | 100 | ...and 20 additional tables |
| | 101 | |
| | 102 | '''Total Database Size: 13 GB''' |
| | 103 | |
| | 104 | The large volume of data ensures that performance testing and query optimization exercises reflect realistic production conditions. |
| | 105 | |
| | 106 | === Trimming the Database === |
| | 107 | |
| | 108 | Initially, the database grew to approximately 63 GB. During a laboratory session, it was reduced to its current 13 GB size to ensure manageable operations while maintaining sufficient data volume for meaningful performance analysis. The database remained local throughout the project after disk space constraints were addressed. |
| | 109 | |
| | 110 | == Views Overview == |
| | 111 | |
| | 112 | === Non‑Analytical Views === |
| | 113 | |
| | 114 | The system includes several non‑analytical views (no big `COUNT`, `SUM`, `MIN`, `MAX` aggregates) for operational queries: |
| | 115 | - `v_student_current_semester` – Student’s current semester course enrollments |
| | 116 | - `v_student_upcoming_deadlines` – Upcoming exercises and exams |
| | 117 | - `v_student_transcript` – Complete academic transcript |
| | 118 | - `v_course_announcements` – Course announcements with threading |
| | 119 | - `v_exercise_grading_queue` – Pending and graded submissions |
| | 120 | - `v_course_prerequisite_chain` – Recursive prerequisite tree |
| | 121 | - `v_student_registration_state` – Course registration eligibility (later replaced by a function) |
| | 122 | - `v_teaching_assistant_eligibility` – TA eligibility for course editions (later replaced by a function) |
| | 123 | |
| | 124 | === Analytical Views === |
| | 125 | |
| | 126 | Additional analytical views (with aggregates) are provided as extras: |
| | 127 | |
| | 128 | - `v_course_edition_overview` – Enrollment and teacher statistics |
| | 129 | - `v_survey_status` – Survey response counts and capacity |
| | 130 | - `v_teacher_dashboard` – Teacher activity metrics |
| | 131 | - `v_exam_results_summary` – Exam statistics and pass rates |
| | 132 | - `v_member_inbox` – Message thread summaries |
| | 133 | - `v_program_catalog` – Program statistics |
| | 134 | - `v_course_catalog` – Course offering history |
| | 135 | - `v_exam_attempt_detail` – Detailed exam attempt analysis |
| | 136 | - `v_program_completion_status` – Graduation eligibility tracking |