Changes between Version 1 and Version 2 of DatabaseCreation


Ignore:
Timestamp:
07/01/26 17:46:23 (4 days ago)
Author:
231082
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • DatabaseCreation

    v1 v2  
    7070
    7171PostgreSQL stores only a 4‑byte numeric OID per value rather than the literal string. In contrast, a `TEXT` column with a `CHECK` constraint would store variable‑length strings (10‑12 bytes per value).
     72
     73
     74== Data Population ==
     75
     76=== Data Generation Approach ===
     77
     78The database was populated using a combination of:
     79
     80- '''Local Large Language Models with Tool Calling''' – CSV data was generated programmatically, producing realistic academic data including member names, course codes, enrollment patterns, and grade distributions.
     81
     82- '''Python’s faker Library''' – Supplemented data generation for realistic names, dates, and other attributes where a more controlled generator was needed.
     83
     84=== Database Size ===
     85
     86The final database contains approximately 13 GB of data across 30 tables. The largest tables are:
     87
     88|| '''Table''' || '''Row Count''' || '''Total Size''' || '''Data Size''' || '''Index Size''' ||
     89|| student_answer || 26,574,600 || 4,230 MB || 3,194 MB || 1,036 MB ||
     90|| member_message || 19,999,100 || 3,113 MB || 2,512 MB || 602 MB ||
     91|| exam_attempt || 8,858,161 || 853 MB || 509 MB || 345 MB ||
     92|| survey_response || 8,447,940 || 1,079 MB || 551 MB || 528 MB ||
     93|| exercise_submission || 7,353,124 || 1,617 MB || 1,327 MB || 290 MB ||
     94|| semester_enrollment || 4,387,348 || 373 MB || 185 MB || 188 MB ||
     95|| course_enrollment || 2,214,813 || 217 MB || 94 MB || 123 MB ||
     96|| student_grade || 1,087,533 || 77 MB || 46 MB || 31 MB ||
     97|| member || 1,050,000 || 173 MB || 105 MB || 69 MB ||
     98|| program_enrollment || 1,049,789 || 118 MB || 55 MB || 62 MB ||
     99|| student || 1,000,000 || 110 MB || 42 MB || 67 MB ||
     100...and 20 additional tables
     101
     102'''Total Database Size: 13 GB'''
     103
     104The large volume of data ensures that performance testing and query optimization exercises reflect realistic production conditions.
     105
     106=== Trimming the Database ===
     107
     108Initially, the database grew to approximately 63 GB. During a laboratory session, it was reduced to its current 13 GB size to ensure manageable operations while maintaining sufficient data volume for meaningful performance analysis. The database remained local throughout the project after disk space constraints were addressed.
     109
     110== Views Overview ==
     111
     112=== Non‑Analytical Views ===
     113
     114The system includes several non‑analytical views (no big `COUNT`, `SUM`, `MIN`, `MAX` aggregates) for operational queries:
     115- `v_student_current_semester` – Student’s current semester course enrollments
     116- `v_student_upcoming_deadlines` – Upcoming exercises and exams
     117- `v_student_transcript` – Complete academic transcript
     118- `v_course_announcements` – Course announcements with threading
     119- `v_exercise_grading_queue` – Pending and graded submissions
     120- `v_course_prerequisite_chain` – Recursive prerequisite tree
     121- `v_student_registration_state` – Course registration eligibility (later replaced by a function)
     122- `v_teaching_assistant_eligibility` – TA eligibility for course editions (later replaced by a function)
     123
     124=== Analytical Views ===
     125
     126Additional analytical views (with aggregates) are provided as extras:
     127
     128- `v_course_edition_overview` – Enrollment and teacher statistics
     129- `v_survey_status` – Survey response counts and capacity
     130- `v_teacher_dashboard` – Teacher activity metrics
     131- `v_exam_results_summary` – Exam statistics and pass rates
     132- `v_member_inbox` – Message thread summaries
     133- `v_program_catalog` – Program statistics
     134- `v_course_catalog` – Course offering history
     135- `v_exam_attempt_detail` – Detailed exam attempt analysis
     136- `v_program_completion_status` – Graduation eligibility tracking