Context Navigation

DatabaseCreation

Introduction

The database schema is implemented in PostgreSQL using the DDL from DDL_manual.sql. This represents the final, refined version of the relational model after addressing tool limitations and incorporating feedback from professor consultations.

Key Design Decisions

Primary Key Strategy

Two distinct primary key strategies were employed:

Surrogate keys for excessively long composite natural keys - Entities that would inherit composite keys from multiple parent entities (potentially 5‑20 attributes) receive a single id column with GENERATED ALWAYS AS IDENTITY. The natural key columns are retained with UNIQUE NOT NULL constraints to preserve semantic integrity without propagating keys further down the chain.

Surrogate keys as genuine identifiers – For many‑to‑many relationships where the foreign key pair alone does not uniquely identify a record (e.g., Exam_Attempt where a student may attempt the same exam multiple times), a surrogate id is added because no natural key exists.

`GENERATED ALWAYS AS IDENTITY` vs `SERIAL`

All surrogate keys use INTEGER GENERATED ALWAYS AS IDENTITY instead of SERIAL for the following reasons:

Feature	SERIAL	GENERATED ALWAYS AS IDENTITY
SQL Standard	Non‑standard (PostgreSQL specific)	SQL Standard compliant
Under the Hood	Creates a separate SEQUENCE object implicitly	Links a SEQUENCE directly to the column
Accidental Overwrites	Easy to accidentally bypass or break	Strictly protected (unless forced)
Permissions	Requires separate permissions on the sequence	Permissions are tied directly to the table

SERIAL is syntactic sugar that creates a sequence and sets the column’s DEFAULT to fetch from it. Because it is just a default, PostgreSQL does not prevent manual value insertion. GENERATED ALWAYS AS IDENTITY enforces strict safety rules – the database is solely responsible for generating IDs. If overriding is absolutely necessary (e.g., during data migration), OVERRIDING SYSTEM VALUE can be explicitly specified.

This choice ensures portability across enterprise database engines (Oracle, DB2, SQL Server) that support the SQL:2003 standard.

Constraint Naming Convention

Constraints follow a consistent naming pattern:

Primary Keys: pk_{table_name}
Foreign Keys: fk_{table_name}_{referenced_table}
Not Null: nn_{table_name}_{attribute}
Unique: uq_{table_name}_{attribute(s)}
Check: ck_{table_name}_{constraint_description}

Example:

CONSTRAINT pk_exercise PRIMARY KEY(id)
CONSTRAINT fk_exercise_course_edition_id FOREIGN KEY(course_edition_id) REFERENCES Course_Edition(id)
CONSTRAINT nn_exercise_title CHECK(title IS NOT NULL)
CONSTRAINT uq_semester_enrollment_peidays UNIQUE(program_enrollment_id, academic_year, semester)
CONSTRAINT ck_exercise_max_grade CHECK(max_grade > 0)

`ON DELETE SET NULL` for Historical Data Preservation

All foreign keys use ON DELETE SET NULL ON UPDATE CASCADE to preserve historical data. When a referenced record is deleted, dependent records remain in the database with a NULL foreign key, maintaining the audit trail. This is critical for academic systems where grade records, exam attempts, and submissions must be preserved even if the associated course edition or student record is removed.

Enum for Exam Types

Exam types are defined as a PostgreSQL ENUM type:

CREATE TYPE EXAM_TYPE AS ENUM (
    'partial_1',
    'partial_2',
    'session_jan',
    'session_jun',
    'session_sep'
);

PostgreSQL stores only a 4‑byte numeric OID per value rather than the literal string. In contrast, a TEXT column with a CHECK constraint would store variable‑length strings (10‑12 bytes per value).

Data Population

Data Generation Approach

The database was populated using a combination of:

Local Large Language Models with Tool Calling – CSV data was generated programmatically, producing realistic academic data including member names, course codes, enrollment patterns, and grade distributions.

Python’s faker Library – Supplemented data generation for realistic names, dates, and other attributes where a more controlled generator was needed.

Database Size

The final database contains approximately 13 GB of data across 30 tables. The largest tables are:

Table	Row Count	Total Size	Data Size	Index Size
student_answer	26,574,600	4,230 MB	3,194 MB	1,036 MB
member_message	19,999,100	3,113 MB	2,512 MB	602 MB
exam_attempt	8,858,161	853 MB	509 MB	345 MB
survey_response	8,447,940	1,079 MB	551 MB	528 MB
exercise_submission	7,353,124	1,617 MB	1,327 MB	290 MB
semester_enrollment	4,387,348	373 MB	185 MB	188 MB
course_enrollment	2,214,813	217 MB	94 MB	123 MB
student_grade	1,087,533	77 MB	46 MB	31 MB
member	1,050,000	173 MB	105 MB	69 MB
program_enrollment	1,049,789	118 MB	55 MB	62 MB
student	1,000,000	110 MB	42 MB	67 MB

...and 20 additional tables

Total Database Size: 13 GB

The large volume of data ensures that performance testing and query optimization exercises reflect realistic production conditions.

Trimming the Database

Initially, the database grew to approximately 63 GB. During a laboratory session, it was reduced to its current 13 GB size to ensure manageable operations while maintaining sufficient data volume for meaningful performance analysis. The database remained local throughout the project after disk space constraints were addressed.

Views Overview

Non‑Analytical Views

The system includes several non‑analytical views (no big COUNT, SUM, MIN, MAX aggregates) for operational queries:

v_student_current_semester – Student’s current semester course enrollments
v_student_upcoming_deadlines – Upcoming exercises and exams
v_student_transcript – Complete academic transcript
v_course_announcements – Course announcements with threading
v_exercise_grading_queue – Pending and graded submissions
v_course_prerequisite_chain – Recursive prerequisite tree
v_student_registration_state – Course registration eligibility (later replaced by a function)
v_teaching_assistant_eligibility – TA eligibility for course editions (later replaced by a function)

Analytical Views

Additional analytical views (with aggregates) are provided as extras:

v_course_edition_overview – Enrollment and teacher statistics
v_survey_status – Survey response counts and capacity
v_teacher_dashboard – Teacher activity metrics
v_exam_results_summary – Exam statistics and pass rates
v_member_inbox – Message thread summaries
v_program_catalog – Program statistics
v_course_catalog – Course offering history
v_exam_attempt_detail – Detailed exam attempt analysis
v_program_completion_status – Graduation eligibility tracking

Last modified 4 weeks ago Last modified on 07/01/26 17:46:23

Attachments (5)

DDL_manual.sql (22.7 KB ) - added by 231082 4 weeks ago. Manually written DDL.
DDL_OSDM.sql (30.5 KB ) - added by 231082 4 weeks ago. DDL generated automatically from the diagrams in Oracle SQL Developer Data Modeler.
DDL_VP.sql (14.1 KB ) - added by 231082 4 weeks ago. DDL generated automatically from the diagrams in Visual Paradigm.
pure_views.sql (16.9 KB ) - added by 231082 4 weeks ago. Non-analytical VIEWs.
analytical_views.sql (19.4 KB ) - added by 231082 4 weeks ago. Analytical VIEWs.

Download all attachments as: .zip

Note: See TracWiki for help on using the wiki.

Download in other formats:

Plain Text