wiki:Normalization

Version 42 (modified by 221296, 3 weeks ago) ( diff )

--

Normalization

Initial de-normalized relation and functional dependencies

Global set of attributes

This represents the initial de-normalized relation for the Online Learning Platform Management System (OLPMS), before applying any normalization steps.

All attributes from the conceptual (ER) model are combined into a single universal relation R, assuming that all data is stored in one table. Attribute names are made unique to avoid duplication.

R = {
user_id, first_name, last_name, email, password, role,

plan_id, plan_name, price, duration_months, plan_description, access_type,

subscription_id, start_date, end_date, subscription_status,

payment_id, amount,

ticket_id, admin_id, subject, ticket_description, ticket_status, created_at,

category_id, category_name, category_description,

course_id, course_name, course_price, course_status, instructor_id,

module_id, module_title, module_description,

lesson_id, lesson_title, material,

quiz_id, total_points, passing_score,

attempt_id, score, attempt_date,

enrollment_id, enroll_date, completion_status, progress_percentage,

certificate_id, issue_date, certificate_code, certificate_status
}

Functional dependencies (initial set)

The following functional dependencies represent the initial global set of dependencies that hold in the universal de-normalized relation R for the Online Learning Platform Management System (OLPMS).

They are derived from the semantics of the ER model and the business rules of the system. The goal is to capture the essential determinants that will later guide the normalization process.

User / specialization

FD1: user_id → first_name, last_name, email, password, role
FD2: instructor_id → user_id
FD3: admin_id → user_id

Subscription plan / subscription / payment

FD4: plan_id → plan_name, price, duration_months, plan_description, access_type
FD5: subscription_id → user_id, plan_id, start_date, end_date, subscription_status
FD6: payment_id → user_id, subscription_id, amount

Support ticket

FD7: ticket_id → user_id, admin_id, subject, ticket_description, ticket_status, created_at

Category / course

FD8: category_id → category_name, category_description
FD9: course_id → course_name, course_price, course_status, instructor_id

CourseCategory relation

The relation CourseCategory has composite key (course_id, category_id) and no additional non-trivial functional dependencies.

Course content hierarchy

FD10: module_id → course_id, module_title, module_description
FD11: lesson_id → module_id, lesson_title, material
FD12: quiz_id → lesson_id, total_points, passing_score
FD13: attempt_id → user_id, quiz_id, score, attempt_date

Enrollment / certificate

FD14: enrollment_id → user_id, course_id, enroll_date, completion_status, progress_percentage
FD15: certificate_id → enrollment_id, issue_date, certificate_code, certificate_status

Candidate keys and primary key selection

Determining a candidate key

To determine a candidate key of the universal de-normalized relation, we identify a minimal set of identifiers whose closure determines all attributes of R.

Because the universal relation combines several independent components of the system (subscriptions, payments, support tickets, course-category assignments, quiz attempts, and certificates), the candidate key must include one identifier from each independent component that is not functionally determined by another component.

A candidate key for the initial de-normalized relation is:

K = {payment_id, ticket_id, category_id, attempt_id, certificate_id}

Its closure includes all attributes of R, so it determines the whole relation.

Closure proof for K

Let K = {payment_id, ticket_id, category_id, attempt_id, certificate_id}

Compute K+:
Start: K+ = {payment_id, ticket_id, category_id, attempt_id, certificate_id}

From FD6:
payment_id → user_id, subscription_id, amount
Add: user_id, subscription_id, amount

From FD5:
subscription_id → plan_id, start_date, end_date, subscription_status
Add: plan_id, start_date, end_date, subscription_status

From FD4:
plan_id → plan_name, price, duration_months, plan_description, access_type
Add: plan_name, price, duration_months, plan_description, access_type

From FD1:
user_id → first_name, last_name, email, password, role
Add: first_name, last_name, email, password, role

From FD7:
ticket_id → user_id, admin_id, subject, ticket_description, ticket_status, created_at
Add: admin_id, subject, ticket_description, ticket_status, created_at
(user_id already belongs to K+)

From FD3:
admin_id → user_id
user_id already belongs to K+

From FD8:
category_id → category_name, category_description
Add: category_name, category_description

From FD13:
attempt_id → user_id, quiz_id, score, attempt_date
Add: quiz_id, score, attempt_date
(user_id already belongs to K+)

From FD12:
quiz_id → lesson_id, total_points, passing_score
Add: lesson_id, total_points, passing_score

From FD11:
lesson_id → module_id, lesson_title, material
Add: module_id, lesson_title, material

From FD10:
module_id → course_id, module_title, module_description
Add: course_id, module_title, module_description

From FD9:
course_id → course_name, course_price, course_status, instructor_id
Add: course_name, course_price, course_status, instructor_id

From FD2:
instructor_id → user_id
user_id already belongs to K+

From FD15:
certificate_id → enrollment_id, issue_date, certificate_code, certificate_status
Add: enrollment_id, issue_date, certificate_code, certificate_status

From FD14:
enrollment_id → user_id, course_id, enroll_date, completion_status, progress_percentage
Add: enroll_date, completion_status, progress_percentage
(user_id and course_id already belong to K+)

Therefore, K+ contains all attributes of the universal relation R, so K is a superkey.

K is also minimal, because removing any attribute from K would cause the loss of at least one independent component of the universal relation:

- without payment_id, payment/subscription information is lost
- without ticket_id, support ticket information is lost
- without category_id, the course-category association is lost
- without attempt_id, quiz attempt, course content and course information are lost
- without certificate_id, certificate and enrollment completion information is lost

So, K is a candidate key.

Discussion about other possible candidate keys

To prove that there are no other candidate keys for the initial universal relation R, we analyze the attributes that do not appear on the right-hand side of any functional dependency.

A candidate key must be a minimal superkey, meaning that its closure must contain all attributes of R, and no proper subset of it may also determine all attributes.

From the initial set of functional dependencies, the attributes that are not functionally determined by any other attribute are:

These attributes do not appear on the right-hand side of any functional dependency, which means they cannot be derived from other attributes and must be included in every candidate key.

payment_id, ticket_id, category_id, attempt_id, certificate_id

Therefore, every candidate key must contain all of these attributes.

The attribute course_id is not required in the candidate key, because it can be functionally determined from attempt_id:

attempt_id → quiz_id

quiz_id → lesson_id

lesson_id → module_id

module_id → course_id

This means that adding course_id to K would make the key non-minimal.

Therefore, the only candidate key, according to the given set of functional dependencies, is:

K = {payment_id, ticket_id, category_id, attempt_id, certificate_id}

Since there is no alternative determinant for any of these required attributes in the initial set of functional dependencies, there are no other candidate keys.

Therefore, based on the given functional dependencies and the analysis of attribute closures, K is the only minimal superkey, and thus the only candidate key of the universal relation R.

Chosen primary key

We select the candidate key above as the primary key of the initial de-normalized relation:

K = {payment_id, ticket_id, category_id, attempt_id, certificate_id}

Normal form checks

Check for 1NF

R satisfies 1NF because all attributes are atomic and there are no repeating groups or multi-valued attributes inside a single tuple. Each attribute stores a single value, and the universal relation is identified by the chosen composite candidate key. Therefore, R is in 1NF

Check for 2NF

The universal relation R is not in Second Normal Form (2NF) because it contains a composite primary key and several non-key attributes depend only on a part of that key (partial dependencies), instead of the whole key.

Examples of such partial dependencies:

payment_id → user_id, subscription_id, amount ticket_id → user_id, admin_id, subject, ticket_description, ticket_status, created_at course_id → course_name, course_price, course_status, instructor_id category_id → category_name, category_description attempt_id → user_id, quiz_id, score, attempt_date certificate_id → enrollment_id, issue_date, certificate_code, certificate_status

Therefore, R violates 2NF.

Decomposition to 2NF

We eliminate partial dependencies by decomposing R into smaller relations such that each determinant becomes a primary key in its own relation.

R1: UserEntity:

  • UserEntity(user_id, first_name, last_name, email, password, role)

PK: user_id

  • Derived from:

FD1: user_id → first_name, last_name, email, password, role

  • Lossless Join Check:

R ∩ UserEntity = {user_id}

user_id → UserEntity

⇒ The decomposition is lossless

  • Dependency Preservation:

FD1 is fully contained in UserEntity

⇒ PRESERVED

R2: SubscriptionPlan:

  • SubscriptionPlan(plan_id, plan_name, price, duration_months, plan_description, access_type)

PK: plan_id

  • Derived from:

FD4: plan_id → plan_name, price, duration_months, plan_description, access_type

  • Lossless Join Check:

R ∩ SubscriptionPlan = {plan_id}

plan_id → SubscriptionPlan

⇒ The decomposition is lossless

  • Dependency Preservation:

FD4 is preserved

⇒ PRESERVED

R3: UserSubscription:

  • UserSubscription(subscription_id, user_id, plan_id, start_date, end_date, subscription_status)

PK: subscription_id

  • Derived from:

FD5: subscription_id → user_id, plan_id, start_date, end_date, subscription_status

  • Lossless Join Check:

R ∩ UserSubscription = {subscription_id}

subscription_id → UserSubscription

⇒ The decomposition is lossless

  • Dependency Preservation:

FD5 is preserved

⇒ PRESERVED

R4: Payment:

  • !Payment(payment_id, user_id, subscription_id, amount)

PK: payment_id

  • Derived from:

FD6: payment_id → user_id, subscription_id, amount

  • Lossless Join Check:

R ∩ !Payment = {payment_id}

payment_id → !Payment

⇒ The decomposition is lossless

  • Dependency Preservation:

FD6 is preserved

⇒ PRESERVED

R5: SupportTicket:

  • SupportTicket(ticket_id, user_id, admin_id, subject, ticket_description, ticket_status, created_at)

PK: ticket_id

  • Derived from:

FD7: ticket_id → user_id, admin_id, subject, ticket_description, ticket_status, created_at

  • Lossless Join Check:

R ∩ SupportTicket = {ticket_id}

ticket_id → SupportTicket

⇒ The decomposition is lossless

  • Dependency Preservation:

FD7 is preserved

⇒ PRESERVED

R6: Category:

  • !Category(category_id, category_name, category_description)

PK: category_id

  • Derived from:

FD8: category_id → category_name, category_description

  • Lossless Join Check:

R ∩ !Category = {category_id}

category_id → !Category

⇒ The decomposition is lossless

  • Dependency Preservation:

FD8 is preserved

⇒ PRESERVED

R7: Course:

  • !Course(course_id, course_name, course_price, course_status, instructor_id)

PK: course_id

  • Derived from:

FD9: course_id → course_name, course_price, course_status, instructor_id

  • Lossless Join Check:

R ∩ !Course = {course_id}

course_id → !Course

⇒ The decomposition is lossless

  • Dependency Preservation:

FD9 is preserved

⇒ PRESERVED

R8: Module:

  • !Module(module_id, course_id, module_title, module_description)

PK: module_id

  • Derived from:

FD10: module_id → course_id, module_title, module_description

  • Lossless Join Check:

R ∩ !Module = {module_id}

module_id → !Module

⇒ The decomposition is lossless

  • Dependency Preservation:

FD10 is preserved

R9: Lesson:

  • !Lesson(lesson_id, module_id, lesson_title, material)

PK: lesson_id

  • Derived from:

FD11: lesson_id → module_id, lesson_title, material

  • Lossless Join Check:

R ∩ !Lesson = {lesson_id}

lesson_id → !Lesson

⇒ The decomposition is lossless

  • Dependency Preservation:

FD11 is preserved

R10: Quiz:

  • !Quiz(quiz_id, lesson_id, total_points, passing_score)

PK: quiz_id

  • Derived from:

FD12: quiz_id → lesson_id, total_points, passing_score

  • Lossless Join Check:

R ∩ !Quiz = {quiz_id}

quiz_id → !Quiz

⇒ The decomposition is lossless

  • Dependency Preservation:

FD12 is preserved

R11: QuizAttempt:

  • QuizAttempt(attempt_id, user_id, quiz_id, score, attempt_date)

PK: attempt_id

  • Derived from:

FD13: attempt_id → user_id, quiz_id, score, attempt_date

  • Lossless Join Check:

R ∩ QuizAttempt = {attempt_id}

attempt_id → QuizAttempt

⇒ The decomposition is lossless

  • Dependency Preservation:

FD13 is preserved

R12: Enrollment:

  • !Enrollment(enrollment_id, user_id, course_id, enroll_date, completion_status, progress_percentage)

PK: enrollment_id

  • Derived from:

FD14: enrollment_id → user_id, course_id, enroll_date, completion_status, progress_percentage

  • Lossless Join Check:

R ∩ !Enrollment = {enrollment_id}

enrollment_id → !Enrollment

⇒ The decomposition is lossless

  • Dependency Preservation:

FD14 is preserved

R13: Certificate:

  • !Certificate(certificate_id, enrollment_id, issue_date, certificate_code, certificate_status)

PK: certificate_id

  • Derived from:

FD15: certificate_id → enrollment_id, issue_date, certificate_code, certificate_status

  • Lossless Join Check:

R ∩ !Certificate = {certificate_id}

certificate_id → !Certificate

⇒ The decomposition is lossless

  • Dependency Preservation:

FD15 is preserved

⇒ PRESERVED

R14: CourseCategory:

  • CourseCategory(course_id, category_id)

PK: (course_id, category_id)

This relation represents a many-to-many relationship and contains no additional non-trivial functional dependencies.

Final Result of 2NF Decomposition After removing all partial dependencies, the resulting schema consists of relations where:

  • Every non-key attribute depends on the whole primary key
  • no partial dependencies remain

Therefore, the decomposition satisfies Second Normal Form (2NF).

Check for 3NF (and decomposition)

Now we look for transitive dependencies inside the relations obtained after the 2NF decomposition.

Typical transitive issues from the ER

  • In Payment(payment_id, user_id, subscription_id, amount), we have:

payment_id → subscription_id

subscription_id → user_id, plan_id, start_date, end_date, subscription_status.

Because subscription details are stored in UserSubscription, they should not be repeated in Payment. Payment keeps only payment_id, user_id, subscription_id and amount, so no transitive dependency remains inside Payment.

  • In UserSubscription(subscription_id, user_id, plan_id, start_date, end_date, subscription_status), we have: subscription_id → plan_id plan_id → plan_name, price, duration_months, plan_description, access_type Since plan descriptive attributes are stored in SubscriptionPlan, no transitive dependency remains inside UserSubscription.
  • In Course(course_id, course_name, course_price, course_status, instructor_id), we have: course_id → instructor_id instructor_id → user_id

The user descriptive attributes are stored in UserEntity, not in Course, so no transitive dependency remains inside Course.

  • In QuizAttempt(attempt_id, user_id, quiz_id, score, attempt_date), we have: attempt_id → quiz_id quiz_id → lesson_id, total_points, passing_score Quiz details are stored in Quiz, so QuizAttempt does not contain transitive dependencies.
  • In Certificate(certificate_id, enrollment_id, issue_date, certificate_code, certificate_status), we have: certificate_id → enrollment_id enrollment_id → user_id, course_id, enroll_date, completion_status, progress_percentage

Enrollment details are stored in Enrollment, so no transitive dependency remains inside Certificate.

*

User specializations (User, Administrator and Instructor)

The ER model contains a specialization of UserEntity into User, Administrator and Instructor.

The common user attributes are stored in UserEntity:

UserEntity(user_id, first_name, last_name, email, password, role)

The subtype relations store only the identifier that connects the subtype with the general user entity:

Users(user_id)
Administrators(admin_id, user_id)
Instructors(instructor_id, user_id)

This avoids storing user descriptive attributes multiple times in the subtype relations. Therefore, no transitive dependency remains inside the specialization relations.

Results after 3NF step

After checking the relations obtained from the 2NF decomposition, no additional decomposition is required for 3NF.

The resulting relations after the 3NF step are:

  • UserEntity(user_id, first_name, last_name, email, password, role)
  • Users(user_id)
  • Administrators(admin_id, user_id)
  • Instructors(instructor_id, user_id)
  • SubscriptionPlan(plan_id, plan_name, price, duration_months, plan_description, access_type)
  • UserSubscription(subscription_id, user_id, plan_id, start_date, end_date, subscription_status)
  • Payment(payment_id, user_id, subscription_id, amount)
  • SupportTicket(ticket_id, user_id, admin_id, subject, ticket_description, ticket_status, created_at)
  • Category(category_id, category_name, category_description)
  • Course(course_id, course_name, course_price, course_status, instructor_id)
  • CourseCategory(course_id, category_id)
  • Module(module_id, course_id, module_title, module_description)
  • Lesson(lesson_id, module_id, lesson_title, material)
  • Quiz(quiz_id, lesson_id, total_points, passing_score)
  • QuizAttempt(attempt_id, user_id, quiz_id, score, attempt_date)
  • Enrollment(enrollment_id, user_id, course_id, enroll_date, completion_status, progress_percentage)
  • Certificate(certificate_id, enrollment_id, issue_date, certificate_code, certificate_status)

All resulting relations satisfy 3NF because for every non-trivial functional dependency X → Y inside a relation, X is a key of that relation, or the dependent attributes are stored in a separate relation determined by their own key.

Check for BCNF

Each resulting relation is also in BCNF, because in every non-trivial functional dependency within a relation, the determinant is a superkey of that relation.

  • UserEntity: user_id is key → BCNF
  • Users: user_id is key → BCNF
  • Administrators: admin_id is key → BCNF
  • Instructors: instructor_id is key → BCNF
  • SubscriptionPlan: plan_id is key → BCNF
  • UserSubscription: subscription_id is key → BCNF
  • Payment: payment_id is key → BCNF
  • SupportTicket: ticket_id is key → BCNF
  • Category: category_id is key → BCNF
  • Course: course_id is key → BCNF
  • CourseCategory: (course_id, category_id) is composite key → BCNF
  • Module: module_id is key → BCNF
  • Lesson: lesson_id is key → BCNF
  • Quiz: quiz_id is key → BCNF
  • QuizAttempt: attempt_id is key → BCNF
  • Enrollment: enrollment_id is key → BCNF
  • Certificate: certificate_id is key → BCNF

Therefore, the resulting schema satisfies BCNF.

Check for 4NF (multivalued dependencies)

No additional non-trivial multivalued dependencies are identified in the resulting schema.

The many-to-many relationship between Course and Category is already decomposed into the separate relation:

CourseCategory(course_id, category_id)

This relation contains only the composite key and no additional descriptive attributes, so no further decomposition is required.

All other relations describe a single entity or relationship around one key, and no independent multi-valued facts are stored together in the same relation.

Therefore, the schema satisfies 4NF.

Final relations

Final relations

After applying normalization up to 4NF, the final normalized schema consists of the following relations:

UserEntity(user_id, first_name, last_name, email, password, role)
Users(user_id)
Administrators(admin_id, user_id)
Instructors(instructor_id, user_id)

SubscriptionPlan(plan_id, plan_name, price, duration_months, plan_description, access_type)
UserSubscription(subscription_id, user_id, plan_id, start_date, end_date, subscription_status)
Payment(payment_id, user_id, subscription_id, amount)

SupportTicket(ticket_id, user_id, admin_id, subject, ticket_description, ticket_status, created_at)

Category(category_id, category_name, category_description)
Course(course_id, course_name, course_price, course_status, instructor_id)
CourseCategory(course_id, category_id)

Module(module_id, course_id, module_title, module_description)
Lesson(lesson_id, module_id, lesson_title, material)
Quiz(quiz_id, lesson_id, total_points, passing_score)
QuizAttempt(attempt_id, user_id, quiz_id, score, attempt_date)

Enrollment(enrollment_id, user_id, course_id, enroll_date, completion_status, progress_percentage)
Certificate(certificate_id, enrollment_id, issue_date, certificate_code, certificate_status)

The final schema removes partial and transitive dependencies, avoids unnecessary redundancy, and preserves the functional dependencies from the initial universal relation.

Conclusion

The normalization process confirms the same structural design obtained from the ER model in phase 2.

Note: See TracWiki for help on using the wiki.