wiki:Normalization

Version 12 (modified by 221296, 11 days ago) ( diff )

--

Normalization

Initial de-normalized relation and functional dependencies

Global set of attributes

This represents the initial de-normalized relation for the Online Learning Platform Management System (OLPMS), before applying any normalization steps.

All attributes from the conceptual (ER) model are combined into a single universal relation R, assuming that all data is stored in one table. Attribute names are made unique to avoid duplication and ambiguity.

R = {
user_id, first_name, last_name, email, password, role,

plan_id, plan_name, price, duration_months, plan_description, access_type,

subscription_id, start_date, end_date, subscription_status,

payment_id, amount,

ticket_id, admin_id, subject, ticket_description, ticket_status, created_at,

category_id, category_name, category_description,

course_id, course_name, course_price, course_status, instructor_id,

module_id, module_title, module_description,

lesson_id, lesson_title, material,

quiz_id, total_points, passing_score,

attempt_id, score, attempt_date,

enrollment_id, enroll_date, completion_status, progress_percentage,

certificate_id, issue_date, certificate_code, certificate_status
}

Functional dependencies (initial set)

The following functional dependencies represent the initial global set of dependencies that hold in the universal de-normalized relation R for the Online Learning Platform Management System (OLPMS).

They are derived from the semantics of the ER model and the business rules of the system. The goal is to capture the essential determinants that will later guide the normalization process.

User / specialization

FD1: user_id → first_name, last_name, email, password, role
FD2: instructor_id → user_id
FD3: admin_id → user_id

Subscription plan / subscription / payment

FD4: plan_id → plan_name, price, duration_months, description, access_type
FD5: subscription_id → user_id, plan_id, start_date, end_date, subscription_status
FD6: payment_id → user_id, subscription_id, amount

Support ticket

FD7: ticket_id → user_id, admin_id, subject, description_ticket, ticket_status, created_at

Category / course

FD8: category_id → category_name, category_description
FD9: course_id → course_name, course_price, course_status, instructor_id

CourseCategory

The relation CourseCategory has composite key (course_id, category_id) and no additional non-trivial functional dependencies.

Course content hierarchy

FD10: module_id → course_id, module_title, module_description
FD11: lesson_id → module_id, lesson_title, material
FD12: quiz_id → lesson_id, total_points, passing_score
FD13: attempt_id → user_id, quiz_id, score, attempt_date

Enrollment / certificate

FD14: enrollment_id → user_id, course_id, enroll_date, completion_status, progress_percentage
FD15: certificate_id → enrollment_id, issue_date, certificate_code, certificate_status

Candidate keys and primary key selection

Determining a candidate key

To determine a candidate key of the universal de-normalized relation, we identify a minimal set of identifiers whose closure determines all attributes of R.

Because the universal relation combines several independent components of the system (subscriptions, payments, support tickets, course-category assignments, quiz attempts, and certificates), the candidate key must include one identifier from each independent component that is not functionally determined by another component.

A candidate key for the initial de-normalized relation is:

K = {payment_id, ticket_id, course_id, category_id, attempt_id, certificate_id}

Its closure includes all attributes of R, so it determines the whole relation.

Closure proof for K

Let K = {payment_id, ticket_id, course_id, category_id, attempt_id, certificate_id}

Compute K+:
Start: K+ = {payment_id, ticket_id, course_id, category_id, attempt_id, certificate_id}

From FD6:
payment_id → user_id, subscription_id, amount
Add: user_id, subscription_id, amount

From FD5:
subscription_id → plan_id, start_date, end_date, subscription_status
Add: plan_id, start_date, end_date, subscription_status

From FD4:
plan_id → plan_name, price, duration_months, access_type
Add: plan_name, price, duration_months, access_type

From FD1:
user_id → first_name, last_name, email, password, role
Add: first_name, last_name, email, password, role

From FD7:
ticket_id → admin_id, subject, description, ticket_status, created_at, user_id
Add: admin_id, subject, description, ticket_status, created_at
(user_id already belongs to K+)

From FD3:
admin_id → user_id
user_id already belongs to K+

From FD8:
category_id → category_name, category_description
Add: category_name, category_description

From FD9:
course_id → course_name, course_price, course_status, instructor_id
Add: course_name, course_price, course_status, instructor_id

From FD2:
instructor_id → user_id
user_id already belongs to K+

From FD13:
attempt_id → user_id, quiz_id, score, attempt_date
Add: quiz_id, score, attempt_date
(user_id already belongs to K+)

From FD12:
quiz_id → lesson_id, total_points, passing_score
Add: lesson_id, total_points, passing_score

From FD11:
lesson_id → module_id, lesson_title, material
Add: module_id, lesson_title, material

From FD10:
module_id → module_title, module_description, course_id
Add: module_title, module_description
(course_id already belongs to K+)

From FD15:
certificate_id → enrollment_id, issue_date, certificate_code, certificate_status
Add: enrollment_id, issue_date, certificate_code, certificate_status

From FD14:
enrollment_id → user_id, course_id, enroll_date, completion_status, progress_percentage
Add: enroll_date, completion_status, progress_percentage
(user_id and course_id already belong to K+)

Therefore, K+ contains all attributes of the universal relation R, so K is a superkey.

K is also minimal, because removing any attribute from K would cause the loss of at least one independent component of the universal relation:

  • without payment_id, payment/subscription information is lost
  • without ticket_id, support ticket information is lost
  • without course_id or category_id, the course-category association is lost
  • without attempt_id, quiz attempt and course content assessment information is lost
  • without certificate_id, certificate and enrollment completion information is lost

So, K is a candidate key..

Chosen primary key

We select the candidate key above as the primary key of the initial de-normalized relation:

K = {payment_id, ticket_id, course_id, category_id, attempt_id, certificate_id}

Normal form checks

Check for 1NF

R satisfies 1NF because all attributes are atomic and there are no repeating groups. Each row is uniquely identified by the chosen composite candidate key.

Check for 2NF

R does not satisfy 2NF because it has a composite primary key and many non-key attributes depend only on part of that key rather than on the whole key.

Examples:

  • listing_id → listing_status, listing_created_at, price, listing_description
  • animal_id → animal_name, species, breed, sex, date_of_birth, photo_url, owner_id
  • clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city
  • review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted
  • notification_id → notification_type, notification_message, notification_created_at, is_read, user_id

These are partial dependencies on components of the composite key, so R violates 2NF.

Decomposition to 2NF

Grouping by determinants (partial dependencies)

  • user_id → username, email, name, surname, full_name, password_hash, user_created_at
  • client_id → user_id, is_blocked, blocked_at, blocked_reason
  • admin_id → user_id
  • owner_id → client_id
  • notification_id → notification_type, notification_message, notification_created_at, is_read, user_id
  • clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city
  • application_id → app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id
  • animal_id → animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url, owner_id
  • listing_id → listing_status, listing_created_at, price, listing_description, animal_id, admin_id
  • appointment_id → date_time, appointment_status, notes, animal_id, clinic_id
  • healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id
  • review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted
  • review_id → target_user_id / target_clinic_id
  • FavoriteListings has key (saved_user_id, saved_listing_id) only

2NF relations (first decomposition)

  • Users(user_id, username, email, name, surname, full_name, password_hash, user_created_at)
  • Clients(client_id, user_id, is_blocked, blocked_at, blocked_reason)
  • Admins(admin_id, user_id)
  • Owners(owner_id, client_id)
  • Notifications(notification_id, user_id, notification_type, notification_message, notification_created_at, is_read)
  • VetClinics(clinic_id, clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city)
  • VetClinicApplications(application_id, clinic_id, app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason)
  • Animals(animal_id, owner_id, animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url)
  • Listings(listing_id, admin_id, animal_id, listing_status, listing_created_at, price, listing_description)
  • Appointments(appointment_id, clinic_id, animal_id, date_time, appointment_status, notes)
  • HealthRecords(healthrecord_id, appointment_id, animal_id, hr_type, hr_description, hr_date)
  • Reviews(review_id, reviewer_id, rating, comment, review_created_at, updated_at, is_deleted)
  • FavoriteListings(saved_user_id, saved_listing_id)

Lossless join:
The decomposition is lossless because each new relation is formed from a determinant that becomes the key of that relation. The common attributes used in the decomposition are keys in at least one of the resulting relations, so no information is lost.

Dependency preservation:
The decomposition is dependency-preserving because each functional dependency FD1–FD14 is represented fully inside one of the resulting relations. FavoriteListings has only its composite key and no additional non-trivial dependencies.

Check for 3NF (and decomposition)

Now we look for transitive dependencies inside the produced relations.

Typical transitive issues from the ER

  • In Clients(client_id, user_id, …), we have client_id → user_id, but user descriptive attributes are stored in Users, so no transitive dependency remains inside Clients.
  • In Listings(listing_id, admin_id, animal_id, …), we have listing_id → admin_id and globally admin_id → user_id, but user attributes are stored in Users, not in Listings, so no transitive dependency remains inside Listings.
  • In VetClinicApplications(application_id, clinic_id, …), we have application_id → clinic_id and clinic_id determines clinic attributes, but those attributes are stored in VetClinics, not in VetClinicApplications, so no transitive dependency remains there either.

Review subtypes (User Review and Clinic Review)

The ER model contains a specialization of Review into UserReview and ClinicReview. If both target_user_id and target_clinic_id are stored in Reviews, the relation mixes subtype-specific semantics and may introduce nulls or invalid combinations. So we decompose:

UserReviews(review_id, target_user_id)
ClinicReviews(review_id, target_clinic_id)

Now:

  • Reviews stores the common review attributes
  • UserReviews stores the target user of a user review
  • ClinicReviews stores the target clinic of a clinic review

Results after 3NF step

After separating subtype-specific attributes into UserReviews and ClinicReviews, all resulting relations satisfy 3NF.

Check for BCNF

Each resulting relation is also in BCNF because, in every non-trivial functional dependency within a relation, the determinant is a superkey of that relation.

  • Users: user_id is key → BCNF
  • Clients: client_id is key → BCNF
  • Admins: admin_id is key → BCNF
  • Owners: owner_id is key → BCNF
  • Notifications: notification_id is key → BCNF
  • VetClinics: clinic_id is key → BCNF
  • VetClinicApplications: application_id is key → BCNF
  • Animals: animal_id is key → BCNF
  • Listings: listing_id is key → BCNF
  • Appointments: appointment_id is key → BCNF
  • HealthRecords: healthrecord_id is key → BCNF
  • Reviews: review_id is key → BCNF
  • UserReviews: review_id is key → BCNF
  • ClinicReviews: review_id is key → BCNF
  • FavoriteListings: (saved_user_id, saved_listing_id) is key → BCNF

So the schema satisfies BCNF.

Check for 4NF (multivalued dependencies)

No additional non-trivial multivalued dependencies are identified in the resulting schema. The many-to-many relationship FavoriteListing is already decomposed into the separate relation FavoriteListings(saved_user_id, saved_listing_id), so no further decomposition is required for 4NF. The schema satisfies 4NF.

Final relations

  • Users(user_id, username, email, name, surname, full_name, password_hash, user_created_at)
  • Clients(client_id, user_id, is_blocked, blocked_at, blocked_reason)
  • Admins(admin_id, user_id)
  • Owners(owner_id, client_id)
  • Notifications(notification_id, user_id, notification_type, notification_message, notification_created_at, is_read)
  • VetClinics(clinic_id, clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city)
  • VetClinicApplications(application_id, clinic_id, app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason)
  • Animals(animal_id, owner_id, animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url)
  • Listings(listing_id, admin_id, animal_id, listing_status, listing_created_at, price, listing_description)
  • FavoriteListings(saved_user_id, saved_listing_id)
  • Appointments(appointment_id, clinic_id, animal_id, date_time, appointment_status, notes)
  • HealthRecords(healthrecord_id, appointment_id, animal_id, hr_type, hr_description, hr_date)
  • Reviews(review_id, reviewer_id, rating, comment, review_created_at, updated_at, is_deleted)
  • UserReviews(review_id, target_user_id)
  • ClinicReviews(review_id, target_clinic_id)

Conclusion

The normalization process confirms the same structural design obtained from the ER model in phase 2.

Note: See TracWiki for help on using the wiki.