wiki:Normalization

Version 4 (modified by 231035, 2 weeks ago) ( diff )

--

Normalization

Initial de-normalized relation and functional dependencies

Global set of attributes (single de-normalized relation)

This represents the initial de-normalized relation before any normalization is applied. Attribute names are made unique to avoid duplication.

R = {
user_id, username, email, name, surname, full_name, password_hash, user_created_at,
client_id, is_blocked, blocked_at, blocked_reason,
admin_id,
owner_id,

notification_id, notification_type, notification_message, notification_created_at, is_read,

animal_id, animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url,

listing_id, listing_status, listing_created_at, price, listing_description,

saved_user_id, saved_listing_id,

clinic_id, clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city,

application_id, app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason,

appointment_id, date_time, appointment_status, notes,

healthrecord_id, hr_type, hr_description, hr_date,

review_id, reviewer_id, rating, comment, review_created_at, updated_at, is_deleted,
target_user_id, target_clinic_id
}

Functional dependencies (initial set)

User / specialization

FD1: user_id → username, email, name, surname, full_name, password_hash, user_created_at
FD2: client_id → user_id, is_blocked, blocked_at, blocked_reason
FD3: admin_id → user_id
FD4: owner_id → client_id 

Notification

FD5: notification_id → notification_type, notification_message, notification_created_at, is_read, user_id

Vet Clinic / application

FD6: clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city
FD7: application_id → app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id

Animal

FD8: animal_id → animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url, owner_id

Listing

FD9: listing_id → listing_status, listing_created_at, price, listing_description, animal_id, admin_id

Appointment

FD10: appointment_id → date_time, appointment_status, notes, animal_id, clinic_id

Health Record

FD11: healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id

Review and subtypes

FD12: review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted
FD13: review_id → target_user_id 
FD14: review_id → target_clinic_id 

Favorite Listing

FD15: (saved_user_id, saved_listing_id) → ∅ 

Candidate keys and primary key selection

Attributes appearing only on the left side

user_id, client_id, admin_id, owner_id,
notification_id, clinic_id, application_id,
animal_id, listing_id, appointment_id, healthrecord_id, review_id,
(saved_user_id, saved_listing_id)


Because these identifiers can’t be derived from other attributes, they must be included in any superkey of the fully de-normalized R.
A valid superkey for R is:

K = {notification_id, application_id, clinic_id, animal_id, listing_id,
 appointment_id, healthrecord_id, review_id, saved_user_id, saved_listing_id}

Its closure includes all descriptive attributes via FD1–FD15, so it determines all attributes of R.

Closure proof for K

Let K = {notification_id, application_id, clinic_id, animal_id, listing_id,
         appointment_id, healthrecord_id, review_id, saved_user_id, saved_listing_id}

Compute K+:
Start: K+ = K

From FD5:
notification_id → notification_type, notification_message, notification_created_at, is_read, user_id
Add: notification_type, notification_message, notification_created_at, is_read, user_id

From FD1:
user_id → username, email, name, surname, full_name, password_hash, user_created_at
Add: username, email, name, surname, full_name, password_hash, user_created_at

From FD9:
listing_id → listing_status, listing_created_at, price, listing_description, animal_id, admin_id
Add: listing_status, listing_created_at, price, listing_description, admin_id
(animal_id already in K)

From FD3:
admin_id → user_id (already in K+)

From FD8:
animal_id → animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url, owner_id
Add: animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url, owner_id

From FD4:
owner_id → client_id
Add: client_id

From FD2:
client_id → user_id, is_blocked, blocked_at, blocked_reason
Add: is_blocked, blocked_at, blocked_reason (user_id already in K+)

From FD6:
clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city
Add: clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city

From FD7:
application_id → app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status,
                 reviewed_at, reviewed_by, denial_reason, clinic_id
Add: app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status,
     reviewed_at, reviewed_by, denial_reason (clinic_id already in K+)

From FD10:
appointment_id → date_time, appointment_status, notes, animal_id, clinic_id
Add: date_time, appointment_status, notes (animal_id, clinic_id already in K+)

From FD11:
healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id
Add: hr_type, hr_description, hr_date

From FD12:
review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted
Add: reviewer_id, rating, comment, review_created_at, updated_at, is_deleted

From FD13 and FD14:
review_id → target_user_id, target_clinic_id
Add: target_user_id, target_clinic_id

FD15 adds no attributes.

Therefore K+ = R, so K is a superkey.
[[BR]]
K is also a minimal and a candidate key.

Chosen primary key

We select the minimal superkey built from "left-only identifiers":
Primary key (of the initial de-normalized R):

{notification_id, application_id, clinic_id, animal_id, listing_id, 
appointment_id, healthrecord_id, review_id, saved_user_id, saved_listing_id}

Normal form checks

Check for 1NF

R satisfies 1NF (all attributes are atomic and each row is uniquely identified by the chosen primary key).

Check for 2NF

R does not satisfy 2NF, because many non-key attributes depend only on a part of the composite primary key:

  • listing_id → price, listing_description, listing_status, …
  • animal_id → species, breed, …
  • clinic_id → clinic_name, …
  • user_id → username, email, …
  • review_id → rating, comment, …
  • notification_id → message, type, …

So we decompose by grouping attributes by the key they depend on.

Decomposition to 2NF

Grouping by determinants (partial dependencies)

  • user_id → username, email, name, surname, full_name, password_hash, user_created_at
  • client_id → user_id, is_blocked, blocked_at, blocked_reason
  • admin_id → user_id
  • owner_id → client_id
  • notification_id → notification_type, notification_message, notification_created_at, is_read, user_id
  • clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city
  • application_id → app_*, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id
  • animal_id → animal_*, owner_id
  • listing_id → listing_*, animal_id, admin_id
  • appointment_id → date_time, appointment_status, notes, animal_id, clinic_id
  • healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id
  • review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted
  • review_id → target_user_id / target_clinic_id (subtype dependent)
  • (saved_user_id, saved_listing_id) → /

2NF relations(first decomposition)

  • Users(user_id, username, email, name, surname, full_name, password_hash, user_created_at)
  • Clients(client_id, user_id, is_blocked, blocked_at, blocked_reason)
  • Admins(admin_id, user_id)
  • Owners(owner_id, client_id)
  • Notifications(notification_id, user_id, notification_type, notification_message, notification_created_at, is_read)
  • VetClinics(clinic_id, clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city)
  • VetClinicApplications(application_id, clinic_id, app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason)
  • Animals(animal_id, owner_id, animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url)
  • Listings(listing_id, admin_id, animal_id, listing_status, listing_created_at, price, listing_description)
  • Appointments(appointment_id, clinic_id, animal_id, date_time, appointment_status, notes)
  • HealthRecords(healthrecord_id, appointment_id, animal_id, hr_type, hr_description, hr_date)
  • Reviews(review_id, reviewer_id, rating, comment, review_created_at, updated_at, is_deleted)
  • FavoriteListings(saved_user_id, saved_listing_id)

Lossless join:
For each decomposition R → R1, R2, the intersection (R1 ∩ R2) is a key of at least one of the relations(user_id is key of Users etc.).So each decomposition step is lossless. Dependency preservation: FD1–FD15 are preserved because each FD now lives entirely inside one of the new relations.

Check for 3NF(and decomposition)

Now we look for transitive dependencies inside the produced relations.

Typical transitive issues from the ER

  • In Clients(client_id, user_id, …) we have client_id → user_id and user_id → (email, username, …) but those user attributes are no longer in Clients, so this is already clean.
  • In Listings(listing_id, admin_id, animal_id, …) we have admin_id → user_id (FD3) but user data is in Users, not in Listings, so no transitive dependency remains in Listings.
  • In VetClinicApplications(application_id, clinic_id, …) we have clinic_id → clinic attributes, but clinic attributes are not stored there.

Review subtypes(User Review and Clinic Review)

The ER has specialization:

Review is generalized into UserReview and ClinicReview. If we keep target_user_id and target_clinic_id inside Reviews, we introduce subtype semantics and potential null-heavy design.


So we decompose:
UserReviews(review_id, target_user_id) ClinicReviews(review_id, target_clinic_id) Now:

  • review_id determines the base review fields in Reviews (FD12)
  • review_id determines subtype target in the subtype table (FD13/FD14)

Results after 3NF step

All relations are now in 3NF.

Check for BCNF

For each relation, every non-trivial FD has a determinant that is a superkey:

  • Users: user_id is key → BCNF
  • Clients: client_id is key → BCNF
  • Admins: admin_id is key → BCNF
  • Owners: owner_id is key → BCNF
  • Notifications: notification_id is key → BCNF
  • VetClinics: clinic_id is key → BCNF
  • VetClinicApplications: application_id is key → BCNF
  • Animals: animal_id is key → BCNF
  • Listings: listing_id is key → BCNF
  • Appointments: appointment_id is key → BCNF
  • HealthRecords: healthrecord_id is key → BCNF
  • Reviews: review_id is key → BCNF
  • UserReviews / ClinicReviews: review_id is key → BCNF
  • FavoriteListings: (saved_user_id, saved_listing_id) is key → BCNF

So the schema satisfies BCNF.

Check for 4NF(multivalued dependencies)

No additional multivalued dependencies are identified beyond the M:N relationship FavoriteListing,which is already represented as its own relation FavoriteListings(saved_user_id, saved_listing_id). So no further 4NF decomposition is required. The schema satisfies 4NF.

Final relations

  • Users(user_id, username, email, name, surname, full_name, password_hash, user_created_at)
  • Clients(client_id, user_id, is_blocked, blocked_at, blocked_reason)
  • Admins(admin_id, user_id)
  • Owners(owner_id, client_id)
  • Notifications(notification_id, user_id, notification_type, notification_message, notification_created_at, is_read)
  • VetClinics(clinic_id, clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city)
  • VetClinicApplications(application_id, clinic_id, app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason)
  • Animals(animal_id, owner_id, animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url)
  • Listings(listing_id, admin_id, animal_id, listing_status, listing_created_at, price, listing_description)
  • Appointments(appointment_id, clinic_id, animal_id, date_time, appointment_status, notes)
  • HealthRecords(healthrecord_id, appointment_id, animal_id, hr_type, hr_description, hr_date)
  • Reviews(review_id, reviewer_id, rating, comment, review_created_at, updated_at, is_deleted)
  • UserReviews(review_id, target_user_id)
  • ClinicReviews(review_id, target_clinic_id)

Business constraint: A ClinicReview is allowed only if the reviewer has at least one Appointment at that clinic with status = 'DONE'.

Conclusion

The final normalized relational design matches the Phase 2 ER-based design:

  • Each entity becomes a relation.
  • Specializations are represented as subtype relations.
  • The M:N relationship FavoriteListing is represented as a junction relation FavoriteListings.

Therefore, there are no structural differences from Phase 2.

Note: See TracWiki for help on using the wiki.