wiki:Normalization

Version 7 (modified by 181201, 2 weeks ago) ( diff )

--

Normalization

Initial de-normalized relation and functional dependencies

Global set of attributes

Single unified de-normalized relation that includes all attributes from the ER model. It is important that there are no duplicate attribute names.

R = { user_id, username, first_name, last_name, password, email,

admin_id,

owner_id,

sitter_id,

pettype_id, species, average_lifespan, needs_outdoor_walk,

pet_id, pet_name, photo, age, special_needs, pet_description,

service_id, service_type, service_description,

booking_id, booking_status, date_from, date_to, address,

review_id, rating, comment,

payment_id, amount, payment_type

}

Functional dependencies

FD1: user_idusername, first_name, last_name, password, email

FD2: admin_iduser_id

FD3: owner_iduser_id

FD4: sitter_iduser_id

FD5: pettype_idspecies, average_lifespan, needs_outdoor_walk

FD6: pet_idpet_name, photo, age, special_needs, pet_description, owner_id, pettype_id

FD7: service_idservice_type, service_description

FD8: booking_idbooking_status, date_from, date_to, address, owner_id, sitter_id

FD9: review_idrating, comment, booking_id

FD10: payment_idamount, payment_type, booking_id

FD11: (admin_id, user_id) → ∅ (represents AdminManagement M:N)

FD12: (booking_id, pet_id) → ∅ (represents BookingPets M:N)

FD13: (sitter_id, service_id) → ∅ (represents SitterServices M:N)

FD14: (booking_id, service_id) → ∅ (represents BookingServices M:N)

Candidate keys and primary key selection

Determination of Candidate Keys

To find a candidate key, we must find a minimal set of attributes whose closure contains all attributes of the relation. We begin by classifying all attributes based on where they appear in the Functional Dependencies.

Attribute Classification (Left / Right Side)

Attribute Left Side Right Side Classification
admin_id ✓ (FD2, FD11) Left only
pet_id ✓ (FD6, FD12) Left only
service_id ✓ (FD7, FD13, FD14) Left only
review_id ✓ (FD9) Left only
payment_id ✓ (FD10) Left only
user_id ✓ (FD1, FD11) ✓ (FD2, FD3, FD4) Both
owner_id ✓ (FD3) ✓ (FD6, FD8) Both
sitter_id ✓ (FD4, FD13) ✓ (FD8) Both
pettype_id ✓ (FD5) ✓ (FD6) Both
booking_id ✓ (FD8, FD12, FD14) ✓ (FD9, FD10) Both
username, first_name, last_name, password, email ✓ (FD1) Right only
species, average_lifespan, needs_outdoor_walk ✓ (FD5) Right only
pet_name, photo, age, special_needs, pet_description ✓ (FD6) Right only
service_type, service_description ✓ (FD7) Right only
booking_status, date_from, date_to, address ✓ (FD8) Right only
rating, comment ✓ (FD9) Right only
amount, payment_type ✓ (FD10) Right only

Attributes That Appear ONLY on the Left Side

According to relational theory, attributes that appear only on the left side of functional dependencies can never be derived from any other attribute. Therefore, they must be part of every candidate key:

  • admin_id
  • pet_id
  • service_id
  • review_id
  • payment_id

Closure Computation

Step 1: We start with our mandatory attributes {admin_id, pet_id, service_id, review_id, payment_id} and compute the mathematical closure:

{admin_id, pet_id, service_id, review_id, payment_id}⁺ :

  • From FD2 (admin_iduser_id): We obtain user_id
  • From FD1 (user_idusername, first_name...): We obtain username, first_name, last_name, password, email
  • From FD6 (pet_idpet_name, photo... owner_id, pettype_id): We obtain pet_name, photo, age, special_needs, pet_description, owner_id, pettype_id
  • From FD5 (pettype_idspecies...): We obtain species, average_lifespan, needs_outdoor_walk
  • From FD7 (service_idservice_type...): We obtain service_type, service_description
  • From FD9 (review_idrating, comment, booking_id): We obtain rating, comment, booking_id
  • From FD8 (booking_idbooking_status... sitter_id): We obtain booking_status, date_from, date_to, address, sitter_id
  • From FD10 (payment_idamount, payment_type, booking_id): We obtain amount, payment_type (booking_id is already in closure)
  • From FD3 & FD4: owner_id and sitter_id map to user_id, which is already in the closure.
  • From FD11-FD14: All composite pairings (e.g., booking_id and pet_id) are already present in the closure, satisfying the M:N relations.

Closure = Universal_Relation ✓ (all 33 attributes are successfully derived)

Minimality Check

To formally prove this key is minimal, we test proper subsets of K = {admin_id, pet_id, service_id, review_id, payment_id}:

Subset Closure Equals Universal_Relation? Justification
K − {admin_id} ✗ NO Cannot derive admin_id.
K − {pet_id} ✗ NO Cannot derive pet attributes (pet_name, photo, age, special_needs, pet_description).
K − {service_id} ✗ NO Cannot derive service attributes (service_type, service_description).
K − {review_id} ✗ NO Cannot derive review attributes (review_id, rating, comment).
K − {payment_id} ✗ NO Cannot derive payment attributes (payment_id, amount, payment_type).

Conclusion: K = {admin_id, pet_id, service_id, review_id, payment_id} is minimal and is the strictly mathematically correct candidate key.

Choice of Primary Key

Chosen Primary Key: {admin_id, pet_id, service_id, review_id, payment_id}

Justification: This composite key is the officially proven minimal Candidate Key. Attempting to include other logical identifiers (such as owner_id, sitter_id, or booking_id) would violate the rule of minimality, because the closure proof demonstrates those attributes are functionally dependent on the core transactional IDs.

Current Normal Form Status

Before decomposition, the universal relation R is in 1NF. It satisfies 1NF because all attributes contain atomic values and there are no repeating groups. However, it violates 2NF due to severe partial dependencies on the composite primary key.

Step-by-step decomposition to highest possible normal form

1. Decomposition to 2NF

Relation analyzed: The universal de-normalized relation R.

Issues with higher normal forms: R violates 2NF because non-prime attributes are partially dependent on components of the composite primary key. For example, pet_name depends ONLY on pet_id, not on the entire composite key K.

Action taken: We decompose R into smaller relations by grouping attributes based on their exact determinants (eliminating partial dependencies).

Relations obtained after 2NF decomposition:

Users (user_id, username, first_name, last_name, password, email)

Admins (admin_id, user_id)

PetOwners (owner_id, user_id)

PetSitters (sitter_id, user_id)

PetTypes (pettype_id, species, average_lifespan, needs_outdoor_walk)

Pets (pet_id, pet_name, photo, age, special_needs, pet_description, owner_id, pettype_id)

Services (service_id, service_type, service_description)

Bookings (booking_id, booking_status, date_from, date_to, address, owner_id, sitter_id)

Reviews (review_id, rating, comment, booking_id)

Payments (payment_id, amount, payment_type, booking_id)

AdminManagement (admin_id, user_id)

BookingPets (booking_id, pet_id)

SitterServices (sitter_id, service_id)

BookingServices (booking_id, service_id)

Validation of 2NF Decomposition:

1. Loss-less Join Property Validation: By definition (Heath's Theorem), a decomposition of relation R into R1 and R2 is lossless if the intersection of their attributes (R1R2) forms a superkey for either R1 or R2. To achieve our final schema without data loss, the universal relation R was decomposed through 13 successive binary splits. Below is the formal proof of each split, detailing the exact attributes partitioned.

(Let R_remainder denote the remainder relation after each split)

Initial State: R = {user_id, username, first_name, last_name, password, email, admin_id, owner_id, sitter_id, pettype_id, species, average_lifespan, needs_outdoor_walk, pet_id, pet_name, photo, age, special_needs, pet_description, service_id, service_type, service_description, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}

Split 1: Extracting Users (FD1)

  • Decomposition: R is split into Users and R_remainder1.
  • Attributes in Users: {user_id, username, first_name, last_name, password, email}
  • Attributes in R_remainder1: {user_id, admin_id, owner_id, sitter_id, pettype_id, species, average_lifespan, needs_outdoor_walk, pet_id, pet_name, photo, age, special_needs, pet_description, service_id, service_type, service_description, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: UsersR_remainder1 = {user_id}
  • Proof: user_id is the Primary Key of the Users table. The split is lossless.

Split 2: Extracting Admins (FD2)

  • Decomposition: R_remainder1 is split into Admins and R_remainder2.
  • Attributes in Admins: {admin_id, user_id}
  • Attributes in R_remainder2: {admin_id, owner_id, sitter_id, pettype_id, species, average_lifespan, needs_outdoor_walk, pet_id, pet_name, photo, age, special_needs, pet_description, service_id, service_type, service_description, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: AdminsR_remainder2 = {admin_id}
  • Proof: admin_id is the Primary Key of the Admins table. The split is lossless.

Split 3: Extracting Pet Owners (FD3)

  • Decomposition: R_remainder2 is split into PetOwners and R_remainder3.
  • Attributes in PetOwners: {owner_id, user_id}
  • Attributes in R_remainder3: {admin_id, owner_id, sitter_id, pettype_id, species, average_lifespan, needs_outdoor_walk, pet_id, pet_name, photo, age, special_needs, pet_description, service_id, service_type, service_description, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: PetOwnersR_remainder3 = {owner_id}
  • Proof: owner_id is the Primary Key of the PetOwners table. The split is lossless.

Split 4: Extracting Pet Sitters (FD4)

  • Decomposition: R_remainder3 is split into PetSitters and R_remainder4.
  • Attributes in PetSitters: {sitter_id, user_id}
  • Attributes in R_remainder4: {admin_id, owner_id, sitter_id, pettype_id, species, average_lifespan, needs_outdoor_walk, pet_id, pet_name, photo, age, special_needs, pet_description, service_id, service_type, service_description, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: PetSittersR_remainder4 = {sitter_id}
  • Proof: sitter_id is the Primary Key of the PetSitters table. The split is lossless.

Split 5: Extracting Pet Types (FD5)

  • Decomposition: R_remainder4 is split into PetTypes and R_remainder5.
  • Attributes in PetTypes: {pettype_id, species, average_lifespan, needs_outdoor_walk}
  • Attributes in R_remainder5: {admin_id, owner_id, sitter_id, pettype_id, pet_id, pet_name, photo, age, special_needs, pet_description, service_id, service_type, service_description, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: PetTypesR_remainder5 = {pettype_id}
  • Proof: pettype_id is the Primary Key of the PetTypes table. The split is lossless.

Split 6: Extracting Pets (FD6)

  • Decomposition: R_remainder5 is split into Pets and R_remainder6.
  • Attributes in Pets: {pet_id, pet_name, photo, age, special_needs, pet_description, owner_id, pettype_id}
  • Attributes in R_remainder6: {admin_id, owner_id, sitter_id, pet_id, service_id, service_type, service_description, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: PetsR_remainder6 = {pet_id}
  • Proof: pet_id is the Primary Key of the Pets table. The split is lossless.

Split 7: Extracting Services (FD7)

  • Decomposition: R_remainder6 is split into Services and R_remainder7.
  • Attributes in Services: {service_id, service_type, service_description}
  • Attributes in R_remainder7: {admin_id, owner_id, sitter_id, pet_id, service_id, booking_id, booking_status, date_from, date_to, address, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: ServicesR_remainder7 = {service_id}
  • Proof: service_id is the Primary Key of the Services table. The split is lossless.

Split 8: Extracting Bookings (FD8)

  • Decomposition: R_remainder7 is split into Bookings and R_remainder8.
  • Attributes in Bookings: {booking_id, booking_status, date_from, date_to, address, owner_id, sitter_id}
  • Attributes in R_remainder8: {admin_id, sitter_id, pet_id, service_id, booking_id, review_id, rating, comment, payment_id, amount, payment_type}
  • Intersection: BookingsR_remainder8 = {booking_id}
  • Proof: booking_id is the Primary Key of the Bookings table. The split is lossless.

Split 9: Extracting Reviews (FD9)

  • Decomposition: R_remainder8 is split into Reviews and R_remainder9.
  • Attributes in Reviews: {review_id, rating, comment, booking_id}
  • Attributes in R_remainder9: {admin_id, sitter_id, pet_id, service_id, booking_id, payment_id, amount, payment_type}
  • Intersection: ReviewsR_remainder9 = {review_id}
  • Proof: review_id is the Primary Key of the Reviews table. The split is lossless.

Split 10: Extracting Payments (FD10)

  • Decomposition: R_remainder9 is split into Payments and R_remainder10.
  • Attributes in Payments: {payment_id, amount, payment_type, booking_id}
  • Attributes in R_remainder10: {admin_id, sitter_id, pet_id, service_id, booking_id}
  • Intersection: PaymentsR_remainder10 = {payment_id}
  • Proof: payment_id is the Primary Key of the Payments table. The split is lossless.

Splits 11-13: Extracting M:N Relations/Junctions (FD11-FD14) The final remainder relation (R_remainder10) contains only the composite keys representing the Many-to-Many relationships:

  • AdminManagement: {admin_id, user_id}
  • BookingPets: {booking_id, pet_id}
  • SitterServices: {sitter_id, service_id}
  • BookingServices: {booking_id, service_id}

Because these final tables consist entirely of their composite primary keys, any further binary separation along these boundaries trivially satisfies Heath's Theorem.

Conclusion: Because every single step of the decomposition shared an intersection that was a guaranteed Primary Key, the final 14-table schema is strictly lossless.

2. Dependency Preservation: A decomposition is dependency preserving if the union of the functional dependencies in the new tables is equivalent to the original set of FDs (FD1 to FD14). Because we explicitly created our relations based on the exact determinants of our FDs, every single functional dependency is fully localized within a single table.

  • For example, FD5 (pettype_id → species, average_lifespan, needs_outdoor_walk) can be completely verified within the new PetTypes table without needing to perform a JOIN operation with any other table.
  • Similarly, FD7 (service_id → service_type, service_description) is entirely contained within the Services table.

Since all 14 original FDs are preserved natively inside the 14 new relations, the decomposition is strictly dependency preserving.

Step 2: Decomposition to 3NF

Relations analyzed: All 14 relations obtained from the 2NF decomposition.

Issues with higher normal forms: A relation violates 3NF if there is a transitive dependency (A → B → C) where a non-prime attribute determines another non-prime attribute.

Action taken: We inspect the 2NF relations to ensure that all non-key attributes depend only on the primary key, and not on any other non-key attributes. We specifically target tables containing Foreign Keys, as this is where transitive chains can be present.

  • In Bookings: While booking_id determines owner_id, we must check if owner_id (a non-prime attribute in this table) determines any other non-prime attributes within this table. It does not. The descriptive user details for the owner are safely isolated in the Users table.
  • In Reviews: While review_id determines booking_id, we must check if booking_id (a non-prime attribute in this table) determines any other non-prime attributes within this table (such as the sitter's details). It does not. Those details are isolated in their respective tables.
  • In Base Entities (Users, PetTypes, Services): All non-prime attributes logically depend directly on the primary key (example species depends only on pettype_id), making transitive chains mathematically impossible.
  • In Junction Tables (BookingPets, AdminManagement): These relations consist purely of composite primary keys with zero non-prime attributes, automatically satisfying 3NF.

Result: Because we did not store redundant, descriptive data alongside our foreign keys, there are no instances where a non-prime attribute determines another non-prime attribute within the same relation.

Status: All 14 relations naturally satisfy 3NF.

3. Decomposition to BCNF

Relations analyzed: All 14 relations currently in 3NF.

Issues with higher normal forms: A relation violates Boyce-Codd Normal Form (BCNF) if a non-trivial functional dependency X → Y exists where X is not a superkey.

Action taken: We evaluate the left side (determinant) of every functional dependency within all 14 relations:

In Users, user_id is a superkey.

In Admins, admin_id is a superkey.

In PetOwners, owner_id is a superkey.

In PetSitters, sitter_id is a superkey.

In PetTypes, pettype_id is a superkey.

In Pets, pet_id is a superkey.

In Services, service_id is a superkey.

In Bookings, booking_id is a superkey.

In Reviews, review_id is a superkey.

In Payments, payment_id is a superkey.

In AdminManagement, the composite key (admin_id, user_id) is the only determinant, meaning it is trivially a superkey.

In BookingPets, the composite key (booking_id, pet_id) is the only determinant, meaning it is trivially a superkey.

In SitterServices, the composite key (sitter_id, service_id) is the only determinant, meaning it is trivially a superkey.

In BookingServices, the composite key (booking_id, service_id) is the only determinant, meaning it is trivially a superkey.

Result: In every single relation, the determinant is a candidate/superkey.

Status: All 14 relations naturally satisfy BCNF. The decomposition process is complete.

Final result and discussion

Final normalized relational design

The final database schema operates in strict BCNF with the following 14 relations:

Users (user_id, username, first_name, last_name, password, email)

Admins (admin_id, user_id)

PetOwners (owner_id, user_id)

PetSitters (sitter_id, user_id)

PetTypes (pettype_id, species, average_lifespan, needs_outdoor_walk)

Pets (pet_id, pet_name, photo, age, special_needs, pet_description, owner_id, pettype_id)

Services (service_id, service_type, service_description)

Bookings (booking_id, booking_status, date_from, date_to, address, owner_id, sitter_id)

Reviews (review_id, rating, comment, booking_id)

Payments (payment_id, amount, payment_type, booking_id)

AdminManagement (admin_id, user_id)

BookingPets (booking_id, pet_id)

SitterServices (sitter_id, service_id)

BookingServices (booking_id, service_id)

Discussion - compare with Phase 2

With completing the normalization process up to BCNF, we have proven that the resulting schema is identical to the logical design created in Phase 2.

Note: See TracWiki for help on using the wiki.