wiki:Normalization

Version 5 (modified by 181201, 7 hours ago) ( diff )

--

Normalization

Initial de-normalized relation and functional dependencies

Global set of attributes

Single unified de-normalized relation that includes all attributes from the ER model. It is important that there are no duplicate attribute names.

R = { user_id, username, first_name, last_name, password, email,

admin_id,

owner_id,

sitter_id,

pettype_id, species, average_lifespan, needs_outdoor_walk,

pet_id, pet_name, photo, age, special_needs, pet_description,

service_id, service_type, service_description,

booking_id, booking_status, date_from, date_to, address,

review_id, rating, comment,

payment_id, amount, payment_type

}

Functional dependencies

FD1: user_idusername, first_name, last_name, password, email

FD2: admin_iduser_id

FD3: owner_iduser_id

FD4: sitter_iduser_id

FD5: pettype_idspecies, average_lifespan, needs_outdoor_walk

FD6: pet_idpet_name, photo, age, special_needs, pet_description, owner_id, pettype_id

FD7: service_idservice_type, service_description

FD8: booking_idbooking_status, date_from, date_to, address, owner_id, sitter_id

FD9: review_idrating, comment, booking_id

FD10: payment_idamount, payment_type, booking_id

FD11: (admin_id, user_id) → ∅ (represents AdminManagement M:N)

FD12: (booking_id, pet_id) → ∅ (represents BookingPets M:N)

FD13: (sitter_id, service_id) → ∅ (represents SitterServices M:N)

FD14: (booking_id, service_id) → ∅ (represents BookingServices M:N)

Candidate keys and primary key selection

Determination of Candidate Keys

To find a candidate key, we must find a minimal set of attributes whose closure contains all attributes of the relation. We begin by classifying all attributes based on where they appear in the Functional Dependencies.

Attribute Classification (Left / Right Side)

Attribute Left Side Right Side Classification
admin_id ✓ (FD2, FD11) Left only
pet_id ✓ (FD6, FD12) Left only
service_id ✓ (FD7, FD13, FD14) Left only
review_id ✓ (FD9) Left only
payment_id ✓ (FD10) Left only
user_id ✓ (FD1, FD11) ✓ (FD2, FD3, FD4) Both
owner_id ✓ (FD3) ✓ (FD6, FD8) Both
sitter_id ✓ (FD4, FD13) ✓ (FD8) Both
pettype_id ✓ (FD5) ✓ (FD6) Both
booking_id ✓ (FD8, FD12, FD14) ✓ (FD9, FD10) Both
username, first_name, last_name, password, email ✓ (FD1) Right only
species, average_lifespan, needs_outdoor_walk ✓ (FD5) Right only
pet_name, photo, age, special_needs, pet_description ✓ (FD6) Right only
service_type, service_description ✓ (FD7) Right only
booking_status, date_from, date_to, address ✓ (FD8) Right only
rating, comment ✓ (FD9) Right only
amount, payment_type ✓ (FD10) Right only

Attributes That Appear ONLY on the Left Side

According to relational theory, attributes that appear only on the left side of functional dependencies can never be derived from any other attribute. Therefore, they must be part of every candidate key:

  • admin_id
  • pet_id
  • service_id
  • review_id
  • payment_id

Closure Computation

Step 1: We start with our mandatory attributes {admin_id, pet_id, service_id, review_id, payment_id} and compute the mathematical closure:

{admin_id, pet_id, service_id, review_id, payment_id}⁺ :

  • From FD2 (admin_iduser_id): We obtain user_id
  • From FD1 (user_idusername, first_name...): We obtain username, first_name, last_name, password, email
  • From FD6 (pet_idpet_name, photo... owner_id, pettype_id): We obtain pet_name, photo, age, special_needs, pet_description, owner_id, pettype_id
  • From FD5 (pettype_idspecies...): We obtain species, average_lifespan, needs_outdoor_walk
  • From FD7 (service_idservice_type...): We obtain service_type, service_description
  • From FD9 (review_idrating, comment, booking_id): We obtain rating, comment, booking_id
  • From FD8 (booking_idbooking_status... sitter_id): We obtain booking_status, date_from, date_to, address, sitter_id
  • From FD10 (payment_idamount, payment_type, booking_id): We obtain amount, payment_type (booking_id is already in closure)
  • From FD3 & FD4: owner_id and sitter_id map to user_id, which is already in the closure.
  • From FD11-FD14: All composite pairings (e.g., booking_id and pet_id) are already present in the closure, satisfying the M:N relations.

Closure = Universal_Relation ✓ (all 33 attributes are successfully derived)

Minimality Check

To formally prove this key is minimal, we test proper subsets of K = {admin_id, pet_id, service_id, review_id, payment_id}:

Subset Closure Equals Universal_Relation? Justification
K − {admin_id} ✗ NO Cannot derive admin_id.
K − {pet_id} ✗ NO Cannot derive pet attributes (pet_name, photo, age, special_needs, pet_description).
K − {service_id} ✗ NO Cannot derive service attributes (service_type, service_description).
K − {review_id} ✗ NO Cannot derive review attributes (review_id, rating, comment).
K − {payment_id} ✗ NO Cannot derive payment attributes (payment_id, amount, payment_type).

Conclusion: K = {admin_id, pet_id, service_id, review_id, payment_id} is minimal and is the strictly mathematically correct candidate key.

Choice of Primary Key

Chosen Primary Key: {admin_id, pet_id, service_id, review_id, payment_id}

Justification: This composite key is the officially proven minimal Candidate Key. Attempting to include other logical identifiers (such as owner_id, sitter_id, or booking_id) would violate the rule of minimality, because the closure proof demonstrates those attributes are functionally dependent on the core transactional IDs.

Current Normal Form Status

Before decomposition, the universal relation R is in 1NF. It satisfies 1NF because all attributes contain atomic values and there are no repeating groups. However, it violates 2NF due to severe partial dependencies on the composite primary key.

Step-by-step decomposition to highest possible normal form

1. Decomposition to 2NF

Relation analyzed: The universal de-normalized relation R.

Issues with higher normal forms: R violates 2NF because non-prime attributes are partially dependent on components of the composite primary key. For example, pet_name depends ONLY on pet_id, not on the entire composite key K.

Action taken: We decompose R into smaller relations by grouping attributes based on their exact determinants (eliminating partial dependencies).

Relations obtained after 2NF decomposition:

Users (user_id, username, first_name, last_name, password, email)

Admins (admin_id, user_id)

PetOwners (owner_id, user_id)

PetSitters (sitter_id, user_id)

PetTypes (pettype_id, species, average_lifespan, needs_outdoor_walk)

Pets (pet_id, pet_name, photo, age, special_needs, pet_description, owner_id, pettype_id)

Services (service_id, service_type, service_description)

Bookings (booking_id, booking_status, date_from, date_to, address, owner_id, sitter_id)

Reviews (review_id, rating, comment, booking_id)

Payments (payment_id, amount, payment_type, booking_id)

AdminManagement (admin_id, user_id)

BookingPets (booking_id, pet_id)

SitterServices (sitter_id, service_id)

BookingServices (booking_id, service_id)

Validation of 2NF Decomposition:

1. Loss-less Join Property: By definition (Heath's Theorem), a decomposition of relation R into R1 and R2 is lossless if the common attributes (R1R2) form a superkey for either R1 or R2. Our 14 relations were not created in a single split, but through successive binary decompositions.

  • For example, in the first step, R was decomposed into R_Users and R_Remainder based on FD1. The intersection of these two relations is user_id. Because user_id is the primary key of R_Users, the functional dependency user_id → username, first_name, last_name, password, email is preserved, proving this specific binary split is lossless.
  • In another step, the remainder was split into R_Pets and a new remainder based on FD6. The intersection is pet_id. Since pet_id is the primary key of R_Pets, this split is also lossless.
  • By recursively applying this exact binary splitting logic along the boundaries of FD2 through FD14, every single decomposition step shared a primary key with the remainder relation. Therefore, the final 14-table schema is strictly lossless.

2. Dependency Preservation: A decomposition is dependency preserving if the union of the functional dependencies in the new tables is equivalent to the original set of FDs (FD1 to FD14). Because we explicitly created our relations based on the exact determinants of our FDs, every single functional dependency is fully localized within a single table.

  • For example, FD5 (pettype_id → species, average_lifespan, needs_outdoor_walk) can be completely verified within the new PetTypes table without needing to perform a JOIN operation with any other table.
  • Similarly, FD7 (service_id → service_type, service_description) is entirely contained within the Services table.

Since all 14 original FDs are preserved natively inside the 14 new relations, the decomposition is strictly dependency preserving.

Step 2: Decomposition to 3NF

Relations analyzed: All 14 relations obtained from the 2NF decomposition.

Issues with higher normal forms: A relation violates 3NF if there is a transitive dependency (A → B → C) where a non-prime attribute determines another non-prime attribute.

Action taken: We inspect the 2NF relations to ensure that all non-key attributes depend only on the primary key, and not on any other non-key attributes. We specifically target tables containing Foreign Keys, as this is where transitive chains can be present.

  • In Bookings: While booking_id determines owner_id, we must check if owner_id (a non-prime attribute in this table) determines any other non-prime attributes within this table. It does not. The descriptive user details for the owner are safely isolated in the Users table.
  • In Reviews: While review_id determines booking_id, we must check if booking_id (a non-prime attribute in this table) determines any other non-prime attributes within this table (such as the sitter's details). It does not. Those details are isolated in their respective tables.
  • In Base Entities (Users, PetTypes, Services): All non-prime attributes logically depend directly on the primary key (example species depends only on pettype_id), making transitive chains mathematically impossible.
  • In Junction Tables (BookingPets, AdminManagement): These relations consist purely of composite primary keys with zero non-prime attributes, automatically satisfying 3NF.

Result: Because we did not store redundant, descriptive data alongside our foreign keys, there are no instances where a non-prime attribute determines another non-prime attribute within the same relation.

Status: All 14 relations naturally satisfy 3NF.

3. Decomposition to BCNF

Relations analyzed: All 14 relations currently in 3NF.

Issues with higher normal forms: A relation violates Boyce-Codd Normal Form (BCNF) if a non-trivial functional dependency X → Y exists where X is not a superkey.

Action taken: We evaluate the left side (determinant) of every functional dependency within all 14 relations:

In Users, user_id is a superkey.

In Admins, admin_id is a superkey.

In PetOwners, owner_id is a superkey.

In PetSitters, sitter_id is a superkey.

In PetTypes, pettype_id is a superkey.

In Pets, pet_id is a superkey.

In Services, service_id is a superkey.

In Bookings, booking_id is a superkey.

In Reviews, review_id is a superkey.

In Payments, payment_id is a superkey.

In AdminManagement, the composite key (admin_id, user_id) is the only determinant, meaning it is trivially a superkey.

In BookingPets, the composite key (booking_id, pet_id) is the only determinant, meaning it is trivially a superkey.

In SitterServices, the composite key (sitter_id, service_id) is the only determinant, meaning it is trivially a superkey.

In BookingServices, the composite key (booking_id, service_id) is the only determinant, meaning it is trivially a superkey.

Result: In every single relation, the determinant is a candidate/superkey.

Status: All 14 relations naturally satisfy BCNF. The decomposition process is complete.

Final result and discussion

Final normalized relational design

The final database schema operates in strict BCNF with the following 14 relations:

Users (user_id, username, first_name, last_name, password, email)

Admins (admin_id, user_id)

PetOwners (owner_id, user_id)

PetSitters (sitter_id, user_id)

PetTypes (pettype_id, species, average_lifespan, needs_outdoor_walk)

Pets (pet_id, pet_name, photo, age, special_needs, pet_description, owner_id, pettype_id)

Services (service_id, service_type, service_description)

Bookings (booking_id, booking_status, date_from, date_to, address, owner_id, sitter_id)

Reviews (review_id, rating, comment, booking_id)

Payments (payment_id, amount, payment_type, booking_id)

AdminManagement (admin_id, user_id)

BookingPets (booking_id, pet_id)

SitterServices (sitter_id, service_id)

BookingServices (booking_id, service_id)

Discussion - compare with Phase 2

With completing the normalization process up to BCNF, we have proven that the resulting schema is identical to the logical design created in Phase 2.

Note: See TracWiki for help on using the wiki.