wiki:Normalization

Version 3 (modified by 221514, 8 days ago) ( diff )

--

Normalization & design improvements

1. Functional Dependencies

Given the ER Diagram:

![[Pasted image 20250705104406.png]]

And the relational database schema:

![[Pasted image 20250705104424.png]]

The database can be made so it is represented in exactly 1 relation with all the attributes in it. Let R be the relation just described: R(`account_id`, accout_email, person_name, person_surname, password, years_experience, - company_name, company_embg -, - years_experience -, - `vehiche_id`, plate_num, model, brand, capacity, year_manuf -, - `route_id`, days_active -, - `trip_id`, date, status, free_seats -, - time_stop -, - `location_id` , loc_name, latitude, longitude - , - `ticket_id`, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg -, -`payment_id`, total_price, n_tickets -, - `review_id`, review_description, rating - `)

1.1. Initial functional dependencies:

  1. - account_id → accout_email, person_name, person_surname, password
  2. - account_id → years_experience _(if driver)_
  3. - account_id → company_name, company_embg _(if organizer)_
  4. - vehiche_id → plate_num, model, brand, capacity, year_manuf
  5. - plate_num → vehiche_id, model, brand, capacity, year_manuf
  6. - route_id → days_active
  7. - trip_id → date, status, free_seats, route_id
  8. - location_id → loc_name, latitude, longitude
  9. - (latitude, longitude) → location_id, loc_name
  1. - ticket_id → two_way, payment_id
  2. - ticket_id → student_disc _(if student ticket)_
  3. - ticket_id → child_disc, child_embg, parent_embg _(if child ticket)_
  4. - payment_id → total_price, n_tickets, time_purchased, date_purchased
  5. - review_id → review_description, rating
  6. `(trip_id, location_id) → time_stop
  7. - (Possibly) ticket_id → payment_id
  8. company_embg → company_name
  9. ticket_id → child_embg
  10. child_embg → parent_embg

1.2. Derived Functional Dependencies absent from the initial list

Transitive

trip_id → route_id _(from FD 7)_ route_id → days_active _(FD 6)_

Derivation:

  1. trip_id → days_active

ticket_id → payment_id _(FD 15)_ payment_id → total_price, n_tickets, date_purchased, time purchased _(FD 13)_

Derivation:

  1. ticket_id → total_price, n_tickets, date_purchased, time_purchased

1.3. Functional Dependency Analysis using LHS/RHS Classification

This is a method for analyzing functional dependencies by grouping attributes into three categories:

Only LHS — attributes that only appear on the left-hand side (as determinants) Only RHS — attributes that only appear on the right-hand side (as dependents) Both LHS and RHS — attributes that appear on both sides

This technique helps in understanding the roles attributes play in determining others, and it's useful for normalization and candidate key discovery.

Attribute Classification

Only LHS (Left-hand Side only)

Attributes that appear only as determinants:

account_id, trip_id, ticket_id, payment_id, review_id

These are potential candidates for keys, since they determine other attributes and are not themselves determined by anything else.

Only RHS (Right-hand Side only)

Attributes that only appear as dependents:

acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, model, brand, capacity, year_manuf, days_active, date, status, free_seats, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, total_price, n_tickets, review_description, rating

These are non-key attributes, which must be fully and non-transitively dependent on candidate keys in higher normal forms (2NF, 3NF).

Both LHS and RHS

Attributes that appear on both sides of FDs:

plate_num, vehicle_id, route_id, location_id, latitude, longitude

These attributes may introduce some transitive dependencies and problems.

Transitive Closure on LHS

Let X+ = {ticket_id, trip_id, account_id, review_id}+

Computing the attribute closure on the set X would give:

X+ = {
	account_id, trip_id, ticket_id, payment_id, review_id,
	acc_email, person_name, person_surname, password,
	years_experience, company_name, company_embg,
	date, status, free_seats, route_id, days_active,
	time_purchased, date_purchased, two_way,
	student_disc, child_disc, child_embg, parent_embg,
	total_price, n_tickets,
	review_description, rating
}

This relation does does not resemble the set of attributes X as a candidate key, but adding location_id and vehicle id to form the set Y = {ticket_id, trip_id, account_id, review_id, location_id, vehicle_id} would result in the following attribute closure:

Y+ = {
	ticket_id, trip_id, account_id, review_id, location_id, vehicle_id,
	time_purchased, date_purchased, two_way,
	student_disc, child_disc, child_embg, parent_embg,
	payment_id, total_price, n_tickets,
	date, status, free_seats, route_id, days_active,
	acc_email, person_name, person_surname, password,
	years_experience, company_name, company_embg,
	review_description, rating,
	plate_num, model, brand, capacity, year_manuf,
	loc_name, latitude, longitude, time_stop
}

Adding only either of location_id or vehicle_id would not give a candidate key, because we would be missing location information without location_id and vehicle information without vehicle_id. Which indeed gives all the attributes. We may treat the set of attributes Y as a a candidate key and declare it as a primary key here.

2. Normalizing

2.1. Current Normal Form

Given the functional dependencies, the relation R is already in 1NF according to it's definition, because:

  1. There are no multi-valued attributes.
  2. The database schema was implemented using SQL DDL, which means that the relational schema is intrinsically in 1NF.

We conclude that R satisfies 1NF. But according to the definition of 2NF, a relation is in 2NF iff:

  1. It is already in First Normal Form (1NF) defined above
  2. There are no partial dependencies of any non-prime attribute (determined attributes) on a strict subset of any candidate key.

But the relation is not in 2NF due to the fact that we have clear partial dependencies and t. A counterexample is route_id → days_active - but there are many more.

2.2. Decomposition of R into in order to achieve BCNF

We may first start splitting R by grouping the determinants on LHS with RHS, to get the following relations:

R1: Account

From: account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg

R1(account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg)

R2: Vehicle

From: vehicle_id → plate_num, model, brand, capacity, year_manuf

R2(vehicle_id, plate_num, model, brand, capacity, year_manuf)

R3: Trip

From: trip_id → date, status, free_seats, route_id

R3(trip_id, date, status, free_seats, route_id)

R4: Location

From: location_id → loc_name, latitude, longitude

R4(location_id, loc_name, latitude, longitude)

R5: Ticket

From: ticket_id → two_way, student_disc, child_disc, child_embg, parent_embg

R5(ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg)

R6: Payment

From: payment_id → total_price, n_tickets, time_purchased, date_purchased

R6(payment_id, total_price, n_tickets, time_purchased, date_purchased)

Note that FD 20 and FD 21 are lost due to the split of payment information in the split of R into R6.

R7: Review

From: review_id → review_description, rating

R7(review_id, review_description, rating)

2.3. Normal form check after first decomposition

R1: Account

Attribute set: account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg

Functional Dependencies (FDs) that apply:

  1. account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg
  2. company_embg → company_name

Candidate Key: account_id

account_id is the only attribute that functionally determines all other attributes in the relation (either directly or indirectly via company_embg). Therefore, it is the sole candidate key and is declared as the primary key.

The relation is in 2NF, as the primary key is a single attribute, meaning no partial dependencies are possible. All non-key attributes are fully functionally dependent on the whole primary key (account_id). But, it is not in 3NF,

Satisfication of 3NF is defined with:

For every non-trivial functional dependency X → A that holds in the relation, at least one of the following is true: X is a superkey A is a prime attribute (i.e., A is part of some candidate key)

Here, company_name is a non-prime attribute that is transitively dependent on the primary key through another non-prime attribute (company_embg).

  • account_id → company_embg → company_name

We conclue that R1 does NOT satisfy 3NF.

Further Decomposition (to satisfy 3NF and BCNF):

We split the relation into two:

R11: Account

Attribute set: `account_id, acc_email, person_name, person_surname, password, years_experience

Functional dependencies: `account_id → acc_email, person_name, person_surname, password, years_experience `

The primary key here would remain account_id since its the only prime attribute and determinant. Now we no longer have the functional dependency present in the R1 relation. This means that the table now satisfies 3NF. According to the definition for BCNF, it requires that for every non-trivial functional dependency X → A, X must be a superkey - meaning that the values this attribute set X is different in each tuple. Our only functional dependency is one where the superkey is present, so we may conclude that R11 is in BCNF.

R12: Company

This relation now only represents company information, with no link to account_id.

Attribute set:

company_embg, company_name

Functional Dependencies:

company_embg → company_name

As before, company_embg (EMBG of the company) is a unique identifier and therefore a candidate key, meaning it is also a superkey. Again, we only have 1 functional dependency with the determinant being a superkey so we may conclude that R12 is in BCNF.

R2: Vehicle

Attribute set: vehicle_id, plate_num, model, brand, capacity, year_manuf

Functional dependencies:

vehicle_id → plate_num, model, brand, capacity, year_manuf plate_num → vehicle_id, model, brand, capacity, year_manuf

Candidate Keys: vehicle_id and plate_num - which are both unique identifiers

Because the keys are single attributes, there are clearly no partial dependencies, so R2 is in 2NF.

No attribute functionally determines another non-key attribute - all the determinants are unique identifiers and candidate keys, so no transitive dependencies exist. This means that R2 is in 3NF. We may choose any of these keys as primary keys as well - but we chose a generated one when creating the relational schema because the plate number may change.

Since the determinants (vehicle_id and plate_num) are candidate keys and therefore superkeys, BCNF is satisfied.

R3: Trip

Attribute set: trip_id, date, status, free_seats, route_id

Functional dependencies: trip_id → date, status, free_seats, route_id

Candidate Key: trip_id

The single attribute candidate key means no partial dependencies and thus 2NF is satisfied.

No transitive dependencies because all non-key attributes depend solely on trip_id. The relation is in 3NF.

Since trip_id is the sole determinant and a candidate key and therefore a superkey, it is in BCNF.

R4: Route

Attribute set: route_id, days_active

Functional dependencies: route_id → days_active

Candidate Key: route_id

Single attribute candidate key, no partial dependencies → 2NF.

No transitive dependencies are present, so 3NF.

Determinant is candidate key again, meaning it satisfies BCNF.

R5: Location

Attribute set: location_id, loc_name, latitude, longitude

Functional dependencies: location_id → loc_name, latitude, longitude

Candidate Key: location_id

A single attribute key is present, indicating no partial dependencies, so 2NF is satisfied.

No transitive dependencies are present, so 3NF.

Determinant is candidate key again, meaning it satisfies BCNF.

R6: Ticket

Attributes: ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg

Functional dependencies:

ticket_id → time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, payment_id child_embg → child_disc child_embg → parent_embg

Candidate key:

ticket_id - since it uniquely determines all other attributes in R5.

The primary key is a single attribute (ticket_id), so no partial dependencies are possible. This means 2NF is satisfied in R5.

Due to the following transitive dependency

ticket_id → child_embg → child_disc child_disc is transitively dependent on ticket_id via child_embg.

3NF is clearly violated here according to it's definition, so we will decompose again.

Further Decomposition (to satisfy 3NF and BCNF)

R51: Ticket

Attribute set:

ticket_id, two_way, student_disc

Functional Dependencies:

`ticket_id → two_way, student_disc

Candidate Key: ticket_id

With the elimination of the transitive dependency present in R5, this relation now satisfied 3NF.

This relation also satisfies BCNF because ticket_id is a candidate key in the only present FD.

R52: Child_Ticket

Attribute set:

child_embg, child_discout

Functional Dependencies:

child_embg → child_discount

Candidate Key:

child_embg

This relation satisfied 3NF, as there are no transitive dependencies present.

This relation also satisfies BCNF because child_embg is a candidate key in the only present FD.

R7: Payment

Attribute set: payment_id, total_price, n_tickets, time_purchased, date_purchased

Functional Dependencies: payment_id → total_price, n_tickets, time_purchased, date_purchased

Candidate Key: payment_id

There is a single attribute key in the only FD, so no partial dependencies are present, meaning 2NF is satisfied.

No transitive dependencies are present, so R7 satisfies 3NF.

The determinant payment_id is candidate key, so BCNF is satisfied.

R8: Review

Attribute set: review_id, review_description, rating

Functional Dependencies: review_id → review_description, rating

Candidate Key: review_id

Single attribute key → no partial dependencies → 2NF.

No transitive dependencies are present, so R7 satisfies 3NF.

The determinant review_id is candidate key, so BCNF is satisfied.

3. Summary

All resulting tables are in BCNF - and we may further semantically infer other tables that are present in the ER schema, but the core tables that have many attributes present in them are confirmed to exist and adhere to the design of the ER diagram.

Note: See TracWiki for help on using the wiki.