Version 4 (modified by 8 days ago) ( diff ) | ,
---|
Normalization & design improvements
1. Functional Dependencies
Given the relational schema we already created in earlier sections of this project, the database can be made so it is represented in exactly 1 relation with all the attributes from every relation concatenated together. Of course, duplicates would be removed by doing this which is not a problem.
Let R be the relation just described:
R(`account_id`, accout_email, person_name, person_surname, password, years_experience, - company_name, company_embg -, - years_experience -, - `vehiche_id`, plate_num, model, brand, capacity, year_manuf -, - `route_id`, days_active -, - `trip_id`, date, status, free_seats -, - time_stop -, - `location_id` , loc_name, latitude, longitude - , - `ticket_id`, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg -, -`payment_id`, total_price, n_tickets -, - `review_id`, review_description, rating - `)
1.1. Initial functional dependencies:
- -
account_id → accout_email, person_name, person_surname, password
- -
account_id → years_experience
_(if driver)_ - -
account_id → company_name, company_embg
_(if organizer)_ - -
vehiche_id → plate_num, model, brand, capacity, year_manuf
- -
plate_num → vehiche_id, model, brand, capacity, year_manuf
- -
route_id → days_active
- -
trip_id → date, status, free_seats, route_id
- -
location_id → loc_name, latitude, longitude
- -
(latitude, longitude) → location_id, loc_name
- -
ticket_id → two_way, payment_id
- -
ticket_id → student_disc
_(if student ticket)_ - -
ticket_id → child_disc, child_embg, parent_embg
_(if child ticket)_ - -
payment_id → total_price, n_tickets, time_purchased, date_purchased
- -
review_id → review_description, rating
- `(trip_id, location_id) → time_stop
- - (Possibly)
ticket_id → payment_id
company_embg → company_name
ticket_id → child_embg
child_embg → parent_embg
1.2. Derived Functional Dependencies absent from the initial list
Transitive
trip_id → route_id
_(from FD 7)_route_id → days_active
_(FD 6)_
Derivation:
trip_id → days_active
ticket_id → payment_id
_(FD 15)_
payment_id → total_price, n_tickets, date_purchased, time purchased
_(FD 13)_
Derivation:
ticket_id → total_price, n_tickets, date_purchased, time_purchased
1.3. Functional Dependency Analysis using LHS/RHS Classification
This is a method for analyzing functional dependencies by grouping attributes into three categories:
Only LHS — attributes that only appear on the left-hand side (as determinants) Only RHS — attributes that only appear on the right-hand side (as dependents) Both LHS and RHS — attributes that appear on both sides
This technique helps in understanding the roles attributes play in determining others, and it's useful for normalization and candidate key discovery.
Attribute Classification
Only LHS (Left-hand Side only)
Attributes that appear only as determinants:
account_id, trip_id, ticket_id, payment_id, review_id
These are potential candidates for keys, since they determine other attributes and are not themselves determined by anything else.
Only RHS (Right-hand Side only)
Attributes that only appear as dependents:
acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, model, brand, capacity, year_manuf, days_active, date, status, free_seats, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, total_price, n_tickets, review_description, rating
These are non-key attributes, which must be fully and non-transitively dependent on candidate keys in higher normal forms (2NF, 3NF).
Both LHS and RHS
Attributes that appear on both sides of FDs:
plate_num, vehicle_id, route_id, location_id, latitude, longitude
These attributes may introduce some transitive dependencies and problems.
Transitive Closure on LHS
Let X+ = {ticket_id, trip_id, account_id, review_id}+
Computing the attribute closure on the set X would give:
X+ = { account_id, trip_id, ticket_id, payment_id, review_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, date, status, free_seats, route_id, days_active, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, total_price, n_tickets, review_description, rating }
This relation does does not resemble the set of attributes X as a candidate key, but adding location_id
and vehicle id
to form the set Y = {ticket_id, trip_id, account_id, review_id, location_id, vehicle_id}
would result in the following attribute closure:
Y+ = { ticket_id, trip_id, account_id, review_id, location_id, vehicle_id, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, payment_id, total_price, n_tickets, date, status, free_seats, route_id, days_active, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, review_description, rating, plate_num, model, brand, capacity, year_manuf, loc_name, latitude, longitude, time_stop }
Adding only either of location_id
or vehicle_id
would not give a candidate key, because we would be missing location information without location_id
and vehicle information without vehicle_id
. Which indeed gives all the attributes. We may treat the set of attributes Y
as a a candidate key and declare it as a primary key here.
2. Normalizing
2.1. Current Normal Form
Given the functional dependencies, the relation R
is already in 1NF according to it's definition, because:
- There are no multi-valued attributes.
- The database schema was implemented using SQL DDL, which means that the relational schema is intrinsically in 1NF.
We conclude that R
satisfies 1NF. But according to the definition of 2NF, a relation is in 2NF iff:
- It is already in First Normal Form (1NF) defined above
- There are no partial dependencies of any non-prime attribute (determined attributes) on a strict subset of any candidate key.
But the relation is not in 2NF due to the fact that we have clear partial dependencies and t. A counterexample is route_id → days_active
- but there are many more.
2.2. Decomposition of R
into in order to achieve BCNF
We may first start splitting R
by grouping the determinants on LHS with RHS, to get the following relations:
R1: Account
From: account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg
R1(account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg)
R2: Vehicle
From: vehicle_id → plate_num, model, brand, capacity, year_manuf
R2(vehicle_id, plate_num, model, brand, capacity, year_manuf)
R3: Trip
From: trip_id → date, status, free_seats, route_id
R3(trip_id, date, status, free_seats, route_id)
R4: Location
From: location_id → loc_name, latitude, longitude
R4(location_id, loc_name, latitude, longitude)
R5: Ticket
From: ticket_id → two_way, student_disc, child_disc, child_embg, parent_embg
R5(ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg)
R6: Payment
From: payment_id → total_price, n_tickets, time_purchased, date_purchased
R6(payment_id, total_price, n_tickets, time_purchased, date_purchased)
Note that FD 20 and FD 21 are lost due to the split of payment information in the split of R
into R6
.
R7: Review
From: review_id → review_description, rating
R7(review_id, review_description, rating)
2.3. Normal form check after first decomposition
R1: Account
Attribute set:
account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg
Functional Dependencies (FDs) that apply:
account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg
company_embg → company_name
Candidate Key:
account_id
account_id
is the only attribute that functionally determines all other attributes in the relation (either directly or indirectly via company_embg
). Therefore, it is the sole candidate key and is declared as the primary key.
The relation is in 2NF, as the primary key is a single attribute, meaning no partial dependencies are possible. All non-key attributes are fully functionally dependent on the whole primary key (account_id
). But, it is not in 3NF,
Satisfication of 3NF is defined with:
For every non-trivial functional dependency X → A that holds in the relation, at least one of the following is true: X is a superkey A is a prime attribute (i.e., A is part of some candidate key)
Here, company_name
is a non-prime attribute that is transitively dependent on the primary key through another non-prime attribute (company_embg
).
account_id → company_embg → company_name
We conclue that R1 does NOT satisfy 3NF.
Further Decomposition (to satisfy 3NF and BCNF):
We split the relation into two:
R11: Account
Attribute set: `account_id, acc_email, person_name, person_surname, password, years_experience
Functional dependencies: `account_id → acc_email, person_name, person_surname, password, years_experience `
The primary key here would remain account_id
since its the only prime attribute and determinant. Now we no longer have the functional dependency present in the R1 relation. This means that the table now satisfies 3NF.
According to the definition for BCNF, it requires that for every non-trivial functional dependency X → A, X must be a superkey - meaning that the values this attribute set X is different in each tuple.
Our only functional dependency is one where the superkey is present, so we may conclude that R11 is in BCNF.
R12: Company
This relation now only represents company information, with no link to account_id
.
Attribute set:
company_embg, company_name
Functional Dependencies:
company_embg → company_name
As before, company_embg
(EMBG of the company) is a unique identifier and therefore a candidate key, meaning it is also a superkey. Again, we only have 1 functional dependency with the determinant being a superkey so we may conclude that R12 is in BCNF.
R2: Vehicle
Attribute set:
vehicle_id, plate_num, model, brand, capacity, year_manuf
Functional dependencies:
vehicle_id → plate_num, model, brand, capacity, year_manuf
plate_num → vehicle_id, model, brand, capacity, year_manuf
Candidate Keys:
vehicle_id
and plate_num
- which are both unique identifiers
Because the keys are single attributes, there are clearly no partial dependencies, so R2 is in 2NF.
No attribute functionally determines another non-key attribute - all the determinants are unique identifiers and candidate keys, so no transitive dependencies exist. This means that R2 is in 3NF. We may choose any of these keys as primary keys as well - but we chose a generated one when creating the relational schema because the plate number may change.
Since the determinants (vehicle_id
and plate_num
) are candidate keys and therefore superkeys, BCNF is satisfied.
R3: Trip
Attribute set:
trip_id, date, status, free_seats, route_id
Functional dependencies:
trip_id → date, status, free_seats, route_id
Candidate Key:
trip_id
The single attribute candidate key means no partial dependencies and thus 2NF is satisfied.
No transitive dependencies because all non-key attributes depend solely on trip_id
. The relation is in 3NF.
Since trip_id
is the sole determinant and a candidate key and therefore a superkey, it is in BCNF.
R4: Route
Attribute set:
route_id, days_active
Functional dependencies:
route_id → days_active
Candidate Key:
route_id
Single attribute candidate key, no partial dependencies → 2NF.
No transitive dependencies are present, so 3NF.
Determinant is candidate key again, meaning it satisfies BCNF.
R5: Location
Attribute set:
location_id, loc_name, latitude, longitude
Functional dependencies:
location_id → loc_name, latitude, longitude
Candidate Key:
location_id
A single attribute key is present, indicating no partial dependencies, so 2NF is satisfied.
No transitive dependencies are present, so 3NF.
Determinant is candidate key again, meaning it satisfies BCNF.
R6: Ticket
Attributes:
ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg
Functional dependencies:
ticket_id → time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, payment_id
child_embg → child_disc
child_embg → parent_embg
Candidate key:
ticket_id
- since it uniquely determines all other attributes in R5.
The primary key is a single attribute (ticket_id
), so no partial dependencies are possible. This means 2NF is satisfied in R5.
Due to the following transitive dependency
ticket_id → child_embg → child_disc
child_disc
is transitively dependent on ticket_id
via child_embg
.
3NF is clearly violated here according to it's definition, so we will decompose again.
Further Decomposition (to satisfy 3NF and BCNF)
R51: Ticket
Attribute set:
ticket_id, two_way, student_disc
Functional Dependencies:
`ticket_id → two_way, student_disc
Candidate Key: ticket_id
With the elimination of the transitive dependency present in R5, this relation now satisfied 3NF.
This relation also satisfies BCNF because ticket_id
is a candidate key in the only present FD.
R52: Child_Ticket
Attribute set:
child_embg, child_discout
Functional Dependencies:
child_embg → child_discount
Candidate Key:
child_embg
This relation satisfied 3NF, as there are no transitive dependencies present.
This relation also satisfies BCNF because child_embg
is a candidate key in the only present FD.
R7: Payment
Attribute set:
payment_id, total_price, n_tickets, time_purchased, date_purchased
Functional Dependencies:
payment_id → total_price, n_tickets, time_purchased, date_purchased
Candidate Key:
payment_id
There is a single attribute key in the only FD, so no partial dependencies are present, meaning 2NF is satisfied.
No transitive dependencies are present, so R7 satisfies 3NF.
The determinant payment_id
is candidate key, so BCNF is satisfied.
R8: Review
Attribute set:
review_id, review_description, rating
Functional Dependencies:
review_id → review_description, rating
Candidate Key:
review_id
Single attribute key → no partial dependencies → 2NF.
No transitive dependencies are present, so R7 satisfies 3NF.
The determinant review_id
is candidate key, so BCNF is satisfied.
3. Summary
All resulting tables are in BCNF - and we may further semantically infer other tables that are present in the ER schema, but the core tables that have many attributes present in them are confirmed to exist and adhere to the design of the ER diagram.