= Normalization & design improvements = = 1. Functional Dependencies = Given the relational schema we already created in earlier sections of this project, the database can be made so it is represented in exactly 1 relation with all the attributes from every relation concatenated together. Of course, duplicates would be removed by doing this which is not a problem. Let R be the relation just described: {{{ R( n_tickets,vehicle_id,route_id,latitude, child_disc,rating,free_seats,date, days_active,acc_email,plate_num,person_surname, years_experience,trip_id,ticket_id,company_embg, account_id,time_stop,model,time_purchased,payment_id, total_price,person_name,student_disc, company_name,brand,account_email,location_id, child_embg,longitude,loc_name,two_way, capacity,status,parent_embg,date_purchased, password,review_description,year_manuf,review_id ) }}} == 1.1. Initial functional dependencies: == 1. - `account_id → accout_email, person_name, person_surname, password` 2. - `account_id → years_experience` _(if driver)_ 3. - `account_id → company_name, company_embg` _(if organizer)_ 4. - `vehiche_id → plate_num, model, brand, capacity, year_manuf` 5. - `plate_num → vehiche_id, model, brand, capacity, year_manuf` 6. - `route_id → days_active` 7. - `trip_id → date, status, free_seats, route_id` 8. - `location_id → loc_name, latitude, longitude` 9. - `(latitude, longitude) → location_id, loc_name` 10. - `ticket_id → two_way, payment_id` 11. - `ticket_id → student_disc` _(if student ticket)_ 12. - `ticket_id → child_disc, child_embg, parent_embg` _(if child ticket)_ 13. - `payment_id → total_price, n_tickets, time_purchased, date_purchased` 14. - `review_id → review_description, rating` 15. `(trip_id, location_id) → time_stop 16. - (Possibly) `ticket_id → payment_id` 17. `company_embg → company_name` 18. `ticket_id → child_embg` 19. `child_embg → parent_embg` == 1.2. Derived Functional Dependencies absent from the initial list == === Transitive === `trip_id → route_id` _(from FD 7)_ `route_id → days_active` _(FD 6)_ Derivation: 20. '''`trip_id → days_active`''' `ticket_id → payment_id` _(FD 15)_ `payment_id → total_price, n_tickets, date_purchased, time purchased` _(FD 13)_ Derivation: 21. '''`ticket_id → total_price, n_tickets, date_purchased, time_purchased`''' == 1.3. '''Functional Dependency Analysis using LHS/RHS Classification''' == This is a method for analyzing functional dependencies by grouping attributes into three categories: '''Only LHS''' — attributes that only appear on the left-hand side (as determinants) '''Only RHS''' — attributes that only appear on the right-hand side (as dependents) '''Both LHS and RHS''' — attributes that appear on both sides This technique helps in understanding the roles attributes play in determining others, and it's useful for normalization and candidate key discovery. === '''Attribute Classification''' === ==== '''Only LHS (Left-hand Side only)''' ==== Attributes that appear only as determinants: `account_id, trip_id, ticket_id, payment_id, review_id` These are potential '''candidates for keys''', since they determine other attributes and are not themselves determined by anything else. ==== '''Only RHS (Right-hand Side only)''' ==== Attributes that only appear as dependents: `acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, model, brand, capacity, year_manuf, days_active, date, status, free_seats, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, total_price, n_tickets, review_description, rating` These are '''non-key attributes''', which must be fully and non-transitively dependent on candidate keys in higher normal forms ('''2NF''', '''3NF'''). ==== '''Both LHS and RHS''' ==== Attributes that appear on both sides of FDs: `plate_num, vehicle_id, route_id, location_id, latitude, longitude` These attributes may introduce some transitive dependencies and problems. == Transitive Closure on LHS == Let `X+ = {ticket_id, trip_id, account_id, review_id}+` Computing the attribute closure on the set X would give: {{{ X+ = { account_id, trip_id, ticket_id, payment_id, review_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, date, status, free_seats, route_id, days_active, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, total_price, n_tickets, review_description, rating } }}} This relation does does not resemble the set of attributes X as a candidate key, but adding `location_id` and `vehicle id` to form the set `Y = {ticket_id, trip_id, account_id, review_id, location_id, vehicle_id}` would result in the following attribute closure: {{{ Y+ = { ticket_id, trip_id, account_id, review_id, location_id, vehicle_id, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, payment_id, total_price, n_tickets, date, status, free_seats, route_id, days_active, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, review_description, rating, plate_num, model, brand, capacity, year_manuf, loc_name, latitude, longitude, time_stop } }}} Adding only either of `location_id` or `vehicle_id` would not give a candidate key, because we would be missing location information without `location_id` and vehicle information without `vehicle_id`. Which indeed gives all the attributes. We may treat the set of attributes `Y` as a a candidate key and declare it as a primary key here. = 2. Normalizing = == 2.1. Current Normal Form == Given the functional dependencies, the relation `R` is already in '''1NF''' according to it's definition, because: 1. There are no multi-valued attributes. 2. The database schema was implemented using SQL DDL, which means that the relational schema is intrinsically in '''1NF'''. We conclude that `R` satisfies '''1NF'''. But according to the definition of '''2NF''', a relation is in '''2NF''' iff: 1. It is already in '''First Normal Form (1NF)''' defined above 2. There are no partial dependencies of any non-prime attribute (determined attributes) on a strict subset of any candidate key. But the relation is not in '''2NF''' due to the fact that we have clear partial dependencies and t. A counterexample is `route_id → days_active` - but there are many more. == 2.2. Decomposition of `R` into in order to achieve BCNF == We may first start splitting `R` by grouping the determinants on LHS with RHS, to get the following relations: === R1: Account === From: `account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg` `R1(account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg)` === R2: Vehicle === From: `vehicle_id → plate_num, model, brand, capacity, year_manuf` `R2(vehicle_id, plate_num, model, brand, capacity, year_manuf)` === R3: Trip === From: `trip_id → date, status, free_seats, route_id` `R3(trip_id, date, status, free_seats, route_id)` === R4: Location === From: `location_id → loc_name, latitude, longitude` `R4(location_id, loc_name, latitude, longitude)` === R5: Ticket === From: `ticket_id → two_way, student_disc, child_disc, child_embg, parent_embg` `R5(ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg)` === R6: Payment === From: `payment_id → total_price, n_tickets, time_purchased, date_purchased` `R6(payment_id, total_price, n_tickets, time_purchased, date_purchased)` Note that FD 20 and FD 21 are lost due to the split of payment information in the split of `R` into `R6`. === R7: Review === From: `review_id → review_description, rating` `R7(review_id, review_description, rating)` == 2.3. Normal form check after first decomposition == === '''R1: Account''' === '''Attribute set:''' `account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg` '''Functional Dependencies (FDs) that apply:''' 1. `account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg` 2. `company_embg → company_name` '''Candidate Key:''' `account_id` `account_id` is the only attribute that functionally determines all other attributes in the relation (either directly or indirectly via `company_embg`). Therefore, it is the sole candidate key and is declared as the primary key. The relation is in '''2NF''', as the primary key is a single attribute, meaning no partial dependencies are possible. All non-key attributes are fully functionally dependent on the whole primary key (`account_id`). But, it is not in '''3NF''', Satisfication of '''3NF''' is defined with: For every non-trivial functional dependency '''X → A''' that holds in the relation, at least one of the following is true: '''X is a superkey''' '''A is a prime attribute''' (i.e., A is part of some candidate key) Here, `company_name` is a '''non-prime attribute''' that is transitively dependent on the primary key through another non-prime attribute (`company_embg`). - `account_id → company_embg → company_name` We conclue that R1 does NOT satisfy '''3NF'''. === Further Decomposition (to satisfy '''3NF''' and '''BCNF'''): === We split the relation into two: ==== '''R11: Account''' ==== Attribute set: `account_id, acc_email, person_name, person_surname, password, years_experience Functional dependencies: `account_id → acc_email, person_name, person_surname, password, years_experience ` The primary key here would remain `account_id` since its the only prime attribute and determinant. Now we no longer have the functional dependency present in the R1 relation. This means that the table now satisfies '''3NF'''. According to the definition for '''BCNF''', it requires that for every non-trivial functional dependency '''X → A''', '''X must be a superkey''' - meaning that the values this attribute set X is different in each tuple. Our only functional dependency is one where the superkey is present, so we may conclude that '''R11''' is in '''BCNF'''. ==== '''R12: Company''' ==== This relation now '''only''' represents company information, with no link to `account_id`. '''Attribute set:''' `company_embg, company_name` '''Functional Dependencies:''' `company_embg → company_name` As before, `company_embg` (EMBG of the company) is a unique identifier and therefore a candidate key, meaning it is also a superkey. Again, we only have 1 functional dependency with the determinant being a superkey so we may conclude that '''R12''' is in '''BCNF'''. === '''R2: Vehicle''' === Attribute set: `vehicle_id, plate_num, model, brand, capacity, year_manuf` Functional dependencies: `vehicle_id → plate_num, model, brand, capacity, year_manuf` `plate_num → vehicle_id, model, brand, capacity, year_manuf` Candidate Keys: `vehicle_id` and `plate_num` - which are both unique identifiers Because the keys are single attributes, there are clearly no partial dependencies, so '''R2''' is in '''2NF'''. No attribute functionally determines another non-key attribute - all the determinants are unique identifiers and candidate keys, so no transitive dependencies exist. This means that '''R2''' is in '''3NF'''. We may choose any of these keys as primary keys as well - but we chose a generated one when creating the relational schema because the plate number may change. Since the determinants (`vehicle_id` and `plate_num`) are candidate keys and therefore superkeys, '''BCNF''' is satisfied. === '''R3: Trip''' === Attribute set: `trip_id, date, status, free_seats, route_id` Functional dependencies: `trip_id → date, status, free_seats, route_id` Candidate Key: `trip_id` The single attribute candidate key means no partial dependencies and thus '''2NF''' is satisfied. No transitive dependencies because all non-key attributes depend solely on `trip_id`. The relation is in '''3NF'''. Since `trip_id` is the sole determinant and a candidate key and therefore a superkey, it is in '''BCNF'''. === '''R4: Route''' === Attribute set: `route_id, days_active` Functional dependencies: `route_id → days_active` Candidate Key: `route_id` Single attribute candidate key, no partial dependencies → '''2NF'''. No transitive dependencies are present, so '''3NF'''. Determinant is candidate key again, meaning it satisfies '''BCNF'''. === '''R5: Location''' === Attribute set: `location_id, loc_name, latitude, longitude` Functional dependencies: `location_id → loc_name, latitude, longitude` Candidate Key: `location_id` A single attribute key is present, indicating no partial dependencies, so '''2NF''' is satisfied. No transitive dependencies are present, so '''3NF'''. Determinant is candidate key again, meaning it satisfies '''BCNF'''. === '''R6: Ticket''' === '''Attributes:''' `ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg` Functional dependencies: `ticket_id → time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, payment_id` `child_embg → child_disc` `child_embg → parent_embg` Candidate key: `ticket_id` - since it uniquely determines all other attributes in R5. The primary key is a single attribute (`ticket_id`), so no partial dependencies are possible. This means '''2NF''' is satisfied in R5. Due to the following transitive dependency `ticket_id → child_embg → child_disc` `child_disc` is transitively dependent on `ticket_id` via `child_embg`. '''3NF''' is clearly violated here according to it's definition, so we will decompose again. === Further Decomposition (to satisfy '''3NF''' and '''BCNF''') === ==== '''R51: Ticket''' ==== Attribute set: `ticket_id, two_way, student_disc` Functional Dependencies: `ticket_id → two_way, student_disc Candidate Key: `ticket_id` With the elimination of the transitive dependency present in R5, this relation now satisfied '''3NF'''. This relation also satisfies '''BCNF''' because `ticket_id` is a candidate key in the only present FD. ==== '''R52: Child_Ticket''' ==== Attribute set: `child_embg, child_discout` Functional Dependencies: `child_embg → child_discount` Candidate Key: `child_embg` This relation satisfied '''3NF''', as there are no transitive dependencies present. This relation also satisfies '''BCNF''' because `child_embg` is a candidate key in the only present FD. === '''R7: Payment''' === Attribute set: `payment_id, total_price, n_tickets, time_purchased, date_purchased` Functional Dependencies: `payment_id → total_price, n_tickets, time_purchased, date_purchased` Candidate Key: `payment_id` There is a single attribute key in the only FD, so no partial dependencies are present, meaning '''2NF''' is satisfied. No transitive dependencies are present, so R7 satisfies '''3NF'''. The determinant `payment_id` is candidate key, so '''BCNF''' is satisfied. === '''R8: Review''' === Attribute set: `review_id, review_description, rating` Functional Dependencies: `review_id → review_description, rating` Candidate Key: `review_id` Single attribute key → no partial dependencies → '''2NF'''. No transitive dependencies are present, so R7 satisfies '''3NF'''. The determinant `review_id` is candidate key, so '''BCNF''' is satisfied. = 3. Summary = All resulting tables are in BCNF - and we may further semantically infer other tables that are present in the ER schema, but the core tables that have many attributes present in them are confirmed to exist and adhere to the design of the ER diagram.