Changes between Version 2 and Version 3 of Normalization


Ignore:
Timestamp:
07/21/25 23:51:41 (7 days ago)
Author:
221514
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • Normalization

    v2 v3  
    11= Normalization & design improvements =
     2
     3= 1. Functional Dependencies =
     4
     5Given the ER Diagram:
     6
     7{{{
     8![[Pasted image 20250705104406.png]]
     9}}}
     10
     11And the relational database schema:
     12
     13{{{
     14![[Pasted image 20250705104424.png]]
     15}}}
     16
     17The database can be made so it is represented in exactly 1 relation with all the attributes in it.
     18Let R be the relation just described:
     19{{{R(`account_id`, accout_email, person_name, person_surname, password, years_experience, - company_name, company_embg -, - years_experience -, - `vehiche_id`, plate_num, model, brand, capacity, year_manuf -, - `route_id`, days_active -, - `trip_id`, date, status, free_seats -, - time_stop -, - `location_id` , loc_name, latitude, longitude - , - `ticket_id`, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg -, -`payment_id`, total_price, n_tickets -, - `review_id`,  review_description, rating - `)}}}
     20
     21== 1.1. Initial functional dependencies: ==
     22 1. - `account_id → accout_email, person_name, person_surname, password`
     23 2. - `account_id → years_experience` _(if driver)_
     24 3. - `account_id → company_name, company_embg` _(if organizer)_
     25 4. - `vehiche_id → plate_num, model, brand, capacity, year_manuf`
     26 5. - `plate_num → vehiche_id, model, brand, capacity, year_manuf`
     27 6. - `route_id → days_active`
     28 7. - `trip_id → date, status, free_seats, route_id`
     29 8. - `location_id → loc_name, latitude, longitude`
     30 9. - `(latitude, longitude) → location_id, loc_name`
     3110. - `ticket_id → two_way, payment_id`
     3211. - `ticket_id → student_disc` _(if student ticket)_
     3312. - `ticket_id → child_disc, child_embg, parent_embg` _(if child ticket)_
     3413. - `payment_id → total_price, n_tickets, time_purchased, date_purchased`
     3514. - `review_id → review_description, rating`
     3615. `(trip_id, location_id) → time_stop
     3716. - (Possibly) `ticket_id → payment_id`
     3817. `company_embg → company_name`
     3918. `ticket_id → child_embg`
     4019. `child_embg → parent_embg`
     41
     42== 1.2. Derived Functional Dependencies absent from the initial list ==
     43
     44=== Transitive ===
     45 `trip_id → route_id` _(from FD 7)_
     46 `route_id → days_active` _(FD 6)_
     47
     48Derivation:
     4920. '''`trip_id → days_active`'''
     50
     51`ticket_id → payment_id` _(FD 15)_
     52`payment_id → total_price, n_tickets, date_purchased, time purchased` _(FD 13)_
     53
     54Derivation:
     5521. '''`ticket_id → total_price, n_tickets, date_purchased, time_purchased`'''
     56
     57== 1.3. '''Functional Dependency Analysis using LHS/RHS Classification''' ==
     58
     59This is a method for analyzing functional dependencies by grouping attributes into three categories:
     60
     61'''Only LHS''' — attributes that only appear on the left-hand side (as determinants)
     62'''Only RHS''' — attributes that only appear on the right-hand side (as dependents)
     63'''Both LHS and RHS''' — attributes that appear on both sides
     64
     65This technique helps in understanding the roles attributes play in determining others, and it's useful for normalization and candidate key discovery.
     66
     67=== '''Attribute Classification''' ===
     68
     69==== '''Only LHS (Left-hand Side only)''' ====
     70
     71Attributes that appear only as determinants:
     72
     73`account_id, trip_id, ticket_id, payment_id, review_id`
     74
     75These are potential '''candidates for keys''', since they determine other attributes and are not themselves determined by anything else.
     76
     77==== '''Only RHS (Right-hand Side only)''' ====
     78
     79Attributes that only appear as dependents:
     80
     81`acc_email, person_name, person_surname, password, years_experience, company_name, company_embg, model, brand, capacity, year_manuf, days_active, date, status, free_seats, time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, total_price, n_tickets, review_description, rating`
     82
     83These are '''non-key attributes''', which must be fully and non-transitively dependent on candidate keys in higher normal forms ('''2NF''', '''3NF''').
     84
     85==== '''Both LHS and RHS''' ====
     86
     87Attributes that appear on both sides of FDs:
     88
     89`plate_num, vehicle_id, route_id, location_id, latitude, longitude`
     90
     91These attributes may introduce some transitive dependencies and problems.
     92
     93== Transitive Closure on LHS ==
     94
     95Let `X+ = {ticket_id, trip_id, account_id, review_id}+`
     96
     97Computing the attribute closure on the set X would give:
     98
     99{{{
     100X+ = {
     101        account_id, trip_id, ticket_id, payment_id, review_id,
     102        acc_email, person_name, person_surname, password,
     103        years_experience, company_name, company_embg,
     104        date, status, free_seats, route_id, days_active,
     105        time_purchased, date_purchased, two_way,
     106        student_disc, child_disc, child_embg, parent_embg,
     107        total_price, n_tickets,
     108        review_description, rating
     109}
     110}}}
     111
     112This relation does does not resemble the set of attributes X as a candidate key, but adding `location_id` and `vehicle id` to form the set `Y = {ticket_id, trip_id, account_id, review_id, location_id, vehicle_id}` would result in the following attribute closure:
     113
     114{{{
     115Y+ = {
     116        ticket_id, trip_id, account_id, review_id, location_id, vehicle_id,
     117        time_purchased, date_purchased, two_way,
     118        student_disc, child_disc, child_embg, parent_embg,
     119        payment_id, total_price, n_tickets,
     120        date, status, free_seats, route_id, days_active,
     121        acc_email, person_name, person_surname, password,
     122        years_experience, company_name, company_embg,
     123        review_description, rating,
     124        plate_num, model, brand, capacity, year_manuf,
     125        loc_name, latitude, longitude, time_stop
     126}
     127}}}
     128
     129Adding only either of `location_id` or `vehicle_id` would not give a candidate key, because we would be missing location information without `location_id` and vehicle information without `vehicle_id`. Which indeed gives all the attributes. We may treat the set of attributes `Y` as a a candidate key and declare it as a primary key here.
     130
     131= 2. Normalizing =
     132
     133== 2.1. Current Normal Form ==
     134
     135Given the functional dependencies, the relation `R` is already in '''1NF''' according to it's definition, because:
     136
     137 1. There are no multi-valued attributes.
     138 2. The database schema was implemented using SQL DDL, which means that the relational schema is intrinsically in '''1NF'''.
     139
     140We conclude that `R` satisfies '''1NF'''. But according to the definition of '''2NF''', a relation is in '''2NF''' iff:
     141
     142 1. It is already in '''First Normal Form (1NF)''' defined above
     143 2. There are no partial dependencies of any non-prime attribute (determined attributes) on a strict subset of any candidate key.
     144
     145But the relation is not in '''2NF''' due to the fact that we have clear partial dependencies and t. A counterexample is `route_id → days_active` - but there are many more.
     146
     147== 2.2. Decomposition of `R` into in order to achieve BCNF ==
     148
     149We may first start splitting `R` by grouping the determinants on LHS with RHS, to get the following relations:
     150
     151=== R1: Account ===
     152
     153From: `account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg`
     154
     155`R1(account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg)`
     156
     157=== R2: Vehicle ===
     158
     159From: `vehicle_id → plate_num, model, brand, capacity, year_manuf`
     160
     161`R2(vehicle_id, plate_num, model, brand, capacity, year_manuf)`
     162
     163=== R3: Trip ===
     164
     165From: `trip_id → date, status, free_seats, route_id`
     166
     167`R3(trip_id, date, status, free_seats, route_id)`
     168
     169=== R4: Location ===
     170
     171From: `location_id → loc_name, latitude, longitude`
     172
     173`R4(location_id, loc_name, latitude, longitude)`
     174
     175=== R5: Ticket ===
     176
     177From: `ticket_id → two_way, student_disc, child_disc, child_embg, parent_embg`
     178
     179`R5(ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg)`
     180
     181=== R6: Payment ===
     182
     183From: `payment_id → total_price, n_tickets, time_purchased, date_purchased`
     184
     185`R6(payment_id, total_price, n_tickets, time_purchased, date_purchased)`
     186
     187Note that FD 20 and FD 21 are lost due to the split of payment information in the split of `R` into `R6`.
     188
     189=== R7: Review ===
     190
     191From: `review_id → review_description, rating`
     192
     193`R7(review_id, review_description, rating)`
     194
     195== 2.3. Normal form check after first decomposition ==
     196=== '''R1: Account''' ===
     197
     198'''Attribute set:''' 
     199`account_id, acc_email, person_name, person_surname, password, years_experience, company_name, company_embg`
     200
     201'''Functional Dependencies (FDs) that apply:'''
     202
     203 1. `account_id → acc_email, person_name, person_surname, password, years_experience, company_name, company_embg`
     204 2. `company_embg → company_name`
     205
     206'''Candidate Key:''' 
     207`account_id`
     208
     209`account_id` is the only attribute that functionally determines all other attributes in the relation (either directly or indirectly via `company_embg`). Therefore, it is the sole candidate key and is declared as the primary key.
     210
     211The relation is in '''2NF''', as the primary key is a single attribute, meaning no partial dependencies are possible. All non-key attributes are fully functionally dependent on the whole primary key (`account_id`). But, it is not in '''3NF''',
     212
     213Satisfication of '''3NF''' is defined with:
     214
     215For every non-trivial functional dependency 
     216'''X → A''' 
     217that holds in the relation, at least one of the following is true:
     218'''X is a superkey'''
     219'''A is a prime attribute''' (i.e., A is part of some candidate key)
     220
     221Here, `company_name` is a '''non-prime attribute''' that is transitively dependent on the primary key through another non-prime attribute (`company_embg`).
     222
     223- `account_id → company_embg → company_name`
     224
     225We conclue that R1 does NOT satisfy '''3NF'''.
     226
     227=== Further Decomposition (to satisfy '''3NF''' and '''BCNF'''): ===
     228
     229We split the relation into two:
     230
     231==== '''R11: Account''' ====
     232
     233Attribute set:
     234`account_id, acc_email, person_name, person_surname, password, years_experience
     235
     236Functional dependencies:
     237`account_id → acc_email, person_name, person_surname, password, years_experience
     238`
     239
     240The primary key here would remain `account_id` since its the only prime attribute and determinant. Now we no longer have the functional dependency present in the R1 relation. This means that the table now satisfies '''3NF'''.
     241According to the definition for '''BCNF''', it requires that for every non-trivial functional dependency '''X → A''', '''X must be a superkey''' - meaning that the values this attribute set X is different in each tuple.
     242Our only functional dependency is one where the superkey is present, so we may conclude that '''R11''' is in '''BCNF'''.
     243
     244==== '''R12: Company''' ====
     245
     246This relation now '''only''' represents company information, with no link to `account_id`.
     247
     248'''Attribute set:'''
     249
     250`company_embg, company_name`
     251
     252'''Functional Dependencies:'''
     253
     254`company_embg → company_name`
     255
     256As before, `company_embg` (EMBG of the company) is a unique identifier and therefore a candidate key, meaning it is also a superkey. Again, we only have 1 functional dependency with the determinant being a superkey so we may conclude that '''R12''' is in '''BCNF'''.
     257
     258=== '''R2: Vehicle''' ===
     259
     260Attribute set: 
     261`vehicle_id, plate_num, model, brand, capacity, year_manuf`
     262
     263Functional dependencies:
     264
     265`vehicle_id → plate_num, model, brand, capacity, year_manuf`
     266`plate_num → vehicle_id, model, brand, capacity, year_manuf`
     267
     268Candidate Keys: 
     269`vehicle_id` and `plate_num` - which are both unique identifiers
     270
     271Because the keys are single attributes, there are clearly no partial dependencies, so '''R2''' is in '''2NF'''.
     272
     273No attribute functionally determines another non-key attribute - all the determinants are unique identifiers and candidate keys, so no transitive dependencies exist. This means that '''R2''' is in '''3NF'''. We may choose any of these keys as primary keys as well - but we chose a generated one when creating the relational schema because the plate number may change.
     274
     275Since the determinants (`vehicle_id` and `plate_num`) are candidate keys and therefore superkeys, '''BCNF''' is satisfied.
     276
     277=== '''R3: Trip''' ===
     278
     279Attribute set: 
     280`trip_id, date, status, free_seats, route_id`
     281
     282Functional dependencies: 
     283`trip_id → date, status, free_seats, route_id`
     284
     285Candidate Key: 
     286`trip_id`
     287
     288The single attribute candidate key means no partial dependencies and thus '''2NF''' is satisfied.
     289
     290No transitive dependencies because all non-key attributes depend solely on `trip_id`. The relation is in '''3NF'''.
     291
     292Since `trip_id` is the sole determinant and a candidate key and therefore a superkey, it is in '''BCNF'''.
     293
     294=== '''R4: Route''' ===
     295
     296Attribute set: 
     297`route_id, days_active`
     298
     299Functional dependencies: 
     300`route_id → days_active`
     301
     302Candidate Key: 
     303`route_id`
     304
     305Single attribute candidate key, no partial dependencies → '''2NF'''.
     306
     307No transitive dependencies are present, so '''3NF'''.
     308
     309Determinant is candidate key again, meaning it satisfies '''BCNF'''.
     310
     311=== '''R5: Location''' ===
     312
     313Attribute set: 
     314`location_id, loc_name, latitude, longitude`
     315
     316Functional dependencies: 
     317`location_id → loc_name, latitude, longitude`
     318
     319Candidate Key:
     320`location_id`
     321
     322A single attribute key is present, indicating no partial dependencies, so '''2NF''' is satisfied.
     323
     324No transitive dependencies are present, so '''3NF'''.
     325
     326Determinant is candidate key again, meaning it satisfies '''BCNF'''.
     327
     328=== '''R6: Ticket''' ===
     329
     330'''Attributes:''' 
     331`ticket_id, two_way, student_disc, child_disc, child_embg, parent_embg`
     332
     333Functional dependencies:
     334
     335`ticket_id → time_purchased, date_purchased, two_way, student_disc, child_disc, child_embg, parent_embg, payment_id`
     336`child_embg → child_disc`
     337`child_embg → parent_embg`
     338
     339Candidate key:
     340
     341`ticket_id` - since it uniquely determines all other attributes in R5.
     342
     343The primary key is a single attribute (`ticket_id`), so no partial dependencies are possible. This means '''2NF''' is satisfied in R5.
     344
     345Due to the following transitive dependency
     346
     347`ticket_id → child_embg → child_disc`
     348`child_disc` is transitively dependent on `ticket_id` via `child_embg`.
     349
     350'''3NF''' is clearly violated here according to it's definition, so we will decompose again.
     351
     352=== Further Decomposition (to satisfy '''3NF''' and '''BCNF''') ===
     353
     354==== '''R51: Ticket''' ====
     355
     356Attribute set:
     357
     358`ticket_id, two_way, student_disc`
     359
     360Functional Dependencies:
     361
     362`ticket_id → two_way, student_disc
     363
     364Candidate Key: `ticket_id`
     365
     366With the elimination of the transitive dependency present in R5, this relation now satisfied '''3NF'''.
     367
     368This relation also satisfies '''BCNF''' because `ticket_id` is a candidate key in the only present FD.
     369
     370==== '''R52: Child_Ticket''' ====
     371
     372Attribute set:
     373
     374`child_embg, child_discout`
     375
     376Functional Dependencies:
     377
     378`child_embg → child_discount`
     379
     380Candidate Key:
     381
     382`child_embg`
     383
     384This relation satisfied '''3NF''', as there are no transitive dependencies present.
     385
     386This relation also satisfies '''BCNF''' because `child_embg` is a candidate key in the only present FD.
     387
     388=== '''R7: Payment''' ===
     389
     390Attribute set: 
     391`payment_id, total_price, n_tickets, time_purchased, date_purchased`
     392
     393Functional Dependencies: 
     394`payment_id → total_price, n_tickets, time_purchased, date_purchased`
     395
     396Candidate Key: 
     397`payment_id`
     398
     399There is a single attribute key in the only FD, so no partial dependencies are present, meaning '''2NF''' is satisfied.
     400
     401No transitive dependencies are present, so R7 satisfies '''3NF'''.
     402
     403The determinant `payment_id` is candidate key, so '''BCNF''' is satisfied.
     404
     405=== '''R8: Review''' ===
     406
     407Attribute set: 
     408`review_id, review_description, rating`
     409
     410Functional Dependencies: 
     411`review_id → review_description, rating`
     412
     413Candidate Key: 
     414`review_id`
     415
     416Single attribute key → no partial dependencies → '''2NF'''.
     417
     418No transitive dependencies are present, so R7 satisfies '''3NF'''.
     419
     420The determinant `review_id` is candidate key, so '''BCNF''' is satisfied.
     421
     422= 3. Summary =
     423
     424All resulting tables are in BCNF - and we may further semantically infer other tables that are present in the ER schema, but the core tables that have many attributes present in them are confirmed to exist and adhere to the design of the ER diagram.