= Phase P5: Normalization = == Initial de-normalized relation and functional dependencies == **Universal Relation R** All attributes from the ER model are merged into one flat relation. Attribute names are changed where the same concept appears in multiple entities. {{{ R( admin_id, admin_name, admin_email, admin_password, architect_id, architect_name, building_id, building_name, building_address, building_description, floor_id, floor_number, layout_image, unit_id, unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, agent_id, agent_name, agent_email, agent_password, client_id, client_name, client_email, client_phone, timeslot_id, ts_date, ts_time_start, ts_time_end, ts_status, inquiry_id, inquiry_message, inquiry_status, inquiry_created_at, appointment_id, appt_status ) }}} === Canonical Cover === {{{ FD01: admin_id → admin_name, admin_email, admin_password FD02: admin_email → admin_id FD03: architect_id → architect_name FD04: building_id → building_name, building_address, building_description, admin_id FD05: floor_id → floor_number, layout_image, building_id FD06: (building_id, floor_number) → floor_id FD07: unit_id → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id FD08: (floor_id, unit_number) → unit_id FD09: agent_id → agent_name, agent_email, agent_password FD10: agent_email → agent_id FD11: client_id → client_name, client_email, client_phone FD12: client_email → client_id FD13: timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id FD14: (agent_id, ts_date, ts_time_start) → timeslot_id FD15: inquiry_id → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id FD16: appointment_id → appt_status, client_id, unit_id, timeslot_id FD17: timeslot_id → appointment_id }}} ---- == Candidate Keys & Primary Key of R == === Identifying the Candidate Key === Before we start the aim of a candidate key is to satisfy the following requirements: - The candidate key closure must be able to cover every attribute in the relation R. - And for the key to be minimal, we only keep ones that will be able to satisfy the above mentioned rule. The starting point would be finding every attributes that are never found on the right side in any of the functional dependencies. This means that these attributes don’t get determined by any other. So due to this we need to include them always in the candidate keys. {{{ architect_id — nothing in Fc produces it inquiry_id — nothing in Fc produces it appointment_id — appears on the RHS of FD17 (timeslot_id → appointment_id), but timeslot_id itself appears on the RHS of FD16 (appointment_id → timeslot_id), meaning the two mutually determine each other. Neither can be derived from anything outside that pair, so one of them must be in every key. }}} This gives us the base set to test: {architect_id, inquiry_id, appointment_id}. Testing the coverage of this candidate key results in the complete cover of all attributes in R {architect_id, inquiry_id, appointment_id} is a superkey. If we try to remove each to get a minimal candidate key the following happens: By removing architect_id we lose the ability to reach architect_name which is only reachable via FD03 and that requires an architect_id. By removing inquiry_id we lose the ability to reach inquiry_message, inquiry_status, inquiry_created_at which are only reachable via FD15. By removing appointment_id we lose the ability to reach multiple attributes (appt_status, timeslot_id, ts_date, ts_time_start, ts_time_end, ts_status). Another viable candidate key is {architect_id, inquiry_id, timeslot_id}: From the closure test above we know appointment_id is required to reach appt_status and the timeslot chain. However, appointment_id and timeslot_id mutually determine each other: {{{ FD16: appointment_id → timeslot_id FD17: timeslot_id → appointment_id }}} This means timeslot_id can substitute for appointment_id. Testing the closure of {architect_id, inquiry_id, timeslot_id} results in all attributes being covered. There arent any other substitutable attributes left, so that leaves us with the 2 candidate keys we tested. A candidate key must functionally determine all attributes in R and be minimal. **Candidate Keys:** * `{architect_id, inquiry_id, appointment_id}` * `{architect_id, inquiry_id, timeslot_id}` **Primary Key:** `{architect_id, inquiry_id, appointment_id}` === Normal Form of R Before Decomposition === R is in 1NF but not in 2NF. * **1NF**: All attributes hold atomic scalar values, and there are no repeating groups. Every row is uniquely identified by the PK. '''R is in 1NF.''' * **2NF**: A relation is in 2NF if it is in 1NF and every non‑key attribute is fully functionally dependent on the entire primary key. '''R does not conform to 2NF.''' **Example of a 2NF violation:** `architect_name` is determined by `architect_id` (FD03), which is only a proper subset of the primary key `{architect_id, inquiry_id, appointment_id}`. If an architect is associated with multiple appointments, their name is repeated unnecessarily which is a clear partial dependency. ---- == Step-by-step Decomposition == === Decompose from 1NF to 2NF === ==== 1.1 Extract `architect_id → architect_name` (FD03) ==== * **FD causing issue:** FD03, where `architect_id` ⊂ PK. * **New relation:** {{{ R_Architect( architect_id PK, architect_name ) FDs: FD03 (architect_id → architect_name) Candidate key / PK: {architect_id} }}} * **Remaining relation R1:** R minus `architect_name`, with PK unchanged `{architect_id, inquiry_id, appointment_id}` and FK `architect_id → R_Architect`. * **Lossless:** Yes – `architect_id` is the key of `R_Architect`, natural join on `architect_id` reconstructs R. * **Dependency preservation:** FD03 is wholly inside `R_Architect`; all other FDs remain in R1. ==== 1.2 Extract `inquiry_id → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id` (FD15) ==== * **FD causing issue:** FD15, `inquiry_id` ⊂ PK. * **From R1**, extract: {{{ R_Inquiry( inquiry_id PK, inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id ) FDs: FD15 (inquiry_id → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id) Candidate key / PK: {inquiry_id} }}} * **Remaining relation R2:** R1 minus the dependent attributes {inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id}, PK unchanged, FK `inquiry_id → R_Inquiry`. * **Lossless:** Yes – `inquiry_id` is key of `R_Inquiry`. * **Dependency preservation:** FD15 is in `R_Inquiry`. ==== 1.3 Extract `appointment_id → appt_status, client_id, unit_id, timeslot_id` (FD16) ==== * **FD causing issue:** FD16, `appointment_id` ⊂ PK. * **From R2**, extract: {{{ R_Appointment( appointment_id PK, appt_status, client_id, unit_id, timeslot_id ) FDs: FD16 (appointment_id → appt_status, client_id, unit_id, timeslot_id) FD17 (timeslot_id → appointment_id) Candidate keys: {appointment_id}, {timeslot_id} PK: {appointment_id} }}} * **Remaining relation R3:** R2 minus the dependent attributes {appt_status, client_id, unit_id, timeslot_id}, PK unchanged, FK `appointment_id → R_Appointment`. Note: `timeslot_id` moves entirely into R_Appointment and does NOT remain in R3; it is a dependent attribute of appointment_id, not a property of the residual relation. * **Lossless:** Yes – `appointment_id` is key of `R_Appointment`. * **Dependency preservation:** FD16 and FD17 are in `R_Appointment`. ==== 1.4 Extract `timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id` (FD13) ==== `timeslot_id` is the PK of R_Appointment, and FD13 shows it determines further attributes (ts_date, ts_time_start, ts_time_end, ts_status, agent_id) that were placed in R_Appointment in step 1.3 only as part of the dependent set of FD16. Those timeslot attributes are themselves determined by timeslot_id via FD13 — a partial dependency on timeslot_id (a candidate key of R_Appointment, not the chosen PK) — so they must be extracted into their own relation. * **From R_Appointment**, extract: {{{ R_Timeslot( timeslot_id PK, ts_date, ts_time_start, ts_time_end, ts_status, agent_id ) FDs: FD13 (timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id) FD14 ((agent_id, ts_date, ts_time_start) → timeslot_id) Candidate keys: {timeslot_id}, {agent_id, ts_date, ts_time_start} PK: {timeslot_id} }}} * **R_Appointment updated:** timeslot attributes removed; timeslot_id is retained as FK → R_Timeslot. R_Appointment now holds: {{{ R_Appointment( appointment_id PK, appt_status, client_id, unit_id, timeslot_id FK → R_Timeslot ) }}} * **Lossless:** Yes – `timeslot_id` is key of `R_Timeslot`. * **Dependency preservation:** FD13 and FD14 are in `R_Timeslot`. ==== 1.5 Extract `unit_id → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id` (FD07) ==== In R3, `unit_id` is a non-key attribute (it depends on `inquiry_id` and `appointment_id`, both proper subsets of the PK). Its dependent attributes cause a 2NF violation. * **From R3**, extract: {{{ R_Unit( unit_id PK, unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id ) FDs: FD07 (unit_id → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id) FD08 ((floor_id, unit_number) → unit_id) Candidate keys: {unit_id}, {floor_id, unit_number} PK: {unit_id} }}} * **Remaining relation R4:** R3 minus the dependent attributes, PK unchanged, FK `unit_id → R_Unit`. * **Lossless:** Yes – `unit_id` is key of `R_Unit`. * **Dependency preservation:** FD07 and FD08 are in `R_Unit`. ==== 1.6 Extract `floor_id → floor_number, layout_image, building_id` (FD05) ==== In R4, `floor_id` is a non-key attribute, depending on `unit_id` which is itself a partial determinant — a 2NF violation. * **From R4**, extract: {{{ R_Floor( floor_id PK, floor_number, layout_image, building_id ) FDs: FD05 (floor_id → floor_number, layout_image, building_id) FD06 ((building_id, floor_number) → floor_id) Candidate keys: {floor_id}, {building_id, floor_number} PK: {floor_id} }}} * **Remaining relation R5:** R4 minus the dependent attributes, PK unchanged, FK `floor_id → R_Floor`. * **Lossless:** Yes – `floor_id` is key of `R_Floor`. * **Dependency preservation:** FD05 and FD06 are in `R_Floor`. ==== 1.7 Extract `building_id → building_name, building_address, building_description, admin_id` (FD04) ==== In R5, `building_id` is a non-key attribute depending on `floor_id` — a 2NF violation. * **From R5**, extract: {{{ R_Building( building_id PK, building_name, building_address, building_description, admin_id ) FDs: FD04 (building_id → building_name, building_address, building_description, admin_id) Candidate key / PK: {building_id} }}} * **Remaining relation R6:** R5 minus the dependent attributes, PK unchanged, FK `building_id → R_Building`. * **Lossless:** Yes – `building_id` is key of `R_Building`. * **Dependency preservation:** FD04 is in `R_Building`. ==== 1.8 Extract `admin_id → admin_name, admin_email, admin_password` (FD01, FD02) ==== In R6, `admin_id` is a non-key attribute depending on `building_id` — a 2NF violation. * **From R6**, extract: {{{ R_Admin( admin_id PK, admin_name, admin_email UNIQUE, admin_password ) FDs: FD01 (admin_id → admin_name, admin_email, admin_password) FD02 (admin_email → admin_id) Candidate keys: {admin_id}, {admin_email} PK: {admin_id} }}} * **Remaining relation R7:** R6 minus the dependent attributes, PK unchanged, FK `admin_id → R_Admin`. * **Lossless:** Yes – `admin_id` is key of `R_Admin`. * **Dependency preservation:** FD01 and FD02 are in `R_Admin`. ==== 1.9 Extract `agent_id → agent_name, agent_email, agent_password` (FD09, FD10) ==== `agent_id` appears as a non-key attribute in both R_Inquiry (step 1.2) and R_Timeslot (step 1.4). At the point of R_Inquiry's extraction, agent_id became a non-key attribute in that relation with FD09 applying to it — a partial dependency on inquiry_id. We extract the agent entity from R_Inquiry: {{{ R_Agent( agent_id PK, agent_name, agent_email UNIQUE, agent_password ) FDs: FD09 (agent_id → agent_name, agent_email, agent_password) FD10 (agent_email → agent_id) Candidate keys: {agent_id}, {agent_email} PK: {agent_id} }}} R_Inquiry and R_Timeslot both retain `agent_id` as FK → R_Agent. * **Lossless:** Yes – `agent_id` is key of `R_Agent`. * **Dependency preservation:** FD09 and FD10 are in `R_Agent`. ==== 1.10 Extract `client_id → client_name, client_email, client_phone` (FD11, FD12) ==== `client_id` appears as a non-key attribute in both R_Inquiry (step 1.2) and R_Appointment (step 1.3). At the point of R_Inquiry's extraction, client_id became a non-key attribute with FD11 applying — a partial dependency on inquiry_id. We extract the client entity from R_Inquiry: {{{ R_Client( client_id PK, client_name, client_email UNIQUE, client_phone ) FDs: FD11 (client_id → client_name, client_email, client_phone) FD12 (client_email → client_id) Candidate keys: {client_id}, {client_email} PK: {client_id} }}} R_Inquiry and R_Appointment both retain `client_id` as FK → R_Client. * **Lossless:** Yes – `client_id` is key of `R_Client`. * **Dependency preservation:** FD11 and FD12 are in `R_Client`. ==== 1.11 Materialise the M:N relationship "Designs" ==== After all extractions, R7 still contains `architect_id` (part of the original PK) and `building_id` (now a FK via the floor→building chain). The architect-building relationship is M:N with no non-key attributes of its own. It is extracted as a pure association relation: {{{ R_Designs( architect_id PK, FK → R_Architect, building_id PK, FK → R_Building ) FDs: none beyond the composite PK Candidate key / PK: {architect_id, building_id} }}} After this extraction, the residual relation R8 contains only `{architect_id, inquiry_id, appointment_id}` — the original PK of the universal relation R. This residual has no non-key attributes of its own; it exists solely as a structural artifact of treating all entities as one flat relation. The three attributes are already individually covered as PKs or FKs in R_Architect, R_Inquiry, and R_Appointment respectively, and no domain rule requires tracking the specific combination of (architect, inquiry, appointment) as a meaningful association. R8 is therefore dissolved; all information and all constraints are captured by the 11 extracted relations. **Result after Step 1: 11 relations, all in 2NF.** Lossless join holds because every extraction used a determinant that is the key of the new relation. All functional dependencies FD01-FD17 are preserved locally within exactly one relation. === Decompose from 2NF to 3NF === Each relation from 2NF is checked below. ==== R_Admin ==== {{{ Attributes : admin_id, admin_name, admin_email, admin_password FDs : FD01 (admin_id → admin_name, admin_email, admin_password) FD02 (admin_email → admin_id) CKs : {admin_id}, {admin_email} PK : admin_id }}} Both FD01 and FD02 have superkeys on the left-hand side. There is no non-key attribute that determines another — admin_email is a full candidate key, not a "middle step" in a transitive chain. '''3NF. No decomposition needed.''' ==== R_Architect ==== {{{ Attributes : architect_id, architect_name FDs : FD03 (architect_id → architect_name) CKs / PK : {architect_id} }}} Only one non-key attribute exists. '''3NF. No decomposition needed.''' ==== R_Building ==== {{{ Attributes : building_id, building_name, building_address, building_description, admin_id FDs : FD04 (building_id → building_name, building_address, building_description, admin_id) CKs / PK : {building_id} }}} admin_id is a non-key attribute (FK to R_Admin). The question is whether admin_id determines any other attribute inside R_Building — it does not. admin_id → admin_name etc. is a fact about R_Admin, not about R_Building. No transitive chain exists within this relation. '''3NF. No decomposition needed.''' ==== R_Designs ==== {{{ Attributes : architect_id, building_id FDs : none beyond the composite PK CKs / PK : {architect_id, building_id} }}} No non-key attributes. '''3NF. No decomposition needed.''' ==== R_Floor ==== {{{ Attributes : floor_id, floor_number, layout_image, building_id FDs : FD05 (floor_id → floor_number, layout_image, building_id) FD06 ((building_id, floor_number) → floor_id) CKs : {floor_id}, {building_id, floor_number} PK : floor_id }}} building_id is a non-key attribute (FK to R_Building). FD06 uses (building_id, floor_number) together as a composite candidate key, which is a superkey. No transitive chain. '''3NF. No decomposition needed.''' ==== R_Unit ==== {{{ Attributes : unit_id, unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id FDs : FD07 (unit_id → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id) FD08 ((floor_id, unit_number) → unit_id) CKs : {unit_id}, {floor_id, unit_number} PK : unit_id }}} floor_id is a non-key attribute (FK to R_Floor). Within R_Unit, floor_id does not determine room_number, floor_area, or any other attribute on its own — those all require unit_id. FD08 (floor_id, unit_number) is a candidate key. No transitive chain. '''3NF. No decomposition needed.''' ==== R_Agent ==== {{{ Attributes : agent_id, agent_name, agent_email, agent_password FDs : FD09 (agent_id → agent_name, agent_email, agent_password) FD10 (agent_email → agent_id) CKs : {agent_id}, {agent_email} PK : agent_id }}} Same structure as R_Admin. agent_email is a candidate key, so FD10 does not introduce a transitive dependency — both attributes are superkeys. '''3NF. No decomposition needed.''' ==== R_Client ==== {{{ Attributes : client_id, client_name, client_email, client_phone FDs : FD11 (client_id → client_name, client_email, client_phone) FD12 (client_email → client_id) CKs : {client_id}, {client_email} PK : client_id }}} client_email is a candidate key, so FD12 is not a transitive dependency. '''3NF. No decomposition needed.''' ==== R_Timeslot ==== {{{ Attributes : timeslot_id, ts_date, ts_time_start, ts_time_end, ts_status, agent_id FDs : FD13 (timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id) FD14 ((agent_id, ts_date, ts_time_start) → timeslot_id) CKs : {timeslot_id}, {agent_id, ts_date, ts_time_start} PK : timeslot_id }}} agent_id is a non-key attribute (FK to R_Agent). The composite (agent_id, ts_date, ts_time_start) is a candidate key, so FD14 has a superkey. No transitive chain. '''3NF. No decomposition needed.''' ==== R_Inquiry ==== {{{ Attributes : inquiry_id, inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id FDs : FD15 (inquiry_id → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id) CK / PK : {inquiry_id} }}} unit_id, client_id, and agent_id are all non-key FK attributes. None of them determines any other attribute within R_Inquiry. All non-key attributes depend directly on inquiry_id. '''3NF. No decomposition needed.''' ==== R_Appointment ==== {{{ Attributes : appointment_id, appt_status, client_id, unit_id, timeslot_id FDs : FD16 (appointment_id → appt_status, client_id, unit_id, timeslot_id) FD17 (timeslot_id → appointment_id) CKs : {appointment_id}, {timeslot_id} PK : appointment_id }}} client_id and unit_id are FKs that determine nothing within R_Appointment. '''3NF. No decomposition needed.''' ---- '''No decomposition was required at the 3NF step.''' All 11 relations produced in Step 1 are already in 3NF. === Decompose from 3NF to BCNF === Check every FD in each relation: * R_Admin: FD01 (admin_id superkey), FD02 (admin_email superkey) – BCNF. * R_Architect: FD03 (architect_id superkey) – BCNF. * R_Building: FD04 (building_id superkey) – BCNF. * R_Designs: no non‑trivial FDs – BCNF. * R_Floor: FD05 (floor_id superkey), FD06 (composite CK superkey) – BCNF. * R_Unit: FD07 (unit_id superkey), FD08 (composite CK superkey) – BCNF. * R_Agent: FD09 (agent_id superkey), FD10 (agent_email superkey) – BCNF. * R_Client: FD11 (client_id superkey), FD12 (client_email superkey) – BCNF. * R_Timeslot: FD13 (timeslot_id superkey), FD14 (composite CK superkey) – BCNF. * R_Inquiry: FD15 (inquiry_id superkey) – BCNF. * R_Appointment: FD16 (appointment_id superkey), FD17 (timeslot_id superkey) – BCNF. '''All 11 relations are in BCNF. No decomposition needed.''' == Final Normalized Design & Discussion == === Final Relations === {{{ Admin( admin_id PK, admin_name, admin_email UNIQUE, admin_password ) Architect( architect_id PK, architect_name ) Building( building_id PK, building_name, building_address, building_description, admin_id FK → Admin ) Designs( architect_id PK, FK → Architect, building_id PK, FK → Building ) Floor( floor_id PK, floor_number, layout_image, building_id FK → Building, UNIQUE (building_id, floor_number) ) Unit( unit_id PK, unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id FK → Floor, UNIQUE (floor_id, unit_number) ) Agent( agent_id PK, agent_name, agent_email UNIQUE, agent_password ) Client( client_id PK, client_name, client_email UNIQUE, client_phone ) Timeslot( timeslot_id PK, ts_date, ts_time_start, ts_time_end, ts_status, agent_id FK → Agent, UNIQUE (agent_id, ts_date, ts_time_start) ) Inquiry( inquiry_id PK, inquiry_message, inquiry_status, inquiry_created_at, unit_id FK → Unit, client_id FK → Client, agent_id FK → Agent ) Appointment( appointment_id PK, appt_status, client_id FK → Client, unit_id FK → Unit, timeslot_id FK → Timeslot, UNIQUE ) }}} === Differences compared to Phase 2 === || '''Aspect''' || '''Phase 2 Design''' || '''P5 Normalized Design''' || || `floor.admin_id` || Present as FK column || Not present as it is reachable via floor→building→admin || || `unit.admin_id` || Present as FK column || Not present as it is reachable via unit→floor→building→admin || || `appointment.agent_id` || Present as FK column || Not present as it is reachable via appointment→timeslot→agent || || Highest normal form || 2NF (three transitive dependencies) || BCNF / 4NF || || Admin reachability || Direct FK on floor and unit || Via FK chain – no data loss, no redundancy || || Agent reachability || Direct FK on appointment || Via FK chain through timeslot || || Update anomaly risk || Changing building’s admin forced UPDATE on all floor & unit rows || Single UPDATE on `Building` row only || === Changes Required for Subsequent Phases === The P5 normalized design (BCNF/4NF) will be used for all following project phases. The Phase 2 DDL must be updated by: 1. Dropping column `admin_id` and its FK constraint from `floor`. 2. Dropping column `admin_id` and its FK constraint from `unit`. 3. Dropping column `agent_id` and its FK constraint from `appointment`. All application queries that previously joined directly on those redundant columns must instead go trough the FK chain (e.g., `JOIN Floor → Building → Admin`).