wiki:Normalization

Phase P5: Normalization

Initial de-normalized relation and functional dependencies

Universal Relation R

All attributes from the ER model are merged into one flat relation. Attribute names are changed where the same concept appears in multiple entities.

R(
  admin_id, admin_name, admin_email, admin_password,
  architect_id, architect_name,
  building_id, building_name, building_address, building_description,
  floor_id, floor_number, layout_image,
  unit_id, unit_number, room_number, floor_area, unit_status,
          unit_price, unit_image, unit_floorplan, vector_image,
  agent_id, agent_name, agent_email, agent_password,
  client_id, client_name, client_email, client_phone,
  timeslot_id, ts_date, ts_time_start, ts_time_end, ts_status,
  inquiry_id, inquiry_message, inquiry_status, inquiry_created_at,
  appointment_id, appt_status
)

Canonical Cover

FD01: admin_id                        → admin_name, admin_email, admin_password
FD02: admin_email                     → admin_id
FD03: architect_id                    → architect_name
FD04: building_id                     → building_name, building_address, building_description, admin_id
FD05: floor_id                        → floor_number, layout_image, building_id
FD06: (building_id, floor_number)     → floor_id
FD07: unit_id                         → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id
FD08: (floor_id, unit_number)         → unit_id
FD09: agent_id                        → agent_name, agent_email, agent_password
FD10: agent_email                     → agent_id
FD11: client_id                       → client_name, client_email, client_phone
FD12: client_email                    → client_id
FD13: timeslot_id                     → ts_date, ts_time_start, ts_time_end, ts_status, agent_id
FD14: (agent_id, ts_date, ts_time_start) → timeslot_id
FD15: inquiry_id                      → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id
FD16: appointment_id                  → appt_status, client_id, unit_id, timeslot_id
FD17: timeslot_id                     → appointment_id

Candidate Keys & Primary Key of R

Identifying the Candidate Key

A candidate key must functionally determine all attributes in R and be minimal.

Candidate Keys:

  • {architect_id, inquiry_id, appointment_id}
  • {architect_id, inquiry_id, timeslot_id}

Primary Key: {architect_id, inquiry_id, appointment_id}

Normal Form of R Before Decomposition

R is in 1NF but not in 2NF.

  • 1NF: All attributes hold atomic scalar values, and there are no repeating groups. Every row is uniquely identified by the PK. R is in 1NF.
  • 2NF: A relation is in 2NF if it is in 1NF and every non‑key attribute is fully functionally dependent on the entire primary key. R does not conform to 2NF.

Example of a 2NF violation: architect_name is determined by architect_id (FD03), which is only a proper subset of the primary key {architect_id, inquiry_id, appointment_id}. If an architect is associated with multiple appointments, their name is repeated unnecessarily which is a clear partial dependency.


Step-by-step Decomposition

Decompose from 1NF to 2NF

1.1 Extract architect_id → architect_name (FD03)

  • FD causing issue: FD03, where architect_id ⊂ PK.
  • New relation:
    R_Architect( architect_id PK, architect_name )
    FDs: FD03 (architect_id → architect_name)
    Candidate key / PK: {architect_id}
    
  • Remaining relation R1: R minus architect_name, with PK unchanged {architect_id, inquiry_id, appointment_id} and FK architect_id → R_Architect.
  • Lossless: Yes – architect_id is the key of R_Architect, natural join on architect_id reconstructs R.
  • Dependency preservation: FD03 is wholly inside R_Architect; all other FDs remain in R1.

1.2 Extract inquiry_id → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id (FD15)

  • FD causing issue: FD15, inquiry_id ⊂ PK.
  • From R1, extract:
    R_Inquiry(
      inquiry_id   PK,
      inquiry_message,
      inquiry_status,
      inquiry_created_at,
      unit_id,
      client_id,
      agent_id
    )
    FDs: FD15 (inquiry_id → inquiry_message, inquiry_status,
                             inquiry_created_at, unit_id, client_id, agent_id)
    Candidate key / PK: {inquiry_id}
    
  • Remaining relation R2: R1 minus the dependent attributes {inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id}, PK unchanged, FK inquiry_id → R_Inquiry.
  • Lossless: Yes – inquiry_id is key of R_Inquiry.
  • Dependency preservation: FD15 is in R_Inquiry.

1.3 Extract appointment_id → appt_status, client_id, unit_id, timeslot_id (FD16)

  • FD causing issue: FD16, appointment_id ⊂ PK.
  • From R2, extract:
    R_Appointment(
      appointment_id  PK,
      appt_status,
      client_id,
      unit_id,
      timeslot_id
    )
    FDs: FD16 (appointment_id → appt_status, client_id, unit_id, timeslot_id)
         FD17 (timeslot_id → appointment_id)
    Candidate keys: {appointment_id}, {timeslot_id}
    PK: {appointment_id}
    
  • Remaining relation R3: R2 minus the dependent attributes {appt_status, client_id, unit_id, timeslot_id}, PK unchanged, FK appointment_id → R_Appointment. Note: timeslot_id moves entirely into R_Appointment and does NOT remain in R3; it is a dependent attribute of appointment_id, not a property of the residual relation.
  • Lossless: Yes – appointment_id is key of R_Appointment.
  • Dependency preservation: FD16 and FD17 are in R_Appointment.

1.4 Extract timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id (FD13)

timeslot_id is the PK of R_Appointment, and FD13 shows it determines further attributes (ts_date, ts_time_start, ts_time_end, ts_status, agent_id) that were placed in R_Appointment in step 1.3 only as part of the dependent set of FD16. Those timeslot attributes are themselves determined by timeslot_id via FD13 — a partial dependency on timeslot_id (a candidate key of R_Appointment, not the chosen PK) — so they must be extracted into their own relation.

  • From R_Appointment, extract:
    R_Timeslot(
      timeslot_id  PK,
      ts_date,
      ts_time_start,
      ts_time_end,
      ts_status,
      agent_id
    )
    FDs: FD13 (timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id)
         FD14 ((agent_id, ts_date, ts_time_start) → timeslot_id)
    Candidate keys: {timeslot_id}, {agent_id, ts_date, ts_time_start}
    PK: {timeslot_id}
    
  • R_Appointment updated: timeslot attributes removed; timeslot_id is retained as FK → R_Timeslot. R_Appointment now holds:
    R_Appointment(
      appointment_id  PK,
      appt_status,
      client_id,
      unit_id,
      timeslot_id     FK → R_Timeslot
    )
    
  • Lossless: Yes – timeslot_id is key of R_Timeslot.
  • Dependency preservation: FD13 and FD14 are in R_Timeslot.

1.5 Extract unit_id → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id (FD07)

In R3, unit_id is a non-key attribute (it depends on inquiry_id and appointment_id, both proper subsets of the PK). Its dependent attributes cause a 2NF violation.

  • From R3, extract:
    R_Unit(
      unit_id  PK,
      unit_number,
      room_number,
      floor_area,
      unit_status,
      unit_price,
      unit_image,
      unit_floorplan,
      vector_image,
      floor_id
    )
    FDs: FD07 (unit_id → unit_number, room_number, floor_area, unit_status,
                          unit_price, unit_image, unit_floorplan, vector_image, floor_id)
         FD08 ((floor_id, unit_number) → unit_id)
    Candidate keys: {unit_id}, {floor_id, unit_number}
    PK: {unit_id}
    
  • Remaining relation R4: R3 minus the dependent attributes, PK unchanged, FK unit_id → R_Unit.
  • Lossless: Yes – unit_id is key of R_Unit.
  • Dependency preservation: FD07 and FD08 are in R_Unit.

1.6 Extract floor_id → floor_number, layout_image, building_id (FD05)

In R4, floor_id is a non-key attribute, depending on unit_id which is itself a partial determinant — a 2NF violation.

  • From R4, extract:
    R_Floor(
      floor_id  PK,
      floor_number,
      layout_image,
      building_id
    )
    FDs: FD05 (floor_id → floor_number, layout_image, building_id)
         FD06 ((building_id, floor_number) → floor_id)
    Candidate keys: {floor_id}, {building_id, floor_number}
    PK: {floor_id}
    
  • Remaining relation R5: R4 minus the dependent attributes, PK unchanged, FK floor_id → R_Floor.
  • Lossless: Yes – floor_id is key of R_Floor.
  • Dependency preservation: FD05 and FD06 are in R_Floor.

1.7 Extract building_id → building_name, building_address, building_description, admin_id (FD04)

In R5, building_id is a non-key attribute depending on floor_id — a 2NF violation.

  • From R5, extract:
    R_Building(
      building_id  PK,
      building_name,
      building_address,
      building_description,
      admin_id
    )
    FDs: FD04 (building_id → building_name, building_address,
                             building_description, admin_id)
    Candidate key / PK: {building_id}
    
  • Remaining relation R6: R5 minus the dependent attributes, PK unchanged, FK building_id → R_Building.
  • Lossless: Yes – building_id is key of R_Building.
  • Dependency preservation: FD04 is in R_Building.

1.8 Extract admin_id → admin_name, admin_email, admin_password (FD01, FD02)

In R6, admin_id is a non-key attribute depending on building_id — a 2NF violation.

  • From R6, extract:
    R_Admin(
      admin_id  PK,
      admin_name,
      admin_email UNIQUE,
      admin_password
    )
    FDs: FD01 (admin_id → admin_name, admin_email, admin_password)
         FD02 (admin_email → admin_id)
    Candidate keys: {admin_id}, {admin_email}
    PK: {admin_id}
    
  • Remaining relation R7: R6 minus the dependent attributes, PK unchanged, FK admin_id → R_Admin.
  • Lossless: Yes – admin_id is key of R_Admin.
  • Dependency preservation: FD01 and FD02 are in R_Admin.

1.9 Extract agent_id → agent_name, agent_email, agent_password (FD09, FD10)

agent_id appears as a non-key attribute in both R_Inquiry (step 1.2) and R_Timeslot (step 1.4). At the point of R_Inquiry's extraction, agent_id became a non-key attribute in that relation with FD09 applying to it — a partial dependency on inquiry_id. We extract the agent entity from R_Inquiry:

R_Agent(
  agent_id  PK,
  agent_name,
  agent_email UNIQUE,
  agent_password
)
FDs: FD09 (agent_id → agent_name, agent_email, agent_password)
     FD10 (agent_email → agent_id)
Candidate keys: {agent_id}, {agent_email}
PK: {agent_id}

R_Inquiry and R_Timeslot both retain agent_id as FK → R_Agent.

  • Lossless: Yes – agent_id is key of R_Agent.
  • Dependency preservation: FD09 and FD10 are in R_Agent.

1.10 Extract client_id → client_name, client_email, client_phone (FD11, FD12)

client_id appears as a non-key attribute in both R_Inquiry (step 1.2) and R_Appointment (step 1.3). At the point of R_Inquiry's extraction, client_id became a non-key attribute with FD11 applying — a partial dependency on inquiry_id. We extract the client entity from R_Inquiry:

R_Client(
  client_id  PK,
  client_name,
  client_email UNIQUE,
  client_phone
)
FDs: FD11 (client_id → client_name, client_email, client_phone)
     FD12 (client_email → client_id)
Candidate keys: {client_id}, {client_email}
PK: {client_id}

R_Inquiry and R_Appointment both retain client_id as FK → R_Client.

  • Lossless: Yes – client_id is key of R_Client.
  • Dependency preservation: FD11 and FD12 are in R_Client.

1.11 Materialise the M:N relationship "Designs"

After all extractions, R7 still contains architect_id (part of the original PK) and building_id (now a FK via the floor→building chain). The architect-building relationship is M:N with no non-key attributes of its own. It is extracted as a pure association relation:

R_Designs(
  architect_id  PK, FK → R_Architect,
  building_id   PK, FK → R_Building
)
FDs: none beyond the composite PK
Candidate key / PK: {architect_id, building_id}

After this extraction, the residual relation R8 contains only {architect_id, inquiry_id, appointment_id} — the original PK of the universal relation R. This residual has no non-key attributes of its own; it exists solely as a structural artifact of treating all entities as one flat relation. The three attributes are already individually covered as PKs or FKs in R_Architect, R_Inquiry, and R_Appointment respectively, and no domain rule requires tracking the specific combination of (architect, inquiry, appointment) as a meaningful association. R8 is therefore dissolved; all information and all constraints are captured by the 11 extracted relations.

Result after Step 1: 11 relations, all in 2NF. Lossless join holds because every extraction used a determinant that is the key of the new relation. All functional dependencies FD01-FD17 are preserved locally within exactly one relation.

Decompose from 2NF to 3NF

Each relation from Step 1 is checked below.

R_Admin

Attributes : admin_id, admin_name, admin_email, admin_password
FDs        : FD01 (admin_id → admin_name, admin_email, admin_password)
             FD02 (admin_email → admin_id)
CKs        : {admin_id}, {admin_email}
PK         : admin_id

Both FD01 and FD02 have superkeys on the left-hand side. There is no non-key attribute that determines another — admin_email is a full candidate key, not a "middle step" in a transitive chain. 3NF. No decomposition needed.

R_Architect

Attributes : architect_id, architect_name
FDs        : FD03 (architect_id → architect_name)
CKs / PK   : {architect_id}

Only one non-key attribute exists. 3NF. No decomposition needed.

R_Building

Attributes : building_id, building_name, building_address,
             building_description, admin_id
FDs        : FD04 (building_id → building_name, building_address,
                                 building_description, admin_id)
CKs / PK   : {building_id}

admin_id is a non-key attribute (FK to R_Admin). The question is whether admin_id determines any other attribute inside R_Building — it does not. admin_id → admin_name etc. is a fact about R_Admin, not about R_Building. No transitive chain exists within this relation. 3NF. No decomposition needed.

R_Designs

Attributes : architect_id, building_id
FDs        : none beyond the composite PK
CKs / PK   : {architect_id, building_id}

No non-key attributes. 3NF. No decomposition needed.

R_Floor

Attributes : floor_id, floor_number, layout_image, building_id
FDs        : FD05 (floor_id → floor_number, layout_image, building_id)
             FD06 ((building_id, floor_number) → floor_id)
CKs        : {floor_id}, {building_id, floor_number}
PK         : floor_id

building_id is a non-key attribute (FK to R_Building). FD06 uses (building_id, floor_number) together as a composite candidate key, which is a superkey. No transitive chain. 3NF. No decomposition needed.

R_Unit

Attributes : unit_id, unit_number, room_number, floor_area, unit_status,
             unit_price, unit_image, unit_floorplan, vector_image, floor_id
FDs        : FD07 (unit_id → unit_number, room_number, floor_area,
                              unit_status, unit_price, unit_image,
                              unit_floorplan, vector_image, floor_id)
             FD08 ((floor_id, unit_number) → unit_id)
CKs        : {unit_id}, {floor_id, unit_number}
PK         : unit_id

floor_id is a non-key attribute (FK to R_Floor). Within R_Unit, floor_id does not determine room_number, floor_area, or any other attribute on its own — those all require unit_id. FD08 (floor_id, unit_number) is a candidate key. No transitive chain. 3NF. No decomposition needed.

R_Agent

Attributes : agent_id, agent_name, agent_email, agent_password
FDs        : FD09 (agent_id → agent_name, agent_email, agent_password)
             FD10 (agent_email → agent_id)
CKs        : {agent_id}, {agent_email}
PK         : agent_id

Same structure as R_Admin. agent_email is a candidate key, so FD10 does not introduce a transitive dependency — both attributes are superkeys. 3NF. No decomposition needed.

R_Client

Attributes : client_id, client_name, client_email, client_phone
FDs        : FD11 (client_id → client_name, client_email, client_phone)
             FD12 (client_email → client_id)
CKs        : {client_id}, {client_email}
PK         : client_id

client_email is a candidate key, so FD12 is not a transitive dependency. 3NF. No decomposition needed.

R_Timeslot

Attributes : timeslot_id, ts_date, ts_time_start, ts_time_end,
             ts_status, agent_id
FDs        : FD13 (timeslot_id → ts_date, ts_time_start, ts_time_end,
                                  ts_status, agent_id)
             FD14 ((agent_id, ts_date, ts_time_start) → timeslot_id)
CKs        : {timeslot_id}, {agent_id, ts_date, ts_time_start}
PK         : timeslot_id

agent_id is a non-key attribute (FK to R_Agent). The composite (agent_id, ts_date, ts_time_start) is a candidate key, so FD14 has a superkey. No transitive chain. 3NF. No decomposition needed.

R_Inquiry

Attributes : inquiry_id, inquiry_message, inquiry_status,
             inquiry_created_at, unit_id, client_id, agent_id
FDs        : FD15 (inquiry_id → inquiry_message, inquiry_status,
                                inquiry_created_at, unit_id, client_id, agent_id)
CK / PK   : {inquiry_id}

unit_id, client_id, and agent_id are all non-key FK attributes. None of them determines any other attribute within R_Inquiry. All non-key attributes depend directly on inquiry_id. 3NF. No decomposition needed.

R_Appointment

Attributes : appointment_id, appt_status, client_id, unit_id, timeslot_id
FDs        : FD16 (appointment_id → appt_status, client_id, unit_id, timeslot_id)
             FD17 (timeslot_id → appointment_id)
CKs        : {appointment_id}, {timeslot_id}
PK         : appointment_id

client_id and unit_id are FKs that determine nothing within R_Appointment. 3NF. No decomposition needed.


No decomposition was required at the 3NF step. All 11 relations produced in Step 1 are already in 3NF.

STEP 3 — Decompose from 3NF to BCNF

Check every FD in each relation:

  • R_Admin: FD01 (admin_id superkey), FD02 (admin_email superkey) – BCNF.
  • R_Architect: FD03 (architect_id superkey) – BCNF.
  • R_Building: FD04 (building_id superkey) – BCNF.
  • R_Designs: no non‑trivial FDs – BCNF.
  • R_Floor: FD05 (floor_id superkey), FD06 (composite CK superkey) – BCNF.
  • R_Unit: FD07 (unit_id superkey), FD08 (composite CK superkey) – BCNF.
  • R_Agent: FD09 (agent_id superkey), FD10 (agent_email superkey) – BCNF.
  • R_Client: FD11 (client_id superkey), FD12 (client_email superkey) – BCNF.
  • R_Timeslot: FD13 (timeslot_id superkey), FD14 (composite CK superkey) – BCNF.
  • R_Inquiry: FD15 (inquiry_id superkey) – BCNF.
  • R_Appointment: FD16 (appointment_id superkey), FD17 (timeslot_id superkey) – BCNF.

All 11 relations are in BCNF. No decomposition needed.

4NF Note

No non‑trivial multi‑valued dependencies exist – every attribute is single‑valued per key. Therefore all relations are also in 4NF.

BCNF / 4NF is the highest normal form achieved.


Final Normalized Design & Discussion

Final Relations

Admin(
  admin_id     PK,
  admin_name,
  admin_email  UNIQUE,
  admin_password
)

Architect(
  architect_id  PK,
  architect_name
)

Building(
  building_id           PK,
  building_name,
  building_address,
  building_description,
  admin_id              FK → Admin
)

Designs(
  architect_id  PK, FK → Architect,
  building_id   PK, FK → Building
)

Floor(
  floor_id      PK,
  floor_number,
  layout_image,
  building_id   FK → Building,
  UNIQUE (building_id, floor_number)
)

Unit(
  unit_id       PK,
  unit_number,
  room_number,
  floor_area,
  unit_status,
  unit_price,
  unit_image,
  unit_floorplan,
  vector_image,
  floor_id      FK → Floor,
  UNIQUE (floor_id, unit_number)
)

Agent(
  agent_id      PK,
  agent_name,
  agent_email   UNIQUE,
  agent_password
)

Client(
  client_id     PK,
  client_name,
  client_email  UNIQUE,
  client_phone
)

Timeslot(
  timeslot_id   PK,
  ts_date,
  ts_time_start,
  ts_time_end,
  ts_status,
  agent_id      FK → Agent,
  UNIQUE (agent_id, ts_date, ts_time_start)
)

Inquiry(
  inquiry_id        PK,
  inquiry_message,
  inquiry_status,
  inquiry_created_at,
  unit_id           FK → Unit,
  client_id         FK → Client,
  agent_id          FK → Agent
)

Appointment(
  appointment_id  PK,
  appt_status,
  client_id       FK → Client,
  unit_id         FK → Unit,
  timeslot_id     FK → Timeslot,  UNIQUE
)

Differences vs. Phase 2 Design

Aspect Phase 2 Design P5 Normalized Design
floor.admin_id Present as FK column Not present as it is reachable via floor→building→admin
unit.admin_id Present as FK column Not present as it is reachable via unit→floor→building→admin
appointment.agent_id Present as FK column Not present as it is reachable via appointment→timeslot→agent
Highest normal form 2NF (three transitive dependencies) BCNF / 4NF
Admin reachability Direct FK on floor and unit Via FK chain – no data loss, no redundancy
Agent reachability Direct FK on appointment Via FK chain through timeslot
Update anomaly risk Changing building’s admin forced UPDATE on all floor & unit rows Single UPDATE on Building row only

Changes Required for Subsequent Phases

The P5 normalized design (BCNF/4NF) will be used for all following project phases. The Phase 2 DDL must be updated by:

  1. Dropping column admin_id and its FK constraint from floor.
  2. Dropping column admin_id and its FK constraint from unit.
  3. Dropping column agent_id and its FK constraint from appointment.

All application queries that previously joined directly on those redundant columns must instead go trough the FK chain (e.g., JOIN Floor → Building → Admin).

Last modified 7 hours ago Last modified on 05/25/26 22:42:41
Note: See TracWiki for help on using the wiki.