wiki:Normalization

Phase P5: Normalization

Initial de-normalized relation and functional dependencies

Universal Relation R

All attributes from the ER model are merged into one flat relation. Attribute names are changed where the same concept appears in multiple entities.

R(
  admin_id, admin_name, admin_email, admin_password,
  architect_id, architect_name,
  building_id, building_name, building_address, building_description,
  floor_id, floor_number, layout_image,
  unit_id, unit_number, room_number, floor_area, unit_status,
          unit_price, unit_image, unit_floorplan, vector_image,
  agent_id, agent_name, agent_email, agent_password,
  client_id, client_name, client_email, client_phone,
  timeslot_id, ts_date, ts_time_start, ts_time_end, ts_status,
  inquiry_id, inquiry_message, inquiry_status, inquiry_created_at,
  appointment_id, appt_status
)

Canonical Cover

FD01: admin_id                        → admin_name, admin_email, admin_password
FD02: admin_email                     → admin_id
FD03: architect_id                    → architect_name
FD04: building_id                     → building_name, building_address, building_description, admin_id
FD05: floor_id                        → floor_number, layout_image, building_id
FD06: (building_id, floor_number)     → floor_id
FD07: unit_id                         → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id
FD08: (floor_id, unit_number)         → unit_id
FD09: agent_id                        → agent_name, agent_email, agent_password
FD10: agent_email                     → agent_id
FD11: client_id                       → client_name, client_email, client_phone
FD12: client_email                    → client_id
FD13: timeslot_id                     → ts_date, ts_time_start, ts_time_end, ts_status, agent_id
FD14: (agent_id, ts_date, ts_time_start) → timeslot_id
FD15: inquiry_id                      → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id
FD16: appointment_id                  → appt_status, client_id, unit_id, timeslot_id
FD17: timeslot_id                     → appointment_id

Candidate Keys & Primary Key of R

Identifying the Candidate Key

Before we start the aim of a candidate key is to satisfy the following requirements:

  • The candidate key closure must be able to cover every attribute in the relation R.
  • And for the key to be minimal, we only keep ones that will be able to satisfy the above mentioned rule.

The starting point would be finding every attributes that are never found on the right side in any of the functional dependencies.

This means that these attributes don’t get determined by any other. So due to this we need to include them always in the candidate keys.

  architect_id  — nothing in Fc produces it
  inquiry_id    — nothing in Fc produces it
  appointment_id — appears on the RHS of FD17 (timeslot_id → appointment_id),
                   but timeslot_id itself appears on the RHS of FD16
                   (appointment_id → timeslot_id), meaning the two mutually
                   determine each other. Neither can be derived from anything
                   outside that pair, so one of them must be in every key.

This gives us the base set to test: {architect_id, inquiry_id, appointment_id}.

Testing the coverage of this candidate key results in the complete cover of all attributes in R {architect_id, inquiry_id, appointment_id} is a superkey.

If we try to remove each to get a minimal candidate key the following happens:

By removing architect_id we lose the ability to reach architect_name which is only reachable via FD03 and that requires an architect_id.

By removing inquiry_id we lose the ability to reach inquiry_message, inquiry_status, inquiry_created_at which are only reachable via FD15.

By removing appointment_id we lose the ability to reach multiple attributes (appt_status, timeslot_id, ts_date, ts_time_start, ts_time_end, ts_status).

Another viable candidate key is {architect_id, inquiry_id, timeslot_id}:

From the closure test above we know appointment_id is required to reach appt_status and the timeslot chain. However, appointment_id and timeslot_id mutually determine each other:

  FD16: appointment_id → timeslot_id
  FD17: timeslot_id    → appointment_id

This means timeslot_id can substitute for appointment_id. Testing the closure of {architect_id, inquiry_id, timeslot_id} results in all attributes being covered.

There arent any other substitutable attributes left, so that leaves us with the 2 candidate keys we tested.

A candidate key must functionally determine all attributes in R and be minimal.

Candidate Keys:

  • {architect_id, inquiry_id, appointment_id}
  • {architect_id, inquiry_id, timeslot_id}

Primary Key: {architect_id, inquiry_id, appointment_id}

Normal Form of R Before Decomposition

R is in 1NF but not in 2NF.

  • 1NF: All attributes hold atomic scalar values, and there are no repeating groups. Every row is uniquely identified by the PK. R is in 1NF.
  • 2NF: A relation is in 2NF if it is in 1NF and every non‑key attribute is fully functionally dependent on the entire primary key. R does not conform to 2NF.

Example of a 2NF violation: architect_name is determined by architect_id (FD03), which is only a proper subset of the primary key {architect_id, inquiry_id, appointment_id}. If an architect is associated with multiple appointments, their name is repeated unnecessarily which is a clear partial dependency.


Step-by-step Decomposition

Decompose from 1NF to 2NF

1.1 Extract architect_id → architect_name (FD03)

  • FD causing issue: FD03, where architect_id ⊂ PK.
  • New relation:
    R_Architect( architect_id PK, architect_name )
    FDs: FD03 (architect_id → architect_name)
    Candidate key / PK: {architect_id}
    
  • Remaining relation R1: R minus architect_name, with PK unchanged {architect_id, inquiry_id, appointment_id} and FK architect_id → R_Architect.
  • Lossless: Yes – architect_id is the key of R_Architect, natural join on architect_id reconstructs R.
  • Dependency preservation: FD03 is wholly inside R_Architect; all other FDs remain in R1.

1.2 Extract inquiry_id → inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id (FD15)

  • FD causing issue: FD15, inquiry_id ⊂ PK.
  • From R1, extract:
    R_Inquiry(
      inquiry_id   PK,
      inquiry_message,
      inquiry_status,
      inquiry_created_at,
      unit_id,
      client_id,
      agent_id
    )
    FDs: FD15 (inquiry_id → inquiry_message, inquiry_status,
                             inquiry_created_at, unit_id, client_id, agent_id)
    Candidate key / PK: {inquiry_id}
    
  • Remaining relation R2: R1 minus the dependent attributes {inquiry_message, inquiry_status, inquiry_created_at, unit_id, client_id, agent_id}, PK unchanged, FK inquiry_id → R_Inquiry.
  • Lossless: Yes – inquiry_id is key of R_Inquiry.
  • Dependency preservation: FD15 is in R_Inquiry.

1.3 Extract appointment_id → appt_status, client_id, unit_id, timeslot_id (FD16)

  • FD causing issue: FD16, appointment_id ⊂ PK.
  • From R2, extract:
    R_Appointment(
      appointment_id  PK,
      appt_status,
      client_id,
      unit_id,
      timeslot_id
    )
    FDs: FD16 (appointment_id → appt_status, client_id, unit_id, timeslot_id)
         FD17 (timeslot_id → appointment_id)
    Candidate keys: {appointment_id}, {timeslot_id}
    PK: {appointment_id}
    
  • Remaining relation R3: R2 minus the dependent attributes {appt_status, client_id, unit_id, timeslot_id}, PK unchanged, FK appointment_id → R_Appointment. Note: timeslot_id moves entirely into R_Appointment and does NOT remain in R3; it is a dependent attribute of appointment_id, not a property of the residual relation.
  • Lossless: Yes – appointment_id is key of R_Appointment.
  • Dependency preservation: FD16 and FD17 are in R_Appointment.

1.4 Extract timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id (FD13)

timeslot_id is the PK of R_Appointment, and FD13 shows it determines further attributes (ts_date, ts_time_start, ts_time_end, ts_status, agent_id) that were placed in R_Appointment in step 1.3 only as part of the dependent set of FD16. Those timeslot attributes are themselves determined by timeslot_id via FD13 — a partial dependency on timeslot_id (a candidate key of R_Appointment, not the chosen PK) — so they must be extracted into their own relation.

  • From R_Appointment, extract:
    R_Timeslot(
      timeslot_id  PK,
      ts_date,
      ts_time_start,
      ts_time_end,
      ts_status,
      agent_id
    )
    FDs: FD13 (timeslot_id → ts_date, ts_time_start, ts_time_end, ts_status, agent_id)
         FD14 ((agent_id, ts_date, ts_time_start) → timeslot_id)
    Candidate keys: {timeslot_id}, {agent_id, ts_date, ts_time_start}
    PK: {timeslot_id}
    
  • R_Appointment updated: timeslot attributes removed; timeslot_id is retained as FK → R_Timeslot. R_Appointment now holds:
    R_Appointment(
      appointment_id  PK,
      appt_status,
      client_id,
      unit_id,
      timeslot_id     FK → R_Timeslot
    )
    
  • Lossless: Yes – timeslot_id is key of R_Timeslot.
  • Dependency preservation: FD13 and FD14 are in R_Timeslot.

1.5 Extract unit_id → unit_number, room_number, floor_area, unit_status, unit_price, unit_image, unit_floorplan, vector_image, floor_id (FD07)

In R3, unit_id is a non-key attribute (it depends on inquiry_id and appointment_id, both proper subsets of the PK). Its dependent attributes cause a 2NF violation.

  • From R3, extract:
    R_Unit(
      unit_id  PK,
      unit_number,
      room_number,
      floor_area,
      unit_status,
      unit_price,
      unit_image,
      unit_floorplan,
      vector_image,
      floor_id
    )
    FDs: FD07 (unit_id → unit_number, room_number, floor_area, unit_status,
                          unit_price, unit_image, unit_floorplan, vector_image, floor_id)
         FD08 ((floor_id, unit_number) → unit_id)
    Candidate keys: {unit_id}, {floor_id, unit_number}
    PK: {unit_id}
    
  • Remaining relation R4: R3 minus the dependent attributes, PK unchanged, FK unit_id → R_Unit.
  • Lossless: Yes – unit_id is key of R_Unit.
  • Dependency preservation: FD07 and FD08 are in R_Unit.

1.6 Extract floor_id → floor_number, layout_image, building_id (FD05)

In R4, floor_id is a non-key attribute, depending on unit_id which is itself a partial determinant — a 2NF violation.

  • From R4, extract:
    R_Floor(
      floor_id  PK,
      floor_number,
      layout_image,
      building_id
    )
    FDs: FD05 (floor_id → floor_number, layout_image, building_id)
         FD06 ((building_id, floor_number) → floor_id)
    Candidate keys: {floor_id}, {building_id, floor_number}
    PK: {floor_id}
    
  • Remaining relation R5: R4 minus the dependent attributes, PK unchanged, FK floor_id → R_Floor.
  • Lossless: Yes – floor_id is key of R_Floor.
  • Dependency preservation: FD05 and FD06 are in R_Floor.

1.7 Extract building_id → building_name, building_address, building_description, admin_id (FD04)

In R5, building_id is a non-key attribute depending on floor_id — a 2NF violation.

  • From R5, extract:
    R_Building(
      building_id  PK,
      building_name,
      building_address,
      building_description,
      admin_id
    )
    FDs: FD04 (building_id → building_name, building_address,
                             building_description, admin_id)
    Candidate key / PK: {building_id}
    
  • Remaining relation R6: R5 minus the dependent attributes, PK unchanged, FK building_id → R_Building.
  • Lossless: Yes – building_id is key of R_Building.
  • Dependency preservation: FD04 is in R_Building.

1.8 Extract admin_id → admin_name, admin_email, admin_password (FD01, FD02)

In R6, admin_id is a non-key attribute depending on building_id — a 2NF violation.

  • From R6, extract:
    R_Admin(
      admin_id  PK,
      admin_name,
      admin_email UNIQUE,
      admin_password
    )
    FDs: FD01 (admin_id → admin_name, admin_email, admin_password)
         FD02 (admin_email → admin_id)
    Candidate keys: {admin_id}, {admin_email}
    PK: {admin_id}
    
  • Remaining relation R7: R6 minus the dependent attributes, PK unchanged, FK admin_id → R_Admin.
  • Lossless: Yes – admin_id is key of R_Admin.
  • Dependency preservation: FD01 and FD02 are in R_Admin.

1.9 Extract agent_id → agent_name, agent_email, agent_password (FD09, FD10)

agent_id appears as a non-key attribute in both R_Inquiry (step 1.2) and R_Timeslot (step 1.4). At the point of R_Inquiry's extraction, agent_id became a non-key attribute in that relation with FD09 applying to it — a partial dependency on inquiry_id. We extract the agent entity from R_Inquiry:

R_Agent(
  agent_id  PK,
  agent_name,
  agent_email UNIQUE,
  agent_password
)
FDs: FD09 (agent_id → agent_name, agent_email, agent_password)
     FD10 (agent_email → agent_id)
Candidate keys: {agent_id}, {agent_email}
PK: {agent_id}

R_Inquiry and R_Timeslot both retain agent_id as FK → R_Agent.

  • Lossless: Yes – agent_id is key of R_Agent.
  • Dependency preservation: FD09 and FD10 are in R_Agent.

1.10 Extract client_id → client_name, client_email, client_phone (FD11, FD12)

client_id appears as a non-key attribute in both R_Inquiry (step 1.2) and R_Appointment (step 1.3). At the point of R_Inquiry's extraction, client_id became a non-key attribute with FD11 applying — a partial dependency on inquiry_id. We extract the client entity from R_Inquiry:

R_Client(
  client_id  PK,
  client_name,
  client_email UNIQUE,
  client_phone
)
FDs: FD11 (client_id → client_name, client_email, client_phone)
     FD12 (client_email → client_id)
Candidate keys: {client_id}, {client_email}
PK: {client_id}

R_Inquiry and R_Appointment both retain client_id as FK → R_Client.

  • Lossless: Yes – client_id is key of R_Client.
  • Dependency preservation: FD11 and FD12 are in R_Client.

1.11 Materialise the M:N relationship "Designs"

After all extractions, R7 still contains architect_id (part of the original PK) and building_id (now a FK via the floor→building chain). The architect-building relationship is M:N with no non-key attributes of its own. It is extracted as a pure association relation:

R_Designs(
  architect_id  PK, FK → R_Architect,
  building_id   PK, FK → R_Building
)
FDs: none beyond the composite PK
Candidate key / PK: {architect_id, building_id}

After this extraction, the residual relation R8 contains only {architect_id, inquiry_id, appointment_id} — the original PK of the universal relation R. This residual has no non-key attributes of its own; it exists solely as a structural artifact of treating all entities as one flat relation. The three attributes are already individually covered as PKs or FKs in R_Architect, R_Inquiry, and R_Appointment respectively, and no domain rule requires tracking the specific combination of (architect, inquiry, appointment) as a meaningful association. R8 is therefore dissolved; all information and all constraints are captured by the 11 extracted relations.

Result after Step 1: 11 relations, all in 2NF. Lossless join holds because every extraction used a determinant that is the key of the new relation. All functional dependencies FD01-FD17 are preserved locally within exactly one relation.

Decompose from 2NF to 3NF

Each relation from 2NF is checked below.

R_Admin

Attributes : admin_id, admin_name, admin_email, admin_password
FDs        : FD01 (admin_id → admin_name, admin_email, admin_password)
             FD02 (admin_email → admin_id)
CKs        : {admin_id}, {admin_email}
PK         : admin_id

Both FD01 and FD02 have superkeys on the left-hand side. There is no non-key attribute that determines another — admin_email is a full candidate key, not a "middle step" in a transitive chain. 3NF. No decomposition needed.

R_Architect

Attributes : architect_id, architect_name
FDs        : FD03 (architect_id → architect_name)
CKs / PK   : {architect_id}

Only one non-key attribute exists. 3NF. No decomposition needed.

R_Building

Attributes : building_id, building_name, building_address,
             building_description, admin_id
FDs        : FD04 (building_id → building_name, building_address,
                                 building_description, admin_id)
CKs / PK   : {building_id}

admin_id is a non-key attribute (FK to R_Admin). The question is whether admin_id determines any other attribute inside R_Building — it does not. admin_id → admin_name etc. is a fact about R_Admin, not about R_Building. No transitive chain exists within this relation. 3NF. No decomposition needed.

R_Designs

Attributes : architect_id, building_id
FDs        : none beyond the composite PK
CKs / PK   : {architect_id, building_id}

No non-key attributes. 3NF. No decomposition needed.

R_Floor

Attributes : floor_id, floor_number, layout_image, building_id
FDs        : FD05 (floor_id → floor_number, layout_image, building_id)
             FD06 ((building_id, floor_number) → floor_id)
CKs        : {floor_id}, {building_id, floor_number}
PK         : floor_id

building_id is a non-key attribute (FK to R_Building). FD06 uses (building_id, floor_number) together as a composite candidate key, which is a superkey. No transitive chain. 3NF. No decomposition needed.

R_Unit

Attributes : unit_id, unit_number, room_number, floor_area, unit_status,
             unit_price, unit_image, unit_floorplan, vector_image, floor_id
FDs        : FD07 (unit_id → unit_number, room_number, floor_area,
                              unit_status, unit_price, unit_image,
                              unit_floorplan, vector_image, floor_id)
             FD08 ((floor_id, unit_number) → unit_id)
CKs        : {unit_id}, {floor_id, unit_number}
PK         : unit_id

floor_id is a non-key attribute (FK to R_Floor). Within R_Unit, floor_id does not determine room_number, floor_area, or any other attribute on its own — those all require unit_id. FD08 (floor_id, unit_number) is a candidate key. No transitive chain. 3NF. No decomposition needed.

R_Agent

Attributes : agent_id, agent_name, agent_email, agent_password
FDs        : FD09 (agent_id → agent_name, agent_email, agent_password)
             FD10 (agent_email → agent_id)
CKs        : {agent_id}, {agent_email}
PK         : agent_id

Same structure as R_Admin. agent_email is a candidate key, so FD10 does not introduce a transitive dependency — both attributes are superkeys. 3NF. No decomposition needed.

R_Client

Attributes : client_id, client_name, client_email, client_phone
FDs        : FD11 (client_id → client_name, client_email, client_phone)
             FD12 (client_email → client_id)
CKs        : {client_id}, {client_email}
PK         : client_id

client_email is a candidate key, so FD12 is not a transitive dependency. 3NF. No decomposition needed.

R_Timeslot

Attributes : timeslot_id, ts_date, ts_time_start, ts_time_end,
             ts_status, agent_id
FDs        : FD13 (timeslot_id → ts_date, ts_time_start, ts_time_end,
                                  ts_status, agent_id)
             FD14 ((agent_id, ts_date, ts_time_start) → timeslot_id)
CKs        : {timeslot_id}, {agent_id, ts_date, ts_time_start}
PK         : timeslot_id

agent_id is a non-key attribute (FK to R_Agent). The composite (agent_id, ts_date, ts_time_start) is a candidate key, so FD14 has a superkey. No transitive chain. 3NF. No decomposition needed.

R_Inquiry

Attributes : inquiry_id, inquiry_message, inquiry_status,
             inquiry_created_at, unit_id, client_id, agent_id
FDs        : FD15 (inquiry_id → inquiry_message, inquiry_status,
                                inquiry_created_at, unit_id, client_id, agent_id)
CK / PK   : {inquiry_id}

unit_id, client_id, and agent_id are all non-key FK attributes. None of them determines any other attribute within R_Inquiry. All non-key attributes depend directly on inquiry_id. 3NF. No decomposition needed.

R_Appointment

Attributes : appointment_id, appt_status, client_id, unit_id, timeslot_id
FDs        : FD16 (appointment_id → appt_status, client_id, unit_id, timeslot_id)
             FD17 (timeslot_id → appointment_id)
CKs        : {appointment_id}, {timeslot_id}
PK         : appointment_id

client_id and unit_id are FKs that determine nothing within R_Appointment. 3NF. No decomposition needed.


No decomposition was required at the 3NF step. All 11 relations produced in Step 1 are already in 3NF.

Decompose from 3NF to BCNF

Check every FD in each relation:

  • R_Admin: FD01 (admin_id superkey), FD02 (admin_email superkey) – BCNF.
  • R_Architect: FD03 (architect_id superkey) – BCNF.
  • R_Building: FD04 (building_id superkey) – BCNF.
  • R_Designs: no non‑trivial FDs – BCNF.
  • R_Floor: FD05 (floor_id superkey), FD06 (composite CK superkey) – BCNF.
  • R_Unit: FD07 (unit_id superkey), FD08 (composite CK superkey) – BCNF.
  • R_Agent: FD09 (agent_id superkey), FD10 (agent_email superkey) – BCNF.
  • R_Client: FD11 (client_id superkey), FD12 (client_email superkey) – BCNF.
  • R_Timeslot: FD13 (timeslot_id superkey), FD14 (composite CK superkey) – BCNF.
  • R_Inquiry: FD15 (inquiry_id superkey) – BCNF.
  • R_Appointment: FD16 (appointment_id superkey), FD17 (timeslot_id superkey) – BCNF.

All 11 relations are in BCNF. No decomposition needed.

Final Normalized Design & Discussion

Final Relations

Admin(
  admin_id     PK,
  admin_name,
  admin_email  UNIQUE,
  admin_password
)

Architect(
  architect_id  PK,
  architect_name
)

Building(
  building_id           PK,
  building_name,
  building_address,
  building_description,
  admin_id              FK → Admin
)

Designs(
  architect_id  PK, FK → Architect,
  building_id   PK, FK → Building
)

Floor(
  floor_id      PK,
  floor_number,
  layout_image,
  building_id   FK → Building,
  UNIQUE (building_id, floor_number)
)

Unit(
  unit_id       PK,
  unit_number,
  room_number,
  floor_area,
  unit_status,
  unit_price,
  unit_image,
  unit_floorplan,
  vector_image,
  floor_id      FK → Floor,
  UNIQUE (floor_id, unit_number)
)

Agent(
  agent_id      PK,
  agent_name,
  agent_email   UNIQUE,
  agent_password
)

Client(
  client_id     PK,
  client_name,
  client_email  UNIQUE,
  client_phone
)

Timeslot(
  timeslot_id   PK,
  ts_date,
  ts_time_start,
  ts_time_end,
  ts_status,
  agent_id      FK → Agent,
  UNIQUE (agent_id, ts_date, ts_time_start)
)

Inquiry(
  inquiry_id        PK,
  inquiry_message,
  inquiry_status,
  inquiry_created_at,
  unit_id           FK → Unit,
  client_id         FK → Client,
  agent_id          FK → Agent
)

Appointment(
  appointment_id  PK,
  appt_status,
  client_id       FK → Client,
  unit_id         FK → Unit,
  timeslot_id     FK → Timeslot,  UNIQUE
)

Differences compared to Phase 2

Aspect Phase 2 Design P5 Normalized Design
floor.admin_id Present as FK column Not present as it is reachable via floor→building→admin
unit.admin_id Present as FK column Not present as it is reachable via unit→floor→building→admin
appointment.agent_id Present as FK column Not present as it is reachable via appointment→timeslot→agent
Highest normal form 2NF (three transitive dependencies) BCNF / 4NF
Admin reachability Direct FK on floor and unit Via FK chain – no data loss, no redundancy
Agent reachability Direct FK on appointment Via FK chain through timeslot
Update anomaly risk Changing building’s admin forced UPDATE on all floor & unit rows Single UPDATE on Building row only

Changes Required for Subsequent Phases

The P5 normalized design (BCNF/4NF) will be used for all following project phases. The Phase 2 DDL must be updated by:

  1. Dropping column admin_id and its FK constraint from floor.
  2. Dropping column admin_id and its FK constraint from unit.
  3. Dropping column agent_id and its FK constraint from appointment.

All application queries that previously joined directly on those redundant columns must instead go trough the FK chain (e.g., JOIN Floor → Building → Admin).

Last modified 22 hours ago Last modified on 06/15/26 11:08:56
Note: See TracWiki for help on using the wiki.