Changes between Version 4 and Version 5 of Normalization
- Timestamp:
- 03/10/26 00:09:48 (11 days ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
-
Normalization
v4 v5 1 1 = Normalization 2 2 3 == Initial de-normalized relation and functional dependencies 3 === Global set of attributes (single de-normalized relation) 4 5 === Global set of attributes 4 6 This represents the initial de-normalized relation before any normalization is applied. 5 7 Attribute names are made unique to avoid duplication. … … 31 33 } 32 34 }}} 35 33 36 === Functional dependencies (initial set) 37 34 38 User / specialization 35 39 {{{ … … 37 41 FD2: client_id → user_id, is_blocked, blocked_at, blocked_reason 38 42 FD3: admin_id → user_id 39 FD4: owner_id → client_id 40 }}} 43 FD4: owner_id → client_id 44 }}} 45 41 46 Notification 42 47 {{{ 43 48 FD5: notification_id → notification_type, notification_message, notification_created_at, is_read, user_id 44 49 }}} 50 45 51 Vet Clinic / application 46 52 {{{ … … 48 54 FD7: application_id → app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id 49 55 }}} 56 50 57 Animal 51 58 {{{ 52 59 FD8: animal_id → animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url, owner_id 53 60 }}} 61 54 62 Listing 55 63 {{{ 56 64 FD9: listing_id → listing_status, listing_created_at, price, listing_description, animal_id, admin_id 57 65 }}} 66 58 67 Appointment 59 68 {{{ 60 69 FD10: appointment_id → date_time, appointment_status, notes, animal_id, clinic_id 61 70 }}} 62 Health Record 71 72 Health Record 63 73 {{{ 64 74 FD11: healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id 65 75 }}} 76 66 77 Review and subtypes 67 78 {{{ 68 79 FD12: review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted 69 FD13: review_id → target_user_id 70 FD14: review_id → target_clinic_id 71 }}} 80 FD13: review_id → target_user_id 81 FD14: review_id → target_clinic_id 82 }}} 83 72 84 Favorite Listing 73 85 {{{ 74 FD15: (saved_user_id, saved_listing_id) → ∅ 75 }}} 86 The relation FavoriteListings has composite key (saved_user_id, saved_listing_id) 87 and no additional non-trivial functional dependencies. 88 }}} 89 76 90 == Candidate keys and primary key selection 77 === Attributes appearing only on the left side 78 {{{ 79 user_id, client_id, admin_id, owner_id, 80 notification_id, clinic_id, application_id, 81 animal_id, listing_id, appointment_id, healthrecord_id, review_id, 82 (saved_user_id, saved_listing_id) 83 }}} 84 [[BR]] 85 Because these identifiers can’t be derived from other attributes, they must be included in any superkey of the fully de-normalized R. 86 [[BR]] 87 A valid superkey for R is: 88 {{{ 89 K = {notification_id, application_id, clinic_id, animal_id, listing_id, 90 appointment_id, healthrecord_id, review_id, saved_user_id, saved_listing_id} 91 }}} 92 Its closure includes all descriptive attributes via FD1–FD15, so it determines all attributes of R. 93 === Closure proof for K 94 {{{ 95 Let K = {notification_id, application_id, clinic_id, animal_id, listing_id, 96 appointment_id, healthrecord_id, review_id, saved_user_id, saved_listing_id} 91 === Determining a candidate key 92 To determine a candidate key of the universal de-normalized relation, we must identify a minimal set of identifiers whose closure determines all attributes in R. 93 {{{#!comment It is not correct to simply collect all identifiers, because some identifiers are functionally determined by others. For example: 94 application_id → clinic_id 95 listing_id → animal_id, admin_id 96 animal_id → owner_id 97 owner_id → client_id 98 healthrecord_id → appointment_id, animal_id 99 appointment_id → clinic_id, animal_id 100 101 102 Therefore, attributes such as clinic_id, animal_id, appointment_id, admin_id, owner_id, and client_id do not need to appear explicitly in the candidate key if they can already be derived from other identifiers. 103 }}} 104 105 A candidate key for the initial de-normalized relation is: 106 107 {{{ 108 K = {notification_id, application_id, listing_id, healthrecord_id, review_id, saved_user_id, saved_listing_id} 109 }}} 110 111 Its closure includes all attributes of R, so it determines the whole relation. 112 113 === Closure proof for K 114 {{{ 115 Let K = {notification_id, application_id, listing_id, healthrecord_id, review_id, saved_user_id, saved_listing_id} 97 116 98 117 Compute K+: … … 107 126 Add: username, email, name, surname, full_name, password_hash, user_created_at 108 127 128 From FD7: 129 application_id → app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id 130 Add: app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id 131 132 From FD6: 133 clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city 134 Add: clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city 135 109 136 From FD9: 110 137 listing_id → listing_status, listing_created_at, price, listing_description, animal_id, admin_id 111 Add: listing_status, listing_created_at, price, listing_description, admin_id 112 (animal_id already in K) 138 Add: listing_status, listing_created_at, price, listing_description, animal_id, admin_id 113 139 114 140 From FD3: 115 admin_id → user_id (already in K+) 141 admin_id → user_id 142 user_id already belongs to K+ 116 143 117 144 From FD8: … … 125 152 From FD2: 126 153 client_id → user_id, is_blocked, blocked_at, blocked_reason 127 Add: is_blocked, blocked_at, blocked_reason (user_id already in K+) 128 129 From FD6: 130 clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city 131 Add: clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city 132 133 From FD7: 134 application_id → app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, 135 reviewed_at, reviewed_by, denial_reason, clinic_id 136 Add: app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, 137 reviewed_at, reviewed_by, denial_reason (clinic_id already in K+) 154 Add: is_blocked, blocked_at, blocked_reason 155 user_id already belongs to K+ 156 157 From FD11: 158 healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id 159 Add: hr_type, hr_description, hr_date, appointment_id 160 animal_id already belongs to K+ 138 161 139 162 From FD10: 140 163 appointment_id → date_time, appointment_status, notes, animal_id, clinic_id 141 Add: date_time, appointment_status, notes (animal_id, clinic_id already in K+) 142 143 From FD11: 144 healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id 145 Add: hr_type, hr_description, hr_date 164 Add: date_time, appointment_status, notes 165 animal_id and clinic_id already belong to K+ 146 166 147 167 From FD12: … … 149 169 Add: reviewer_id, rating, comment, review_created_at, updated_at, is_deleted 150 170 151 From FD13 and FD14: 152 review_id → target_user_id, target_clinic_id 153 Add: target_user_id, target_clinic_id 154 155 FD15 adds no attributes. 156 157 Therefore K+ = R, so K is a superkey. 158 [[BR]] 159 K is also a minimal and a candidate key. 160 }}} 171 From FD13: 172 review_id → target_user_id 173 Add: target_user_id 174 175 From FD14: 176 review_id → target_clinic_id 177 Add: target_clinic_id 178 }}} 179 180 Therefore, K+ = R, so K is a superkey. 181 182 K is also minimal, because removing any attribute from K would cause the loss of one independent component of the universal relation 183 {{{#!comment 184 * without notification_id, notification attributes cannot be derived 185 * without application_id, application attributes cannot be derived 186 * without listing_id, listing attributes cannot be derived 187 * without healthrecord_id, health record attributes cannot be derived 188 * without review_id, review attributes cannot be derived 189 * without saved_user_id or saved_listing_id, the FavoriteListings relationship cannot be identified 190 }}} 191 192 So K is a candidate key. 193 161 194 === Chosen primary key 162 We select the minimal superkey built from "left-only identifiers": 163 [[BR]] 164 Primary key (of the initial de-normalized R): 165 [[BR]] 166 {{{ 167 {notification_id, application_id, clinic_id, animal_id, listing_id, 168 appointment_id, healthrecord_id, review_id, saved_user_id, saved_listing_id} 169 }}} 195 We select the candidate key above as the primary key of the initial de-normalized relation: 196 197 {{{ 198 {notification_id, application_id, listing_id, healthrecord_id, review_id, saved_user_id, saved_listing_id} 199 }}} 200 170 201 == Normal form checks 202 171 203 === Check for 1NF 172 R satisfies 1NF (all attributes are atomic and each row is uniquely identified by the chosen primary key). 204 R satisfies 1NF because all attributes are atomic and there are no repeating groups. 205 Each row is uniquely identified by the chosen composite candidate key. 206 173 207 === Check for 2NF 174 R does not satisfy 2NF, because many non-key attributes depend only on a part of the composite primary key: 175 * listing_id → price, listing_description, listing_status, … 176 * animal_id → species, breed, … 177 * clinic_id → clinic_name, … 178 * user_id → username, email, … 179 * review_id → rating, comment, … 180 * notification_id → message, type, … 181 So we decompose by grouping attributes by the key they depend on. 208 R does not satisfy 2NF because it has a composite primary key and many non-key attributes depend only on part of that key rather than on the whole key. 209 210 Examples: 211 * listing_id → listing_status, listing_created_at, price, listing_description 212 * animal_id → animal_name, species, breed, sex, date_of_birth, photo_url, owner_id 213 * clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city 214 * review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted 215 * notification_id → notification_type, notification_message, notification_created_at, is_read, user_id 216 217 These are partial dependencies on components of the composite key, so R violates 2NF. 218 182 219 == Decomposition to 2NF 220 183 221 === Grouping by determinants (partial dependencies) 184 222 * user_id → username, email, name, surname, full_name, password_hash, user_created_at … … 188 226 * notification_id → notification_type, notification_message, notification_created_at, is_read, user_id 189 227 * clinic_id → clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city 190 * application_id → app_ *, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id191 * animal_id → animal_ *, owner_id192 * listing_id → listing_ *, animal_id, admin_id228 * application_id → app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason, clinic_id 229 * animal_id → animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url, owner_id 230 * listing_id → listing_status, listing_created_at, price, listing_description, animal_id, admin_id 193 231 * appointment_id → date_time, appointment_status, notes, animal_id, clinic_id 194 232 * healthrecord_id → hr_type, hr_description, hr_date, animal_id, appointment_id 195 233 * review_id → reviewer_id, rating, comment, review_created_at, updated_at, is_deleted 196 * review_id → target_user_id / target_clinic_id (subtype dependent) 197 * (saved_user_id, saved_listing_id) → / 198 === 2NF relations(first decomposition) 234 * review_id → target_user_id / target_clinic_id 235 * FavoriteListings has key (saved_user_id, saved_listing_id) only 236 237 === 2NF relations (first decomposition) 199 238 * Users(user_id, username, email, name, surname, full_name, password_hash, user_created_at) 200 239 * Clients(client_id, user_id, is_blocked, blocked_at, blocked_reason) … … 210 249 * Reviews(review_id, reviewer_id, rating, comment, review_created_at, updated_at, is_deleted) 211 250 * FavoriteListings(saved_user_id, saved_listing_id) 251 212 252 {{{Lossless join:}}} 213 253 [[BR]] 214 For each decomposition R → R1, R2, the intersection (R1 ∩ R2) is a key of at least one of the relations(user_id is key of Users etc.).So each decomposition step is lossless. 215 {{{Dependency preservation:}}} FD1–FD15 are preserved because each FD now lives entirely inside one of the new relations. 216 == Check for 3NF(and decomposition) 254 The decomposition is lossless because each new relation is formed from a determinant that becomes the key of that relation. The common attributes used in the decomposition are keys in at least one of the resulting relations, so no information is lost. 255 256 {{{Dependency preservation:}}} 257 [[BR]] 258 The decomposition is dependency-preserving because each functional dependency FD1–FD14 is represented fully inside one of the resulting relations. FavoriteListings has only its composite key and no additional non-trivial dependencies. 259 260 == Check for 3NF (and decomposition) 217 261 Now we look for transitive dependencies inside the produced relations. 262 218 263 === Typical transitive issues from the ER 219 * In Clients(client_id, user_id, …) we have client_id → user_id and user_id → (email, username, …) but those user attributes are no longer in Clients, so this is already clean. 220 * In Listings(listing_id, admin_id, animal_id, …) we have admin_id → user_id (FD3) but user data is in Users, not in Listings, so no transitive dependency remains in Listings. 221 * In VetClinicApplications(application_id, clinic_id, …) we have clinic_id → clinic attributes, but clinic attributes are not stored there. 222 === Review subtypes(User Review and Clinic Review) 223 The ER has specialization: 224 [[BR]] 225 Review is generalized into UserReview and ClinicReview. If we keep target_user_id and target_clinic_id inside Reviews, we introduce subtype semantics and potential null-heavy design. 226 [[BR]] 264 * In Clients(client_id, user_id, …), we have client_id → user_id, but user descriptive attributes are stored in Users, so no transitive dependency remains inside Clients. 265 * In Listings(listing_id, admin_id, animal_id, …), we have listing_id → admin_id and globally admin_id → user_id, but user attributes are stored in Users, not in Listings, so no transitive dependency remains inside Listings. 266 * In VetClinicApplications(application_id, clinic_id, …), we have application_id → clinic_id and clinic_id determines clinic attributes, but those attributes are stored in VetClinics, not in VetClinicApplications, so no transitive dependency remains there either. 267 268 === Review subtypes (User Review and Clinic Review) 269 The ER model contains a specialization of Review into UserReview and ClinicReview. 270 If both target_user_id and target_clinic_id are stored in Reviews, the relation mixes subtype-specific semantics and may introduce nulls or invalid combinations. 227 271 So we decompose: 228 [[BR]] 229 {{{UserReviews}}}(review_id, target_user_id) 230 {{{ClinicReviews}}}(review_id, target_clinic_id) 272 {{{ 273 UserReviews(review_id, target_user_id) 274 ClinicReviews(review_id, target_clinic_id) 275 }}} 276 231 277 Now: 232 * review_id determines the base review fields in Reviews (FD12) 233 * review_id determines subtype target in the subtype table (FD13/FD14) 278 * Reviews stores the common review attributes 279 * UserReviews stores the target user of a user review 280 * ClinicReviews stores the target clinic of a clinic review 281 234 282 === Results after 3NF step 235 283 * UserReviews(review_id, target_user_id) 236 284 * ClinicReviews(review_id, target_clinic_id) 237 All relations are now in 3NF. 285 286 After separating subtype-specific attributes into UserReviews and ClinicReviews, all resulting relations satisfy 3NF. 287 238 288 == Check for BCNF 239 For each relation, every non-trivial FD has a determinant that is a superkey: 289 Each resulting relation is also in BCNF because, in every non-trivial functional dependency within a relation, the determinant is a superkey of that relation. 290 240 291 * Users: user_id is key → BCNF 241 292 * Clients: client_id is key → BCNF … … 250 301 * HealthRecords: healthrecord_id is key → BCNF 251 302 * Reviews: review_id is key → BCNF 252 * UserReviews / ClinicReviews: review_id is key → BCNF 303 * UserReviews: review_id is key → BCNF 304 * ClinicReviews: review_id is key → BCNF 253 305 * FavoriteListings: (saved_user_id, saved_listing_id) is key → BCNF 254 306 {{{So the schema satisfies BCNF.}}} 255 == Check for 4NF(multivalued dependencies) 256 No additional multivalued dependencies are identified beyond the M:N relationship FavoriteListing,which is already represented as its own relation FavoriteListings(saved_user_id, saved_listing_id). So no further 4NF decomposition is required. 307 308 == Check for 4NF (multivalued dependencies) 309 No additional non-trivial multivalued dependencies are identified in the resulting schema. 310 The many-to-many relationship FavoriteListing is already decomposed into the separate relation FavoriteListings(saved_user_id, saved_listing_id), so no further decomposition is required for 4NF. 257 311 {{{The schema satisfies 4NF.}}} 312 258 313 == Final relations 259 314 * Users(user_id, username, email, name, surname, full_name, password_hash, user_created_at) … … 261 316 * Admins(admin_id, user_id) 262 317 * Owners(owner_id, client_id) 263 264 318 * Notifications(notification_id, user_id, notification_type, notification_message, notification_created_at, is_read) 265 266 319 * VetClinics(clinic_id, clinic_name, clinic_email, clinic_phone, clinic_address, clinic_location, clinic_city) 267 268 320 * VetClinicApplications(application_id, clinic_id, app_name, app_email, app_phone, app_city, app_address, submitted_at, app_status, reviewed_at, reviewed_by, denial_reason) 269 270 321 * Animals(animal_id, owner_id, animal_name, located_name, species, animal_type, breed, sex, date_of_birth, photo_url) 271 272 322 * Listings(listing_id, admin_id, animal_id, listing_status, listing_created_at, price, listing_description) 273 274 323 * FavoriteListings(saved_user_id, saved_listing_id) 275 276 324 * Appointments(appointment_id, clinic_id, animal_id, date_time, appointment_status, notes) 277 278 325 * HealthRecords(healthrecord_id, appointment_id, animal_id, hr_type, hr_description, hr_date) 279 280 326 * Reviews(review_id, reviewer_id, rating, comment, review_created_at, updated_at, is_deleted) 281 327 * UserReviews(review_id, target_user_id) 282 328 * ClinicReviews(review_id, target_clinic_id) 283 329 284 Business constraint:285 A ClinicReview is allowed only if the reviewer has at least one Appointment at that clinic with status = 'DONE'.286 330 == Conclusion 287 The final normalized relational design matches the Phase 2 ER-based design: 288 * Each entity becomes a relation. 289 * Specializations are represented as subtype relations. 290 * The M:N relationship FavoriteListing is represented as a junction relation FavoriteListings. 291 Therefore, there are no structural differences from Phase 2. 331 The normalization process confirms the same structural design obtained from the ER model in phase 2.
