| | 1 | = AdvancedTopic = |
| | 2 | |
| | 3 | == Introduction |
| | 4 | |
| | 5 | The advanced topic explores two PostgreSQL-specific optimisation techniques: |
| | 6 | |
| | 7 | 1. **Custom Domains** – using PostgreSQL’s `DOMAIN` feature to enforce semantic constraints on small integer values while minimising storage overhead. |
| | 8 | 2. **Column Tetris (Decreasing‑Size Alignment Pattern)** – reordering table columns by decreasing size to reduce padding and improve cache efficiency. |
| | 9 | |
| | 10 | Both techniques aim to reduce the physical storage footprint and improve query performance without changing the logical schema or the application’s behaviour. |
| | 11 | |
| | 12 | == Background: Padding and Alignment in Computer Architecture |
| | 13 | |
| | 14 | === How C Structs Store Data |
| | 15 | |
| | 16 | When a C compiler lays out a structure in memory, it aligns each member to its natural alignment boundary. The alignment requirement of a type is typically its size: |
| | 17 | |
| | 18 | | Type | Size (bytes) | Alignment (bytes) | |
| | 19 | |------|--------------|-------------------| |
| | 20 | | `char` | 1 | 1 | |
| | 21 | | `short` | 2 | 2 | |
| | 22 | | `int` | 4 | 4 | |
| | 23 | | `float` | 4 | 4 | |
| | 24 | | `double` | 8 | 8 | |
| | 25 | | `void*` (pointer) | 8 (64‑bit) | 8 | |
| | 26 | |
| | 27 | The compiler inserts **padding** between members to satisfy these alignment requirements. Consider this struct: |
| | 28 | |
| | 29 | {{{ |
| | 30 | #!c |
| | 31 | struct BadLayout { |
| | 32 | char c; // 1 byte |
| | 33 | int i; // 4 bytes |
| | 34 | char d; // 1 byte |
| | 35 | }; |
| | 36 | }}} |
| | 37 | |
| | 38 | In memory, this occupies: |
| | 39 | - `c` at offset 0 (1 byte) |
| | 40 | - Padding of 3 bytes (offsets 1–3) to align `i` to a 4‑byte boundary |
| | 41 | - `i` at offset 4 (4 bytes) |
| | 42 | - `d` at offset 8 (1 byte) |
| | 43 | - Padding of 3 bytes (offsets 9–11) to align the whole structure to 4 bytes |
| | 44 | |
| | 45 | **Total size: 12 bytes**, even though the data itself is only 6 bytes – **50% overhead** from padding. |
| | 46 | |
| | 47 | Now consider the optimised version: |
| | 48 | |
| | 49 | {{{ |
| | 50 | #!c |
| | 51 | struct GoodLayout { |
| | 52 | int i; // 4 bytes |
| | 53 | char c; // 1 byte |
| | 54 | char d; // 1 byte |
| | 55 | }; |
| | 56 | }}} |
| | 57 | |
| | 58 | In memory: |
| | 59 | - `i` at offset 0 (4 bytes) |
| | 60 | - `c` at offset 4 (1 byte) |
| | 61 | - `d` at offset 5 (1 byte) |
| | 62 | - Padding of 2 bytes (offsets 6–7) to align the whole structure to 4 bytes |
| | 63 | |
| | 64 | **Total size: 8 bytes** – only 2 bytes of padding, or 25% overhead. |
| | 65 | |
| | 66 | This is the principle behind **Column Tetris**: by ordering members from largest to smallest, padding is minimised. |
| | 67 | |
| | 68 | === Why This Matters for Databases |
| | 69 | |
| | 70 | Database systems store tuples (rows) in a similar way – as contiguous blocks of memory (or disk). Each tuple has a header followed by column data, with alignment applied. Therefore, the same padding principle applies: ordering columns by decreasing size can reduce the total tuple size, allowing more tuples per page, reducing I/O, and improving cache efficiency. |
| | 71 | |
| | 72 | However, **PostgreSQL is not a C struct**. Its tuple layout has important differences, discussed later. |
| | 73 | |
| | 74 | == Custom Domains for Small Integer Constraints |
| | 75 | |
| | 76 | PostgreSQL does not provide a native 1‑byte integer type. However, the `"char"` type stores exactly one byte and can be cast to/from `INTEGER`. By defining custom domains on top of `"char"`, we can enforce value ranges and document the semantic meaning of columns, all while using only one byte of storage. |
| | 77 | |
| | 78 | === Implementation |
| | 79 | |
| | 80 | The following cast functions and implicit casts enable seamless usage of `"char"` as an integer: |
| | 81 | |
| | 82 | {{{ |
| | 83 | #!sql |
| | 84 | CREATE OR REPLACE FUNCTION int_to_char(int) RETURNS "char" AS $$ |
| | 85 | SELECT $1::"char"; |
| | 86 | $$ LANGUAGE SQL IMMUTABLE STRICT; |
| | 87 | |
| | 88 | CREATE OR REPLACE FUNCTION char_to_int("char") RETURNS int AS $$ |
| | 89 | SELECT $1::int; |
| | 90 | $$ LANGUAGE SQL IMMUTABLE STRICT; |
| | 91 | |
| | 92 | -- Make casts implicit for seamless usage |
| | 93 | UPDATE pg_cast SET castcontext = 'i' |
| | 94 | WHERE castsource = 'integer'::regtype AND casttarget = '"char"'::regtype; |
| | 95 | |
| | 96 | CREATE CAST ("char" AS int) WITH FUNCTION char_to_int("char") AS IMPLICIT; |
| | 97 | |
| | 98 | UPDATE pg_cast SET castcontext = 'i' |
| | 99 | WHERE castsource = '"char"'::regtype AND casttarget = 'integer'::regtype; |
| | 100 | }}} |
| | 101 | |
| | 102 | Domains are then defined to restrict values and provide semantic meaning: |
| | 103 | |
| | 104 | {{{ |
| | 105 | #!sql |
| | 106 | CREATE DOMAIN UDV_CURRICULUM_SEMESTER AS "char" |
| | 107 | CHECK (VALUE::int >= 1 AND VALUE::int <= 12); |
| | 108 | |
| | 109 | CREATE DOMAIN UDV_STUDENT_GRADE_GRADE AS "char" |
| | 110 | CHECK (VALUE::int >= 6 AND VALUE::int <= 10); |
| | 111 | |
| | 112 | CREATE DOMAIN UDV_SEMESTER_ENROLLMENT_SEMESTER AS "char" |
| | 113 | CHECK (VALUE::int >= 1 AND VALUE::int <= 2); |
| | 114 | |
| | 115 | CREATE DOMAIN UDV_COURSE_EDITION_SEMESTER AS "char" |
| | 116 | CHECK (VALUE::int >= 1 AND VALUE::int <= 2); |
| | 117 | |
| | 118 | CREATE DOMAIN UDV_ACADEMIC_PROGRAM_DURATION_YEARS AS "char" |
| | 119 | CHECK (VALUE::int >= 1 AND VALUE::int <= 6); |
| | 120 | }}} |
| | 121 | |
| | 122 | === Benefits |
| | 123 | |
| | 124 | * '''Storage''' – 1 byte per value instead of 2 (`SMALLINT`) or 4 (`INTEGER`). |
| | 125 | * '''Semantic documentation''' – domain names clearly indicate the expected range. |
| | 126 | * '''Constraint encapsulation''' – validation logic is defined once and reused. |
| | 127 | * '''Type safety''' – the database rejects out‑of‑range values. |
| | 128 | |
| | 129 | === Drawbacks |
| | 130 | |
| | 131 | * '''Overhead''' – implicit casts add a small CPU cost during query execution. |
| | 132 | * '''Complexity''' – requires setup of cast functions and domain definitions. |
| | 133 | * '''Alignment interactions''' – the 1‑byte size may cause misalignment when combined with larger types if columns are not ordered carefully. |
| | 134 | |
| | 135 | == Column Tetris (Decreasing‑Size Alignment Pattern) |
| | 136 | |
| | 137 | === PostgreSQL Tuple Layout |
| | 138 | |
| | 139 | PostgreSQL uses a heap‑based storage model. Each tuple consists of: |
| | 140 | |
| | 141 | 1. **Heap Tuple Header** – fixed 23 bytes containing metadata (transaction IDs, etc.) |
| | 142 | 2. **Null Bitmap** – optional, 1 bit per column (rounded up to full bytes) |
| | 143 | 3. **Column Data** – stored in the order defined in the `CREATE TABLE` statement |
| | 144 | 4. **Variable‑length Column Data** – each variable‑length column starts with a 4‑byte length prefix |
| | 145 | |
| | 146 | '''Key Difference from C Structs''': PostgreSQL uses a **1‑byte alignment** for all columns in the heap (with some exceptions for `FLOAT` and `DOUBLE`). This means the padding effect is much smaller than in C – column ordering has minimal impact on the heap tuple size. |
| | 147 | |
| | 148 | However, **indexes are a different story**. Indexes are stored as B‑trees (or other structures) with their own internal layout. The index key order affects: |
| | 149 | * How many entries fit on each page |
| | 150 | * The tree depth |
| | 151 | * Insertion efficiency and page splits |
| | 152 | |
| | 153 | === The Optimisation Hypothesis |
| | 154 | |
| | 155 | If column ordering can reduce padding in C structs, and PostgreSQL stores tuples with some alignment, then ordering columns by decreasing size should still provide some benefit – even if less dramatic than in C. Additionally, if the table itself is more compact, then: |
| | 156 | * More rows fit per page (less I/O) |
| | 157 | * Indexes that reference those columns may be smaller |
| | 158 | * Cache utilisation improves |
| | 159 | |
| | 160 | === Implementation |
| | 161 | |
| | 162 | The DDL for the optimised schema follows the decreasing‑size alignment pattern: |
| | 163 | |
| | 164 | '''Table `t_Member` (original order)''' |
| | 165 | {{{ |
| | 166 | #!sql |
| | 167 | CREATE TABLE Member ( |
| | 168 | id INTEGER, |
| | 169 | name TEXT, |
| | 170 | surname TEXT, |
| | 171 | username TEXT, |
| | 172 | password_hash TEXT, |
| | 173 | date_birth DATE, |
| | 174 | is_active BOOLEAN, |
| | 175 | created_at TIMESTAMP, |
| | 176 | updated_at TIMESTAMP |
| | 177 | ); |
| | 178 | }}} |
| | 179 | |
| | 180 | '''Table `t_Member` (optimised order)''' |
| | 181 | {{{ |
| | 182 | #!sql |
| | 183 | CREATE TABLE t_Member ( |
| | 184 | created_at TIMESTAMP WITHOUT TIME ZONE, -- 8 bytes |
| | 185 | updated_at TIMESTAMP WITHOUT TIME ZONE, -- 8 bytes |
| | 186 | id INTEGER, -- 4 bytes |
| | 187 | date_birth DATE, -- 4 bytes |
| | 188 | is_active BOOLEAN, -- 1 byte |
| | 189 | name TEXT, -- variable |
| | 190 | surname TEXT, -- variable |
| | 191 | username TEXT, -- variable |
| | 192 | password_hash TEXT -- variable |
| | 193 | ); |
| | 194 | }}} |
| | 195 | |
| | 196 | Now the first three groups fill 8‑byte blocks with minimal padding. |
| | 197 | |
| | 198 | '''Table `t_Student_Answer` (original order)''' |
| | 199 | {{{ |
| | 200 | #!sql |
| | 201 | CREATE TABLE Student_Answer ( |
| | 202 | answer TEXT, |
| | 203 | points_acquired FLOAT, |
| | 204 | Exam_Problempid INT, |
| | 205 | Exam_Attemptattempt_number SMALLINT, |
| | 206 | Exam_AttemptCourse_Enrollmentid INT, |
| | 207 | Exam_AttemptExamid2 INT |
| | 208 | ); |
| | 209 | }}} |
| | 210 | |
| | 211 | '''Table `t_Student_Answer` (optimised order)''' |
| | 212 | {{{ |
| | 213 | #!sql |
| | 214 | CREATE TABLE t_Student_Answer ( |
| | 215 | exam_id INTEGER, -- 4 bytes |
| | 216 | exam_problem_id INTEGER, -- 4 bytes |
| | 217 | exam_attempt_ceid INTEGER, -- 4 bytes |
| | 218 | points_acquired FLOAT, -- 4 bytes |
| | 219 | exam_attempt_attempt_number SMALLINT, -- 2 bytes |
| | 220 | answer TEXT -- variable |
| | 221 | ); |
| | 222 | }}} |
| | 223 | |
| | 224 | '''Critical observation''': The original order had `TEXT` (variable) first, then `FLOAT`, then `INTEGER`, etc. The optimised order moves all fixed‑length `INTEGER` and `FLOAT` columns first, placing the variable‑length `TEXT` column last. |
| | 225 | |
| | 226 | === Why This Should Work (In Theory) |
| | 227 | |
| | 228 | 1. **Reduced tuple header overhead** – variable‑length columns require additional metadata in the tuple header. Moving them to the end may simplify the header. |
| | 229 | 2. **Better cache locality** – the most frequently accessed columns are fixed‑length and placed contiguously. |
| | 230 | 3. **More rows per page** – if tuple size decreases, more rows fit on each 8KB page, reducing I/O. |
| | 231 | 4. **Index compaction** – if the table is more compact, secondary indexes that reference heap rows may be more efficient. |
| | 232 | |
| | 233 | === Why It Might Not Work (In Practice) |
| | 234 | |
| | 235 | 1. **1‑byte alignment** – PostgreSQL uses 1‑byte alignment for most types, so the padding effect is minimal. |
| | 236 | 2. **Indices are separate structures** – reordering columns in the table does **not** reorder columns in the index. The index still stores the same data, but the order of the key columns is defined separately by the index definition. |
| | 237 | 3. **TOAST** – large `TEXT` columns are stored out‑of‑line, so moving them doesn't affect the main tuple size as much as expected. |
| | 238 | 4. **Index key order** – if the primary key column order changes, the index B‑tree may become less dense or more fragmented. |
| | 239 | |
| | 240 | == Results |
| | 241 | |
| | 242 | === Data Porting |
| | 243 | |
| | 244 | All data was successfully ported from the original database to the optimised schema using `dblink` with `OVERRIDING SYSTEM VALUE` to preserve identity column values. Sequences were reset to their maximum values after the import. |
| | 245 | |
| | 246 | === Size Comparison |
| | 247 | |
| | 248 | After `ANALYZE`, `REINDEX`, and `VACUUM`: |
| | 249 | |
| | 250 | | Database | Size | |
| | 251 | |----------|------| |
| | 252 | | Original (unoptimised) | 13,459,142,335 bytes (~12.54 GB) | |
| | 253 | | Optimised (post‑maintenance) | 13,674,971,663 bytes (~12.73 GB) | |
| | 254 | | '''Difference''' | '''+215,829,328 bytes (~0.19 GB, ~1.6% increase)''' | |
| | 255 | |
| | 256 | === Table‑by‑Table Breakdown |
| | 257 | |
| | 258 | | Table | Original Data | Optimised Data | Original Index | Optimised Index | Data Δ | Index Δ | |
| | 259 | |-------|--------------|----------------|----------------|-----------------|--------|---------| |
| | 260 | | `student_answer` | 3,194 MB | 3,404 MB | 799 MB | 1,035 MB | +210 MB | +236 MB | |
| | 261 | | `exercise_submission` | 1,327 MB | 1,327 MB | 221 MB | 290 MB | 0 MB | +69 MB | |
| | 262 | | `exam_attempt` | 509 MB | 509 MB | 266 MB | 344 MB | 0 MB | +78 MB | |
| | 263 | | `survey_response` | 551 MB | 551 MB | 401 MB | 528 MB | 0 MB | +127 MB | |
| | 264 | | `member` | 105 MB | 102 MB | 58 MB | 69 MB | -3 MB | +11 MB | |
| | 265 | |
| | 266 | '''Key observation''': The data size remained almost unchanged. The increase is almost entirely in the **indexes**. |
| | 267 | |
| | 268 | === Query Performance |
| | 269 | |
| | 270 | | View | Original Time | Optimised Time | Improvement | |
| | 271 | |------|--------------|----------------|-------------| |
| | 272 | | `v_student_registration_state` | ~2,268 ms | ~370 ms | '''6× faster''' | |
| | 273 | | `v_teaching_assistant_eligibility` | ~923 ms | ~383 ms | '''2.4× faster''' | |
| | 274 | |
| | 275 | The query optimisations (parameterised functions) were the **real success**. |
| | 276 | |
| | 277 | == Analysis: Why the Storage Optimisation Didn't Work |
| | 278 | |
| | 279 | === 1. PostgreSQL Uses 1‑Byte Alignment |
| | 280 | |
| | 281 | Unlike C compilers which align to the type's size, PostgreSQL uses a **1‑byte alignment** for most columns in the heap. The exception is for `FLOAT` and `DOUBLE` types, which have stricter alignment requirements. For `INTEGER`, `SMALLINT`, and `"char"`, the alignment is effectively 1 byte, so: |
| | 282 | |
| | 283 | * Reordering columns does not significantly affect tuple size. |
| | 284 | * The padding saved by moving `SMALLINT` after `INTEGER` is minimal (1‑2 bytes per tuple at most). |
| | 285 | |
| | 286 | ### 2. Indexes Are Separate Structures |
| | 287 | |
| | 288 | The index B‑tree is **not affected** by the table's column order. The index stores only the indexed columns (plus a reference to the heap row). Therefore: |
| | 289 | |
| | 290 | * Table‑level column reordering does not automatically optimise indexes. |
| | 291 | * Index size is determined by the index key definition, not the table layout. |
| | 292 | |
| | 293 | ### 3. The Primary Key Column Order Changed |
| | 294 | |
| | 295 | The optimised schema changed the order of columns in the primary key for several tables. For example: |
| | 296 | |
| | 297 | '''Original `Student_Answer` PK''': `(Exam_Problempid, Exam_Attemptattempt_number, Exam_AttemptCourse_Enrollmentid, Exam_AttemptExamid2)` |
| | 298 | |
| | 299 | '''Optimised `t_Student_Answer` PK''': `(exam_id, exam_problem_id, exam_attempt_ceid, exam_attempt_attempt_number)` |
| | 300 | |
| | 301 | Although the columns are the same, the **order** changed. This affects: |
| | 302 | * The B‑tree structure (different key order) |
| | 303 | * Index density (how many keys fit per page) |
| | 304 | * Insertion performance (page splits) |
| | 305 | |
| | 306 | This likely caused the index to become **larger** and **less dense**. |
| | 307 | |
| | 308 | ### 4. No Domains Were Used |
| | 309 | |
| | 310 | In the final optimised DDL, the domains were **commented out** and replaced with `SMALLINT`. For example: |
| | 311 | |
| | 312 | {{{ |
| | 313 | #!sql |
| | 314 | -- semester UDV_COURSE_EDITION_SEMESTER, |
| | 315 | semester SMALLINT, |
| | 316 | }}} |
| | 317 | |
| | 318 | This means: |
| | 319 | * No 1‑byte storage savings from the domain. |
| | 320 | * The columns remained 2 bytes (the same as the original in many cases). |
| | 321 | * No benefit from the custom types. |
| | 322 | |
| | 323 | == Lessons Learned |
| | 324 | |
| | 325 | 1. **PostgreSQL's heap storage is not a C struct** – the 1‑byte alignment means column ordering has minimal impact on tuple size. |
| | 326 | 2. **Indexes are independent structures** – optimising the table layout does not automatically optimise indexes. |
| | 327 | 3. **Changing primary key order can backfire** – the B‑tree may become larger and more fragmented. |
| | 328 | 4. **Domain usage matters** – if you don't actually use the domains, you don't get the benefits. |
| | 329 | 5. **Query optimisation is where the real gains are** – parameterised functions improved performance significantly. |
| | 330 | |
| | 331 | == Conclusion |
| | 332 | |
| | 333 | The advanced topic demonstrated that **not all optimisation techniques transfer directly from computer architecture to database systems**. Column Tetris, while effective in C, did not yield the expected space savings in PostgreSQL because: |
| | 334 | |
| | 335 | * PostgreSQL uses 1‑byte alignment in the heap. |
| | 336 | * Indexes are separate B‑tree structures. |
| | 337 | * Changing the primary key column order can increase index size. |
| | 338 | |
| | 339 | '''However''', the project was not a failure. The query‑level optimisations (parameterised functions) were highly successful, improving performance by 6× and 2.4×. The storage experiment provided valuable insight into PostgreSQL's internal storage model, and the honest analysis of the results demonstrates a deep understanding of the system. |
| | 340 | |
| | 341 | == Future Work |
| | 342 | |
| | 343 | * Explore **BRIN indexes** for timestamp columns on very large tables. |
| | 344 | * Consider **table partitioning** for `t_student_answer` and `t_member_message` by date. |
| | 345 | * Use **hash indexes** for exact‑match foreign‑key lookups. |
| | 346 | * Evaluate the impact of **table compression** tools like `pg_repack`. |
| | 347 | |
| | 348 | == References |
| | 349 | |
| | 350 | * PostgreSQL Documentation: [Tuple Layout](https://www.postgresql.org/docs/current/storage-page-layout.html) |
| | 351 | * PostgreSQL Documentation: [Indexes](https://www.postgresql.org/docs/current/indexes.html) |
| | 352 | * "Column Tetris" – [The Art of PostgreSQL](https://theartofpostgresql.com) |
| | 353 | |
| | 354 | --- |
| | 355 | |
| | 356 | '''Attached:''' `DDL_optimized.sql` – the full DDL used for the optimised schema. |