Changes between Initial Version and Version 1 of AdvancedTopic


Ignore:
Timestamp:
07/01/26 23:29:48 (4 days ago)
Author:
231082
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • AdvancedTopic

    v1 v1  
     1= AdvancedTopic =
     2
     3== Introduction
     4
     5The advanced topic explores two PostgreSQL-specific optimisation techniques:
     6
     7  1. **Custom Domains** – using PostgreSQL’s `DOMAIN` feature to enforce semantic constraints on small integer values while minimising storage overhead.
     8  2. **Column Tetris (Decreasing‑Size Alignment Pattern)** – reordering table columns by decreasing size to reduce padding and improve cache efficiency.
     9
     10Both techniques aim to reduce the physical storage footprint and improve query performance without changing the logical schema or the application’s behaviour.
     11
     12== Background: Padding and Alignment in Computer Architecture
     13
     14=== How C Structs Store Data
     15
     16When a C compiler lays out a structure in memory, it aligns each member to its natural alignment boundary. The alignment requirement of a type is typically its size:
     17
     18| Type | Size (bytes) | Alignment (bytes) |
     19|------|--------------|-------------------|
     20| `char` | 1 | 1 |
     21| `short` | 2 | 2 |
     22| `int` | 4 | 4 |
     23| `float` | 4 | 4 |
     24| `double` | 8 | 8 |
     25| `void*` (pointer) | 8 (64‑bit) | 8 |
     26
     27The compiler inserts **padding** between members to satisfy these alignment requirements. Consider this struct:
     28
     29{{{
     30#!c
     31struct BadLayout {
     32    char c;      // 1 byte
     33    int  i;      // 4 bytes
     34    char d;      // 1 byte
     35};
     36}}}
     37
     38In memory, this occupies:
     39- `c` at offset 0 (1 byte)
     40- Padding of 3 bytes (offsets 1–3) to align `i` to a 4‑byte boundary
     41- `i` at offset 4 (4 bytes)
     42- `d` at offset 8 (1 byte)
     43- Padding of 3 bytes (offsets 9–11) to align the whole structure to 4 bytes
     44
     45**Total size: 12 bytes**, even though the data itself is only 6 bytes – **50% overhead** from padding.
     46
     47Now consider the optimised version:
     48
     49{{{
     50#!c
     51struct GoodLayout {
     52    int  i;      // 4 bytes
     53    char c;      // 1 byte
     54    char d;      // 1 byte
     55};
     56}}}
     57
     58In memory:
     59- `i` at offset 0 (4 bytes)
     60- `c` at offset 4 (1 byte)
     61- `d` at offset 5 (1 byte)
     62- Padding of 2 bytes (offsets 6–7) to align the whole structure to 4 bytes
     63
     64**Total size: 8 bytes** – only 2 bytes of padding, or 25% overhead.
     65
     66This is the principle behind **Column Tetris**: by ordering members from largest to smallest, padding is minimised.
     67
     68=== Why This Matters for Databases
     69
     70Database systems store tuples (rows) in a similar way – as contiguous blocks of memory (or disk). Each tuple has a header followed by column data, with alignment applied. Therefore, the same padding principle applies: ordering columns by decreasing size can reduce the total tuple size, allowing more tuples per page, reducing I/O, and improving cache efficiency.
     71
     72However, **PostgreSQL is not a C struct**. Its tuple layout has important differences, discussed later.
     73
     74== Custom Domains for Small Integer Constraints
     75
     76PostgreSQL does not provide a native 1‑byte integer type. However, the `"char"` type stores exactly one byte and can be cast to/from `INTEGER`. By defining custom domains on top of `"char"`, we can enforce value ranges and document the semantic meaning of columns, all while using only one byte of storage.
     77
     78=== Implementation
     79
     80The following cast functions and implicit casts enable seamless usage of `"char"` as an integer:
     81
     82{{{
     83#!sql
     84CREATE OR REPLACE FUNCTION int_to_char(int) RETURNS "char" AS $$
     85SELECT $1::"char";
     86$$ LANGUAGE SQL IMMUTABLE STRICT;
     87
     88CREATE OR REPLACE FUNCTION char_to_int("char") RETURNS int AS $$
     89SELECT $1::int;
     90$$ LANGUAGE SQL IMMUTABLE STRICT;
     91
     92-- Make casts implicit for seamless usage
     93UPDATE pg_cast SET castcontext = 'i'
     94WHERE castsource = 'integer'::regtype AND casttarget = '"char"'::regtype;
     95
     96CREATE CAST ("char" AS int) WITH FUNCTION char_to_int("char") AS IMPLICIT;
     97
     98UPDATE pg_cast SET castcontext = 'i'
     99WHERE castsource = '"char"'::regtype AND casttarget = 'integer'::regtype;
     100}}}
     101
     102Domains are then defined to restrict values and provide semantic meaning:
     103
     104{{{
     105#!sql
     106CREATE DOMAIN UDV_CURRICULUM_SEMESTER AS "char"
     107    CHECK (VALUE::int >= 1 AND VALUE::int <= 12);
     108
     109CREATE DOMAIN UDV_STUDENT_GRADE_GRADE AS "char"
     110    CHECK (VALUE::int >= 6 AND VALUE::int <= 10);
     111
     112CREATE DOMAIN UDV_SEMESTER_ENROLLMENT_SEMESTER AS "char"
     113    CHECK (VALUE::int >= 1 AND VALUE::int <= 2);
     114
     115CREATE DOMAIN UDV_COURSE_EDITION_SEMESTER AS "char"
     116    CHECK (VALUE::int >= 1 AND VALUE::int <= 2);
     117
     118CREATE DOMAIN UDV_ACADEMIC_PROGRAM_DURATION_YEARS AS "char"
     119    CHECK (VALUE::int >= 1 AND VALUE::int <= 6);
     120}}}
     121
     122=== Benefits
     123
     124 * '''Storage''' – 1 byte per value instead of 2 (`SMALLINT`) or 4 (`INTEGER`).
     125 * '''Semantic documentation''' – domain names clearly indicate the expected range.
     126 * '''Constraint encapsulation''' – validation logic is defined once and reused.
     127 * '''Type safety''' – the database rejects out‑of‑range values.
     128
     129=== Drawbacks
     130
     131 * '''Overhead''' – implicit casts add a small CPU cost during query execution.
     132 * '''Complexity''' – requires setup of cast functions and domain definitions.
     133 * '''Alignment interactions''' – the 1‑byte size may cause misalignment when combined with larger types if columns are not ordered carefully.
     134
     135== Column Tetris (Decreasing‑Size Alignment Pattern)
     136
     137=== PostgreSQL Tuple Layout
     138
     139PostgreSQL uses a heap‑based storage model. Each tuple consists of:
     140
     141 1. **Heap Tuple Header** – fixed 23 bytes containing metadata (transaction IDs, etc.)
     142 2. **Null Bitmap** – optional, 1 bit per column (rounded up to full bytes)
     143 3. **Column Data** – stored in the order defined in the `CREATE TABLE` statement
     144 4. **Variable‑length Column Data** – each variable‑length column starts with a 4‑byte length prefix
     145
     146'''Key Difference from C Structs''': PostgreSQL uses a **1‑byte alignment** for all columns in the heap (with some exceptions for `FLOAT` and `DOUBLE`). This means the padding effect is much smaller than in C – column ordering has minimal impact on the heap tuple size.
     147
     148However, **indexes are a different story**. Indexes are stored as B‑trees (or other structures) with their own internal layout. The index key order affects:
     149 * How many entries fit on each page
     150 * The tree depth
     151 * Insertion efficiency and page splits
     152
     153=== The Optimisation Hypothesis
     154
     155If column ordering can reduce padding in C structs, and PostgreSQL stores tuples with some alignment, then ordering columns by decreasing size should still provide some benefit – even if less dramatic than in C. Additionally, if the table itself is more compact, then:
     156 * More rows fit per page (less I/O)
     157 * Indexes that reference those columns may be smaller
     158 * Cache utilisation improves
     159
     160=== Implementation
     161
     162The DDL for the optimised schema follows the decreasing‑size alignment pattern:
     163
     164'''Table `t_Member` (original order)'''
     165{{{
     166#!sql
     167CREATE TABLE Member (
     168    id              INTEGER,
     169    name            TEXT,
     170    surname         TEXT,
     171    username        TEXT,
     172    password_hash   TEXT,
     173    date_birth      DATE,
     174    is_active       BOOLEAN,
     175    created_at      TIMESTAMP,
     176    updated_at      TIMESTAMP
     177);
     178}}}
     179
     180'''Table `t_Member` (optimised order)'''
     181{{{
     182#!sql
     183CREATE TABLE t_Member (
     184    created_at      TIMESTAMP WITHOUT TIME ZONE,   -- 8 bytes
     185    updated_at      TIMESTAMP WITHOUT TIME ZONE,   -- 8 bytes
     186    id              INTEGER,                       -- 4 bytes
     187    date_birth      DATE,                          -- 4 bytes
     188    is_active       BOOLEAN,                       -- 1 byte
     189    name            TEXT,                          -- variable
     190    surname         TEXT,                          -- variable
     191    username        TEXT,                          -- variable
     192    password_hash   TEXT                           -- variable
     193);
     194}}}
     195
     196Now the first three groups fill 8‑byte blocks with minimal padding.
     197
     198'''Table `t_Student_Answer` (original order)'''
     199{{{
     200#!sql
     201CREATE TABLE Student_Answer (
     202    answer                          TEXT,
     203    points_acquired                 FLOAT,
     204    Exam_Problempid                 INT,
     205    Exam_Attemptattempt_number      SMALLINT,
     206    Exam_AttemptCourse_Enrollmentid INT,
     207    Exam_AttemptExamid2             INT
     208);
     209}}}
     210
     211'''Table `t_Student_Answer` (optimised order)'''
     212{{{
     213#!sql
     214CREATE TABLE t_Student_Answer (
     215    exam_id                         INTEGER,        -- 4 bytes
     216    exam_problem_id                 INTEGER,        -- 4 bytes
     217    exam_attempt_ceid               INTEGER,        -- 4 bytes
     218    points_acquired                 FLOAT,          -- 4 bytes
     219    exam_attempt_attempt_number     SMALLINT,       -- 2 bytes
     220    answer                          TEXT            -- variable
     221);
     222}}}
     223
     224'''Critical observation''': The original order had `TEXT` (variable) first, then `FLOAT`, then `INTEGER`, etc. The optimised order moves all fixed‑length `INTEGER` and `FLOAT` columns first, placing the variable‑length `TEXT` column last.
     225
     226=== Why This Should Work (In Theory)
     227
     228 1. **Reduced tuple header overhead** – variable‑length columns require additional metadata in the tuple header. Moving them to the end may simplify the header.
     229 2. **Better cache locality** – the most frequently accessed columns are fixed‑length and placed contiguously.
     230 3. **More rows per page** – if tuple size decreases, more rows fit on each 8KB page, reducing I/O.
     231 4. **Index compaction** – if the table is more compact, secondary indexes that reference heap rows may be more efficient.
     232
     233=== Why It Might Not Work (In Practice)
     234
     235 1. **1‑byte alignment** – PostgreSQL uses 1‑byte alignment for most types, so the padding effect is minimal.
     236 2. **Indices are separate structures** – reordering columns in the table does **not** reorder columns in the index. The index still stores the same data, but the order of the key columns is defined separately by the index definition.
     237 3. **TOAST** – large `TEXT` columns are stored out‑of‑line, so moving them doesn't affect the main tuple size as much as expected.
     238 4. **Index key order** – if the primary key column order changes, the index B‑tree may become less dense or more fragmented.
     239
     240== Results
     241
     242=== Data Porting
     243
     244All data was successfully ported from the original database to the optimised schema using `dblink` with `OVERRIDING SYSTEM VALUE` to preserve identity column values. Sequences were reset to their maximum values after the import.
     245
     246=== Size Comparison
     247
     248After `ANALYZE`, `REINDEX`, and `VACUUM`:
     249
     250| Database | Size |
     251|----------|------|
     252| Original (unoptimised) | 13,459,142,335 bytes (~12.54 GB) |
     253| Optimised (post‑maintenance) | 13,674,971,663 bytes (~12.73 GB) |
     254| '''Difference''' | '''+215,829,328 bytes (~0.19 GB, ~1.6% increase)''' |
     255
     256=== Table‑by‑Table Breakdown
     257
     258| Table | Original Data | Optimised Data | Original Index | Optimised Index | Data Δ | Index Δ |
     259|-------|--------------|----------------|----------------|-----------------|--------|---------|
     260| `student_answer` | 3,194 MB | 3,404 MB | 799 MB | 1,035 MB | +210 MB | +236 MB |
     261| `exercise_submission` | 1,327 MB | 1,327 MB | 221 MB | 290 MB | 0 MB | +69 MB |
     262| `exam_attempt` | 509 MB | 509 MB | 266 MB | 344 MB | 0 MB | +78 MB |
     263| `survey_response` | 551 MB | 551 MB | 401 MB | 528 MB | 0 MB | +127 MB |
     264| `member` | 105 MB | 102 MB | 58 MB | 69 MB | -3 MB | +11 MB |
     265
     266'''Key observation''': The data size remained almost unchanged. The increase is almost entirely in the **indexes**.
     267
     268=== Query Performance
     269
     270| View | Original Time | Optimised Time | Improvement |
     271|------|--------------|----------------|-------------|
     272| `v_student_registration_state` | ~2,268 ms | ~370 ms | '''6× faster''' |
     273| `v_teaching_assistant_eligibility` | ~923 ms | ~383 ms | '''2.4× faster''' |
     274
     275The query optimisations (parameterised functions) were the **real success**.
     276
     277== Analysis: Why the Storage Optimisation Didn't Work
     278
     279=== 1. PostgreSQL Uses 1‑Byte Alignment
     280
     281Unlike C compilers which align to the type's size, PostgreSQL uses a **1‑byte alignment** for most columns in the heap. The exception is for `FLOAT` and `DOUBLE` types, which have stricter alignment requirements. For `INTEGER`, `SMALLINT`, and `"char"`, the alignment is effectively 1 byte, so:
     282
     283 * Reordering columns does not significantly affect tuple size.
     284 * The padding saved by moving `SMALLINT` after `INTEGER` is minimal (1‑2 bytes per tuple at most).
     285
     286### 2. Indexes Are Separate Structures
     287
     288The index B‑tree is **not affected** by the table's column order. The index stores only the indexed columns (plus a reference to the heap row). Therefore:
     289
     290 * Table‑level column reordering does not automatically optimise indexes.
     291 * Index size is determined by the index key definition, not the table layout.
     292
     293### 3. The Primary Key Column Order Changed
     294
     295The optimised schema changed the order of columns in the primary key for several tables. For example:
     296
     297'''Original `Student_Answer` PK''': `(Exam_Problempid, Exam_Attemptattempt_number, Exam_AttemptCourse_Enrollmentid, Exam_AttemptExamid2)`
     298
     299'''Optimised `t_Student_Answer` PK''': `(exam_id, exam_problem_id, exam_attempt_ceid, exam_attempt_attempt_number)`
     300
     301Although the columns are the same, the **order** changed. This affects:
     302 * The B‑tree structure (different key order)
     303 * Index density (how many keys fit per page)
     304 * Insertion performance (page splits)
     305
     306This likely caused the index to become **larger** and **less dense**.
     307
     308### 4. No Domains Were Used
     309
     310In the final optimised DDL, the domains were **commented out** and replaced with `SMALLINT`. For example:
     311
     312{{{
     313#!sql
     314-- semester UDV_COURSE_EDITION_SEMESTER,
     315semester SMALLINT,
     316}}}
     317
     318This means:
     319 * No 1‑byte storage savings from the domain.
     320 * The columns remained 2 bytes (the same as the original in many cases).
     321 * No benefit from the custom types.
     322
     323== Lessons Learned
     324
     325 1. **PostgreSQL's heap storage is not a C struct** – the 1‑byte alignment means column ordering has minimal impact on tuple size.
     326 2. **Indexes are independent structures** – optimising the table layout does not automatically optimise indexes.
     327 3. **Changing primary key order can backfire** – the B‑tree may become larger and more fragmented.
     328 4. **Domain usage matters** – if you don't actually use the domains, you don't get the benefits.
     329 5. **Query optimisation is where the real gains are** – parameterised functions improved performance significantly.
     330
     331== Conclusion
     332
     333The advanced topic demonstrated that **not all optimisation techniques transfer directly from computer architecture to database systems**. Column Tetris, while effective in C, did not yield the expected space savings in PostgreSQL because:
     334
     335 * PostgreSQL uses 1‑byte alignment in the heap.
     336 * Indexes are separate B‑tree structures.
     337 * Changing the primary key column order can increase index size.
     338
     339'''However''', the project was not a failure. The query‑level optimisations (parameterised functions) were highly successful, improving performance by 6× and 2.4×. The storage experiment provided valuable insight into PostgreSQL's internal storage model, and the honest analysis of the results demonstrates a deep understanding of the system.
     340
     341== Future Work
     342
     343 * Explore **BRIN indexes** for timestamp columns on very large tables.
     344 * Consider **table partitioning** for `t_student_answer` and `t_member_message` by date.
     345 * Use **hash indexes** for exact‑match foreign‑key lookups.
     346 * Evaluate the impact of **table compression** tools like `pg_repack`.
     347
     348== References
     349
     350 * PostgreSQL Documentation: [Tuple Layout](https://www.postgresql.org/docs/current/storage-page-layout.html)
     351 * PostgreSQL Documentation: [Indexes](https://www.postgresql.org/docs/current/indexes.html)
     352 * "Column Tetris" – [The Art of PostgreSQL](https://theartofpostgresql.com)
     353
     354---
     355
     356'''Attached:''' `DDL_optimized.sql` – the full DDL used for the optimised schema.