| | 1 | Advanced Topics |
| | 2 | |
| | 3 | == Chosen Topic |
| | 4 | |
| | 5 | For the Advanced Topics phase of the project, we chose to implement a ''Data Warehouse'' together with several ''Data Cubes''. |
| | 6 | |
| | 7 | The main motivation behind this choice was our interest in analytical databases and business intelligence systems. While our original relational database was designed for day-to-day operational activities of a library, a data warehouse allows us to analyze historical data and generate meaningful statistics and insights. |
| | 8 | |
| | 9 | Our library management system already contained large amounts of transactional data related to rentals, memberships, libraries, resources, and events, making it a suitable candidate for data warehousing techniques. |
| | 10 | |
| | 11 | == Inspiration and Motivation |
| | 12 | |
| | 13 | During the course we learned that operational databases and analytical databases serve different purposes. |
| | 14 | |
| | 15 | The operational database is optimized for transactions such as: |
| | 16 | |
| | 17 | * Creating memberships |
| | 18 | * Borrowing resources |
| | 19 | * Returning resources |
| | 20 | * Registering users for events |
| | 21 | |
| | 22 | However, answering analytical questions such as: |
| | 23 | |
| | 24 | * Which library has the highest number of rentals? |
| | 25 | * Which books are the most popular? |
| | 26 | * What is the average rental duration? |
| | 27 | * Which event types attract the most participants? |
| | 28 | * How often are resources returned late? |
| | 29 | |
| | 30 | can become expensive and complicated on a transactional database. |
| | 31 | |
| | 32 | For this reason we decided to design and implement a dedicated data warehouse that stores historical and aggregated information optimized for reporting and analysis. |
| | 33 | |
| | 34 | == Data Warehouse Design |
| | 35 | |
| | 36 | The warehouse was built around two major business processes: |
| | 37 | |
| | 38 | * Resource rentals |
| | 39 | * Library events |
| | 40 | |
| | 41 | For each process we created transactional fact tables that store detailed events and several aggregated fact tables that serve as data cubes for analytical processing. |
| | 42 | |
| | 43 | The warehouse follows a dimensional modeling approach consisting of fact tables and dimension tables. |
| | 44 | |
| | 45 | == Rental Analytics |
| | 46 | |
| | 47 | The original system contains a ''Rental'' table where every borrowing transaction is recorded. |
| | 48 | |
| | 49 | Using this information we created a rental-oriented warehouse model that enables analysis of: |
| | 50 | |
| | 51 | * Rental frequency |
| | 52 | * Penalties |
| | 53 | * Late returns |
| | 54 | * Rental duration |
| | 55 | * Library performance |
| | 56 | * Resource popularity |
| | 57 | |
| | 58 | === Rental Warehouse Schema |
| | 59 | |
| | 60 | [[Image(rental_warehouse_schema.png, width=900)]] |
| | 61 | |
| | 62 | ''Figure 1: Rental data warehouse schema.'' |
| | 63 | |
| | 64 | The schema contains: |
| | 65 | |
| | 66 | * Dimension tables describing members, libraries, editions, and dates. |
| | 67 | * A transactional fact table storing individual rental activities. |
| | 68 | * Multiple aggregated fact tables used as analytical data cubes. |
| | 69 | |
| | 70 | === Rental Dimensions |
| | 71 | |
| | 72 | The following dimensions were created: |
| | 73 | |
| | 74 | * '''dim_member''' – information about members and membership plans. |
| | 75 | * '''dim_library''' – information about libraries and locations. |
| | 76 | * '''dim_edition''' – information about editions and resources. |
| | 77 | * '''dim_date''' – calendar dimension used for time-based analysis. |
| | 78 | |
| | 79 | Several dimensions implement Slowly Changing Dimensions (SCD Type 2) in order to preserve historical changes. |
| | 80 | |
| | 81 | === Rental Fact Tables |
| | 82 | |
| | 83 | The rental warehouse contains the following fact tables: |
| | 84 | |
| | 85 | * '''fact_rental''' – transactional fact table containing individual rental records. |
| | 86 | * '''fact_rental_monthly''' – monthly aggregation by member, library, edition, and date. |
| | 87 | * '''fact_rental_library_monthly''' – monthly aggregation by library. |
| | 88 | * '''fact_rental_library_edition_monthly''' – monthly aggregation by library and edition. |
| | 89 | |
| | 90 | These tables provide the foundation for OLAP-style analysis and reporting. |
| | 91 | |
| | 92 | == Event Analytics |
| | 93 | |
| | 94 | Besides rental information, our system also stores information about library events and user registrations. |
| | 95 | |
| | 96 | To support event-related analytics we designed a separate warehouse model focused on event participation and popularity. |
| | 97 | |
| | 98 | The warehouse enables analysis of: |
| | 99 | |
| | 100 | * Event attendance |
| | 101 | * Event popularity |
| | 102 | * Capacity utilization |
| | 103 | * Fully booked events |
| | 104 | * Registration behaviour |
| | 105 | * Event type trends over time |
| | 106 | |
| | 107 | === Event Warehouse Schema |
| | 108 | |
| | 109 | [[Image(event_warehouse_schema.png, width=900)]] |
| | 110 | |
| | 111 | ''Figure 2: Event data warehouse schema.'' |
| | 112 | |
| | 113 | === Event Dimensions |
| | 114 | |
| | 115 | The event warehouse reuses existing dimensions and introduces: |
| | 116 | |
| | 117 | * '''dim_event_type''' – event category dimension. |
| | 118 | |
| | 119 | === Event Fact Tables |
| | 120 | |
| | 121 | * '''fact_event''' – transactional fact table containing detailed event statistics. |
| | 122 | * '''fact_event_type_monthly''' – monthly analytics by event type. |
| | 123 | * '''fact_event_type_library_monthly''' – monthly analytics by event type and library. |
| | 124 | |
| | 125 | == Data Cubes |
| | 126 | |
| | 127 | Based on the fact tables we implemented several analytical cubes. |
| | 128 | |
| | 129 | === Rental Cube |
| | 130 | |
| | 131 | Dimensions: |
| | 132 | |
| | 133 | * Date |
| | 134 | * Library |
| | 135 | * Edition |
| | 136 | * Member |
| | 137 | |
| | 138 | Measures: |
| | 139 | |
| | 140 | * Rental count |
| | 141 | * Total penalties |
| | 142 | * Average penalties |
| | 143 | * Average rental duration |
| | 144 | * Overdue rate |
| | 145 | * Average return delay |
| | 146 | |
| | 147 | === Library Performance Cube |
| | 148 | |
| | 149 | Dimensions: |
| | 150 | |
| | 151 | * Date |
| | 152 | * Library |
| | 153 | |
| | 154 | Measures: |
| | 155 | |
| | 156 | * Rental count |
| | 157 | * Total penalties |
| | 158 | * Average rental duration |
| | 159 | * Overdue rate |
| | 160 | |
| | 161 | === Event Participation Cube |
| | 162 | |
| | 163 | Dimensions: |
| | 164 | |
| | 165 | * Date |
| | 166 | * Event Type |
| | 167 | * Library |
| | 168 | |
| | 169 | Measures: |
| | 170 | |
| | 171 | * Event count |
| | 172 | * Total registrations |
| | 173 | * Capacity utilization |
| | 174 | * Average fill rate |
| | 175 | * Fully booked events |
| | 176 | * Average registration lead time |
| | 177 | |
| | 178 | == ETL Process |
| | 179 | |
| | 180 | The warehouse is populated through a series of ETL procedures implemented in PostgreSQL. |
| | 181 | |
| | 182 | The ETL process performs: |
| | 183 | |
| | 184 | * Extraction of data from the operational database. |
| | 185 | * Transformation of source records into dimensional structures. |
| | 186 | * Handling of Slowly Changing Dimensions. |
| | 187 | * Loading of transactional fact tables. |
| | 188 | * Aggregation of data into analytical fact tables and cubes. |
| | 189 | |
| | 190 | == Conclusion |
| | 191 | |
| | 192 | The implementation of a Data Warehouse and Data Cubes extended the functionality of our library management system beyond transactional processing. |
| | 193 | |
| | 194 | The warehouse enables efficient analytical queries, supports historical reporting, and provides valuable insights regarding rentals, members, libraries, resources, and events. |