(DataExpert.io) Bootcamp - Day 3 - Lecture
Today’s lecture is about dimensional additivity and how to build a flexible data model that is ready for graph database consumption.
Index
- Additive vs non-additive dimensions
- The power of Enums
- When should you use flexible data types?
- Graph data modeling
Additive vs Non-additive dimensions
What makes a dimension additive?
Additivity refers to whether numerical facts (measures) in a fact table can be meaningfully aggregated across different dimensions.
In other words, a dimension is additive when summing the subtotals for each of its values gives the grand total.
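As a minimal sketch of that subtotal check, the snippet below counts distinct users on a small, made-up activity log. The rows, the column names (`age_group`, `device`) and the helper names are assumptions for illustration, not data or code from the lecture. Each user has exactly one age group but can use several devices, so the subtotals add up along one dimension and over-count along the other.

```python
# Hypothetical activity log: (user_id, age_group, device).
events = [
    ("alice", "18-30", "phone"),
    ("alice", "18-30", "laptop"),
    ("bob", "31-50", "phone"),
    ("carol", "18-30", "laptop"),
]
USER, AGE_GROUP, DEVICE = 0, 1, 2  # column positions in each row


def distinct_users(rows):
    """Count the distinct user_ids in a list of rows."""
    return len({row[USER] for row in rows})


def subtotals(rows, column):
    """Distinct-user count for each value of the given column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    return {key: distinct_users(group) for key, group in groups.items()}


def is_additive(rows, column):
    """True when the per-value subtotals sum to the overall total."""
    return sum(subtotals(rows, column).values()) == distinct_users(rows)


print(is_additive(events, AGE_GROUP))  # True: 2 + 1 == 3
print(is_additive(events, DEVICE))     # False: 2 + 2 != 3
```

Here `alice` appears under both `phone` and `laptop`, which is what makes the device subtotals sum to more than the total.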
…
(DataExpert.io) Bootcamp - Day 2 - Lecture
Today’s lecture deals with Slowly Changing Dimensions and Idempotency.
Slowly changing dimension = An attribute that drifts over time
Example: Your favorite food
Idempotency
You need to model slowly changing dimensions the right way because they impact idempotency.
Idempotent = Denoting an element of a set which is unchanged in value when multiplied or otherwise operated on by itself.
Idempotent pipeline = The ability for your data pipeline to produce the same results whether it’s running in production or in backfill.
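One way to picture this, sketched under stated assumptions: a load step that appends rows gives a different table every time it is rerun, while a load step that overwrites the day's partition gives the same table no matter how many times it runs. The function names, the in-memory dict standing in for a table, and the sample rows are all hypothetical.

```python
def load_append(table, day, rows):
    """Non-idempotent: every run appends, so a rerun duplicates the day's rows."""
    table.setdefault(day, []).extend(rows)


def load_overwrite(table, day, rows):
    """Idempotent: every run replaces the day's partition with the same rows."""
    table[day] = list(rows)


source_rows = [("alice", 3), ("bob", 5)]  # hypothetical input for one day

appended, overwritten = {}, {}
for _ in range(2):  # simulate the production run followed by a backfill rerun
    load_append(appended, "2024-01-01", source_rows)
    load_overwrite(overwritten, "2024-01-01", source_rows)

print(len(appended["2024-01-01"]))     # 4: the rerun duplicated the rows
print(len(overwritten["2024-01-01"]))  # 2: the rerun changed nothing
```

The overwrite variant is only idempotent as long as the inputs for that day are themselves stable between runs.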
…
(DataExpert.io) Bootcamp - Day 1 - Lab
Goal: Create a cumulative table design
Problem overview
We have a table containing stats for NBA players, with one record per player per season.
postgres=# \d player_seasons;
Table public.player_seasons
Column | Type |
---|---|
player_name | text |
age | integer |
height | text |
weight | integer |
college | text |
country | text |
draft_year | text |
draft_round | text |
draft_number | text |
gp | real |
pts | real |
reb | real |
ast | real |
netrtg | real |
oreb_pct | real |
dreb_pct | real |
usg_pct | real |
ts_pct | real |
ast_pct | real |
season | integer |
Indexes:
"player_seasons_pkey" PRIMARY KEY, btree (player_name, season)
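The lab's own solution is not shown in this excerpt. As a rough sketch of what a cumulative design could look like for this table, the snippet below keeps one entry per player and grows that player's season history each time a new season is merged in. Only `player_name`, `season` and `pts` come from the table above; the `cumulate` helper, the player names and the numbers are hypothetical.

```python
def cumulate(previous, season_rows):
    """Return a new cumulative state: the previous history plus this season's rows.

    previous: dict mapping player_name -> list of (season, pts) tuples
    season_rows: list of dicts with player_name, season and pts keys
    """
    # Copy the previous state so the input is not mutated.
    merged = {name: list(history) for name, history in previous.items()}
    for row in season_rows:
        merged.setdefault(row["player_name"], []).append((row["season"], row["pts"]))
    return merged


# Hypothetical rows shaped like player_seasons records.
season_1996 = [{"player_name": "Player A", "season": 1996, "pts": 10.5}]
season_1997 = [
    {"player_name": "Player A", "season": 1997, "pts": 12.0},
    {"player_name": "Player B", "season": 1997, "pts": 7.25},
]

state = cumulate({}, season_1996)     # first season seeds the table
state = cumulate(state, season_1997)  # later seasons extend each history
print(state)
```

Each call plays the role of one pipeline run: yesterday's cumulative output plus today's new data produces today's cumulative output.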
(DataExpert.io) Bootcamp - Day 1 - Lecture
Today’s lecture deals with complex data types and cumulation.
What is a dimension?
Dimensions are the attributes of an entity
- Some dimensions are identifiers
- Some dimensions are just attributes
Dimensions come in two flavors (generally):
- Slowly changing (time dependent)
- Makes things harder to model
- Fixed (doesn’t change over time)
Topics of the day (index)
- Knowing your data consumer
- OLTP vs OLAP modelling
- Cumulative table design
- The compactness vs usability tradeoff
- Temporal cardinality explosion
- Run-length encoding compression gotchas
Knowing your consumer
Who is going to consume the data?
…