(DataExpert.io) Bootcamp - Day 3 - Lecture


Today’s lecture is about dimensional additivity and how to build a flexible data model that is ready for graph database consumption.

Index

  • Additive vs non-additive dimensions
  • The power of Enums
  • When should you use flexible data types?
  • Graph data modeling

Additive vs Non-additive dimensions

What makes a dimension additive?

Additivity refers to whether numerical facts (measures) in a fact table can be meaningfully aggregated across different dimensions.

In other words: if you compute a sub-total for every value of a dimension and sum those sub-totals, you should get the grand total.
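To make the sub-total check concrete, here is a minimal sketch in Python. The rows, the `device` dimension and the function name are hypothetical and do not come from the lecture. It compares a summed amount (additive) with a distinct user count (non-additive, because one user can appear under several devices).

```python
from collections import defaultdict

# Hypothetical fact rows: (user_id, device, amount)
rows = [
    ("alice", "ios", 10),
    ("alice", "android", 5),
    ("bob", "ios", 7),
    ("carol", "android", 3),
]

def subtotals_by_device(rows):
    """Return per-device summed amount and per-device distinct user count."""
    amount = defaultdict(int)
    users = defaultdict(set)
    for user_id, device, value in rows:
        amount[device] += value
        users[device].add(user_id)
    return dict(amount), {d: len(u) for d, u in users.items()}

amount_by_device, users_by_device = subtotals_by_device(rows)
total_amount = sum(value for _, _, value in rows)
total_users = len({user_id for user_id, _, _ in rows})

# Summed amount: the sub-totals add up to the total (additive).
print(sum(amount_by_device.values()) == total_amount)  # True
# Distinct users: alice is counted under both devices, so the sub-totals overshoot.
print(sum(users_by_device.values()) == total_users)    # False
```

The distinct count fails the check only because a user can belong to more than one device at the same time; if every user had exactly one device, the sub-totals would add up.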

Read more ⟶

(DataExpert.io) Bootcamp - Day 2 - Lecture


Today’s lecture deals with Slowly Changing Dimensions and Idempotency.

Slowly changing dimension = an attribute that drifts over time

Example: Your favorite food

Idempotency

You need to model slowly changing dimensions the right way because they impact idempotency.

Idempotent = Denoting an element of a set which is unchanged in value when multiplied or otherwise operated on by itself.

Idempotent pipeline = The ability for your data pipeline to produce the same results whether it’s running in production or in backfill.
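As a rough illustration of that definition, the sketch below runs the same load twice for the same date, as a backfill would. The in-memory "table", the date string and the two load functions are hypothetical. Overwriting a date partition is one common way to get idempotency, not necessarily the approach the lecture goes on to recommend.

```python
def load_append(table, ds, rows):
    """Non-idempotent: every rerun adds the same rows again."""
    table.setdefault(ds, []).extend(rows)

def load_overwrite(table, ds, rows):
    """Idempotent: a rerun replaces the partition for that date."""
    table[ds] = list(rows)

source = [("alice", 3), ("bob", 5)]

appended, overwritten = {}, {}
for _ in range(2):  # first run, then a backfill of the same date
    load_append(appended, "2024-01-01", source)
    load_overwrite(overwritten, "2024-01-01", source)

print(len(appended["2024-01-01"]))     # 4: rows were duplicated
print(len(overwritten["2024-01-01"]))  # 2: same result as a single run
```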

Read more ⟶

(DataExpert.io) Bootcamp - Day 1 - Lab


Goal: Create a cumulative table design

Problem overview

We have a table containing the stats for NBA players, with one record per player per season.

postgres=# \d player_seasons;

Table public.player_seasons

    Column    |  Type
--------------+---------
 player_name  | text
 age          | integer
 height       | text
 weight       | integer
 college      | text
 country      | text
 draft_year   | text
 draft_round  | text
 draft_number | text
 gp           | real
 pts          | real
 reb          | real
 ast          | real
 netrtg       | real
 oreb_pct     | real
 dreb_pct     | real
 usg_pct      | real
 ts_pct       | real
 ast_pct      | real
 season       | integer

Indexes: "player_seasons_pkey" PRIMARY KEY, btree (player_name, season)
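The lab's actual solution is not shown in this excerpt. As a minimal sketch of the general idea, assuming a cumulative table keeps one row per player and appends each new season to that player's history, the Python below merges the previous cumulative state with the new season's rows, much like a full outer join on `player_name`. The player names, the `seasons` and `current_season` fields and the `cumulate` function are all hypothetical.

```python
def cumulate(previous, season_rows, season):
    """Merge the previous cumulative state with one new season of stats."""
    current = {}
    # The union of both key sets plays the role of a full outer join on player_name.
    for name in previous.keys() | season_rows.keys():
        history = list(previous.get(name, {}).get("seasons", []))
        if name in season_rows:
            history.append({"season": season, **season_rows[name]})
        current[name] = {"seasons": history, "current_season": season}
    return current

state = {}
state = cumulate(state, {"Player A": {"pts": 20.1}}, 1996)
state = cumulate(state, {"Player A": {"pts": 22.4}, "Player B": {"pts": 8.0}}, 1997)
state = cumulate(state, {"Player B": {"pts": 9.5}}, 1998)

# Player A did not play in 1998 but is carried forward with the full history.
print([s["season"] for s in state["Player A"]["seasons"]])  # [1996, 1997]
print([s["season"] for s in state["Player B"]["seasons"]])  # [1997, 1998]
```

Carrying forward players with no new row is what lets each run depend only on the previous cumulative state plus one new season of data.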

Read more ⟶

(DataExpert.io) Bootcamp - Day 1 - Lecture


Today’s lecture deals with complex data types and cumulation.

What is a dimension?

Dimensions are the attributes of an entity

  • Some dimensions are identifiers
  • Some dimensions are just attributes

Dimensions generally come in two flavors:

  • Slowly changing (time dependent)
    • These are harder to model
  • Fixed (never change over time)

Topics of the day (index)

  • Knowing your data consumer
  • OLTP vs OLAP modelling
  • Cumulative table design
  • The compactness vs usability tradeoff
  • Temporal cardinality explosion
  • Run-length encoding compression gotchas

Knowing your consumer

Who is going to consume the data?

Read more ⟶