Project Setup

This project sets up the data modelling and day-to-day operations of the theLook e-commerce DWH, leveraging:

dbt-core
BigQuery
Cloud Composer
Google Cloud Provider for Terraform

Data Modelling Principles & Guidelines

The DWH transformations of theLook e-commerce data were architected under the following principles and guidelines:

Be Analyst Friendly

Analysts shouldn't have to do multiple joins to retrieve meaningful data

Be Subject-Oriented

Tables are organized around major topics of interest, such as customers, products, orders
Each subject is represented as a One-Big-Table with nested arrays and structs
- child objects should never be orphans
- child objects will always be queried within the context of the parent object
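
As an illustration of the nesting idea, here is a minimal Python sketch that folds flat order-item rows into one record per order, with the items kept as a nested array inside their parent. The field names (`order_id`, `user_id`, `product_id`, `sale_price`) and the helper `nest_orders` are hypothetical and chosen only for the example; the actual models in this project are built in SQL with dbt on BigQuery.

```python
# Hypothetical flat rows, as they might come out of a join of orders and order items.
flat_rows = [
    {"order_id": 1, "user_id": 10, "product_id": 501, "sale_price": 20.0},
    {"order_id": 1, "user_id": 10, "product_id": 502, "sale_price": 5.5},
    {"order_id": 2, "user_id": 11, "product_id": 501, "sale_price": 20.0},
]


def nest_orders(rows):
    """Fold flat order-item rows into one record per order with a nested item array."""
    orders = {}
    for row in rows:
        # Create the parent record the first time its order_id is seen.
        order = orders.setdefault(
            row["order_id"],
            {"order_id": row["order_id"], "user_id": row["user_id"], "items": []},
        )
        # The child lives only inside its parent, so it can never be orphaned.
        order["items"].append(
            {"product_id": row["product_id"], "sale_price": row["sale_price"]}
        )
    return list(orders.values())


nested_orders = nest_orders(flat_rows)
```

Because each item is only reachable through its order, a consumer reads the parent and its children together without an extra join.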

Be Relevant

Data should reflect how the current underlying platform functions
Data should reflect the topics of interest to business

Be Cost Efficient

Only process pieces of information that have changed
Avoid scanning too much data per run
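
One common way to process only what has changed is a high-water mark: each run picks up rows modified since the previous run and then advances the mark. The sketch below shows the idea in plain Python. It assumes a change-tracking column called `updated_at`, which is a hypothetical name, and it is not a description of how the models in this repo are actually implemented.

```python
from datetime import datetime


def select_changed(rows, last_processed_at):
    """Return only the rows modified after the previous run's high-water mark."""
    return [row for row in rows if row["updated_at"] > last_processed_at]


def next_watermark(rows, last_processed_at):
    """Advance the watermark to the newest change seen, or keep it if nothing changed."""
    return max((row["updated_at"] for row in rows), default=last_processed_at)


# Hypothetical source rows; `updated_at` is an assumed change-tracking column.
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 9, 30)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, 7, 15)},
]
watermark = datetime(2024, 1, 2, 0, 0)

changed = select_changed(source_rows, watermark)
new_watermark = next_watermark(changed, watermark)
```

Filtering on the watermark keeps each run's work proportional to the volume of changes rather than to the size of the full table.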

Be Easy to Maintain

Backfilling historical data should be possible via the scheduled run without the need for extra code adjustments
Changes in data should be easy to trace and audit
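
One way to make backfills work through the regular schedule is to derive the processing window purely from the run date, so that a historical run and today's run execute the same code with different inputs. The following is a minimal sketch under that assumption; `processing_window` and `backfill_windows` are hypothetical helpers, not functions from this repo or from Airflow.

```python
from datetime import date, timedelta


def processing_window(run_date, window_days=1):
    """Return the half-open [start, end) date window that a run for `run_date` should process."""
    start = run_date - timedelta(days=window_days)
    return start, run_date


def backfill_windows(first_run, last_run):
    """List the windows for every daily run from `first_run` to `last_run`, inclusive."""
    days = (last_run - first_run).days
    return [processing_window(first_run + timedelta(days=n)) for n in range(days + 1)]


# Re-running three historical days yields three contiguous, non-overlapping windows.
windows = backfill_windows(date(2024, 1, 1), date(2024, 1, 3))
```

Since the window depends only on the run date, re-running a past date reprocesses exactly that slice, which also makes changes easier to trace.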

Avoid complex dependencies

Process by topic instead of running monolithic schedules of all topics together

Enforcing Code Quality

The following linters are in place:

SQL linting with custom configuration in .sqlfluff
YAML linting with custom configuration in .yamllint
Python linting with default configuration via pylint
Markdown linting with default configuration via pymarkdownlint

SQL Linting

To check whether your SQL complies with the defined standard, run the following commands:

# lint a specific file
sqlfluff lint path/to/file.sql

# lint a file directory
sqlfluff lint directory/of/sql/files

# let the linter fix your code
sqlfluff fix folder/model.sql

SQL linting (and fixing) is enforced via pre-commit hooks for sqlfluff

YAML Linting

# check which files will be linted by default
yamllint --list-files .

# lint a specific file
yamllint my_file.yml

# or lint all YAML files under the current directory
yamllint .

Markdown Linting

Linting rules have been defined in .markdownlint.yaml and are enforced via pymarkdownlint pre-commit hooks

### [pre-commit hooks](https://github.com/pre-commit/pre-commit-hooks)

Pre-commit hooks have been set up in this repo to check for and fix:

- missing lines at the end
- trailing whitespaces
- violations of sql standards
- errors in yaml syntax

### [dbt-checkpoint hooks](https://github.com/dbt-checkpoint/dbt-checkpoint)

dbt-checkpoint hooks have been set up to check that:

- there are no compilation errors

- [no dbt script is directly referring to a table](https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-script-has-no-table-name)

- [script contains only existing sources or macros](https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-script-ref-and-source)

- [no semicolons are left at the end of SQL queries](https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#remove-script-semicolon)

- [every source has freshness configured](https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-source-has-freshness)

- [every source has tests](https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-source-has-tests)

- [every source has the required number of tests from a given group](https://github.com/dbt-checkpoint/dbt-checkpoint/blob/main/HOOKS.md#check-source-has-tests-by-group)

Hence, when working with the repo, make sure the pre-commit hooks are installed so that they run on every commit:

```bash
# install the githook scripts
pre-commit install

# run against all existing files
pre-commit run --all-files
```
