Data

Big Data

See big-data.md

Data Validation

Start by validating data formats for correctness.

Scripts for this can be found in both the DevOps-Python-tools and DevOps-Bash-tools repos.

Then proceed to more advanced content validation.
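The two stages can be sketched in Python using only the standard library. This is a minimal illustration, not the scripts from the repos mentioned above: the function names, the `age` column and the 0 to 150 range rule are all hypothetical.

```python
import csv
import io
import json
from typing import List


def is_valid_json(text: str) -> bool:
    """Format check: does the text parse as JSON at all?"""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def csv_has_consistent_columns(text: str) -> bool:
    """Format check: every non-blank row has as many fields as the header row."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return bool(rows) and all(len(row) == len(rows[0]) for row in rows)


def invalid_ages(text: str) -> List[int]:
    """Content check (hypothetical rule): list the data rows, numbered from 1,
    whose 'age' field is missing or is not an integer between 0 and 150."""
    bad = []
    for number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        try:
            ok = 0 <= int(row.get("age") or "") <= 150
        except ValueError:
            ok = False
        if not ok:
            bad.append(number)
    return bad
```

Run the format checks first and only apply content rules such as `invalid_ages` to files that pass them, since content rules assume the file already parses.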

Data Generation

'Faker' libraries, inspired by the original Perl library, are available in many languages.

Perl version: https://metacpan.org/dist/Data-Faker

Java version: :octocat: DiUS/java-faker

Python version: :octocat: joke2k/faker - comes with a faker command that is convenient for shell scripts.

Generate 10 fake addresses:

faker -r 10 address
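The same addresses can be generated from Python code instead of the shell. A minimal sketch, assuming the Faker package is installed (for example via `pip install faker`); the helper name `fake_addresses` and its `seed` parameter are illustrative and not part of Faker itself:

```python
from typing import List, Optional


def fake_addresses(count: int, seed: Optional[int] = None) -> List[str]:
    """Return `count` fake postal addresses (hypothetical helper, not part of Faker)."""
    # Imported here so this module still loads where Faker is not installed.
    from faker import Faker

    fake = Faker()
    if seed is not None:
        # Seeding this instance makes the generated values repeatable.
        fake.seed_instance(seed)
    return [fake.address() for _ in range(count)]
```

Calling `fake_addresses(10)` returns ten address strings; passing a fixed `seed` is useful when test fixtures need to be reproducible.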

Data Integration

  • DBT - open-source data pipeline workflow tool
  • DVC - data version control
  • Informatica - proprietary legacy tool, now available as SaaS with self-hosted agents on VMs or Kubernetes
  • Airbyte - open source when self-hosted, or proprietary SaaS, with 300+ connectors
  • Meltano - open-source CLI-based ELT
  • Apache Camel - open source with 100+ connectors
  • Spring Integration - XML config, only use for Spring heavy shops
  • Mulesoft - XML config, only use for proprietary connectors

Mulesoft

  • lightweight enterprise service bus + integration framework
  • proprietary connectors
  • Anypoint Studio (Eclipse-based IDE)
  • Anypoint Enterprise Security - security features, transactions

Spring Integration

TODO

Data Visualization

See the Diagrams and Visualization docs.

Power BI

The desktop version is free:

https://www.microsoft.com/en-us/power-platform/products/power-bi/desktop

Diagrams

Top 9 Systems Integrations

Encoding vs Encryption vs Tokenization

Memes

Trump Tariff CSV Imports

USB vs Floppy

Ported from private Knowledge Base pages 2016+