See big-data.md
Start by validating data formats for correctness.
Scripts for this can be found in both the DevOps-Python-tools and DevOps-Bash-tools repos.
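As a minimal sketch of the idea (not one of the scripts from those repos), Python's standard library parsers can check that a document is syntactically well-formed. The function names below are hypothetical:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def is_valid_json(text: str) -> bool:
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_valid_xml(text: str) -> bool:
    """Return True if text is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def is_valid_csv(text: str) -> bool:
    """Return True if every non-blank CSV row has as many columns as the header."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return False  # treat an empty file as invalid
    width = len(rows[0])
    return all(len(row) == width for row in rows)


if __name__ == "__main__":
    print(is_valid_json('{"name": "test"}'))
    print(is_valid_xml("<a><b></a>"))
    print(is_valid_csv("x,y\n1,2,3\n"))
```

These checks only confirm syntax; a file can pass all of them and still contain bad data.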
Then proceed to more advanced content validation.
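Content validation goes beyond parsing: a record can be valid JSON yet be missing fields or carry impossible values. The sketch below assumes JSON Lines input and a hypothetical schema; the fields `id`, `email`, `age` and the 0-150 age range are purely illustrative:

```python
import json
from typing import Any, Iterable, Iterator

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA: dict[str, type] = {"id": int, "email": str, "age": int}


def content_errors(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems with one record (empty list if it passes)."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif not isinstance(record[field], expected) or isinstance(record[field], bool):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(record[field]).__name__}")
    # Semantic checks a format parser alone cannot catch (hypothetical rules)
    email = record.get("email")
    if isinstance(email, str) and "@" not in email:
        errors.append("email: missing '@'")
    age = record.get("age")
    if isinstance(age, int) and not isinstance(age, bool) and not 0 <= age <= 150:
        errors.append(f"age: out of range: {age}")
    return errors


def validate_jsonl(lines: Iterable[str]) -> Iterator[tuple[int, list[str]]]:
    """Yield (line_number, errors) for each line failing format or content checks."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)  # format check first
        except json.JSONDecodeError as exc:
            yield lineno, [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            yield lineno, ["not a JSON object"]
            continue
        errors = content_errors(record)  # then content checks
        if errors:
            yield lineno, errors


if __name__ == "__main__":
    sample = [
        '{"id": 1, "email": "a@example.com", "age": 30}',
        '{"id": 2, "email": "bad", "age": 200}',
    ]
    for lineno, errs in validate_jsonl(sample):
        print(lineno, errs)
```

Running the format check before the content checks keeps the error for a syntactically broken line distinct from errors about its data.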
Faker libraries, inspired by the original Perl library, are available in many languages.
Perl version: https://metacpan.org/dist/Data-Faker
Java version: DiUS/java-faker
Python version: joke2k/faker - comes with a `faker` command convenient for shell scripts.
Generate 10 fake addresses:
faker -r 10 address
- dbt - open-source SQL-based data transformation workflow tool
- DVC - data version control
- Informatica - proprietary legacy ETL, now available as SaaS, with self-hosted agents on VMs or Kubernetes
- Airbyte - open-source self-hosted, or proprietary SaaS, with 300+ connectors
- Meltano - open-source CLI-based ELT
- Apache Camel - open-source, with 100+ connectors
- Spring Integration - XML config, only use in Spring-heavy shops
- MuleSoft - XML config, only use for its proprietary connectors
  - lightweight enterprise service bus (ESB) and integration framework
  - proprietary connectors
  - Anypoint Studio (Eclipse-based IDE)
  - Anypoint Enterprise Security - security features, transactions
TODO
See the Diagrams and Visualization docs.
The desktop version is free:
https://www.microsoft.com/en-us/power-platform/products/power-bi/desktop
Ported from private Knowledge Base pages 2016+



