Welcome to the Machine Learning Pipeline repository! This project implements a complete MLOps training pipeline built on open-source technologies, providing a robust foundation for machine learning workflows. It is designed to support both educational and production-level ML projects.
The purpose of this repository is twofold:
- To serve as a practical MLOps training tool.
- To offer a blueprint for building scalable and maintainable production ML pipelines.
The pipeline is built with the following technologies:
- DVC: For data version control.
- MLflow: For experiment tracking and model registry.
- Apache Airflow: For orchestrating the ML pipeline.
- OmegaConf: For managing configuration.
- Optuna: For hyperparameter optimization.
- Docker: For containerization and isolation of the environment.
- MinIO: For S3-compatible storage.
- Flower: For monitoring Celery workers.
Follow these steps to set up and run the pipeline in your local environment:
git clone https://github.com/AlexandreManai/ML_pipeline.git # Clones the repository
Check Docker's official documentation and install it according to your operating system.
echo -e "AIRFLOW_UID=$(id -u)" > .env # Writes your user ID to .env as AIRFLOW_UID
pip install docker-compose # Installs Docker Compose
docker compose up # Starts all services
Here are the links to access various components of the pipeline:
- Airflow (workflow management): http://localhost:8080 (default credentials: airflow/airflow)
- JupyterLab (interactive development): http://localhost:8888 (token: cd4ml)
- MLflow (tracking and registry): http://localhost:5000
- MinIO (S3-compatible storage): http://localhost:9001 (credentials: mlflow_access/mlflow_secret)
- Flower (monitoring Celery workers): http://localhost:5555
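Once the stack is up, a quick way to confirm that each service answers is to probe the URLs listed above. The following is a minimal sketch using only the Python standard library. The `SERVICES` mapping and the `is_reachable` helper are illustrative names, not part of the repository, and any HTTP response (including an error status) is treated as a sign that the service is listening.

```python
import http.client
import urllib.error
import urllib.request

# Service URLs as listed in this README (the names are used only for display).
SERVICES = {
    "Airflow": "http://localhost:8080",
    "JupyterLab": "http://localhost:8888",
    "MLflow": "http://localhost:5000",
    "MinIO": "http://localhost:9001",
    "Flower": "http://localhost:5555",
}


def is_reachable(url, timeout=2.0):
    """Return True if an HTTP server answers at `url`, whatever the status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 401 or 403), so it is up.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or malformed reply: treat as not reachable.
        return False


def main():
    for name, url in SERVICES.items():
        status = "up" if is_reachable(url) else "unreachable"
        print(f"{name:<12} {url:<24} {status}")


if __name__ == "__main__":
    main()
```

Services may take a while to start after `docker compose up`, so an "unreachable" result shortly after startup does not necessarily indicate a problem.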
To stop and clean up resources, run the following commands:
docker compose stop # Stops the running containers without removing them (Docker Compose v2 syntax)
docker compose down # Stops and removes the containers and networks created by 'up'; add --volumes and --rmi all to also remove volumes and images (Docker Compose v2 syntax)
docker rm $(docker ps -aq) # Removes all containers on the host (running containers require -f)
docker rmi $(docker images -q) # Removes all images on the host
docker volume prune # Removes unused volumes
Check out the Airflow environment requirements for necessary dependencies.
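The last three cleanup commands are not limited to this project. As a hedged alternative, the sketch below builds a single project-scoped `docker compose down` invocation. `--volumes` and `--rmi all` are standard Docker Compose v2 flags, while the function names and the dry-run default are assumptions made for illustration. The script is expected to be run from the directory that contains the compose file.

```python
import shlex
import subprocess


def build_cleanup_command(remove_volumes=True, remove_images=False):
    """Build a `docker compose down` command limited to the current Compose project."""
    cmd = ["docker", "compose", "down"]
    if remove_volumes:
        cmd.append("--volumes")  # also remove volumes declared in the compose file
    if remove_images:
        cmd += ["--rmi", "all"]  # also remove images used by the project's services
    return cmd


def cleanup(dry_run=True, **options):
    """Print the command and execute it only when dry_run is False."""
    cmd = build_cleanup_command(**options)
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Dry run by default: shows what would be executed without touching Docker.
    cleanup(dry_run=True)
```

Calling `cleanup(dry_run=False, remove_images=True)` would actually run the command and also delete the project's images, so the dry run is a reasonable first step.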
This project has been tested on macOS and Linux with:
- Python 3.10.6
- Docker version 20.10.10
- Docker-compose version 1.29.2
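To compare a local setup with the tested versions above, something like the following sketch could be used. It relies only on the Python standard library and on the usual `docker --version` output format (`Docker version X.Y.Z, build ...`). The `TESTED` mapping and the helper names are illustrative, and only Python and Docker are checked here.

```python
import re
import shutil
import subprocess
import sys

# Versions this README reports as tested.
TESTED = {"python": (3, 10, 6), "docker": (20, 10, 10)}


def parse_version(text):
    """Extract the first X.Y.Z version found in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def docker_version():
    """Return the installed Docker client version, or None if it cannot be determined."""
    if shutil.which("docker") is None:
        return None
    try:
        result = subprocess.run(
            ["docker", "--version"],
            capture_output=True, text=True, check=False, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_version(result.stdout)


def main():
    found = {"python": tuple(sys.version_info[:3]), "docker": docker_version()}
    for tool, tested in TESTED.items():
        current = found[tool]
        shown = ".".join(map(str, current)) if current else "not found"
        note = "" if current == tested else " (differs from tested version)"
        print(f"{tool}: {shown}{note}")


if __name__ == "__main__":
    main()
```

A differing version is only a hint, not an error: the project may well run on other versions that simply have not been tested.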
This README serves as a living document and may be updated as the project evolves.