A highly efficient one-way sync tool that securely pulls files from an Azure Blob Storage container to your local machine.
This tool only downloads. It never uploads local changes to Azure and never deletes anything from your Azure container. Each run fetches only new or changed blobs, tracked through a manifest of ETag and Last-Modified values, so it is safe to run repeatedly.
- Scheduled Backups: Run via a cron job to keep a localized, up-to-date replica of a cloud container.
- Large Dataset Retrieval: Sync machine learning training sets or large media libraries before processing them locally.
- Offline Content Delivery: Pull down localized media and configuration files for edge devices, kiosks, or disconnected laptops.
- Cloud Migration: Use this as the "download" step before uploading the retrieved files to another cloud provider (e.g., AWS S3).
- CI/CD Artifact Fetching: Retrieve build artifacts or release binaries from a storage container onto a deployment server.
- Python 3.10+
- An Azure Storage account with at least one blob container
First, create and activate a virtual environment:
# Windows
python -m venv .venv
.venv\Scripts\activate
# macOS/Linux
python -m venv .venv
source .venv/bin/activate

Then install the dependencies:

pip install -r requirements.txt

You can configure the tool three ways (in order of precedence): command-line flags, a JSON config file, or an environment variable.
python sync.py \
--connection-string "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net" \
--container my-container \
--local-dir ./output

cp config.example.json config.json

Edit config.json with your values:

{
  "connection_string": "DefaultEndpointsProtocol=https;AccountName=YOUR_ACCOUNT;AccountKey=YOUR_KEY;EndpointSuffix=core.windows.net",
  "container_name": "my-container",
  "local_dir": "./synced-files",
  "prefix": "",
  "delete_orphaned": false
}

Then run with:

python sync.py --config config.json

export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;..."
python sync.py --container my-container

Note: CLI flags override config file values, which override environment variables.
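The precedence rule in the note can be sketched as a layered merge. This is a minimal illustration, not the tool's actual code: `resolve_settings`, the `DEFAULTS` dict, and the use of config-file key names (such as `container_name`) for CLI values are all assumptions made for the example.

```python
import os

# Hypothetical defaults, taken from the example config and flag table in this README.
DEFAULTS = {"local_dir": "./synced-files", "prefix": "", "delete_orphaned": False}


def resolve_settings(cli_args, file_config, environ=None):
    """Merge settings so CLI flags beat config-file values, which beat env vars.

    cli_args and file_config are plain dicts; a value of None means "not given".
    """
    environ = os.environ if environ is None else environ
    settings = dict(DEFAULTS)

    # Lowest precedence: the environment variable.
    env_conn = environ.get("AZURE_STORAGE_CONNECTION_STRING")
    if env_conn:
        settings["connection_string"] = env_conn

    # Middle precedence: values loaded from the JSON config file.
    settings.update({k: v for k, v in file_config.items() if v is not None})

    # Highest precedence: flags passed on the command line.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings


if __name__ == "__main__":
    merged = resolve_settings(
        cli_args={"container_name": "from-cli", "local_dir": None},
        file_config={"container_name": "from-file", "connection_string": "from-file"},
        environ={"AZURE_STORAGE_CONNECTION_STRING": "from-env"},
    )
    print(merged)
```

Applying the layers from weakest to strongest keeps the rule in one place: a later `update` simply overwrites whatever an earlier source supplied.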
# Basic sync using a config file
python sync.py --config config.json
# Sync only blobs under a specific path
python sync.py -c config.json --prefix "images/2025/"
# Sync and delete local files that were removed from Azure
python sync.py -c config.json --delete-orphaned
# Verbose output for debugging
python sync.py -c config.json -v
# Override output directory
python sync.py -c config.json --local-dir /data/backup

| Flag | Short | Default | Description |
|---|---|---|---|
| --config | -c | (none) | Path to JSON config file |
| --connection-string | | (none) | Azure Storage connection string |
| --container | | (none) | Blob container name |
| --local-dir | | ./synced-files | Local directory to sync into |
| --prefix | | "" | Only sync blobs matching this prefix |
| --delete-orphaned | | false | Remove local files deleted from Azure |
| --verbose | -v | false | Enable debug-level logging |
- List: queries all blobs in the container (filtered by --prefix if set)
- Compare: checks each blob's etag and last_modified against a local .sync_manifest.json
- Download: only pulls blobs that are new or changed
- Clean up (optional): with --delete-orphaned, removes local files that no longer exist in the container
- Save manifest: writes .sync_manifest.json so the next run knows what's already synced
The manifest is stored inside your --local-dir. Deleting it will cause a full re-download on the next run.
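The compare and clean-up steps can be sketched as a pure function over metadata. This is an illustration under assumptions: the manifest layout (blob name mapped to `etag` and `last_modified`) and the helper name `plan_sync` are hypothetical, and the real `.sync_manifest.json` may be structured differently.

```python
def plan_sync(remote_blobs, manifest):
    """Decide what to download and which local entries are orphaned.

    remote_blobs: iterable of dicts with "name", "etag", "last_modified".
    manifest: dict mapping blob name -> {"etag": ..., "last_modified": ...}
              (assumed layout, as saved by the previous run).
    Returns (to_download, orphaned) as two lists of blob names.
    """
    to_download = []
    seen = set()
    for blob in remote_blobs:
        name = blob["name"]
        seen.add(name)
        recorded = manifest.get(name)
        # New blob, or either recorded value differs from the remote one.
        if (
            recorded is None
            or recorded.get("etag") != blob["etag"]
            or recorded.get("last_modified") != blob["last_modified"]
        ):
            to_download.append(name)
    # Names in the manifest that the container no longer lists.
    orphaned = sorted(name for name in manifest if name not in seen)
    return to_download, orphaned


if __name__ == "__main__":
    manifest = {
        "a.txt": {"etag": "0x1", "last_modified": "2025-01-01T00:00:00Z"},
        "gone.txt": {"etag": "0x9", "last_modified": "2025-01-01T00:00:00Z"},
    }
    remote = [
        {"name": "a.txt", "etag": "0x1", "last_modified": "2025-01-01T00:00:00Z"},
        {"name": "b.txt", "etag": "0x2", "last_modified": "2025-02-01T00:00:00Z"},
    ]
    print(plan_sync(remote, manifest))
```

With an empty manifest every remote blob counts as new, which matches the full re-download described above when the manifest file is deleted.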
pip install pytest
pytest test_sync.py -v

All Azure SDK calls are mocked, so no real Azure connection is needed.
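A test in this style might look like the sketch below. It is not taken from test_sync.py: `list_blob_names` is a hypothetical helper written for the example, and the only SDK detail relied on is that `ContainerClient.list_blobs` accepts a `name_starts_with` keyword.

```python
from unittest.mock import MagicMock


def list_blob_names(container_client, prefix=""):
    """Hypothetical helper: return the names of blobs under a prefix."""
    return [blob.name for blob in container_client.list_blobs(name_starts_with=prefix)]


def test_list_blob_names_uses_prefix():
    # A MagicMock stands in for the SDK's ContainerClient, so no network is used.
    fake_client = MagicMock()
    blob = MagicMock()
    # Set .name after construction; MagicMock(name=...) would name the mock itself.
    blob.name = "images/2025/a.png"
    fake_client.list_blobs.return_value = [blob]

    names = list_blob_names(fake_client, prefix="images/2025/")

    assert names == ["images/2025/a.png"]
    # Verify the prefix was forwarded to the (mocked) SDK call.
    fake_client.list_blobs.assert_called_once_with(name_starts_with="images/2025/")
```

Passing the client in as a parameter is what makes the mock easy to inject; code that constructs its client internally would need `unittest.mock.patch` instead.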
- Go to the Azure Portal
- Navigate to your Storage Account → Access keys
- Click Show next to Key 1 and copy the Connection string
Security: Never commit config.json to source control. It is listed in .gitignore by default.
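A copied connection string can be sanity-checked before it goes into config.json. The sketch below assumes the account-key format shown in this README (`AccountName` and `AccountKey` present); `parse_connection_string` is a hypothetical helper, not part of the tool, and other valid formats such as SAS-based strings would fail this particular check.

```python
REQUIRED_KEYS = {"AccountName", "AccountKey"}  # assumption: account-key style string


def parse_connection_string(conn_str):
    """Split 'Key=Value;Key=Value' pairs into a dict and check required keys."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 account keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value
    missing = REQUIRED_KEYS - parts.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return parts


if __name__ == "__main__":
    example = (
        "DefaultEndpointsProtocol=https;AccountName=YOUR_ACCOUNT;"
        "AccountKey=YOUR_KEY;EndpointSuffix=core.windows.net"
    )
    # Print only the key names so the secret itself is never echoed.
    print(sorted(parse_connection_string(example)))
```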