A comprehensive Python script for migrating ECAL (Experimental Car Audio Log) data from regular Azure blob storage to cold storage while maintaining complete directory structures and updating database records.
This migration tool provides an interactive interface to move ECAL data files and their associated directory structures from a source Azure storage account to a cold storage account. It maintains data integrity by copying entire directory hierarchies, updates database records with new locations, and optionally removes source data after successful migration.
- Interactive Migration Options: Choose from three migration types:
  - Migrate specific ECAL IDs
  - Migrate data from a single date
  - Migrate data from a date range
- Complete Directory Migration: Copies entire directory structures, not just individual files
- Database Integration: Updates PostgreSQL database with new cold storage locations
- Cross-Account Azure Storage: Supports migration between different Azure storage accounts
- Automatic Cleanup: Optionally deletes source directories after successful migration
- Comprehensive Logging: Detailed logging for monitoring and troubleshooting
- SAS Token Authentication: Secure cross-account blob copying using generated SAS tokens
- Copy Status Monitoring: Tracks copy operations with timeout protection
- Transaction Safety: Database updates only occur after successful file copying
- Error Handling: Robust error handling with detailed logging for failed operations
- Python 3.7 or higher
- Network access to both source and destination Azure storage accounts
- PostgreSQL database access
psycopg2-binary
azure-storage-blob
azure-core
The script reads the following environment variables. Those marked optional fall back to the defaults shown; all others must be set:
export DB_HOST="your-postgres-host"
export DB_NAME="your-database-name"
export DB_USER="your-database-username"
export DB_PASSWORD="your-database-password"
export DB_PORT="5432" # Optional, defaults to 5432
export SOURCE_AZURE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=sourceaccount;AccountKey=sourcekey;EndpointSuffix=core.windows.net"
export COLD_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=coldstorage;AccountKey=coldkey;EndpointSuffix=core.windows.net"
export COLD_STORAGE_CONTAINER="ecal-cold-storage" # Optional, defaults to 'ecal-cold-storage'
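The startup validation of these variables could look roughly like the sketch below. The variable names and defaults come from the list above; the helper name `load_settings` and the error wording are assumptions for illustration, not the script's actual code.

```python
import os

# Names taken from the list above. The required/optional split follows the
# "Optional" comments there.
REQUIRED_VARS = [
    "DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD",
    "SOURCE_AZURE_CONNECTION_STRING", "COLD_STORAGE_CONNECTION_STRING",
]
OPTIONAL_DEFAULTS = {
    "DB_PORT": "5432",
    "COLD_STORAGE_CONTAINER": "ecal-cold-storage",
}

def load_settings(environ=None):
    """Hypothetical helper: collect settings or fail with a clear message."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    for name, default in OPTIONAL_DEFAULTS.items():
        settings[name] = env.get(name, default)
    return settings
```

Passing a plain dict as `environ` makes the check easy to exercise without touching the real environment.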
- Clone or download the script:
    wget ecal_migration.py # or copy the script to your desired location
- Install required packages:
    pip install psycopg2-binary azure-storage-blob azure-core
- Set environment variables:
    # Create a .env file or export variables directly
    source your-environment-file.env
- Make script executable:
    chmod +x ecal_migration.py
python ecal_migration.py
- Launch the script: Run the script and select your migration type from the interactive menu
- Choose Migration Type:
  - Option 1 - ECAL IDs: Enter comma-separated ECAL IDs
    Example: ECAL001,ECAL002,ECAL003
  - Option 2 - Single Date: Enter a specific date
    Example: 2025-02-14
  - Option 3 - Date Range: Enter start and end dates
    Example: Start date: 2025-02-01 End date: 2025-02-28
- Review and Confirm: The script will display:
  - Number of records found
  - Sample record details (first 5 records)
  - Total data size to be migrated
  - Migration warnings and confirmations
- Migration Process: After confirmation, the script will:
  - Copy entire directory structures to cold storage
  - Update database records with new URLs
  - Delete source directories (after successful migration)
  - Provide detailed progress logging
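The input formats accepted by the three options can be illustrated with a small parsing sketch. The function names below are hypothetical, and the validation rules (dropping duplicates, rejecting an end date before the start date) are assumptions about reasonable behaviour rather than a description of the script itself.

```python
from datetime import date, datetime

def parse_ecal_ids(raw: str) -> list:
    """Split a comma-separated string into unique, non-empty IDs (order kept)."""
    ids = []
    for part in raw.split(","):
        part = part.strip()
        if part and part not in ids:
            ids.append(part)
    if not ids:
        raise ValueError("No ECAL IDs provided")
    return ids

def parse_date(raw: str) -> date:
    """Parse a YYYY-MM-DD string; strptime raises ValueError on bad input."""
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()

def parse_date_range(start_raw: str, end_raw: str) -> tuple:
    """Parse both ends of a range and reject an inverted range."""
    start, end = parse_date(start_raw), parse_date(end_raw)
    if end < start:
        raise ValueError("End date must not be earlier than start date")
    return start, end
```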
==========================================================
ECAL Data Cold Storage Migration Tool
==========================================================
Select the type of migration:
1. Migrate by ECAL IDs
2. Migrate by single date
3. Migrate by date range
Enter your choice (1-3): 2
Enter date (YYYY-MM-DD format):
Date: 2025-02-14
Selected date: 2025-02-14
Fetching records from database...
Found 5 record(s) to migrate:
--------------------------------------------------
1. ECAL ID: ECAL_2025021401
Recording Start: 2025-02-14 15:33:21
Size: 2.45 GB
Current Location: https://datacollectionblob.blob.core.windows.net/ecal-batchstore/recording/2025-02-14...
Total size to migrate: 12.3 GB
Do you want to proceed with the migration? (yes/no): yes
Starting migration process...
The script expects a PostgreSQL table named ecal with the following structure:
CREATE TABLE ecal (
ecal_id VARCHAR PRIMARY KEY,
ecal_name TEXT, -- URL to the ECAL file
recording_start_time TIMESTAMP,
upload_start_time TIMESTAMP,
upload_end_time TIMESTAMP,
map TEXT,
size_gb DECIMAL,
length INTEGER,
is_valid BOOLEAN,
invalid_reason_id INTEGER,
recording_end_time TIMESTAMP
);
- User Input Processing: Parse user selection and validate input parameters
- Database Query: Fetch matching ECAL records based on selection criteria
- Directory Analysis: Extract directory paths from ECAL URLs
- Blob Enumeration: List all blobs within each directory structure
- Cross-Account Copy: Use SAS tokens to copy blobs between storage accounts
- Database Update: Update ecal_name field with new cold storage URLs
- Source Cleanup: Delete original directory structure after successful migration
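As an illustration of the Directory Analysis step, the sketch below splits a blob URL into its account host, container, directory prefix, and file name. It assumes the directory to migrate is the immediate parent of the file referenced by `ecal_name`; the function name `split_blob_url` is hypothetical.

```python
import posixpath
from urllib.parse import urlparse

def split_blob_url(url: str) -> tuple:
    """Return (account_host, container, directory_prefix, filename)."""
    parsed = urlparse(url)
    # The first path segment is the container, the rest is the blob name.
    container, _, blob_name = parsed.path.lstrip("/").partition("/")
    if not parsed.netloc or not container or not blob_name:
        raise ValueError(f"Not a blob URL: {url!r}")
    directory, filename = posixpath.split(blob_name)
    # Blob "directories" are name prefixes, so keep a trailing slash for listing.
    prefix = directory + "/" if directory else ""
    return parsed.netloc, container, prefix, filename
```

The returned prefix is the kind of value that would typically be handed to a blob listing call to enumerate everything under the directory.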
The script handles Azure blob URLs with the following pattern:
https://sourceaccount.blob.core.windows.net/container/recording/YYYY-MM-DD/session_name/timestamp_directory/filename.ecalmeas
After migration, URLs are transformed to:
https://coldstorage.blob.core.windows.net/ecal-cold-storage/recording/YYYY-MM-DD/session_name/timestamp_directory/filename.ecalmeas
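The URL transformation shown above can be sketched as follows. The function name and its default arguments are assumptions that mirror the example URLs; the blob path below the container is carried over unchanged.

```python
from urllib.parse import urlparse, urlunparse

def to_cold_storage_url(source_url: str,
                        cold_account: str = "coldstorage",
                        cold_container: str = "ecal-cold-storage") -> str:
    """Hypothetical helper: swap account and container, keep the blob path."""
    parsed = urlparse(source_url)
    # Drop the source container (first path segment), keep the blob name.
    _, _, blob_name = parsed.path.lstrip("/").partition("/")
    if not blob_name:
        raise ValueError(f"URL has no blob path: {source_url!r}")
    # Keep the endpoint suffix (e.g. blob.core.windows.net), replace the account.
    _, _, endpoint_suffix = parsed.netloc.partition(".")
    new_netloc = f"{cold_account}.{endpoint_suffix}"
    new_path = f"/{cold_container}/{blob_name}"
    return urlunparse((parsed.scheme, new_netloc, new_path, "", "", ""))
```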
The core functionality is encapsulated in the ECALMigrationManager class:
- fetch_ecal_records_by_ids(): Query database by ECAL ID list
- fetch_ecal_records_by_date(): Query database by date range
- copy_directory_to_cold_storage(): Handle cross-account directory copying
- update_ecal_record(): Update database with new storage locations
- delete_source_directory(): Clean up source data after migration
- Parses Azure connection strings to extract account credentials
- Generates temporary SAS tokens for secure cross-account access
- Handles blob service client initialization for both storage accounts
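Connection string parsing, as described above, might be done along these lines. This is a minimal sketch with a hypothetical function name; it relies only on the `Key=Value;Key=Value` layout visible in the example connection strings.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Hypothetical helper: split an Azure connection string into a dict."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 account keys end with '=' padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value
    for required in ("AccountName", "AccountKey"):
        if required not in parts:
            raise ValueError(f"Connection string is missing {required}")
    return parts
```

The extracted `AccountName` and `AccountKey` are the values a SAS generator such as `azure.storage.blob.generate_blob_sas` expects, which is presumably how the script obtains its temporary tokens.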
- Level: INFO level logging with timestamps
- Format: %(asctime)s - %(levelname)s - %(message)s
- Output: Console output with detailed progress information
- Missing Environment Variables: Validation at startup with clear error messages
- Database Connection Failures: Automatic connection retry and error reporting
- Blob Copy Failures: Individual blob error handling with operation continuation
- Timeout Handling: Copy operation timeouts with configurable limits (1 hour per file)
- Invalid URL Formats: URL parsing validation with descriptive error messages
- Failed migrations can be retried (script skips already migrated data)
- Detailed logging helps identify specific failure points
- Database transactions ensure consistency between file operations and record updates
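The timeout handling for copy operations can be pictured as a polling loop like the one below. It is a sketch under assumptions: `get_status` is a hypothetical callable returning the blob's copy status string (with the Azure SDK this would typically be read from `get_blob_properties().copy.status`), and the 5-second poll interval is illustrative. The 1-hour default matches the limit stated above.

```python
import time

def wait_for_copy(get_status, timeout_s=3600, poll_interval_s=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the copy succeeds, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "success":
            return True
        if status in ("failed", "aborted"):
            raise RuntimeError(f"Blob copy ended with status {status!r}")
        # Anything else (normally "pending") means the copy is still running.
        if clock() >= deadline:
            raise TimeoutError(f"Copy did not finish within {timeout_s} seconds")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.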
- Parallel Operations: Concurrent blob copying within directories
- SAS Token Caching: Efficient token generation for large directories
- Progress Monitoring: Real-time copy status tracking
- Memory Efficiency: Streaming blob operations without local storage
- Timeout Protection: 1-hour maximum wait time per file copy operation
- Connection Management: Automatic connection pooling for database operations
- Large Directory Handling: Efficient handling of directories with hundreds of files
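Concurrent copying within a directory, combined with per-blob error handling, could be arranged roughly as follows. The thread-pool approach, the worker count, and the `copy_one` callable are assumptions for illustration; the source does not say which concurrency mechanism the script uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def copy_all(blob_names, copy_one, max_workers=8):
    """Run copy_one(name) for every blob; one failure does not stop the rest.

    Returns (succeeded, failed), where failed maps blob name -> exception.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_one, name): name for name in blob_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # re-raises any exception from copy_one
                succeeded.append(name)
            except Exception as exc:
                failed[name] = exc
    return sorted(succeeded), failed
```

A caller would treat the directory as migrated only when `failed` is empty, which is what makes it safe to proceed to the database update.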
- Environment Variable Storage: Sensitive credentials stored in environment variables
- Temporary SAS Tokens: 2-hour expiration on generated access tokens
- Database Transactions: Atomic operations prevent partial migrations
- Connection String Parsing: Secure extraction of authentication components
- Copy Verification: Verification of successful copy operations before source deletion
- Transaction Rollback: Database rollback on migration failures
- Audit Logging: Comprehensive logging for security auditing
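The ordering behind these safety guarantees (copy, then database update, then source deletion) can be sketched as below. The three callables and the status labels are hypothetical stand-ins; the point is only that each step runs when, and only when, the previous one succeeded.

```python
def migrate_record(record_id, copy_directory, update_record, delete_source):
    """Return (status, detail); later steps run only if earlier ones succeed."""
    try:
        new_url = copy_directory(record_id)
    except Exception as exc:
        return "copy_failed", str(exc)  # database and source untouched
    try:
        update_record(record_id, new_url)
    except Exception as exc:
        return "db_update_failed", str(exc)  # source is kept, so a retry is safe
    try:
        delete_source(record_id)
    except Exception as exc:
        return "cleanup_failed", str(exc)  # data is migrated, source still exists
    return "migrated", new_url
```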
- "Missing environment variables" error:
  - Verify all required environment variables are set
  - Check connection string format matches Azure standards
- "Database connection failed" error:
  - Verify PostgreSQL server accessibility
  - Check database credentials and permissions
- "Blob copy failed" error:
  - Verify Azure storage account permissions
  - Check network connectivity between accounts
  - Ensure source blobs exist and are accessible
- "Copy operation timed out" error:
  - Large files may require longer copy times
  - Check network stability and bandwidth
  - Consider running migration during off-peak hours
For additional debugging information, modify the logging level:
logging.basicConfig(level=logging.DEBUG)
This script is designed for internal ECAL data management operations. For support or modifications, contact the development team responsible for ECAL data infrastructure.
Note: This script performs destructive operations (deletes source data after migration). Always ensure proper backups and test in a non-production environment before running against production data.