minuszero/mz-data-cold-storage-migration

ECAL Data Cold Storage Migration Script

A Python script that migrates ECAL (Experimental Car Audio Log) data from regular Azure Blob Storage to cold storage, preserving complete directory structures and updating the matching database records.

Overview

This migration tool provides an interactive interface to move ECAL data files and their associated directory structures from a source Azure storage account to a cold storage account. It maintains data integrity by copying entire directory hierarchies, updates database records with new locations, and optionally removes source data after successful migration.

Features

Core Functionality

  • Interactive Migration Options: Choose from three migration types:
    • Migrate specific ECAL IDs
    • Migrate data from a single date
    • Migrate data from a date range
  • Complete Directory Migration: Copies entire directory structures, not just individual files
  • Database Integration: Updates PostgreSQL database with new cold storage locations
  • Cross-Account Azure Storage: Supports migration between different Azure storage accounts
  • Automatic Cleanup: Optionally deletes source directories after successful migration
  • Comprehensive Logging: Detailed logging for monitoring and troubleshooting

Data Safety Features

  • SAS Token Authentication: Secure cross-account blob copying using generated SAS tokens
  • Copy Status Monitoring: Tracks copy operations with timeout protection
  • Transaction Safety: Database updates only occur after successful file copying
  • Error Handling: Robust error handling with detailed logging for failed operations

Prerequisites

System Requirements

  • Python 3.7 or higher
  • Network access to both source and destination Azure storage accounts
  • PostgreSQL database access

Required Python Packages

psycopg2-binary
azure-storage-blob
azure-core

Environment Variables

The script requires the following environment variables to be set:

Database Configuration

export DB_HOST="your-postgres-host"
export DB_NAME="your-database-name"
export DB_USER="your-database-username"
export DB_PASSWORD="your-database-password"
export DB_PORT="5432"  # Optional, defaults to 5432

Azure Storage Configuration

export SOURCE_AZURE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=sourceaccount;AccountKey=sourcekey;EndpointSuffix=core.windows.net"
export COLD_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=coldstorage;AccountKey=coldkey;EndpointSuffix=core.windows.net"
export COLD_STORAGE_CONTAINER="ecal-cold-storage"  # Optional, defaults to 'ecal-cold-storage'
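
A startup check for these variables might look like the sketch below. The variable names and defaults come from the lists above; the helper name `load_config` and the returned dictionary are assumptions made for illustration and are not taken from the script.

```python
import os

# Required and optional variables as listed in this README.
REQUIRED_VARS = [
    "DB_HOST", "DB_NAME", "DB_USER", "DB_PASSWORD",
    "SOURCE_AZURE_CONNECTION_STRING", "COLD_STORAGE_CONNECTION_STRING",
]
OPTIONAL_DEFAULTS = {
    "DB_PORT": "5432",
    "COLD_STORAGE_CONTAINER": "ecal-cold-storage",
}


def load_config(env=None):
    """Hypothetical helper: return a config dict or raise naming every missing variable."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED_VARS}
    for name, default in OPTIONAL_DEFAULTS.items():
        config[name] = env.get(name, default)
    return config
```

Reporting all missing names in one message avoids a fix-one-rerun-find-the-next loop when setting up a new environment.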

Installation

  1. Clone or download the script:

    # wget needs the full URL of the script, not just its file name
    wget <URL-of-ecal_migration.py>
    # or copy the script to your desired location
  2. Install required packages:

    pip install psycopg2-binary azure-storage-blob azure-core
  3. Set environment variables:

    # Create a .env file or export variables directly
    source your-environment-file.env
  4. Make script executable:

    chmod +x ecal_migration.py

Usage

Basic Usage

python ecal_migration.py

Interactive Migration Process

  1. Launch the script: Run the script and select your migration type from the interactive menu

  2. Choose Migration Type:

    • Option 1 - ECAL IDs: Enter comma-separated ECAL IDs

      Example: ECAL001,ECAL002,ECAL003
      
    • Option 2 - Single Date: Enter a specific date

      Example: 2025-02-14
      
    • Option 3 - Date Range: Enter start and end dates

      Example: 
      Start date: 2025-02-01
      End date: 2025-02-28
      
  3. Review and Confirm: The script will display:

    • Number of records found
    • Sample record details (first 5 records)
    • Total data size to be migrated
    • Migration warnings and confirmations
  4. Migration Process: After confirmation, the script will:

    • Copy entire directory structures to cold storage
    • Update database records with new URLs
    • Delete source directories (after successful migration)
    • Provide detailed progress logging
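
The three input formats above could be parsed and validated roughly as follows. This is a minimal sketch: the function names are hypothetical, and the script's own validation may differ.

```python
from datetime import date, datetime


def parse_ecal_ids(raw):
    """Split a comma-separated string into trimmed, non-empty ECAL IDs."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def parse_date(raw):
    """Parse a YYYY-MM-DD string; raises ValueError if it does not parse."""
    return datetime.strptime(raw.strip(), "%Y-%m-%d").date()


def parse_date_range(start_raw, end_raw):
    """Parse both ends of a range and reject an end date before the start date."""
    start, end = parse_date(start_raw), parse_date(end_raw)
    if end < start:
        raise ValueError("End date must not be earlier than start date")
    return start, end
```

For example, `parse_ecal_ids("ECAL001, ECAL002,,ECAL003")` drops the empty entry and the stray space, which makes the prompt tolerant of small typing mistakes.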

Example Session

==========================================================
          ECAL Data Cold Storage Migration Tool
==========================================================

Select the type of migration:
1. Migrate by ECAL IDs
2. Migrate by single date
3. Migrate by date range

Enter your choice (1-3): 2

Enter date (YYYY-MM-DD format):
Date: 2025-02-14
Selected date: 2025-02-14

Fetching records from database...
Found 5 record(s) to migrate:
--------------------------------------------------
1. ECAL ID: ECAL_2025021401
   Recording Start: 2025-02-14 15:33:21
   Size: 2.45 GB
   Current Location: https://datacollectionblob.blob.core.windows.net/ecal-batchstore/recording/2025-02-14...

Total size to migrate: 12.3 GB

Do you want to proceed with the migration? (yes/no): yes

Starting migration process...

Database Schema

The script expects a PostgreSQL table named ecal with the following structure:

CREATE TABLE ecal (
    ecal_id VARCHAR PRIMARY KEY,
    ecal_name TEXT,                    -- URL to the ECAL file
    recording_start_time TIMESTAMP,
    upload_start_time TIMESTAMP,
    upload_end_time TIMESTAMP,
    map TEXT,
    size_gb DECIMAL,
    length INTEGER,
    is_valid BOOLEAN,
    invalid_reason_id INTEGER,
    recording_end_time TIMESTAMP
);
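
Queries against this table for the ID and date selections might be built as in the sketch below. Two assumptions are made here and not confirmed by the script: that date selection filters on recording_start_time, and that only the listed columns are needed. The function names are hypothetical. The `%s` placeholders follow psycopg2's parameter style, and psycopg2 adapts a Python list to a PostgreSQL array, which is why `= ANY(%s)` accepts a list.

```python
from datetime import date, datetime, time, timedelta

# Assumed column subset; the real script may select more fields.
COLUMNS = "ecal_id, ecal_name, recording_start_time, size_gb"


def build_ids_query(ecal_ids):
    """Return (sql, params) selecting records whose ecal_id is in the given list."""
    sql = f"SELECT {COLUMNS} FROM ecal WHERE ecal_id = ANY(%s)"
    return sql, (list(ecal_ids),)


def build_date_query(start, end):
    """Return (sql, params) for recordings that started between start and end, inclusive.

    The range is half-open on timestamps: [start 00:00, day after end 00:00),
    so recordings late on the end date are still included.
    """
    lower = datetime.combine(start, time.min)
    upper = datetime.combine(end + timedelta(days=1), time.min)
    sql = (
        f"SELECT {COLUMNS} FROM ecal "
        "WHERE recording_start_time >= %s AND recording_start_time < %s "
        "ORDER BY recording_start_time"
    )
    return sql, (lower, upper)
```

A single-date migration is then the same query with `start == end`. Passing values as parameters, not formatting them into the SQL string, keeps user input out of the statement text.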

Architecture and Data Flow

Migration Process Flow

  1. User Input Processing: Parse user selection and validate input parameters
  2. Database Query: Fetch matching ECAL records based on selection criteria
  3. Directory Analysis: Extract directory paths from ECAL URLs
  4. Blob Enumeration: List all blobs within each directory structure
  5. Cross-Account Copy: Use SAS tokens to copy blobs between storage accounts
  6. Database Update: Update ecal_name field with new cold storage URLs
  7. Source Cleanup: Delete original directory structure after successful migration
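
Step 3 (directory analysis) could be sketched as below, assuming the directory to migrate is the parent directory of the file that ecal_name points to. That assumption and the helper name `split_blob_url` are illustrative only.

```python
from urllib.parse import unquote, urlparse


def split_blob_url(url):
    """Hypothetical helper: return (account, container, directory_prefix) for a blob URL."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Not an https blob URL: {url!r}")
    # Path layout: /<container>/<directory...>/<filename>
    path = unquote(parsed.path).lstrip("/")
    container, _, blob_path = path.partition("/")
    if not container or "/" not in blob_path:
        raise ValueError(f"URL has no container or directory part: {url!r}")
    # Keep a trailing slash so the prefix cannot match sibling directories
    # whose names merely start with the same characters.
    directory = blob_path.rsplit("/", 1)[0] + "/"
    account = parsed.netloc.split(".")[0]
    return account, container, directory
```

The returned prefix is the kind of value that could be passed as `name_starts_with` to `ContainerClient.list_blobs` in azure-storage-blob for step 4.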

URL Structure Handling

The script handles Azure blob URLs with the following pattern:

https://sourceaccount.blob.core.windows.net/container/recording/YYYY-MM-DD/session_name/timestamp_directory/filename.ecalmeas

After migration, URLs are transformed to:

https://coldstorage.blob.core.windows.net/ecal-cold-storage/recording/YYYY-MM-DD/session_name/timestamp_directory/filename.ecalmeas
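
The transformation above replaces the account name and the container and leaves the rest of the path unchanged. A minimal sketch, with a hypothetical function name and default values taken from the example URLs:

```python
from urllib.parse import urlsplit, urlunsplit


def to_cold_storage_url(source_url, cold_account="coldstorage",
                        cold_container="ecal-cold-storage"):
    """Rewrite a source blob URL to the equivalent cold storage URL."""
    parts = urlsplit(source_url)
    container, sep, blob_path = parts.path.lstrip("/").partition("/")
    if not sep or not blob_path:
        raise ValueError(f"URL has no blob path after the container: {source_url!r}")
    # Swap only the first host label (the account name); keep the endpoint suffix.
    host_labels = parts.netloc.split(".")
    host_labels[0] = cold_account
    new_path = f"/{cold_container}/{blob_path}"
    # Query string and fragment are dropped so a SAS token is never stored in the database.
    return urlunsplit((parts.scheme, ".".join(host_labels), new_path, "", ""))
```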

Configuration Details

ECALMigrationManager Class

The core functionality is encapsulated in the ECALMigrationManager class:

Key Methods

  • fetch_ecal_records_by_ids(): Query database by ECAL ID list
  • fetch_ecal_records_by_date(): Query database by date range
  • copy_directory_to_cold_storage(): Handle cross-account directory copying
  • update_ecal_record(): Update database with new storage locations
  • delete_source_directory(): Clean up source data after migration

Authentication Handling

  • Parses Azure connection strings to extract account credentials
  • Generates temporary SAS tokens for secure cross-account access
  • Handles blob service client initialization for both storage accounts
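
Connection string parsing, the first item above, might look like this sketch. The function name is hypothetical; the format is the semicolon-separated `Key=Value` layout shown under Azure Storage Configuration.

```python
def parse_connection_string(conn_str):
    """Hypothetical helper: split an Azure connection string into a dict of its fields."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # partition() splits on the first '=' only, so base64 padding
        # ('=' characters at the end of AccountKey) is preserved.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed connection string segment: {segment!r}")
        parts[key.strip()] = value
    for required in ("AccountName", "AccountKey"):
        if required not in parts:
            raise ValueError(f"Connection string is missing {required}")
    return parts
```

The extracted account name and key are the inputs a SAS generator such as `generate_blob_sas` in `azure.storage.blob` expects, along with an expiry time.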

Error Handling and Logging

Logging Configuration

  • Level: INFO level logging with timestamps
  • Format: %(asctime)s - %(levelname)s - %(message)s
  • Output: Console output with detailed progress information
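
A logging setup matching the level and format above could be written as follows. The logger name `ecal_migration` and the `debug` switch are assumptions for illustration.

```python
import logging

# Format string as documented above.
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"


def configure_logging(debug=False):
    """Configure console logging and return a logger for the migration."""
    level = logging.DEBUG if debug else logging.INFO
    # basicConfig attaches a console (stderr) handler to the root logger.
    # It has no effect if the root logger already has handlers.
    logging.basicConfig(level=level, format=LOG_FORMAT)
    logger = logging.getLogger("ecal_migration")  # hypothetical logger name
    logger.setLevel(level)
    return logger
```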

Common Error Scenarios

  • Missing Environment Variables: Validation at startup with clear error messages
  • Database Connection Failures: Automatic connection retry and error reporting
  • Blob Copy Failures: Individual blob error handling with operation continuation
  • Timeout Handling: Copy operation timeouts with configurable limits (1 hour per file)
  • Invalid URL Formats: URL parsing validation with descriptive error messages
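
The timeout handling described above can be sketched as a polling loop. The status strings (`pending`, `success`, `failed`, `aborted`) are the copy states Azure Blob Storage reports; the function name, the 5-second poll interval, and the injectable `sleep` and `clock` parameters are assumptions made so the sketch is easy to test.

```python
import time


def wait_for_copy(get_status, timeout_seconds=3600, poll_seconds=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the copy finishes, fails, or the timeout expires.

    Returns True on success and False on a failed or aborted copy.
    Raises TimeoutError if the copy is still pending at the deadline.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "success":
            return True
        if status in ("failed", "aborted"):
            return False
        if clock() >= deadline:
            raise TimeoutError(
                f"Copy still {status!r} after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

With azure-storage-blob, `get_status` could be something like `lambda: blob_client.get_blob_properties().copy.status`, where `blob_client` is the destination blob's client.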

Recovery and Troubleshooting

  • Failed migrations can be retried; the script skips data that has already been migrated
  • Detailed logging helps identify specific failure points
  • Database transactions ensure consistency between file operations and record updates

Performance Considerations

Optimization Features

  • Parallel Operations: Concurrent blob copying within directories
  • SAS Token Caching: Efficient token generation for large directories
  • Progress Monitoring: Real-time copy status tracking
  • Memory Efficiency: Streaming blob operations without local storage
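
Concurrent copying within a directory, combined with the per-blob error handling described earlier, might be structured like this. It is a sketch under assumptions: `copy_one` stands in for whatever function copies a single blob, and the worker count of 8 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_blobs_concurrently(blob_names, copy_one, max_workers=8):
    """Run copy_one(name) for every blob in a thread pool.

    Returns (succeeded, failed): a sorted list of names that copied cleanly,
    and a dict mapping each failed name to the exception it raised.
    One failure does not stop the remaining copies.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_one, name): name for name in blob_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                succeeded.append(name)
            except Exception as exc:  # record the error and keep going
                failed[name] = exc
    return sorted(succeeded), failed
```

Threads fit here because each copy spends its time waiting on network calls. A caller would typically treat a non-empty `failed` dict as a reason to skip the database update and source deletion for that directory.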

Scalability Limits

  • Timeout Protection: 1-hour maximum wait time per file copy operation
  • Connection Management: Automatic connection pooling for database operations
  • Large Directory Handling: Efficient handling of directories with hundreds of files

Security Considerations

Access Control

  • Environment Variable Storage: Sensitive credentials stored in environment variables
  • Temporary SAS Tokens: 2-hour expiration on generated access tokens
  • Database Transactions: Atomic operations prevent partial migrations
  • Connection String Parsing: Secure extraction of authentication components

Data Protection

  • Copy Verification: Verification of successful copy operations before source deletion
  • Transaction Rollback: Database rollback on migration failures
  • Audit Logging: Comprehensive logging for security auditing
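
The rollback behaviour described above could take the following shape for the record update. The function name matches update_ecal_record() from the Key Methods list, but the signature and body are assumptions; `conn` is expected to behave like a psycopg2 connection.

```python
def update_ecal_record(conn, ecal_id, new_url):
    """Point one ecal row at its cold storage URL, committing only on success."""
    try:
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE ecal SET ecal_name = %s WHERE ecal_id = %s",
                (new_url, ecal_id),
            )
            # ecal_id is the primary key, so exactly one row should change.
            if cur.rowcount != 1:
                raise LookupError(
                    f"Expected to update 1 row for {ecal_id}, got {cur.rowcount}"
                )
        conn.commit()
    except Exception:
        # Leave the database untouched if anything went wrong.
        conn.rollback()
        raise
```

Calling this only after the copy has been verified, and deleting source data only after it returns, keeps the database from pointing at a location that does not hold the files.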

Troubleshooting

Common Issues and Solutions

  1. "Missing environment variables" error:

    • Verify all required environment variables are set
    • Check connection string format matches Azure standards
  2. "Database connection failed" error:

    • Verify PostgreSQL server accessibility
    • Check database credentials and permissions
  3. "Blob copy failed" error:

    • Verify Azure storage account permissions
    • Check network connectivity between accounts
    • Ensure source blobs exist and are accessible
  4. "Copy operation timed out" error:

    • Large files may require longer copy times
    • Check network stability and bandwidth
    • Consider running migration during off-peak hours

Debug Mode

For additional debugging information, modify the logging level:

logging.basicConfig(level=logging.DEBUG)

License and Support

This script is designed for internal ECAL data management operations. For support or modifications, contact the development team responsible for ECAL data infrastructure.


Note: This script performs destructive operations (deletes source data after migration). Always ensure proper backups and test in a non-production environment before running against production data.
