ECHO

ECHO - Explainable Computation for Hearing Outputs


Learning Interpretability Tool for Audio Models

Interpreting how deep learning models make decisions is crucial, especially in high-stakes applications like speech recognition, emotion detection, and speaker identification. While the Learning Interpretability Tool (LIT) enables exploration of text and tabular models, there's a lack of equivalent tools for voice-based models. Voice data poses additional challenges due to its temporal nature and multi-modal representations (e.g., waveform, spectrogram).

ECHO extends the interpretability paradigm to audio models, providing researchers and developers with tools to analyze and debug speech models with greater transparency. Through interactive visualizations, attention mechanisms, and perturbation analyses, you can gain deeper insights into how your audio models make decisions.

Features

  • Audio Data Management: Upload and manage audio datasets with metadata
  • Waveform Visualization: Interactive waveform viewer with playback controls
  • Model Prediction Analysis: Examine model predictions and confidence scores
  • Attention Visualization: Explore attention patterns in transformer-based audio models
  • Embedding Analysis: Visualize high-dimensional audio embeddings in 2D/3D space
  • Saliency Mapping: Identify important regions in audio input using gradient-based methods
  • Perturbation Tools: Apply various audio perturbations to test model robustness (see the sketch after this list)
  • Interactive Dashboard: Comprehensive interface for exploring model behavior
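
As a concrete example of the perturbation tooling, the sketch below adds white Gaussian noise at a chosen signal-to-noise ratio - one of the simplest robustness tests. This is a minimal illustration, not ECHO's implementation; the file name and SNR value are placeholders.

# Minimal perturbation sketch: mix Gaussian noise into a waveform at a target SNR.
# "sample.wav" and the 10 dB setting are illustrative placeholders.
import numpy as np
import librosa

def add_noise(waveform: np.ndarray, snr_db: float) -> np.ndarray:
    """Return the waveform with white Gaussian noise added at snr_db dB SNR."""
    signal_power = np.mean(waveform ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
    return waveform + noise

waveform, sr = librosa.load("sample.wav", sr=16000)
noisy = add_noise(waveform, snr_db=10.0)
# Compare model outputs on `waveform` vs `noisy` to gauge robustness.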

Tech Stack

  • Frontend: React 18 + TypeScript + Vite
  • UI Framework: Tailwind CSS + shadcn/ui components
  • State Management: TanStack Query
  • Data Visualization: Custom React components with Chart.js integration
  • Audio Processing: Web Audio API
  • Backend: FastAPI + Python 3.11
  • Models: Transformer-based audio models (Whisper, Wav2Vec2)
  • Storage: Redis for caching predictions and results
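
The Redis layer follows a straightforward cache-aside pattern: hash the uploaded audio, use the hash as the key, and skip inference on a hit. The sketch below is illustrative only - the route path, key scheme, and run_model stub are assumptions, not ECHO's actual API.

# Illustrative caching pattern with FastAPI + redis-py (not ECHO's actual routes).
import hashlib
import json

import redis
from fastapi import FastAPI, UploadFile

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def run_model(audio_bytes: bytes) -> dict:
    # Placeholder standing in for real model inference.
    return {"label": "speech", "confidence": 0.0}

@app.post("/predict")  # hypothetical endpoint
async def predict(file: UploadFile):
    audio_bytes = await file.read()
    key = "prediction:" + hashlib.sha256(audio_bytes).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                 # cache hit: skip inference
    result = run_model(audio_bytes)
    cache.set(key, json.dumps(result), ex=3600)   # expire after one hour
    return result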

Prerequisites

  • Frontend:

    • Node.js (v18 or higher)
    • npm or bun package manager
  • Backend:

    • Python 3.11
    • Docker (for Redis)

Installation

1. Clone the repository

git clone https://github.com/AnasSAV/ECHO.git
cd ECHO

2. Set up the Frontend

cd Frontend
npm install
npm run dev

3. Start Redis server in Docker

# In a new terminal
cd Backend
docker compose up -d
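
To confirm the container is reachable before starting the backend (assuming Redis is exposed on its default localhost:6379), a quick check from Python:

# Connectivity check for the Redis container started above (default port assumed).
import redis
print(redis.Redis(host="localhost", port=6379).ping())  # True if Redis is up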

4. Set up the Backend

Using Python venv (recommended)

cd Backend
python -m venv .venv
# Activate the environment (run the line matching your OS)
.venv\Scripts\activate       # Windows
source .venv/bin/activate    # Linux or macOS
python -m pip install --upgrade pip
pip install -r requirements.txt
uvicorn app.main:app --reload

Using Miniconda (alternative)

# Initialize conda for your shell (only needed the first time; `cmd.exe` is for the Windows Command Prompt - use `conda init bash` or `conda init zsh` on other shells)
conda init cmd.exe

# Navigate to your project folder
cd Backend

# Create the environment with Python 3.10
conda create -n ECHO python=3.10 -y

# Activate the environment
conda activate ECHO

# Install dependencies
conda install -c pytorch -c nvidia -c conda-forge fastapi uvicorn starlette httpx python-multipart python-dotenv pydantic-settings anyio numpy pandas librosa pysoundfile transformers pytorch torchvision torchaudio pytorch-cuda=12.1 redis-py pytest pytest-asyncio requests -y

# Start the backend server
uvicorn app.main:app --reload
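
Whichever setup you use, uvicorn listens on port 8000 by default and FastAPI serves interactive API docs at /docs. A quick sanity check, assuming the default port and that the requests package is installed (or simply open the URL in a browser):

# Verify the backend is up: FastAPI exposes Swagger UI at /docs by default.
import requests
print(requests.get("http://localhost:8000/docs").status_code)  # expect 200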

5. Access the Application

Open your browser and navigate to http://localhost:8080 for the frontend. The FastAPI backend started with uvicorn listens on http://localhost:8000 by default.

Project Structure

ECHO/
├── Frontend/                # React frontend application
│   ├── components/          # React components
│   │   ├── analysis/        # Analysis and perturbation tools
│   │   ├── audio/           # Audio visualization components
│   │   ├── layout/          # Layout components
│   │   ├── panels/          # Dashboard panels
│   │   ├── ui/              # Reusable UI components
│   │   └── visualization/   # Data visualization components
│   ├── hooks/               # Custom React hooks
│   ├── lib/                 # Utility functions
│   └── pages/               # Page components
│
├── Backend/                 # FastAPI backend application
│   ├── app/                 # Application code
│   │   ├── api/             # API routes and endpoints
│   │   ├── core/            # Core functionality
│   │   └── services/        # Business logic services
│   ├── data/                # Sample datasets
│   ├── tests/               # Backend tests
│   └── uploads/             # User-uploaded audio files
│
├── CODE_OF_CONDUCT.md       # Community guidelines
├── CONTRIBUTING.md          # Contribution guidelines
├── LICENSE                  # MIT License
├── README.md                # Project documentation
└── SECURITY.md              # Security policy

Available Scripts

Frontend

  • npm run dev - Start development server
  • npm run build - Build for production
  • npm run lint - Run ESLint
  • npm run preview - Preview production build

Backend

  • pytest - Run backend tests
  • uvicorn app.main:app --reload - Start the API server in development mode

Usage

  1. Upload Audio Data: Use the audio uploader to load your audio files
  2. Select Models: Choose from available audio models for analysis
  3. Explore Visualizations (see the sketch after this list):
    • Examine waveforms and spectrograms
    • View model predictions and confidence scores
    • Explore attention patterns and embedding spaces
    • Generate saliency maps to highlight important audio regions
  4. Apply Perturbations: Test model robustness with various audio perturbations
  5. Analyze Results: Use the interactive dashboard to gain insights
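
For reference, the sketch below shows - outside of ECHO - the kind of raw data behind the prediction and attention views in steps 2-3: class probabilities and per-layer attention maps from a transformer audio model. The checkpoint id and file name are placeholders; substitute your own.

# Stand-alone illustration of the data behind the prediction and attention views.
import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "superb/wav2vec2-base-superb-ks"   # placeholder checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id).eval()

waveform, _ = librosa.load("sample.wav", sr=16000)   # placeholder file
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

probs = outputs.logits.softmax(dim=-1)[0]
top = int(probs.argmax())
print(model.config.id2label[top], float(probs[top]))  # prediction + confidence
# outputs.attentions holds one (batch, heads, frames, frames) tensor per layer -
# the raw material for attention heatmaps.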

Contributing

We welcome contributions! Please read our Contributing Guidelines for more information.

Security

For security-related issues, please refer to our Security Policy.

Authors

Mentor

  • Dr. Uthayasanker Thayasivam - NLP Researcher, Senior Lecturer, and Head of the Department of Computer Science & Engineering, University of Moratuwa, Sri Lanka

Acknowledgments

  • Inspired by Google's Learning Interpretability Tool (LIT)
  • Built with modern React ecosystem and TypeScript
  • Special thanks to the open-source community for the amazing tools and libraries

Roadmap

  • Backend API enhancements for model serving
  • Support for more audio model architectures
  • Advanced perturbation techniques
  • Real-time audio processing capabilities
  • Export functionality for visualizations
  • Multi-language support
  • Plugin system for custom analysis tools

License

This project is licensed under the MIT License - see the LICENSE file for details.


Built for audio model interpretability
