Learning Interpretability Tool for Audio Models
Interpreting how deep learning models make decisions is crucial, especially in high-stakes applications like speech recognition, emotion detection, and speaker identification. While the Learning Interpretability Tool (LIT) enables exploration of text and tabular models, there's a lack of equivalent tools for voice-based models. Voice data poses additional challenges due to its temporal nature and multi-modal representations (e.g., waveform, spectrogram).
ECHO extends this interpretability paradigm to audio models, giving researchers and developers the tools to analyze and debug speech models more transparently. Through interactive visualizations, attention mechanisms, and perturbation analyses, you can gain deeper insight into how your audio models make decisions.
- Audio Data Management: Upload and manage audio datasets with metadata
- Waveform Visualization: Interactive waveform viewer with playback controls
- Model Prediction Analysis: Examine model predictions and confidence scores
- Attention Visualization: Explore attention patterns in transformer-based audio models
- Embedding Analysis: Visualize high-dimensional audio embeddings in 2D/3D space
- Saliency Mapping: Identify important regions in the audio input using gradient-based methods (see the sketch after this list)
- Perturbation Tools: Apply various audio perturbations to test model robustness
- Interactive Dashboard: Comprehensive interface for exploring model behavior
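The gradient-based saliency mapping above can be illustrated with a short, self-contained sketch: take the gradient of the top class logit of a Hugging Face audio classifier with respect to the raw waveform and report the most influential frame. The checkpoint name, sample.wav, and the 20 ms framing are illustrative assumptions, not ECHO's actual pipeline.

```python
# Vanilla gradient saliency for an audio classifier (illustrative sketch).
# The checkpoint and input file below are assumptions for demonstration.
import librosa
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_name = "superb/wav2vec2-base-superb-er"  # hypothetical checkpoint choice
extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForAudioClassification.from_pretrained(model_name)
model.eval()

waveform, sr = librosa.load("sample.wav", sr=16000)
inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")
inputs["input_values"].requires_grad_(True)

logits = model(**inputs).logits
pred = int(logits.argmax(dim=-1))
logits[0, pred].backward()  # backprop from the winning class logit

# Saliency = gradient magnitude with respect to each raw audio sample.
saliency = inputs["input_values"].grad.abs().squeeze(0)

# Aggregate into 20 ms frames (320 samples at 16 kHz) and find the peak.
frames = saliency[: len(saliency) // 320 * 320].reshape(-1, 320).sum(dim=1)
print(f"Most influential frame starts at {int(frames.argmax()) * 320 / sr:.2f} s")
```

A per-sample saliency vector like this is the kind of signal that can be overlaid on a waveform view to highlight influential regions.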
- Frontend: React 18 + TypeScript + Vite
- UI Framework: Tailwind CSS + shadcn/ui components
- State Management: TanStack Query
- Data Visualization: Custom React components with Chart.js integration
- Audio Processing: Web Audio API
- Backend: FastAPI + Python 3.11
- Models: Transformer-based audio models (Whisper, Wav2Vec2)
- Storage: Redis for caching predictions and results
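As a rough illustration of the Redis-backed prediction cache mentioned above, the sketch below hashes the uploaded audio, returns a cached result when one exists, and otherwise runs inference and stores the result with an expiry. The endpoint path, key scheme, and run_model() placeholder are assumptions for the example, not ECHO's actual API.

```python
# Illustrative caching pattern: hash the upload, check Redis, fall back to inference.
# Endpoint path, key scheme, and run_model() are assumptions, not ECHO's real code.
import hashlib
import json

import redis
from fastapi import FastAPI, UploadFile

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def run_model(audio_bytes: bytes) -> dict:
    """Placeholder for the real inference call (e.g. Whisper or Wav2Vec2)."""
    return {"label": "speech", "confidence": 0.97}

@app.post("/predict")
async def predict(file: UploadFile):
    audio_bytes = await file.read()
    key = "prediction:" + hashlib.sha256(audio_bytes).hexdigest()

    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip inference entirely

    result = run_model(audio_bytes)
    cache.set(key, json.dumps(result), ex=3600)  # keep for one hour
    return result
```

Keying on a hash of the raw bytes keeps results stable when the same file is uploaded again under a different name.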
Frontend:
- Node.js (v18 or higher)
- npm or bun package manager
Backend:
- Python 3.11
- Docker (for Redis)
git clone https://github.com/AnasSAV/ECHO.git
cd ECHO
cd Frontend
npm install
npm run dev
# In a new terminal
cd Backend
docker compose up -d
cd Backend
python -m venv .venv
.venv\Scripts\activate # On Windows
source .venv/bin/activate # On Unix or MacOS
python -m pip install --upgrade pip
pip install -r requirements.txt
uvicorn app.main:app --reload
# Initialize conda for your shell (Only if you have not used Conda before)
conda init cmd.exe
# Navigate to your project folder
cd Backend
# Create the environment with Python 3.10
conda create -n ECHO python=3.10 -y
# Activate the environment
conda activate ECHO
# Install dependencies
conda install -c pytorch -c nvidia -c conda-forge fastapi uvicorn starlette httpx python-multipart python-dotenv pydantic-settings anyio numpy pandas librosa pysoundfile transformers pytorch torchvision torchaudio pytorch-cuda=12.1 redis-py pytest pytest-asyncio requests -y
# Start the backend server
uvicorn app.main:app --reload
Open your browser and navigate to http://localhost:8080.
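Once both servers are running, you can optionally confirm from Python that the backend is reachable. This sketch assumes uvicorn's default port 8000 and that FastAPI's auto-generated OpenAPI endpoint has not been disabled.

```python
# Quick reachability check against the FastAPI backend (assumes port 8000
# and that the default /openapi.json endpoint is enabled).
import requests

resp = requests.get("http://localhost:8000/openapi.json", timeout=5)
resp.raise_for_status()

schema = resp.json()
print(f"Backend is up: {schema['info']['title']} {schema['info']['version']}")
print(f"{len(schema['paths'])} routes registered")
```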
ECHO/
├── Frontend/ # React frontend application
│ ├── components/ # React components
│ │ ├── analysis/ # Analysis and perturbation tools
│ │ ├── audio/ # Audio visualization components
│ │ ├── layout/ # Layout components
│ │ ├── panels/ # Dashboard panels
│ │ ├── ui/ # Reusable UI components
│ │ └── visualization/ # Data visualization components
│ ├── hooks/ # Custom React hooks
│ ├── lib/ # Utility functions
│ └── pages/ # Page components
│
├── Backend/ # FastAPI backend application
│ ├── app/ # Application code
│ │ ├── api/ # API routes and endpoints
│ │ ├── core/ # Core functionality
│ │ └── services/ # Business logic services
│ ├── data/ # Sample datasets
│ ├── tests/ # Backend tests
│ └── uploads/ # User-uploaded audio files
│
├── CODE_OF_CONDUCT.md # Community guidelines
├── CONTRIBUTING.md # Contribution guidelines
├── LICENSE # MIT License
├── README.md # Project documentation
└── SECURITY.md # Security policy
- npm run dev - Start development server
- npm run build - Build for production
- npm run lint - Run ESLint
- npm run preview - Preview production build
- pytest - Run backend tests
- uvicorn app.main:app --reload - Start the API server in development mode
- Upload Audio Data: Use the audio uploader to load your audio files
- Select Models: Choose from available audio models for analysis
- Explore Visualizations:
- Examine waveforms and spectrograms
- View model predictions and confidence scores
- Explore attention patterns and embedding spaces
- Generate saliency maps to highlight important audio regions
- Apply Perturbations: Test model robustness with various audio perturbations (see the sketch after this list)
- Analyze Results: Use the interactive dashboard to gain insights
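To make the perturbation step concrete, the sketch below applies two common perturbations, additive white noise at a target SNR and a pitch shift, using the same librosa/soundfile stack the backend installs. File names and parameter values are illustrative only.

```python
# Illustrative robustness perturbations: additive noise at a target SNR and a
# pitch shift. File names and parameter values are placeholders.
import numpy as np
import librosa
import soundfile as sf

waveform, sr = librosa.load("sample.wav", sr=16000)

def add_noise(y: np.ndarray, snr_db: float) -> np.ndarray:
    """Add white noise so the result has roughly the requested SNR."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)

noisy = add_noise(waveform, snr_db=10)
shifted = librosa.effects.pitch_shift(waveform, sr=sr, n_steps=2)

sf.write("sample_noisy.wav", noisy, sr)
sf.write("sample_pitch_up.wav", shifted, sr)
# Re-run the model on the perturbed clips and compare predictions and
# confidence scores against the originals to gauge robustness.
```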
We welcome contributions! Please read our Contributing Guidelines for more information.
For security-related issues, please refer to our Security Policy.
- Anas Hussaindeen - GitHub Profile
- Chandupa Ambepitiya - GitHub Profile
- Dewmike Amarasinghe - GitHub Profile
- Dr Uthayasanker Thayasivam - NLP Researcher, Senior Lecturer, and Head of the Department of Computer Science & Engineering, University of Moratuwa, Sri Lanka
- Inspired by Google's Learning Interpretability Tool (LIT)
- Built with modern React ecosystem and TypeScript
- Special thanks to the open-source community for the amazing tools and libraries
- Backend API enhancements for model serving
- Support for more audio model architectures
- Advanced perturbation techniques
- Real-time audio processing capabilities
- Export functionality for visualizations
- Multi-language support
- Plugin system for custom analysis tools
This project is licensed under the MIT License - see the LICENSE file for details.
Built for audio model interpretability
