This repository provides the Audio Simulation module of a modular audio-visual software architecture. It generates realistic synthetic multi-channel microphone recordings for multiple moving speakers inside a virtual indoor environment using room acoustics simulation.
The module is designed to work together with the Unity-based visual simulation and downstream audio and video detection modules.
- Physically plausible room acoustics simulation using the Image Source Method (ISM).
- GPU-accelerated room impulse response (RIR) generation via gpuRIR (CUDA-based).
- Supports multiple moving sound sources and multiple microphones.
- Generates time-synchronized, reproducible audio data aligned with visual simulation output.
- Produces lossless multi-channel WAV recordings (32-bit, 44.1 kHz).
- Fully driven by standardized JSON / JSONL interfaces shared with other modules.
The Audio Simulation module performs the following steps:

- Room Acoustic Simulation: Models sound propagation in a rectangular room with configurable dimensions and reverberation time.
- Room Impulse Response (RIR) Generation: Computes RIRs for all source–microphone pairs using gpuRIR, based on room geometry and wall absorption.
- Audio Rendering: Convolves dry source signals with time-varying RIRs to generate realistic microphone signals for moving speakers.
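The rendering step above can be sketched with plain NumPy/SciPy. This is an illustrative stand-in, not the module's actual code: the synthetic two-tap "RIRs" here would in practice come from gpuRIR.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 44100                                          # sampling rate used by the module
dry = np.random.default_rng(0).standard_normal(fs)  # 1 s stand-in for a dry speech signal

# Stand-in RIRs: one impulse response per microphone (2 mics, 4096 taps).
# In the real pipeline these come from gpuRIR's RIR simulation.
rirs = np.zeros((2, 4096))
rirs[:, 0] = 1.0                # direct path
rirs[0, 2205] = 0.5             # a single echo after 50 ms on mic 0
rirs[1, 1102] = 0.5             # a single echo after 25 ms on mic 1

# Convolve the dry signal with each microphone's RIR -> one channel per mic.
mic_signals = np.stack([fftconvolve(dry, h) for h in rirs], axis=1)
print(mic_signals.shape)        # (samples, n_mics)
```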
The module consumes the following inputs generated by the Unity simulation:

- `config.json`: Static scene description containing:
  - Room dimensions
  - Microphone positions
  - Reverberation time (T60)
- `groundtruth_sources.jsonl`: Time-dependent 3D trajectories of the moving speakers.
- Dry (anechoic) source audio signals (e.g., speech WAV files).
The module produces the following output:

- Multi-channel microphone recordings: `multichannel_audio_<timestamp>.wav`. Each file contains synchronized signals from all simulated microphones.
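Reading the two input files might look like the sketch below. The field names (`room_dimensions`, `mic_positions`, `T60`, `t`, `pos`) are illustrative assumptions; the actual schema is defined by the Unity simulation.

```python
import json
import pathlib
import tempfile

# Write small example inputs to a temporary directory (hypothetical schema).
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "config.json").write_text(json.dumps({
    "room_dimensions": [6.0, 4.0, 3.0],                 # meters (assumed key)
    "mic_positions": [[1.0, 1.0, 1.5], [5.0, 3.0, 1.5]],
    "T60": 0.4,                                         # reverberation time in seconds
}))
with (tmp / "groundtruth_sources.jsonl").open("w") as f:
    for t in (0.0, 0.1, 0.2):                           # one JSON object per line (JSONL)
        f.write(json.dumps({"t": t, "pos": [2.0 + t, 2.0, 1.7]}) + "\n")

# Parse them back: config.json is one object, the .jsonl is one object per line.
config = json.loads((tmp / "config.json").read_text())
trajectory = [json.loads(line) for line in (tmp / "groundtruth_sources.jsonl").open()]
print(config["T60"], len(trajectory))
```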
gpuRIR/
├── examples/
│ ├── config.py # Example simulation parameters
│ ├── simulate_static_sources.py # main script for simulating static sound sources
│ ├── simulate_moving_sources.py # main script for simulating moving sound sources
│ ├── clap_mono_16_44100.wav # dry signal: single hand clap
│ ├── speech_mono_16_44100.wav # dry signal: male "Can you keep a secret?"
└── README.md
This module integrates into the shared experiment directory structure:
📁 experiment_001/
├── config.json
├── groundtruth_sources.jsonl
├── audio/
│ └── wav/
│ └── multichannel_audio_<timestamp>.wav
├── video/
│ ├── rgb/
│ │ └── RGB_frame_<timestamp>.png
│ └── depth/
│ └── Depth_frame_<timestamp>.png
├── localization/
│ ├── audio_localizations.jsonl
│ └── video_localizations.jsonl
└── tracking/
├── audio_tracking.jsonl
├── video_tracking.jsonl
└── audio_video_tracking.jsonl
- Install Python 3.11.3.
- Install CUDA-compatible GPU drivers.
- Install dependencies:

  pip install gpuRIR numpy scipy soundfile

- Clone the repository:

  git clone https://code.fbi.h-da.de/est/est-workgroup/gpuRIR.git

- Ensure `config.json` and `groundtruth_sources.jsonl` are available from the Unity simulation.
- Prepare dry source audio files.
- Run the audio simulation:

  python examples/simulate_moving_sources.py

- Generated multi-channel WAV files are written to the experiment output directory.
- The simulation uses a geometrical acoustics model (ISM) and is most accurate above the Schroeder frequency.
- Frequency-dependent wall absorption, microphone frequency response, and background noise are not modeled.
- The human speaker is approximated as a point source.
These assumptions ensure computational efficiency and reproducibility while remaining suitable for audio localization and sensor fusion research.
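The Schroeder frequency mentioned above marks where a room's modal behavior gives way to statistically dense reverberation; the common approximation is f_S ≈ 2000·√(T60/V). The room dimensions and T60 below are illustrative values, not the module's defaults.

```python
import math

def schroeder_frequency(t60, room_volume):
    """Approximate Schroeder frequency in Hz: f_S ~ 2000 * sqrt(T60 / V),
    with T60 in seconds and the room volume V in cubic meters."""
    return 2000.0 * math.sqrt(t60 / room_volume)

# Example: a 6 m x 4 m x 3 m room (72 m^3) with T60 = 0.4 s.
f_s = schroeder_frequency(0.4, 6.0 * 4.0 * 3.0)
print(round(f_s, 1))
```

Above roughly this frequency (here on the order of 150 Hz) the geometrical ISM model is a good approximation; below it, individual room modes dominate.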
- Python 3.11.3
- gpuRIR (CUDA-enabled GPU required)
- numpy
- scipy
- soundfile
Created by: Laurens Sillekens
gpuRIR is a free and open-source Python library for Room Impulse Response (RIR) simulation using the Image Source Method (ISM) with GPU acceleration. It can compute the RIRs between several source and receiver positions in parallel on CUDA GPUs, making it approximately 100 times faster than CPU implementations [1].
- OS: It has been tested on GNU/Linux systems (Ubuntu and CentOS) and Windows 10. Please let me know if you successfully install it on macOS systems.
- Compilers: To install the package you will need the NVIDIA CUDA Toolkit (it has been tested with releases 8.0 and 10.0, but it should work fine with any version that includes cuRAND) and a C++11 compiler, such as GCC or MSVC++.
- CMake: Finally, you will need at least version 3.23 of CMake. You can easily get it with `pip install cmake`.
- Python: It has been tested with Python 3, but it should work fine with Python 2.

Note for PyTorch users: If you are going to use this module with PyTorch, the compiler you use to build gpuRIR must be ABI-compatible with the compiler PyTorch was built with, so you must use GCC version 4.9 or above.

You can use pip to install gpuRIR from our repository through `pip install https://github.com/DavidDiazGuerra/gpuRIR/zipball/master`. You can also clone or download our repository and run `pip install gpuRIR/`.
The library is subject to AGPL-3.0 license and comes with no warranty. If you find it useful for your research work, please, acknowledge it to [1].
Room Impulse Responses (RIRs) simulation using the Image Source Method (ISM). For further details see [1].
- room_sz : array_like with 3 elements. Size of the room (in meters).
- beta : array_like with 6 elements. Reflection coefficients of the walls as $[\beta_{x0}, \beta_{x1}, \beta_{y0}, \beta_{y1}, \beta_{z0}, \beta_{z1}]$, where $\beta_{x0}$ and $\beta_{x1}$ are the reflection coefficients of the walls orthogonal to the x axis at x=0 and x=room_sz[0], respectively.
- pos_src, pos_rcv : ndarray with 2 dimensions and 3 columns. Positions of the sources and the receivers (in meters).
- nb_img : array_like with 3 integer elements. Number of images to simulate in each dimension.
- Tmax : float. RIRs length (in seconds).
- fs : float. RIRs sampling frequency (in Hertz).
- Tdiff : float, optional. Time (in seconds) when the ISM is replaced by a diffuse reverberation model. Default is Tmax (full ISM simulation).
- spkr_pattern : {"omni", "homni", "card", "hypcard", "subcard", "bidir"}, optional. Polar pattern of the sources (the same for all of them).
- mic_pattern : {"omni", "homni", "card", "hypcard", "subcard", "bidir"}, optional. Polar pattern of the receivers (the same for all of them).
- "omni" : Omnidirectional (default).
- "homni": Half omnidirectional, 1 in front of the microphone, 0 backwards.
- "card": Cardioid.
- "hypcard": Hypercardioid.
- "subcard": Subcardioid.
- "bidir": Bidirectional, a.k.a. figure 8.
- orV_src : ndarray with 2 dimensions and 3 columns or None, optional. Orientation of the sources as vectors pointing in the same direction. Applies to each source. None (default) is only valid for omnidirectional patterns.
- orV_rcv : ndarray with 2 dimensions and 3 columns or None, optional. Orientation of the receivers as vectors pointing in the same direction. Applies to each receiver. None (default) is only valid for omnidirectional patterns.
- c : float, optional. Speed of sound (in m/s). The default is 343.0.
3D ndarray. The first axis is the source, the second the receiver, and the third the time.
Asking for too many or too long RIRs (especially for full ISM simulations) may exceed the GPU memory and crash the kernel.
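A typical call sequence might look like the sketch below. The room geometry and positions are illustrative values; the `try`/`except` guard is there because actually running the simulation requires gpuRIR and a CUDA GPU.

```python
import numpy as np

room_sz = [6.0, 4.0, 3.0]                       # room size in meters (illustrative)
pos_src = np.array([[2.0, 2.0, 1.7]])           # one source: shape (n_src, 3)
pos_rcv = np.array([[1.0, 1.0, 1.5],
                    [5.0, 3.0, 1.5]])           # two receivers: shape (n_rcv, 3)
fs = 44100.0
Tmax = 0.4                                      # RIR length in seconds

try:
    import gpuRIR
    # Estimate wall reflection coefficients and image counts for T60 = 0.4 s,
    # then run the ISM simulation on the GPU.
    beta = gpuRIR.beta_SabineEstimation(room_sz, 0.4)
    nb_img = gpuRIR.t2n(Tmax, room_sz)
    rirs = gpuRIR.simulateRIR(room_sz, beta, pos_src, pos_rcv, nb_img, Tmax, fs)
    print(rirs.shape)                           # (n_src, n_rcv, time samples)
except ImportError:
    print("gpuRIR (and a CUDA GPU) is required to run the simulation itself")
```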
Filter an audio signal by the RIRs of a motion trajectory recorded with a microphone array.
- source_signal : array_like. Signal of the moving source.
- RIRs : 3D ndarray. Room Impulse Responses generated with simulateRIR.
- timestamps : array_like, optional. Timestamp of each RIR [s]. By default, the RIRs are equispaced through the trajectory.
- fs : float, optional. Sampling frequency (in Hertz). It is only needed for custom timestamps.

2D ndarray. Matrix with the signals captured by each microphone in each column.
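Conceptually, the trajectory rendering splits the source signal into as many equispaced segments as there are RIRs and convolves each segment with its RIR. The NumPy sketch below illustrates that idea only; gpuRIR's own implementation may differ in detail (e.g., by cross-fading between successive RIRs).

```python
import numpy as np
from scipy.signal import fftconvolve

def render_trajectory(source_signal, rirs):
    # Illustrative stand-in for simulateTrajectory (not gpuRIR's actual code).
    # rirs: (n_traj_points, n_mics, rir_len), as returned by simulateRIR when
    # pos_src holds successive trajectory points of one moving source.
    n_pts, n_mics, rir_len = rirs.shape
    seg_len = int(np.ceil(len(source_signal) / n_pts))
    out = np.zeros((len(source_signal) + rir_len - 1, n_mics))
    for i in range(n_pts):
        seg = source_signal[i * seg_len:(i + 1) * seg_len]
        for m in range(n_mics):
            y = fftconvolve(seg, rirs[i, m])    # convolve segment with its RIR
            out[i * seg_len:i * seg_len + len(y), m] += y  # overlap-add
    return out

# Demo: two direct-path-only "RIRs" leave the signal unchanged.
sig = np.arange(10, dtype=float)
rirs = np.zeros((2, 1, 4))
rirs[:, :, 0] = 1.0
mic = render_trajectory(sig, rirs)
```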
Activate the mixed precision mode (only for the Pascal GPU architecture or newer).
- activate : bool, optional. True to activate and False to deactivate. True by default.
Activate the lookup table for the sinc computations.
- activate : bool, optional. True to activate and False to deactivate. True by default.
Estimation of the reflection coefficients needed to have the desired reverberation time.
- room_sz : 3 elements list or numpy array. Size of the room (in meters).
- T60 : float. Reverberation time of the room (seconds to reach 60dB attenuation).
- abs_weights : array_like with 6 elements, optional. Absorption coefficient ratios of the walls (the default is [1.0]*6).
ndarray with 6 elements. Reflection coefficients of the walls, in the same order as the beta parameter of simulateRIR.
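For intuition, the Sabine relation behind this estimate can be worked through by hand under a uniform-absorption assumption. The sketch below is illustrative only; gpuRIR's own beta_SabineEstimation also supports per-wall abs_weights and need not return exactly these values.

```python
import math

def sabine_beta(room_sz, T60):
    """Uniform-absorption Sabine estimate of the wall reflection coefficients
    (illustrative, not gpuRIR's exact algorithm).
    Sabine: T60 = 0.161 * V / (alpha * S)."""
    Lx, Ly, Lz = room_sz
    V = Lx * Ly * Lz                            # room volume
    S = 2 * (Lx * Ly + Lx * Lz + Ly * Lz)       # total wall surface area
    alpha = 0.161 * V / (T60 * S)               # average absorption coefficient
    beta = math.sqrt(1.0 - alpha)               # energy -> amplitude reflection
    return [beta] * 6                           # same coefficient for all 6 walls

print(sabine_beta([6.0, 4.0, 3.0], 0.4))
```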
Estimation of the time for the RIR to reach a certain attenuation using the Sabine model.
- att_dB : float. Desired attenuation (in dB).
- T60 : float. Reverberation time of the room (seconds to reach 60dB attenuation).
float. Time (in seconds) to reach the desired attenuation.
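Since reverberant decay is linear on a dB scale in the Sabine model, the relation described here is presumably the simple proportion below; this one-liner is a sketch of that relation, not necessarily gpuRIR's exact implementation.

```python
def att2t_sabine(att_dB, T60):
    # Decay is linear in dB: reaching att_dB of attenuation takes the
    # corresponding fraction of the time needed to reach 60 dB (i.e. T60).
    return att_dB / 60.0 * T60

print(round(att2t_sabine(40.0, 0.6), 3))  # 40 dB of a 0.6 s T60 room
```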
Estimation of the number of images needed for a correct RIR simulation.
- T : float. RIRs length (in seconds).
- room_sz : 3 elements list or numpy array. Size of the room (in meters).
- c : float, optional. Speed of sound (the default is 343.0).
3 elements list of integers. The number of image sources to compute in each dimension.
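The intuition is that image sources in dimension i repeat roughly every room_sz[i] meters, so covering all reflections arriving within T seconds needs on the order of c·T / room_sz[i] images per dimension. The sketch below follows that reasoning and is only an estimate of the rule, not gpuRIR's exact formula.

```python
import math

def estimate_nb_img(T, room_sz, c=343.0):
    # Rough image count per dimension: sound travels c*T meters within the
    # RIR length T, and images repeat about every room_sz[i] meters.
    # (Illustrative estimate; gpuRIR's t2n may use a slightly different rule.)
    return [int(math.ceil(c * T / L)) for L in room_sz]

print(estimate_nb_img(0.4, [6.0, 4.0, 3.0]))
```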
[1] Diaz-Guerra, D., Miguel, A. & Beltran, J.R. gpuRIR: A python library for room impulse response simulation with GPU acceleration. Multimed Tools Appl (2020). [DOI] [SharedIt] [arXiv preprint]