This repository provides the Audio Simulation module of a modular audio-visual software architecture. It generates realistic synthetic multi-channel microphone recordings for multiple moving speakers inside a virtual indoor environment using room acoustics simulation.
The module is designed to work together with the Unity-based visual simulation and downstream audio and video detection modules.
- Physically plausible room acoustics simulation using the Image Source Method (ISM).
- GPU-accelerated room impulse response (RIR) generation via gpuRIR (CUDA-based).
- Supports multiple moving sound sources and multiple microphones.
- Generates time-synchronized, reproducible audio data aligned with visual simulation output.
- Produces lossless multi-channel WAV recordings (32-bit, 44.1 kHz).
- Fully driven by standardized JSON / JSONL interfaces shared with other modules.
The Audio Simulation module performs the following steps:

- Room Acoustic Simulation: Models sound propagation in a rectangular room with configurable dimensions and reverberation time.
- Room Impulse Response (RIR) Generation: Computes RIRs for all source–microphone pairs using gpuRIR, based on room geometry and wall absorption.
- Audio Rendering: Convolves dry source signals with time-varying RIRs to generate realistic microphone signals for moving speakers.
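The rendering step above can be sketched with plain NumPy/SciPy. This is an illustrative stand-in, not the module's actual code: the synthetic two-tap "RIRs" here would in practice come from gpuRIR.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 44100                                          # sampling rate used by the module
dry = np.random.default_rng(0).standard_normal(fs)  # 1 s stand-in for a dry speech signal

# Stand-in RIRs: one impulse response per microphone (2 mics, 4096 taps).
# In the real pipeline these come from gpuRIR's RIR simulation.
rirs = np.zeros((2, 4096))
rirs[:, 0] = 1.0                # direct path
rirs[0, 2205] = 0.5             # a single echo after 50 ms on mic 0
rirs[1, 1102] = 0.5             # a single echo after 25 ms on mic 1

# Convolve the dry signal with each microphone's RIR -> one channel per mic.
mic_signals = np.stack([fftconvolve(dry, h) for h in rirs], axis=1)
print(mic_signals.shape)        # (samples, n_mics)
```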
The module consumes the following inputs generated by the Unity simulation:

- `config.json`: Static scene description containing:
  - Room dimensions
  - Microphone positions
  - Reverberation time (T60)
- `groundtruth_sources.jsonl`: Time-dependent 3D trajectories of the moving speakers.
- Dry (anechoic) source audio signals (e.g., speech WAV files).
The module produces the following output:

- Multi-channel microphone recordings: `multichannel_audio_<timestamp>.wav`. Each file contains synchronized signals from all simulated microphones.
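Reading the two input files might look like the sketch below. The field names (`room_dimensions`, `mic_positions`, `T60`, `t`, `pos`) are illustrative assumptions; the actual schema is defined by the Unity simulation.

```python
import json
import pathlib
import tempfile

# Write small example inputs to a temporary directory (hypothetical schema).
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "config.json").write_text(json.dumps({
    "room_dimensions": [6.0, 4.0, 3.0],                 # meters (assumed key)
    "mic_positions": [[1.0, 1.0, 1.5], [5.0, 3.0, 1.5]],
    "T60": 0.4,                                         # reverberation time in seconds
}))
with (tmp / "groundtruth_sources.jsonl").open("w") as f:
    for t in (0.0, 0.1, 0.2):                           # one JSON object per line (JSONL)
        f.write(json.dumps({"t": t, "pos": [2.0 + t, 2.0, 1.7]}) + "\n")

# Parse them back: config.json is one object, the .jsonl is one object per line.
config = json.loads((tmp / "config.json").read_text())
trajectory = [json.loads(line) for line in (tmp / "groundtruth_sources.jsonl").open()]
print(config["T60"], len(trajectory))
```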
gpuRIR/
├── examples/
│ ├── config.py # Example simulation parameters
│ ├── simulate_static_sources.py # main script for simulating static sound sources
│ ├── simulate_moving_sources.py # main script for simulating moving sound sources
│ ├── clap_mono_16_44100.wav # dry signal: single hand clap
│ ├── speech_mono_16_44100.wav # dry signal: male "Can you keep a secret?"
└── README.md
This module integrates into the shared experiment directory structure:
📁 experiment_001/
├── config.json
├── groundtruth_sources.jsonl
├── audio/
│ └── wav/
│ └── multichannel_audio_<timestamp>.wav
├── video/
│ ├── rgb/
│ │ └── RGB_frame_<timestamp>.png
│ └── depth/
│ └── Depth_frame_<timestamp>.png
├── localization/
│ ├── audio_localizations.jsonl
│ └── video_localizations.jsonl
└── tracking/
├── audio_tracking.jsonl
├── video_tracking.jsonl
└── audio_video_tracking.jsonl
- Install Python 3.11.3.
- Install CUDA-compatible GPU drivers.
- Install dependencies:

  pip install gpuRIR numpy scipy soundfile

- Clone the repository:

  git clone https://code.fbi.h-da.de/est/est-workgroup/gpuRIR.git

- Ensure `config.json` and `groundtruth_sources.jsonl` are available from the Unity simulation.
- Prepare dry source audio files.
- Run the audio simulation:

  python examples/simulate_moving_sources.py

- Generated multi-channel WAV files are written to the experiment output directory.
- The simulation uses a geometrical acoustics model (ISM) and is most accurate above the Schroeder frequency.
- Frequency-dependent wall absorption, microphone frequency response, and background noise are not modeled.
- The human speaker is approximated as a point source.
These assumptions ensure computational efficiency and reproducibility while remaining suitable for audio localization and sensor fusion research.
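The Schroeder frequency mentioned above marks where a room's modal behavior gives way to statistically dense reverberation; the common approximation is f_S ≈ 2000·√(T60/V). The room dimensions and T60 below are illustrative values, not the module's defaults.

```python
import math

def schroeder_frequency(t60, room_volume):
    """Approximate Schroeder frequency in Hz: f_S ~ 2000 * sqrt(T60 / V),
    with T60 in seconds and the room volume V in cubic meters."""
    return 2000.0 * math.sqrt(t60 / room_volume)

# Example: a 6 m x 4 m x 3 m room (72 m^3) with T60 = 0.4 s.
f_s = schroeder_frequency(0.4, 6.0 * 4.0 * 3.0)
print(round(f_s, 1))
```

Above roughly this frequency (here on the order of 150 Hz) the geometrical ISM model is a good approximation; below it, individual room modes dominate.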
- Python 3.11.3
- gpuRIR (CUDA-enabled GPU required)
- numpy
- scipy
- soundfile
Created by: Laurens Sillekens
gpuRIR is a free and open-source Python library for Room Impulse Response (RIR) simulation using the Image Source Method (ISM) with GPU acceleration. It can compute the RIRs between several source and receiver positions in parallel on CUDA GPUs, making it approximately 100 times faster than CPU implementations [1].
- OS: It has been tested on GNU/Linux systems (Ubuntu and CentOS) and Windows 10. Please let me know if you successfully install it on macOS systems.
- Compilers: To install the package you will need the NVIDIA CUDA Toolkit (it has been tested with releases 8.0 and 10.0, but it should work fine with any version that includes cuRAND) and a C++11 compiler, such as GCC or MSVC++.
- CMake: Finally, you will need at least version 3.23 of CMake. You can easily get it with `pip install cmake`.
- Python: It has been tested with Python 3, but it should work fine with Python 2.

Note for PyTorch users: If you are going to use this module with PyTorch, the compiler you use to build gpuRIR must be ABI-compatible with the compiler PyTorch was built with, so you must use GCC version 4.9 or above.

You can use pip to install gpuRIR from our repository through `pip install https://github.com/DavidDiazGuerra/gpuRIR/zipball/master`. You can also clone or download our repository and run `pip install gpuRIR/`.
The library is subject to AGPL-3.0 license and comes with no warranty. If you find it useful for your research work, please, acknowledge it to [1].
Room Impulse Responses (RIRs) simulation using the Image Source Method (ISM). For further details see [1].
- room_sz : array_like with 3 elements. Size of the room (in meters).
- beta : array_like with 6 elements. Reflection coefficients of the walls as $[\beta_{x0}, \beta_{x1}, \beta_{y0}, \beta_{y1}, \beta_{z0}, \beta_{z1}]$, where $\beta_{x0}$ and $\beta_{x1}$ are the reflection coefficients of the walls orthogonal to the x axis at x=0 and x=room_sz[0], respectively.
- pos_src, pos_rcv : ndarray with 2 dimensions and 3 columns. Positions of the sources and the receivers (in meters).
- nb_img : array_like with 3 integer elements. Number of images to simulate in each dimension.
- Tmax : float. RIRs length (in seconds).
- fs : float. RIRs sampling frequency (in Hertz).
- Tdiff : float, optional. Time (in seconds) when the ISM is replaced by a diffuse reverberation model. Default is Tmax (full ISM simulation).
- spkr_pattern : {"omni", "homni", "card", "hypcard", "subcard", "bidir"}, optional. Polar pattern of the sources (the same for all of them).
- mic_pattern : {"omni", "homni", "card", "hypcard", "subcard", "bidir"}, optional. Polar pattern of the receivers (the same for all of them).
- "omni" : Omnidirectional (default).
- "homni": Half omnidirectional, 1 in front of the microphone, 0 backwards.
- "card": Cardioid.
- "hypcard": Hypercardioid.
- "subcard": Subcardioid.
- "bidir": Bidirectional, a.k.a. figure 8.
- orV_src : ndarray with 2 dimensions and 3 columns or None, optional. Orientation of the sources as vectors pointing in the same direction. Applies to each source. None (default) is only valid for omnidirectional patterns.
- orV_rcv : ndarray with 2 dimensions and 3 columns or None, optional. Orientation of the receivers as vectors pointing in the same direction. Applies to each receiver. None (default) is only valid for omnidirectional patterns.
- c : float, optional. Speed of sound (in m/s). The default is 343.0.
3D ndarray. The first axis is the source, the second the receiver, and the third the time.
Asking for too many or too long RIRs (especially for full ISM simulations) may exceed the GPU memory and crash the kernel.
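A typical call sequence might look like the sketch below. The room geometry and positions are illustrative values; the `try`/`except` guard is there because actually running the simulation requires gpuRIR and a CUDA GPU.

```python
import numpy as np

room_sz = [6.0, 4.0, 3.0]                       # room size in meters (illustrative)
pos_src = np.array([[2.0, 2.0, 1.7]])           # one source: shape (n_src, 3)
pos_rcv = np.array([[1.0, 1.0, 1.5],
                    [5.0, 3.0, 1.5]])           # two receivers: shape (n_rcv, 3)
fs = 44100.0
Tmax = 0.4                                      # RIR length in seconds

try:
    import gpuRIR
    # Estimate wall reflection coefficients and image counts for T60 = 0.4 s,
    # then run the ISM simulation on the GPU.
    beta = gpuRIR.beta_SabineEstimation(room_sz, 0.4)
    nb_img = gpuRIR.t2n(Tmax, room_sz)
    rirs = gpuRIR.simulateRIR(room_sz, beta, pos_src, pos_rcv, nb_img, Tmax, fs)
    print(rirs.shape)                           # (n_src, n_rcv, time samples)
except ImportError:
    print("gpuRIR (and a CUDA GPU) is required to run the simulation itself")
```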
Filter an audio signal by the RIRs of a motion trajectory recorded with a microphone array.
- source_signal : array_like. Signal of the moving source.
- RIRs : 3D ndarray. Room Impulse Responses generated with simulateRIR.
- timestamps : array_like, optional. Timestamp of each RIR [s]. By default, the RIRs are equispaced through the trajectory.
- fs : float, optional. Sampling frequency (in Hertz). It is only needed for custom timestamps.

2D ndarray. Matrix with the signals captured by each microphone in each column.
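Conceptually, the trajectory rendering splits the source signal into as many equispaced segments as there are RIRs and convolves each segment with its RIR. The NumPy sketch below illustrates that idea only; gpuRIR's own implementation may differ in detail (e.g., by cross-fading between successive RIRs).

```python
import numpy as np
from scipy.signal import fftconvolve

def render_trajectory(source_signal, rirs):
    # Illustrative stand-in for simulateTrajectory (not gpuRIR's actual code).
    # rirs: (n_traj_points, n_mics, rir_len), as returned by simulateRIR when
    # pos_src holds successive trajectory points of one moving source.
    n_pts, n_mics, rir_len = rirs.shape
    seg_len = int(np.ceil(len(source_signal) / n_pts))
    out = np.zeros((len(source_signal) + rir_len - 1, n_mics))
    for i in range(n_pts):
        seg = source_signal[i * seg_len:(i + 1) * seg_len]
        for m in range(n_mics):
            y = fftconvolve(seg, rirs[i, m])    # convolve segment with its RIR
            out[i * seg_len:i * seg_len + len(y), m] += y  # overlap-add
    return out

# Demo: two direct-path-only "RIRs" leave the signal unchanged.
sig = np.arange(10, dtype=float)
rirs = np.zeros((2, 1, 4))
rirs[:, :, 0] = 1.0
mic = render_trajectory(sig, rirs)
```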
Activate the mixed precision mode (only for the Pascal GPU architecture or newer).
- activate : bool, optional. True to activate and False to deactivate. True by default.
Activate the lookup table for the sinc computations.
- activate : bool, optional. True to activate and False to deactivate. True by default.
Estimation of the reflection coefficients needed to have the desired reverberation time.
- room_sz : 3 elements list or numpy array. Size of the room (in meters).
- T60 : float. Reverberation time of the room (seconds to reach 60dB attenuation).
- abs_weights : array_like with 6 elements, optional. Absorption coefficient ratios of the walls (the default is [1.0]*6).
ndarray with 6 elements. Reflection coefficients of the walls, in the same order as the beta parameter of simulateRIR.
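For intuition, the Sabine relation behind this estimate can be worked through by hand under a uniform-absorption assumption. The sketch below is illustrative only; gpuRIR's own beta_SabineEstimation also supports per-wall abs_weights and need not return exactly these values.

```python
import math

def sabine_beta(room_sz, T60):
    """Uniform-absorption Sabine estimate of the wall reflection coefficients
    (illustrative, not gpuRIR's exact algorithm).
    Sabine: T60 = 0.161 * V / (alpha * S)."""
    Lx, Ly, Lz = room_sz
    V = Lx * Ly * Lz                            # room volume
    S = 2 * (Lx * Ly + Lx * Lz + Ly * Lz)       # total wall surface area
    alpha = 0.161 * V / (T60 * S)               # average absorption coefficient
    beta = math.sqrt(1.0 - alpha)               # energy -> amplitude reflection
    return [beta] * 6                           # same coefficient for all 6 walls

print(sabine_beta([6.0, 4.0, 3.0], 0.4))
```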
Estimation of the time for the RIR to reach a certain attenuation using the Sabine model.
- att_dB : float. Desired attenuation (in dB).
- T60 : float. Reverberation time of the room (seconds to reach 60dB attenuation).
float. Time (in seconds) to reach the desired attenuation.
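Since reverberant decay is linear on a dB scale in the Sabine model, the relation described here is presumably the simple proportion below; this one-liner is a sketch of that relation, not necessarily gpuRIR's exact implementation.

```python
def att2t_sabine(att_dB, T60):
    # Decay is linear in dB: reaching att_dB of attenuation takes the
    # corresponding fraction of the time needed to reach 60 dB (i.e. T60).
    return att_dB / 60.0 * T60

print(round(att2t_sabine(40.0, 0.6), 3))  # 40 dB of a 0.6 s T60 room
```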
Estimation of the number of images needed for a correct RIR simulation.
- T : float. RIRs length (in seconds).
- room_sz : 3 elements list or numpy array. Size of the room (in meters).
- c : float, optional. Speed of sound (the default is 343.0).
3 elements list of integers. The number of image sources to compute in each dimension.
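The intuition is that image sources in dimension i repeat roughly every room_sz[i] meters, so covering all reflections arriving within T seconds needs on the order of c·T / room_sz[i] images per dimension. The sketch below follows that reasoning and is only an estimate of the rule, not gpuRIR's exact formula.

```python
import math

def estimate_nb_img(T, room_sz, c=343.0):
    # Rough image count per dimension: sound travels c*T meters within the
    # RIR length T, and images repeat about every room_sz[i] meters.
    # (Illustrative estimate; gpuRIR's t2n may use a slightly different rule.)
    return [int(math.ceil(c * T / L)) for L in room_sz]

print(estimate_nb_img(0.4, [6.0, 4.0, 3.0]))
```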
[1] Diaz-Guerra, D., Miguel, A. & Beltran, J.R. gpuRIR: A python library for room impulse response simulation with GPU acceleration. Multimed Tools Appl (2020). [DOI] [SharedIt] [arXiv preprint]