
Audio Simulation (gpuRIR)

This repository provides the Audio Simulation module of a modular audio-visual software architecture. It generates realistic synthetic multi-channel microphone recordings for multiple moving speakers inside a virtual indoor environment using room acoustics simulation.

The module is designed to work together with the Unity-based visual simulation and downstream audio and video detection modules.

Audio-Visual Sensor Fusion Architecture

[Architecture diagram: the Audio Simulation module within the audio-visual sensor fusion pipeline]

Features

  • Physically plausible room acoustics simulation using the Image Source Method (ISM).
  • GPU-accelerated room impulse response (RIR) generation via gpuRIR (CUDA-based).
  • Supports multiple moving sound sources and multiple microphones.
  • Generates time-synchronized, reproducible audio data aligned with visual simulation output.
  • Produces lossless multi-channel WAV recordings (32-bit, 44.1 kHz).
  • Fully driven by standardized JSON / JSONL interfaces shared with other modules.

Functionality Overview

The Audio Simulation module performs the following steps:

  1. Room Acoustic Simulation: models sound propagation in a rectangular room with configurable dimensions and reverberation time.

  2. Room Impulse Response (RIR) Generation: computes RIRs for all source–microphone pairs with gpuRIR, based on room geometry and wall absorption.

  3. Audio Rendering: convolves dry source signals with time-varying RIRs to generate realistic microphone signals for moving speakers.
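The rendering step can be illustrated with a small numpy sketch (a simplified stand-in, not the module's actual implementation): the dry signal is split into one segment per RIR along the trajectory, each segment is convolved with its RIR, and the convolution tails are overlap-added.

```python
import numpy as np

def render_moving_source(dry, rirs):
    """Approximate time-varying convolution for one microphone
    (hypothetical helper): one equal-length segment of the dry
    signal per RIR, convolved and overlap-added."""
    n_seg = len(rirs)
    seg_len = int(np.ceil(len(dry) / n_seg))
    rir_len = max(len(h) for h in rirs)
    out = np.zeros(len(dry) + rir_len - 1)
    for i, h in enumerate(rirs):
        seg = dry[i * seg_len:(i + 1) * seg_len]
        wet = np.convolve(seg, h)                        # segment filtered by its RIR
        out[i * seg_len:i * seg_len + len(wet)] += wet   # overlap-add the tails
    return out
```

A finer time grid of RIRs (more segments) gives a smoother approximation of the moving source at the cost of more convolutions.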

Inputs

The module consumes the following inputs generated by the Unity simulation:

  • config.json: static scene description containing:

    • Room dimensions
    • Microphone positions
    • Reverberation time (T60)

  • groundtruth_sources.jsonl: time-dependent 3D trajectories of the moving speakers.

  • Dry (anechoic) source audio signals (e.g., speech WAV files).
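A config.json consumed by the module might look like the following. The key names and values here are purely illustrative; consult the actual output of the Unity simulation for the real schema.

```json
{
  "room": { "dimensions": [6.0, 4.0, 2.5] },
  "microphones": [
    { "id": 0, "position": [1.0, 2.0, 1.2] },
    { "id": 1, "position": [1.2, 2.0, 1.2] }
  ],
  "t60": 0.4
}
```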

Outputs

  • Multi-channel microphone recordings:

    multichannel_audio_<timestamp>.wav
    

    Each file contains synchronized signals from all simulated microphones.

Repository Structure

gpuRIR/
├── examples/
│   ├── config.py                       # Example simulation parameters
│   ├── simulate_static_sources.py      # main script to simulate static sound sources
│   ├── simulate_moving_sources.py      # main script to simulate moving sound sources
│   ├── clap_mono_16_44100.wav          # dry signal: single hand clap
│   ├── speech_mono_16_44100.wav        # dry signal: male "Can you keep a secret?"
└── README.md

Generated Scenario Folder Structure

This module integrates into the shared experiment directory structure:

📁 experiment_001/
├── config.json
├── groundtruth_sources.jsonl
├── audio/
│   └── wav/
│       └── multichannel_audio_<timestamp>.wav
├── video/
│   ├── rgb/
│   │   └── RGB_frame_<timestamp>.png
│   └── depth/
│       └── Depth_frame_<timestamp>.png
├── localization/
│   ├── audio_localizations.jsonl
│   └── video_localizations.jsonl
└── tracking/
    ├── audio_tracking.jsonl
    ├── video_tracking.jsonl
    └── audio_video_tracking.jsonl

Installation

  1. Install Python 3.11.3.

  2. Install CUDA-compatible GPU drivers.

  3. Install dependencies:

    pip install gpuRIR numpy scipy soundfile
  4. Clone the repository:

    git clone https://code.fbi.h-da.de/est/est-workgroup/gpuRIR.git

Usage

  1. Ensure config.json and groundtruth_sources.jsonl are available from the Unity simulation.

  2. Prepare dry source audio files.

  3. Run the audio simulation:

    python examples/simulate_moving_sources.py
  4. Generated multi-channel WAV files will be written to the experiment output directory.

Notes and Limitations

  • The simulation uses a geometrical acoustics model (ISM) and is most accurate above the Schroeder frequency.
  • Frequency-dependent wall absorption, microphone frequency response, and background noise are not modeled.
  • The human speaker is approximated as a point source.

These assumptions ensure computational efficiency and reproducibility while remaining suitable for audio localization and sensor fusion research.
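For context, the Schroeder frequency mentioned above can be estimated from the room volume and reverberation time with the common approximation f_S ≈ 2000·√(T60/V):

```python
import math

def schroeder_frequency(t60, volume):
    """Estimate the Schroeder frequency (Hz) above which the room
    response is statistically dense, i.e. where geometrical models
    such as the ISM are most accurate.
    t60 in seconds, volume in cubic meters."""
    return 2000.0 * math.sqrt(t60 / volume)

# For a 6 m x 4 m x 2.5 m room (60 m^3) with T60 = 0.5 s:
# schroeder_frequency(0.5, 60.0) is roughly 183 Hz
```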

Dependencies

  • Python 3.11.3
  • gpuRIR (CUDA-enabled GPU required)
  • numpy
  • scipy
  • soundfile

Contact

Created by: Laurens Sillekens



Official gpuRIR Readme

gpuRIR is a free and open-source Python library for Room Impulse Response (RIR) simulation using the Image Source Method (ISM) with GPU acceleration. It can compute the RIRs between several source and receiver positions in parallel using CUDA GPUs. It is approximately 100 times faster than CPU implementations [1].

Prerequisites

  • OS: It has been tested on GNU/Linux systems (Ubuntu and CentOS) and Windows 10. Please let me know if you successfully install it on macOS systems.

  • Compilers: To install the package you will need the NVIDIA CUDA Toolkit (it has been tested with releases 8.0 and 10.0, but it should work fine with any version that includes cuRAND) and a C++11 compiler, such as GCC or MSVC++.

  • CMake: You will also need at least version 3.23 of CMake. You can get it easily via pip install cmake.

  • Python: It has been tested with Python 3, but should work fine with Python 2.

Note for PyTorch users: If you are going to use this module with PyTorch, the compiler you use to build gpuRIR must be ABI-compatible with the compiler PyTorch was built with, so you must use GCC version 4.9 and above.

Installation

You can use pip to install gpuRIR from our repository through pip install https://github.com/DavidDiazGuerra/gpuRIR/zipball/master. You can also clone or download our repository and run pip install gpuRIR/.

License

The library is subject to AGPL-3.0 license and comes with no warranty. If you find it useful for your research work, please, acknowledge it to [1].

Documentation

simulateRIR

Room Impulse Responses (RIRs) simulation using the Image Source Method (ISM). For further details see [1].

Parameters

  • room_sz : array_like with 3 elements. Size of the room (in meters).
  • beta : array_like with 6 elements. Reflection coefficients of the walls as $[\beta_{x0}, \beta_{x1}, \beta_{y0}, \beta_{y1}, \beta_{z0}, \beta_{z1}]$, where $\beta_{x0}$ and $\beta_{x1}$ are the reflection coefficients of the walls orthogonal to the x axis at x=0 and x=room_sz[0], respectively.
  • pos_src, pos_rcv : ndarray with 2 dimensions and 3 columns. Position of the sources and the receivers (in meters).
  • nb_img : array_like with 3 integer elements. Number of images to simulate in each dimension.
  • Tmax : float. RIRs length (in seconds).
  • fs : float. RIRs sampling frequency (in Hertz).
  • Tdiff : float, optional. Time (in seconds) when the ISM is replaced by a diffuse reverberation model. Default is Tmax (full ISM simulation).
  • spkr_pattern : {"omni", "homni", "card", "hypcard", "subcard", "bidir"}, optional. Polar pattern of the sources (the same for all of them).
  • mic_pattern : {"omni", "homni", "card", "hypcard", "subcard", "bidir"}, optional. Polar pattern of the receivers (the same for all of them).
    • "omni" : Omnidirectional (default).
    • "homni": Half omnidirectional, 1 in front of the microphone, 0 backwards.
    • "card": Cardioid.
    • "hypcard": Hypercardioid.
    • "subcard": Subcardioid.
    • "bidir": Bidirectional, a.k.a. figure 8.
  • orV_src : ndarray with 2 dimensions and 3 columns or None, optional. Orientation of the sources as vectors pointing in the same direction. Applies to each source. None (default) is only valid for omnidirectional patterns.
  • orV_rcv : ndarray with 2 dimensions and 3 columns or None, optional. Orientation of the receivers as vectors pointing in the same direction. Applies to each receiver. None (default) is only valid for omnidirectional patterns.
  • c : float, optional. Speed of sound (in m/s). The default is 343.0.

Returns

3D ndarray The first axis is the source, the second the receiver and the third the time.

Warnings

Asking for too many or too long RIRs (especially for full ISM simulations) may exceed the GPU memory and crash the kernel.
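A minimal usage sketch of the typical call chain (the parameter values are illustrative; running it requires a CUDA-capable GPU with gpuRIR installed):

```python
import numpy as np
import gpuRIR

room_sz = [3.0, 3.0, 2.5]   # room size in meters
T60 = 0.4                   # desired reverberation time in seconds
fs = 16000.0                # sampling frequency in Hz

beta = gpuRIR.beta_SabineEstimation(room_sz, T60)   # wall reflection coefficients
Tdiff = gpuRIR.att2t_SabineEstimator(15.0, T60)     # switch to diffuse model at -15 dB
Tmax = gpuRIR.att2t_SabineEstimator(60.0, T60)      # truncate RIRs at -60 dB
nb_img = gpuRIR.t2n(Tdiff, room_sz)                 # image counts per dimension

pos_src = np.array([[1.0, 2.0, 1.5]])               # one source
pos_rcv = np.array([[0.5, 1.0, 1.5],
                    [2.5, 1.0, 1.5]])               # two receivers

# RIRs has shape (n_sources, n_receivers, n_samples)
RIRs = gpuRIR.simulateRIR(room_sz, beta, pos_src, pos_rcv,
                          nb_img, Tmax, fs, Tdiff=Tdiff)
```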

simulateTrajectory

Filter an audio signal by the RIRs of a motion trajectory recorded with a microphone array.

Parameters

  • source_signal : array_like. Signal of the moving source.
  • RIRs : 3D ndarray. Room Impulse Responses generated with simulateRIR.
  • timestamps : array_like, optional. Timestamp of each RIR (in seconds). By default, the RIRs are equispaced along the trajectory.
  • fs : float, optional. Sampling frequency (in Hertz). It is only needed for custom timestamps.

Returns

2D ndarray Matrix with the signals captured by each microphone in each column.

activateMixedPrecision

Activate the mixed precision mode (only available on the Pascal GPU architecture or newer).

Parameters

  • activate : bool, optional. True to activate and False to deactivate. True by default.

activateLUT

Activate the lookup table for the sinc computations.

Parameters

  • activate : bool, optional. True to activate and False to deactivate. True by default.

beta_SabineEstimation

Estimation of the reflection coefficients needed to have the desired reverberation time.

Parameters

  • room_sz : 3 elements list or numpy array. Size of the room (in meters).
  • T60 : float. Reverberation time of the room (seconds to reach 60 dB attenuation).
  • abs_weights : array_like with 6 elements, optional. Absorption coefficient ratios of the walls (the default is [1.0]*6).

Returns

ndarray with 6 elements. Reflection coefficients of the walls as $[\beta_{x0}, \beta_{x1}, \beta_{y0}, \beta_{y1}, \beta_{z0}, \beta_{z1}]$, where $\beta_{x0}$ and $\beta_{x1}$ are the reflection coefficients of the walls orthogonal to the x axis at x=0 and x=room_sz[0], respectively.
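The estimate is based on Sabine's reverberation formula. The sketch below implements only the forward model (T60 from given absorption coefficients) in plain numpy; the library itself solves the inverse problem numerically, so this is illustrative rather than gpuRIR's actual code.

```python
import numpy as np

def sabine_t60(room_sz, alpha):
    """Forward Sabine model: T60 from wall absorption coefficients.
    room_sz: (sx, sy, sz) in meters.
    alpha: 6 absorption coefficients ordered as [x0, x1, y0, y1, z0, z1]
    (related to reflection coefficients by alpha = 1 - beta**2)."""
    sx, sy, sz = room_sz
    V = sx * sy * sz
    # areas of the wall pairs orthogonal to x, y and z, respectively
    S = np.array([sy * sz, sy * sz, sx * sz, sx * sz, sx * sy, sx * sy])
    A = float(np.sum(np.asarray(alpha) * S))   # total absorption (sabins)
    return 0.161 * V / A
```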

att2t_SabineEstimator

Estimation of the time for the RIR to reach a certain attenuation using the Sabine model.

Parameters

  • att_dB : float. Desired attenuation (in dB).
  • T60 : float. Reverberation time of the room (seconds to reach 60 dB attenuation).

Returns

float. Time (in seconds) to reach the desired attenuation.

t2n

Estimation of the number of images needed for a correct RIR simulation.

Parameters

  • T : float. RIRs length (in seconds).
  • room_sz : 3 elements list or numpy array. Size of the room (in meters).
  • c : float, optional. Speed of sound (the default is 343.0).

Returns

3 elements list of integers. The number of image sources to compute in each dimension.
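Both helpers follow from the Sabine model: the decay is linear in dB, so the time to reach a given attenuation is a simple fraction of T60, and the number of images scales with how far sound travels within that time. The sketch below shows the idea; the image-count rule is a rough approximation, not gpuRIR's exact formula.

```python
import math

def att2t(att_db, t60):
    # decay is linear in dB: 60 dB is reached at t60,
    # so att_db is reached at t60 * att_db / 60
    return t60 * att_db / 60.0

def images_needed(T, room_sz, c=343.0):
    # sound travels c*T meters within the RIR, so roughly that many
    # room lengths of image sources are needed in each dimension
    return [math.ceil(c * T / L) for L in room_sz]
```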

References

[1] Diaz-Guerra, D., Miguel, A. & Beltran, J.R. gpuRIR: A python library for room impulse response simulation with GPU acceleration. Multimedia Tools and Applications (2020).
