- In the scripts for audio reconstruction, encoding, and embedding extraction, you need to provide a JSON file specifying the audio files to be processed.
- Each entry in the JSON file should contain an `index` field (used as the basename for storing processed outputs) and a `wav_path` field (indicating the path to the input audio). An example can be found in `example/batch_script_data.jsonl`.
- To explore speech decoupling experiments, we introduce two parameters in the decoding script: `acoustic_masked` and `semantic_masked`.
- You can toggle these options to mask the corresponding encoding streams and observe the reconstruction results.
- Note that when setting `acoustic_masked=true`, it is recommended to also set `normalize=true` to ensure the output volume remains audible.
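
The input file described above can be generated with a short script. The following is a minimal sketch, not part of the repository: it assumes the file uses the JSON Lines layout (one JSON object per line, inferred from the `.jsonl` extension of the example file) and that the file stem is an acceptable `index`. The helper names `build_entries` and `write_manifest`, the `my_audio` directory, and the `my_batch.jsonl` output name are hypothetical.

```python
import json
from pathlib import Path


def build_entries(wav_paths):
    """Return one {"index", "wav_path"} dict per audio path, sorted by path."""
    entries = []
    for path in sorted(Path(p) for p in wav_paths):
        # Assumption: the file stem is used as the output basename.
        entries.append({"index": path.stem, "wav_path": str(path)})
    return entries


def write_manifest(entries, out_path):
    """Write entries as JSON Lines (assumed format): one object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Hypothetical input directory and output file; adjust to your setup.
    wavs = Path("my_audio").glob("*.wav")
    write_manifest(build_entries(wavs), "my_batch.jsonl")
```

Because `index` serves as the basename for stored outputs, two audio files with the same stem in different directories would collide under this sketch; in that case, derive `index` from a unique identifier instead.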