Prd 1694 upgrade diarize model by AlexAtDavidhorn · Pull Request #8 · IndicoSystems/whisper-ctranslate2

AlexAtDavidhorn · 2025-11-26T11:32:39Z

Pull Request

This change implements PRD-1694 (New diarization model), and in addition, the following:

Separate dev containers for running on CPU and CUDA
Fixes e2e-tests
Implements e2e-tests for CUDA
Fixes Github workflows for automatic tests across python envs

All changes are tested with e2e-tests on CPU and CUDA.

All commits in this branch are patches.

Jira task: https://davidhorn.atlassian.net/browse/PRD-1694

Mandatory Checks

Implements exactly what the Jira task specifies
Conforms to the code standard and is readable
Has tests for happy paths and relevant failures

Versioning

Branch name starts with work-item ID: {{ PRD-1234 }}
Commits tagged [major], [minor], or [patch] (default is [patch])

Rebase Rules

Keep the branch rebased on main
Do not use GitHub’s rebase button

Rebase steps:

TomasEkeli

praise: Separate devcontainers solve the different-hosts problem, thanks for this!

TomasEkeli

issues: adds a lot of files that should not be added

egg-info is generated on build and should not be committed
e2e-tests-gpu is a copy of e2e-tests with one added file, just add that file to the original directory instead
build -folder should not be committed
docker.yml should not be in our repo, it tries to build and publish on the upstream repo's name

Also needs to be rebased on main

AlexAtDavidhorn · 2025-11-26T12:17:32Z

Issues:

My bad re egg-info, I'll remove
e2e folders: Need two separate e2e-tests folders, otherwise our GH runner tries to run the cuda tests. If you have a better solution, I'm all ears - I just picked the easiest one here ;)
Same with build folder
Will remove docker.yml
Will rebase to main

Plz see my comments above as well and provide feedback on those :)

TomasEkeli · 2025-11-26T15:15:45Z

Issue: this seems to have merged main from upstream into the PRD-1694 branch, not main from our fork?

TomasEkeli · 2025-12-04T10:52:16Z

        if model_handle is None:
            raise ValueError(
-                f"The token Hugging Face token '{self.use_auth_token}' for diarization is not valid or you did not accept the EULA"
+                f"The token Hugging Face token '{self.use_auth_token}' for diarization is not valid or you did not accept the EULAs for the necessary models. See https://github.com/Softcatala/whisper-ctranslate2#diarization-speaker-identification"


concern: the text at this link still refers to accepting the terms for pyannote/speaker-diarization-3.1 and this PR switches to pyannote/speaker-diarization-community-1

This is a dead PR - I've kept the branch only for cherry-picking commits, hehe. See pull 10 :)

TomasEkeli · 2025-12-04T13:24:29Z

@@ -184,6 +185,7 @@ def inference(
            vad_filter=vad,
            vad_parameters=vad_parameters,
            **batch_size,


style: shouldn't the ** parameter be placed last?

TomasEkeli · 2025-12-04T13:25:55Z

+    try:
+        transcribe = Transcribe(
+            model_dir,
+            device,
+            device_index,
+            compute_type,
+            threads,
+            cache_directory,
+            local_files_only,
+            batched,
+            batch_size,
+        )
+    except RuntimeError as e:
+        print(f"error: {e}")
+        exit(ExitCode.RUNTIME_ERROR)


praise: good to get some tries here :)