Prd 1694 upgrade diarize model #8
Conversation
TomasEkeli
left a comment
praise: Separate devcontainers solve the different-hosts problem, thanks for this!
TomasEkeli
left a comment
issues: adds a lot of files that should not be added
- egg-info is generated on build and should not be committed
- e2e-tests-gpu is a copy of e2e-tests with one added file, just add that file to the original directory instead
- the build folder should not be committed
- docker.yml should not be in our repo; it tries to build and publish under the upstream repo's name
Also needs to be rebased on main
Issues:
Plz see my comments above as well and provide feedback on those :)

Issue: this seems to have merged main from upstream into the PRD-1694 branch, not main from our fork?
59a96ea to f8cc80c
    if model_handle is None:
        raise ValueError(
-           f"The token Hugging Face token '{self.use_auth_token}' for diarization is not valid or you did not accept the EULA"
+           f"The token Hugging Face token '{self.use_auth_token}' for diarization is not valid or you did not accept the EULAs for the necessary models. See https://github.com/Softcatala/whisper-ctranslate2#diarization-speaker-identification"
concern: the text at this link still refers to accepting the terms for pyannote/speaker-diarization-3.1 and this PR switches to pyannote/speaker-diarization-community-1
This is a dead PR - I've kept the branch only for cherry-picking commits, hehe. See pull 10 :)
@@ -184,6 +185,7 @@ def inference(
    vad_filter=vad,
    vad_parameters=vad_parameters,
    **batch_size,
style: shouldn't the ** parameter be placed last?
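For context on the style question: Python accepts keyword arguments after a `**` unpacking in a call, so the placement is a readability choice rather than a correctness issue. Only positional arguments are rejected after `**`. A minimal sketch, assuming `batch_size` is a dict of extra keyword options; the `inference` function and the values below are hypothetical stand-ins, not the code from this PR:

```python
def inference(audio, vad_filter=False, vad_parameters=None, **options):
    """Hypothetical stand-in that just echoes the keyword arguments it received."""
    return {"vad_filter": vad_filter, "vad_parameters": vad_parameters, **options}


# Assumption: batch_size is a dict that is unpacked into keyword arguments.
batch_size = {"batch_size": 8}

# Unpacking in the middle of the keyword arguments is legal...
middle = inference("a.wav", vad_filter=True, **batch_size, vad_parameters={"x": 1})

# ...and binds exactly the same arguments as unpacking last.
last = inference("a.wav", vad_filter=True, vad_parameters={"x": 1}, **batch_size)

# A positional argument after ** is a syntax error, which compile() reports
# without running anything.
try:
    compile("f(**opts, 1)", "<example>", "eval")
    positional_after_unpack_ok = True
except SyntaxError:
    positional_after_unpack_ok = False
```

Both calls bind identical arguments, so moving `**batch_size` to the end would only change how the call reads.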
    try:
        transcribe = Transcribe(
            model_dir,
            device,
            device_index,
            compute_type,
            threads,
            cache_directory,
            local_files_only,
            batched,
            batch_size,
        )
    except RuntimeError as e:
        print(f"error: {e}")
        exit(ExitCode.RUNTIME_ERROR)
praise: good to get some tries here :)
Pull Request
This change implements PRD-1694 (New diarization model), and in addition, the following:
All changes are tested with e2e-tests on CPU and CUDA.
All commits in this branch are patches.
Jira task: https://davidhorn.atlassian.net/browse/PRD-1694
Mandatory Checks
Versioning
[major], [minor], or [patch] (default is [patch])
Rebase Rules
main
Rebase steps:
git checkout main && git pull
git checkout {{ feature-branch }}
git rebase main
git push --force-with-lease