Fix ApplyLexiconToCorpusJob #646

Open
guest-tz wants to merge 3 commits into rwth-i6:main from guest-tz:fix_apply_lexicon_to_corpus_job

@guest-tz
Contributor

Fixes a LookupError that is raised when orth is empty.
The fix uses ''.split(), which returns an empty list, whereas the current version uses ''.split(" "), which returns [''] and causes the LookupError.
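
The difference can be reproduced in isolation. The following is a minimal sketch, not the job's actual code: `toy_lexicon` and `lookup_words` are hypothetical stand-ins, and the dictionary lookup only imitates a lexicon lookup that fails on an unknown token.

```python
# Toy stand-in for a lexicon (hypothetical; not the actual job's data structure).
toy_lexicon = {"hello": ["HH AH L OW"], "world": ["W ER L D"]}


def lookup_words(orth, use_fixed_split):
    """Split orth into words and look each word up in the toy lexicon."""
    # split() without arguments splits on runs of whitespace and drops empty strings;
    # split(" ") splits on every single space and keeps empty strings.
    words = orth.split() if use_fixed_split else orth.split(" ")
    # A missing key raises KeyError, which is a subclass of LookupError.
    return [toy_lexicon[word] for word in words]


# The two variants disagree on an empty orth.
assert "".split(" ") == [""]
assert "".split() == []

# With the fixed split, an empty orth yields no words and no lookup happens.
assert lookup_words("", use_fixed_split=True) == []
```

With `use_fixed_split=False`, the same call looks up the empty string `''` and fails with a `KeyError` in this sketch.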

@guest-tz guest-tz marked this pull request as ready for review February 11, 2026 21:25
Collaborator

@Icemole Icemole left a comment

Nice find! With such a pattern you could technically run into more edge cases, such as a double space:

>>> "hello  world".split(" ")
['hello', '', 'world']
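
For comparison, a short sketch of how the argument-free `split()` behaves on the same kind of input. The helper name `tokenize` is hypothetical and only used for illustration here.

```python
def tokenize(orth):
    """Split on runs of whitespace; never produces empty tokens."""
    return orth.split()


# A double space no longer produces an empty token.
assert tokenize("hello  world") == ["hello", "world"]

# Leading and trailing whitespace are ignored as well.
assert tokenize("  hello world ") == ["hello", "world"]

# The single-space separator keeps an empty string for every extra space.
assert "  hello world ".split(" ") == ["", "", "hello", "world", ""]
```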

I assume nobody has run into this yet, so from my side we're good to merge.

Perhaps it would also be wise to filter your empty segments from the corpus by doing some fast postprocessing.
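
One way such a postprocessing step could look, as a rough sketch only: `ToySegment` and `drop_empty_segments` are hypothetical names, and a real corpus would be read and written with the corpus tooling already in use rather than with this toy class.

```python
from dataclasses import dataclass


@dataclass
class ToySegment:
    """Hypothetical minimal segment: a name plus its orthography."""
    name: str
    orth: str


def drop_empty_segments(segments):
    """Keep only segments whose orth contains at least one non-whitespace token."""
    return [seg for seg in segments if seg.orth.split()]


segments = [
    ToySegment("seg-1", "hello world"),
    ToySegment("seg-2", ""),      # empty orth
    ToySegment("seg-3", "   "),   # whitespace-only orth
]
kept = drop_empty_segments(segments)
assert [seg.name for seg in kept] == ["seg-1"]
```

Treating whitespace-only orths as empty is a design choice made in this sketch; whether such segments should be dropped or kept depends on the corpus.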

3 participants