⚡️ Speed up function iob_to_biluo by 58%
#3
📄 58% (0.58x) speedup for `iob_to_biluo` in `spacy/training/iob_utils.py`
⏱️ Runtime: 3.19 milliseconds → 2.02 milliseconds (best of 176 runs)
📝 Explanation and details
The optimized code achieves a 58% speedup by eliminating expensive list operations and reducing function call overhead.
Key Optimizations:
- Eliminated costly `pop(0)` operations: The original code repeatedly called `tags.pop(0)`, which is O(n) because it shifts all remaining elements. The optimization uses index-based iteration (`i`, `j`) with O(1) access instead.
- Replaced separate helper functions with inline logic: Removed the `_consume_os()` and `_consume_ent()` functions, eliminating generator overhead and function call costs. The profiler shows these functions consumed 43.9% and 54.5% of runtime respectively.
- Optimized O-tag handling: Instead of yielding individual O-tags, the code identifies consecutive O-tags and extends the output with a single slice operation (`out.extend(tags[start:i])`).
- Reduced list operations: Uses `out.append()` for single items and list multiplication (`[f"I-{label}"] * (length - 2)`) instead of list comprehensions for repeated elements.
Performance Impact by Test Case:
The optimization is particularly effective for NLP pipelines where IOB tag sequences can be lengthy, making the quadratic behavior of the original implementation a significant bottleneck.
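The optimizations listed above can be illustrated with a minimal sketch. This is not the code from the PR: the function name `iob_to_biluo_sketch` is hypothetical, the tag-matching rules are an assumption based on the helpers described above, and a plain `ValueError` stands in for whatever error the library raises on a tag with no label.

```python
from typing import Iterable, List


def iob_to_biluo_sketch(tags: Iterable[str]) -> List[str]:
    """Convert IOB tags to BILUO tags using index-based scanning.

    Hypothetical illustration only; not the implementation from the PR.
    """
    tags = list(tags)
    n = len(tags)
    out: List[str] = []
    i = 0
    while i < n:
        # Copy a whole run of consecutive "O" tags with a single slice,
        # instead of popping them off the front one at a time.
        start = i
        while i < n and tags[i] == "O":
            i += 1
        if i > start:
            out.extend(tags[start:i])
        if i >= n:
            break

        # Scan one entity with a second index j; no element is removed.
        tag = tags[i]
        label = tag[2:]
        target_in = "I" + tag[1:]
        target_last = "L" + tag[1:]
        j = i + 1
        while j < n and (tags[j] == target_in or tags[j] == target_last):
            j += 1
        length = j - i

        if length == 1:
            if not label:
                # Assumption: a generic error stands in for the library's own.
                raise ValueError(f"Ill-formed IOB tag: {tag!r}")
            out.append("U-" + label)
        else:
            out.append("B-" + label)
            # List multiplication builds all the middle tags at once.
            out.extend(["I-" + label] * (length - 2))
            out.append("L-" + label)
        i = j
    return out


example = ["O", "B-PER", "I-PER", "I-PER", "O", "B-LOC", "O"]
print(iob_to_biluo_sketch(example))
# ['O', 'B-PER', 'I-PER', 'L-PER', 'O', 'U-LOC', 'O']
```

Each position is visited a constant number of times, so the sketch runs in time linear in the number of tags, whereas repeated `pop(0)` calls make the cost grow quadratically.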
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
- `parser/test_ner.py::test_issue2385`
- `training/test_training.py::test_iob_to_biluo`
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-iob_to_biluo-mhlix7cr` and push.