⚡️ Speed up function make_span_finder_scorer by 11%
#5
📄 11% (0.11x) speedup for `make_span_finder_scorer` in `spacy/pipeline/span_finder.py`
⏱️ Runtime: 11.1 microseconds → 9.94 microseconds (best of 250 runs)
📝 Explanation and details
The optimization achieves an 11% speedup by reducing overhead from Python's `setdefault()` calls and optimizing key computations.

Key optimizations:

- Eliminated redundant `dict(kwargs)` copy. The original code unnecessarily copied the kwargs dictionary, which creates overhead for every function call.
- Replaced `setdefault()` with direct assignment for complex values. For `attr`, `getter`, and `has_annotation`, the code now uses `kwargs.get()` followed by direct assignment. This avoids the overhead of `setdefault()`, which must evaluate the default value (including lambda creation) even when the key already exists.
- Pre-computed string slicing. The suffix `key[len(attr_prefix):]` is calculated once and captured in the closure, rather than being computed on every call to the getter lambda.
- Optimized dictionary removal. Changed from `scores.pop(f"{kwargs['attr']}_per_type", None)` to a conditional `del` operation, which is more direct when you know the key exists.

Why this speeds up the code:

- `setdefault()` has to evaluate its default argument even when the key exists, creating unnecessary lambda objects.
- The `dict(kwargs)` copy creates unnecessary memory allocation and copying overhead.

Performance impact by test case:
The optimization shows consistent 7-25% improvements across test cases, with particularly strong gains in:
The optimization maintains identical behavior while reducing Python interpreter overhead, making it especially beneficial for span evaluation pipelines that process many documents.
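The two main points in the explanation (eager evaluation of the `setdefault()` default, and slicing the key once outside the getter) can be illustrated with a minimal sketch. This is not the spaCy source: `make_default`, `with_setdefault`, `with_get_check`, the two `make_getter_*` helpers, the `"spans_"` prefix value, and the use of a plain dict in place of a `Doc` are all hypothetical names and stand-ins chosen for illustration.

```python
# Hypothetical illustration only; not the code from spacy/pipeline/span_finder.py.

calls = {"count": 0}  # counts how often the default is built


def make_default():
    """Stand-in for a default that costs something to build (e.g. a lambda)."""
    calls["count"] += 1
    return lambda doc: []


def with_setdefault(kwargs):
    # Python evaluates arguments before the call, so make_default() runs
    # here even when "getter" is already present in kwargs.
    kwargs.setdefault("getter", make_default())
    return kwargs


def with_get_check(kwargs):
    # The default is built only when the key is missing (or set to None,
    # which is a small behavioral difference from setdefault()).
    if kwargs.get("getter") is None:
        kwargs["getter"] = make_default()
    return kwargs


def make_getter_recompute(key, attr_prefix="spans_"):
    # The slice is re-evaluated on every call of the returned lambda.
    return lambda doc: doc.get(key[len(attr_prefix):], [])


def make_getter_precomputed(key, attr_prefix="spans_"):
    # The slice is computed once and captured by the closure.
    suffix = key[len(attr_prefix):]
    return lambda doc: doc.get(suffix, [])
```

Calling `with_setdefault` with a dict that already holds a `getter` still increments `calls["count"]`, while `with_get_check` does not. Both getter factories return the same values; they differ only in how often the slice is computed.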
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-make_span_finder_scorer-mhmi5ymg` and push.