⚡️ Speed up function _display by 138%
#274
Open
📄 138% (1.38x) speedup for `_display` in `unstructured/metrics/utils.py`
⏱️ Runtime: 63.7 milliseconds → 26.7 milliseconds (best of 71 runs)
📝 Explanation and details
The optimized code achieves a 138% speedup (from 63.7ms to 26.7ms) by eliminating the primary performance bottleneck: pandas' `df.iterrows()` method, which creates an expensive Series object for each row.
Key Optimizations
1. Eliminated `df.iterrows()` overhead (53.3% → 0.6% of runtime)
- `df.iterrows()` consumed 136ms creating temporary Series objects
- Direct list indexing (`col_values[j][row_idx]`) reduced this to 0.6ms
2. Pre-computed string representations
- `col_values = [df[header].tolist() for header in headers]`
- `col_strs = [[str(item) for item in col] for col in col_values]`
- Avoids calling `str()` repeatedly
- Floats are formatted with `f"{item:.3f}"` to maintain precision
3. Reduced column width calculation overhead
- Uses `col_strs` instead of calling `str()` for every item during width calculation
Performance Impact by Workload
Based on function references, `_display()` is called from `calculate()` to show aggregated metrics after document processing. The optimization benefits are most significant when:
The optimization preserves exact output formatting (3-decimal float precision, column alignment) while substantially reducing runtime, which is particularly valuable when displaying evaluation results for batch document processing operations.
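The column-list approach described under Key Optimizations can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `_display` implementation: `format_cell`, `render_table`, and `display_frame` are hypothetical names, the separator row and spacing are invented for the example, and column names are assumed to be strings.

```python
def format_cell(item):
    # Floats get three decimals, as the PR describes; everything else uses str().
    return f"{item:.3f}" if isinstance(item, float) else str(item)


def render_table(headers, col_values):
    """Return table lines for parallel column lists (hypothetical helper)."""
    # Convert every cell to its final string exactly once.
    col_strs = [[format_cell(item) for item in col] for col in col_values]
    # Column width is the longest of the header and its pre-computed strings.
    widths = [
        max([len(header)] + [len(s) for s in col])
        for header, col in zip(headers, col_strs)
    ]
    lines = [" ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines.append("-" * (sum(widths) + len(widths) - 1))
    n_rows = len(col_strs[0]) if col_strs else 0
    for row_idx in range(n_rows):
        # Plain list indexing replaces the per-row Series that iterrows() builds.
        lines.append(
            " ".join(
                col_strs[j][row_idx].ljust(widths[j]) for j in range(len(headers))
            )
        )
    return lines


def display_frame(df):
    """Print a pandas DataFrame without calling df.iterrows()."""
    headers = list(df.columns)  # assumption: column names are strings
    # One tolist() call per column instead of one Series per row.
    col_values = [df[header].tolist() for header in headers]
    print("\n".join(render_table(headers, col_values)))
```

Calling `display_frame(df)` on a small DataFrame prints the aligned table. Keeping `render_table` separate from the DataFrame extraction means the formatting logic can be exercised with plain lists.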
✅ Correctness verification report:
To edit these changes, `git checkout codeflash/optimize-_display-mks4gad7` and push.