
Fix large list bytecode estimation by removing sampling #113

Merged: fglock merged 2 commits into master from fix/large-list-bytecode-estimation on Dec 31, 2025
fglock (Owner) commented on Dec 31, 2025

Problem

ExifTool's %Image::ExifTool::Extra hash (2,576 elements) was causing "Method too large" JVM errors.

Root cause: Bytecode size estimation sampled only 10 elements instead of estimating every element, which severely underestimated the size: 31,684 bytes estimated, while the actual method exceeded the JVM's 65,535-byte per-method bytecode limit.

Solution

1. Removed Sampling from LargeNodeRefactorer.shouldRefactor()

  • Changed from sampling 10 elements to estimating all elements
  • Provides accurate bytecode size calculation for mixed element types
  • Now correctly estimates 2,576-element list at 85,465 bytes → triggers refactoring
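To illustrate why sampling can mislead on mixed element types, here is a minimal sketch. It is not the code from `LargeNodeRefactorer`: the class name, the per-element costs, and the "first N elements" sampling strategy are all assumptions made for illustration. When the sampled prefix happens to contain only cheap elements, extrapolating from it misses the expensive ones entirely.

```java
import java.util.ArrayList;
import java.util.List;

class SamplingVsFullEstimate {
    // Hypothetical per-element bytecode costs; not PerlOnJava's real figures.
    static final int CHEAP = 6;
    static final int EXPENSIVE = 60;

    /** Extrapolates a total from the first sampleSize elements only. */
    static long estimateBySampling(List<Integer> costs, int sampleSize) {
        int n = Math.min(sampleSize, costs.size());
        if (n == 0) return 0;
        long sampled = 0;
        for (int i = 0; i < n; i++) sampled += costs.get(i);
        return sampled * costs.size() / n;
    }

    /** Sums the cost of every element. */
    static long estimateAll(List<Integer> costs) {
        long total = 0;
        for (int c : costs) total += c;
        return total;
    }

    /** Builds a list whose first 10 elements are cheap and whose later even-indexed elements are expensive. */
    static List<Integer> mixedList(int size) {
        List<Integer> costs = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            costs.add(i >= 10 && i % 2 == 0 ? EXPENSIVE : CHEAP);
        }
        return costs;
    }

    public static void main(String[] args) {
        List<Integer> costs = mixedList(2576);
        System.out.println("sampled: " + estimateBySampling(costs, 10)); // 15456
        System.out.println("full:    " + estimateAll(costs));            // 84738
    }
}
```

In this made-up data set the sampled estimate stays well under the 65,535-byte limit while the full sum is above it, which is the same failure shape the PR describes.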

2. Fixed BytecodeSizeEstimator for Proper Visitor Pattern

  • ListNode: Only adds list overhead (DUP + add call = 5 bytes per element)
  • StringNode: Fixed to 6 bytes (LDC + INVOKESTATIC) based on actual disassembly
  • Elements estimate themselves via visit() - no special cases or multipliers
  • Added constant pool overhead for large lists (LDC_W costs 3 bytes vs 2 for LDC)
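The visitor arrangement described above might look roughly like the following sketch. The node and visitor types here are simplified stand-ins, not the real PerlOnJava classes. Only the two byte figures (5 bytes of list overhead per element, 6 bytes per string) come from this PR. The constant pool adjustment is left out because its exact rule is not stated here.

```java
import java.util.List;

class EstimatorSketch {
    interface Node { void accept(Visitor v); }

    interface Visitor {
        void visit(StringNode n);
        void visit(ListNode n);
    }

    static final class StringNode implements Node {
        final String value;
        StringNode(String value) { this.value = value; }
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class ListNode implements Node {
        final List<Node> elements;
        ListNode(List<Node> elements) { this.elements = elements; }
        public void accept(Visitor v) { v.visit(this); }
    }

    static final class SizeEstimator implements Visitor {
        static final int STRING_BYTES = 6;              // LDC + INVOKESTATIC, per the PR
        static final int LIST_OVERHEAD_PER_ELEMENT = 5; // DUP + add call, per the PR
        int total = 0;

        public void visit(StringNode n) { total += STRING_BYTES; }

        public void visit(ListNode n) {
            for (Node e : n.elements) {
                total += LIST_OVERHEAD_PER_ELEMENT; // the list counts only its own overhead
                e.accept(this);                     // each element estimates itself
            }
        }
    }

    static int estimate(Node root) {
        SizeEstimator est = new SizeEstimator();
        root.accept(est);
        return est.total;
    }

    public static void main(String[] args) {
        Node list = new ListNode(List.of(
                new StringNode("a"), new StringNode("b"), new StringNode("c")));
        System.out.println(estimate(list)); // 3 * (5 + 6) = 33
    }
}
```

Because the list never guesses at its children's cost, adding a new node type only requires a new `visit` overload rather than another special case inside the list handling.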

3. Code Cleanup

  • Removed experimental codegen-time refactoring from EmitLiteral.java
  • Removed all debug trace statements
  • Refactoring happens only at parse time (as designed)

Result

  • 2,576-element list now correctly refactored into 72 chunks at parse time
  • Accurate bytecode estimation without sampling
  • Proper visitor pattern - elements estimate themselves
  • Prevents JVM "Method too large" errors for large Perl data structures
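As a rough picture of what splitting a large list into chunks can involve, the sketch below greedily groups per-element size estimates so that each group stays under a byte budget. This is an assumption about the general technique, not the algorithm `LargeNodeRefactorer` actually uses. The class name, method name, and budget value are hypothetical, and the sketch makes no attempt to reproduce the 72-chunk figure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ChunkSketch {
    /**
     * Greedily groups element costs into chunks whose summed cost does not
     * exceed budget. A single element larger than the budget gets a chunk
     * of its own.
     */
    static List<List<Integer>> chunkByBudget(List<Integer> costs, int budget) {
        List<List<Integer>> chunks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int used = 0;
        for (int cost : costs) {
            if (!current.isEmpty() && used + cost > budget) {
                chunks.add(current);        // close the full chunk
                current = new ArrayList<>();
                used = 0;
            }
            current.add(cost);
            used += cost;
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }

    public static void main(String[] args) {
        // Ten elements of 11 bytes each with a hypothetical 40-byte budget.
        List<Integer> costs = Collections.nCopies(10, 11);
        System.out.println(chunkByBudget(costs, 40).size()); // 4 chunks: 3 + 3 + 3 + 1
    }
}
```

A split like this is only as good as the estimates it is given, which is why the accuracy fix in the estimator matters for the refactoring step.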

- Remove sampling from LargeNodeRefactorer.shouldRefactor()
  * Was estimating only 10 elements out of thousands
  * Now estimates all elements for accurate size calculation
  * 2,576-element list now correctly estimated at 85,465 bytes (vs 31,684)

- Fix BytecodeSizeEstimator to use proper visitor pattern
  * ListNode adds only list overhead (5 bytes per element)
  * StringNode fixed to 6 bytes (LDC + INVOKESTATIC) based on disassembly
  * Elements estimate themselves via visit() - no special cases
  * Added constant pool overhead for large lists

- Remove experimental codegen-time refactoring
  * Refactoring happens only at parse time as designed
  * Cleaned up all debug trace statements

Result: Large lists (2,576+ elements) now correctly refactored into
chunks at parse time, preventing JVM 'Method too large' errors.

fglock merged commit 15ca57e into master on Dec 31, 2025
2 checks passed
fglock deleted the fix/large-list-bytecode-estimation branch on December 31, 2025 at 14:23