
[WIP] Reduce compilation overhead with @nospecialize #83

Merged
bclyons12 merged 58 commits into master from fix/nospecialize on Jan 14, 2026

Member

@mgyoo86 mgyoo86 commented Oct 30, 2025

Status

🚧 Work in Progress - Testing and feedback welcome

Summary

This PR adds/fixes @nospecialize annotations to frequently-called utility functions to prevent excessive method specialization and reduce compilation overhead.

Major Changes

Core Functions Modified

  • ulocation(), f2u() - Added @nospecialize for IDS parameters
  • info(), coordinates() - Prevented specialization on IDS types
  • diff(), merge!(), freeze!() - Refactored type handling
  • dict2imas() - Fixed specialization in IO operations
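
For readers unfamiliar with the annotation, here is a minimal sketch of what `@nospecialize` on an argument looks like. `ToyIDS`, `ToyEquilibrium`, `ToyCoreProfiles`, and `toy_ulocation` are hypothetical stand-ins, not IMASdd code.

```julia
# Hypothetical stand-ins for an IDS type hierarchy (assumption, not IMASdd code).
abstract type ToyIDS end
struct ToyEquilibrium{T} <: ToyIDS
    value::T
end
struct ToyCoreProfiles{T} <: ToyIDS
    value::T
end

# @nospecialize hints that the compiler should not generate a separate
# specialization of this method for every concrete subtype of ToyIDS.
function toy_ulocation(@nospecialize(ids::ToyIDS))
    return lowercase(string(nameof(typeof(ids))))
end

toy_ulocation(ToyEquilibrium(1.0))
toy_ulocation(ToyCoreProfiles(1))
```

The function still works for every subtype; only the compilation strategy is affected.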

Known Issues

⚠️ Specialization still occurs when calling high-level functions like hdf2imas() or through certain code paths. Root cause investigation ongoing.

Testing

Test File

See tmp_test/test_ulocation.jl for specialization behavior tests.

Required Temporary Modification

To test properly, modify dd.jl so that it uses getfield instead of getproperty (that is, instead of ids.field):

macOS:

sed -E -i '' 's/setfield!\(ids\.([^,]*)/setfield!(getfield(ids, :\1)/g' src/dd.jl

Linux (without macOS backup extension):

sed -E -i 's/setfield!\(ids\.([^,]*)/setfield!(getfield(ids, :\1)/g' src/dd.jl

This replaces ids.field property access with getfield(ids, :field), so that custom getproperty implementations are bypassed during testing.
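
A minimal sketch of the difference, assuming a toy type with an overloaded `getproperty` (the names `ToyNode` and `GETPROPERTY_CALLS` are hypothetical): dot access goes through the overload, while `getfield` reads the field directly.

```julia
struct ToyNode
    data::Vector{Float64}
end

# Count how often the custom accessor runs.
const GETPROPERTY_CALLS = Ref(0)

function Base.getproperty(node::ToyNode, name::Symbol)
    GETPROPERTY_CALLS[] += 1
    return getfield(node, name)
end

node = ToyNode([1.0, 2.0])
node.data                # dispatches to the custom getproperty
getfield(node, :data)    # reads the field directly, skipping the overload
```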

Next Steps

  • Identify remaining specialization sources in high-level functions
  • Validate performance improvements with benchmarks
  • Determine if dd.jl modifications should be permanent or test-only
  • Add comprehensive test coverage

Changed from @nospecialize(x::T) where {T<:Type} pattern to
@nospecialize(x::{<:Type}) to properly prevent type specialization.

This ensures Julia doesn't compile separate versions for each concrete
type, reducing compilation overhead.
Add @assert type checks to prevent merging/freezing different IDS types.
This fixes type safety issues introduced by commit 74ded67 where 'where T'
constraints were removed.

Functions updated:
- merge!(::IDS, ::IDS) - assert same type
- merge!(::IDSvector, ::IDSvector) - assert same eltype
- freeze!(::IDS, ::IDS) - assert same type
- freeze!(::IDSvector, ::IDSvector) - assert same eltype

These assertions prevent runtime errors from field mismatches when
operating on incompatible types.
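
A rough sketch of the idea, using hypothetical toy types and a hypothetical `toy_merge!` (not the IMASdd implementation): once the shared `where T` is gone, the type match has to be checked at run time.

```julia
mutable struct ToyProfile
    a::Float64
    b::Float64
end
mutable struct ToyWall
    r::Float64
end

function toy_merge!(@nospecialize(target), @nospecialize(source))
    # Dispatch no longer guarantees matching types, so check before
    # copying fields by index.
    @assert typeof(target) === typeof(source) "cannot merge $(typeof(source)) into $(typeof(target))"
    for i in 1:fieldcount(typeof(target))
        setfield!(target, i, getfield(source, i))
    end
    return target
end

merged = toy_merge!(ToyProfile(0.0, 0.0), ToyProfile(1.0, 2.0))
```

Calling `toy_merge!(ToyProfile(0.0, 0.0), ToyWall(1.0))` raises an `AssertionError` instead of failing later on a field mismatch.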
Replace error() with Dict return when comparing different types.
Now diff() returns a dict with 'type_mismatch' key instead of
throwing an error, making it more flexible and non-disruptive.

Example:
  diff(dd, dd.equilibrium)
  => Dict("type_mismatch" => "dd{Float64} != equilibrium{Float64}")
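
A minimal sketch of this return-a-Dict behavior, with hypothetical types and a hypothetical `toy_diff`; the key and value types of the real `diff()` result are an assumption here.

```julia
struct ToyA; x::Int; end
struct ToyB; x::Int; end

function toy_diff(@nospecialize(a), @nospecialize(b))
    differences = Dict{String,String}()
    if typeof(a) !== typeof(b)
        # Report the mismatch instead of throwing.
        differences["type_mismatch"] = "$(typeof(a)) != $(typeof(b))"
        return differences
    end
    for i in 1:fieldcount(typeof(a))
        va, vb = getfield(a, i), getfield(b, i)
        if !isequal(va, vb)
            differences[string(fieldname(typeof(a), i))] = "$(va) != $(vb)"
        end
    end
    return differences
end
```

Callers can then test `haskey(result, "type_mismatch")` rather than wrapping the call in `try`/`catch`.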
Add @nospecialize annotations to info and coordinates to reduce
compilation overhead and binary size.
Added @nospecialize annotations to location and conversion functions
in f2.jl to prevent excessive method specialization:
- utlocation(ids, field) and variants
- f2u(ids) - converts IDS to universal location string
- fs2u(ids_type) - converts IDS type to universal location

Note: ulocation specialization still observed during constructor
execution despite @nospecialize - investigation ongoing into when
and why specialization occurs in the call chain.
Add dd_nospecialize() helper function that uses Base.invokelatest to
prevent compiler from analyzing dd() internals during type inference.
Replace all dd() default arguments in I/O functions with dd_nospecialize()
to significantly reduce allocation and compilation time.

The invokelatest barrier prevents the compiler from specializing on the
complex 180k-line dd struct generation, while ::dd{Float64} type assertion
ensures proper type propagation without additional inference overhead.

Affected functions:
- json2imas, jstr2imas
- hdf2imas (default arg and internal call)
- h5i2imas

Performance impact: ~20M fewer allocations in hdf2imas calls.
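
The barrier pattern can be sketched as follows. `ToyDD`, `toy_dd_nospecialize`, and `toy_load` are hypothetical names; the real `dd_nospecialize()` may differ in detail.

```julia
struct ToyDD{T}
    values::Vector{T}
end
ToyDD{T}() where {T} = ToyDD{T}(T[])

# invokelatest makes a dynamic call, so inference does not look into the
# constructor; the type assertion gives callers a concrete return type.
toy_dd_nospecialize() = Base.invokelatest(ToyDD{Float64})::ToyDD{Float64}

# Default argument built through the barrier instead of calling ToyDD{Float64}() directly.
function toy_load(values::Vector{Float64}; target::ToyDD{Float64}=toy_dd_nospecialize())
    append!(target.values, values)
    return target
end
```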
Apply @nospecialize to fieldtype, getproperty, parent, name, goto,
getindex, and time-related functions to reduce method specialization.
Temporary test file for investigating method specialization behavior.
@mgyoo86 mgyoo86 added the WIP Work in Progress label Oct 30, 2025
@mgyoo86 mgyoo86 changed the title # [WIP] Reduce compilation overhead with @nospecialize [WIP] Reduce compilation overhead with @nospecialize Oct 30, 2025
codecov bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 63.42857% with 192 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.96%. Comparing base (01f7aff) to head (cdc71d9).
⚠️ Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/data.jl | 62.50% | 63 Missing ⚠️ |
| src/identifiers.jl | 0.00% | 31 Missing ⚠️ |
| src/time.jl | 62.12% | 25 Missing ⚠️ |
| src/show.jl | 22.72% | 17 Missing ⚠️ |
| src/diagnostics.jl | 76.81% | 16 Missing ⚠️ |
| src/f2.jl | 78.46% | 14 Missing ⚠️ |
| src/io.jl | 68.88% | 14 Missing ⚠️ |
| src/expressions.jl | 83.33% | 6 Missing ⚠️ |
| src/macros.jl | 80.00% | 3 Missing ⚠️ |
| src/math.jl | 0.00% | 2 Missing ⚠️ |

... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
+ Coverage   43.76%   43.96%   +0.20%     
==========================================
  Files          13       15       +2     
  Lines       31243    31418     +175     
==========================================
+ Hits        13672    13813     +141     
- Misses      17571    17605      +34     


Remove type parameters from @nospecialize signatures and extract types inside function body using eltype() to properly prevent specialization.
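
As an illustration of this change (hypothetical function name, not IMASdd code), the signature carries no type parameter and the element types are recovered in the body:

```julia
# Before (sketch): the type parameter T invites one specialization per T.
# toy_same_eltype(a::Vector{T}, b::Vector{T}) where {T} = true

# After (sketch): no type parameters in the signature; element types are
# looked up inside the body at run time.
function toy_same_eltype(@nospecialize(a::AbstractVector), @nospecialize(b::AbstractVector))
    return eltype(a) === eltype(b)
end
```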
Remove type parameters from @nospecialize signatures in ==, isequal, isapprox, and _extract_comparable_fields to properly prevent specialization on Union types.

Also update docstrings for resize!, diff, and time_groups functions.
…me bottleneck

Added @nospecializeinfer to 165+ functions across 9 core modules, drastically
reducing type inference overhead and specialization explosion.

Impact:
- Eliminates ~50% of top-level type inference allocations
- Prevents Union type specialization combinatorial explosion
- Dramatically reduces first-time function compilation overhead
- Measured: hdf2imas compilation time drops from 98.71% to minimal levels

Changes:
- Added `using Base: @nospecializeinfer` import
- Applied to functions with @nospecialize parameters in:
  cocos (6), data (25), expressions (18), f2 (24), identifiers (15),
  io (31), show (16), time (30)

Technical rationale:
@nospecializeinfer prevents both specialization AND type inference propagation.
Critical for Union types like Union{IDS,IDSvector,Vector{IDS}} which trigger
combinatorial method generation and expensive typeinf_ext_toplevel calls.
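
A minimal usage sketch, assuming Julia 1.10 or newer (where `Base.@nospecializeinfer` is available); `ToyTimed`, `ToySlice`, and `toy_time_label` are hypothetical names.

```julia
# Requires Julia 1.10 or newer.
using Base: @nospecializeinfer

abstract type ToyTimed end
struct ToySlice{T} <: ToyTimed
    time::T
end

# @nospecialize limits code generation to the declared type; adding
# @nospecializeinfer asks inference to use the declared type as well.
@nospecializeinfer function toy_time_label(@nospecialize(x::ToyTimed))
    return string("t=", getfield(x, :time))
end
```

The trade-off is that callers see a less precise inferred result, which is why applying it to every function is not always desirable.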
This partially reverts the previous commit which added @nospecializeinfer
to almost every function. While @nospecializeinfer improves compilation time,
it prevents type inference for certain key functions that are essential for
nested calls to return concrete types.

Functions affected:
- cocos_out, cocos_transform, transform_cocos_* (src/cocos.jl)
- concrete_fieldtype_typeof, eltype_concrete_fieldtype_typeof, Base.getproperty (src/data.jl)

These functions must infer concrete return types for nested IMASdd function
calls to work correctly, as tested in test/runtests_concrete.jl.

Result: All tests now pass again, including concrete type inference tests.
Contributor

orso82 commented Nov 1, 2025

great find with the '@nospecializeinfer' macro !!

Due to @nospecialize/@nospecializeinfer constraints, type parameters
cannot be used to guarantee matching types at dispatch time.

Solution:
- Add hot path methods for identical types (Int32/64, Float32/64, UInt64, Bool)
- These route to __convert_same_real_type! helper for fast path
- Generic Real->Real method now performs runtime type checking
- Convert to target type when needed (e.g., Float64 → Measurement{Float64})

Changes:
- Apply the COCOS conversion check consistently in all cases (assumed always needed)
Separated DD from Union{DD, IDSraw, IDSvectorRawElement} into dedicated method.
Large Union (~130+ concrete subtypes) prevented compiler from inlining hot path.

DD now gets specialized method without @nospecialize for optimal performance.
Added guard condition (user_cocos != to_cocos) before calling cocos_out.
Since both default to 11, this avoids unnecessary function calls on hot path.
Added @inline and ::Bool annotations to hasdata for better type inference.
Helps compiler optimize getproperty hot path by eliminating runtime type checks.
…eld indices

- Replace fieldnames() iteration with fieldcount() + numeric indices
- Reduces from 14 allocations (4.125 KiB) to ~0 allocations
- Works correctly with @nospecialize by avoiding symbolic field names
- Added @inbounds for bounds-check elimination
- Replace enumerate() with eachindex() + @inbounds
- Replace fieldnames() with hasfield()
- Optimize hasdata() to use numeric field indices
- Use tuple literals instead of arrays for membership tests
Optimize setproperty! by reusing already-computed coords variable
instead of calling coordinates(ids, field) multiple times:
- Line 772: Use inline generator with coords reuse
- Line 774: Reuse coords instead of recalling coordinates()

Eliminates 2 redundant function calls per setproperty! invocation.
Uses idiomatic inline generator with any() for short-circuit benefit.

Optimize name_2_index() by caching inverted Dict per IDS type:
- Add global cache _NAME_2_IDX_CACHE using IdDict
- Implement lazy initialization with get!() for thread-safety
- First call per type: inverts idx_2_name and caches result
- Subsequent calls: returns cached Dict (zero-allocation)

Performance improvement:
- Before: ~22μs, 5 allocations per call
- After: ~2ns, 0 allocations (after first call per type)

Related optimization in fix/nospecialize branch.
Simplify in_expression() by removing redundant key check:
- Remove manual `if t_id ∉ keys(_in_expression)` check
- Use get!() directly for atomic check-and-create operation
- get!() already handles check atomically, making manual check redundant

Performance improvement:
- Eliminates one dict lookup (haskey check)
- Cleaner code with same thread-safety guarantees

Related to fix/nospecialize optimization work.
…debug code

Optimize two functions with numeric field iteration pattern:
- Stack-based fill function: Replace fieldnames() with fieldcount/fieldname
- Base.empty!(): Use numeric indices for field iteration
- Add @inbounds for bounds check elimination

Remove debug statements:
- Clean up Main.@infiltrate calls from resize!() function

Performance improvement:
- Eliminates allocations from fieldnames() vector creation
- Enables bounds check elimination with @inbounds
- Consistent with other @nospecialize optimizations

Related to fix/nospecialize optimization work.
Member Author

mgyoo86 commented Dec 4, 2025

@bclyons12 @fredrikekre
The following additional micro-optimizations can further improve performance in cases such as ActorFluxMatcher, which Brendan pointed out.

Additional Performance Optimizations (6 commits)

Summary

Zero-allocation improvements across hot paths in @nospecialize functions.

Key Changes

Loop Optimization (data.jl, expressions.jl, f2.jl, findall.jl, io.jl)

  • Replace for (k, v) in enumerate(arr) → for k in eachindex(arr); v = @inbounds arr[k]
  • Eliminates tuple allocations in tight loops
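
The two loop forms side by side, as a sketch with hypothetical function names. For a plain `Vector` both visit the same indices and values; whether the first form allocates depends on how well the surrounding code is inferred.

```julia
function toy_sum_enumerate(arr)
    total = 0.0
    for (k, v) in enumerate(arr)   # yields (index, value) tuples
        total += k * v
    end
    return total
end

function toy_sum_eachindex(arr)
    total = 0.0
    for k in eachindex(arr)        # k is always a valid index of arr
        v = @inbounds arr[k]
        total += k * v
    end
    return total
end
```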

Field Iteration (data.jl, expressions.jl)

  • Replace fieldnames(typeof(x)) iteration → numeric fieldcount/fieldname indices
  • hasdata(): Use early-return loop instead of generator with any()
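
A sketch of the numeric field-iteration pattern on a hypothetical `ToyRecord`; `toy_filled_fields` and `toy_hasdata` are illustrative, not the IMASdd functions.

```julia
struct ToyRecord
    psi::Union{Nothing,Float64}
    rho::Union{Nothing,Float64}
end

# Iterate by index and look up the name only when it is needed.
function toy_filled_fields(@nospecialize(x))
    T = typeof(x)
    filled = Symbol[]
    for i in 1:fieldcount(T)
        getfield(x, i) === nothing && continue
        push!(filled, fieldname(T, i))
    end
    return filled
end

# Early-return loop instead of any(generator).
function toy_hasdata(@nospecialize(x))::Bool
    for i in 1:fieldcount(typeof(x))
        getfield(x, i) === nothing || return true
    end
    return false
end
```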

Field Checks (identifiers.jl)

  • Replace :field in fieldnames(T) → hasfield(T, :field)
  • Avoids tuple allocation on every check
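
Both forms give the same answer; a small sketch with a hypothetical `ToyIdentifier` type:

```julia
struct ToyIdentifier
    index::Int
    name::String
end

# Membership test against the tuple of field names.
has_index_slow(::Type{T}) where {T} = :index in fieldnames(T)

# Direct query, no intermediate collection of names.
has_index_fast(::Type{T}) where {T} = hasfield(T, :index)
```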

Caching (identifiers.jl)

  • Add lazy auto-inversion cache for name_2_index()
  • One-time Dict creation per IDS type
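
A sketch of a lazily filled, per-type inverted lookup. `ToyGridType`, `toy_idx_2_name`, and `_TOY_NAME_2_IDX_CACHE` are hypothetical, and this version adds no locking of its own.

```julia
struct ToyGridType end

# Hypothetical forward table: index => name.
toy_idx_2_name(::Type{ToyGridType}) = Dict(1 => :rectangular, 2 => :inverse)

# One inverted Dict per type, created on first use.
const _TOY_NAME_2_IDX_CACHE = IdDict{Any,Dict{Symbol,Int}}()

function toy_name_2_index(@nospecialize(T::Type))
    return get!(_TOY_NAME_2_IDX_CACHE, T) do
        Dict(name => idx for (idx, name) in toy_idx_2_name(T))
    end
end
```

The first call for a type pays for the inversion; later calls return the stored Dict.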

Thread-safe Access (expressions.jl)

  • Optimize in_expression() with direct get!() usage
  • Remove redundant key existence check
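
The lookup pattern, sketched on a plain `Dict` with hypothetical names. A plain `Dict` gives no thread-safety on its own, so only the removal of the extra key check is shown here.

```julia
const _toy_in_expression = Dict{UInt,Vector{Symbol}}()

# Before (sketch): explicit key check followed by insert and lookup.
function toy_slot_checked(id::UInt)
    if id ∉ keys(_toy_in_expression)
        _toy_in_expression[id] = Symbol[]
    end
    return _toy_in_expression[id]
end

# After (sketch): get! checks and creates in one call.
toy_slot(id::UInt) = get!(() -> Symbol[], _toy_in_expression, id)
```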

Misc (math.jl, data.jl)

  • Use tuple literals (:a, :b) instead of vectors [:a, :b] in checks
  • Eliminate redundant coordinates() calls in setproperty!

Files Changed

data.jl, expressions.jl, identifiers.jl, io.jl, f2.jl, findall.jl, math.jl

Add diagnose_shared_objects() to detect unintended array sharing in IDS trees.
This helps identify cases where `a = b` was used instead of `a .= b`.

Features:
- Stack-based tree traversal following isequal pattern
- SharedObjectReport with indexed access (report[1].id, report[1].paths)
- Cross-IDS sharing detection (e.g., core_profiles ↔ core_sources)
- REPL display with chronological path ordering
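
The kind of aliasing this targets can be sketched on a single flat struct. `ToyProfiles` and `toy_shared_arrays` are hypothetical and much simpler than a tree traversal; the sketch only shows identity-based detection with `IdDict`.

```julia
mutable struct ToyProfiles
    density::Vector{Float64}
    temperature::Vector{Float64}
end

# Group field names by the identity (===) of the array they hold.
function toy_shared_arrays(@nospecialize(x))
    seen = IdDict{Any,Vector{Symbol}}()
    T = typeof(x)
    for i in 1:fieldcount(T)
        value = getfield(x, i)
        value isa AbstractArray || continue
        push!(get!(() -> Symbol[], seen, value), fieldname(T, i))
    end
    return [paths for paths in values(seen) if length(paths) > 1]
end

p = ToyProfiles([1.0, 2.0], [3.0, 4.0])
p.temperature = p.density      # `a = b`: both fields now alias one array

q = ToyProfiles([1.0, 2.0], [3.0, 4.0])
q.temperature .= q.density     # `a .= b`: values copied, arrays stay distinct
```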
…ility

Replace @maybe_nospecializeinfer with @nospecializeinfer since the macro
wrapper is not defined on master branch.
- Add runtests_f2.jl with 81 test cases covering:
  - f2p, f2i, f2u path conversion functions
  - i2p, p2i, i2u string parsing functions
  - location, ulocation path accessors
  - fs2u type-based lookup
  - f2p_name IDS naming
  - Round-trip consistency validation
  - Edge cases (standalone IDS, utime flag, deeply nested)

- Move f2-related tests from runtests_ids.jl to dedicated file
- Include runtests_f2.jl in main test runner
- Add _F2P_SKELETON_CACHE with concrete types for type-stable lookup
- Split _f2p_skeleton into fast path (cache hit) and slow path (@noinline)
- Pre-compute and cache result_size to avoid redundant count() calls
- Use Vector{String} in cache for concrete value type
- Remove String() conversion in loop (already cached as String)
- Add internal @_typed_cache macro with proper hygiene (gensym, esc)
- Use helper function pattern to solve return-bypass caching bug
- Apply macro to f2.jl: fs2u, _f2p_skeleton, f2p_name(Type)
- Rename cache constants with _TCACHE_ prefix for consistency
- Use Base.get single lookup instead of haskey+getindex pattern
- Remove type-based name computation (~15 lines)
- Reuse cached skeleton from _f2p_skeleton(T)
- Eliminates redundant replace/eachsplit/count calls per f2i invocation
- Add ::String return type to f2p_name(ids) for better type inference
- Refactor f2p_name(ids::IDS, ::IDS) to reuse cached f2p_name(Type)
- Remove redundant typename_str computation (now uses cache)
- Add @nospecialize to entry point f2p_name(ids) for compile time
- i2u fast path: avoid String(loc) allocation when loc is already String
- ulocation/location(IDSvector): use fs2u_base cache instead of SubString
- Add int_to_string() cache for small integers (0-10) used in f2p and f2p_name
- Add fs2u_base typed cache for IDSvector base paths (0 allocs)
- Expand benchmark_f2.jl with comprehensive allocation tests

Results:
- f2p: 10→8 allocs (simple), 14→10 allocs (nested)
- f2p_name(IDSvectorElement): 4→2 allocs
- ulocation/location(IDSvector): 0 allocs (cached)
- i2u(String, no brackets): 0 allocs
Changed eltype(ids) to typeof(ids) in ulocation/location(IDSvector) functions.
With @nospecialize, eltype(ids) returns Any causing 3 allocations and boxing.
Using typeof(ids) and extracting element type inside fs2u_base ensures type
stability and 0 allocations.

Result: ulocation/location(IDSvector) now ~27ns with 0 allocations
(previously ~600ns with 3 allocations/128 bytes)
Replace zeros(Int, N) with zeros!(pool, Int, N) using @with_pool macro.
This eliminates the small temporary array allocation (N typically 1-3)
that occurred on every f2p/f2i call by reusing pooled memory.
…e allocation

Under @nospecializeinfer, the closure created by `lock() do` can cause
boxing due to captured variables (ids, field, func, throw_on_missing, etc).
Using explicit try/finally eliminates the closure and reduces allocation.

Changes:
- exec_expression_with_ancestor_args (4-arg version): cache lock, use try/finally
- onetime expression path: same pattern for consistency
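
The two locking styles, as a sketch with hypothetical names:

```julia
const TOY_LOCK = ReentrantLock()
const TOY_COUNTER = Ref(0)

# do-block form: the body becomes a closure that captures n.
function toy_increment_do(n)
    lock(TOY_LOCK) do
        TOY_COUNTER[] += n
    end
end

# Explicit form: same critical section, no closure.
function toy_increment_try(n)
    lock(TOY_LOCK)
    try
        TOY_COUNTER[] += n
    finally
        unlock(TOY_LOCK)
    end
end
```

Both versions release the lock even if the body throws.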
Member Author

mgyoo86 commented Dec 18, 2025

@fredrikekre @bclyons12
I've updated f2.jl to reduce allocations and improve performance.

Summary: Introduced type-based caching infrastructure to reduce allocations and improve performance in path/location functions. Also applied various micro-optimizations (array pooling, string caching, closure elimination).

Key Changes

1. Type-based caching (@_typed_cache macro)

  • Cached: _f2p_skeleton, fs2u, f2p_name, fs2u_base
  • Thread-safe via ThreadSafeDict

2. Temp array reuse (f2p, f2i)

  • AdaptiveArrayPools: zeros!(pool, Int, N)

3. Other minor micro-optimizations

Add type normalization to prevent cache key conversion errors when
UnionAll types (e.g., SomeType instead of SomeType{Float64}) are
passed to cached functions.

Changes:
- Add _normalize_ids_type() to convert UnionAll → DataType{Float64}
- Separate fs2u/f2p_name into public wrapper + cached implementation
- Public API accepts Type, internal cache uses DataType only
- Use specific type constraints (Type{<:IDS}) to minimize method
  table complexity and preserve dispatch performance
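
A sketch of the normalization step, assuming types with a single type parameter; `ToyElement` and `_toy_normalize_type` are hypothetical names.

```julia
struct ToyElement{T}
    value::T
end

# Already concrete: use as the cache key unchanged.
_toy_normalize_type(T::DataType) = T

# UnionAll such as ToyElement: fill the parameter with Float64.
_toy_normalize_type(T::UnionAll) = T{Float64}
```

With this, `ToyElement` and `ToyElement{Float64}` map to the same `DataType` key.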
@bclyons12 bclyons12 self-requested a review January 14, 2026 07:48
Collaborator

@bclyons12 bclyons12 left a comment


With these changes, I see the TTFX for FUSE.warmup(dd) go from 8.5 to 3.5 minutes. The second execution takes about the same time as before, slightly faster with these changes. Timings attached:
master_timing.txt
nospecialize_timing.txt

@bclyons12 bclyons12 merged commit 06208c2 into master Jan 14, 2026
6 of 7 checks passed
Member Author

mgyoo86 commented Jan 14, 2026

@bclyons12 Thanks for the detailed comparison!
While performance regressed in a few small places, overall I'd say this is a win :)

fredrikekre added a commit that referenced this pull request Mar 17, 2026
I removed most code changes (since #83 stands "on its own" when it comes
to fixing latency issues). What's here are mostly annotations for things
we discussed in the meetings. In particular, some of the `getproperty`
and `setproperty` code does more than just getting and setting
properties, so it might be better as separate functions, particularly
since some methods take additional (keyword) arguments.