[WIP] Reduce compilation overhead with @nospecialize by mgyoo86 · Pull Request #83 · ProjectTorreyPines/IMASdd.jl

mgyoo86 · 2025-10-30T20:11:29Z

Status

🚧 Work in Progress - Testing and feedback welcome

Summary

This PR adds/fixes @nospecialize annotations to frequently-called utility functions to prevent excessive method specialization and reduce compilation overhead.

Major Changes

Core Functions Modified

ulocation(), f2u() - Added @nospecialize for IDS parameters
info(), coordinates() - Prevented specialization on IDS types
diff(), merge!(), freeze!() - Refactored type handling
dict2imas() - Fixed specialization in IO operations

Known Issues

⚠️ Specialization still occurs when calling high-level functions like hdf2imas() or through certain code paths. Root cause investigation ongoing.

Testing

Test File

See tmp_test/test_ulocation.jl for specialization behavior tests.

Required Temporary Modification

To properly test, you need to modify dd.jl to use getfield instead of getproperty or ids.field:

macOS:

sed -E -i '' 's/setfield!\(ids\.([^,]*)/setfield!(getfield(ids, :\1)/g' src/dd.jl

Linux (without macOS backup extension):

sed -E -i 's/setfield!\(ids\.([^,]*)/setfield!(getfield(ids, :\1)/g' src/dd.jl

This replaces ids.field property access with getfield(ids, :field), which bypasses the custom getproperty implementations during testing.

Next Steps

Identify remaining specialization sources in high-level functions
Validate performance improvements with benchmarks
Determine if dd.jl modifications should be permanent or test-only
Add comprehensive test coverage

Changed from @nospecialize(x::T) where {T<:Type} pattern to @nospecialize(x::{<:Type}) to properly prevent type specialization. This ensures Julia doesn't compile separate versions for each concrete type, reducing compilation overhead.
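A minimal sketch of the two signature styles this commit message contrasts. The types `AbstractNode`, `Leaf`, `Branch` and both functions are hypothetical stand-ins, not IMASdd code. The PR reports that the `where {T}` form did not prevent specialization; the second form annotates the argument with an abstract type and recovers the concrete type at run time.

```julia
abstract type AbstractNode end

struct Leaf <: AbstractNode
    value::Float64
end

struct Branch <: AbstractNode
    children::Vector{AbstractNode}
end

# Style the commit moves away from: T is a static parameter of the method,
# so the body is written in terms of the concrete type T.
function describe_param(@nospecialize(x::T)) where {T<:AbstractNode}
    return string(nameof(T))
end

# Style the commit moves to: abstract annotation only; the concrete type
# is looked up at run time with typeof.
function describe_nospec(@nospecialize(x::AbstractNode))
    return string(nameof(typeof(x)))
end
```

Both functions return the same string for a given argument; only the signature differs.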

Add @assert type checks to prevent merging/freezing different IDS types. This fixes type safety issues introduced by commit 74ded67, where the 'where T' constraints were removed.

Functions updated:
- merge!(::IDS, ::IDS) - assert same type
- merge!(::IDSvector, ::IDSvector) - assert same eltype
- freeze!(::IDS, ::IDS) - assert same type
- freeze!(::IDSvector, ::IDSvector) - assert same eltype

These assertions prevent runtime errors from field mismatches when operating on incompatible types.
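The idea of replacing a `where T` constraint with a runtime check can be sketched as follows. `AbstractRecord`, `Temperature`, `Density` and `merge_checked!` are hypothetical names used only for illustration; the real merge! and freeze! methods are not reproduced here.

```julia
abstract type AbstractRecord end

mutable struct Temperature <: AbstractRecord
    value::Float64
end

mutable struct Density <: AbstractRecord
    value::Float64
end

# Without a shared type parameter, dispatch no longer guarantees that both
# arguments have the same concrete type, so the check moves into the body.
function merge_checked!(@nospecialize(target::AbstractRecord), @nospecialize(source::AbstractRecord))
    @assert typeof(target) === typeof(source) "cannot merge $(typeof(source)) into $(typeof(target))"
    for i in 1:fieldcount(typeof(target))
        setfield!(target, i, getfield(source, i))
    end
    return target
end
```

Calling `merge_checked!(Temperature(1.0), Density(2.0))` raises an AssertionError instead of copying fields between unrelated types.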

Replace error() with Dict return when comparing different types. Now diff() returns a dict with 'type_mismatch' key instead of throwing an error, making it more flexible and non-disruptive. Example: diff(dd, dd.equilibrium) => Dict('type_mismatch' => 'dd{Float64} != equilibrium{Float64}')
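A sketch of the "return a Dict instead of throwing" behavior described here, under the assumption that differences are reported as string keys. `diff_sketch`, `EquilibriumDemo` and `CoreProfilesDemo` are hypothetical; the actual diff() output format beyond the 'type_mismatch' key is not shown in the source.

```julia
struct EquilibriumDemo
    psi::Vector{Float64}
end

struct CoreProfilesDemo
    te::Vector{Float64}
end

function diff_sketch(@nospecialize(a), @nospecialize(b))
    differences = Dict{String,String}()
    # Report a type mismatch as data rather than raising an error.
    if typeof(a) !== typeof(b)
        differences["type_mismatch"] = "$(typeof(a)) != $(typeof(b))"
        return differences
    end
    T = typeof(a)
    for i in 1:fieldcount(T)
        if getfield(a, i) != getfield(b, i)
            differences[string(fieldname(T, i))] = "value"
        end
    end
    return differences
end
```

A caller can then branch on `haskey(result, "type_mismatch")` rather than wrapping the call in try/catch.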

Add @nospecialize annotations to info and coordinates to reduce compilation overhead and binary size.

Added @nospecialize annotations to location and conversion functions in f2.jl to prevent excessive method specialization:
- utlocation(ids, field) and variants
- f2u(ids) - converts IDS to universal location string
- fs2u(ids_type) - converts IDS type to universal location

Note: ulocation specialization is still observed during constructor execution despite @nospecialize. Investigation is ongoing into when and why specialization occurs in the call chain.

Add dd_nospecialize() helper function that uses Base.invokelatest to prevent the compiler from analyzing dd() internals during type inference. Replace all dd() default arguments in I/O functions with dd_nospecialize() to significantly reduce allocation and compilation time.

The invokelatest barrier prevents the compiler from specializing on the complex 180k-line dd struct generation, while the ::dd{Float64} type assertion ensures proper type propagation without additional inference overhead.

Affected functions:
- json2imas, jstr2imas
- hdf2imas (default arg and internal call)
- h5i2imas

Performance impact: ~20M fewer allocations in hdf2imas calls.
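The barrier pattern described above can be sketched with a small stand-in. `build_default`, `default_barrier` and `load_sketch` are hypothetical; `build_default` plays the role of an expensive-to-infer constructor such as dd(), and the actual dd_nospecialize() body is not shown in the source.

```julia
# Hypothetical stand-in for a constructor that is costly to infer.
build_default() = Dict{Symbol,Vector{Float64}}()

# Base.invokelatest makes the call opaque to type inference; the trailing
# type assertion gives callers a concrete return type again.
default_barrier() = Base.invokelatest(build_default)::Dict{Symbol,Vector{Float64}}

# The barrier is used as a default argument, as the commit does for I/O functions.
function load_sketch(values::Vector{Float64}, target=default_barrier())
    target[:data] = values
    return target
end
```

The type assertion is what keeps downstream code type-stable; without it the result of invokelatest would be inferred as Any.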

Apply @nospecialize to fieldtype, getproperty, parent, name, goto, getindex, and time-related functions to reduce method specialization.

Temporary test file for investigating method specialization behavior.

codecov · 2025-10-30T20:54:45Z

Codecov Report

❌ Patch coverage is 63.42857% with 192 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.96%. Comparing base (01f7aff) to head (cdc71d9).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
src/data.jl	62.50%	63 Missing ⚠️
src/identifiers.jl	0.00%	31 Missing ⚠️
src/time.jl	62.12%	25 Missing ⚠️
src/show.jl	22.72%	17 Missing ⚠️
src/diagnostics.jl	76.81%	16 Missing ⚠️
src/f2.jl	78.46%	14 Missing ⚠️
src/io.jl	68.88%	14 Missing ⚠️
src/expressions.jl	83.33%	6 Missing ⚠️
src/macros.jl	80.00%	3 Missing ⚠️
src/math.jl	0.00%	2 Missing ⚠️
... and 1 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #83      +/-   ##
==========================================
+ Coverage   43.76%   43.96%   +0.20%     
==========================================
  Files          13       15       +2     
  Lines       31243    31418     +175     
==========================================
+ Hits        13672    13813     +141     
- Misses      17571    17605      +34

Remove type parameters from @nospecialize signatures and extract types inside function body using eltype() to properly prevent specialization.
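A sketch of extracting the element type inside the body instead of through a `where {T}` parameter. `copy_first!` is a hypothetical function, not the copy_timeslice! implementation.

```julia
function copy_first!(@nospecialize(dst::AbstractVector), @nospecialize(src::AbstractVector))
    # The element type is read at run time rather than bound in the signature.
    T = eltype(dst)
    isempty(src) && return dst
    push!(dst, convert(T, first(src)))
    return dst
end
```

The trade-off is that `T` is now a runtime value, so work that depends on it is resolved dynamically.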

Remove type parameters from @nospecialize signatures in ==, isequal, isapprox, and _extract_comparable_fields to properly prevent specialization on Union types. Also update docstrings for resize!, diff, and time_groups functions.

…me bottleneck

Added @nospecializeinfer to 165+ functions across 9 core modules, drastically reducing type inference overhead and specialization explosion.

Impact:
- Eliminates ~50% of top-level type inference allocations
- Prevents Union type specialization combinatorial explosion
- Dramatically reduces first-time function compilation overhead
- Measured: hdf2imas compilation time drops from 98.71% to minimal levels

Changes:
- Added `using Base: @nospecializeinfer` import
- Applied to functions with @nospecialize parameters in: cocos (6), data (25), expressions (18), f2 (24), identifiers (15), io (31), show (16), time (30)

Technical rationale: @nospecializeinfer prevents both specialization AND type inference propagation. Critical for Union types like Union{IDS,IDSvector,Vector{IDS}} which trigger combinatorial method generation and expensive typeinf_ext_toplevel calls.

This partially reverts the previous commit, which added @nospecializeinfer to almost every function. While @nospecializeinfer improves compilation time, it prevents type inference for certain key functions that are essential for nested calls to return concrete types.

Functions affected:
- cocos_out, cocos_transform, transform_cocos_* (src/cocos.jl)
- concrete_fieldtype_typeof, eltype_concrete_fieldtype_typeof, Base.getproperty (src/data.jl)

These functions must infer concrete return types for nested IMASdd function calls to work correctly, as tested in test/runtests_concrete.jl.

Result: All tests now pass again, including concrete type inference tests.
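A small sketch of the split this revert describes: a reporting-style function marked with @nospecializeinfer, next to an accessor left specializable so callers can infer its return type. Both functions are hypothetical, and the macro assumes Julia 1.10 or later.

```julia
using Base: @nospecializeinfer   # assumes Julia 1.10 or later

# Cold path: neither specialized nor inferred per argument type.
@nospecializeinfer function summarize(@nospecialize(x))
    return string(typeof(x), " with ", nfields(x), " fields")
end

# Hot path: left specializable so the return type T can be inferred by callers.
first_value(x::Vector{T}) where {T} = x[1]
```

The choice between the two is made per function: latency-sensitive utilities get the macro, while functions whose concrete return type feeds nested calls do not.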

orso82 · 2025-11-01T07:25:46Z

great find with the '@nospecializeinfer' macro !!

Due to @nospecialize/@nospecializeinfer constraints, type parameters cannot be used to guarantee matching types at dispatch time.

Solution:
- Add hot path methods for identical types (Int32/64, Float32/64, UInt64, Bool)
- These route to the __convert_same_real_type! helper for the fast path
- Generic Real->Real method now performs runtime type checking
- Convert to target type when needed (e.g., Float64 → Measurement{Float64})

Changes:
- Apply the COCOS conversion check consistently in all cases (assumed always needed)

Separated DD from Union{DD, IDSraw, IDSvectorRawElement} into dedicated method. Large Union (~130+ concrete subtypes) prevented compiler from inlining hot path. DD now gets specialized method without @nospecialize for optimal performance.

Added guard condition (user_cocos != to_cocos) before calling cocos_out. Since both default to 11, this avoids unnecessary function calls on hot path.

Added @inline and ::Bool annotations to hasdata for better type inference. Helps compiler optimize getproperty hot path by eliminating runtime type checks.

…eld indices

- Replace fieldnames() iteration with fieldcount() + numeric indices
- Reduces from 14 allocations (4.125 KiB) to ~0 allocations
- Works correctly with @nospecialize by avoiding symbolic field names
- Added @inbounds for bounds-check elimination
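Numeric field iteration of the kind described here can be sketched as below. `SampleFields` and `has_any_data` are hypothetical, and treating `missing` as "no data" is an assumption made for the example, not the hasdata() rule.

```julia
struct SampleFields
    a::Union{Missing,Float64}
    b::Union{Missing,Float64}
    c::Union{Missing,Float64}
end

function has_any_data(@nospecialize(x))::Bool
    # Iterate by field index; no tuple of field names is materialized.
    for i in 1:fieldcount(typeof(x))
        getfield(x, i) === missing || return true
    end
    return false
end
```

The loop returns as soon as one populated field is found.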

- Replace enumerate() with eachindex() + @inbounds
- Replace fieldnames() with hasfield()
- Optimize hasdata() to use numeric field indices
- Use tuple literals instead of arrays for membership tests

Optimize setproperty! by reusing already-computed coords variable instead of calling coordinates(ids, field) multiple times:
- Line 772: Use inline generator with coords reuse
- Line 774: Reuse coords instead of recalling coordinates()

Eliminates 2 redundant function calls per setproperty! invocation. Uses idiomatic inline generator with any() for short-circuit benefit.

fix/nospecialize

Optimize name_2_index() by caching inverted Dict per IDS type:
- Add global cache _NAME_2_IDX_CACHE using IdDict
- Implement lazy initialization with get!() for thread-safety
- First call per type: inverts idx_2_name and caches result
- Subsequent calls: returns cached Dict (zero-allocation)

Performance improvement:
- Before: ~22μs, 5 allocations per call
- After: ~2ns, 0 allocations (after first call per type)

Related optimization in fix/nospecialize branch.
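The lazy inversion cache can be sketched as follows. `DemoIdentifier`, `index_to_name`, `name_to_index` and `_INVERTED_CACHE` are hypothetical names; the sketch uses a plain IdDict and makes no thread-safety claim of its own.

```julia
# Cache keyed by type; each value is the inverted name => index table.
const _INVERTED_CACHE = IdDict{Any,Dict{String,Int}}()

struct DemoIdentifier end

# Hypothetical forward table, standing in for idx_2_name.
index_to_name(::Type{DemoIdentifier}) = Dict(1 => "alpha", 2 => "beta")

function name_to_index(@nospecialize(T::Type))
    # get! runs the block only on the first call for a given type.
    return get!(_INVERTED_CACHE, T) do
        Dict{String,Int}(name => idx for (idx, name) in index_to_name(T))
    end
end
```

After the first call, repeated lookups for the same type return the identical cached Dict object.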

Simplify in_expression() by removing redundant key check:
- Remove manual `if t_id ∉ keys(_in_expression)` check
- Use get!() directly for atomic check-and-create operation
- get!() already handles check atomically, making manual check redundant

Performance improvement:
- Eliminates one dict lookup (haskey check)
- Cleaner code with same thread-safety guarantees

Related to fix/nospecialize optimization work.

…debug code

Optimize two functions with numeric field iteration pattern:
- Stack-based fill function: Replace fieldnames() with fieldcount/fieldname
- Base.empty!(): Use numeric indices for field iteration
- Add @inbounds for bounds check elimination

Remove debug statements:
- Clean up Main.@infiltrate calls from resize!() function

Performance improvement:
- Eliminates allocations from fieldnames() vector creation
- Enables bounds check elimination with @inbounds
- Consistent with other @nospecialize optimizations

Related to fix/nospecialize optimization work.

mgyoo86 · 2025-12-04T18:12:28Z

@bclyons12 @fredrikekre
The following are additional micro-optimizations that can further improve performance in cases such as ActorFluxMatcher, which Brendan pointed out.

Additional Performance Optimizations (6 commits)

Summary

Zero-allocation improvements across hot paths in @nospecialize functions.

Key Changes

Loop Optimization (data.jl, expressions.jl, f2.jl, findall.jl, io.jl)

Replace for (k, v) in enumerate(arr) → for k in eachindex(arr); v = @inbounds arr[k]
Eliminates tuple allocations in tight loops

Field Iteration (data.jl, expressions.jl)

Replace fieldnames(typeof(x)) iteration → numeric fieldcount/fieldname indices
hasdata(): Use early-return loop instead of generator with any()

Field Checks (identifiers.jl)

Replace :field in fieldnames(T) → hasfield(T, :field)
Avoids tuple allocation on every check

Caching (identifiers.jl)

Add lazy auto-inversion cache for name_2_index()
One-time Dict creation per IDS type

Thread-safe Access (expressions.jl)

Optimize in_expression() with direct get!() usage
Remove redundant key existence check

Misc (math.jl, data.jl)

Use tuple literals (:a, :b) instead of vectors [:a, :b] in ∈ checks
Eliminate redundant coordinates() calls in setproperty!
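Three of the replacements listed above, shown side by side in a small sketch. The function names are hypothetical, and no allocation counts are claimed for this example.

```julia
# eachindex + @inbounds in place of enumerate
function sum_weighted(values::Vector{Float64})
    total = 0.0
    for k in eachindex(values)
        v = @inbounds values[k]
        total += k * v
    end
    return total
end

struct Probe
    name::String
    position::Float64
end

# hasfield(T, :field) in place of `:field in fieldnames(T)`
has_position(@nospecialize(x)) = hasfield(typeof(x), :position)

# tuple literal in place of a vector in a membership test
is_axis(field::Symbol) = field in (:r, :z, :phi)
```

Each replacement keeps the same result as the form it replaces; only the way the intermediate collection is built changes.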

Files Changed

data.jl, expressions.jl, identifiers.jl, io.jl, f2.jl, findall.jl, math.jl

Add diagnose_shared_objects() to detect unintended array sharing in IDS trees. This helps identify cases where `a = b` was used instead of `a .= b`.

Features:
- Stack-based tree traversal following isequal pattern
- SharedObjectReport with indexed access (report[1].id, report[1].paths)
- Cross-IDS sharing detection (e.g., core_profiles ↔ core_sources)
- REPL display with chronological path ordering

…ility Replace @maybe_nospecializeinfer with @nospecializeinfer since the macro wrapper is not defined on master branch.

- Add runtests_f2.jl with 81 test cases covering:
  - f2p, f2i, f2u path conversion functions
  - i2p, p2i, i2u string parsing functions
  - location, ulocation path accessors
  - fs2u type-based lookup
  - f2p_name IDS naming
  - Round-trip consistency validation
  - Edge cases (standalone IDS, utime flag, deeply nested)
- Move f2-related tests from runtests_ids.jl to dedicated file
- Include runtests_f2.jl in main test runner

- Add _F2P_SKELETON_CACHE with concrete types for type-stable lookup
- Split _f2p_skeleton into fast path (cache hit) and slow path (@noinline)
- Pre-compute and cache result_size to avoid redundant count() calls
- Use Vector{String} in cache for concrete value type
- Remove String() conversion in loop (already cached as String)
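The fast-path/slow-path split can be sketched as below. `_SKELETON_CACHE`, `_skeleton_slow`, `skeleton` and `PointDemo` are hypothetical; the cached value here is just the list of field names, not the real skeleton format, and the sketch is single-threaded.

```julia
# Concrete key and value types keep the lookup type-stable.
const _SKELETON_CACHE = Dict{DataType,Vector{String}}()

# Slow path: kept out of line so the fast path stays small.
@noinline function _skeleton_slow(T::DataType)
    parts = String[string(name) for name in fieldnames(T)]
    _SKELETON_CACHE[T] = parts
    return parts
end

# Fast path: a single lookup, falling through to the slow path on a miss.
function skeleton(T::DataType)
    cached = get(_SKELETON_CACHE, T, nothing)
    cached === nothing || return cached
    return _skeleton_slow(T)
end

struct PointDemo
    x::Float64
    y::Float64
end
```

`skeleton(PointDemo)` computes the vector once and returns the same cached object afterwards.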

- Add internal @_typed_cache macro with proper hygiene (gensym, esc)
- Use helper function pattern to solve return-bypass caching bug
- Apply macro to f2.jl: fs2u, _f2p_skeleton, f2p_name(Type)
- Rename cache constants with _TCACHE_ prefix for consistency
- Use Base.get single lookup instead of haskey+getindex pattern

- Remove type-based name computation (~15 lines)
- Reuse cached skeleton from _f2p_skeleton(T)
- Eliminates redundant replace/eachsplit/count calls per f2i invocation

- Add ::String return type to f2p_name(ids) for better type inference
- Refactor f2p_name(ids::IDS, ::IDS) to reuse cached f2p_name(Type)
- Remove redundant typename_str computation (now uses cache)
- Add @nospecialize to entry point f2p_name(ids) for compile time

- i2u fast path: avoid String(loc) allocation when loc is already String
- ulocation/location(IDSvector): use fs2u_base cache instead of SubString
- Add int_to_string() cache for small integers (0-10) used in f2p and f2p_name
- Add fs2u_base typed cache for IDSvector base paths (0 allocs)
- Expand benchmark_f2.jl with comprehensive allocation tests

Results:
- f2p: 10→8 allocs (simple), 14→10 allocs (nested)
- f2p_name(IDSvectorElement): 4→2 allocs
- ulocation/location(IDSvector): 0 allocs (cached)
- i2u(String, no brackets): 0 allocs

Changed eltype(ids) to typeof(ids) in ulocation/location(IDSvector) functions. With @nospecialize, eltype(ids) returns Any causing 3 allocations and boxing. Using typeof(ids) and extracting element type inside fs2u_base ensures type stability and 0 allocations. Result: ulocation/location(IDSvector) now ~27ns with 0 allocations (previously ~600ns with 3 allocations/128 bytes)

Replace zeros(Int, N) with zeros!(pool, Int, N) using @with_pool macro. This eliminates the small temporary array allocation (N typically 1-3) that occurred on every f2p/f2i call by reusing pooled memory.

…e allocation

Under @nospecializeinfer, the closure created by `lock() do` can cause boxing due to captured variables (ids, field, func, throw_on_missing, etc). Using explicit try/finally eliminates the closure and reduces allocation.

Changes:
- exec_expression_with_ancestor_args (4-arg version): cache lock, use try/finally
- onetime expression path: same pattern for consistency
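The two locking styles compared in this commit message, in a minimal sketch. `_EXPR_LOCK`, `_RESULTS` and both functions are hypothetical; the example shows only the shape of the code and does not measure boxing or allocation.

```julia
const _EXPR_LOCK = ReentrantLock()
const _RESULTS = Dict{Symbol,Float64}()

# do-block form: the body becomes a closure that captures key and value.
function store_with_closure!(key::Symbol, value::Float64)
    lock(_EXPR_LOCK) do
        _RESULTS[key] = value
    end
    return value
end

# Explicit form: no closure is created; finally guarantees the unlock.
function store_with_try!(key::Symbol, value::Float64)
    lock(_EXPR_LOCK)
    try
        _RESULTS[key] = value
    finally
        unlock(_EXPR_LOCK)
    end
    return value
end
```

Both forms release the lock even if the body throws; the second does so without creating a closure.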

mgyoo86 · 2025-12-18T21:13:16Z

@fredrikekre @bclyons12
I've updated f2.jl to reduce allocations and improve performance.

Summary: Introduced type-based caching infrastructure to reduce allocations and improve performance in path/location functions. Also applied various micro-optimizations (array pooling, string caching, closure elimination).

Key Changes

1. Type-based caching (@_typed_cache macro)

Cached: _f2p_skeleton, fs2u, f2p_name, fs2u_base
Thread-safe via ThreadSafeDict

2. Temp array reuse (f2p, f2i)

AdaptiveArrayPools: zeros!(pool, Int, N)

3. + Other minor micro-optimizations

Add type normalization to prevent cache key conversion errors when UnionAll types (e.g., SomeType instead of SomeType{Float64}) are passed to cached functions.

Changes:
- Add _normalize_ids_type() to convert UnionAll → DataType{Float64}
- Separate fs2u/f2p_name into public wrapper + cached implementation
- Public API accepts Type, internal cache uses DataType only
- Use specific type constraints (Type{<:IDS}) to minimize method table complexity and preserve dispatch performance
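A sketch of the normalization step, assuming a single type parameter that defaults to Float64. `ProfileDemo`, `normalize_type`, `cached_name` and `_NAME_CACHE` are hypothetical; the real _normalize_ids_type() is not shown in the source.

```julia
struct ProfileDemo{T<:Real}
    values::Vector{T}
end

# Concrete types pass through; a bare UnionAll is instantiated with Float64.
normalize_type(T::DataType) = T
normalize_type(T::UnionAll) = T{Float64}

# The cache only ever sees DataType keys.
const _NAME_CACHE = Dict{DataType,String}()

# Public wrapper accepts either form and normalizes before the lookup.
function cached_name(T::Type{<:ProfileDemo})
    key = normalize_type(T)
    return get!(() -> string(nameof(key)), _NAME_CACHE, key)
end
```

With this wrapper, `cached_name(ProfileDemo)` and `cached_name(ProfileDemo{Float64})` share one cache entry.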

bclyons12

With these changes, I see the TTFX for FUSE.warmup(dd) go from 8.5 to 3.5 minutes. Second execution is about the same, slightly faster now with these changes. Timings attached
master_timing.txt
nospecialize_timing.txt

mgyoo86 · 2026-01-14T17:45:25Z

@bclyons12 Thanks for the detailed comparison!
While performance regressed in a few small parts, overall I'd say this is a win :)

I removed most code changes, since #83 stands "on its own" when it comes to fixing latency issues. What's here are mostly annotations for things we discussed in the meetings. In particular, some of the `getproperty` and `setproperty` code does more than just getting and setting properties, so it might be better as separate functions, especially since some methods take additional (keyword) arguments.

mgyoo86 added 10 commits October 28, 2025 17:56

refactor: prevent specialization of info and coordinates functions

eccf6bf

refactor: add @nospecialize to utility functions

eb5444a

fix @nospecialize for dict2imas

f362a18

test: add temporary ulocation specialization test

2830062

refactor: reorganize test cases for ulocation specialization

f7f53ff

mgyoo86 requested review from bclyons12 and fredrikekre October 30, 2025 20:11

mgyoo86 added the WIP Work in Progress label Oct 30, 2025

mgyoo86 changed the title ~~# [WIP] Reduce compilation overhead with @nospecialize~~ [WIP] Reduce compilation overhead with @nospecialize Oct 30, 2025

fix bugs in time.jl

3e59edc

mgyoo86 mentioned this pull request Oct 30, 2025

Unexpected Behavior of @nospecialize #84

Closed

mgyoo86 added 7 commits October 30, 2025 15:32

fix: correct @nospecialize usage in copy_timeslice!

5574b60

fix: apply @nospecialize to root_ids and target parameters in findall function

fbaeb18

Merge branch 'master' into fix/nospecialize

0a145b1

refactor: remove test file for specialized instances of ulocation

63a0bb8

mgyoo86 added 5 commits November 1, 2025 18:08

perf: split DD getproperty to enable inlining

3698cc2

perf: skip cocos_out when no conversion needed

2f9a2a1

perf: add Bool type assertions to hasdata

f0b7cc1

perf: add early termination and inline getfield calls in add_filled

f40b540

bclyons12 reviewed Nov 15, 2025

mgyoo86 added 7 commits November 16, 2025 11:21

fix: rollback error_parent_of_nothing keyword signatures for `parent` function

ee1c3aa

perf: eliminate allocations in @nospecialize functions

8de082b

mgyoo86 added 12 commits December 9, 2025 21:24

fix(diagnostics): use @nospecializeinfer directly for master compatibility

e529c74

Replace @maybe_nospecializeinfer with @nospecializeinfer since the macro wrapper is not defined on master branch.

perf(f2i): reuse _f2p_skeleton cache instead of duplicating computation

ac5c928

Use AdaptiveArrayPools to eliminate idx array allocation in f2p/f2i

ff3a206

style: use string() instead of interpolation in utlocation for consistency

9df2387

mgyoo86 added 2 commits January 5, 2026 14:45

fix(deps): update AdaptiveArrayPools dependency to use new UUID

14a74ce

bclyons12 self-requested a review January 14, 2026 07:48

bclyons12 approved these changes Jan 14, 2026

bclyons12 merged commit 06208c2 into master Jan 14, 2026
6 of 7 checks passed

fredrikekre mentioned this pull request Mar 11, 2026

JuliaHub Project Summary 2025 ProjectTorreyPines/FUSE.jl#1065

Open

bclyons12 merged 58 commits into master from fix/nospecialize

3 participants
