Skip to content

Optimize garbage collection with a generational GC#2129

Open
pguyot wants to merge 1 commit intoatomvm:mainfrom
pguyot:w08/generational-gc
Open

Optimize garbage collection with a generational GC#2129
pguyot wants to merge 1 commit intoatomvm:mainfrom
pguyot:w08/generational-gc

Conversation

@pguyot
Copy link
Collaborator

@pguyot pguyot commented Feb 22, 2026

Running jit tests with AtomVM is now 20% faster.

Implement BEAM's fullsweep_after spawn_opt/1 option and process_flag/2 flag. Also fix process_flag/2 spec.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@petermm
Copy link
Contributor

petermm commented Feb 27, 2026

per Amp:

Code Review: Generational GC for AtomVM

Overview

This is a well-structured implementation of BEAM-style generational (minor/major) garbage collection. The architecture is sound — high water mark tracking, old heap promotion, dual scan loop, and proper MSO list separation are all correctly implemented. Documentation and tests are excellent. A few issues worth attention:


Issues

bug (medium) — nifs.c: fullsweep_after via process_flag/2 accepts negative integers, which silently wrap to UINT_MAX since ctx->fullsweep_after is unsigned int. Should validate term_to_int(value) >= 0 or raise badarg. Same issue at nifs.c#L1341 in do_spawn.

bug (medium) — memory.c: In memory_full_gc, if memory_init_heap fails (line 314), the function returns immediately but the old heap fragment has already been chained into old_root_fragment->next (line 311). The caller now has a corrupted fragment chain — the context's heap still points to the old root fragment with the old heap dangling off it, but no cleanup is performed. This leaves the process in an inconsistent state.

style (low) — memory.c: The MSO list target selection in memory_scan_and_copy_generational uses raw pointer comparisons (ptr >= heap->old_heap_start && ptr < heap->old_heap_end) instead of the existing memory_is_in_old_heap() helper defined at line 970. Should use the helper for consistency.

correctness (info) — memory.c: gen_heap is partially initialized — fields like root, heap_ptr, heap_end, old_mso_list are unset. This is safe since memory_shallow_copy_term_generational and memory_scan_and_copy_generational only access the initialized fields, but it's fragile if the struct ever gains new fields that are read during GC.

correctness (info) — memory.c: When MEMORY_SHRINK expands to memory_full_gc (non-realloc path), the shrink pass after a minor GC invokes a full GC, which resets old_heap_start. This is functionally correct but means a shrink after minor GC always does a full collection, losing generational benefits. Intentional given the platform constraints, but worth noting.

Positives

Dual scan loop (lines 1260–1296) is correctly implemented — both young and old regions are scanned to convergence, handling cross-generational references without a write barrier.
MSO list handling correctly splits refc binaries between young and old MSO lists, with proper refcount management (increment on copy, decrement on sweep).
Fallback to full GC when old heap space is insufficient is a clean safety net.
Tests cover basic generational, promotion, force-shrink, MSO, and fullsweep_after=0 scenarios — solid coverage.
Documentation in memory-management.md is clear and well-written.
JIT offset updates are validated by _Static_assert — good practice.

@pguyot pguyot force-pushed the w08/generational-gc branch 3 times, most recently from 2a1dab6 to 554c7f5 Compare March 8, 2026 09:08
@petermm
Copy link
Contributor

petermm commented Mar 8, 2026

I'm just the messenger.. per amp:

Previous issues: All 4 fixed ✅

Remaining blocker: Pre-existing old heap not scanned during minor GC 🔴

Line 1268 sets old_scan = saved_old_heap_ptr, meaning only the newly promoted old region is scanned. The pre-existing old heap [saved_old_heap_start, saved_old_heap_ptr) is skipped.

Why this is a bug: Minor GC itself creates old→young pointers. When a promoted term's field points to a young survivor (above HWM), that survivor is copied to the new young heap. The promoted old object now contains a pointer into the young heap. On the next minor GC, that old object is returned unchanged (lines 994–996), the pre-existing old heap is never scanned, and the young term it points to can be missed — leading to dangling pointers and premature refc binary frees.

The code itself confirms old→young pointers exist — memory_shrink (line 440) explicitly rewrites old heap pointers into young generation.

Fix: Change line 1268 from:

term *old_scan = saved_old_heap_ptr;

to:

term *old_scan = saved_old_heap_start ? saved_old_heap_start : old_heap_ptr;

Minor issue: fullsweep_after truncation on 64-bit

fullsweep_after is unsigned int but term_to_int() returns avm_int_t (64-bit). A value like 0x1_0000_0000 passes the >= 0 check but silently truncates to 0, disabling generational GC unexpectedly. Should validate <= UINT_MAX.

Minor hardening: memory_is_in_old_heap null-safety

When old_heap_start is NULL, comparing arbitrary pointers against NULL is technically undefined behavior in C. Add a null guard:

return heap->old_heap_start && ptr >= heap->old_heap_start && ptr < heap->old_heap_end;

Verdict

Not merge-ready. The old-heap scan gap is a correctness/memory-safety blocker. The fix is a one-line change + a regression test. Everything else in the commit (architecture, MSO handling, error paths, tests, docs, JIT offsets) looks solid.

@petermm
Copy link
Contributor

petermm commented Mar 8, 2026

Codex verified, but was NOT satisfied by one line fix:

is the proposed fix for #1 valid?

No, not as a standalone one-line fix.

Changing old_scan (line 1268) to start at saved_old_heap_start does fix the reachability gap in principle, but with the current code it would also rescan all pre-existing old objects through memory_scan_and_copy_generational (line 1065), and that routine re-adds old-heap MSOs and increments refcounts for them at src/libAtomVM/memory.c (line 1101) and src/libAtomVM/memory.c (line 1118). Since new_old_mso_list currently starts from the preserved old list at src/libAtomVM/memory.c (line 1271), rescanning the old heap would duplicate entries and over-increment refcounts on every minor GC.

So the reviewer’s diagnosis of the bug is valid, but the proposed one-line fix is incomplete. A correct fix needs one of these:

scan pre-existing old heap for old→young edges without rebuilding old-MSO bookkeeping for already-old objects, or
rebuild old-MSO bookkeeping from scratch with matching decrement/sweep logic.
So: valid direction for liveness, not valid as the final patch.

Running jit tests with AtomVM is now 20% faster.

Implement BEAM's `fullsweep_after` `spawn_opt/1` option and `process_flag/2`
flag. Also fix `process_flag/2` spec.

Also fix a bug where a finishing process in spawning state wasn't properly
removed from the processes list.

Signed-off-by: Paul Guyot <pguyot@kallisys.net>
@pguyot pguyot force-pushed the w08/generational-gc branch from 554c7f5 to 611ae42 Compare March 8, 2026 13:31
@pguyot
Copy link
Collaborator Author

pguyot commented Mar 8, 2026

I removed the defensive block that would process pointers from old -> young, which cannot exist as Erlang data structures are immutable. This confused the LLMs.

@petermm
Copy link
Contributor

petermm commented Mar 8, 2026

I removed the defensive block that would process pointers from old -> young, which cannot exist as Erlang data structures are immutable. This confused the LLMs.

You know best, this is way beyond my skill level - llm does suggest cleaning docs up if you back that out..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants