
[mlir][bufferization] Add planning logic to static memory planner analysis#1

Draft
KrxGu wants to merge 10 commits into main from krish/static-memory-planner-analysis-v1


@KrxGu KrxGu commented Mar 28, 2026


This PR extends the static memory planner analysis pass to compute actual memory layouts for eligible allocations.

What's implemented:

  • Pool-based grouping: allocations are grouped by memory space and element type
  • First-fit interval packing: assigns offsets while respecting lifetime overlaps and alignment constraints
  • Arena sizing: computes total arena size and alignment per pool
  • Enhanced remarks: now include pool ID, offset, size, alignment, and lifetime intervals
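To make the first-fit step concrete, here is a minimal sketch of interval packing for a single pool. It is not the code from this PR: the `PlannedBuffer` and `PoolPlan` structs, the helper names, and the use of closed lifetime intervals are all assumptions made for illustration. Grouping by memory space and element type is assumed to have happened already, so the function only sees the buffers of one pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical description of one eligible allocation (names are illustrative).
struct PlannedBuffer {
  int64_t start;     // first position in the block where the buffer is live
  int64_t end;       // last position where it is live (inclusive)
  int64_t size;      // size in bytes
  int64_t alignment; // required alignment in bytes, assumed a power of two
};

// Hypothetical result for one pool.
struct PoolPlan {
  std::vector<int64_t> offsets; // one byte offset per input buffer
  int64_t arenaSize;            // bytes needed for the whole pool
  int64_t arenaAlignment;       // maximum alignment of any buffer
};

// Round `value` up to the next multiple of `alignment`.
inline int64_t roundUp(int64_t value, int64_t alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Closed intervals overlap if neither one ends before the other starts.
inline bool lifetimesOverlap(const PlannedBuffer &a, const PlannedBuffer &b) {
  return a.start <= b.end && b.start <= a.end;
}

// First-fit: visit buffers in order and give each one the lowest aligned
// offset that does not collide with an already placed buffer whose
// lifetime overlaps.
inline PoolPlan planPoolFirstFit(const std::vector<PlannedBuffer> &pool) {
  PoolPlan plan;
  plan.arenaSize = 0;
  plan.arenaAlignment = 1;
  for (std::size_t i = 0; i < pool.size(); ++i) {
    const PlannedBuffer &cur = pool[i];
    // Byte ranges occupied by buffers that are live at the same time.
    std::vector<std::pair<int64_t, int64_t>> busy;
    for (std::size_t j = 0; j < i; ++j)
      if (lifetimesOverlap(pool[j], cur))
        busy.push_back(
            std::make_pair(plan.offsets[j], plan.offsets[j] + pool[j].size));
    std::sort(busy.begin(), busy.end());
    int64_t candidate = 0;
    for (const auto &range : busy) {
      if (candidate + cur.size <= range.first)
        break; // the gap before this range is large enough
      candidate = std::max(candidate, roundUp(range.second, cur.alignment));
    }
    plan.offsets.push_back(candidate);
    plan.arenaSize = std::max(plan.arenaSize, candidate + cur.size);
    plan.arenaAlignment = std::max(plan.arenaAlignment, cur.alignment);
  }
  return plan;
}
```

With three buffers of 256, 100, and 256 bytes where only the first two are live at the same time, this sketch places the second buffer at offset 256 and lets the third reuse offset 0.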

Test coverage:
Added test cases for Examples 10-14 from the project specification:

  • Sequential allocations with memory reuse
  • Overlapping lifetimes requiring different offsets
  • Alignment-sensitive placement
  • Separate pools for different memory spaces
  • Alias tracking preventing incorrect reuse

This is an analysis-only pass for same-block alloc/dealloc pairs. Future work will handle loops, conditionals, and the actual IR rewrite.

All tests pass with -verify-diagnostics.

…ic buffer planning

Adds an analysis-only pass that finds same-block memref.alloc/dealloc pairs
eligible for static memory reuse. For each eligible alloc it computes a
conservative alias-aware lifetime interval using BufferViewFlowAnalysis,
collects size/alignment metadata, and emits op remarks. Ineligible allocs
get a skip reason (dynamic shape, nested in loop/conditional, no unique
same-block dealloc, escaping alias). Aggregate counts are tracked as pass
statistics.

No IR mutations. Intended to run after the deallocation pipeline
(ownership-based-buffer-deallocation + bufferization-lower-deallocations).

This is the first upstream step for a buffer reuse pass discussed in the
MLIR discourse thread (RFC: GSoC buffer reuse pass for non-overlapping
allocations after lower-deallocations).

Adds six lit/FileCheck tests covering the main eligibility paths.

Signed-off-by: KrxGu <krishom70@gmail.com>
@KrxGu KrxGu force-pushed the krish/static-memory-planner-analysis-v1 branch from feba137 to f5111c1 Compare March 28, 2026 05:05
Rewrote function-level and inline comments to be more concise.
Added test 7 covering the cross-block alias / escaping skip path,
which was previously exercised by the implementation but not tested.

Signed-off-by: KrxGu <krishom70@gmail.com>
KrxGu pushed a commit that referenced this pull request Apr 12, 2026
Running gcc test c-c++-common/tsan/tls_race.c on s390 we get:

ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:618 "((thr_beg))
>= ((tls_addr))" (0x3ffaa35e140, 0x3ffaa35e250) (tid=2419930)
#0 __tsan::CheckUnwind() /devel/src/libsanitizer/tsan/tsan_rtl.cpp:696
(libtsan.so.2+0x91b57)
#1 __sanitizer::CheckFailed(char const*, int, char const*, unsigned long
long, unsigned long long)
/devel/src/libsanitizer/sanitizer_common/sanitizer_termination.cpp:86
(libtsan.so.2+0xd211b)
#2 __tsan::ImitateTlsWrite(__tsan::ThreadState*, unsigned long, unsigned
long) /devel/src/libsanitizer/tsan/tsan_platform_linux.cpp:618
(libtsan.so.2+0x8faa3)
#3 __tsan::ThreadStart(__tsan::ThreadState*, unsigned int, unsigned long
long, __sanitizer::ThreadType)
/devel/src/libsanitizer/tsan/tsan_rtl_thread.cpp:225
(libtsan.so.2+0xaadb5)
#4 __tsan_thread_start_func
/devel/src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1065
(libtsan.so.2+0x3d34d)
#5 start_thread <null> (libc.so.6+0xae70d) (BuildId:
d3b08de1b543c2d15d419bf861b3c2e4c01ac75b)
#6 thread_start <null> (libc.so.6+0x12d2ff) (BuildId:
d3b08de1b543c2d15d419bf861b3c2e4c01ac75b)

In order to determine the static TLS blocks in GetStaticTlsBoundary we
iterate over the modules and try to find the largest range without a
gap. Here we might have that modules are spaced exactly by the
alignment. For example, for the failing test we have:

(gdb) p/x ranges.data_[0]
$1 = {begin = 0x3fff7f9e6b8, end = 0x3fff7f9e740, align = 0x8, tls_modid = 0x3}
(gdb) p/x ranges.data_[1]
$2 = {begin = 0x3fff7f9e740, end = 0x3fff7f9eed0, align = 0x40, tls_modid = 0x2}
(gdb) p/x ranges.data_[2]
$3 = {begin = 0x3fff7f9eed8, end = 0x3fff7f9eef8, align = 0x8, tls_modid = 0x4}
(gdb) p/x ranges.data_[3]
$4 = {begin = 0x3fff7f9eefc, end = 0x3fff7f9ef00, align = 0x4, tls_modid = 0x1}

where ranges[3].begin == ranges[2].end + ranges[3].align holds. Since in
the loop a strict inequality test is used we compute the wrong address

(gdb) p/x *addr
$5 = 0x3fff7f9eefc

whereas 0x3fff7f9e6b8 is expected which is why we bail out in the
subsequent.
KrxGu pushed a commit that referenced this pull request Apr 12, 2026
…8271)

Example:

    int foo(int a, int b) { return a - 1 + ~b; }

Before, on AArch64:

    mvn w8, w1
    add w8, w0, w8
    sub w0, w8, #1

After (matches gcc):

    sub w0, w0, w1
    sub w0, w0, #2

Proof: https://alive2.llvm.org/ce/z/g_bV01
KrxGu pushed a commit that referenced this pull request May 18, 2026
llvm#183506 revealed a pre-existing
use-after-scope in createInstrInfo (MSan bot:
https://lab.llvm.org/buildbot/#/builders/164/builds/21562 [*]).

This patch fixes the issue by changing the stack-allocated
AArch64Subtarget (which goes out of scope once createInstrInfo()
returns) into heap-allocated, allowing it to be safely stored in the
returned AArch64InstrInfo.

-----

[*] WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55555666fabd in
llvm::AArch64InstrInfo::getInstSizeInBytes(llvm::MachineInstr const&)
const
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:247:5
...

/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/unittests/Target/AArch64/InstSizes.cpp:85:3
#9 0x555556508559 in InstSizes_MOVaddrTagged_Test::TestBody()
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/unittests/Target/AArch64/InstSizes.cpp:301:3
...

  Member fields were destroyed
#0 0x555556498a1d in __sanitizer_dtor_callback_fields
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1074:5
#1 0x5555564fbda6 in ~Triple
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/TargetParser/Triple.h:348:12
#2 0x5555564fbda6 in ~Triple
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/include/llvm/TargetParser/Triple.h:47:7
#3 0x5555564fbda6 in llvm::AArch64Subtarget::~AArch64Subtarget()
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/lib/Target/AArch64/AArch64Subtarget.h:38:7
#4 0x555556503396 in (anonymous
namespace)::createInstrInfo(llvm::TargetMachine*)
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/unittests/Target/AArch64/InstSizes.cpp:38:1
#5 0x5555565084cb in InstSizes_MOVaddrTagged_Test::TestBody()
/home/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/unittests/Target/AArch64/InstSizes.cpp:299:42
@KrxGu KrxGu force-pushed the krish/static-memory-planner-analysis-v1 branch from 6058453 to 9b33562 Compare May 18, 2026 07:37
@KrxGu KrxGu changed the title from "[mlir][bufferization] add same-block alloc lifetime analysis for stat…" to "[mlir][bufferization] Add planning logic to static memory planner analysis" May 18, 2026
…e analysis

Stripped down the initial implementation to focus on fundamentals.
The pass now just identifies eligible alloc/dealloc pairs in the same
block and checks which ones could potentially reuse memory based on
their order in the IR.

Removed pool grouping, offset assignment, and interval packing logic.
That complexity belongs in a follow-up once the basic approach is solid.
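As a rough illustration of the order-based check described above, the sketch below reports which pairs of buffers could share memory because one is deallocated before the other is allocated. The `AllocInterval` struct and `findReuseCandidates` name are hypothetical, and positions are assumed to be operation indices within a single block; the actual pass works on MLIR operations and also accounts for aliases.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical summary of one same-block alloc/dealloc pair. Positions are
// operation indices within the block.
struct AllocInterval {
  std::size_t allocPos;   // index of the memref.alloc
  std::size_t deallocPos; // index of the matching memref.dealloc
};

// Returns (earlier, later) index pairs where the earlier buffer is already
// deallocated before the later one is allocated, so the later buffer could
// in principle reuse the earlier buffer's memory.
inline std::vector<std::pair<std::size_t, std::size_t>>
findReuseCandidates(const std::vector<AllocInterval> &intervals) {
  std::vector<std::pair<std::size_t, std::size_t>> result;
  for (std::size_t i = 0; i < intervals.size(); ++i)
    for (std::size_t j = 0; j < intervals.size(); ++j)
      if (i != j && intervals[i].deallocPos < intervals[j].allocPos)
        result.emplace_back(i, j);
  return result;
}
```

This only identifies candidates; it does not check sizes, alignment, or memory space, which a real reuse decision would also need.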

Signed-off-by: KrxGu <krishom70@gmail.com>
@KrxGu KrxGu force-pushed the krish/static-memory-planner-analysis-v1 branch from 9b33562 to 647b82d Compare May 22, 2026 16:19
Comment thread mlir/include/mlir/Dialect/Bufferization/Transforms/Passes.td Outdated
…ny operation

Address review feedback: removed func::FuncOp restriction so the pass
can run on any operation (e.g., gpu.launch, async.execute). Updated
test to use pass-pipeline notation to explicitly schedule on func.func.

This makes the pass more flexible for users who want to apply static
memory planning to other operation types.

Signed-off-by: KrxGu <krishom70@gmail.com>
KrxGu pushed a commit that referenced this pull request May 25, 2026
…198548)

When an MCP client disconnects (EOF), `IOTransport::OnRead` called
`handler.OnClosed()` before resetting `m_read_handle`. The MCP server's
`OnClosed` handler erases the client from `m_instances`, destroying both
  the transport (`this`) and the binder (`handler`). The subsequent
`m_read_handle.reset()` then accessed the destroyed transport's member,
  causing a use-after-free (SIGSEGV).

* thread #1, stop reason = signal SIGSEGV: address not mapped to object
(fault address=0x28)
* frame #0: 0x00007ff5d4d5afda
liblldb.so.23.2`lldb_private::transport::IOTransport<lldb_protocol::mcp::ProtocolDescriptor>::OnRead(lldb_private::MainLoopBase&,
lldb_private::transport::JSONTransport<lldb_protocol::mcp::ProtocolDescriptor>::MessageHandler&)
+ 1274
frame #1: 0x00007ff5d1140ad8
liblldb.so.23.0`lldb_private::MainLoopPosix::Run() + 408
frame #2: 0x00007ff5d1760c1c
liblldb.so.23.0`std::thread::_State_impl<std::thre

  Fix by resetting the read handle before calling `OnClosed()`, so no
  transport members are accessed after the handler potentially destroys
  the transport.

Then when the scope is left, the destructor is called for the new
read_handle local variable and it is cleaned up.

New unit tests added that fail without this change. With the change, the
custom 'ai' script (allows end user locally to communicate lldb context
to agent backend via a spun up MCP server: "protocol-server start MCP
listen://localhost:{port}") now successfully concludes without this
crash

Assisted with: claude
KrxGu added 6 commits June 1, 2026 16:56
…ry planner

This adds the complete transformation pass that converts multiple
memref.alloc/dealloc pairs into a single arena with subviews.

The offset assignment is intentionally simple (just sequential) - this
establishes the e2e pipeline so we can add smarter bin-packing later.

Tests verify arena sizing, sequential offsets, and that dynamic shapes
or missing deallocations are correctly skipped.

Track alignment requirements from memref.alloc operations and ensure
offsets are properly padded to meet alignment constraints. The arena
allocation receives the maximum alignment of all transformed allocations.

Changes:
- Add alignment field to AllocationCandidate structure
- Compute sizes in bytes to handle alignment padding correctly
- Implement alignOffset() helper to pad offsets to alignment boundaries
- Set arena alignment attribute to maximum required alignment
- Add test demonstrating alignment padding with 64 and 128-byte requirements
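For reference, an `alignOffset()`-style helper can be sketched as below. The commit only names the helper; the signature, the power-of-two assumption, and the bit-mask implementation here are assumptions, not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of an alignOffset-style helper: rounds `offset` up to the next
// multiple of `alignment`. Assumes `alignment` is a power of two.
inline int64_t alignOffset(int64_t offset, int64_t alignment) {
  assert(offset >= 0);
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, if a 100-byte buffer with 64-byte alignment sits at offset 0, a following buffer that needs 128-byte alignment would start at `alignOffset(100, 128)`, which is 128, leaving 28 bytes of padding. The arena itself would then need 128-byte alignment, the maximum of the two.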

This ensures correctness for SIMD and other alignment-sensitive operations.

Separate memory planning logic from IR transformation by introducing
trivialMemoryPlanner() - a pure function that computes buffer offsets
without depending on MLIR operations.

Changes:
- Add Alloc structure for allocation-independent planning
- Implement trivialMemoryPlanner(arenaAlignment, allocs) -> offsets
- Refactor runOnOperation() to use the planning function
- Planning logic is now testable independently of MLIR
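A planner with the shape `trivialMemoryPlanner(arenaAlignment, allocs) -> offsets` could look roughly like the sketch below. The field names of `Alloc`, the return type, and the assertion that no buffer needs more alignment than the arena are assumptions for illustration; only the function name and parameter list come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical planning input, independent of any MLIR operation.
struct Alloc {
  int64_t sizeInBytes;
  int64_t alignment; // assumed to be a power of two
};

// Sequential ("trivial") planner: places buffers one after another and pads
// each offset up to that buffer's alignment. It performs no lifetime-based
// reuse. Offsets are relative to the arena base, so the arena is assumed to
// be at least as aligned as every buffer.
inline std::vector<int64_t>
trivialMemoryPlanner(int64_t arenaAlignment, const std::vector<Alloc> &allocs) {
  std::vector<int64_t> offsets;
  offsets.reserve(allocs.size());
  int64_t cursor = 0;
  for (const Alloc &a : allocs) {
    assert(a.alignment > 0 && (a.alignment & (a.alignment - 1)) == 0);
    assert(a.alignment <= arenaAlignment);
    cursor = (cursor + a.alignment - 1) & ~(a.alignment - 1);
    offsets.push_back(cursor);
    cursor += a.sizeInBytes;
  }
  return offsets;
}
```

Because the function takes and returns plain values, it can be unit-tested without constructing any IR, and a first-fit or best-fit variant with the same signature could be swapped in later.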

This architecture enables plugging in different allocation strategies
(firstFit, bestFit) without modifying IR transformation code.

Change the arena from typed (e.g., memref<Nxf32>) to a generic i8 byte buffer
(memref<Nxi8>). This allows a single arena to hold allocations of different
element types (f32, i64, i16, etc.).

Use memref.view to create typed views into the i8 arena at computed byte
offsets. This is the standard MLIR pattern for type-agnostic memory buffers.

Changes:
- Arena is now memref<totalSizexi8> instead of element-typed
- Use memref.view instead of memref.subview + reinterpret_cast
- Byte offsets passed directly to memref.view via arith.constant
- Update all tests to reflect i8 arena + view pattern
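Sizing an i8 arena requires each buffer's size in bytes rather than in elements. A minimal sketch of that computation follows; the function name, the `kDynamicDim` marker, and the choice to round element widths up to whole bytes are assumptions and may differ from what the pass does.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical marker for a dynamic dimension in this sketch.
constexpr int64_t kDynamicDim = -1;

// Returns the byte size of a statically shaped buffer, or -1 if any
// dimension is dynamic. Element widths are rounded up to whole bytes.
inline int64_t staticSizeInBytes(const std::vector<int64_t> &shape,
                                 int64_t elementBitWidth) {
  int64_t numElements = 1;
  for (int64_t dim : shape) {
    if (dim == kDynamicDim)
      return -1;
    numElements *= dim;
  }
  int64_t bytesPerElement = (elementBitWidth + 7) / 8;
  return numElements * bytesPerElement;
}
```

For a `memref<1024xf32>` this gives 1024 * 4 = 4096 bytes, which matches a `memref<4096xi8>` arena holding that single buffer.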

Example transformation:
  Before: memref.alloc() : memref<1024xf32>
  After:  %arena = memref.alloc() : memref<4096xi8>
          %c0 = arith.constant 0 : index
          %view = memref.view %arena[%c0][] : memref<4096xi8> to memref<1024xf32>

Add arena-mode pass option to control how the shared arena is obtained:
- 'allocate' (default): Creates arena via memref.alloc within the function
- 'arg': Uses function's first argument as the pre-allocated arena

The 'arg' mode is useful when the arena is pre-allocated externally and
passed to the function, enabling use cases like pre-allocated scratch
buffers or memory pools.

In 'arg' mode, the pass validates that:
1. The context is a function operation
2. The function has at least one argument
3. The first argument is memref<...xi8>

If validation fails, the pass emits an error and fails gracefully.

Changes:
- Add arena-mode option to Passes.td with default 'allocate'
- Update pass description to reflect transformation behavior
- Implement conditional arena acquisition based on mode
- Add tests for arg mode with error validation

Add validation to reject functions with memref return types, as static
memory planning is incompatible with returning memrefs. In allocate mode,
the arena is freed at function exit, making returned memrefs invalid. In
arg mode, returning a memref from the input arena violates typical memory
ownership patterns.

When a function has memref return types, the pass:
1. Emits a clear error message
2. Fails gracefully without transforming the function
3. Preserves the original IR

This prevents silent bugs where returned memrefs would point to freed or
external memory.

Changes:
- Add return type validation at start of runOnOperation()
- Check all result types for MemRefType
- Emit descriptive error and signal pass failure
- Add test case verifying error emission