[mlir][bufferization] Add planning logic to static memory planner analysis #1
Draft
KrxGu wants to merge 10 commits into
…ic buffer planning

Adds an analysis-only pass that finds same-block memref.alloc/dealloc pairs eligible for static memory reuse. For each eligible alloc it computes a conservative alias-aware lifetime interval using BufferViewFlowAnalysis, collects size/alignment metadata, and emits op remarks. Ineligible allocs get a skip reason (dynamic shape, nested in loop/conditional, no unique same-block dealloc, escaping alias). Aggregate counts are tracked as pass statistics. No IR mutations.

Intended to run after the deallocation pipeline (ownership-based-buffer-deallocation + bufferization-lower-deallocations).

This is the first upstream step for a buffer reuse pass discussed in the MLIR Discourse thread (RFC: GSoC buffer reuse pass for non-overlapping allocations after lower-deallocations).

Adds six lit/FileCheck tests covering the main eligibility paths.

Signed-off-by: KrxGu <krishom70@gmail.com>
feba137 to f5111c1
Rewrote function-level and inline comments to be more concise. Added test 7 covering the cross-block alias / escaping skip path, which was previously exercised by the implementation but not tested.

Signed-off-by: KrxGu <krishom70@gmail.com>
6058453 to 9b33562
…e analysis

Stripped down the initial implementation to focus on fundamentals. The pass now just identifies eligible alloc/dealloc pairs in the same block and checks which ones could potentially reuse memory based on their order in the IR.

Removed pool grouping, offset assignment, and interval packing logic. That complexity belongs in a follow-up once the basic approach is solid.

Signed-off-by: KrxGu <krishom70@gmail.com>
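The order-based check described above can be illustrated with a minimal sketch in plain C++, independent of MLIR. It assumes each eligible pair is reduced to the positions of its alloc and dealloc within the block; the `Lifetime` struct and both function names are hypothetical and are not taken from the patch.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record: positions of the alloc and its dealloc in one block.
struct Lifetime {
  size_t allocPos;
  size_t deallocPos;
};

// Two buffers may share memory only if one is freed before the other is
// allocated, i.e. their [allocPos, deallocPos] intervals do not overlap.
inline bool lifetimesDisjoint(const Lifetime &a, const Lifetime &b) {
  return a.deallocPos < b.allocPos || b.deallocPos < a.allocPos;
}

// Returns every index pair (i, j), i < j, whose lifetimes are disjoint.
std::vector<std::pair<size_t, size_t>>
findReusePairs(const std::vector<Lifetime> &lifetimes) {
  std::vector<std::pair<size_t, size_t>> pairs;
  for (size_t i = 0; i < lifetimes.size(); ++i)
    for (size_t j = i + 1; j < lifetimes.size(); ++j)
      if (lifetimesDisjoint(lifetimes[i], lifetimes[j]))
        pairs.emplace_back(i, j);
  return pairs;
}
```

For lifetimes {0,3}, {1,5}, {4,7}, only the first and third are disjoint, so the sketch reports the single pair (0, 2). A real analysis would also have to extend each interval to cover aliases of the buffer, which this sketch ignores.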
9b33562 to 647b82d
…ny operation

Address review feedback: removed func::FuncOp restriction so the pass can run on any operation (e.g., gpu.launch, async.execute). Updated test to use pass-pipeline notation to explicitly schedule on func.func.

This makes the pass more flexible for users who want to apply static memory planning to other operation types.

Signed-off-by: KrxGu <krishom70@gmail.com>
…ry planner

This adds the complete transformation pass that converts multiple memref.alloc/dealloc pairs into a single arena with subviews. The offset assignment is intentionally simple (just sequential); this establishes the e2e pipeline so we can add smarter bin-packing later.

Tests verify arena sizing, sequential offsets, and that dynamic shapes or missing deallocations are correctly skipped.
Track alignment requirements from memref.alloc operations and ensure offsets are properly padded to meet alignment constraints. The arena allocation receives the maximum alignment of all transformed allocations.

Changes:
- Add alignment field to AllocationCandidate structure
- Compute sizes in bytes to handle alignment padding correctly
- Implement alignOffset() helper to pad offsets to alignment boundaries
- Set arena alignment attribute to maximum required alignment
- Add test demonstrating alignment padding with 64 and 128-byte requirements

This ensures correctness for SIMD and other alignment-sensitive operations.
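As an illustration of the padding step, here is a minimal sketch of what a helper like alignOffset() might look like. The signature is an assumption (the commit message names the helper but does not show it), and the sketch assumes alignments are nonzero powers of two.

```cpp
#include <cassert>
#include <cstdint>

// Rounds `offset` up to the next multiple of `alignment`.
// Assumption: `alignment` is a nonzero power of two.
inline uint64_t alignOffset(uint64_t offset, uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0 &&
         "alignment must be a nonzero power of two");
  // Adding (alignment - 1) and clearing the low bits rounds up.
  return (offset + alignment - 1) & ~(alignment - 1);
}
```

With this definition an offset of 100 padded to a 128-byte boundary becomes 128, while an offset that is already aligned (for example 64 with alignment 64) is returned unchanged.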
Separate memory planning logic from IR transformation by introducing trivialMemoryPlanner(), a pure function that computes buffer offsets without depending on MLIR operations.

Changes:
- Add Alloc structure for allocation-independent planning
- Implement trivialMemoryPlanner(arenaAlignment, allocs) -> offsets
- Refactor runOnOperation() to use the planning function
- Planning logic is now testable independently of MLIR

This architecture enables plugging in different allocation strategies (firstFit, bestFit) without modifying IR transformation code.
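A rough sketch of such a pure planning function follows. The names Alloc and trivialMemoryPlanner come from the commit message, but the field names, the exact types, and the `roundUp` helper are assumptions made for illustration, not the patch's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed shape of the planning record: size and alignment in bytes.
struct Alloc {
  uint64_t sizeBytes;
  uint64_t alignment; // assumed to be a nonzero power of two
};

// Hypothetical helper: round `value` up to a multiple of `alignment`.
static uint64_t roundUp(uint64_t value, uint64_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Places allocations one after another, padding each start offset to the
// allocation's alignment. Offsets are relative to the arena base, so each
// alignment must not exceed the alignment of the arena itself.
std::vector<uint64_t> trivialMemoryPlanner(uint64_t arenaAlignment,
                                           const std::vector<Alloc> &allocs) {
  std::vector<uint64_t> offsets;
  offsets.reserve(allocs.size());
  uint64_t cursor = 0;
  for (const Alloc &alloc : allocs) {
    assert(alloc.alignment <= arenaAlignment &&
           "arena must be at least as aligned as every allocation");
    cursor = roundUp(cursor, alloc.alignment);
    offsets.push_back(cursor);
    cursor += alloc.sizeBytes;
  }
  return offsets;
}
```

For three allocations of 100, 256, and 16 bytes with alignments 64, 128, and 8 in a 128-byte-aligned arena, the sketch yields offsets 0, 128, and 384. Because the function takes and returns plain data, a firstFit or bestFit strategy could be swapped in behind the same signature.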
Change the arena from typed (e.g., memref<Nxf32>) to a generic i8 byte buffer
(memref<Nxi8>). This allows a single arena to hold allocations of different
element types (f32, i64, i16, etc.).
Use memref.view to create typed views into the i8 arena at computed byte
offsets. This is the standard MLIR pattern for type-agnostic memory buffers.
Changes:
- Arena is now memref<totalSizexi8> instead of element-typed
- Use memref.view instead of memref.subview + reinterpret_cast
- Byte offsets passed directly to memref.view via arith.constant
- Update all tests to reflect i8 arena + view pattern
Example transformation:
Before: memref.alloc() : memref<1024xf32>
After: %arena = memref.alloc() : memref<4096xi8>
%c0 = arith.constant 0 : index
%view = memref.view %arena[%c0][] : memref<4096xi8> to memref<1024xf32>
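The 4096-byte arena in the example follows from 1024 elements of 4 bytes each. A minimal sketch of that size computation is shown below; the function name and the rule of rounding each element up to whole bytes are assumptions for illustration, not the pass's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the byte size of a statically shaped buffer.
// Assumption: each element occupies its bit width rounded up to whole bytes.
uint64_t staticSizeInBytes(const std::vector<int64_t> &shape,
                           unsigned elementBitWidth) {
  uint64_t bytesPerElement = (elementBitWidth + 7) / 8;
  uint64_t numElements = 1;
  for (int64_t dim : shape) {
    assert(dim >= 0 && "dynamic dimensions are not supported here");
    numElements *= static_cast<uint64_t>(dim);
  }
  return numElements * bytesPerElement;
}
```

Calling `staticSizeInBytes({1024}, 32)` gives 4096, matching the memref<1024xf32> example above.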
Add arena-mode pass option to control how the shared arena is obtained:
- 'allocate' (default): Creates arena via memref.alloc within the function
- 'arg': Uses function's first argument as the pre-allocated arena

The 'arg' mode is useful when the arena is pre-allocated externally and passed to the function, enabling use cases like pre-allocated scratch buffers or memory pools.

In 'arg' mode, the pass validates that:
1. The context is a function operation
2. The function has at least one argument
3. The first argument is memref<...xi8>

If validation fails, the pass emits an error and fails gracefully.

Changes:
- Add arena-mode option to Passes.td with default 'allocate'
- Update pass description to reflect transformation behavior
- Implement conditional arena acquisition based on mode
- Add tests for arg mode with error validation
Add validation to reject functions with memref return types, as static memory planning is incompatible with returning memrefs. In allocate mode, the arena is freed at function exit, making returned memrefs invalid. In arg mode, returning a memref from the input arena violates typical memory ownership patterns.

When a function has memref return types, the pass:
1. Emits a clear error message
2. Fails gracefully without transforming the function
3. Preserves the original IR

This prevents silent bugs where returned memrefs would point to freed or external memory.

Changes:
- Add return type validation at start of runOnOperation()
- Check all result types for MemRefType
- Emit descriptive error and signal pass failure
- Add test case verifying error emission
This PR extends the static memory planner analysis pass to compute actual memory layouts for eligible allocations.
What's implemented:
Test coverage:
Added test cases for Examples 10-14 from the project specification:
This is an analysis-only pass for same-block alloc/dealloc pairs. Future work will handle loops, conditionals, and the actual IR rewrite.
All tests pass with verify-diagnostics.