LLVM and SPIRV-LLVM-Translator pulldown (WW20 2026) #22020
Draft
iclsrc wants to merge 2088 commits into
…part 53) (#194772) Convert five tests to use new HLFIR lowering instead of legacy FIR lowering: Lower/allocatable-callee.f90, Lower/allocatable-caller.f90, Lower/assignment.f90, Lower/assumed-shape-caller.f90, Lower/Intrinsics/count.f90
…prevent unbounded path length growth. (#193691)

Ref #147220.

### Problem Description

Bazel's use of clang modules for its `layering_check` emits `extern module` declarations relative to some base path, meaning those paths usually include long sequences of `../` followed by the path to the module itself.

When parsing `extern module` in the module file, we (I believe intentionally) silently ignore missing module files. Currently, if the file existence check fails for any _other_ reason, that failure is silently ignored as well. This means that `-fmodules-strict-decluse`, which Bazel uses for the layering_check, can throw a spurious `err_undeclared_use_of_module` error, which is the problem reported in #147220.

Clang's `extern module` parsing concatenates these relative paths recursively, so the growth in those paths is unbounded. In this case the file existence check fails because the path name is too long (ENAMETOOLONG in POSIX).

In summary, there are possibly 2 underlying problems that contribute to #147220 that we could try to fix:

1. Silently ignoring unexpected errors (ENAMETOOLONG), meaning the ultimately reported error (undeclared use of module) doesn't really help the user understand what was wrong.
2. Unbounded path growth when recursively declaring `extern module`s in a chain.

I'm choosing to focus on (2) in this PR because both fixes seem useful, and (1) seems an intentional design choice.

### Implementation

Collapse `../` in relative `extern module` paths before loading those modules for parsing.
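The effect of collapsing `../` can be illustrated with a small, purely lexical sketch. This is not the Clang implementation: the function name `collapse_dotdot` is hypothetical, and the sketch assumes `/`-separated paths and ignores symlinks, which a purely textual collapse cannot account for.

```python
def collapse_dotdot(path: str) -> str:
    """Lexically remove '.' and 'name/..' pairs from a '/'-separated path.

    Hypothetical helper for illustration only; it never touches the
    file system, so symlinked directories are not resolved.
    """
    is_abs = path.startswith("/")
    out = []
    for seg in path.split("/"):
        if seg in ("", "."):
            continue  # skip empty and current-directory segments
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()          # cancel the preceding directory name
            elif not is_abs:
                out.append("..")   # leading '..' of a relative path is kept
            # for an absolute path, '..' at the root is dropped
            continue
        out.append(seg)
    joined = "/".join(out)
    if is_abs:
        return "/" + joined
    return joined or "."
```

With this kind of normalization, a chain of relative references such as `a/b/../../c/d/../../a/m.modulemap` shrinks back to `a/m.modulemap` instead of growing with every level of `extern module` nesting.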
This fixes 6617aac. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
This PR introduces an `erase` method to `ScopedHashTable`, designed to remove the most recent value associated with a given key within the scope stack. To support efficient deletion, the internal `ScopedHashTableVal` structure has been refactored into a doubly linked list, allowing the predecessor of a node to be identified in O(1) time during removal. Fix the MLIR CSE issue llvm/llvm-project#191135 (comment). Part of llvm/llvm-project#193778.
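A minimal sketch of why a doubly linked per-scope list makes removal O(1). This is an illustrative Python model, not the LLVM data structure: the class names, the `shadowed` field, and the per-scope sentinel node are all assumptions made for the example.

```python
class _Val:
    """One stored value; linked both per key and per scope."""
    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.shadowed = None        # older value for the same key, if any
        self.prev_in_scope = None   # predecessor in the scope's list
        self.next_in_scope = None   # successor in the scope's list


class ScopedTable:
    def __init__(self):
        self._top = {}      # key -> most recent _Val for that key
        self._scopes = []   # stack of sentinel nodes, one per open scope

    def push_scope(self):
        self._scopes.append(_Val())  # sentinel: every real node has a predecessor

    def insert(self, key, value):
        node = _Val(key, value)
        node.shadowed = self._top.get(key)
        head = self._scopes[-1]
        node.prev_in_scope, node.next_in_scope = head, head.next_in_scope
        if head.next_in_scope is not None:
            head.next_in_scope.prev_in_scope = node
        head.next_in_scope = node
        self._top[key] = node

    def lookup(self, key, default=None):
        node = self._top.get(key)
        return node.value if node is not None else default

    def _restore(self, node):
        # Make the shadowed value (if any) visible again.
        if node.shadowed is not None:
            self._top[node.key] = node.shadowed
        else:
            del self._top[node.key]

    def erase(self, key):
        """Remove the most recent value for `key`; O(1) thanks to prev_in_scope."""
        node = self._top.get(key)
        if node is None:
            return False
        self._restore(node)
        node.prev_in_scope.next_in_scope = node.next_in_scope
        if node.next_in_scope is not None:
            node.next_in_scope.prev_in_scope = node.prev_in_scope
        return True

    def pop_scope(self):
        node = self._scopes.pop().next_in_scope
        while node is not None:     # newest first, so shadowing unwinds in order
            self._restore(node)
            node = node.next_in_scope
```

With only a singly linked scope list, `erase` would have to walk the list from the scope head to find the predecessor of the node being removed; the back pointer avoids that walk.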
…n-reduced mask (#187076) Handles the case where the mask does not need to be trimmed, i.e. it's already equal to the reduced vector type, for `XferRead/WriteDropUnitDims` patterns. Signed-off-by: Ege Beysel <beysel@roofline.ai>
…asked stores. (#194689)
…94825) For the simplifyBinaryIntrinsic interface the `Call` argument passed in may be null, which differs from other interfaces such as simplifyIntrinsic and simplifyUnaryIntrinsic which require `Call` to be non-null. See FoldBinaryIntrinsic in InstSimplifyFolder.h where the `Call` argument has a default value of null. That means for all uses of `Call` in simplifyBinaryIntrinsic we must first check the pointer is not null to avoid an invalid dereference. This PR fixes the case for the get.active.lane.mask intrinsic. There isn't currently an easy way to test this fix because the only place I can see where FoldBinaryIntrinsic is called with a null `Call` is VPlanTransforms.cpp and we don't currently invoke the function for get.active.lane.mask intrinsics.
As mentioned at llvm/llvm-project#194239 (comment) : > Not related to your PR, but it looks like we're missing checks here for bool vectors and BitInt destination types. This patch adds the missing checks for bool vectors and BitInt types in the `ConstantEmitter::emitForMemory` function.
This change makes it possible to use YAML anchors [1], [2] with YAMLTraits. All of the necessary parser machinery already exists, so the only change that is necessary is to wire it up to YAMLTraits. This is done by keeping track of all `Anchor` -> `HNode *` mappings and reusing those when an `AliasNode` is encountered. In accordance with the spec [2], anchors do not have to be unique and refer to the last occurrence in the serialization.

Example usage:

```yaml
foo: &a 42
bar: *a
```

The above would be deserialized as:

```yaml
foo: 42
bar: 42
```

Note that aliases are a serialization detail and can be discarded during composition into a Representation Graph (`HNode` hierarchy).

[1]: https://yaml.org/spec/1.2.2/#692-node-anchors
[2]: https://yaml.org/spec/1.2.2/#3222-anchors-and-aliases
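The "last occurrence wins" rule for anchors can be sketched with a toy composer. This is not YAMLTraits code and does not parse YAML: the `compose` function and its tuple-based input format are assumptions chosen to keep the example self-contained.

```python
def compose(entries):
    """Resolve aliases in a flat mapping.

    `entries` is a list of (key, node) pairs, where node is either
    ("value", anchor_or_None, value) or ("alias", anchor_name).
    Hypothetical input format, for illustration only.
    """
    anchors = {}   # anchor name -> most recently anchored value
    result = {}
    for key, node in entries:
        if node[0] == "alias":
            name = node[1]
            if name not in anchors:
                raise KeyError(f"unknown anchor: {name}")
            result[key] = anchors[name]
        else:
            _, anchor, value = node
            if anchor is not None:
                anchors[anchor] = value  # a later definition overwrites an earlier one
            result[key] = value
    return result
```

Because the anchor map is simply overwritten on redefinition, an alias always sees the most recent anchored node that precedes it in the serialization, which matches the behavior the commit message describes.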
Summary: Right now it's a little difficult to use the multilibs support because the user must manually provide one. I believe that when the user configures multilibs with the LLVM CMake arguments, at a minimum we should provide one that forwards `-fmultilib-flag=<multilib>` to the created runtime. This PR makes CMake emit this by manually writing a flag. Because users could provide their own, this adds some extra complexity to prevent this from being overwritten. The desire for this change is to more easily ship this support in CMake configuration files without needing to write files manually (for the typical case).
### Summary

Part of llvm/llvm-project#185382. This is a follow-up to llvm/llvm-project#193658.

Lower zip1 and zip2 intrinsics in https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#zip-elements

All the intrinsics are handled inline in `llvm-project/build/lib/clang/23/include/arm_neon.h` like:

```
#ifdef __LITTLE_ENDIAN__
__ai __attribute__((target("neon"))) int8x8_t vzip1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  __ret = __builtin_shufflevector(__p0, __p1, 0, 8, 1, 9, 2, 10, 3, 11);
  return __ret;
}
#else
__ai __attribute__((target("neon"))) int8x8_t vzip1_s8(int8x8_t __p0, int8x8_t __p1) {
  int8x8_t __ret;
  int8x8_t __rev0; __rev0 = __builtin_shufflevector(__p0, __p0, __lane_reverse_64_8);
  int8x8_t __rev1; __rev1 = __builtin_shufflevector(__p1, __p1, __lane_reverse_64_8);
  __ret = __builtin_shufflevector(__rev0, __rev1, 0, 8, 1, 9, 2, 10, 3, 11);
  __ret = __builtin_shufflevector(__ret, __ret, __lane_reverse_64_8);
  return __ret;
}
#endif
```

So no additional special lowering logic is needed.
- Normalize the header syntax for ReleaseNotes (current `.md` file and `ReleaseNotesTemplate.txt`) to use `#`-based headings - Normalize indents to distinguish doc title from page headers Fixes navigation indents for Furo theme update (see llvm/llvm-project#184440).
The SetVector already ensures that there are no cycles in the collection.
…4790)

- Remove UNSUPPORTED: intelgpu from 12 passing tests:
  * mapping/data_member_ref.cpp
  * offloading/bug50022.cpp, info.c
  * offloading/target_critical_region.cpp, target_depend_nowait.cpp, target_nowait_target.cpp
  * offloading/strided_update/* (6 tests)
  * unified_shared_memory/close_member.c
- Change CUDA tests from XFAIL to UNSUPPORTED for Intel GPU:
  * offloading/CUDA/basic_launch.cu
  * offloading/CUDA/basic_launch_blocks_and_threads.cu
  * offloading/CUDA/basic_launch_multi_arg.cu
  * offloading/CUDA/launch_tu.cu
- Add Intel GPU configuration section to lit.cfg to disable USM tests by default
…#194610) This PR implements the refactorings discussed with @localspook in #193838 --------- Co-authored-by: Victor Chernyakin <chernyakin.victor.j@outlook.com>
z/OS has a table of mapped names in the IR. Counting the hits for just the name leads to one more hit than expected. Search for the name with the @ char to make sure the right occurrences are being counted.
…#194648) Fixes llvm/llvm-project#194596. When the function result symbol is encountered while the compiler is already completing the function result type, flang could recursively re-enter _CompleteFunctionResultType()_ and crash on invalid code. Instead of crashing on conflicting declarations, flang now reports an “already declared” error and stops further recursion.
Handle AVX-512 VGF2P8AFFINEQB rmbi instructions in X86MCInstLower. Unlike the existing rmi forms, rmbi uses a 64-bit broadcast memory operand, so the constant pool entry may only contain the broadcast source instead of a full-width vector constant. Print that constant repeated across the destination vector width when forming the asm comment. Related: llvm/llvm-project#194572
…tributes (#194726) Replace `getAsInteger()` parsing of the `patchable-function-entry` and `patchable-function-prefix` function attributes with the existing `Function::getFnAttributeAsParsedInteger()` helper across AsmPrinter and all backend targets. The IR verifier already validates these attributes as unsigned base-10 integers via `checkUnsignedBaseTenFuncAttr`, so parse failure at point of use indicates a verifier bypass or IR corruption. `getFnAttributeAsParsedInteger()` returns a default of 0 on failure (matching the implicit behavior of the old code) and emits a diagnostic rather than silently continuing.
Add operations that follow `float op(float, int)` pattern, mirroring the existing `spirv.GL.Ldexp` op
The constexpr functions in question take a scoped enum as an argument and a switch statement returns a value for each value of the enum. These are all legal statements in a constexpr function in C++14. Under constexpr rules, the evaluation of a constexpr function cannot lead to an evaluation of any prohibited forms of expressions. An evaluation of the functions being discussed with a valid argument will terminate at the switch, and any code that follows will not be evaluated. Using "llvm_unreachable" after the switch should be ok as long as the expansion of the llvm_unreachable macro does not contain any statements not allowed to appear in a constexpr function. At the same time, GCC before v9 did not tolerate any unguarded calls to non-constexpr functions after the switch. To avoid using "llvm_unreachable", which can have multiple expansions, use an assert with an explicit condition that the underlying value of the argument lies between the minimum and maximum values of the enum.
Pulled out of #194473 - update combineMinMaxReduction to fold to a ISD::VECREDUCE_SMAX/SMIN/UMAX/UMIN node and then perform the lowering later on. combineMinMaxReduction will go away once we can use shouldExpandReduction, rely on the middle-end to recognise reductions and not have to recreate them from the expanded patterns. I've added pre-SSE41 handling using vector unrolling - hopefully this will go away once #194672 is in place.
PR#194368 changed how line breaks are handled on Windows and broke several libcxx tests on Windows, including libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp. This patch addresses this issue.
### Summary part of llvm/llvm-project#185382 lower part of intrinsics in : https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#zip-elements Lower NEON::BI__builtin_neon_vzip_v and NEON::BI__builtin_neon_vzipq_v in CIRGenBuiltinAArch64.cpp by porting the existing incubator logic (`clangir/clang/lib/CIR/CodeGen/CIRGenBuiltinAArch64.cpp`) onto ClangIR: two bitcasts on the input vectors, two rounds of cir.vec.shuffle generating the low/high interleave patterns, each stored through a ptr_stride of the sret base pointer. ### Test - test_vzip_mf8 - test_vzipq_mf8 I found that these two intrinsics are defined in `llvm-project/clang/test/CodeGen/AArch64/fp8-intrinsics/acle_neon_fp8_untyped.c`, but this file seems to be a test suite specifically for the `mfloat8` type, so I did not remove their original test cases. Some of the new CHECK lines additionally match a pair of bitcasts before the shuffle; this shape comes from arm_neon.h's inline wrappers, which re-cast typed vectors (e.g. <4 x i16>) through <8 x i8> before calling __builtin_neon_vzip_v. Variants whose element type is already i8 (s8/u8/p8/mf8) skip that round-trip and therefore have no bitcasts in the check lines.
Cache root entry and SLPCostThreshold queries once, group !ForReduction-only checks under two blocks, extract a shared benign-node predicate from the two duplicated lambdas, and skip HasSingleLoad and allConstant work when results are dead. Reviewers: Pull Request: llvm/llvm-project#194895
…els (#194754) As it turns out, even if a `ProcResGroup` consists of in-order pipes, as long as its (the group's) BufferSize is not zero, Machine Scheduler will not use in-order scheduling on instructions that consume it. Since BufferSize also defaults to -1 for `ProcResGroup`, we have been scheduling the resource consumption of SiFive7's `PipeAB` (scalar pipes) and `VA1OrVA2` (vector pipes) in an out-of-order fashion! Co-authored-by: Min Hsu <min.hsu@sifive.com>
… scalar remainder (#190258)

Add two new loop metadata attributes (`llvm.loop.vectorize.body` and `llvm.loop.vectorize.epilogue`) that the loop vectorizer sets on the generated vector loop and epilogue loop respectively. The metadata is only emitted when optimization remarks are enabled (`ORE->enabled()`), so it has zero cost in normal compilation.

These enable downstream passes (LoopUnroll, WarnMissedTransforms) to produce more precise optimization remarks. Instead of the generic "loop not unrolled" warning on a source line that was vectorized, the unroller can now report:

- **"vectorized loop"** for the main vector body
- **"epilogue loop"** for the scalar epilogue/remainder
- **"epilogue vectorized loop"** for an epilogue that was itself vectorized during epilogue vectorization (carries both attributes)

A shared `getLoopVectorizeKindPrefix()` helper in `LoopUtils.h`/`LoopUtils.cpp` reads the metadata and returns the appropriate prefix string, used by both `LoopUnroll.cpp` and `WarnMissedTransforms.cpp`. The metadata emission in `VPlan.cpp` uses `Loop::addIntLoopAttribute` from the NFC PR #194676.

Two end-to-end tests exercise the full `loop-vectorize → loop-unroll` pipeline with forced epilogue vectorization (`-enable-epilogue-vectorization -epilogue-vectorization-force-VF=4`) to produce all four loop categories from a single vectorizable function. Each test also includes a plain (non-vectorized) function to cover the baseline "loop" case. Both tests verify stderr diagnostic output and YAML structured remarks.

- **`LoopUnroll/vectorizer-loop-kind-remarks.ll`** checks for successful-unroll remarks.
- **`LoopTransformWarning/vectorizer-loop-kind-unroll-warning.ll`** checks for failed-unroll warnings.

AI Disclaimer: this patch was generated with assistance of GitHub Copilot/Claude Opus and reviewed by a human.
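The mapping from the two metadata attributes to a remark prefix can be sketched as a small decision function. This is a Python model of the behavior described above, not the C++ helper: the function name `loop_kind_prefix` and the representation of loop metadata as a set of attribute names are assumptions, and the exact strings returned by `getLoopVectorizeKindPrefix()` may differ.

```python
BODY_ATTR = "llvm.loop.vectorize.body"
EPILOGUE_ATTR = "llvm.loop.vectorize.epilogue"


def loop_kind_prefix(attrs):
    """Pick a remark prefix from the set of loop metadata attribute names.

    Hypothetical model: `attrs` is any container of attribute-name strings.
    """
    is_body = BODY_ATTR in attrs
    is_epilogue = EPILOGUE_ATTR in attrs
    if is_body and is_epilogue:
        return "epilogue vectorized loop"  # epilogue that was itself vectorized
    if is_body:
        return "vectorized loop"           # main vector body
    if is_epilogue:
        return "epilogue loop"             # scalar epilogue/remainder
    return "loop"                          # plain, non-vectorized loop
```

Checking the combined case first matters, since a vectorized epilogue carries both attributes and would otherwise be reported as the main vector body.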
…overage (#194009) Fixes #193500
SIFixSGPRCopies was incorrectly handling inline assembly operands with
SGPR ("s") constraints when the value came from a memory load (which
produces a VGPR). The pass failed to insert the necessary
v_readfirstlane instruction and instead passed the VGPR value directly.
Example:
asm sideeffect buffer_load_dwordx4 $0, $1, $2, 0 =v,v,s,n
Previously it generated:
buffer_load_dwordx4 v[0:3], v0, v[8:11] (but sgpr is expected), 0 offen
The fix adds readfirstlanes during lowering when there is a copy from
a divergent register to an SGPR.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
…le (#194398) This PR factors out the Flang/Fortran-only options from Options.td into a separate file (FlangOptions.td). Assisted-by: codex
…part 54) (#194774) Convert five tests to use new HLFIR lowering instead of legacy FIR lowering: Lower/Intrinsics/c_f_pointer.f90, Lower/Intrinsics/c_loc.f90, Lower/default-initialization-globals.f90, Lower/cray-pointer.f90, Lower/loops.f90
The new AppleClang is only available on macOS 26, so we need to update both.
…it (#194893) This replaces some SFINAE and function overloading with `if _LIBCPP_CONSTEXPR` to simplify the code a bit.
`DW_OP_addr_sect_offset4` is not a real DWARF opcode; it was a proprietary LLDB proposal that was never adopted (and has no llvm::dwarf constant). The same shared-library sliding problem is handled today by evaluating DW_OP_addr as a FileAddress and converting via Value::ConvertToLoadAddress.
The parallel DWARF linker deduplicates types across compile units using a shared TypePool. When multiple CUs define the same type, allocateTypeDie uses compare_exchange_strong to race for setting the canonical DIE. The first thread to succeed stores the DIE and clones its attributes, while subsequent threads use it as the canonical one. Which thread wins depends on OS thread scheduling, making the output non-deterministic. This PR fixes the non-determinism by assigning each CompileUnit a priority based on its position in the link order (object file index, CU index within the file). When a CU wants to mark its DIE as canonical, it acquires the spinlock, and only stores its DIE if its priority is strictly lower than that of the current canonical DIE. This ensures that the canonical DIE always comes from the lowest-priority (i.e. first) CU that defines that type. The replaced DIE is leaked into the bump allocator, and the existing DebugTypeDeclFilePatch and accelerator record filters skip the orphaned DIEs via getFinalDie() checks. This PR also removes the AllowNonDeterministicOutput option, which was never set in the first place, and is now obsolete.
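The priority rule can be modeled with a short sketch that shows why the result no longer depends on thread scheduling. This is a Python illustration, not the DWARF linker code: `TypeEntry`, `offer`, and `link_in_parallel` are hypothetical names, a `threading.Lock` stands in for the spinlock, and a replaced DIE is simply dropped rather than leaked into an allocator.

```python
import random
import threading


class TypeEntry:
    """Holds the canonical DIE for one deduplicated type (illustrative model)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.die = None
        self.priority = None  # (object file index, CU index within the file)

    def offer(self, die, priority):
        """Install `die` only if its CU comes strictly earlier in link order."""
        with self._lock:
            if self.priority is None or priority < self.priority:
                self.die, self.priority = die, priority
                return True
            return False


def link_in_parallel(num_files, cus_per_file, seed):
    """Offer one DIE per CU from threads started in a shuffled order."""
    entry = TypeEntry()
    offers = [((f, c), f"die-from-file{f}-cu{c}")
              for f in range(num_files) for c in range(cus_per_file)]
    random.Random(seed).shuffle(offers)  # vary which thread gets there first
    threads = [threading.Thread(target=entry.offer, args=(die, prio))
               for prio, die in offers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return entry.priority, entry.die
```

Whatever order the threads run in, the winner is the minimum of the offered priorities, so the canonical DIE is a function of link order alone. A first-writer-wins scheme has no such property.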
CONFLICT (content): Merge conflict in llvm/lib/SYCLLowerIR/CMakeLists.txt
…95131) When debugging PExpect tests, the 60 second timeout can make that process rather tedious. For TestStatusline, I used a class variable to easily override it while iterating but the idea is applicable more generally.
…nts" (#195135) Reverts llvm/llvm-project#190607 Causes crashes, e.g. https://lab.llvm.org/buildbot/#/builders/10/builds/27641
CONFLICT (content): Merge conflict in llvm/lib/Passes/PassBuilderPipelines.cpp
CONFLICT (content): Merge conflict in clang/include/clang/Options/Options.td
CONFLICT (content): Merge conflict in clang/lib/CodeGen/TargetInfo.cpp CONFLICT (content): Merge conflict in clang/lib/Sema/SemaSYCL.cpp
XFAIL the upstream test which assumes upstream driver layout (libLLVMSYCL.so, per-target-runtime-dir, single-level include path, clang-linker-wrapper) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…36498) With de82b47, UseAllocaASForSrets is causing breakage in Clang::OpenMP/amdgcn_sret_ctor.cpp. CMPLRLLVM-75138 We should follow up to revisit UseAllocaASForSrets in CMPLRLLVM-75358
Fix ast-attr-add-ir-attributes-misc.cpp test that broke after commit fb02433 which introduced ExplicitInstantiationDecl AST node. The commit added a new ExplicitInstantiationDecl node to preserve source information for explicit template instantiations. Updated the test CHECK lines to account for this new node appearing in the AST dump between the explicit instantiation statement and the partial specialization declaration. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Fixes: CMPLRLLVM-75129 Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
…types w/ de82b47 de82b47 ([Clang] Fix sret AS for non-trivial-copy returns) introduced calls to CXXRecordDecl::hasTrivialCopyConstructor() and similar methods without checking that the record has a definition. This mirrors a guard that QualType::isTriviallyCopyableType() already has internally (via isIncompleteType()), which the old code relied on. The crash manifests when compiling SYCL device code for nvptx64 where builtins like __builtin_intel_sycl_alloca return sycl::multi_ptr, a class template whose specialization is declared but never fully instantiated on unsupported device targets (no code path actually uses its members — the builtin generates raw IR directly). Guard the trivial-copy queries with RD->hasDefinition(). If the record has no definition, conservatively set CanAggregateCopy to false, which is safe — the old isTriviallyCopyableType path returned false for incomplete types too. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LLVM: llvm/llvm-project@bc325ec
SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@bd774ef4