Conversation
Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
- Add F128 support to TypeTree Kind enum - Implement TypeTree FFI bindings and conversion functions - Add typetree.rs module for metadata attachment to LLVM functions - Integrate TypeTree generation with autodiff intrinsic pipeline - Support scalar types: f32, f64, integers, f16, f128 - Attach enzyme_type attributes as LLVM string metadata for Enzyme Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
- Fix nott-flag test to emit LLVM IR and check enzyme_type attributes - Replace TODO comments with actual TypeTree metadata verification - Test that NoTT flag properly disables TypeTree generation - Test that TypeTree enabled generates proper enzyme_type attributes Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
- Add specific tests for f32, f64, i32, f16, f128 TypeTree generation - Verify correct enzyme_type metadata for each scalar type - Ensure TypeTree metadata matches expected Enzyme format Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to existing SSE2 implementation for x86, this version: - Processes 16-byte chunks at a time using LSX vector intrinsics. - Quickly identifies newlines in ASCII-only chunks. - Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion.
Current implementation uses bin_name to check if it exists, but it should use tool_root_dir/tool_bin_dir/bin_name instead. Otherwise the check fails every time, hence the function falls back to install the binary.
…n, r=joboet std::net: update tcp deferaccept delay type to Duration. See comment [here](rust-lang#119639 (comment)).
…iler-errors Allow `&raw [mut | const]` for union field in safe code fixes rust-lang#141264 r? ``@Veykril`` Unresolved questions: - [x] Any edge cases? - [x] How this works with rust-analyzer (because all I've did is prevent compiler from emitting error in `&raw` context) (rust-lang/rust-analyzer#19867) - [x] Should we allow `addr_of!` and `addr_of_mut!` as well? In current version they both (`&raw` and `addr_of!`) are allowed (They are the same) - [x] Is chain of union fields is a safe? (Yes)
TypeTree support in autodiff
# TypeTrees for Autodiff
## What are TypeTrees?
Memory layout descriptors for Enzyme. Tell Enzyme exactly how types are structured in memory so it can compute derivatives efficiently.
## Structure
```rust
TypeTree(Vec<Type>)
Type {
offset: isize, // byte offset (-1 = everywhere)
size: usize, // size in bytes
kind: Kind, // Float, Integer, Pointer, etc.
child: TypeTree // nested structure
}
```
## Example: `fn compute(x: &f32, data: &[f32]) -> f32`
**Input 0: `x: &f32`**
```rust
TypeTree(vec![Type {
offset: -1, size: 8, kind: Pointer,
child: TypeTree(vec![Type {
offset: -1, size: 4, kind: Float,
child: TypeTree::new()
}])
}])
```
**Input 1: `data: &[f32]`**
```rust
TypeTree(vec![Type {
offset: -1, size: 8, kind: Pointer,
child: TypeTree(vec![Type {
offset: -1, size: 4, kind: Float, // -1 = all elements
child: TypeTree::new()
}])
}])
```
**Output: `f32`**
```rust
TypeTree(vec![Type {
offset: -1, size: 4, kind: Float,
child: TypeTree::new()
}])
```
## Why Needed?
- Enzyme can't deduce complex type layouts from LLVM IR
- Prevents slow memory pattern analysis
- Enables correct derivative computation for nested structures
- Tells Enzyme which bytes are differentiable vs metadata
## What Enzyme Does With This Information:
Without TypeTrees (current state):
```llvm
; Enzyme sees generic LLVM IR:
define float ``@distance(ptr*`` %p1, ptr* %p2) {
; Has to guess what these pointers point to
; Slow analysis of all memory operations
; May miss optimization opportunities
}
```
With TypeTrees (our implementation):
```llvm
define "enzyme_type"="{[]:Float@float}" float ``@distance(``
ptr "enzyme_type"="{[]:Pointer}" %p1,
ptr "enzyme_type"="{[]:Pointer}" %p2
) {
; Enzyme knows exact type layout
; Can generate efficient derivative code directly
}
```
# TypeTrees - Offset and -1 Explained
## Type Structure
```rust
Type {
offset: isize, // WHERE this type starts
size: usize, // HOW BIG this type is
kind: Kind, // WHAT KIND of data (Float, Int, Pointer)
child: TypeTree // WHAT'S INSIDE (for pointers/containers)
}
```
## Offset Values
### Regular Offset (0, 4, 8, etc.)
**Specific byte position within a structure**
```rust
struct Point {
x: f32, // offset 0, size 4
y: f32, // offset 4, size 4
id: i32, // offset 8, size 4
}
```
TypeTree for `&Point` (internal representation):
```rust
TypeTree(vec![
Type { offset: 0, size: 4, kind: Float }, // x at byte 0
Type { offset: 4, size: 4, kind: Float }, // y at byte 4
Type { offset: 8, size: 4, kind: Integer } // id at byte 8
])
```
Generates LLVM:
```llvm
"enzyme_type"="{[]:Float@float}"
```
### Offset -1 (Special: "Everywhere")
**Means "this pattern repeats for ALL elements"**
#### Example 1: Array `[f32; 100]`
```rust
TypeTree(vec![Type {
offset: -1, // ALL positions
size: 4, // each f32 is 4 bytes
kind: Float, // every element is float
}])
```
Instead of listing 100 separate Types with offsets `0,4,8,12...396`
#### Example 2: Slice `&[i32]`
```rust
// Pointer to slice data
TypeTree(vec![Type {
offset: -1, size: 8, kind: Pointer,
child: TypeTree(vec![Type {
offset: -1, // ALL slice elements
size: 4, // each i32 is 4 bytes
kind: Integer
}])
}])
```
#### Example 3: Mixed Structure
```rust
struct Container {
header: i64, // offset 0
data: [f32; 1000], // offset 8, but elements use -1
}
```
```rust
TypeTree(vec![
Type { offset: 0, size: 8, kind: Integer }, // header
Type { offset: 8, size: 4000, kind: Pointer,
child: TypeTree(vec![Type {
offset: -1, size: 4, kind: Float // ALL array elements
}])
}
])
```
… r=Mark-Simulacrum Allow shared access to `Exclusive<T>` when `T: Sync` Addresses libs-api request in rust-lang#98407 (comment). Adds the following trait impls to `Exclusive<T>`, all bounded on `T: Sync`: - `AsRef<T>` - `Clone` - `Copy` - `PartialEq` - `StructuralPartialEq` - `Eq` - `Hash` - `PartialOrd` - `Ord` - `Fn` ``@rustbot`` label T-libs-api
Reland "Add LSX accelerated implementation for source file analysis" This patch introduces an LSX-optimized version of `analyze_source_file` for the `loongarch64` target. Similar to existing SSE2 implementation for x86, this version: - Processes 16-byte chunks at a time using LSX vector intrinsics. - Quickly identifies newlines in ASCII-only chunks. - Falls back to the generic implementation when multi-byte UTF-8 characters are detected or in the tail portion. Reland rust-lang#145963 r? ``@lqd``
Fix --extra-checks=spellcheck to prevent cargo install every time Fixes rust-lang#147105 ## Background Current implementation of `ensure_version_of_cargo_install` uses `bin_name` to check if it exists, but it should use `<tool_root_dir>/<tool_bin_dir>/<bin_name>` instead. Otherwise the check fails every time, hence the function falls back to install the binary. ## Change Move lines which define bin_path at the top of the function, and use bin_path for the check
|
@bors r+ rollup=never p=5 |
|
☀️ Test successful - checks-actions |
|
📌 Perf builds for each rolled up PR:
previous master: 8d72d3e1e9 In the case of a perf regression, run the following command for each PR you suspect might be the cause: |
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 8d72d3e (parent) -> c8905ea (this PR) Test differencesShow 79 test diffsStage 1
Stage 2
Additionally, 36 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard c8905eaa66e0c35a33626e974b9ce6955c739b5b --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (c8905ea): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 0.5%, secondary 1.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 2.6%, secondary -2.7%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 469.737s -> 470.874s (0.24%) |
Successful merges:
&raw [mut | const]for union field in safe code #141469 (Allow&raw [mut | const]for union field in safe code)Exclusive<T>whenT: Sync#146675 (Allow shared access toExclusive<T>whenT: Sync)r? @ghost
@rustbot modify labels: rollup
Create a similar rollup