Clamp Function for f32 and f64#100556
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
|
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon. Please see the contribution instructions for more information. |
|
These are very susceptible to getting worse for different implementations, since they were defined in the RFC very intentionally to be able to be implemented as But I think we don't have a test for that right now. So could you please add a https://github.com/rust-lang/rust/tree/master/src/test/assembly test for it? I think this should work: // Floating-point clamp is designed to be implementable as max+min,
// so check to make sure that's what it's actually emitting.
// assembly-output: emit-asm
// compile-flags: -O -C "llvm-args=-x86-asm-syntax=intel"
// only-x86_64
// CHECK-LABEL: clamp_demo:
#[no_mangle]
pub fn clamp_demo(a: f32, x: f32, y: f32) -> f32 {
// CHECK: maxss
// CHECK: minss
a.clamp(x, y)
}
// CHECK-LABEL: clamp12_demo:
#[no_mangle]
pub fn clamp12_demo(a: f32) -> f32 {
// CHECK-NEXT: movss xmm1
// CHECK-NEXT: maxss xmm1, xmm0
// CHECK-NEXT: movss xmm0
// CHECK-NEXT: minss xmm0, xmm1
// CHECK-NEXT: ret
a.clamp(1.0, 2.0)
}Then if the new implementation of the method still passes (or produces similarly good ASM), then I'd be happy to use it! @rustbot author |
|
I checked the codegen in godbolt pub fn clamp_old(this: f32, min: f32, max: f32) -> f32 {
assert!(min <= max);
let mut x = this;
if x < min {
x = min;
}
if x > max {
x = max;
}
x
}
pub fn clamp_new(this: f32, min: f32, max: f32) -> f32 {
assert!(min <= max);
if this <= min {
return min;
}
if this > max {
return max;
}
this
}example::clamp_old:
ucomiss xmm2, xmm1
jb .LBB0_2
maxss xmm1, xmm0
minss xmm2, xmm1
movaps xmm0, xmm2
ret
.LBB0_2:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_2]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2
example::clamp_new:
ucomiss xmm2, xmm1
jb .LBB1_2
minss xmm2, xmm0
cmpnless xmm0, xmm1
andps xmm2, xmm0
andnps xmm0, xmm1
orps xmm0, xmm2
ret
.LBB1_2:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_3]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2 |
Thanks, this is what I was concerned about. For example, on Zen2 https://www.agner.org/optimize/instruction_tables.pdf lists Targeting that architecture specifically doesn't fix it either: https://godbolt.org/z/ET5sEff8q. That can (In general, down at the single instruction level, trying to branch to skip things is often slower than just running them.) So I think you could update this to use a |
|
(closing and immediately reopening to try to re-kick the PR build, since only 1 of the three jobs ran.) |
|
@scottmcm Thank you for your advice! I attempted to change the clamp function using example::clamp_old:
ucomiss xmm2, xmm1
jb .LBB0_2
maxss xmm1, xmm0
minss xmm2, xmm1
movaps xmm0, xmm2
ret
.LBB0_2:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_2]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2
example::clamp_new:
ucomiss xmm2, xmm1
jb .LBB1_5
ucomiss xmm1, xmm0
jae .LBB1_4
ucomiss xmm0, xmm2
movaps xmm1, xmm0
jbe .LBB1_4
movaps xmm1, xmm2
.LBB1_4:
movaps xmm0, xmm1
ret
.LBB1_5:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_3]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2This new clamp also has slightly different functionality than the original one, since I think it mutates the original value? and returns a float, as opposed to just returning a new value. fn clamp(mut self, min: f32, max: f32) -> f32 {
assert!(min <= max);
if self <= min {
self = min;
} else if self > max {
self = max;
}
self
} |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
scottmcm
left a comment
There was a problem hiding this comment.
Hopefully just the one more change and things'll be good 🤞
If you could also squash all the commits, I'd appreciate it.
Simple Clamp Function I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me. Simple Clamp Function for f64 I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me. Floating point clamp test f32 clamp using mut self f64 clamp using mut self Update library/core/src/num/f32.rs Update f64.rs Update x86_64-floating-point-clamp.rs Update src/test/assembly/x86_64-floating-point-clamp.rs Update x86_64-floating-point-clamp.rs Co-Authored-By: scottmcm <scottmcm@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Co-authored-by: scottmcm <scottmcm@users.noreply.github.com>
|
Huzzah! The PR build finally stopped resisting. Thanks for your persistence here -- it's great to get an assembly test for this to record the work from the RFC in a way that's actually checked. @bors r+ |
|
@bors r- failed in a rollup #100664 (comment) |
|
@bors r+ |
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
…iaskrgr Rollup of 11 pull requests Successful merges: - rust-lang#100556 (Clamp Function for f32 and f64) - rust-lang#100663 (Make slice::reverse const) - rust-lang#100697 ( Minor syntax and formatting update to doc comment on `find_vtable_types_for_unsizing`) - rust-lang#100760 (update test for LLVM change) - rust-lang#100761 (some general mir typeck cleanup) - rust-lang#100775 (rustdoc: Merge source code pages HTML elements together v2) - rust-lang#100813 (Add `/build-rust-analyzer/` to .gitignore) - rust-lang#100821 (Make some docs nicer wrt pointer offsets) - rust-lang#100822 (Replace most uses of `pointer::offset` with `add` and `sub`) - rust-lang#100839 (Make doc for stdin field of process consistent) - rust-lang#100842 (Add diagnostics lints to `rustc_transmute` module (zero diags)) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.
If there was a reason for binding
selftoxor if this code is incorrect, please correct me :)