Trying to get `core::future::Ready<T>` handling to be more optimal (this is useful when supplying an async API that is smart enough, at compile time, to avoid the extra overhead if the implementation isn't actually async).
Originally posted by @gmarcosb in #62958
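To make that concrete, the shape of API I have in mind is roughly the following (a hypothetical sketch with illustrative names, not code from the actual commit): a trait exposes its operations through an associated future type, so a synchronous backend can plug in `core::future::Ready<T>`, and a caller that knows this at compile time should ideally pay no state-machine overhead.

```rust
// Hypothetical "maybe-async" API; the names here are illustrative only.
use core::future::{ready, Future, Ready};

pub trait Storage {
    // An associated future type lets synchronous backends use Ready<T>.
    type ReadFut: Future<Output = u32>;
    fn read(&self) -> Self::ReadFut;
}

pub struct SyncBackend;

impl Storage for SyncBackend {
    // Purely synchronous backend: the returned "future" is already complete.
    type ReadFut = Ready<u32>;
    fn read(&self) -> Self::ReadFut {
        ready(42)
    }
}
```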
I tried to optimize with the following tricks in the hope that it would work (see full usage in the commit); unfortunately, it seems the Rust compiler (even in this straightforward case) still generates all the "cruft" for the state machine:
```rust
#[inline(always)]
pub const fn extract_ready_check<const B: bool>(_: ReadyCheck<B>) -> bool {
    B
}

#[macro_export]
macro_rules! process_maybe_async {
    ($source:expr) => {
        match $source {
            fut => {
                #[allow(unused_imports)]
                // use $crate::dm::{MaybeReady, IsReady, IsNotReady, NotReadyFallback};
                use $crate::dm::{MaybeReady, IsReady, IsNotReady};
                let is_ready: bool = $crate::dm::extract_ready_check((&&fut).get_check());
                // even with check for true vs is_ready, there's still binary size bloat!
                if true {
                    fut.get_ready()
                } else {
                    fut.await
                }
            }
        }
    };
}

pub struct ReadyCheck<const B: bool>;

pub trait MaybeReady: core::future::Future {
    fn get_ready(self) -> Self::Output;
}

// 1. The General Case: Any Future
impl<F: core::future::Future> MaybeReady for F {
    default fn get_ready(self) -> Self::Output {
        const {
            panic!("This future is not a Ready<T> type!");
        }
    }
}

// 2. The Specialized Case: Specifically Ready<T>
impl<T> MaybeReady for core::future::Ready<T> {
    fn get_ready(self) -> T {
        self.into_inner()
    }
}

pub trait IsReady<T> {
    #[inline(always)]
    fn get_check(&self) -> ReadyCheck<true> { ReadyCheck }
}
impl<T> IsReady<T> for &&core::future::Ready<T> {}

pub trait IsNotReady<T> {
    fn get_check(&self) -> ReadyCheck<false> { ReadyCheck }
}
impl<T, F: core::future::Future<Output = T>> IsNotReady<T> for &F {}
```
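For anyone unfamiliar with the pattern: `IsReady`/`IsNotReady` is the usual autoref-based specialization trick, where method resolution on `&&fut` finds the `&&Ready<T>` impl for `Ready` futures and falls back to the blanket `&F` impl for everything else (the `default fn` in the blanket `MaybeReady` impl additionally relies on the unstable specialization feature). A minimal sketch of the intended dispatch, assuming the definitions above are in scope (`demo` and the local names are just illustrative):

```rust
fn demo() {
    // A Ready future: `&&Ready<u32>` matches the IsReady impl directly,
    // so get_check() yields ReadyCheck::<true>.
    let ready_fut = core::future::ready(42u32);
    let _check: ReadyCheck<true> = (&&ready_fut).get_check();

    // Any other future: only the blanket `&F: IsNotReady` impl applies
    // (after one deref step), so get_check() yields ReadyCheck::<false>.
    let pending_fut = core::future::pending::<u32>();
    let _check: ReadyCheck<false> = (&&pending_fut).get_check();
}
```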
I then have:
```rust
#[inline(always)]
fn fn_1(o: &SomeObject) -> core::future::Ready<String> {
    core::future::ready(o.do_stuff())
}

#[inline(always)]
fn fn_2(o: &SomeObject, s: String) -> core::future::Ready<String> {
    core::future::ready(o.do_other(s))
}
```
I would expect A:
```rust
async fn my_example_a(o: &SomeObject) -> String {
    let s = process_maybe_async!(fn_1(o));
    process_maybe_async!(fn_2(o, s))
}
```
To result in the exact same binary as B:
```rust
async fn my_example_b(o: &SomeObject) -> String {
    let s = o.do_stuff();
    o.do_other(s)
}
```
But it doesn't; instead, it results in roughly the same binary bloat as C (more bloat, surprisingly):
```rust
async fn my_example_c(o: &SomeObject) -> String {
    let s = fn_1(o).await;
    fn_2(o, s).await
}
```
See the full compiling code in the playground and the generated ASM, where the binary delta between the three methods is clear.
Of course it would be nice if A, B, and C all resulted in exactly the same compilation; but at a minimum, with all the hints given, A and B should result in the same binary, preferably with `if is_ready {` as well, not just `if true {`.
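For reference, after expanding the macro (with the branch hard-coded to `true`), example A is roughly equivalent to the hand-written approximation below; the `fut.await` arms are statically dead, yet presumably they still count as await points when the async state machine is laid out. This sketch assumes the items above are in scope and is not rustc's literal expansion:

```rust
// Hand-expanded approximation of my_example_a (imports and the unused
// `is_ready` binding omitted for brevity).
async fn my_example_a_expanded(o: &SomeObject) -> String {
    let s = match fn_1(o) {
        fut => {
            // Dead branch, but still an .await expression in the async body.
            if true { fut.get_ready() } else { fut.await }
        }
    };
    match fn_2(o, s) {
        fut => {
            if true { fut.get_ready() } else { fut.await }
        }
    }
}
```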
I've also brought this up on the internals forum: https://internals.rust-lang.org/t/async-await-optimizations-could-make-language-even-more-powerful/23973