.await "blowing up" state machine unnecessarily #152141

@gmarcosb

Description

Trying to get core::future::Ready<> handling to be more optimal (this is useful when supplying an async API that is smart enough, at compile time, to avoid the extra overhead when the API isn't actually async).

Originally posted by @gmarcosb in #62958
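To make the setup concrete, here is a hypothetical shape of such an API (the DataSource/SyncSource names and the read method are my own illustration, not from the issue): an associated future type lets a purely synchronous backend return core::future::Ready, which is the case the macro below tries to collapse away at compile time.

// Hypothetical API shape (illustration only, not from the issue): the associated
// future type lets a synchronous implementation hand back core::future::Ready<T>,
// so in principle no state machine is needed at the call site.
pub trait DataSource {
    type ReadFut: core::future::Future<Output = String>;
    fn read(&self) -> Self::ReadFut;
}

pub struct SyncSource;

impl DataSource for SyncSource {
    // The "future" is already resolved; awaiting it should be free.
    type ReadFut = core::future::Ready<String>;
    fn read(&self) -> Self::ReadFut {
        core::future::ready("hello".to_string())
    }
}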

I tried to optimize with the following tricks in the hope that they would work (see full usage in the commit); unfortunately, even in this straightforward case, the Rust compiler still generates all the "cruft" for the state machine:

#[inline(always)]
pub const fn extract_ready_check<const B: bool>(_: ReadyCheck<B>) -> bool {
    B
}

#[macro_export]
macro_rules! process_maybe_async {
    ($source:expr) => {
        match $source {
            fut => {
                #[allow(unused_imports)]
                // use $crate::dm::{MaybeReady, IsReady, IsNotReady, NotReadyFallback};
                use $crate::dm::{MaybeReady, IsReady, IsNotReady};
                let is_ready : bool = $crate::dm::extract_ready_check((&&fut).get_check());
                // even with check for true vs is_ready, there's still binary size bloat!
                if true {
                    fut.get_ready()
                } else {
                    fut.await
                }
            }
        }
    };
}
pub struct ReadyCheck<const B: bool>;

pub trait MaybeReady : core::future::Future {
    fn get_ready(self) -> Self::Output;
}

// 1. The General Case: Any Future
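// NOTE: `default fn` below relies on the unstable `specialization` feature,
// so the crate needs #![feature(specialization)] on nightly.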
impl<F: core::future::Future> MaybeReady for F {
    default fn get_ready(self) -> Self::Output {
        const {
            panic!("This future is not a Ready<T> type!");
        }
    }
}

// 2. The Specialized Case: Specifically Ready<T>
impl<T> MaybeReady for core::future::Ready<T> {
    fn get_ready(self) -> T {
        self.into_inner()
    }
}

pub trait IsReady<T> {
    #[inline(always)]
    fn get_check(&self) -> ReadyCheck<true> { ReadyCheck }
}
impl<T> IsReady<T> for &&core::future::Ready<T> {}

pub trait IsNotReady<T> {
    fn get_check(&self) -> ReadyCheck<false> { ReadyCheck }
}
impl<T, F: core::future::Future<Output = T>> IsNotReady<T> for &F {}
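
For reference, here is a minimal sketch of my own (assuming the ReadyCheck, extract_ready_check, IsReady and IsNotReady items above are in scope) showing how the autoref trick is meant to resolve at compile time; core::future::pending stands in for an arbitrary non-Ready future:

// Sketch only: demonstrates which impl the autoref trick selects.
fn autoref_demo() {
    let r = core::future::ready(5_i32);
    let p = core::future::pending::<i32>();

    // &&Ready<T> matches the IsReady impl directly -> ReadyCheck<true>.
    let a = extract_ready_check((&&r).get_check());

    // &&Pending<T> only matches IsNotReady after one deref (&F where F: Future)
    // -> ReadyCheck<false>.
    let b = extract_ready_check((&&p).get_check());

    assert!(a);
    assert!(!b);
}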

I then have:

#[inline(always)]
fn fn_1(o : &SomeObject) -> core::future::Ready<String> {
    core::future::ready(o.do_stuff())
}

#[inline(always)]
fn fn_2(o : &SomeObject, s: String) -> core::future::Ready<String> {
    core::future::ready(o.do_other(s))
}
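
(For completeness, a minimal SomeObject stub of my own so the snippets compile; its exact body is irrelevant to the codegen question.)

// Stand-in for SomeObject (assumption; not part of the original code).
pub struct SomeObject;

impl SomeObject {
    fn do_stuff(&self) -> String {
        "stuff".to_string()
    }
    fn do_other(&self, s: String) -> String {
        s + " and more"
    }
}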

I would expect A:

async fn my_example_a(o : &SomeObject) -> String {
    let s = process_maybe_async!(fn_1(o));
    process_maybe_async!(fn_2(o, s))
}

To result in the exact same binary as B:

async fn my_example_b(o : &SomeObject) -> String {
    let s = o.do_stuff();
    o.do_other(s)
}

But it doesn't; instead, it results in roughly the same binary bloat as C (more bloat, surprisingly):

async fn my_example_c(o : &SomeObject) -> String {
    let s = fn_1(o).await;
    fn_2(o, s).await
}

See the full compiling code in the playground ASM output, where the binary delta between the three methods is clear.

Of course it would be nice if A, B, and C all resulted in the exact same compilation; but at a minimum, with all the hints given, A and B should result in the same binary, preferably with "if is_ready {" as well rather than just "if true {".

I've also brought this up on the internals Discourse: https://internals.rust-lang.org/t/async-await-optimizations-could-make-language-even-more-powerful/23973

Labels: A-async-await (Area: Async & Await), C-bug (Category: This is a bug), I-heavy (Issue: Problems and improvements with respect to binary size of generated code), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue)
