Eric Holk

A Rose By Any Other Name

2024-04-19T00:00:00+00:00

There are currently two competing designs for async iteration traits for Rust. The first is poll_next. The second is async fn next. I see strengths to each design. The poll_next design seems stronger on technical concerns, such as performance and ease of implementation. The async fn next design seems better from an ergonomics and consistency perspective. Unfortunately, the process of resolving this debate has been slow going.

One thing I've realized in the debate is that at this point the two designs are more similar than they may seem at first glance. In this post I'd like to show how a handful of minor tweaks to each design results in something that is the same modulo names.

A Concrete Performance Difference🔗

But first, we're going to go on a bit of a diversion about the performance of the two designs.

One of the early arguments in favor of poll_next was that it was more efficient. In most cases, we expect the compiler to be able to optimize away any of the overhead that async fn next might introduce, but it seems better, all else being equal, to pick the design that doesn't require as much work from the compiler. That said, we discovered one case where the compiler cannot optimize away the overhead of async fn next, and that has to do with using a dyn AsyncIterator object.

To see the difference, let's consider pseudo-code for a for await loop for the two designs. We'll also desugar the .await calls so we can see where the calls to poll happen.

Let's consider a simple function that receives a dyn AsyncIterator and iterates over it:

async fn sum(it: &Box<dyn AsyncIterator<Item = i32>>) -> i32 {
    let mut total = 0;
    for await i in it {
        total += i;
    }
    total
}

Using the poll_next design, this desugars to something like the following. I've left out the context parameter to poll_next for simplicity.

async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 {
    let mut total = 0;
    let it = Box::into_pin(it);
    loop {
        match it.poll_next() {  // poll_next call is indirect
            Poll::Ready(Some(i)) => total += i,
            Poll::Ready(None) => break,
            Poll::Pending => yield Poll::Pending,
        }
    }
    total
}
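The desugaring above uses a yield that stable Rust does not have, but its shape can be approximated today. The following is a minimal sketch, assuming a hypothetical PollNext trait in place of AsyncIterator, a hypothetical UpTo iterator, and a tiny busy-polling block_on helper; std::future::poll_fn stands in for yield Poll::Pending.

```rust
use std::future::{poll_fn, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for the `poll_next` flavor of `AsyncIterator`.
trait PollNext {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

// Yields `next..=end`, returning `Pending` before every answer so that the
// `Pending` arm in `sum` is exercised.
struct UpTo {
    next: i32,
    end: i32,
    ready: bool,
}

impl PollNext for UpTo {
    type Item = i32;
    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<i32>> {
        if !self.ready {
            self.ready = true;
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        self.ready = false;
        if self.next > self.end {
            return Poll::Ready(None);
        }
        let item = self.next;
        self.next += 1;
        Poll::Ready(Some(item))
    }
}

async fn sum(it: Box<dyn PollNext<Item = i32>>) -> i32 {
    let mut total = 0;
    let mut it = Box::into_pin(it);
    // `poll_fn` plays the role of `yield Poll::Pending` in the pseudo-code.
    poll_fn(move |cx| loop {
        // The only dynamically dispatched call in the loop.
        match it.as_mut().poll_next(cx) {
            Poll::Ready(Some(i)) => total += i,
            Poll::Ready(None) => return Poll::Ready(total),
            Poll::Pending => return Poll::Pending,
        }
    })
    .await
}

// Minimal busy-polling executor, just enough to drive the example.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Calling block_on(sum(Box::new(UpTo { next: 1, end: 4, ready: false }))) drives the loop through both the Ready and Pending arms.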

Using the async fn next design, this desugars to something like the code below. Note that in this version AsyncIterator is not object safe, so we're assuming we have support for it using something like dyn*.

async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 {
    let mut total = 0;
    loop {
        let f: Pin<&mut dyn* Future<Output = Option<i32>>> = pin!(it.next()); // next call is indirect
        let next = loop {
            match f.poll() { // poll call is indirect
                Poll::Ready(next) => break next,
                Poll::Pending => yield Poll::Pending,
            }
        };
        match next {
            Some(i) => total += i,
            None => break,
        }
    }
    total
}

The async fn next version has two indirect calls in its desugaring. First we need to make an indirect call to next to get the future that we then poll to get the actual next item. The way async in trait objects works means that the resulting future will itself be a trait object, which means calling poll on that future will be an indirect call too.
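The two indirect calls can be seen on stable Rust by emulating a dyn-friendly async fn next with a boxed future. This is only a sketch: the BoxedNext trait, the BoxFuture alias, UpTo, and block_on are hypothetical names, and boxing is a stand-in for dyn*, so it also adds an allocation per item that the dyn* design is meant to avoid.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

// Hypothetical object-safe approximation of `async fn next(&mut self)`.
trait BoxedNext {
    fn next(&mut self) -> BoxFuture<'_, Option<i32>>;
}

struct UpTo {
    next: i32,
    end: i32,
}

impl BoxedNext for UpTo {
    fn next(&mut self) -> BoxFuture<'_, Option<i32>> {
        Box::pin(async move {
            if self.next > self.end {
                return None;
            }
            let item = self.next;
            self.next += 1;
            Some(item)
        })
    }
}

async fn sum(mut it: Box<dyn BoxedNext>) -> i32 {
    let mut total = 0;
    // Per item: one indirect call to `next` (through the `dyn BoxedNext`
    // vtable), then at least one indirect call to `poll` (through the
    // `dyn Future` vtable) made by `.await`.
    while let Some(i) = it.next().await {
        total += i;
    }
    total
}

// Minimal busy-polling executor, just enough to drive the example.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```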

It's possible these calls don't matter. Indirect calls are normally only relevant in CPU-bound workloads, while async is most often used for IO-bound workloads. In wg-async, we've talked a lot about these extra indirections, but I think it's mostly because we can objectively count how many indirect calls there are, while we can't objectively measure things like design elegance or ergonomics.

There will be cases where this overhead does matter though, so it is desirable to not bake these calls in at the language level. Can we tweak the design of async fn next to remove most of this overhead?

Reducing the `async fn next` overhead🔗

The key idea here is to change the semantics of the future returned by async fn next so that it's able to be polled again after completion. When polled after completion, it essentially calls next again but reuses the future. This is, in fact, how the Next future from the futures crate works. It wraps poll_next, so the future has the same semantics as poll_next.
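To make the idea concrete, here is a small sketch of a future that wraps poll_next and can be polled again after it completes. It is loosely modeled on the shape of Next described above, not copied from the futures crate; PollNext, Counter, and the local Next struct are hypothetical names, and the iterator is required to be Unpin to keep the code free of unsafe.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for the `poll_next` flavor of `AsyncIterator`.
trait PollNext {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

struct Counter {
    next: i32,
    end: i32,
}

impl PollNext for Counter {
    type Item = i32;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<i32>> {
        if self.next > self.end {
            return Poll::Ready(None);
        }
        let item = self.next;
        self.next += 1;
        Poll::Ready(Some(item))
    }
}

// A future that forwards every `poll` to `poll_next`, so polling it again
// after it returns `Ready` simply asks for the following item.
struct Next<'a, S: ?Sized> {
    iter: &'a mut S,
}

impl<S: PollNext + Unpin + ?Sized> Future for Next<'_, S> {
    type Output = Option<S::Item>;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        Pin::new(&mut *self.iter).poll_next(cx)
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Polls one and the same `Next` future three times and records the results.
fn poll_same_future_three_times() -> Vec<Option<i32>> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut iter = Counter { next: 1, end: 2 };
    let mut fut = Next { iter: &mut iter };
    let mut seen = Vec::new();
    for _ in 0..3 {
        if let Poll::Ready(item) = Pin::new(&mut fut).poll(&mut cx) {
            seen.push(item);
        }
    }
    seen
}
```

Here the single future produces both items and then the final None, which is the multi-completion behavior the optimization relies on.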

If we make this semantic change, then we can desugar the async fn next version to something like below.

async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 {
    let mut total = 0;
    let mut f: Pin<&mut dyn* Future<Output = Option<i32>>> = pin!(it.next()); // next call is indirect
    loop {
        let next = loop {
            match f.poll() { // poll call is indirect
                Poll::Ready(next) => break next,
                Poll::Pending => yield Poll::Pending,
            }
        };
        match next {
            Some(i) => total += i,
            None => break,
        }
        // the next time around the loop we'll use `f` again
    }
    total
}

We still have two distinct indirect calls, but the first one is only called once at the start of the loop rather than inside the loop for each iteration.

This code is now more convoluted than it needs to be. We can instead refactor it to only have one loop and one match expression:

async fn sum(it: Box<dyn AsyncIterator<Item = i32>>) -> i32 {
    let mut total = 0;
    let mut f: Pin<&mut dyn* Future<Output = Option<i32>>> = pin!(it.next()); // next call is indirect
    loop {
        match f.poll() {  // poll call is indirect
            Poll::Ready(Some(i)) => total += i,
            Poll::Ready(None) => break,
            Poll::Pending => yield Poll::Pending,
        }
    }
    total
}

This version looks suspiciously like the poll_next desugaring.¹

Aside: This Doesn't Actually Work🔗

This optimization sounds good, but it doesn't actually work. I want to briefly touch on why but then let's just pretend we didn't notice this because I'd like to consider the rest of the argument.

The problem is that async fns only complete once. We might redefine the semantics of Future to allow multi-completion futures, but unless we also change async fn, the futures produced by async functions will still be single-completion. The key benefit of the async fn next design is that users can hand-roll their own async iterators by writing async fn next, and that this should be simpler than writing two state machines at once with a hand-written poll_next. We could add a modified version of async fn that lets you write a multi-completion future, but if we do this I'm pretty sure we'll end up with async gen fn with some different names. Or we could add a way to create a multi-completion future from an async closure, but if we make users go through all of this, we've given up the simplicity benefits of being able to just write async fn next.
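The single-completion behavior is easy to observe. In the sketch below (the names one and poll_twice are made up for illustration), the first poll of an async fn's future returns Ready, and with current rustc a second poll of the same future panics rather than producing another value. The Future trait itself leaves post-completion behavior unspecified, so the panic is a property of compiler-generated futures, not a general guarantee.

```rust
use std::future::Future;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

async fn one() -> i32 {
    1
}

// Polls the future returned by `one` twice. Returns the first result and
// whether the second poll panicked.
fn poll_twice() -> (Poll<i32>, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(one());
    let first = fut.as_mut().poll(&mut cx);
    // The future has already completed, so this poll does not start over.
    let second = catch_unwind(AssertUnwindSafe(|| fut.as_mut().poll(&mut cx)));
    (first, second.is_err())
}
```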

Anyway, let's ignore this for now and continue on.

Iterator Setup🔗

So with the performance optimization, things are looking very similar. Both versions start with a little bit of setup code. The poll_next version needs to pin the iterator since poll_next takes a pinned argument. The async fn next version needs to call next once to get the future to poll, and then pins that as well. Then both versions go into a loop calling either poll_next, or a poll function that we've redefined to have the same semantics as poll_next.

The poll_next version does not have an analog to the next call in the async fn next version. I think this does exist, it's just not shown here. At some point, you need to set up your iterator state. You can't just call poll_next on a Vec, partly because the Vec would have to be pinned and that would be inconvenient, and also because it's not ideal to keep your iteration state attached to the Vec.

So let's imagine we have some operation which sets up the state needed to run an async iterator. In our poll_next examples, we've been assuming the iterator is set up before the call to sum. In other words, it's called as sum(my_vec.async_iter()) and not sum(my_vec).

What if, instead, we created some generic way to make an async iterator for some collection and moved that into the sum function? We could call this trait IntoAsyncIterator², and it would look something like:

trait IntoAsyncIterator {
    type Item;
    type AsyncIter: AsyncIterator<Item = Self::Item>;
    fn into_async_iter(self) -> Self::AsyncIter;
}

Then we could write the poll_next version of sum as:

async fn sum(it: Box<dyn IntoAsyncIterator<Item = i32>>) -> i32 {
    let mut total = 0;
    let mut f: Pin<&mut dyn* AsyncIterator<Item = i32>> = pin!(it.into_async_iter()); // into_async_iter call is indirect
    loop {
        match f.poll_next() {  // poll_next call is indirect
            Poll::Ready(Some(i)) => total += i,
            Poll::Ready(None) => break,
            Poll::Pending => yield Poll::Pending,
        }
    }
    total
}

Now we basically have the same code as the async fn next version. There's a subtle difference, since async fn next takes &mut self while into_async_iter takes self, but we could design the "initialize async iterator state" operation to take &mut self if we wanted. Also, I should point out that methods that take self by value cannot be called through a trait object, so as designed, we would not be able to use a dyn IntoAsyncIterator this way.
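With static dispatch the setup-then-loop shape can be written on stable Rust. The sketch below uses hypothetical names throughout (PollNext, IntoPollNext, VecIter, block_on) and sidesteps the trait-object question by making sum generic; it keeps the iteration state in a separate VecIter value so the Vec itself never has to be pinned.

```rust
use std::future::{poll_fn, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for `AsyncIterator` and `IntoAsyncIterator`.
trait PollNext {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

trait IntoPollNext {
    type Item;
    type Iter: PollNext<Item = Self::Item>;
    fn into_poll_next(self) -> Self::Iter;
}

// Iteration state lives here, not on the `Vec`.
struct VecIter {
    items: std::vec::IntoIter<i32>,
}

impl PollNext for VecIter {
    type Item = i32;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<i32>> {
        Poll::Ready(self.items.next())
    }
}

impl IntoPollNext for Vec<i32> {
    type Item = i32;
    type Iter = VecIter;
    fn into_poll_next(self) -> VecIter {
        VecIter { items: self.into_iter() }
    }
}

async fn sum<C: IntoPollNext<Item = i32>>(source: C) -> i32 {
    let mut total = 0;
    // Setup phase: runs once, like `IntoIterator::into_iter` in a sync `for`.
    let mut it = pin!(source.into_poll_next());
    // Loop phase: repeatedly advance the iterator state.
    poll_fn(move |cx| loop {
        match it.as_mut().poll_next(cx) {
            Poll::Ready(Some(i)) => total += i,
            Poll::Ready(None) => return Poll::Ready(total),
            Poll::Pending => return Poll::Pending,
        }
    })
    .await
}

// Minimal busy-polling executor, just enough to drive the example.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

With this in place the caller writes block_on(sum(vec![1, 2, 3])) and never constructs the iterator state by hand.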

What's in a name?🔗

If we make the change to async fn next to allow the future to be polled again after completion and we add some kind of IntoAsyncIterator, then the two designs are basically equivalent. In both of them, the for await loop desugars into a setup phase (similar to how synchronous for loops call IntoIterator), and then there is a loop that calls a method to advance the state of the iterator.

If we make these two changes, then IntoAsyncIterator in the poll_next design is essentially what AsyncIterator is in the async fn next design. They are both the way to initialize iterator state. Then the AsyncIterator trait in the poll_next design is equivalent to the modified Future semantics in the async fn next design.³

So to me it seems that if we apply the multi-completion optimization to async fn next then the two designs have the same shape and most of the debate is around the boundaries and names of various subcomponents.

Conclusion🔗

I wrote the first draft of this post about three months ago and then hesitated to publish it because in writing it I discovered some fatal flaws in the argument. However, I've shared the draft privately with a few people and we've been referring to it often in discussions so I felt like it was worth putting it out in public. Others have come up with similar arguments to this post so I consider this post to be my particular take on the argument rather than the definitive statement.

At the beginning I had hoped to say that with the optimizations we've added to the async fn next design, the two designs are essentially the same. The implication then would be that we should stop arguing about names, pick something, and stabilize it.

Having written the post, I think we instead see an irreconcilable difference in the designs. While the multi-completion future optimization does lead to a design that's equivalent to poll_next, it gives up the key characteristic of the async fn next design, which is the ability to hand-write iterators with async fn next. It's not "here's a small tweak and now the designs are the same," it's "here's a small tweak and now async fn next is a completely different design."

So in conclusion, the two designs are not secretly the same and we are going to have to actually make a decision on which one we want to have.

In fact, I copied the poll_next desugaring and then made a few changes to get this version.↩

IntoAsyncIterator may not be the best name, because it implies it consumes its source. We might want a non-destructive way to iterate over something asynchronously.↩

If we go this route, I'd suggest having AsyncIterator::next return a subtrait of Future called something like IterableFuture or MultiCompletionFuture to highlight multi-completion semantics. This is similar to how FusedFuture adds additional semantics to the underlying Future trait.↩

Async Cancellation and Panic

2024-03-06T00:00:00+00:00

When I last wrote about async cancellation in Rust, I touched briefly on the question of how cancellation interacts with panic. Mostly I left it as an exercise for the reader and left a rough sketch for how I thought it would work. More recently, Boats touched on the issue in a little more detail, but I think there are still a lot of open questions. In this post, I'd like to experiment with unwinding using my cancellation prototype and build on some of the previous work in this area.

It's not as easy as I thought🔗

In the sketch I laid out before, I expected the core idea of supporting cancellation during unwinding would be to have the executor, and any mini-executors like race and join, wrap calls to poll with catch_unwind, then in the Err case, call poll_cancel to completion and then call resume_unwind. In pseudo-code, that would look something like:

loop {
    match catch_unwind(|| task.poll(cx)) {
        Ok(Poll::Ready(x)) => return x,
        Ok(Poll::Pending) => continue,
        Err(panic) => {
            while let Poll::Pending = task.poll_cancel(cx) {}
            resume_unwind(panic);
        }
    }
}

Unfortunately this doesn't work. It turns out I had some inkling this might be the case when I wrote:

There are other challenges though. One is that the poll_cancel functions will need to be written to be aware of the fact that they might be called during unwinding, which means the internal state for the future might be inconsistent.

To understand what's wrong, recall that I desugared cancellation-aware async blocks into coroutines. Rust coroutines only have one entry point, which is the resume method. I simulated two entry points (poll and poll_cancel) by passing another argument into resume. The thing is, once resume panics, coroutines cannot be resumed again and they will panic if you try. Since poll and poll_cancel are backed by the same resume method, this means we can't call poll_cancel after poll panics.
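The futures that async fn produces on stable Rust show the same poisoning behavior, which gives a quick way to see the problem without the prototype. In this sketch (boom and poll_after_panic are made-up names), the first poll panics, and with current rustc polling the same future again panics too instead of resuming any cleanup work.

```rust
use std::future::Future;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

async fn boom() -> i32 {
    panic!("boom")
}

// Returns whether the first and the second poll of `boom()` panicked.
fn poll_after_panic() -> (bool, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(boom());
    // The body panics during the first poll.
    let first = catch_unwind(AssertUnwindSafe(|| fut.as_mut().poll(&mut cx))).is_err();
    // The state machine is now poisoned, so it cannot be entered again.
    let second = catch_unwind(AssertUnwindSafe(|| fut.as_mut().poll(&mut cx))).is_err();
    (first, second)
}
```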

Some of this is an artifact of the way this experiment is structured. If we had proper compiler support for multiple entry points to a coroutine, we might be able to make this work. But I think it's more composable and more in line with existing precedent to follow a rule where all unwinding or cancellation work needs to finish before a panic leaves the poll call.

An approach that actually works🔗

This realization that we need to process any cancellations before unwinding out of poll felt constraining at first, but it actually simplifies a lot of the design. I thought we'd need to wrap basically every call to poll in catch_unwind, but in most cases this is unnecessary and we can instead let the usual unwinding machinery proceed as normal. The places where we do care are when we know of multiple futures and if one of them panics we need to cancel the rest.

Let's do on_cancel as an example. While I don't think on_cancel would be a great API to support in production, it is useful to focus on the specifics of cancellation behavior.

In the last post, I was thinking of on_cancel almost as an approximation of an exception handler. For our purposes today, I think it's more useful to think of it as a kind of future combinator. In this view, on_cancel produces a new future from two others, one that is the normal execution path, and another future that is run only when the future is cancelled.¹

Looking at it this way, we can see what we should do when the poll function on the main future panics. We aren't allowed to poll the future that's panicking anymore, because its internal state might be inconsistent. We have to trust that as poll was unwinding, the future ran any cancellation handlers that were on the stack. But, since we want cancel-on-unwind semantics, the on_cancel combinator needs to catch the panic, run the cancellation future to completion, and then resume unwinding.

Deriving the implementation🔗

Now let's see how to add cancellation on panic behavior to our existing on_cancel implementation. My last post didn't really go into the details on this, so let's start with a rough sketch of the previous on_cancel implementation. Throughout this section I'm going to ignore details like pinning and unsafe so we can focus on the main idea. I have a complete working implementation of the ideas in this section available at https://github.com/eholk/explicit-async-cancellation.

The on_cancel method returns a future that carries a cancellation handler. While the details are hidden in the surface API, the struct and future implementation returned look like this:

struct OnCancel<F, H> {
    future: F,
    on_cancel: Option<H>,
}

impl<F, H> Future for OnCancel<F, H>
where
    F: Future,
    H: Future<Output = ()>,
{
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        self.future.poll(cx)
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        // run the cancellation handler if it's still present
        if let Some(on_cancel) = self.on_cancel {
            match on_cancel.poll(cx) {
                // if cancellation is complete, clear the handler so we won't try to run it again
                Poll::Ready(()) => self.on_cancel = None,
                // cancellation is not finished, so yield to the caller.
                Poll::Pending => return Poll::Pending,
            }
        }

        // run any cancellation handlers on the inner future
        self.future.poll_cancel(cx)
    }
}

The poll function is pretty uninteresting. We just forward it to the inner future. The poll_cancel function is a little more subtle. The main thing we need to do is run the cancellation handler, which we do by calling poll on it. However, the inner future might also have nested cancellation handlers, so we need to call poll_cancel on it as well. This is also why I chose to wrap the cancellation hook in an Option, since I can use that as a flag to indicate whether the cancellation hook is finished.

As an aside, I chose to do outside-in cancellation semantics here since drop also runs outside-in. I'm not sure this was the right choice. For example, unwinding is inside-out instead. I think it's worth thinking harder about what the right ordering is, but for now it's easy to change and independent of our focus today.

Okay, so now that we have a basic on_cancel implementation, let's handle what happens if the call to the nested future's poll panics. In short, we need to wrap the call to poll in catch_unwind.

fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    match catch_unwind(|| self.future.poll(cx)) {
        Ok(poll) => poll,
        Err(panic) => todo!("run the cancellation hook and then resume unwinding"),
    }
}

Now let's think about the Err case. Basically, we need to cancel ourselves, which we can do by calling poll_cancel. Then we need to resume unwinding. Because poll_cancel might take several tries to finish, we need to save the panic information so we can resume unwinding after it's done. So we'll add another field to OnCancel to optionally store the panic information.

struct OnCancel<F, H> {
    future: F,
    on_cancel: Option<H>,
    panic: Option<Box<dyn Any + Send + 'static>>,
}

impl<F, H> Future for OnCancel<F, H>
where
    F: Future,
    H: Future<Output = ()>,
{
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        match catch_unwind(|| self.future.poll(cx)) {
            Ok(poll) => poll,
            Err(panic) => {
                self.panic = Some(panic);
                match self.poll_cancel(cx) {
                    Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
                    Poll::Pending => Poll::Pending,
                }
            },
        }
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        todo!("we'll come back to this in a minute")
    }
}

We're part of the way there, but we still have some problems. Assuming poll_cancel were correct (it's not, but we'll get there), we'd be okay if cancellation finished promptly. But if not, it will return Pending, which we'll bubble up to the caller. The caller doesn't know we're panicking, since we've hidden the panic information away in our panic field, so it will eventually call poll on us again. Unfortunately, this means we'll poll the inner future, which we've previously said is not allowed. So we need to make a small change to check if we're in the process of panicking when we're polled.

fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    if self.panic.is_some() {
        match self.poll_cancel(cx) {
            Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
            Poll::Pending => return Poll::Pending,
        }
    }

    match catch_unwind(|| self.future.poll(cx)) {
        Ok(poll) => poll,
        Err(panic) => {
            self.panic = Some(panic);
            match self.poll_cancel(cx) {
                Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
                Poll::Pending => Poll::Pending,
            }
        },
    }
}

And now we're all set. If we're polled when there's panic information present then we never get to the call to self.future.poll(cx).

Now it's time to revisit poll_cancel. To share some logic, I had the panic path in poll call into poll_cancel, but this means we need to update poll_cancel to recognize that it can be called while panicking. Here's how:

fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
    // run the cancellation handler if it's still present (this part stays the same)
    if let Some(on_cancel) = self.on_cancel {
        match on_cancel.poll(cx) {
            // if cancellation is complete, clear the handler so we won't try to run it again
            Poll::Ready(()) => self.on_cancel = None,
            // cancellation is not finished, so yield to the caller.
            Poll::Pending => return Poll::Pending,
        }
    }

    // if we aren't panicking, run any cancellation handlers on the inner future
    // otherwise, resume unwinding
    match self.panic {
        None => self.future.poll_cancel(cx),
        Some(_) => resume_unwind(self.panic.take().unwrap()),
    }
}

The first part, where we run the cancellation hook, stays the same as before. In the second part, we would normally cancel the inner future, but remember that if we are panicking we aren't allowed to poll it again.

It's worth asking what we should do in the Some line though. At this point we know we are in the process of unwinding, and all cleanup code has finished. One option is to return Poll::Ready(()) here, and if we're called from poll then we could count on it calling resume_unwind. However, it could also be that while we were waiting on the cancellation to finish, the executor decided to cancel us. In this case, if we returned Poll::Ready(()) then we would swallow the exception. So instead, the right answer is to resume_unwind here as well.

So there we have it: how to cancel a future when polling it panics.
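The pieces above can be assembled into something that compiles on stable Rust. The following is a sketch under several assumptions, not the prototype from the repository: CancelFuture is a hypothetical trait standing in for a Future with poll_cancel, it takes &mut self so pinning can be ignored, the handler must be Unpin, and Panics, Cleanup, and demo exist only for illustration. The two poll paths from the text are merged into one by checking for a stored panic first.

```rust
use std::any::Any;
use std::cell::Cell;
use std::future::Future;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for a `Future` trait that also has `poll_cancel`.
trait CancelFuture {
    type Output;
    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<Self::Output>;
    fn poll_cancel(&mut self, cx: &mut Context<'_>) -> Poll<()>;
}

struct OnCancel<F, H> {
    future: F,
    on_cancel: Option<H>,
    panic: Option<Box<dyn Any + Send + 'static>>,
}

impl<F: CancelFuture, H: Future<Output = ()> + Unpin> CancelFuture for OnCancel<F, H> {
    type Output = F::Output;

    fn poll(&mut self, cx: &mut Context<'_>) -> Poll<F::Output> {
        if self.panic.is_none() {
            match catch_unwind(AssertUnwindSafe(|| self.future.poll(cx))) {
                Ok(poll) => return poll,
                Err(panic) => self.panic = Some(panic), // stash it for later
            }
        }
        // We are panicking: never poll the inner future again.
        match self.poll_cancel(cx) {
            Poll::Pending => Poll::Pending,
            Poll::Ready(()) => unreachable!("poll_cancel unwinds once a panic is stored"),
        }
    }

    fn poll_cancel(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        // Run the cancellation handler if it is still present.
        if let Some(handler) = self.on_cancel.as_mut() {
            match Pin::new(handler).poll(cx) {
                Poll::Ready(()) => self.on_cancel = None,
                Poll::Pending => return Poll::Pending,
            }
        }
        // Cancel the inner future, or resume unwinding if it panicked.
        match self.panic.take() {
            None => self.future.poll_cancel(cx),
            Some(panic) => resume_unwind(panic),
        }
    }
}

// Inner future whose `poll` always panics.
struct Panics;
impl CancelFuture for Panics {
    type Output = i32;
    fn poll(&mut self, _cx: &mut Context<'_>) -> Poll<i32> { panic!("inner future panicked") }
    fn poll_cancel(&mut self, _cx: &mut Context<'_>) -> Poll<()> { Poll::Ready(()) }
}

// Cancellation handler that suspends once before finishing.
struct Cleanup { polls: u32, done: Rc<Cell<bool>> }
impl Future for Cleanup {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        self.polls += 1;
        if self.polls < 2 { return Poll::Pending; }
        self.done.set(true);
        Poll::Ready(())
    }
}

struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }

// Returns (first poll suspended, second poll unwound, handler finished).
fn demo() -> (bool, bool, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let done = Rc::new(Cell::new(false));
    let handler = Cleanup { polls: 0, done: done.clone() };
    let mut fut = OnCancel { future: Panics, on_cancel: Some(handler), panic: None };
    // First poll: the inner panic is caught and the handler suspends.
    let first = catch_unwind(AssertUnwindSafe(|| fut.poll(&mut cx)));
    let suspended = matches!(first, Ok(Poll::Pending));
    // Second poll: the handler finishes, then the stored panic resumes.
    let second = catch_unwind(AssertUnwindSafe(|| fut.poll(&mut cx)));
    (suspended, second.is_err(), done.get())
}
```

The demo shows the panic being held back across a suspension point: the caller sees Pending first, and only sees the unwind after the cleanup future has run to completion.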

Should we do this?🔗

We've shown that it's at least somewhat possible to support async cleanup code while unwinding. I'll admit, beyond a basic smoke test, I haven't really probed the limits of this design. For example, what happens if we panic while running the cancellation handler as a result of another panic? Or what actually happens if the executor cancels us while we are cleaning up before resuming a panic? If we were to RFC something like this, these are all questions that we'd need to explore.

The reason I decided to go ahead and write this post without answering those questions is that in this post I think we've already learned enough that we can start evaluating this design and inform future options.

First of all, something about suspending while in the process of unwinding just feels fundamentally weird and uncomfortable. That said, I think we can develop a reasonable semantics for this behavior if we decide we want it.

But this also leads to a shortcoming that I'm not sure how to resolve. This prototype cannot work in #[no_std] environments, because catch_unwind and resume_unwind represent panic information as a Box<dyn Any + Send>, meaning we need an allocator. This is a non-starter for something that we'd want to consider building in as a core Rust language feature. The whole async/await system has been carefully designed not to need an allocator, and we need to preserve this property. After all, async/await has found a lot of success in microcontroller environments!

Is this necessary though? Or is it an artifact of trying to prototype a system purely in library code without compiler support? As an analogy, we could imagine prototyping destructors using catch_unwind, but rustc is able to generate code to run destructors during unwinding without needing to reify the exception.

Unfortunately I don't think we can avoid the issue in the same way. The problem is that normal unwinding doesn't suspend the execution at all, while we very much need to be able to do that to await in the unwinding path. This means the exception does need to be stored somewhere (presumably with the future), and we need to be able to resume unwinding later. If you're using a work-stealing executor, this means it's even possible that your task could start unwinding on one thread and finish on another. So we need somewhere to store the exception that's not ephemeral in the way that it is during the Rust-generated unwind code.
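The need for durable storage shows up even without async. As a rough sketch using plain threads in place of a work-stealing executor (the function name is made up), the payload that catch_unwind returns is a heap-allocated Box that is Send, so it can be carried to another thread and unwinding can be resumed there:

```rust
use std::any::Any;
use std::panic::{catch_unwind, resume_unwind};
use std::thread;

// Starts unwinding on the calling thread, finishes on a spawned thread, and
// returns the panic message observed when that thread is joined.
fn unwind_on_another_thread() -> Option<String> {
    // Capture the payload. The `Box` here is the allocation that makes this
    // approach unavailable without an allocator.
    let payload: Box<dyn Any + Send> = catch_unwind(|| {
        panic!("original panic");
    })
    .unwrap_err();

    // Because the payload is `Send`, a different thread can resume unwinding.
    let handle = thread::spawn(move || -> () { resume_unwind(payload) });
    let finished = handle.join().unwrap_err();
    finished.downcast_ref::<&str>().map(|msg| msg.to_string())
}
```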

There might be other options that could work. For example, the executor could reserve some space for each task that's large enough to hold most panics. Most likely the way we'd accomplish this is by attaching something to the Context that gives access to it. Maybe it'd be specific to panics, or maybe it'd be a more general task-local bump allocator or something like that. At any rate, we could add API surface for a minimal allocator to support awaiting while unwinding without needing a full-blown global allocator. These could be made optional, which would give executors the option of aborting if they cannot or don't want to support async unwinding.

Another option would be to have the compiler not automatically generate calls to poll_cancel while unwinding, and instead provide something like an async version of catch_unwind. I think something like this is what boats was proposing. The nice thing about this option is that we can completely give up on supporting #[no_std]. Furthermore, we don't have to worry about being "zero cost," since the fact that the user called async_catch_unwind signals that they're willing to pay the cost that's needed.

That said, it's not clear how that should interact with do ... final blocks if we were to add them.² For example, the final block would presumably run during unwinding in sync code, so it seems like we'd also need to do it while unwinding in async code. Unfortunately, as far as I can tell that will run into the same allocation problems.

So to go back to the question of whether we should do this, I think we need more exploration. There are some options, but from my exploration here it seems like it's hard to satisfy all our requirements. But maybe one of these, or some other option, can strike a decent compromise.

With a small tweak, we could approximate a finally clause, by making it so we run the cancellation future even if the main future completes successfully.↩

I really like the idea of do ... final! I had hoped to explore that some in this post but I felt there was enough material here without it.↩

How to Shrink Rust

2024-01-08T00:00:00+00:00

While doing some housekeeping on my blog over the weekend, I came across an ancient post by Patrick Walton. While I didn't realize it at the time, this post embodies what has become one of my core principles in programming language design.[^spiky-blob] In re-reading Patrick's post, this quote stood out in particular:

Language design tends to go in cycles: we grow the language to accommodate new functionality, then shrink the language as we discover ways in which the features can be orthogonally integrated into the rest of the system. Classes seem to me to be on the upward trajectory of complexity; now it’s time to shrink them down. At the same time, we shouldn’t sacrifice the functionality that they enable.

This cycle of growing and shrinking was a key part of the process in the early days of Rust. Upon reading this section, I found myself asking "how could we shrink Rust today?"

What happened to classes?🔗

To be honest, I had forgotten Rust had classes at one point. I remembered resources and objects, but forgot we had a brief window where there were classes. Patrick's post explains what happened to them. Essentially, once we added classes and a bunch of other features, we realized that classes combined five features that we could implement independently in a way that's more general. These, along with their modern replacements in Rust, are:

Nominal records, replaced by struct.
Constructors, replaced by struct literal syntax and plain functions that are conventionally called new.
Field-level privacy, replaced by module-level privacy.¹
Attached methods, replaced by inherent impls.
Destructors, replaced by the Drop trait.

Some of these features weren't so much replaced as removed. For example, it's hard to claim Rust has constructors today, other than by convention. Similarly, if I remember right, at the time Rust also had the struct keyword, so you used struct if you just wanted a nominal record or class if you wanted the rest of these features. Or in the case of field-level privacy, we basically just decided this feature wasn't necessary.²

For the two features that had a clear replacement, by decoupling them from classes we gained a lot more power. You can attach methods to any type now, like enums and even primitive types, not just classes. Destructors are much simpler now too, since you implement Drop just like any other trait.³

The end result of this was we replaced a large feature, classes, with a handful of smaller, orthogonal features. The result was something that composed better⁴ and gave us more power and flexibility.
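For readers who never saw the class syntax, the modern replacements listed above look roughly like this when used together. All of the names here (account, Account, Shape, Guard, drop_demo) are invented for illustration.

```rust
use std::cell::RefCell;

mod account {
    // Nominal record with a private field: privacy is decided by the module.
    pub struct Account {
        balance: i64,
    }

    impl Account {
        // A "constructor" is just a plain function, conventionally named `new`.
        pub fn new(balance: i64) -> Self {
            Account { balance }
        }

        // Attached method via an inherent impl; it can read the private field
        // because it lives in the same module.
        pub fn balance(&self) -> i64 {
            self.balance
        }
    }
}

// Inherent impls work on enums too, not only on struct-like types.
pub enum Shape {
    Square(f64),
    Rect(f64, f64),
}

impl Shape {
    pub fn area(&self) -> f64 {
        match self {
            Shape::Square(side) => side * side,
            Shape::Rect(w, h) => w * h,
        }
    }
}

// A destructor is an implementation of the `Drop` trait.
pub struct Guard<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("dropped");
    }
}

// Records what happens while a `Guard` is alive and when it goes away.
pub fn drop_demo() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _guard = Guard { log: &log };
        log.borrow_mut().push("in scope");
    } // `_guard` is dropped here
    log.into_inner()
}
```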

What does this have to do with Rust today?🔗

To me the key takeaway, at least looking back from over a decade later, is that a big part of why Rust is the way it is today is that we were able to add a bunch of features and then pare them down once we got some experience. In Rust's history, it's had three different ways to do destructors, and while I don't recall exactly, I suspect at least two of these coexisted at some point.

It's somewhat harder to follow this model now. In the early days, we made breaking syntax changes sometimes multiple times a week.⁵ At that time, the Rust team was a handful of people, about as many interns, and some people who hung out on IRC. Today the community is much larger and people are using Rust in mission-critical projects where they can't afford to make weekly syntax updates. And of course, Rust 1.0 came with a promise that there would be no more breaking changes. You can rely on Rust to keep working tomorrow.

Rust is still able to grow, but shrinking is much harder, and as a result, we have to be much more conservative in how Rust grows. We have some ability to shrink through the editions system, but this is still not a great mechanism for rapidly iterating on designs.

Anyway, I don't really have a solution, or even necessarily a clearly defined problem. I mostly just wanted to observe that developing Rust is harder today because we mostly have to look at things incrementally. It's much harder to design a set of interrelated features that maybe by themselves wouldn't be particularly noteworthy but together are quite powerful.

Fortunately, Rust does have the nightly compiler, and a process for experiments. That seems like the right environment to do the kind of language experimentation today that was possible in the early days. This is the same codebase that becomes the stable compiler, so we still need to emphasize stability and maintainability, but liberal experimentation in the nightly compiler with many different Rust features at once seems like it has the possibility to do the same kind of broad scale language iteration that we did in the early days while staying true to Rust's stability promises.

I've since started calling this my Spiky Blog Theory of Programming Languages, but it deserves a post of its own.↩

One way of looking at this is that classes included their own module or namespace, and this was seen as unnecessary complexity.↩

It might seem nice to be able to make fields on a struct private today, but that requires us to pull in a number of other features. In particular, you need some methods that you can make public which do have access to the private fields. That's why there were attached methods before, and something like that could work with impls but it would be tricky since impls are a lot more flexible.↩

⁴

Early Rust had resource types which were basically a wrapper around a type that included a destructor. In some ways it was nice because most things didn't have destructors, but it also meant when you needed one you had to put your code through some contortions to make it work well with an attached destructor. Also, while it's tempting to say Drop is just like any other trait, it's not really because it has special meaning to the compiler.↩

⁵

I expect had we kept classes it'd be common to have classes that just wrap an enum, since otherwise we wouldn't have had a way to attach methods to enums. Eventually we probably would have invented some kind of enum class syntax.↩

⁶

This is a big part of why rustfmt is so good, because that was how we rewrote the whole compiler every time we had a major breaking syntax change, which was not uncommon.↩

Rethinking Rust's Function Declaration Syntax

2023-12-15T00:00:00+00:00

We had a fun discussion in #t-lang about possible new syntax for declaring functions in Rust. There were a lot of cool ideas put forward, and while mulling them over I realized a lot of them work nicely together and can be introduced in a backwards-compatible way to give us some cool new capabilities. While these were fresh in my mind and I'm feeling excited about them, I wanted to write them in one place.

For background, top level functions in Rust look sort of like this:

fn foo(x: i32) -> i32 {
    x + 1
}

In Rust 2018, we added async fn:

async fn foo(x: i32) -> i32 {
    x + 1
}

While that one doesn't do anything particularly interesting, an async function gives you the ability to use await inside it. It also secretly changes the return type from an i32 to an impl Future. This is regarded by many to have been a mistake, and it's starting to cause issues now that we have async functions in traits since there is no way to add additional bounds like Send to the return type. Anyway, async fn foo is mostly just syntactic sugar that desugars into:

fn foo(x: i32) -> impl Future<Output = i32> {
    // `move` so the block owns `x` instead of borrowing a local
    async move { x + 1 }
}
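To make the equivalence concrete, here is a minimal sketch that compiles on stable Rust. The names `foo_sugared`, `foo_desugared`, `NoopWake`, and `poll_once` are made up for this illustration. It also shows the point about bounds: the hand-written form has a place to add `+ Send`, while the `async fn` form does not.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Sugared form: the declared return type is `i32`, the real one is a future.
async fn foo_sugared(x: i32) -> i32 {
    x + 1
}

// Desugared form: the future type is spelled out, so extra bounds such as
// `Send` have somewhere to go.
fn foo_desugared(x: i32) -> impl Future<Output = i32> + Send {
    async move { x + 1 }
}

// A waker that does nothing. That is enough here because neither future
// ever returns `Poll::Pending`.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical helper: poll a future once and return its output if ready.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

Both `poll_once(foo_sugared(1))` and `poll_once(foo_desugared(1))` evaluate to `Some(2)`, so callers cannot tell the two forms apart.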

It's likely that Rust will gain a whole bunch of new keywords we can stick in front of fn in the future.¹ For example, nightly Rust just got support for gen fn and async gen fn. Those desugar similarly, by wrapping the return type in impl Iterator or impl AsyncIterator and wrapping the body in gen { } or async gen { }.

Another piece of sugar we could add is try fn, which is actually what started off the discussion thread today. Following the pattern we've had so far, we'd expect to be able to write something like:

try fn foo() -> i32 {
    let x = read_number()?;
    x
}

and have this desugar to:

fn foo() -> impl Try<Output = i32, Residual = ???> {
    try {
        let x = read_number()?;
        x
    }
}

The problem is we need a hint for the Residual type. The obvious thing to do would be to add something to the function header, like try fn foo() -> i32 throws E. But if you've ever looked at the Residual types for the Try impls in the standard library, you know that these can look pretty hairy and not particularly intuitive. For example, to make a function that returns an Option<i32>, we'd need to write:

try fn foo() -> i32 throws Option<Infallible> {
    let x = read_number()?;
    x
}

This would give the compiler enough information to find the Try impl for Option<i32>. But notice that we also could have just written fn foo() -> Option<i32>, which is shorter and you don't have to figure out why my fallible function has an Infallible in it.
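For comparison, here is a small sketch of what can be written on stable Rust today. The `read_number` below is a hypothetical stand-in (the post never shows its definition), and since `try { }` blocks are not stable, the second function approximates one with an immediately invoked closure.

```rust
// Hypothetical stand-in for the post's `read_number`.
fn read_number(input: &str) -> Option<i32> {
    input.trim().parse().ok()
}

// Naming the concrete return type: `?` works, but the success value has to
// be wrapped in `Some` by hand.
fn foo(input: &str) -> Option<i32> {
    let x = read_number(input)?;
    Some(x)
}

// Approximating a `try { }` block with a closure: `?` inside the closure
// exits the closure rather than the whole function.
fn foo_with_block(input: &str) -> Option<i32> {
    let result: Option<i32> = (|| {
        let x = read_number(input)?;
        Some(x + 1)
    })();
    result
}
```

The closure trick is only an approximation: unlike a real `try` block, a `return` inside it would also only leave the closure.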

At this point, Lukas Wirth observed that they would rather see a shorthand for functions whose body is a single expression. If we did this, we could write try fn as:

fn foo() -> Option<i32> = try {
    let x = read_number()?;
    x
}

So that's pretty neat.

This also invites us to reconsider async fn. We could instead write:

fn foo() -> impl Future<Output = i32> = async {
    let x = read_number().await;
    x
}

That's not too bad, but impl Future<Output = i32> is a bit wordy. We could come up with some rules that would let you write impl Future<i32> instead, which honestly is how we usually read that out loud anyway. But then joboet and pitaj pointed out that we could treat Trait -> Type as shorthand for Trait<Output = Type>. TC pointed out that we could probably generalize this to support yields T and Iterator<Item = T>.

So if we combined a few of these ideas, we'd be able to write:

fn foo() -> impl Future -> i32 = async {
    let x = read_number().await;
    x
}

I think this shows a lot of potential. I want to try to generalize this a bit more though. Instead of special-casing the Output associated type, we could create a set of attributes that indicate an associated type can be used with trait keyword shorthands. For example, define the Future and Iterator traits like this:

trait Future {
    #[keyword(return)]
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

trait Iterator {
    #[keyword(yields)]
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

This would let us refer to Future<Output = T> as Future -> T and Iterator<Item = T> as Iterator yields T.

We could even combine them:

trait Coroutine<R> {
    #[keyword(yields)]
    type Yield;

    #[keyword(return)]
    type Return;

    fn resume(self: Pin<&mut Self>, arg: R) -> CoroutineState<Self::Yield, Self::Return>;
}

fn coroutine() -> impl Coroutine<()> -> bool yields i32 = || {
    yield 42;
    true
}

This would also let us remove some of the special handling around the Fn* traits, and we could expose this functionality to users so libraries could use this sugar in their own traits.

At this point, I'd like to take a step back and think about plain fn functions. Notice that the following two would be equivalent:

fn foo() -> i32 {
    let number = read_number();
    number
}

fn foo() -> i32 = {
    let number = read_number();
    number
}

One way to think of this is that we've made the = optional. But I'd like to think of it a different way. Let's say instead we think of the = form as the standard function declaration syntax. Then, if the function body consists of a single block, we can use a compressed syntax. For a regular { } block, that just looks like the function declaration syntax we're used to. But for blocks with a keyword in front, like async { } or try { }, we say the keyword moves all the way to the front of the function header. In addition, each block has a characteristic trait associated with it, so when we use the block shorthand for function declarations, we also wrap an impl Trait around the return type.

Here are some examples:

// async ////////////////////////////////////////

async fn foo() -> i32 {
    let number = read_number().await;
    number
}

// desugars to:

fn foo() -> impl Future<Output = i32> = async {
    let number = read_number().await;
    number
}


// gen //////////////////////////////////////////

gen fn foo() -> i32 {
    yield 1;
    yield 2;
}

// desugars to:

fn foo() -> impl Iterator<Item = i32> = gen {
    yield 1;
    yield 2;
}

// assuming `Iterator` is defined like:

trait Iterator {
    #[keyword(return)]
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}


// async gen ////////////////////////////////////

async gen fn foo() -> i32 {
    yield 1;
    yield 2;
}

// desugars to:

fn foo() -> impl AsyncIterator<Item = i32> = async gen {
    yield 1;
    yield 2;
}
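Since gen blocks are nightly-only, it may help to see roughly what the gen example stands for on stable Rust. The following is a hand-written sketch, not what the compiler actually generates: the `state` variable is an assumption standing in for the coroutine's resume point.

```rust
// Hand-written stand-in for `gen fn foo() -> i32 { yield 1; yield 2; }`.
fn foo() -> impl Iterator<Item = i32> {
    // `state` records which `yield` we will resume after.
    let mut state = 0;
    std::iter::from_fn(move || match state {
        0 => {
            state = 1;
            Some(1) // first `yield 1`
        }
        1 => {
            state = 2;
            Some(2) // then `yield 2`
        }
        // Falling off the end of the body ends the iteration.
        _ => None,
    })
}
```

Collecting `foo()` into a `Vec` gives `[1, 2]`, which is the sequence the gen version is meant to produce.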

I've left try out because it's complicated. You could technically do something like:

try fn foo() -> i32 throws Option<Infallible> {
    let number = read_number()?;
    number
}

but for try users usually want to know the concrete type. So instead I'd expect most people to prefer the desugared form:

fn foo() -> Option<i32> = try {
    let number = read_number()?;
    number
}

Note that the "pulling the keyword forward" transformation doesn't work because this function returns a concrete type and what I've proposed here is that pulling the keyword forward always adds an impl Trait rather than a concrete type.

Anyway, I'm pretty excited about this idea.² It feels like a consistent way to handle these connections between blocks, traits, and functions. It's backwards compatible with the syntax we have so far, but it gives us a lot more expressiveness in cases where we're currently missing it.

You can already do unsafe fn and const fn today, but these don't desugar in the same way as other proposed keywords here do.↩

Of course, I also just started thinking about this today and cranked out a blog post, so I may hate it by Monday.↩

A Mechanism for Async Cancellation

2023-11-14T00:00:00+00:00

One of the items on our Async 2027 Roadmap is to come up with some kind of asynchronous cleanup mechanism, like async Drop. There are some tricky design questions to making this work well, and we need to start thinking about these now if we want to have something ready by 2027.

In this post, I'd like to explore a low level mechanism for how we might implement async cancellation. The goal is to explore both how an async executor¹ would interact with cancellation, as well as to make sure that this mechanism would support reasonable surface-level semantics. You can think of this as a kind of compilation target for higher level features, similar to how the Rust compiler lowers async fn into coroutines.

If you haven't read my last post on Cancellation and Async State Machines, I'd encourage you to do so. That post provides a kind of theoretical background for what we'll implement in this post.

Introducing `poll_cancel`🔗

Lately I've been working on a prototype implementation of async/await, as well as changes to Future and related traits, that supports more flexible cancellation. I'd like to discuss this prototype, the tradeoffs made, and what I've learned about cancellation from the exercise. Note that what I'm presenting here is α-equivalent to several previous proposals, including Boats' poll_drop_ready RFC and a proposal by tvalloton on IRLO. My main contribution here is a prototype implementation that lets us write examples and explore their behavior.

A Cancellable `Future`🔗

The core of the idea is to extend the Future trait with a new poll_cancel that has a default implementation. The new trait would look like this:

pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;

    // New: drive cancellation to completion. The default finishes immediately.
    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

In this new trait, poll has the same semantics as before. The new poll_cancel method performs two operations. First, it transitions the future's state machine from its normal execution path to the correct cancellation state. Second, poll_cancel continues to advance the state machine until the cancellation is complete.

The fact that poll and poll_cancel return different types highlights the fact that cancellation is a different exit from the future. A cancelled future returns no value, so poll_cancel returns Poll<()> instead of Poll<Self::Output>. This matches what we saw in my previous post where we had a different final state for a future that was cancelled versus one that completed normally.

There are some attractive properties about this approach. The default implementation of poll_cancel leads to the same behavior that we have for cancellation today, where cancelling a future just means synchronously dropping it. This suggests we can get a nice migration path, although adding a new default method to a trait is technically a breaking change.
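Here is a minimal sketch of how a leaf future might use this on stable Rust. Because we cannot add a method to the real `Future` trait from a crate, the sketch defines a stand-in trait called `CancelFuture`; that name, the `Connection` type, and its fields are all assumptions made for illustration.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stand-in for the extended `Future` trait described above.
trait CancelFuture {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;

    // Default: cancellation finishes immediately, like dropping does today.
    fn poll_cancel(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

// Hypothetical future that never completes and needs a few polls to clean up.
struct Connection {
    flushes_left: u32,
    log: Vec<&'static str>,
}

impl CancelFuture for Connection {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }

    fn poll_cancel(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.flushes_left > 0 {
            // Cleanup is still in progress: record a step and ask to be polled again.
            self.flushes_left -= 1;
            self.log.push("flushing");
            cx.waker().wake_by_ref();
            Poll::Pending
        } else {
            self.log.push("closed");
            Poll::Ready(())
        }
    }
}

// Waker that does nothing; the loop below simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn cancel_connection() -> Vec<&'static str> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut conn = Connection { flushes_left: 2, log: Vec::new() };
    // Normal path: the future is still pending.
    assert!(Pin::new(&mut conn).poll(&mut cx).is_pending());
    // Cancellation path: keep polling until cleanup reports completion.
    while Pin::new(&mut conn).poll_cancel(&mut cx).is_pending() {}
    conn.log
}
```

A type that does not override `poll_cancel` gets the default, which reports completion on the first call and so behaves like today's synchronous drop.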

There are significant shortcomings, which I'll discuss further down. But first, I'd like to look at how poll_cancel works with async and await.

Cancellation with `async` and `await`🔗

Most people writing async Rust should not have to deal with poll directly. Most of the time we use higher level constructs like async and await instead. The nice thing about async and await in Rust is that there's nothing particularly magical about them.² They can be thought of as desugaring into lower level constructs, and this desugaring happens in a way that you could mostly implement them both as macros.³ The primary benefit for building them into the language is that we can have nicer syntax and nicer diagnostics.

The fact that we can think of async and await as macros that desugar into lower level concepts means we can experiment with cancellation by writing a new set of macros that call poll_cancel in the appropriate place. Most of the action will be in the changes we make to await.

The goal here is to come up with a desugaring that has predictable cancellation behavior that is also usually the desired behavior.

The somewhat surprising thing to me is that await mostly just forwards calls to poll, but doesn't have a lot of interesting future behavior. The interesting behavior (such as making sure a Waker gets called sometime in the future) all happens in hand-written Future impls. We can see this in the approximate desugaring of await from the Rust Language Docs:

match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
            Poll::Ready(r) => break r,
            Poll::Pending => yield Poll::Pending,
        }
    }
}

This block of code runs when some code higher up the call stack calls our poll method. What this block of code is doing is basically calling the awaited future's poll function in a loop. If that future returns Pending, we yield Pending. From this code, the compiler will generate a Future::poll function that returns Pending when the function would yield Pending.

This happens deeper in the compiler than we can reach with macros, but we can approximate it another way. Originally, the compiler actually generated an object that implemented Generator (now Coroutine) and the standard library had a wrapper that adapted the Generator into a Future. We'll use this approach for our prototype.
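The forwarding behavior of await can be observed on stable Rust by polling an async block by hand. In this sketch, `YieldTimes`, `NoopWake`, and `drive_by_hand` are hypothetical names; the leaf future returns Pending a fixed number of times, and each of those shows up as a Pending from the outer async block.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Waker that does nothing; the loop below simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical leaf future: pending the given number of times, then done.
struct YieldTimes(u32);

impl Future for YieldTimes {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// Polls an async block that awaits the leaf, counting how often the outer
// future reports `Pending` before it completes.
fn drive_by_hand() -> (u32, &'static str) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut outer = pin!(async { YieldTimes(2).await });
    let mut pending_count = 0;
    loop {
        match outer.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return (pending_count, value),
            Poll::Pending => pending_count += 1,
        }
    }
}
```

`drive_by_hand()` returns `(2, "done")`: the async block adds no pending states of its own, it only passes along the ones from the future it awaits.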

We'll want to handle cancellation similarly to how polling is handled, where await also forwards calls to poll_cancel along the await chain until we arrive at a future that knows how to do something interesting with cancellation.

Looking at how we might extend the desugaring of await to support poll_cancel, we need to distinguish whether we're on the cancel path or the normal execution path so we can call either poll_cancel or poll depending on the context. We'll punt on this and assume we have a magic is_cancelled variable that can tell us this, which is similar to the current_context variable in the previous desugaring.

So let's see how this first step looks:

match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        if !is_cancelled {
            match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(r) => break r,
                Poll::Pending => yield Poll::Pending,
            }
        } else {
            match Pin::future::poll_cancel(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(()) => panic!("What do I do after cancelling?"),
                Poll::Pending => yield Poll::Pending,
            }
        }
    }
}

It's like before, but we check if we are cancelled first. If we are not, we continue with the previous behavior, calling poll and breaking out of the loop if the future is Ready or yielding Pending otherwise.

If we are cancelled we do almost the same thing, except we call poll_cancel instead. If the cancellation is Pending, we yield again. But if the cancellation is complete, we have to decide what to do next. In the normal case, we have break r, which passes r out to the surrounding context, which is expecting a value of whatever type r is. We can't do the same thing when the cancellation is complete because while r might be type (), we can't rely on that. For now we panicked, since that type checks, but this obviously doesn't work.

We can get some inspiration from our state machines we saw earlier. Cancellation effectively means we have two exit states for the function: normal return and cancelled. But functions in Rust only have one exit state⁴, so we need to reify this into some data type that shows which final state you'd be in if you could have multiple final states. It turns out the Rust standard library has one we can use for this purpose: Result.⁵ So to report that an async fn or async block was successfully cancelled, we can return something like Err(Cancelled) and Ok(T) in the success case. Factoring this into our approximate await desugaring gives us:

match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        if !is_cancelled {
            match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(Ok(r)) => break r,
                Poll::Pending => yield Poll::Pending,
            }
        } else {
            match Pin::future::poll_cancel(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(()) => return Err(Cancelled),
                Poll::Pending => yield Poll::Pending,
            }
        }
    }
}

In the desugaring of async {}, we'll also need to wrap all the normal exit paths with Ok().

The Generator Adapter🔗

In the previous section I gave a rough sketch of how to desugar async and await into generators in a way that supports cancellation. Now I want to fill in some of the details by looking at how this resulting generator becomes a future.

If we were implementing this for real in Rust, we'd probably just have the compiler implement Future directly, like it currently does for async blocks. But, using generators lets us implement and experiment with this in a crate without having to modify the compiler.⁶

So, if we did everything right in the previous section, we should end up with a compiler-generated generator that implements Generator<(Context, bool), Yield = (), Return = Result<T, Cancelled>>, where T is the output type of the Future and Cancelled is just a marker tag like struct Cancelled. The argument to the resume function, (Context, bool), is a tuple containing the Context as well as a bool indicating whether the future is cancelled. This bool would get bound to the is_cancelled variable in the await desugaring above.⁷

Now we can make these into futures as follows:⁸

impl<G, O> Future for G
where
    G: core::ops::Generator<(Context, bool), Yield = (), Return = Result<O, Cancelled>>,
{
    type Output = O;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.resume((cx, false)) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(Ok(v)) => Poll::Ready(v),
            GeneratorState::Complete(Err(Cancelled)) => panic!("child future cancelled itself"),
        }
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        match self.resume((cx, true)) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(Ok(_)) => {
                panic!("future completed after being cancelled")
            }
            GeneratorState::Complete(Err(Cancelled)) => Poll::Ready(()),
        }
    }
}

Our implementation needs to cover both poll and poll_cancel, but they are both pretty similar. Each one forwards the call to the generator's resume method and then adapts the result into something expected by the surrounding async code.

Generators only have a resume method, but in this post we've extended Future to have two methods. So when we go from a call to poll or poll_cancel to a call to resume, we need to tell resume which version it is. We do this by passing an extra boolean, which the generator uses to determine whether it should go along the normal execution path or the cancellation path.

Generators return either Yielded or Complete, which for futures correspond to Pending and Ready. Because we've made resume return a Result to indicate whether the future was cancelled, we have some more cases to check. We don't want to bubble the Result out to user code; we want to keep it hidden inside the monad. From the user's perspective, this is still just a future that evaluates to a T, not a fallible future.

So we have this invariant, that in poll, resume should never return an Err(Cancelled) and in poll_cancel, resume should never return Ok. The first case would mean that the future cancelled itself, which is not the way cancellation works in Rust. The second case would mean the cancellation failed, that after being cancelled the future completed normally. In this design we're also choosing not to model that case.⁹ In an ideal world, the compiler would be able to prove both of these cases are unreachable, or we'd design the API so that these cases aren't even possible to write. Honestly, this is one of the aspects of this design that I'm least satisfied with. I'd like to experiment with different factorings that would let us get rid of the panics.
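The adapter idea can be tried on stable Rust with a hand-written state machine in place of a compiler-generated generator. Everything in this sketch is an assumption for illustration: `Resumable` stands in for the nightly generator trait, `CancelFuture` stands in for the extended `Future`, and `resume` takes only the cancellation flag (the context is dropped for brevity).

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct Cancelled;

enum State<R> {
    Yielded,
    Complete(R),
}

// Hypothetical stable stand-in for the nightly generator trait.
trait Resumable {
    type Output;
    fn resume(self: Pin<&mut Self>, is_cancelled: bool) -> State<Result<Self::Output, Cancelled>>;
}

// Stand-in for the extended `Future` trait.
trait CancelFuture {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()>;
}

struct Adapter<G>(G);

impl<G: Resumable + Unpin> CancelFuture for Adapter<G> {
    type Output = G::Output;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<G::Output> {
        // `false` selects the normal execution path.
        match Pin::new(&mut self.0).resume(false) {
            State::Yielded => Poll::Pending,
            State::Complete(Ok(v)) => Poll::Ready(v),
            State::Complete(Err(Cancelled)) => panic!("child future cancelled itself"),
        }
    }

    fn poll_cancel(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // `true` selects the cancellation path.
        match Pin::new(&mut self.0).resume(true) {
            State::Yielded => Poll::Pending,
            State::Complete(Ok(_)) => panic!("future completed after being cancelled"),
            State::Complete(Err(Cancelled)) => Poll::Ready(()),
        }
    }
}

// Hand-written state machine: yields `remaining` times on the normal path,
// and yields once for cleanup on the cancellation path.
struct Countdown {
    remaining: u32,
    cleaned_up: bool,
}

impl Resumable for Countdown {
    type Output = &'static str;

    fn resume(mut self: Pin<&mut Self>, is_cancelled: bool) -> State<Result<&'static str, Cancelled>> {
        if is_cancelled {
            if !self.cleaned_up {
                self.cleaned_up = true;
                return State::Yielded;
            }
            return State::Complete(Err(Cancelled));
        }
        if self.remaining == 0 {
            State::Complete(Ok("done"))
        } else {
            self.remaining -= 1;
            State::Yielded
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn run_to_completion() -> &'static str {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Adapter(Countdown { remaining: 2, cleaned_up: false });
    loop {
        if let Poll::Ready(v) = Pin::new(&mut fut).poll(&mut cx) {
            return v;
        }
    }
}

// Polls once normally, then counts the `poll_cancel` calls needed to finish.
fn cancel_midway() -> u32 {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Adapter(Countdown { remaining: 2, cleaned_up: false });
    assert!(Pin::new(&mut fut).poll(&mut cx).is_pending());
    let mut calls = 1;
    while Pin::new(&mut fut).poll_cancel(&mut cx).is_pending() {
        calls += 1;
    }
    calls
}
```

Note that the two panics survive in this sketch too: nothing in the types stops a `Resumable` from returning `Ok` on the cancellation path, which is the same dissatisfaction described above.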

Anyway, that's the rough idea of how this design works. I haven't written the complete implementation here because I find prose more informative than code, but I do have a prototype implementation at https://github.com/eholk/explicit-async-cancellation if you want to see the full details.

But for now, let's see what this lets us do.

Scenarios🔗

My prototype includes a macro called async_cancel!, which is similar to async {} blocks, except with support for cancellation handlers. This is meant to be paired with the awaitc! macro, which is analogous to .await, but with support for cancellation handlers.¹⁰ Because these are not built in syntax, they are ugly and hard to read in the examples I've prototyped so far. So in this section, I'll write out examples as if async and await supported cancellation handlers in the way described above.

First, I want to introduce a convenience called on_cancel. This gives us a way to run asynchronous code along the cancellation path. This is important to show that everything actually works how we want, but I'm not really a fan of the API and would prefer it not be the standard way to run code on cancellation. Think of this as a placeholder for something like defer {} blocks or async Drop.¹¹ I've implemented on_cancel as an extension method on futures that takes a future and runs that future on the parent future's cancellation path. That's a little confusing to read, but in code it looks like this:

async {
    do_something().await;
    println!("all done!");
}.on_cancel(async {
    println!("cancelled!");
}).await;

In my examples, I'll also make liberal use of futures like pending() and ready(), which never complete and immediately complete respectively.
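One way an on_cancel combinator could be built is sketched below. This is not the prototype's code: `CancelFuture` is a stand-in for the extended `Future` trait, `Never` plays the role of pending(), and the ordering (run the handler first, then forward the cancellation to the wrapped future) is an assumption chosen for this illustration. Boxing both futures keeps the sketch free of unsafe pin projection.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stand-in for the extended `Future` trait, plus the extension method.
trait CancelFuture {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
    fn poll_cancel(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
    fn on_cancel<H>(self, handler: H) -> OnCancel<Self, H>
    where
        Self: Sized,
        H: Future<Output = ()>,
    {
        OnCancel { inner: Box::pin(self), handler: Box::pin(handler), handler_done: false }
    }
}

struct OnCancel<F, H> {
    inner: Pin<Box<F>>,
    handler: Pin<Box<H>>,
    handler_done: bool,
}

impl<F: CancelFuture, H: Future<Output = ()>> CancelFuture for OnCancel<F, H> {
    type Output = F::Output;

    // The normal path just forwards to the wrapped future.
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        CancelFuture::poll(self.inner.as_mut(), cx)
    }

    // The cancellation path runs the handler to completion, then cancels
    // the wrapped future. Repeated calls pick up where the last one stopped.
    fn poll_cancel(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if !self.handler_done {
            if Future::poll(self.handler.as_mut(), cx).is_pending() {
                return Poll::Pending;
            }
            self.handler_done = true;
        }
        CancelFuture::poll_cancel(self.inner.as_mut(), cx)
    }
}

// Plays the role of `pending()`: never completes, default cancellation.
struct Never;

impl CancelFuture for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn on_cancel_demo() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let handler_log = Rc::clone(&log);
    let mut fut = Never.on_cancel(async move {
        handler_log.borrow_mut().push("cancelled!");
    });
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // Polling normally never runs the handler.
    assert!(CancelFuture::poll(Pin::new(&mut fut), &mut cx).is_pending());
    assert!(log.borrow().is_empty());
    // Cancelling does.
    while CancelFuture::poll_cancel(Pin::new(&mut fut), &mut cx).is_pending() {}
    let entries = log.borrow().clone();
    entries
}
```

The `handler_done` flag is what makes a second round of `poll_cancel` calls continue the cancellation already in progress instead of starting the handler over.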

A Cancellation-aware Executor🔗

The first thing we need is an executor that is aware of cancellation. We'll make a simple one that runs a single task, similar to block_on. If for some reason the executor is dropped before the root task completes, then the executor's drop function will call poll_cancel on the root task until it's complete. In pseudo-code, our executor looks something like this (actual code is here):

impl Executor {
    /// Run the root task to completion
    fn run(&mut self) -> T {
        loop {
            match self.poll_once() {
                Poll::Pending => continue,
                Poll::Ready(result) => return result,
            }
        }
    }
    
    /// Poll the root task once
    fn poll_once(&mut self) -> Poll<T> {
        let context = self.context();
        self.root_task.poll(context)
    }
    
    // Definition of `context` is omitted
}

impl Drop for Executor {
    fn drop(&mut self) {
        let context = self.context();
        while let Poll::Pending = self.root_task.poll_cancel(context) {}
    }
}

This gives us just enough to experiment with cancellation behavior. We can run simple futures like this:

fn main() {
    let root_task = async {
        42
    };
    let mut exec = Executor::new(root_task);
    let result = exec.run();
    println!("the root task returned {result}");
}

This program would run and print out

the root task returned 42

We have some more power though. Rather than using run to poll to completion, we can use poll_once some number of times to leave the future in an incomplete state. If the executor is dropped before the future is complete, it will run the cancellation path in the executor's drop function.

Here's a basic example showing cancellation:

fn main() {
    let root_task = async {
        pending().await;
        println!("all done!");
    }.on_cancel(async {
        println!("the task did not finish")
    });
    
    let mut exec = Executor::new(root_task);
    exec.poll_once(); // pending
    exec.poll_once(); // still pending
    drop(exec); // just give up
}

In this example, the root task blocks on pending(), which will never finish. But we attached a cancellation handler that runs when the executor is dropped before finishing the future. Running this program produces:

the task did not finish
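For readers who want to try the executor behavior without the prototype crate, here is a self-contained sketch on stable Rust. `CancelFuture` is a stand-in for the extended `Future` trait, and `Never` and `Ready42` are hypothetical leaf futures. The `finished` flag is an addition of this sketch (not in the pseudo-code above) so that a completed task is not cancelled on drop.

```rust
use std::cell::Cell;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stand-in for the extended `Future` trait.
trait CancelFuture {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
    fn poll_cancel(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

struct Executor<F: CancelFuture> {
    root_task: Pin<Box<F>>,
    finished: bool,
}

impl<F: CancelFuture> Executor<F> {
    fn new(root_task: F) -> Self {
        Executor { root_task: Box::pin(root_task), finished: false }
    }

    /// Run the root task to completion (busy-polls, like the pseudo-code).
    fn run(&mut self) -> F::Output {
        loop {
            if let Poll::Ready(result) = self.poll_once() {
                return result;
            }
        }
    }

    /// Poll the root task once.
    fn poll_once(&mut self) -> Poll<F::Output> {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let result = self.root_task.as_mut().poll(&mut cx);
        if result.is_ready() {
            self.finished = true;
        }
        result
    }
}

impl<F: CancelFuture> Drop for Executor<F> {
    fn drop(&mut self) {
        if self.finished {
            return;
        }
        // The task is incomplete: drive its cancellation path to the end.
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        while self.root_task.as_mut().poll_cancel(&mut cx).is_pending() {}
    }
}

// Never completes; records that its cancellation path ran.
struct Never {
    cancelled: Rc<Cell<bool>>,
}

impl CancelFuture for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
    fn poll_cancel(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        self.cancelled.set(true);
        Poll::Ready(())
    }
}

// Completes immediately with 42.
struct Ready42;

impl CancelFuture for Ready42 {
    type Output = i32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
        Poll::Ready(42)
    }
}

fn run_returns_value() -> i32 {
    Executor::new(Ready42).run()
}

fn drop_cancels_unfinished_task() -> bool {
    let cancelled = Rc::new(Cell::new(false));
    let mut exec = Executor::new(Never { cancelled: Rc::clone(&cancelled) });
    assert!(exec.poll_once().is_pending());
    assert!(exec.poll_once().is_pending());
    drop(exec); // just give up
    cancelled.get()
}
```

`run_returns_value()` mirrors the first example and yields 42, while `drop_cancels_unfinished_task()` mirrors the second: the task's cancellation path runs inside the executor's `drop`.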

So we have the basics of cancellation support and cancellation handlers. Now let's see how this composes with more interesting futures.

Cancellation-aware Combinators🔗

I'm using "combinators" here to mean futures which combine or otherwise transform other futures in interesting ways.¹² By this definition, we've already seen the on_cancel combinator, which lets you override the cancellation behavior of a future.

Let's consider another one: race. We'll use a very simplified version of race, which looks like a.race(b). This takes a future a and a future b and runs them both concurrently. When one finishes, race will cancel the other and return the value from the one that finished first.

The code for this looks horrible, so I'll leave it out of the post and focus mainly on how it looks to use it.

Here's an example using race with a cancellation handler:

fn main() {
    let root_task = pending().on_cancel(async {
        println!("future `a` was cancelled");
    }).race(async {
        42
    });
    
    let mut exec = Executor::new(root_task);
    let result = exec.run();
    
    println!("result: {result}");
}

In this example, our root task consists of a race between pending() and async { 42 }. The pending() future never finishes. We've attached a cancellation handler to it so we can see some indication that it was cancelled. So the race combinator sees that the second future returns 42 while the first is still pending. Before returning, it runs the first future's cancellation handler, printing future `a` was cancelled. Then it returns 42 as the overall value of the race future. This program's output is:

future `a` was cancelled
result: 42

Cancel during Cancellation🔗

The poll_cancel mechanism we're discussing is able to support what I earlier called idempotent cancellation.¹³ This means that if you cancel a future whose cancellation process has already started then the cancellation process continues as before.

To get a feel for how this works, let's look at a rather contrived example:

fn main() {
    // we'll use `done` to create a future that blocks until some other code
    // sets the `done` to true.
    let done = &RefCell::new(false);
    let root_task = async { 
        // we're going to race `a` and `b`, so we'll create those two futures
        // separately.
        let a = async { 42 };
        // when b cancels, we want a cancellation handler that can print a
        // message for us the first time it's polled. We'll use
        // `cancel_started` to track that.
        let mut cancel_started = false;
        let b = pending().on_cancel(poll_fn(|_| {
            if !cancel_started {
                // print a message if it's our first time through.
                println!("begin cancelling `b`");
                cancel_started = true;
            }
            // Only complete if someone has set `done` to true.
            if *done.borrow() {
                println!("cancellation of `b` complete");
                Poll::Ready(())
            } else {
                Poll::Pending
            }
        }));
        
        a.race(b).on_cancel(async {
            println!("cancelling `race` future");
        }).await;
    }.on_cancel(async {
        println!("cancelling root future");
    });
    
    // Poll the futures a few times, then let the executor shut down
    let mut executor = Executor::new(root_task);
    let _ = executor.poll_once();
    let _ = executor.poll_once();
    let _ = executor.poll_once();
    *done.borrow_mut() = true;
}

The behavior here is pretty subtle, so let's see the output and break down why we get this behavior. The output from this program is:

begin cancelling `b`
cancelling root future
cancelling `race` future
cancellation of `b` complete

The core of this program is that we race two futures (line 28), one that returns immediately (line 8), and one that never completes (line 13). We've attached a bunch of cancellation handlers at various points so we can observe the behavior and the order that things happen in.

The cancellation handler on b is pretty complex, but the idea here is to create a future that waits until some flag is set. We wanted to simulate something that takes a little bit of time to complete, but not an unbounded amount, so that we can interrupt the cancellation.

So, we start running, async { 42 } completes immediately, and then race has to start cancelling b. This shows up in the line begin cancelling `b`. This cancellation does not complete, even though we poll a few more times, because no one has set done to true.

The next step is to trigger the second cancellation of b. We do this by letting the executor go out of scope without completing, which means the destructor calls poll_cancel on the root task. This is when we see cancelling root future appear. This gets passed on to the race future because of the way we've desugared await, so we see the program print cancelling `race` future. In the implementation of race, its poll_cancel method cancels any futures that have not either completed or been cancelled. In our case, this means we call poll_cancel on b again, but this time the call chain originates in the executor's destructor rather than the normal execution of race.

Finally, since the done flag has been set, b's cancellation can complete and we see it print out cancellation of `b` complete.

If we had instead supported recursive cancellation, we would have had the option of having b's cancellation handler terminate early. There are likely cases where both options would make sense, but here we've chosen to use idempotent cancellation semantics across the board.

Cancel during Unwind🔗

This one is left as an exercise for the reader (or a future blog post here), but I don't see any fundamental reason why we can't do it.¹⁴ The gist of the idea is that anywhere we call poll, we'd want to wrap that in catch_unwind. If the poll function panics, we'd want to catch that, then call the future's poll_cancel method to completion, and then call resume_unwind to continue unwinding.

It will be annoying to have to do a poll, catch_unwind, poll_cancel, resume_unwind dance everywhere, but the basic idea should work.

There are other challenges though. One is that the poll_cancel functions will need to be written to be aware of the fact that they might be called during unwinding, which means the internal state for the future might be inconsistent.
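As a rough sketch of that dance (an illustration under assumptions, not a worked-out design), the helper below wraps a single poll call. `CancelFuture` again stands in for the extended `Future` trait, and `poll_with_unwind_cleanup` and `Panics` are hypothetical names.

```rust
use std::cell::Cell;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stand-in for the extended `Future` trait.
trait CancelFuture {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
    fn poll_cancel(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

// poll, catch_unwind, poll_cancel, resume_unwind.
fn poll_with_unwind_cleanup<F: CancelFuture>(
    mut fut: Pin<&mut F>,
    cx: &mut Context<'_>,
) -> Poll<F::Output> {
    match catch_unwind(AssertUnwindSafe(|| fut.as_mut().poll(cx))) {
        Ok(poll) => poll,
        Err(payload) => {
            // The future's state may be inconsistent here; `poll_cancel`
            // has to be written to cope with that.
            while fut.as_mut().poll_cancel(cx).is_pending() {}
            resume_unwind(payload)
        }
    }
}

// Hypothetical future that panics when polled and records its cleanup.
struct Panics {
    cleaned_up: Rc<Cell<bool>>,
}

impl CancelFuture for Panics {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        panic!("poll panicked")
    }
    fn poll_cancel(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        self.cleaned_up.set(true);
        Poll::Ready(())
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Returns (did the panic propagate?, did the cancellation path run?).
fn unwind_demo() -> (bool, bool) {
    let cleaned_up = Rc::new(Cell::new(false));
    let mut fut = Panics { cleaned_up: Rc::clone(&cleaned_up) };
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let outcome = catch_unwind(AssertUnwindSafe(|| {
        poll_with_unwind_cleanup(Pin::new(&mut fut), &mut cx)
    }));
    (outcome.is_err(), cleaned_up.get())
}
```

The sketch only works when panics unwind; with `panic = "abort"` there is nothing to catch, so this kind of cleanup never runs.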

Evaluation🔗

Writing this post gave me the chance to thoroughly explore this design. I would say overall I think this design has enough shortcomings that I don't want to advocate it as the solution for async cancellation handlers. I still think this is useful because the shortcomings can help us find a design with fewer, or at least more acceptable, compromises. The fact that I've been able to implement this as a prototype means we can easily pivot and explore variations.

That said, I wouldn't have written so much about this design if I didn't think it had some merit. So now I'd like to discuss what I see as some of the greatest strengths and shortcomings.

Strengths🔗

In my mind, the biggest strength is that it feels like a relatively small extension to async Rust, but it still gives a lot of benefits. It's basically one new method on the Future trait, as well as a minor change to the way async and await desugar. We can provide a default implementation of poll_cancel which preserves the status quo semantics for cancellation and therefore makes the migration path pretty easy in most cases. Of course, we're going to come back to this in the Weaknesses section because it's not all roses.
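To make the "one new method with a default" point concrete, here is a minimal sketch. Since we can't add a method to the real `Future` trait from outside the standard library, it uses a hypothetical extension trait called `PollCancel`; the `NoopWaker` and `cancel_ready_future` helpers exist only so the example can be run.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical extension trait standing in for a new method on `Future`.
trait PollCancel: Future {
    /// Default: nothing to clean up asynchronously, so cancellation
    /// finishes immediately and the future can simply be dropped.
    fn poll_cancel(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

/// An existing future opts in without writing any cancellation code.
impl<T> PollCancel for std::future::Ready<T> {}

/// A waker that does nothing, just so we can build a `Context`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Cancels a future that relies on the default implementation.
fn cancel_ready_future() -> Poll<()> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(std::future::ready(42));
    let result = fut.as_mut().poll_cancel(&mut cx);
    result
}
```

With the default in place, the empty `impl` block is all an existing future needs, which is what makes the migration path look easy at first.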

This design makes it clear what the responsibilities are for well-behaved executors (and executor-like things, like future combinators) to make sure cancellation behavior makes sense.

I think this design also works well with the requirement that futures are pinned. For example, an alternate approach could be adding a method like fn cancel(self) -> impl Future. The problem is that once a future has been pinned, you can't pass it as self. Instead, the signature would have to be something like fn cancel<'a>(self: Pin<&'a mut Self>) -> impl Future, which I think is going to be annoying for executors to work with in practice. Cancelling in place strikes me as significantly simpler.


All of the benefits I've talked about in this post are available without what strike me as significantly more extensive language changes.
For example, this gives us some way to run code on cancellation paths without needing complete support for async Drop.
Of course, this leads to significant shortcomings that we'll see in Weaknesses.
On the bright side, I think something like the poll_cancel API can serve as a compilation target for cancellation, the same way that poll is a compilation target for await.
Weaknesses🔗
The weaknesses in this design range from what to me seems rather tolerable to some that I find completely unacceptable.
On the more tolerable end of the spectrum, there's the fact that this API feels a little fragile.
We have a requirement that once you call poll_cancel on a future you can never call poll again, but the compiler can't do anything to prevent you from doing that.
This kind of requirement isn't unprecedented though.
For example, with futures you already aren't supposed to call poll again after the future has completed, but the compiler doesn't stop you from doing that.
In both cases, we can mitigate this by treating await as the normal interface to poll and poll_cancel and guaranteeing that those generate correct code.
Calling poll and poll_cancel directly would then be considered an advanced use case, so we can tolerate more complex requirements there.¹⁵
I'm slightly more concerned about the migration path.
As a strength, I mentioned that the default impl of poll_cancel means without any additional action, futures will retain their present-day behavior.
In many cases, this is perfectly fine, but it's probably the wrong default for future combinators.
For example, suppose you were using an async IO crate that supported asynchronously cancelling operations in flight, but you put one of those futures behind an older version of race that did not yet support poll_cancel.
In this case, when the race future is cancelled, it would fall back on the default implementation, which says "ok, all good, nothing left to do," without calling poll_cancel on the IO operation.
The result would be that the programmer has to be extremely careful to make sure that everything in their call chain handles cancellation correctly.
Cancellation would be best effort, at best.
You definitely could not rely on this for safety!
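The failure mode is easy to reproduce in miniature. The sketch below uses a hypothetical, simplified `CancelFuture` trait (no `Pin` or `Context`), and `IoOp` and `OldRace` are invented stand-ins for the IO crate's future and the outdated combinator.

```rust
use std::task::Poll;

/// Hypothetical simplified future trait (no `Pin` or `Context`).
trait CancelFuture {
    fn poll(&mut self) -> Poll<()>;

    /// Default keeps today's behavior: cancellation is just a drop.
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
}

/// An IO operation that wants to abort its in-flight request on cancel.
struct IoOp {
    abort_requested: bool,
}

impl CancelFuture for IoOp {
    fn poll(&mut self) -> Poll<()> {
        Poll::Pending
    }

    fn poll_cancel(&mut self) -> Poll<()> {
        self.abort_requested = true;
        Poll::Ready(())
    }
}

/// A combinator written before `poll_cancel` existed: it only knows `poll`.
struct OldRace<A, B> {
    a: A,
    b: B,
}

impl<A: CancelFuture, B: CancelFuture> CancelFuture for OldRace<A, B> {
    fn poll(&mut self) -> Poll<()> {
        if self.a.poll().is_ready() {
            return Poll::Ready(());
        }
        self.b.poll()
    }
    // No `poll_cancel` override, so the default silently reports success.
}

/// Cancels an `OldRace` and reports whether either IO op noticed.
fn cancel_old_race() -> (Poll<()>, bool) {
    let mut race = OldRace {
        a: IoOp { abort_requested: false },
        b: IoOp { abort_requested: false },
    };
    let _ = race.poll();
    let result = race.poll_cancel();
    (result, race.a.abort_requested || race.b.abort_requested)
}
```

The race reports that cancellation is complete, yet neither `IoOp` was ever asked to abort.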
One possible way to avoid this might be to introduce poll_cancel through a CancellableFuture trait instead.
Doing this in a way that's backwards-compatible would be tricky though.
Related to this shortcoming, poll_cancel puts a heavy burden on executor and future combinator authors.
It's already tricky to write a state machine that calls poll. Having to add poll_cancel calls to that state machine as well is going to be a lot of error-prone work.
We might be able to factor some of this work into common libraries that make it easier though.
But to me the most critical shortcoming of this design is that it's easy to forget to cancel a future.
Fortunately, as long as your future is always behind an await, you should be okay.
On the other hand, there are common patterns that would now be error-prone.
For example, consider the following code using FuturesUnordered:
let mut futures = FuturesUnordered::new();
futures.push(async { do_something().await; });
futures.push(async { do_something_else().await; });
futures.next().await;
drop(futures);

Here we've added two futures to a FuturesUnordered collection.
When we call next(), it will poll both futures until one of them completes, and then the next() future will complete.
This means that futures is still holding on to a partially completed future.
But, when we drop(futures), there's no way to run poll_cancel because drop must complete synchronously.
So, our only option right now is to just not cancel the future.
I suppose one way to work around this shortcoming is to try to argue that FuturesUnordered is a bad API.
Maybe I could redefine what we mean by structured concurrency to say that FuturesUnordered is unstructured and the cancellation mechanism we've described here only works for structured concurrency.
If I were to take this approach, our example would look more like this when using a redesigned FuturesUnordered collection:
FuturesUnordered::with(async |futures| {
    futures.push(async { do_something().await; });
    futures.push(async { do_something_else().await; });
    futures.next().await;    
}).await;

This solves the problem by making it so that FuturesUnordered::with does no work until it's awaited, so there is never any partially completed future that's not under an await point.
It's less than ideal for a few reasons though.
Stylistically, it adds more rightward drift.
But more importantly, this API makes it hard to put a FuturesUnordered in another data structure, which can be quite useful in many situations.
Plus, in my subjective opinion, the original version feels more Rusty.
Without a solution, I think this issue will make cancellation handlers so unreliable as to not be useful.
In fact, they will likely do more harm than good.
This leaves me convinced that we need some more general solution, like async Drop.
The key thing is to have some mechanism for the compiler to make sure, in an async function, that any values that need to be cancelled are cancelled.
To be honest, I'm a bit disappointed by this realization.
I haven't personally seen a design for async Drop that I love¹⁶, so I was hoping that something like poll_cancel would give us most of the benefits of async Drop without having to wrestle with as many complex design issues.
That said, I think a design like poll_cancel complements a higher level feature like async Drop.
Even if we have an async Drop, we need to figure out how these get run and whether we can get the properties we want in order to build on them.
I think a variation on poll_cancel would give us a useful lower level target to build a more powerful feature like async Drop on top of.
Related Work🔗
If you've been following this space for a while, the ideas I've discussed here probably sound very familiar.
I wanted to take the time to both acknowledge the work that's come before and highlight the ways in which my proposal here differs from earlier work.
One of the earliest versions I'm aware of is the (now abandoned) poll_drop_ready RFC from Boats.
One of the biggest differences is that the RFC focuses a lot on compiler-generated async drop glue to call poll_drop_ready and make sure things are cleaned up well, while I've left that completely out of scope for this post.
I appreciated the RFC's careful consideration of issues around pinning and fusing poll_drop_ready.
I've not really thought about these issues in my post, but I think we will need to if we move forward with this or a similar design.
I also appreciated that the RFC called out that the synchronous drop would still be called after poll_drop_ready returns Ready(()).
That feature was implicit in my design as well, but I think it is better to call it out.
The most important distinction, however, is that I have focused mainly on cancellation semantics in this post (that is, what if a future is not polled to completion?), while it seems that poll_drop_ready is called as part of the parent future completing normally through poll.
In other words, it seems executors are not intended to call poll_drop_ready directly.
This has some implications on when the programmer can assume poll_cancel/poll_drop_ready will be called.
There was another proposal on IRLO to add poll_cancel to the Future trait that is syntactically exactly the same as I've described here.
The semantics look essentially the same as I've described here as well, with perhaps some minor variations.
For example, in my design I've imagined you do not have to call poll_cancel on a future that's never been polled.¹⁷
I think the guarantees on the contract in the IRLO post are stronger than I was hoping we'd need here---I imagined we could get away with saying something like "a well-behaved executor should..." rather than "you must."
In particular, I didn't have the requirement that "A polled future may not be dropped without poll_cancel returning ready," and instead imagined such a thing would be impolite but not illegal.
I think the biggest contribution I've made in my post is showing how to adjust the desugaring of async and await to work with poll_cancel, giving us an answer to how "to generate a state machine that can keep track of a future in mid cancellation as a possible state."
Another excellent contribution in this area is A case for CancellationTokens.
One of the things I really like about the post is the review of the major options in this space, including request_cancellation, poll_cancel, async fn cancel and cancellation tokens.
If you haven't read it yet, that section alone is worth the read!
The main idea behind cancellation tokens is to have some bit of state that's carried along the await chain and futures can check whether they've been cancelled and activate the correct behavior in that case.
It has some nice benefits around composability, and seems to be better at traversing code that is not cancellation-aware, which is a major shortcoming of poll_cancel as I've described it here.
One thing I find interesting is that although on the surface cancellation tokens and poll_cancel look like extremely different mechanisms, they have more in common than it appears.
For example, the extra is_cancelled flag we added in the async and await desugaring looks an awful lot like a cancellation token.
I think it'd be worth exploring this connection in more depth.
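For readers who haven't seen the pattern, here is a minimal sketch of the general idea of a cancellation token. This is not the API from the linked post; `CancellationToken`, `Download`, and `Outcome` are hypothetical names for illustration, and the future is hand-written without `Pin` or `Context`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::Poll;

/// Hypothetical token: a shared flag handed down the await chain.
#[derive(Clone, Default)]
struct CancellationToken {
    cancelled: Arc<AtomicBool>,
}

impl CancellationToken {
    fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished,
    Cancelled,
}

/// A hand-written future that checks the token each time it is polled.
struct Download {
    token: CancellationToken,
    chunks_left: u32,
}

impl Download {
    fn poll(&mut self) -> Poll<Outcome> {
        if self.token.is_cancelled() {
            // Cancellation path: clean up here, then report the result
            // through the normal return value.
            return Poll::Ready(Outcome::Cancelled);
        }
        if self.chunks_left == 0 {
            return Poll::Ready(Outcome::Finished);
        }
        self.chunks_left -= 1;
        Poll::Pending
    }
}

/// Starts a download, cancels it after one poll, and polls again.
fn cancel_midway() -> Poll<Outcome> {
    let token = CancellationToken::default();
    let mut download = Download { token: token.clone(), chunks_left: 3 };
    assert!(download.poll().is_pending());
    token.cancel();
    download.poll()
}
```

Because the token travels separately from the futures, any code in between that knows nothing about cancellation just keeps calling `poll` as usual.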
The last idea I want to explore is request_cancellation, which seems to have been first introduced in some early async vision notes by Niko Matsakis.
This is framed as a replacement Future trait called Async which includes a request_cancellation method.
The idea is that after calling request_cancellation on a future subsequent calls to poll would proceed along the cancellation path rather than the normal execution path.
This has a couple of strengths.
It avoids the possibility of calling poll after calling poll_cancel.
More importantly though, request_cancellation can be used to support recursive cancellation.
After writing this post, I'm actually pretty excited about request_cancellation because it seems strictly more powerful than poll_cancel.
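Here is one way that shape could look, as a sketch only. The trait below is a simplified guess and not the signature from the vision notes: `Pin` and `Context` are omitted, and returning `None` from `poll` to signal "finished via the cancellation path" is an assumption made for this example. `Task` and `polls_to_cancel` are likewise hypothetical.

```rust
use std::task::Poll;

/// Hypothetical, simplified `Async` trait with `request_cancellation`.
trait Async {
    type Output;
    /// `None` means the future finished along its cancellation path.
    fn poll(&mut self) -> Poll<Option<Self::Output>>;
    fn request_cancellation(&mut self);
}

#[derive(Clone, Copy)]
enum State {
    /// Normal path, with this many polls of work remaining.
    Running(u32),
    /// Cancellation path, with this many polls of cleanup remaining.
    Cancelling(u32),
    Done,
}

struct Task {
    state: State,
}

impl Async for Task {
    type Output = u32;

    fn poll(&mut self) -> Poll<Option<u32>> {
        match self.state {
            State::Running(0) => {
                self.state = State::Done;
                Poll::Ready(Some(42))
            }
            State::Running(n) => {
                self.state = State::Running(n - 1);
                Poll::Pending
            }
            State::Cancelling(0) => {
                self.state = State::Done;
                Poll::Ready(None)
            }
            State::Cancelling(n) => {
                self.state = State::Cancelling(n - 1);
                Poll::Pending
            }
            State::Done => panic!("polled after completion"),
        }
    }

    fn request_cancellation(&mut self) {
        self.state = match self.state {
            // First request: switch to a graceful cleanup path.
            State::Running(_) => State::Cancelling(2),
            // Second request: skip the remaining optional cleanup.
            State::Cancelling(_) => State::Cancelling(0),
            State::Done => State::Done,
        };
    }
}

/// Counts how many polls it takes to finish after cancellation is requested.
fn polls_to_cancel(second_request: bool) -> u32 {
    let mut task = Task { state: State::Running(5) };
    assert!(task.poll().is_pending());
    task.request_cancellation();
    if second_request {
        task.request_cancellation();
    }
    let mut polls = 1;
    while task.poll().is_pending() {
        polls += 1;
    }
    polls
}
```

The caller only ever calls `poll`, so there is no way to mix up `poll` and `poll_cancel`, and a second request gives the future a chance to shut down faster.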
Conclusion🔗
In this post we've made an in-depth exploration of how a poll_cancel API would support cancellation handlers in Rust.
The design includes a prototype implementation which allows us to write real programs to get a feel for how cancellation behaves.
In the course of doing this, we realized that poll_cancel has some significant shortcomings and is probably not the best mechanism for cancellation handlers going forward.
But, we also see promise for related proposals to address the specific shortcomings we've identified.

¹
I'm using executor broadly here to basically mean "any code that calls poll on futures directly." This obviously includes async runtimes, but also includes many future combinators like race or join.↩

²
This is a little bit of a lie. They desugar into generators and yield expressions, which do involve a fair amount of compiler magic to implement. The key thing here is that we don't have to do much additional magic if we can rely on the compiler to give us support for generators.↩

³
Indeed, in the early days of async Rust, await! was in fact implemented as a macro.↩

⁴
Well, not quite. Anything can panic, which you can treat as another final state for a function.↩

⁵
Option would work just as well.↩

⁶
TC has also shown that we can emulate coroutines using async/await, so it's probably even possible to do all of this on stable Rust.↩

⁷
We could also add a Context::is_cancelled() method and just pass one parameter. There are a lot of ways to plumb this around.↩

⁸
This is pseudo code. I'm assuming the pinning stuff just works. Also, my actual implementation had some transmute crimes that I've left out here for clarity.↩

⁹
This "complete after cancel" case is one that could reasonably happen. For example, maybe you sent a request to a server, started to cancel it, but before you could the server sent back a response saying the request was completed. One possible behavior is to just drop the return value and say the cancellation was actually successful. In code this would mean replacing the panic!("future completed after being cancelled") line with Poll::Ready(()). The design in this post doesn't do this, but futures themselves are empowered to handle this case however they see fit.↩

¹⁰
If this were Scheme, I'd call this macro something like await/c or await/cancel, but Rust doesn't let us use / in identifiers.↩

¹¹
Incidentally, I'm also not entirely in love with defer {} and async Drop, but I think async Drop in particular solves a lot of problems I don't know how to solve otherwise.↩

¹²
Sometimes I also find it helpful to think of combinators as mini executors, since combinators and executors both call poll functions on other futures directly.↩

¹³
I don't think it would take too much to extend this to support recursive cancellation, but that's left for another post or an exercise for the reader. I think the key thing is you need some way to tell how many times you've been cancelled. One way is to add a depth or count parameter to poll_cancel. Another is to have cancelling a future destroy the old future and create a new future that represents the cancellation of the old one, which could itself be cancelled.↩

¹⁴
Whether we want to do it is a fair question though.↩

¹⁵
This is somewhat related to fusing futures and iterators. I haven't really touched on what happens if you call poll_cancel after the future is cancelled, but I think Boats' earlier proposed RFC on poll_drop_ready makes a pretty good case that poll_cancel should require fused semantics -- that is, that you can call poll_cancel again after it completes and nothing bad happens.↩

¹⁶
For example, I haven't seen a good way to run async destructors without introducing implicit await points. I like that right now we have the property that you can see anywhere an async fn might suspend by looking for await. Although, if I'm totally honest, this may not actually be that useful of a property.↩

¹⁷
The reason for this was to try to make it so we could get away with only having to deal with poll_cancel in the desugaring of await. Given the issue with FuturesUnordered, I don't think we can get away with only calling poll_cancel as part of await and will probably need some kind of compiler-generated drop glue cancellation path. Thus, it's probably simpler and better overall to have poll_cancel called even on futures that haven't been polled yet.↩



Cancellation and Async State Machines
2023-11-08T00:00:00+00:00
If you've been doing async for a while, you've probably heard someone say something like "the compiler takes an async function and converts it to a state machine."
I want to dive into this more, since we can think of cancellation as making the state machine more complex.
In this post, I'll show how to build a state machine for a simple async function.
Then we'll see how the state machine changes if we want to be able to run async code during cancellation.
Finally, we'll explore some of the design space around cancellation, particularly what happens if a future that has been cancelled is cancelled again, and see how state machines can suggest several possibilities.

Let's use the program below as a running example.
In real life, this would probably return a Result, but I want to avoid the extra complexity around additional early exits.
async fn load_data(file: AsyncFile) -> DataTable {
    let mut data = Vec::new();
    let result = file.read_to_end(&mut data).await;
    result.unwrap(); // We're ignoring proper error handling
    parse_data(data)
}

For an async function's state machine, states are made up of the code between await points.
Or alternatively, you can think of await points as edges between states.
For this program, the state machine would look like this:

You might notice I pulled a bit of a fast one on you.
I said await points turn into edges in the state transition diagram, so we'd expect to see just one edge labeled await.
Instead, we have two edges labelled await and one without a label.
What's going on?
First, some conventions.
I realized it's helpful to see some of the traditional control flow in addition to suspension or await points.
I've represented these edges as a solid, unlabeled line.
These mean that control transfers from the previous state immediately to the second state without any suspension.
Our example is a relatively simple straight-line program, so the actual control flow graph isn't particularly interesting, but this will change a little when we look at cancellation.
The other edge we have in this graph is the await edge.
These edges are labeled await and are dotted lines to indicate that execution is interrupted---the future will suspend and give the executor the chance to switch to another future for a time.
Finally, I've introduced a couple of special states that do not exactly correspond to any code the user wrote.
These states are shown in orange.
Now let's turn our attention to why the diagram shows two await edges but the await keyword only shows up once in the program.
Every async fn has an implicit suspend point that represents the time between when the function is called and when it is first polled.
In this diagram, I've represented this as an await edge going from the start state to the first line of the function.
In general, you don't have to worry about this hidden initial suspend point too much because async function calls are almost always immediately awaited.
In other words, it's more common to see foo().await instead of let future = foo(); /* do some other stuff */; future.await.
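If it helps to see the same state machine as code, here is a hand-written sketch of roughly what it could look like. This is not what the compiler actually generates: `AsyncFile`, `DataTable`, and `parse_data` are toy stand-ins for the types in the example, the `ready` flag fakes IO completion, and `Pin` and `Context` are omitted.

```rust
use std::task::Poll;

// Toy stand-ins for the types in the running example.
struct AsyncFile {
    contents: Vec<u8>,
    ready: bool,
}

#[derive(Debug, PartialEq)]
struct DataTable(Vec<u8>);

fn parse_data(data: Vec<u8>) -> DataTable {
    DataTable(data)
}

/// One variant per state; each holds what is live at that suspend point.
enum LoadData {
    /// Called but not yet polled: only the argument is captured.
    Start { file: AsyncFile },
    /// Suspended at `read_to_end(..).await`.
    Reading { file: AsyncFile, data: Vec<u8> },
    /// Returned; polling again is a bug.
    Done,
}

impl LoadData {
    fn new(file: AsyncFile) -> Self {
        LoadData::Start { file }
    }

    fn poll(&mut self) -> Poll<DataTable> {
        loop {
            match std::mem::replace(self, LoadData::Done) {
                LoadData::Start { file } => {
                    // Unlabeled edge: run up to the await without suspending.
                    *self = LoadData::Reading { file, data: Vec::new() };
                }
                LoadData::Reading { mut file, mut data } => {
                    if !file.ready {
                        // `await` edge: the read isn't finished, so suspend.
                        // Pretend the IO completes before the next poll.
                        file.ready = true;
                        *self = LoadData::Reading { file, data };
                        return Poll::Pending;
                    }
                    data.extend_from_slice(&file.contents);
                    return Poll::Ready(parse_data(data));
                }
                LoadData::Done => panic!("polled after completion"),
            }
        }
    }
}

/// Polls twice: the first poll suspends at the await, the second completes.
fn run_load_data() -> (bool, Poll<DataTable>) {
    let mut fut = LoadData::new(AsyncFile { contents: vec![1, 2, 3], ready: false });
    let first_was_pending = fut.poll().is_pending();
    (first_was_pending, fut.poll())
}
```

The `Start` variant is the hidden initial suspend point: constructing the future with `new` does no work until the first `poll`.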
Cancellation🔗
The state machine we've looked at so far does not do a good job of representing cancellation.
Let's try to extend it to do so.
Today in Rust cancellation simply means you stop polling the future, and instead it is dropped.
When dropping something like a closure or a future returned by an async fn, Rust needs to recursively drop the values stored in (in other words, captured by) the closure or future.
Depending on what state the future is in when it is dropped, there are different values that need to be captured.
In our example, if we drop the future before we poll it, we only need to drop the AsyncFile that was passed in as a parameter.
On the other hand, if the future is dropped at the await point, we also need to drop the Vec that we read the file contents into.
We can add some extra states to our graph to illustrate this.

I've specifically called out drop along the cancellation path, but Rust also drops values during the normal exit path.
I've left the normal drops out for simplicity.
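We can observe this with a small experiment. The sketch below models the future as an enum whose variants hold the values captured in each state; `Tracked` is a hypothetical helper that records its name when dropped, standing in for the real `AsyncFile` and `Vec`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name when dropped so we can observe the cancellation path.
struct Tracked {
    name: &'static str,
    log: Log,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// The values captured at each suspend point of `load_data`.
#[allow(dead_code)]
enum LoadData {
    /// Not yet polled: only the parameter is captured.
    Start { file: Tracked },
    /// Suspended at the await: the buffer is captured too.
    Reading { data: Tracked, file: Tracked },
}

/// Builds the future in the requested state, cancels it by dropping it,
/// and returns the names of the values that were dropped.
fn dropped_when_cancelled(at_await_point: bool) -> Vec<&'static str> {
    let log: Log = Rc::default();
    let file = Tracked { name: "file", log: log.clone() };
    let fut = if at_await_point {
        let data = Tracked { name: "data", log: log.clone() };
        LoadData::Reading { data, file }
    } else {
        LoadData::Start { file }
    };
    drop(fut); // cancellation today: stop polling and drop
    let dropped = log.borrow().clone();
    dropped
}
```

Dropping in the `Start` state only touches the file, while dropping in the `Reading` state has to clean up the buffer as well.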
I like thinking of async functions this way because we can use it to make several observations about cancellation in Rust.
Many of these seem rather obvious, but they raise important requirements for designing a system that can handle cancellation well.
Observation 1: Cancellation is a state change. When we cancel a future, it transitions from its normal running states to a cancellation path.
Currently this happens implicitly when a future is dropped, but in the future we will probably want a way to explicitly transition a future to its cancellation path.
Observation 2: Async cancellation handlers¹ require adding await points on the cancellation path. At the moment, cancelling futures is synchronous.
This shows up in the async state graph in the fact that there are no await edges on the cancellation path.
If we want to allow for cancellation handlers, we will need to add await points in the cancellation path.
This may be obvious, but this also implies we need a way to make sure executors continue to poll futures that have been cancelled.
Observation 3: Cancellation is an alternate exit. An async function that has been cancelled does not exit through the normal return path.
From the perspective of an async function author, this shows up as the function not continuing to execute past an await point.
From a types standpoint, a function cannot exit normally in general because we may not yet have a value of the right type to return.
In our example we can see that the type of the function does not allow it to exit at the await point, because at that point we have not created a DataTable to return.
This observation has implications that will show up in the types of the API we eventually design for cancellation handlers.
Cancellation Cancellation🔗
Another thing we can explore with a state graph is what behaviors are possible if a cancelled future is cancelled again.
One common way this could happen is if you have something like a race combinator that returns the value of the first future to complete and cancels the other one.
If the race combinator is itself cancelled while it was cancelling the slower sub-future, the slower sub-future would be cancelled twice.
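Here is a minimal sketch of that scenario. It assumes a hypothetical `poll_cancel`-style method for running cancellation handlers, with `Pin` and `Context` omitted, and the `Race` combinator only follows the path where its first future wins. It also assumes that calling `poll_cancel` on an already-cancelling future is allowed.

```rust
use std::task::Poll;

/// Hypothetical simplified future trait with a cancellation path.
trait CancelFuture {
    fn poll(&mut self) -> Poll<()>;
    fn poll_cancel(&mut self) -> Poll<()>;
}

/// Completes on the first poll.
struct Fast;

impl CancelFuture for Fast {
    fn poll(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
}

/// Never completes, and needs `cleanup_polls` polls to finish cancelling.
struct Slow {
    cleanup_polls: u32,
    cancel_calls: u32,
}

impl CancelFuture for Slow {
    fn poll(&mut self) -> Poll<()> {
        Poll::Pending
    }
    fn poll_cancel(&mut self) -> Poll<()> {
        self.cancel_calls += 1;
        if self.cleanup_polls == 0 {
            return Poll::Ready(());
        }
        self.cleanup_polls -= 1;
        Poll::Pending
    }
}

struct Race<A, B> {
    a: A,
    b: B,
    /// Set once `a` has won and we have started cancelling `b`.
    cancelling_b: bool,
}

impl<A: CancelFuture, B: CancelFuture> CancelFuture for Race<A, B> {
    fn poll(&mut self) -> Poll<()> {
        if !self.cancelling_b {
            if self.a.poll().is_pending() {
                // This sketch only follows the path where `a` wins.
                let _ = self.b.poll();
                return Poll::Pending;
            }
            self.cancelling_b = true;
        }
        // `a` finished first: cancel the loser before returning.
        self.b.poll_cancel()
    }

    fn poll_cancel(&mut self) -> Poll<()> {
        // If `a` already won there is nothing left to cancel on that side.
        let a = if self.cancelling_b { Poll::Ready(()) } else { self.a.poll_cancel() };
        // `b` may already be mid-cancellation: this is the second cancel.
        let b = self.b.poll_cancel();
        if a.is_ready() && b.is_ready() { Poll::Ready(()) } else { Poll::Pending }
    }
}

/// Cancels the race while it is still cancelling the slow future and
/// returns how many times the slow future's `poll_cancel` was called.
fn cancel_race_mid_cancellation() -> u32 {
    let mut race = Race {
        a: Fast,
        b: Slow { cleanup_polls: 2, cancel_calls: 0 },
        cancelling_b: false,
    };
    // First cancellation: `Fast` wins, so the race starts cancelling `Slow`.
    assert!(race.poll().is_pending());
    // Second cancellation: whoever owns the race gives up on it.
    while race.poll_cancel().is_pending() {}
    race.b.cancel_calls
}
```

The slow future sees cancellation arrive through two different call chains: once from the race's normal execution, and then again from the race's own cancellation path.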
Let's look at this in the abstract with state machines.
There are a couple of possibilities for how to handle cancellation of cancellation.
I'll consider three of them, inspired by the zero one infinity rule.
0. Cancelling during cancellation is not allowed🔗
Once we have support for cancellation handlers, it will definitely be possible to write code that leads to trying to cancel a cancellation.
The race example we mentioned earlier is one example.
So in this option, we would declare cancelling a cancellation to be an error.
We have some flexibility on what mechanism we'd use exactly, but I think the best option would be to panic.
I think in practice this option is not feasible.
Cancellation flows from top to bottom (e.g. an executor decides to terminate a task early and so runs the task's cancellation handler), but the higher levels do not know anything about the internal behavior of futures.
An executor that is cancelling a task does not know if one of the task's subfutures is trying to cancel a future already.
1. Cancelling a cancellation is idempotent🔗
In this version, cancelling an already-cancelled future is basically a no-op.
In state machines, it would look something like this:

The key point here is that any of the cancel states have a cancellation edge that comes back to the same state.
In other words, cancelling once your future has already been cancelled means you stay in the same state and continue executing the cancellation handler as before.
What does this mean in practice?
It essentially means you can trust that your cleanup code in a cancellation handler will run to completion.
Admittedly, this might take additional rules, like we may want to declare it to be undefined behavior to not poll a cancelled future to completion².
Scoped tasks would likely need this guarantee, but we could consider weaker ones, like that a "well-behaved" executor will poll cancelled futures to completion.
The "well-behaved" guarantee is roughly what we have today for Drop, so it might be similarly useful.
The downside is that this also means we can add cancellation behavior that can take arbitrarily or even infinitely long.³
We might decide instead that cancellation means something like "request graceful shutdown" but then forcibly terminate a future if it takes too long.
For this we need recursive cancellation.
∞. Cancelling a cancellation is recursive🔗
In this version, cancelling an already-cancelled future would transfer us to a separate cancellation path.
That cancellation path could also be cancelled, and its cancellation could be cancelled, and so on.
In pictures, recursive cancellation looks like this:

While an infinite regress of cancellations might seem ridiculous, there are some cases where it might be useful.
There's also a nice regularity to it.⁴
One class of problems where this might be useful are cases where you have optional cleanup work to do but you can cancel it if needed for a more prompt shutdown.
Of course, I'm not sure this is really all that useful in practice, and if you need it there might be other ways to do it.
More importantly, there are many cases where you absolutely do not want to cancel the cancellation.
For example, maybe you have a transaction future whose cancellation path rolls back the transaction.
You do not want to stop the rollback before it's complete, or else you've completely defeated the purpose of transactions.
That said, recursive cancellation appears to be strictly more powerful than idempotent cancellation because if you have recursive cancellation you should be able to implement idempotent cancellation where needed (basically, you just ignore the subsequent cancellation signals and stay in the same state you were in).
Seen this way, recursive cancellation gives us a lot of flexibility.
It means individual futures can implement either behavior, according to what best fits their needs.
The main thing the Rust language would need to do is design reasonable defaults and set expectations so people authoring futures can encapsulate their specialized behavior.
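To sketch how both behaviors can sit behind one recursive interface, consider the toy model below. Everything in it is hypothetical: `Cancel` models only the state transitions (no polling at all), `OptionalCleanup` treats every signal as a move to a deeper cancellation path, and `Transaction` ignores repeated signals so its rollback is never interrupted.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Running,
    /// Running the cancellation handler at nesting depth `n`
    /// (1 means the first cancellation).
    Cancelling(u32),
}

/// Hypothetical interface: each `cancel` call is one cancellation signal.
trait Cancel {
    fn state(&self) -> State;
    fn cancel(&mut self);
}

/// Recursive: every signal moves to a deeper cancellation path.
struct OptionalCleanup(State);

impl Cancel for OptionalCleanup {
    fn state(&self) -> State {
        self.0
    }

    fn cancel(&mut self) {
        self.0 = match self.0 {
            State::Running => State::Cancelling(1),
            State::Cancelling(n) => State::Cancelling(n + 1),
        };
    }
}

/// Idempotent, built on the same interface: later signals are ignored
/// so the rollback always runs to completion.
struct Transaction(State);

impl Cancel for Transaction {
    fn state(&self) -> State {
        self.0
    }

    fn cancel(&mut self) {
        if self.0 == State::Running {
            self.0 = State::Cancelling(1);
        }
    }
}

/// Sends two cancellation signals and reports where the future ended up.
fn cancel_twice<C: Cancel>(mut fut: C) -> State {
    fut.cancel();
    fut.cancel();
    fut.state()
}
```

Both types implement the same trait, and the choice between the two semantics is made entirely inside each future.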
Conclusion🔗
We've long talked about async functions as state machines, so in this post we looked at how you might draw a state transition diagram for async functions.
This gave us a way to play with cancellation and look at what various cancellation semantics might imply in terms of the shape of the state transition diagram.
I've found it really helpful to think about async cancellation this way, so I hope others find it useful as well!
This post was originally part of a larger post about implementing a prototype of async cancellation handlers.
The larger post was taking a long time and I felt like the content in this post was useful on its own so I wanted to go ahead and publish it.
While I no longer like to promise that a followup post is coming soon⁵, I do have most of the longer post drafted so chances are good I will get it out soon.
Plus, I did commit to discussing it at the WG Async Reading Club next week, so there is a little pressure on.
Anyway, please reach out if you have any thoughts or questions!

¹
I'm using "cancellation handlers" to refer broadly to mechanisms that allow running async code on the cancellation path. This would likely be async Drop, but I want to use a more general term to emphasize there are multiple possibilities here.↩

²
This will require us to mark something unsafe somewhere.↩

³
This is true of drop already today. I can write fn drop(&mut self) { loop {} } and my program will hang when the destructor tries to run.↩

⁴
One of the things that bugs me about the idempotent version of cancellation is that you can call any future from either a normal execution or cancellation path, but in the cancellation path they effectively become uncancellable. It's not actually a problem, since not cancelling a future is always a choice you're allowed to make, but the asymmetry still bothers me.↩

⁵
I'm sure you'll find plenty of examples on my blog of posts I said were coming soon that did not, in fact, come soon, if they ever came at all.↩




Ideas on How to Elect Rust Project Directors
2023-07-11T00:00:00+00:00
One of the first tasks for the Rust Leadership Council is to elect new Project Directors.
But before we can do that, we need to create a process for doing so.
To do this, and in the spirit of delegation from the Leadership Council, we've formed a smaller group to focus on designing this process.
This group so far consists of myself, Jane Losare-Lusby, and Ryan Levick.
We have the beginnings of a proposal, but I wanted to write it up in my own blog to help make sure I understand it.
Note that this is a draft proposal at best at this point and nothing is set in stone.
I also want to recognize Jane's work in coming up with this process.
This is largely based on her initial suggestion and I want to make sure I'm not taking credit for something I didn't come up with.
But, any failings in this post should be viewed as my own and not hers.

What are Project Directors?🔗
Rust is split into two major organizations: the Rust Foundation and the Rust Project.
The Foundation does a few things.
It provides a legal structure to hold Rust's intellectual property.
It provides an entity for organizations to contribute financially to support Rust.
It supports the long term health of the Rust project.
One way it does this is through the Community Grants Program.¹
The Foundation is governed by a Board of Directors, and five of the seats on the Board of Directors are reserved for members of the Rust Project.
These seats are known as Rust Project Directors.
Project Directors are meant to serve for a term of two years with the hope that we can stagger terms and rotate out a subset of the directors this year.
Unfortunately, due to the lack of Rust Project governance over the better part of the last two years, we have not appointed new directors in place of those whose terms are completed.
Instead, the Foundation Board has voted several times to extend the terms.
Currently the terms are set to expire on September 21, 2023 so we'd like to be able to appoint new ones without having to ask for another extension.
How should we select Project Directors?🔗
There are a number of desired features and constraints on this process.
First of all, the Foundation bylaws state that Directors must be elected by those they represent.²
In the case of Project Directors, this means they must be elected by the Rust Project, and the Project governance is set up so that the electors will be the Rust Leadership Council.
There is some flexibility in what counts as an election though.
For example, we could follow some other selection process and the Council could then vote to ratify the results of that process.
So one possible process is to have the Council pick and vote on a set of directors without any input from the rest of the project.
This would be a bad process.
We want something that gives the Rust Project a chance to provide input to the process.
And we want some transparency in how the Project Directors were elected.
Doing a more traditional election would also be at odds with Rust's culture, which tends to prefer consent based decision making rather than rule by majority or plurality.
A possible process🔗
With this background in mind, we can now discuss a possible process for selecting Project Directors.
I've based this off of the notes here, and the process described there is heavily influenced by Sociocracy for All's Selection Process.
I think of the process as a bottom-up process, so I'm going to describe it in those terms.
We start by soliciting nominations from each top level Rust team.
These nominations go to the Council, which will do the final selection.
Let's look at these in more detail.
Gathering Nominations🔗
When we kick off the process, we will start by telling all the Rust teams that they should begin nominating candidates for Rust Project Directors, with a deadline for when nominations will be closed.³
Teams should look at the role description and identify people they think would be qualified candidates.
These candidates will likely come from the team itself, but there's no requirement.
Teams can nominate anyone who they believe meets the qualifications that will be set forth in the role description.
We aren't planning to impose a requirement to nominate a certain number of candidates.
It doesn't really make sense to nominate more than the number of vacancies, but there's no reason a team couldn't do that.
Similarly, a team may choose not to nominate anyone, or they may do this by default if the deadline expires.
We plan to leave the process for nominating candidates up to the teams, with a strong suggestion to follow a miniature version of the process the Council will follow.
For the purposes of accountability though, I think it makes sense to have the team's Council Representative drive the process, although "driving the process" might mean delegating to someone else who wants to run the process.
The main reason for this default is to make sure someone is responsible for making progress.
Once the team has selected a set of candidates, they should report these to the Council.
The team's council representative will be responsible for communicating these to the Council as a whole.
One of the goals of this project is to gather feedback to help members of the project grow.
Thus, I think it would make sense for the team to provide their nominees as a document (we might even provide a template) that lists the nominees and why they were chosen.
It might also make sense to include a list of people who were considered but not nominated, and why they weren't nominated.
I would hope this is a positive experience, so we don't say "we didn't nominate person X because they're terrible," but more as a way of highlighting rising stars.
For example, we could say "Person Y was considered and shows promise, but we would like to see more growth in these areas first. Please consider them in the future."
Selecting Candidates🔗
The next step is for the Council to select the Project directors from the pool of nominees.
There may not be much to decide here, since it's quite likely that we will have exactly as many nominees as there are openings.
But even in this case, we want to have a defined process that we follow.
The draft proposal says the Council should select a facilitator to lead the process.
The Council would then go through a round process, where each council member proposes a candidate from the nominees and explains why they think the candidate is a good choice.
After the first round, the Council goes around again in a change round, which gives everyone the chance to change their nomination based on the discussion so far.
Once this is done, the facilitator takes all of the suggestions and proposes one candidate.
The Council then consents to this choice, or if there are objections then the facilitator proposes a new candidate.
The process as I've described it so far is an iterative process, meaning we'd run the process to select one candidate, then do it again to select the second, and so on until we've filled all the open seats (two or three, in this case).
The subsequent rounds should be much faster than the first, because we can reuse most of the information gathered in the first phase.
An alternate way to do this would be as a batch process, where we pick the whole set of candidates at once.
I think this would be my preference for a couple of reasons.
First off, it's likely to be more time efficient, since we only have to do one process.⁴
Secondly, and more important to me, is that picking all the candidates at once allows us to more directly look at characteristics of the candidates as a group.
We'll already need to account for employment constraints, but choosing the set of directors as a group also lets us more directly make sure we have broad representation within the project.
In my mind, the thing we want to select for is a successful group more than any individual characteristics of the members.
Anyway, the batch process would be essentially the same, only instead of going through rounds proposing individuals, we'd propose a set of individuals to fill all the seats at once.
Once the Council has consented to a set of candidates, we'll have a vote to ratify the selection.
Since we've already heard all objections and consented to the selection, this would be expected to be a unanimous vote.
The main purpose here is to make sure we are meeting the requirement in the Foundation bylaws to elect the Project Directors.
After the vote passes, the process is complete.
The Council would then announce the results and the new Project Directors would take office once the outgoing Directors' terms end.
What's Next🔗
I've described my interpretation of the process we have in mind so far, along with some of my own opinions and additions to it.
My main goal here is to explain what I'm thinking so we can make sure the project director election group has a shared understanding of the proposal, and also to fold any relevant thoughts back into the proposal.
But I'd also like this to serve as a chance to raise awareness and solicit feedback.
If this is something you're interested in, please come see us at #council/project-director-election-proposal on Zulip.
To give us enough time to follow this process, we are going to try to reach consensus on the proposal within the not too distant future.
Look for official communication from the Rust Leadership Council once that happens.
Thanks to Jane Losare-Lusby for reviewing this post.

¹
This isn't an exhaustive list, and having a more complete understanding of everything the Foundation does is something I'm definitely working to build.↩

²
We've been interpreting this to mean we need to have a vote, but there's some ambiguity here. For example, maybe "selection" and "election" are just synonyms for each other. To play it safe though, we expect to have a ratification vote following the selection process we're currently designing.↩

³
The deadline is important here because we need to choose Project Directors by the time the current term ends on September 21.↩

⁴
Of course, it may turn out that picking three people at once is actually much much harder than picking one person three times in a row.↩




An Exercise on Culture
2023-06-23T00:00:00+00:00
A few days ago at work we did an exercise on company culture that got me thinking so I thought I'd share some of those thoughts here.
I've been interested in organizational culture for a few years now.
Some organizations seem to have a great culture while other organizations have a not so great culture.
Sometimes culture is improving, and other times it is declining.
The declining case can be particularly frustrating because in my experience everyone can see it's getting worse, everyone wants to change it, but nobody knows how.
One of the reasons I chose to come work for Microsoft is that they seem to be one of the few examples of a large, established organization that intentionally and dramatically changed their culture.
I've been curious how they did that, and what other organizations can learn from that.
It seems like an important part of it is to have regular conversations about culture, such as by doing exercises like the one I'm going to discuss in this post.
So with this background in mind, let's talk about the exercise.

At the start of the exercise, we were reminded of the company mission statement.
Then we were asked to spend a couple of minutes coming up with a sentence explaining how we personally contribute to it.
According to Microsoft's About page, Microsoft's "mission is to empower every person and every organization on the planet to achieve more."
This mission actually speaks to me a lot.
I feel like computers should be tools that help people do the things they want to do.
Too often today, computers seem to actively work against their user instead.
I have a kind of lengthy rant on this subject that I should probably write down someday, but that will be for another time.
So how did I respond to the exercise?
I wrote:

In my work, I empower people to achieve more by creating powerful and accessible languages and APIs, and by helping to build a team that effectively does the same.

I felt rather proud of this answer, which is why I wanted to write about it in more depth.
I want to do this by going into more detail about the main phrases I used.
The first is powerful and accessible.
In my mind, Rust fits the bill really well here.
Rust is an incredibly powerful language.
Features like the borrow checker can ensure safe memory access for some complex patterns without any runtime overhead.
The trait system provides incredible opportunities for abstraction.
Rust provides good support for low level programming, while still including potent functional programming features like lambdas and algebraic data types.
But what really impresses me about Rust is that it manages to make all this power usable to many programmers.
Rust isn't the first language to do all these things, but some of the other languages essentially require a Ph.D. to use them effectively.¹
On the other hand, things like Rust's extreme attention to detail in its error messages greatly increase the chance that programmers will be successful with such a powerful language.
The second is languages and APIs.
I thought about writing "languages and libraries" because I like the alliteration but libraries seem too broad.
I don't often write an entire library, while I might add a single function to an existing library.
I felt like "API" better expressed that scope, and I've even heard a library called an API at times so this can generalize if needed.
The reason I mentioned these two together is that I believe it's best to consider a programming language and its standard library as a unit.
As a language nerd, I can easily get caught up in the excitement around designing new language syntax and semantics.
These language features need to be supported by a solid, well-rounded library.
To make an analogy to spoken languages, I think of the programming language as the grammar and the standard library as the vocabulary.
It's hard to say much of anything with only one.
Finally, I mentioned the importance of helping to build a team.
This is the aspect I have the least experience in, but it's important and I'm trying to learn more about how to do it.
A well-functioning team can accomplish far more than an individual!
These teams don't form (or at least, aren't maintained) by accident.
So it's important to do things like mentor new team members and to create a welcoming place so new people can feel comfortable joining the team.
It's important to find ways to help grow members into new roles so that the team can outlive any one member.
It's important to foster a sense of shared purpose and values so we can effectively work together.
It's important to make space for disagreement, since often I find the best outcomes are a result of hashing out our differing perspectives.
Doing an exercise like this can feel pretty cheesy but this one resonated with me a lot.
It was a chance to put into words why I do the things I do, both in my day job and how I bring those things into the Rust community.
And now, having put this into words, I can use it as a guide when deciding how to approach my work in the future.
While this post was specific to my role within Microsoft, I think doing a similar exercise with Rust's mission would be illuminating.
Maybe I'll do that in another post.
What about you?
What values do you hold and how do you put those into action?

¹
I feel like Rust does not need a Ph.D. to use well, but given that I have one, I may not be the most qualified person to make this claim.↩




I'm Here to Serve
2023-06-20T00:00:00+00:00
Today we announced on the Rust Blog that the Leadership Council has been created and is now the top level governance body of the Rust Project.
I am the Compiler Team's representative to the Council, so I wanted to share a little more about how I hope to approach this role.
The Leadership Council is new, and in many ways its first tasks will be to define what it is.
We know it's sort of a replacement for the Core Team, but it's also supposed to be significantly different.
A lot of our first tasks are going to seem relatively mundane: figuring out when we regularly meet, how to propose items to the agenda, how we communicate what we're working on, etc.
After that, we can get on to the "more substantial" questions.
One colleague of mine told me once that Rust has at least two years of governance debt, and given that they said that two years ago, at this point we probably have at least four years of governance debt!
While we figure these things out, I know there are a few things I can say about myself and values, and how I hope I can bring these to the Leadership Council.
Keep in mind that these are my own opinions.
I'm not speaking for the Leadership Council or the Compiler Team, and the priorities I suggest here will evolve over time.

Desiring the Work🔗
When deciding whether to nominate myself for this role, I spent a lot of time thinking about why I wanted to do it.
To me, the question came down to how much I wanted to do the work.
The best leaders I've seen in my life are the ones who saw their job as serving those they lead.
I want to embody this mindset as I serve on the Leadership Council.
Most of my recent Rust contributions have been primarily within the Async Working Group.
Lately, I've found myself thinking more about the project and community as a whole.
For example, how do we make Rust more welcoming to those who want to contribute?
Or how can we make sure all the components of Rust work together as a whole?
How can we build excitement for contributing, including contributions we might think of as non-technical?
I realized that questions like these are the kinds of questions the Leadership Council should be thinking about.¹
Given that I've already been thinking about these questions, joining the Leadership Council became a clear opportunity to actually get to work on these things.
I'm Here to Listen🔗
One of the most important things I can do, especially at the start, is to listen.
I will be actively reaching out to the leaders in the Rust community to find out what they need and how I can best serve them in particular and the community in general.
I am also going to make myself available for office hours.
I have set up a Bookings page where you can schedule a 30 minute meeting with me.
Please feel free to use this if you'd like to have a synchronous chat about something related to the Rust Leadership Council or the Rust project in general.
To make this easier to find, I've added a new top level page to this site where I'll keep up to date information about how to book an office hours appointment with me.
I'm Here to Share🔗
In the conversations I've already had with folks around Rust governance, one of the clear themes that has come up over and over is that we need more transparency in Rust leadership.
Fortunately, I believe all of us on the Council agree with this and are committed to improving transparency.
I believe most of this transparency should come through official channels, such as published minutes from Leadership Council meetings.
That said, I intend to supplement these official communications by sharing about my thinking as it relates to the Leadership Council.
This post is an example, and I will continue with more like this.
How Can I Help You?🔗
I wanted to share some of how I'm thinking about my role on the Leadership Council, and the things I plan to do.
I'll be honest, I'm a little scared even to post this, because if I fail at these goals it will be obvious.
I think this accountability is good.
If this is the last you hear from me about this, then I've failed as a leader, and people should know that.
But I also may not have the right priorities.
There's a lot we don't know, and almost everything I've written here may need to change.
When changes are needed and made, I promise to be transparent about them.
Please help me to be a good servant and leader for the Rust community.
And with that, I want to close with an explicit call for feedback.
What do you think of my priorities here?
If I do these well, will you be happy to have had me on the Leadership Council?
What are some things I've missed or should do instead?
Please send me your feedback, either by joining me in Office Hours, DMing me (eholk) on the rust-lang Zulip or Mastodon, or emailing me at eric@theincredibleholk.org.
Conclusion🔗
I'm thrilled to have met the other members of the Leadership Council.
I think we have a great group of people who all bring important background, perspectives, and skills to the team.
I'm excited to work with them to make Rust the best language and community it can be!

¹
This doesn't mean the Leadership Council is necessarily the right place to solve them. One of the main goals of the governance RFC was that the Council should primarily look to delegate to more suitable teams and to create those teams when they don't exist.↩




Lightweight, Predictable Async Send Bounds
2023-02-16T00:00:00+00:00
The last week or two has been exciting in Rust async land.
We're making great progress on one of the open questions around async functions in traits, and I think we're close to being ready to propose something officially.
In this post, I'd like to describe the proposal and discuss some of the tradeoffs and open questions with it.
We've had a couple of ideas going around so far.
One of the main ones is Return Type Notation (RTN), which Niko describes in his recent post.
In my last post, I suggested that we could infer the necessary bounds in many cases.
While I was excited about inferring bounds at first, one major shortcoming is that it creates new semantic versioning hazards.
The inference depends on the body of the function you've annotated, which means when modifying the function you could easily add or remove bounds from the signature by accident.
In the discussions we've had since then, we have been converging on a solution that we expect will work in the common cases, but avoids both the verbosity inherent in RTN and the semver hazards with inferring bounds.
This is the solution I'll be describing in this post.

Recap: Two Versions of the Send Bound Problem🔗
One of the things I've realized is that there are two variants of the Send Bound Problem.
I'll call these the Promise and Require variants.
The Promise variant is "how can I promise that my async function will always return a Send future?"
There are several subvariants.
We may want to define an async trait so that all implementors must always have Send implementations.
This is what the #[async_trait] macro does by default.
Or, even if the trait does not require it, we may want to make this promise in our implementation.
And finally, for just a bare async fn, we may want to be able to make the same promise.
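
For comparison, here is a minimal sketch of how the first and last of these promises can be written by hand in current Rust (1.75 or later), by spelling out the returned future instead of using `async fn`. The `Server` field, the `AlwaysReport` type, and the `answer` and `is_send` helpers are hypothetical names made up for illustration. The check returns a `bool` only so there is something to observe.

```rust
use std::future::Future;

// Hypothetical stand-in for the `Server` type used in this post.
struct Server {
    healthy: bool,
}

// Trait-level promise: every implementation must return a `Send` future.
trait HealthCheck {
    fn check(&mut self, server: Server) -> impl Future<Output = bool> + Send;
}

// Hypothetical implementor that simply reports the server's own flag.
struct AlwaysReport;

impl HealthCheck for AlwaysReport {
    fn check(&mut self, server: Server) -> impl Future<Output = bool> + Send {
        // The compiler rejects this body if the async block is not `Send`.
        async move { server.healthy }
    }
}

// Bare-function promise: the hand-written form of an async function
// whose returned future is guaranteed to be `Send`.
fn answer() -> impl Future<Output = i32> + Send {
    async { 42 }
}

// Only compiles when `T: Send`, so calling it is a compile-time check.
fn is_send<T: Send>(_value: &T) -> bool {
    true
}
```

The cost of this form is verbosity, and the promise is all or nothing: once the trait says `+ Send`, no implementation can opt out.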
The Require variant is "how can I require that the implementation I'm given can be used in a Send context?"
This is looking at the use side rather than the definition side.
Let's recall the do_health_check example:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static
{
    spawn(async move {
        health_check.check(server).await;
    });
}

We want to make sure that, although the HealthCheck trait does not require that implementations return Send futures, we can only call do_health_check with implementations that do, so that we can spawn the background task.
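
To make the Require variant concrete, here is a rough sketch in current Rust that desugars the async method by hand into an associated future type, so the use site has something to put a bound on. This is not the proposal, only an illustration of the requirement. The names `CheckFuture`, `Ping`, `ThreadWaker`, and `block_on`, the `healthy` field, and the thread-per-task `spawn` are all assumptions made for the example, and the check returns a `bool` so the result can be observed.

```rust
use std::future::{ready, Future, Ready};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, JoinHandle, Thread};

// Hypothetical stand-in for the `Server` type used in this post.
struct Server {
    healthy: bool,
}

// Hand-desugared trait: the trait itself does not require `Send` futures.
trait HealthCheck {
    type CheckFuture: Future<Output = bool>;
    fn check(&mut self, server: Server) -> Self::CheckFuture;
}

// Hypothetical implementor whose future happens to be `Send`.
struct Ping;

impl HealthCheck for Ping {
    type CheckFuture = Ready<bool>;
    fn check(&mut self, server: Server) -> Self::CheckFuture {
        ready(server.healthy)
    }
}

// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// Tiny single-future executor, just enough to drive the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

// Stand-in for a multithreaded executor's `spawn`: the future moves to
// another thread, so it must be `Send + 'static`.
fn spawn<F>(fut: F) -> JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(fut))
}

// The Require variant: the `Send` requirement lives at the use site.
fn do_health_check<H>(mut health_check: H, server: Server) -> JoinHandle<bool>
where
    H: HealthCheck + Send + 'static,
    H::CheckFuture: Send, // without this bound the `spawn` call is rejected
{
    spawn(async move { health_check.check(server).await })
}
```

The extra `H::CheckFuture: Send` line is the piece that a real `async fn` in a trait gives us no name for, which is the gap this post is trying to close.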
My feeling so far has been that the Promise variant is the easier one to solve, so it is easy to accidentally start talking about that one, while the Require variant is the more important problem.
In this post, I will only be talking about the Require variant, although I suspect the proposal may generalize to the Promise variant and I may speculate on that.
The Proposal🔗
Without further ado, here is the proposal.
First, we require async traits to be declared as such.
This means the HealthCheck trait we've been talking about gains an additional async keyword:
trait async HealthCheck {
    async fn check(&mut self, server: Server);
}

For the most part, we can think of this new async as becoming part of the name of the trait.
It's no longer just HealthCheck, but async HealthCheck.
Declaring a trait with async means the trait is allowed to have async methods.¹
Because we've changed the name of HealthCheck, we have to change where we use the trait as well:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: async HealthCheck + Send + 'static
{
    spawn(async move {
        health_check.check(server).await;
    });
}

This new async keyword in the where clause does a couple of things.
First, it's a hint that the trait has async methods.
More importantly, it gives us a place to hang additional bounds if needed.
Because we are spawning a future that awaits calls from this trait, we need a Send bound.²
So, to notate this, we'd use async(Send) in the bound:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: async(Send) HealthCheck + Send + 'static
{
    spawn(async move {
        health_check.check(server).await;
    });
}

The trait name async(Send) HealthCheck would mean the HealthCheck trait, with async methods, all of which have a Send bound on their returned futures.
So that's the proposal in a nutshell.
One thing I'd like to point out is that although so far we've only talked about Send bounds, I'm imagining that the grammar would allow any bound on the async keyword (although it might make sense to limit it to auto traits).
For example, one could imagine writing:
fn foo<T>(x: T)
where
    T: async(Send + Clone + Debug) MyTrait
{
    ...
}

In practice, it's probably hard to implement Debug on the future returned by an async method...³
Discussion🔗
There's a lot I like about this proposal.
It's relatively lightweight syntactically, but we assume it's powerful enough to meet the common cases.
To be honest, we don't actually know how common it will be that users want to have some methods that are Send but some that are not.
The fact that #[async_trait] works well suggests that the all or nothing approach should be fine in many cases.
If there are cases where users need to be more precise, however, we can still provide return type notation for those advanced use cases.
The semantics of these new bounds seems easy to explain.
We don't have to talk about looking at function bodies and we definitely don't have to mention anything about flow-sensitivity, while we might if we did something that relied on more inference.
This helps keep Rust explicit and predictable as a language, without being burdensome.
This proposal also dovetails nicely with several others that are currently in progress, and is immediately open to more generalizations.
The syntax we've introduced here actually largely comes from the keyword generics initiative.
Although we have not talked about maybe-async bounds (written ?async), the syntax is completely consistent with what we've seen here.
I know I said I wasn't going to talk about the Promise variant of the Send bounds problem, but it's a relatively small change from what we have here to also allow trait modifiers anywhere else the async keyword is allowed.
For example, we could write the following to declare an async function whose returned future is guaranteed to be Send:
async(Send) fn foo() -> i32 { ... }

That said, I think there may be better ways to solve this problem,⁴ so I don't want to dwell too much on this just yet.
Open Questions🔗
We still have a few open questions though.
I'll briefly touch on these here.
Should methods be Send by default?
Either way is feasible.
For example, methods could be Send by default and you could use async(?Send) to opt out, or they could be not assumed to be Send by default and you use async(Send) to opt in.
There are arguments for both, but fortunately it's a relatively minor detail and is easy to go either way.
How does this interact with supertraits and trait aliases?
We have some time to figure this out for aliases, since trait aliases aren't a thing yet.
For supertraits, we can probably start with a more conservative option and relax it later if needed.
Are non-async and async traits in the same namespace?
This question gets at whether you could define both trait Foo { ... } and trait async Foo { ... } in the same module.
While I won't do so in this post, this has enough implications that it's probably worth spending some time on.
For example, if we get this one wrong, we might end up in a situation where users have to write async AsyncFoo, which would just be sad.
Conclusion🔗
So anyway, that's the proposal.
I want to give a big shout out to everyone in the Async Working Group, and Yosh Wuyts in particular since he largely came up with the final syntax presented here in conjunction with his work on keyword generics.
Also, thanks to Nick Cameron for his early feedback on this post.
The proposal presented in this post incorporates a lot of ideas from many different people, and it's really great to see everyone's input coming together towards a solution we can be happy about.
We seem to be at a point where we've struck a nice balance for ergonomics, utility, and predictability.
Of course, the best way to know for sure is to prototype something and play around with it!
I'm excited to see progress in this area and am eager to see async functions in traits become fully supported in Rust!

¹
Technically, you wouldn't be required to have async methods, but we'd probably want to add a lint warning about unnecessary async keywords, just like we do for mut.↩

²
This is assuming we're using a multithreaded executor like Tokio. The spawn function from single threaded executors, such as those many embedded async runtimes provide, would likely not require a Send bound.↩

³
This potentially opens up some really powerful features though. For example, one could imagine futures that implement serde::Serialize and serde::Deserialize to make futures that can move not just between threads but between nodes in a cluster, or web frameworks where you can await input from the client.↩

⁴
One example suggested by Josh Triplett is if you could explicitly refer to the return type in where clauses. Then you could say async fn foo() -> i32 where return: Send { ... }. This makes the scoping a little clearer around parameters (for example, in async(Send + 'a) fn foo<'a>() -> i32 { ... }, it'd be weird to refer to 'a before it's declared), but it also is less clear as to whether we are saying i32 is Send or that the hidden future that async fn desugars to is Send.↩
