In the sketch I laid out before, I expected the core idea of supporting cancellation during unwinding would be to have the executor, and any mini-executors like `race` and `join`, basically wrap calls to `poll` with `catch_unwind`, then in the `Err` case call `poll_cancel` to completion and then call `resume_unwind`.
In pseudo-code, that would look something like:
```rust
loop {
    match catch_unwind(|| task.poll(cx)) {
        Ok(Poll::Ready(x)) => return x,
        Ok(Poll::Pending) => continue,
        Err(panic) => {
            while let Poll::Pending = task.poll_cancel(cx) {}
            resume_unwind(panic);
        }
    }
}
```
Unfortunately this doesn't work. It turns out I had some inkling this might be the case when I wrote:
> There are other challenges though. One is that the `poll_cancel` functions will need to be written to be aware of the fact that they might be called during unwinding, which means the internal state for the future might be inconsistent.
To understand what's wrong, recall that I desugared cancellation-aware `async` blocks into coroutines. Rust coroutines only have one entry point, which is the `resume` method. I simulated two entry points (`poll` and `poll_cancel`) by passing another argument into `resume`.
The thing is, once `resume` panics, coroutines cannot be resumed again and they will panic if you try. Since `poll` and `poll_cancel` are backed by the same `resume` method, this means we can't call `poll_cancel` after `poll` panics.
Some of this is an artifact of the way this experiment is structured.
If we had proper compiler support for multiple entry points to a coroutine, we might be able to make this work.
But I think it's more composable, and more in line with existing precedent, to follow a rule where all unwinding or cancellation work needs to finish before a panic leaves the `poll` call.

This realization that we need to process any cancellations before unwinding out of `poll` felt constraining at first, but it actually simplifies a lot of the design.
I thought we'd need to wrap basically every call to `poll` in `catch_unwind`, but in most cases this is unnecessary and we can instead let the usual unwinding machinery proceed as normal. The places where we do care are when we know of multiple futures and, if one of them panics, we need to cancel the rest.

Let's use `on_cancel` as an example. While I don't think `on_cancel` would be a great API to support in production, it is useful for focusing on the specifics of cancellation behavior.
In the last post, I was thinking of `on_cancel` almost as an approximation of an exception handler. For our purposes today, I think it's more useful to think of it as a kind of future combinator. In this view, `on_cancel` produces a new future from two others: one that is the normal execution path, and another future that is run only when the future is cancelled.1

Looking at it this way, we can see what we should do when the `poll` function on the main future panics. We aren't allowed to poll the future that's panicking anymore, because its internal state might be inconsistent. We have to trust that, as `poll` was unwinding, the future ran any cancellation handlers that were on the stack. But, since we want cancel-on-unwind semantics, the `on_cancel` combinator needs to catch the panic, run the cancellation future to completion, and then resume unwinding.
Now let's see how to add cancellation-on-panic behavior to our existing `on_cancel` implementation. My last post didn't really go into the details of this, so let's start with a rough sketch of the previous `on_cancel` implementation. Throughout this section I'm going to ignore details like pinning and `unsafe` so we can focus on the main idea. I have a complete working implementation of the ideas in this section available at https://github.com/eholk/explicit-async-cancellation.

The `on_cancel` method returns a future that carries a cancellation handler. While the details are hidden in the surface API, the struct and future implementation returned look like this:
```rust
struct OnCancel<F, H> {
    future: F,
    on_cancel: Option<H>,
}

impl<F, H> Future for OnCancel<F, H>
where
    F: Future,
    H: Future<Output = ()>,
{
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        self.future.poll(cx)
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        // run the cancellation handler if it's still present
        if let Some(on_cancel) = self.on_cancel {
            match on_cancel.poll(cx) {
                // if cancellation is complete, clear the handler so we won't try to run it again
                Poll::Ready(()) => self.on_cancel = None,
                // cancellation is not finished, so yield to the caller
                Poll::Pending => return Poll::Pending,
            }
        }
        // run any cancellation handlers on the inner future
        self.future.poll_cancel(cx)
    }
}
```
The `poll` function is pretty uninteresting. We just forward it to the inner future. The `poll_cancel` function is a little more subtle. The main thing we need to do is run the cancellation handler, which we do by calling `poll` on it. However, the inner future might also have nested cancellation handlers, so we need to call `poll_cancel` on it as well. This is also why I chose to wrap the cancellation hook in an `Option`, since I can use that as a flag to indicate whether the cancellation hook is finished.
As an aside, I chose to do outside-in cancellation semantics here since drop also runs outside-in. I'm not sure this was the right choice. For example, unwinding is inside-out instead. I think it's worth thinking harder about what the right ordering is, but for now it's easy to change and independent of our focus today.
Okay, so now that we have a basic `on_cancel` implementation, let's handle what happens if the call to the nested future's `poll` panics. In short, we need to wrap the call to `poll` in `catch_unwind`.
```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    match catch_unwind(|| self.future.poll(cx)) {
        Ok(poll) => poll,
        Err(panic) => todo!("run the cancellation hook and then resume unwinding"),
    }
}
```
Now let's think about the `Err` case. Basically, we need to cancel ourselves, which we can do by calling `poll_cancel`. Then we need to resume unwinding. Because `poll_cancel` might take several tries to finish, we need to save the panic information so we can resume unwinding after it's done. So we'll add another field to `OnCancel` to optionally store the panic information.
```rust
struct OnCancel<F, H> {
    future: F,
    on_cancel: Option<H>,
    panic: Option<Box<dyn Any + Send + 'static>>,
}

impl<F, H> Future for OnCancel<F, H>
where
    F: Future,
    H: Future<Output = ()>,
{
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        match catch_unwind(|| self.future.poll(cx)) {
            Ok(poll) => poll,
            Err(panic) => {
                self.panic = Some(panic);
                match self.poll_cancel(cx) {
                    Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
                    Poll::Pending => Poll::Pending,
                }
            }
        }
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        todo!("we'll come back to this in a minute")
    }
}
```
We're part of the way there, but we still have some problems. Assuming `poll_cancel` were correct (it's not, but we'll get there), we'd be okay if cancellation finished promptly. But if not, it will return `Pending`, which we'll bubble up to the caller. The caller doesn't know we're panicking, since we've hidden the panic information away in our `panic` field, so it will eventually call `poll` on us again. Unfortunately, this means we'll poll the inner future, which we've previously said is not allowed. So we need to make a small change to check if we're in the process of panicking when we're polled.
```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    if self.panic.is_some() {
        match self.poll_cancel(cx) {
            Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
            Poll::Pending => return Poll::Pending,
        }
    }
    match catch_unwind(|| self.future.poll(cx)) {
        Ok(poll) => poll,
        Err(panic) => {
            self.panic = Some(panic);
            match self.poll_cancel(cx) {
                Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
                Poll::Pending => Poll::Pending,
            }
        }
    }
}
```
And now we're all set. If we're polled when there's panic information present, then we never get to the call to `self.future.poll(cx)`.

Now it's time to revisit `poll_cancel`. To share some logic, I had the panic path in `poll` call into `poll_cancel`, but this means we need to update `poll_cancel` to recognize that it can be called while panicking. Here's how:
```rust
fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
    // run the cancellation handler if it's still present (this part stays the same)
    if let Some(on_cancel) = self.on_cancel {
        match on_cancel.poll(cx) {
            // if cancellation is complete, clear the handler so we won't try to run it again
            Poll::Ready(()) => self.on_cancel = None,
            // cancellation is not finished, so yield to the caller
            Poll::Pending => return Poll::Pending,
        }
    }
    // if we aren't panicking, run any cancellation handlers on the inner future;
    // otherwise, resume unwinding
    match self.panic {
        None => self.future.poll_cancel(cx),
        Some(_) => resume_unwind(self.panic.take().unwrap()),
    }
}
```
The first part, where we run the cancellation hook, stays the same as before. In the second part, we would normally cancel the inner future, but remember that if we are panicking we aren't allowed to poll it again.
It's worth asking what we should do in the `Some` line, though. At this point we know we are in the process of unwinding, and all cleanup code has finished. One option is to return `Poll::Ready(())` here, and if we're called from `poll` then we could count on it calling `resume_unwind`. However, it could also be that while we were waiting on the cancellation to finish, the executor decided to cancel us. In this case, if we returned `Poll::Ready(())` then we would swallow the exception. So instead, the right answer is to `resume_unwind` here as well.
So there we have it: how to cancel a future when polling it panics.
We've shown that it's at least somewhat possible to support async cleanup code while unwinding. I'll admit, beyond a basic smoke test, I haven't really probed the limits of this design. For example, what happens if we panic while running the cancellation handler as a result of another panic? Or what actually happens if the executor cancels us while we are cleaning up before resuming a panic? If we were to RFC something like this, these are all questions that we'd need to explore.
The reason I decided to go ahead and write this post without answering those questions is that I think we've already learned enough to start evaluating this design and informing future options.
First of all, something about suspending while in the process of unwinding just feels fundamentally weird and uncomfortable. That said, I think we can develop a reasonable semantics for this behavior if we decide we want it.
But this also leads to a shortcoming that I'm not sure how to resolve. This prototype cannot work in `#[no_std]` environments, because `catch_unwind` and `resume_unwind` represent panic information as a `Box<dyn Any + Send + 'static>`, meaning we need an allocator. This is a non-starter for something that we'd want to consider building in as a core Rust language feature. The whole `async`/`await` system has been carefully designed not to need an allocator, and we need to preserve this property. After all, `async`/`await` has found a lot of success in microcontroller environments!
Is this necessary though?
Or is it an artifact of trying to prototype a system purely in library code without compiler support?
As an analogy, we could imagine prototyping destructors using `catch_unwind`, but rustc is able to generate code to run destructors during unwinding without needing to reify the exception.
Unfortunately I don't think we can avoid the issue in the same way.
The problem is that normal unwinding doesn't suspend the execution at all, while we very much need to be able to do that to `await` in the unwinding path.
This means the exception does need to be stored somewhere (presumably with the future), and we need to be able to resume unwinding later.
If you're using a work-stealing executor, this means it's even possible that your task could start unwinding on one thread and finish on another.
So we need somewhere to store the exception that's not ephemeral in the way that it is during the Rust-generated unwind code.
There might be other options that could work.
For example, the executor could reserve some space for each task that's large enough to hold most panics.
Most likely the way we'd accomplish this is by attaching something to the `Context` that gives access to it.
Maybe it'd be specific to panics, or maybe it'd be a more general task-local bump allocator or something like that.
At any rate, we could add API surface for a minimal allocator to support awaiting while unwinding without needing a full-blown global allocator.
These could be made optional, which would give executors the option of aborting if they cannot or don't want to support async unwinding.
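As a rough illustration of that direction, here's a sketch of storing the common panic payloads without a fresh allocation. The `StoredPanic` and `store` names are hypothetical, and this still starts from the `Box` that `catch_unwind` hands us; with compiler support, the payload could instead land directly in task-reserved space. The point is just that most panic payloads are a `&'static str` or a `String`, which don't need a new boxed allocation to keep around.

```rust
use std::any::Any;

// Hypothetical storage for a panic payload. The common payload types
// (`&'static str` from `panic!("...")`, `String` from formatted panics)
// can be held without a new allocation; anything else keeps its box.
enum StoredPanic {
    StaticStr(&'static str),
    Message(String),
    Other(Box<dyn Any + Send + 'static>),
}

fn store(payload: Box<dyn Any + Send + 'static>) -> StoredPanic {
    match payload.downcast::<&'static str>() {
        Ok(s) => StoredPanic::StaticStr(*s),
        Err(payload) => match payload.downcast::<String>() {
            Ok(s) => StoredPanic::Message(*s),
            Err(payload) => StoredPanic::Other(payload),
        },
    }
}

fn main() {
    // A literal panic message is a `&'static str` payload.
    let p = std::panic::catch_unwind(|| panic!("boom")).unwrap_err();
    match store(p) {
        StoredPanic::StaticStr(s) => assert_eq!(s, "boom"),
        _ => unreachable!("a literal panic message is a &'static str"),
    }
    println!("stored inline");
}
```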
Another option would be to have the compiler not automatically generate calls to `poll_cancel` while unwinding, and instead provide something like an async version of `catch_unwind`. I think something like this is what boats was proposing. The nice thing about this option is that we can completely give up on supporting `#[no_std]`. Furthermore, we don't have to worry about being "zero cost," since the fact that the user called `async_catch_unwind` signals that they're willing to pay the cost that's needed.

That said, it's not clear how that should interact with `do ... final` blocks if we were to add them.2 For example, the `final` block would presumably run during unwinding in sync code, so it seems like we'd also need to do it while unwinding in async code. Unfortunately, as far as I can tell, that will run into the same allocation problems.
So to go back to the question of whether we should do this, I think we need more exploration. There are some options, but from my exploration here it seems like it's hard to satisfy all our requirements. But maybe one of these, or some other option, can strike a decent compromise.
With a small tweak, we could approximate a `finally` clause, by making it so we run the cancellation future even if the main future completes successfully. ↩

I really like the idea of `do ... final`! I had hoped to explore that some in this post but I felt there was enough material here without it. ↩
> Language design tends to go in cycles: we grow the language to accommodate new functionality, then shrink the language as we discover ways in which the features can be orthogonally integrated into the rest of the system. Classes seem to me to be on the upward trajectory of complexity; now it's time to shrink them down. At the same time, we shouldn't sacrifice the functionality that they enable.

This cycle of growing and shrinking was a key part of the process in the early days of Rust. Upon reading this section, I found myself asking "how could we shrink Rust today?"
To be honest, I had forgotten Rust had `class`es at one point. I remembered resources and objects, but forgot we had a brief window where there were classes. Patrick's post explains what happened to them.
Essentially, once we added classes and a bunch of other features, we realized that classes combined five features that we could implement independently in a way that's more general.
These, along with their modern replacements in Rust, are:

- Nominal record types, replaced by `struct`.
- Constructors, replaced by `struct` literal syntax and plain functions that are conventionally called `new`.
- Attached methods, replaced by `impl`s.
- Field-level privacy, which was removed.
- Destructors, replaced by the `Drop` trait.

Some of these features weren't so much replaced as removed.
For example, it's hard to claim Rust has constructors today, other than by convention. Similarly, if I remember right, at the time Rust also had the `struct` keyword, so you used `struct` if you just wanted a nominal record, or `class` if you wanted the rest of these features. Or in the case of field-level privacy, we basically just decided this feature wasn't necessary.2
For the two features that had a clear replacement, by decoupling them from classes we gained a lot more power. You can attach methods to any type now, like `enum`s and even primitive types, not just classes. Destructors are much simpler now too, since you implement `Drop` just like any other trait.3
The end result was that we replaced a large feature, classes, with a handful of smaller, orthogonal features, something that composed better4 and gave us more power and flexibility.
To me the key takeaway, at least looking back from over a decade later, is that a big part of why Rust is the way it is today is that we were able to add a bunch of features and then pare them down once we got some experience. In Rust's history, it's had three different ways to do destructors, and while I don't recall exactly, I suspect at least two of these coexisted at some point.
It's somewhat harder to follow this model now. In the early days, we made breaking syntax changes sometimes multiple times a week.5 At that time, the Rust team was a handful of people, about as many interns, and some people who hung out on IRC. Today the community is much larger and people are using Rust in mission-critical projects where they can't afford to make weekly syntax updates. And of course, Rust 1.0 came with a promise that there would be no more breaking changes. You can rely on Rust to keep working tomorrow.
Rust is still able to grow, but shrinking is much harder, and as a result, we have to be much more conservative in how Rust grows. We have some ability to shrink through the editions system, but this is still not a great mechanism for rapidly iterating on designs.
Anyway, I don't really have a solution, or even necessarily a clearly defined problem. I mostly just wanted to observe that developing Rust is harder today because we mostly have to look at things incrementally. It's much harder to design a set of interrelated features that maybe by themselves wouldn't be particularly noteworthy but together are quite powerful.
Fortunately, Rust does have the nightly compiler, and a process for experiments. That seems like the right environment to do the kind of language experimentation today that was possible in the early days. This is the same codebase that becomes the stable compiler, so we still need to emphasize stability and maintainability, but liberal experimentation in the nightly compiler with many different Rust features at once seems like it has the possibility to do the same kind of broad scale language iteration that we did in the early days while staying true to Rust's stability promises.
I've since started calling this my Spiky Blog Theory of Programming Languages, but it deserves a post of its own. ↩

One way of looking at this is that classes included their own module or namespace, and this was seen as unnecessary complexity. ↩
It might seem nice to be able to make fields on a `struct` private today, but that requires us to pull in a number of other features. In particular, you need some methods that you can make public which do have access to the private fields. That's why there were attached methods before, and something like that could work with `impl`s, but it would be tricky since `impl`s are a lot more flexible. ↩

Early Rust had `resource` types, which were basically a wrapper around a type that included a destructor. In some ways it was nice because most things didn't have destructors, but it also meant when you needed one you had to put your code through some contortions to make it work well with an attached destructor. Also, while it's tempting to say `Drop` is just like any other trait, it's not really, because it has special meaning to the compiler. ↩

I expect had we kept classes it'd be common to have classes that just wrap an enum, since otherwise we wouldn't have had a way to attach methods to enums. Eventually we probably would have invented some kind of `enum class` syntax. ↩

This is a big part of why rustfmt is so good, because that was how we rewrote the whole compiler every time we had a major breaking syntax change, which was not uncommon. ↩
For background, top level functions in Rust look sort of like this:
```rust
fn foo(x: i32) -> i32 {
    x + 1
}
```
In Rust 2018, we added `async fn`:
```rust
async fn foo(x: i32) -> i32 {
    x + 1
}
```
While that one doesn't do anything particularly interesting, an async function gives you the ability to use `await` inside it. It also secretly changes the return type from an `i32` to an `impl Future<Output = i32>`. This is regarded by many to have been a mistake, and it's starting to cause issues now that we have async functions in traits, since there is no way to add additional bounds like `Send` to the return type. Anyway, `async fn foo` is mostly just syntactic sugar that desugars into:
```rust
fn foo(x: i32) -> impl Future<Output = i32> {
    async move { x + 1 }
}
```
It's likely that Rust will gain a whole bunch of new keywords we can stick in front of `fn` in the future.1 For example, nightly Rust just got support for `gen fn` and `async gen fn`. Those desugar similarly, by wrapping the return type in `impl Iterator` or `impl AsyncIterator` and wrapping the body in `gen { }` or `async gen { }`.
Another piece of sugar we could add is `try fn`, which is actually what started off the discussion thread today. Following the pattern we've had so far, we'd expect to be able to write something like:
```rust
try fn foo() -> i32 {
    let x = read_number()?;
    x
}
```
and have this desugar to:
```rust
fn foo() -> impl Try<Output = i32, Residual = ???> {
    try {
        let x = read_number()?;
        x
    }
}
```
The problem is we need a hint for the `Residual` type. The obvious thing to do would be to add something to the function header, like `try fn foo() -> i32 throws E`. But if you've ever looked at the `Residual` types for the `Try` impls in the standard library, you know that these can look pretty hairy and not particularly intuitive. For example, to make a function that returns an `Option`, we'd need to write:
```rust
try fn foo() -> i32 throws Option<Infallible> {
    let x = read_number()?;
    x
}
```
This would give the compiler enough information to find the `Try` impl for `Option`. But notice that we also could have just written `fn foo() -> Option<i32>`, which is shorter and you don't have to figure out why my fallible function has an `Infallible` in it.
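That shorter spelling works on stable Rust today; the `Option<Infallible>` residual only surfaces if you spell out the `Try` machinery. Here `read_number` is a stand-in for the fallible function from the discussion:

```rust
// Stand-in for the post's fallible `read_number`.
fn read_number() -> Option<i32> {
    Some(41)
}

// The ordinary-return-type version of the hypothetical
// `try fn foo() -> i32 throws Option<Infallible>`.
fn foo() -> Option<i32> {
    let x = read_number()?;
    Some(x + 1)
}

fn main() {
    assert_eq!(foo(), Some(42));
    println!("{:?}", foo());
}
```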
At this point, Lukas Wirth observed that they would rather see a shorthand for functions whose body is a single expression. If we did this, we could write `try fn` as:
```rust
fn foo() -> Option<i32> = try {
    let x = read_number()?;
    x
}
```
So that's pretty neat.
This also invites us to reconsider `async fn`. We could instead write:
```rust
fn foo() -> impl Future<Output = i32> = async {
    let x = read_number().await;
    x
}
```
That's not too bad, but `impl Future<Output = i32>` is a bit wordy. We could come up with some rules that would let you write `impl Future<i32>` instead, which honestly is how we usually read that out loud anyway. But then joboet and pitaj pointed out that we could treat `Trait -> Type` as shorthand for `Trait<Output = Type>`. TC pointed out that we could probably generalize this to support `yields T` as shorthand for `Iterator<Item = T>`. So if we combined a few of these ideas, we'd be able to write:
```rust
fn foo() -> impl Future -> i32 = async {
    let x = read_number().await;
    x
}
```
I think this shows a lot of potential. I want to try to generalize this a bit more, though. Instead of special-casing the `Output` associated type, we could create a set of attributes to indicate that an associated type can be used with trait keyword shorthands. For example, define the `Future` and `Iterator` traits like this:
```rust
trait Future {
    #[keyword(return)]
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

trait Iterator {
    #[keyword(yields)]
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}
```
This would let us refer to `Future<Output = T>` as `Future -> T` and `Iterator<Item = T>` as `Iterator yields T`. We could even combine them:
```rust
trait Coroutine<R> {
    #[keyword(yields)]
    type Yield;
    #[keyword(return)]
    type Return;

    fn resume(self: Pin<&mut Self>, arg: R) -> CoroutineState<Self::Yield, Self::Return>;
}

fn coroutine() -> impl Coroutine<()> -> bool yields i32 = || {
    yield 42;
    true
}
```
This would also let us remove some of the special handling around the `Fn*` traits, and we could expose this functionality to users so libraries could use this sugar in their own traits.
At this point, I'd like to take a step back and think about plain `fn` functions. Notice that the following two would be equivalent:
```rust
fn foo() -> i32 {
    let number = read_number();
    number
}

fn foo() -> i32 = {
    let number = read_number();
    number
}
```
One way of thinking of this is that we've made the `=` optional. But I'd like to think of it a different way. Let's say instead we think of the `=` form as the standard function declaration syntax. Then, if the function body consists of a single block, we can use a compressed syntax. For a regular `{ }` block, that just looks like the function declaration syntax we're used to. But for blocks with a keyword in front, like `async { }` or `try { }`, we say the keyword moves all the way to the front of the function header. In addition, each block has a characteristic trait associated with it, so when we use the block shorthand for function declarations, we also wrap an `impl Trait` around the return type. Here are some examples:
```rust
// async ////////////////////////////////////////
async fn foo() -> i32 {
    let number = read_number().await;
    number
}
// desugars to:
fn foo() -> impl Future<Output = i32> = async {
    let number = read_number().await;
    number
}

// gen //////////////////////////////////////////
gen fn foo() -> i32 {
    yield 1;
    yield 2;
}
// desugars to:
fn foo() -> impl Iterator<Item = i32> = gen {
    yield 1;
    yield 2;
}
// assuming `Iterator` is defined like:
trait Iterator {
    #[keyword(return)]
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

// async gen ////////////////////////////////////
async gen fn foo() -> i32 {
    yield 1;
    yield 2;
}
// desugars to:
fn foo() -> impl AsyncIterator<Item = i32> = async gen {
    yield 1;
    yield 2;
}
```
I've left try out because it's complicated. You could technically do something like:
```rust
try fn foo() -> i32 throws Option<Infallible> {
    let number = read_number()?;
    number
}
```
but for `try`, users usually want to know the concrete type. So instead I'd expect most people to prefer the desugared form:
```rust
fn foo() -> Option<i32> = try {
    let number = read_number()?;
    number
}
```
Note that the "pulling the keyword forward" transformation doesn't work here, because this function returns a concrete type, and what I've proposed is that pulling the keyword forward always adds an `impl Trait` rather than a concrete type.
Anyway, I'm pretty excited about this idea.2 It feels like a consistent way to handle these connections between blocks, traits, and functions. It's backwards compatible with the syntax we have so far, but it gives us a lot more expressiveness in cases where we're currently missing it.
You can already do `unsafe fn` and `const fn` today, but these don't desugar in the same way as other proposed keywords here do. ↩

Of course, I also just started thinking about this today and cranked out a blog post, so I may hate it by Monday. ↩
`async Drop`.
There are some tricky design questions to making this work well, and we need to start thinking about these now if we want to have something ready by 2027.
In this post, I'd like to explore a low level mechanism for how we might implement async cancellation.
The goal is to explore both how an async executor1 would interact with cancellation, as well as to make sure that this mechanism would support reasonable surface-level semantics.
You can think of this as a kind of compilation target for higher level features, similar to how the Rust compiler lowers `async fn` into coroutines.
If you haven't read my last post on Cancellation and Async State Machines, I'd encourage you to do so. That post provides a kind of theoretical background for what we'll implement in this post.
## poll_cancel

Lately I've been working on a prototype implementation of `async`/`await`, as well as changes to `Future` and related traits, that supports more flexible cancellation. I'd like to discuss this prototype, the tradeoffs made, and what I've learned about cancellation from the exercise. Note that what I'm presenting here is α-equivalent to several previous proposals, including Boats' `poll_drop_ready` RFC and a proposal by tvalloton on IRLO. My main contribution here is a prototype implementation that lets us write examples and explore their behavior.
## Future

The core of the idea is to extend the `Future` trait with a new `poll_cancel` method that has a default implementation. The new trait would look like this:
```rust
pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        Poll::Ready(())
    }
}
```
In this new trait, `poll` has the same semantics as before. The new `poll_cancel` method performs two operations. First, it transitions the future's state machine from its normal execution path to the correct cancellation state. Second, `poll_cancel` continues to advance the state machine until the cancellation is complete.

The fact that `poll` and `poll_cancel` return different types highlights the fact that cancellation is a different exit from the future. A cancelled future returns no value, so `poll_cancel` returns `Poll<()>` instead of `Poll<Self::Output>`. This matches what we saw in my previous post, where we had a different final state for a future that was cancelled versus one that completed normally.
There are some attractive properties about this approach. The default implementation of `poll_cancel` leads to the same behavior that we have for cancellation today, where cancelling a future just means synchronously dropping it. This suggests we can get a nice migration path, although adding a new default method to a trait is technically a breaking change.
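To see the migration property concretely, here's a cut-down model of the extended trait, with plain `&mut self` and no `Pin` or `Context`, purely for illustration. An existing future type compiles unchanged and inherits the synchronous-cancel default:

```rust
// Cut-down model of the extended trait; real code would use `Pin` and `Context`.
enum Poll<T> {
    Ready(T),
    Pending,
}

trait Future {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    // New method with a default: cancellation completes immediately,
    // matching today's "just drop it" semantics.
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
}

// A pre-existing future type: compiles without ever mentioning `poll_cancel`.
struct Ready(i32);

impl Future for Ready {
    type Output = i32;
    fn poll(&mut self) -> Poll<i32> {
        Poll::Ready(self.0)
    }
}

fn main() {
    let mut f = Ready(7);
    assert!(matches!(f.poll(), Poll::Ready(7)));
    // The default implementation says cancellation is immediately complete.
    assert!(matches!(f.poll_cancel(), Poll::Ready(())));
    println!("defaults work");
}
```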
There are significant shortcomings, which I'll discuss further down.
But first, I'd like to look at how `poll_cancel` works with `async` and `await`.
## async and await

Most people writing async Rust should not have to deal with `poll` directly.
Most of the time we use higher level constructs like `async` and `await` instead. The nice thing about `async` and `await` in Rust is that there's nothing particularly magical about them.2 They can be thought of as desugaring into lower level constructs, and this desugaring happens in a way that you could mostly implement them both as macros.3 The primary benefit of building them into the language is that we can have nicer syntax and nicer diagnostics.

The fact that we can think of `async` and `await` as macros that desugar into lower level concepts means we can experiment with cancellation by writing a new set of macros that call `poll_cancel` in the appropriate place.
Most of the action will be in the changes we make to `await`. The goal here is to come up with a desugaring that has predictable cancellation behavior that is also usually the desired behavior. The somewhat surprising thing to me is that `await` mostly just forwards calls to `poll`, but doesn't have a lot of interesting future behavior. The interesting behavior (such as making sure a `Waker` gets called sometime in the future) all happens in hand-written `Future` impls. We can see this in the approximate desugaring of `await` from the Rust Language Docs:
```rust
match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
            Poll::Ready(r) => break r,
            Poll::Pending => yield Poll::Pending,
        }
    }
}
```
This block of code runs when some code higher up the call stack calls our poll
method.
What this block of code is doing is basically calling the awaited future's poll
function in a loop.
If that future returns Pending
, we yield Pending
.
From this code, the compiler will generate a Future::poll
function that returns Pending
when the function would yield Pending
.
This happens deeper in the compiler than we can reach with macros, but we can approximate it a different way.
Originally, the compiler actually generated an object that implemented Generator
(now Coroutine
) and the standard library had a wrapper that adapted the Generator
into a Future
.
We'll use this approach for our prototype.
We'll want to handle cancellation similarly to how polling is handled, where await
also forwards calls to poll_cancel
along the await chain until we arrive at a future that knows how to do something interesting with cancellation.
Looking at how we might extend the desugaring of await
to support poll_cancel
, we need to distinguish whether we're on the cancel path or the normal execution path so we can call either poll_cancel
or poll
depending on the context.
We'll punt on this and assume we have a magic is_cancelled
variable that can tell us this, which is similar to the current_context
variable in the previous desugaring.
So let's see how this first step looks:
match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        if !is_cancelled {
            match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(r) => break r,
                Poll::Pending => yield Poll::Pending,
            }
        } else {
            match Pin::future::poll_cancel(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(()) => panic!("What do I do after cancelling?"),
                Poll::Pending => yield Poll::Pending,
            }
        }
    }
}
It's like before, but we check if we are cancelled first.
If we are not, we continue with the previous behavior, calling poll
, and breaking out of the loop if the future is Ready
or yielding Pending
otherwise.
If we are cancelled we do almost the same thing, except we call poll_cancel
instead.
If the cancellation is Pending
, we yield again.
But if the cancellation is complete, we have to decide what to do next.
In the normal case, we have break r
, which passes r
out to the surrounding context, which is expecting a value of whatever type r
is.
We can't do the same thing when the cancellation is complete because while r
might be type ()
, we can't rely on that.
For now we panic, since that type checks, but this obviously doesn't work.
We can get some inspiration from the state machines we saw earlier.
Cancellation effectively means we have two exit states for the function: normal return and cancelled.
But functions in Rust only have one exit state4, so we need to reify this into some data type that shows which final state you'd be in if you could have multiple final states.
It turns out the Rust standard library has one we can use for this purpose: Result
.5
So to report that an async fn
or async
block was successfully cancelled, we can return something like Err(Cancelled)
and Ok(T)
in the success case.
Factoring this into our approximate await
desugaring gives us:
match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        if !is_cancelled {
            match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(r) => break r,
                Poll::Pending => yield Poll::Pending,
            }
        } else {
            match Pin::future::poll_cancel(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(()) => return Err(Cancelled),
                Poll::Pending => yield Poll::Pending,
            }
        }
    }
}
In the desugaring of async {}
, we'll also need to wrap all the normal exit paths with Ok()
.
In the previous section I gave a rough sketch of how to desugar async
and await
into generators in a way that supports cancellation.
Now I want to fill in some of the details by looking at how this resulting generator becomes a future.
If we were implementing this for real in Rust, we'd probably just have the compiler implement Future
directly, like it currently does for async
blocks.
But, using generators lets us implement and experiment with this in a crate without having to modify the compiler.6
So, if we did everything right in the previous section, we should end up with a compiler-generated generator that implements Generator<(Context, bool), Yield = (), Return = Result<T, Cancelled>>
, where T
is the output type of the Future
and Cancelled
is just a marker tag like struct Cancelled
.
The argument to the resume
function, (Context, bool)
, is a tuple containing the Context
as well as a bool
indicating whether the future is cancelled.
This bool
would get bound to the is_cancelled
variable in the await
desugaring above.7
Now we can make these into futures as follows:8
impl<O, G> Future for G
where
    G: core::ops::Generator<(Context, bool), Yield = (), Return = Result<O, Cancelled>>,
{
    type Output = O;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.resume((cx, false)) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(Ok(v)) => Poll::Ready(v),
            GeneratorState::Complete(Err(Cancelled)) => panic!("child future cancelled itself"),
        }
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        match self.resume((cx, true)) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(Ok(_)) => {
                panic!("future completed after being cancelled")
            }
            GeneratorState::Complete(Err(Cancelled)) => Poll::Ready(()),
        }
    }
}
Our implementation needs to cover both poll
and poll_cancel
, but they are both pretty similar.
Each one forwards the call to the generator's resume
method and then adapts the result into something expected by the surrounding async code.
Generators only have a resume
method, but in this post we've extended Future
to have two methods.
So when we go from a call to poll
or poll_cancel
to a call to resume
, we need to tell resume
which version it is.
We do this by passing an extra boolean, which the generator uses to determine whether it should go along the normal execution path or the cancellation path.
Generators return either Yielded
or Complete
, which for futures correspond to Pending
and Ready
.
Because we've made resume
return a Result
to indicate whether the future was cancelled, we have some more cases to check.
We don't want to bubble the Result
out to user code; we want to keep it hidden inside the monad.
From the user's perspective, this is still just a future that evaluates to a T
, not a fallible future.
So we have this invariant, that in poll
, resume
should never return an Err(Cancelled)
and in poll_cancel
, resume
should never return Ok
.
The first case would mean that the future cancelled itself, which is not the way cancellation works in Rust.
The second case would mean the cancellation failed, that after being cancelled the future completed normally.
In this design we're also choosing not to model that case.9
In an ideal world, the compiler would be able to prove both of these cases are unreachable, or we'd design the API so that these cases aren't even possible to write.
Honestly, this is one of the aspects of this design that I'm least satisfied with.
I'd like to experiment with different factorings that would let us get rid of the panics.
Anyway, that's the rough idea of how this design works. I haven't written the complete implementation here because I find prose more informative than code, but I do have a prototype implementation at https://github.com/eholk/explicit-async-cancellation if you want to see the full details.
But for now, let's see what this lets us do.
My prototype includes a macro called async_cancel!
, which is similar to async {}
blocks, except with support for cancellation handlers.
This is meant to be paired with the awaitc!
macro, which is analogous to .await
, but with support for cancellation handlers.10
Because these are not built-in syntax, they are ugly and hard to read in the examples I've prototyped so far.
So in this section, I'll write out examples as if async
and await
supported cancellation handlers in the way described above.
First, I want to introduce a convenience called on_cancel
.
This gives us a way to run asynchronous code along the cancellation path.
This is important to show that everything actually works how we want, but I'm not really a fan of the API and would prefer it not be the standard way to run code on cancellation.
Think of this as a placeholder for something like defer {}
blocks or async Drop
.11
I've implemented on_cancel
as an extension method on futures that takes a future and runs that future on the parent future's cancellation path.
That's a little confusing to read, but in code it looks like this:
async {
    do_something().await;
    println!("all done!");
}.on_cancel(async {
    println!("cancelled!");
}).await;
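Under the hood, an `on_cancel` combinator could be shaped roughly like the sketch below. This uses a simplified model that drops `Pin` and `Context` so it runs standalone; the trait, `Pending`, and `Announce` types are invented stand-ins, not the prototype's actual code.

```rust
enum Poll<T> {
    Ready(T),
    Pending,
}

// Simplified model of the hypothetical Future trait with poll_cancel.
trait CancellableFuture: Sized {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }

    // Attach a handler future that runs on the cancellation path.
    fn on_cancel<C>(self, handler: C) -> OnCancel<Self, C>
    where
        C: CancellableFuture<Output = ()>,
    {
        OnCancel { inner: self, handler }
    }
}

struct OnCancel<F, C> {
    inner: F,
    handler: C,
}

impl<F, C> CancellableFuture for OnCancel<F, C>
where
    F: CancellableFuture,
    C: CancellableFuture<Output = ()>,
{
    type Output = F::Output;

    // Normal execution just forwards to the inner future.
    fn poll(&mut self) -> Poll<F::Output> {
        self.inner.poll()
    }

    // On cancellation, first let the inner future finish its own
    // cancellation, then drive the handler to completion.
    fn poll_cancel(&mut self) -> Poll<()> {
        match self.inner.poll_cancel() {
            Poll::Pending => Poll::Pending,
            Poll::Ready(()) => self.handler.poll(),
        }
    }
}

// A future that never completes, standing in for `pending()`.
struct Pending;
impl CancellableFuture for Pending {
    type Output = ();
    fn poll(&mut self) -> Poll<()> {
        Poll::Pending
    }
}

// A handler that prints a message and completes.
struct Announce(&'static str);
impl CancellableFuture for Announce {
    type Output = ();
    fn poll(&mut self) -> Poll<()> {
        println!("{}", self.0);
        Poll::Ready(())
    }
}

fn main() {
    let mut task = Pending.on_cancel(Announce("cancelled!"));
    assert!(matches!(task.poll(), Poll::Pending));
    // Give up on the task: run its cancellation path to completion.
    assert!(matches!(task.poll_cancel(), Poll::Ready(())));
}
```

Note the ordering choice in `poll_cancel`: the inner future gets to finish cancelling before the handler runs, which is one reasonable interpretation of "runs on the parent future's cancellation path."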
In my examples, I'll also make liberal use of futures like pending()
and ready()
, which never complete and immediately complete respectively.
The first thing we need is an executor that is aware of cancellation.
We'll make a simple one that runs a single task, similar to block_on
.
If for some reason the executor is dropped before the root task completes, then the executor's drop function will call poll_cancel
on the root task until it's complete.
In pseudo-code, our executor looks something like this (actual code is here):
impl<T> Executor<T> {
    /// Run the root task to completion
    fn run(&mut self) -> T {
        loop {
            match self.poll_once() {
                Poll::Pending => continue,
                Poll::Ready(result) => return result,
            }
        }
    }

    /// Poll the root task once
    fn poll_once(&mut self) -> Poll<T> {
        let context = self.context();
        self.root_task.poll(context)
    }

    // Definition of `context` is omitted
}

impl<T> Drop for Executor<T> {
    fn drop(&mut self) {
        let context = self.context();
        while let Poll::Pending = self.root_task.poll_cancel(context) {}
    }
}
This gives us just enough to experiment with cancellation behavior. We can run simple futures like this:
fn main() {
    let root_task = async { 42 };
    let mut exec = Executor::new(root_task);
    let result = exec.run();
    println!("the root task returned {result}");
}
This program would run and print out
the root task returned 42
We have some more power though.
Rather than using run
to poll to completion, we can use poll_once
some number of times to leave the future in an incomplete state.
If the executor is dropped before the future is complete, it will run the cancellation path in the executor's drop
function.
Here's a basic example showing cancellation:
fn main() {
    let root_task = async {
        pending().await;
        println!("all done!");
    }.on_cancel(async {
        println!("the task did not finish")
    });
    let mut exec = Executor::new(root_task);
    exec.poll_once(); // pending
    exec.poll_once(); // still pending
    drop(exec); // just give up
}
In this example, the root task blocks on pending()
, which will never finish.
But we attached a cancellation handler that runs when the executor is dropped before finishing the future.
Running this program produces:
the task did not finish
So we have the basics of cancellation support and cancellation handlers. Now let's see how this composes with more interesting futures.
I'm using "combinators" here to mean futures which combine or otherwise transform other futures in interesting ways.12
By this definition, we've already seen the on_cancel
combinator, which lets you override the cancellation behavior of a future.
Let's consider another one: race
.
We'll use a very simplified version of race
, which looks like a.race(b)
.
This takes a future a
and a future b
and runs them both concurrently.
When one finishes, race
will cancel the other and return the value from the one that finished first.
The code for this looks horrible, so I'll leave it out of the post and focus mainly on how it looks to use it.
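Since the real implementation is omitted, here is a rough sketch of the shape such a combinator might take, in the same simplified `Pin`-and-`Context`-free model used elsewhere in this post's examples. All names here are invented for illustration. The key point is that `race` refuses to report `Ready` until the losing future's `poll_cancel` has run to completion.

```rust
use std::mem;

enum Poll<T> {
    Ready(T),
    Pending,
}

// Simplified model of the hypothetical Future trait with poll_cancel.
trait CancellableFuture {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
}

struct Race<A: CancellableFuture, B> {
    a: A,
    b: B,
    state: RaceState<A::Output>,
}

enum RaceState<T> {
    Running,
    // One side won; hold its value while the loser's cancellation runs.
    CancellingA(T),
    CancellingB(T),
}

impl<A, B> CancellableFuture for Race<A, B>
where
    A: CancellableFuture,
    B: CancellableFuture<Output = A::Output>,
{
    type Output = A::Output;

    fn poll(&mut self) -> Poll<A::Output> {
        loop {
            match mem::replace(&mut self.state, RaceState::Running) {
                RaceState::Running => {
                    if let Poll::Ready(v) = self.a.poll() {
                        self.state = RaceState::CancellingB(v);
                        continue;
                    }
                    if let Poll::Ready(v) = self.b.poll() {
                        self.state = RaceState::CancellingA(v);
                        continue;
                    }
                    return Poll::Pending;
                }
                // Don't report Ready until the loser finishes cancelling.
                // (A full version would also move to a Done state here.)
                RaceState::CancellingB(v) => match self.b.poll_cancel() {
                    Poll::Ready(()) => return Poll::Ready(v),
                    Poll::Pending => {
                        self.state = RaceState::CancellingB(v);
                        return Poll::Pending;
                    }
                },
                RaceState::CancellingA(v) => match self.a.poll_cancel() {
                    Poll::Ready(()) => return Poll::Ready(v),
                    Poll::Pending => {
                        self.state = RaceState::CancellingA(v);
                        return Poll::Pending;
                    }
                },
            }
        }
    }

    fn poll_cancel(&mut self) -> Poll<()> {
        // Simplification: cancel both sides. A full version would track
        // which sides have already completed or finished cancelling.
        match (self.a.poll_cancel(), self.b.poll_cancel()) {
            (Poll::Ready(()), Poll::Ready(())) => Poll::Ready(()),
            _ => Poll::Pending,
        }
    }
}

struct Pending;
impl CancellableFuture for Pending {
    type Output = i32;
    fn poll(&mut self) -> Poll<i32> { Poll::Pending }
}

struct Now(i32);
impl CancellableFuture for Now {
    type Output = i32;
    fn poll(&mut self) -> Poll<i32> { Poll::Ready(self.0) }
}

fn main() {
    let mut r = Race { a: Pending, b: Now(42), state: RaceState::Running };
    // `b` wins; `a` is cancelled (instantly, via the default) first.
    assert!(matches!(r.poll(), Poll::Ready(42)));
}
```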
Here's an example using race
with a cancellation handler:
fn main() {
    let root_task = pending().on_cancel(async {
        println!("future `a` was cancelled");
    }).race(async { 42 });
    let mut exec = Executor::new(root_task);
    let result = exec.run();
    println!("result: {result}");
}
In this example, our root task consists of a race between pending()
and async { 42 }
.
The pending()
future never finishes.
We've attached a cancellation handler to it so we can see some indication that it was cancelled.
So the race combinator sees that the second future returns 42
while the first is still pending.
Before returning, it runs the first future's cancellation handler, printing future `a` was cancelled
.
Then it returns 42 as the overall value of the race future.
This program's output is:
future `a` was cancelled
result: 42
The poll_cancel
mechanism we're discussing is able to support what I earlier called idempotent cancellation.13
This means that if you cancel a future whose cancellation process has already started then the cancellation process continues as before.
To get a feel for how this works, let's look at a rather contrived example:
fn main() {
    // we'll use `done` to create a future that blocks until some other code
    // sets the `done` to true.
    let done = &RefCell::new(false);
    let root_task = async {
        // we're going to race `a` and `b`, so we'll create those two futures
        // separately.
        let a = async { 42 };
        // when b cancels, we want a cancellation handler that can print a
        // message for us the first time it's polled. We'll use
        // `cancel_started` to track that.
        let mut cancel_started = false;
        let b = pending().on_cancel(poll_fn(|_| {
            if !cancel_started {
                // print a message if it's our first time through.
                println!("begin cancelling `b`");
                cancel_started = true;
            }
            // Only complete if someone has set `done` to true.
            if *done.borrow() {
                println!("cancellation of `b` complete");
                Poll::Ready(())
            } else {
                Poll::Pending
            }
        }));

        a.race(b).on_cancel(async {
            println!("cancelling `race` future");
        }).await;
    }.on_cancel(async {
        println!("cancelling root future");
    });

    // Poll the futures a few times, then let the executor shut down
    let mut executor = Executor::new(root_task);
    let _ = executor.poll_once();
    let _ = executor.poll_once();
    let _ = executor.poll_once();
    *done.borrow_mut() = true;
}
The behavior here is pretty subtle, so let's see the output and break down why we get this behavior. The output from this program is:
begin cancelling `b`
cancelling root future
cancelling `race` future
cancellation of `b` complete
The core of this program is that we race two futures (line 28), one that returns immediately (line 8), and one that never completes (line 13). We've attached a bunch of cancellation handlers at various points so we can observe the behavior and the order that things happen in.
The cancellation handler on b
is pretty complex, but the idea here is to create a future that waits until some flag is set.
We wanted to simulate something that takes a little bit of time to complete, but not an unbounded amount, so that we can interrupt the cancellation.
So, we start running, the async { 42 }
completes immediately and then race
has to start cancelling b
.
This shows up in the line begin cancelling `b`
.
This cancellation does not complete, even though we poll a few more times, because no one has set done
to true.
The next step is to trigger the second cancellation of b
.
We do this by letting the executor go out of scope without completing, which means the destructor calls poll_cancel
on the root task.
This is when we see cancelling root future
appear.
This gets passed on to the race
future because of the way we've desugared await
, so we see the program print cancelling `race` future
.
In the implementation of race
, its poll_cancel
method cancels any futures that have not either completed or been cancelled.
In our case, this means we call poll_cancel
on b
again, but this time the call chain originates in the executor's destructor rather than the normal execution of race
.
Finally, since the done
flag has been set, b
's cancellation can complete and we see it print out cancellation of `b` complete
.
If we had instead supported recursive cancellation, we would have had the option of having b
's cancellation handler terminate early.
There are likely cases where both options would make sense, but here we've chosen to use idempotent cancellation semantics across the board.
This one is left as an exercise for the reader (or a future blog post here), but I don't see any fundamental reason why we can't do it.14
The gist of the idea is that anywhere we call poll
, we'd want to wrap that in catch_unwind
.
If the poll
function panics, we'd want to catch that, then call the future's poll_cancel
method to completion, and then call resume_unwind
to continue unwinding.
It will be annoying to have to do a poll
, catch_unwind
, poll_cancel
, resume_unwind
dance everywhere, but the basic idea should work.
There are other challenges though.
One is that the poll_cancel
functions will need to be written to be aware of the fact that they might be called during unwinding, which means the internal state for the future might be inconsistent.
Writing this post gave me the chance to thoroughly explore this design. I would say overall I think this design has enough shortcomings that I don't want to advocate it as the solution for async cancellation handlers. I still think this is useful because the shortcomings can help us find a design with fewer, or at least more acceptable, compromises. The fact that I've been able to implement this as a prototype means we can easily pivot and explore variations.
That said, I wouldn't have written so much about this design if I didn't think it had some merit. So now I'd like to discuss what I see as some of the greatest strengths and shortcomings.
In my mind, the biggest strength is that it feels like a relatively small extension to async Rust, but it still gives a lot of benefits.
It's basically one new method on the Future
trait, as well as a minor change to the way async
and await
desugar.
We can provide a default implementation of poll_cancel
which preserves the status quo semantics for cancellation and therefore makes the migration path pretty easy in most cases.
Of course, we're going to come back to this in the Weaknesses section because it's not all roses.
This design makes it clear what the responsibilities are for well-behaved executors (and executor-like things, like future combinators) to make sure cancellation behavior makes sense.
I think this design also works well with the requirement that futures are pinned.
For example, an alternate approach could be adding a method like fn cancel(self) -> impl Future<Output = ()>
.
The problem is that once a future has been pinned, you can't pass it as self
.
Instead, the signature would have to be something like fn cancel<'a>(self: Pin<&'a mut Self>) -> impl Future<Output = ()> + 'a
, which I think is going to be annoying for executors to work with in practice.
Cancelling in place strikes me as significantly simpler.
All of the benefits I've talked about in this post are available without what strike me as significantly more extensive language changes.
For example, this gives us some way to run code on cancellation paths without needing complete support for async Drop
.
Of course, this leads to significant shortcomings that we'll see in Weaknesses.
On the bright side, I think something like the poll_cancel
API can serve as a compilation target for cancellation, the same way that poll is a compilation target for await
.
The weaknesses in this design range from what to me seems rather tolerable to some that I find completely unacceptable.
On the more tolerable end of the spectrum, there's the fact that this API feels a little fragile.
We have a requirement that once you call poll_cancel
on a future you can never call poll
again, but the compiler can't do anything to prevent you from doing that.
This kind of requirement isn't unprecedented though.
For example, with futures you already aren't supposed to call poll
again after the future has completed, but the compiler doesn't stop you from doing that.
In both cases, we can mitigate this by treating await
as the normal interface to poll
and poll_cancel
and guaranteeing that those generate correct code.
Calling poll
and poll_cancel
directly would then be considered an advanced use case, so we can tolerate more complex requirements there.15
I'm slightly more concerned about the migration path.
As a strength, I mentioned that the default impl of poll_cancel
means without any additional action, futures will retain their present-day behavior.
In many cases, this is perfectly fine, but it's probably the wrong default for future combinators.
For example, suppose you were using an async IO crate that supported asynchronously cancelling operations in flight, but you put one of those futures behind an older version of race
that did not yet support poll_cancel
.
In this case, when the race future is cancelled, it would fall back on the default implementation, which says "ok, all good, nothing left to do," without calling poll_cancel
on the IO operation.
The result would be that the programmer has to be extremely careful to make sure that everything in their call chain handles cancellation correctly.
Cancellation would be best effort, at best.
You definitely could not rely on this for safety!
One possible way to avoid this might be to introduce poll_cancel
through a CancellableFuture
trait instead.
Doing this in a way that's backwards-compatible would be tricky though.
Related to this shortcoming, poll_cancel
puts a heavy burden on executor and future combinator authors.
It's already tricky to write a state machine that calls poll
. Having to add poll_cancel
calls to that state machine as well is going to be a lot of error-prone work.
We might be able to factor some of this work into common libraries that make it easier though.
But to me the most critical shortcoming of this design is that it's easy to forget to cancel a future.
Fortunately, as long as your future is always behind an await
, you should be okay.
On the other hand, there are common patterns that would now be error-prone.
For example, consider the following example with FuturesUnordered
:
let mut futures = FuturesUnordered::new();
futures.push(async { do_something().await; });
futures.push(async { do_something_else().await; });
futures.next().await;
drop(futures);
Here we've added two futures to a FuturesUnordered
collection.
When we call next()
, it will poll both futures until one of them completes, and then the next()
future will complete.
This means that futures
is still holding on to a partially completed future.
But, when we drop(futures)
, there's no way to run poll_cancel
because drop
must complete synchronously.
So, our only option right now is to just not cancel the future.
I suppose one way to work around this shortcoming is to try to argue that FuturesUnordered
is a bad API.
Maybe I could redefine what we mean by structured concurrency to say that FuturesUnordered
is unstructured and the cancellation mechanism we've described here only works for structured concurrency.
If I were to take this approach, our example would look more like this when using a redesigned FuturesUnordered
collection:
FuturesUnordered::with(async |futures| {
    futures.push(async { do_something().await; });
    futures.push(async { do_something_else().await; });
    futures.next().await;
}).await;
This solves the problem by making it so that FuturesUnordered::with
does no work until it's awaited, so there is never any partially completed future that's not under an await
point.
It's less than ideal for a few reasons though.
Stylistically, it adds more rightward drift.
But more importantly, this API makes it hard to put a FuturesUnordered
in another data structure, which can be quite useful in many situations.
Plus, in my subjective opinion, the original version feels more Rusty.
Without a solution, I think this issue will make cancellation handlers so unreliable as to not be useful.
In fact, they will likely do more harm than good.
This leaves me convinced that we need some more general solution, like async Drop
.
The key thing is to have some mechanism for the compiler to make sure, in an async function, that any values that need to be cancelled are cancelled.
To be honest, I'm a bit disappointed by this realization.
I haven't personally seen a design for async Drop
that I love16, so I was hoping that something like poll_cancel
would give us most of the benefits of async Drop
without having to wrestle with as many complex design issues.
That said, I think a design like poll_cancel
complements a higher level feature like async Drop
.
Even if we have an async Drop
, we need to figure out how these get run and whether we can get the properties we want in order to build on them.
I think a variation on poll_cancel
would give us a useful lower level target to build a more powerful feature like async Drop
on top of.
If you've been following this space for a while, the ideas I've discussed here probably sound very familiar. I wanted to take the time to both acknowledge the work that's come before, but also highlight the ways in which my proposal here differs from earlier work.
One of the earliest versions I'm aware of is the (now abandoned) poll_drop_ready
RFC from Boats.
One of the biggest differences is that the RFC focuses a lot on compiler-generated async drop glue to call poll_drop_ready
and make sure things are cleaned up well, while I've left that completely out of scope for this post.
I appreciated the RFC's careful consideration of issues around pinning and fusing poll_drop_ready
.
I've not really thought about these issues in my post, but I think we will need to if we move forward with this or a similar design.
I also appreciated that the RFC called out that the synchronous drop
would still be called after poll_drop_ready
returns Ready(())
.
That feature was implicit in my design as well, but I think it is better to call it out.
The most important distinction, however, is that I have focused mainly on cancellation semantics in this post (that is, what if a future is not polled to completion?), while it seems that poll_drop_ready
is called as part of the parent future completing normally through poll
.
In other words, it seems executors are not intended to call poll_drop_ready
directly.
This has some implications on when the programmer can assume poll_cancel
/poll_drop_ready
will be called.
There was another proposal on IRLO to add poll_cancel
to the Future
trait that is syntactically exactly the same as I've described here.
The semantics look essentially the same as I've described here as well, with perhaps some minor variations.
For example, in my design I've imagined you do not have to call poll_cancel
on a future that's never been polled.17
I think the guarantees on the contract in the IRLO post are stronger than I was hoping we'd need here---I imagined we could get away with saying something like "a well-behaved executor should..." rather than "you must."
In particular, I didn't have the requirement that "A polled future may not be dropped without poll_cancel
returning ready," and instead imagined such a thing would be impolite but not illegal.
I think the biggest contribution I've made in my post is showing how to adjust the desugaring of async
and await
to work with poll_cancel
, giving us an answer to how "to generate a state machine that can keep track of a future in mid cancellation as a possible state."
Another excellent contribution in this area is A case for CancellationTokens.
One of the things I really like about the post is the review of the major options in this space, including request_cancellation
, poll_cancel
, async fn cancel
and cancellation tokens.
If you haven't read it yet, that section alone is worth the read!
The main idea behind cancellation tokens is to have some bit of state that's carried along the await chain and futures can check whether they've been cancelled and activate the correct behavior in that case.
It has some nice benefits around composability, and seems to be better at traversing code that is not cancellation-aware, which is a major shortcoming of poll_cancel
as I've described it here.
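For a sense of the mechanism, a minimal cancellation token might look like the sketch below. The API here is invented for illustration and is far simpler than the one in that post: it's just shared state that the canceller flips and that code along the await chain can observe.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// A minimal cancellation token: cloneable shared state that can be set by
// the canceller and checked anywhere along the await chain.
#[derive(Clone)]
struct CancellationToken(Arc<AtomicBool>);

impl CancellationToken {
    fn new() -> Self {
        CancellationToken(Arc::new(AtomicBool::new(false)))
    }

    fn cancel(&self) {
        self.0.store(true, Ordering::Relaxed);
    }

    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

fn main() {
    let token = CancellationToken::new();
    let child = token.clone(); // handed down the await chain
    assert!(!child.is_cancelled());
    token.cancel();
    // A future (or its combinators) can now switch to the cancel path.
    assert!(child.is_cancelled());
}
```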
One thing I find interesting is that although on the surface cancellation tokens and poll_cancel
look like extremely different mechanisms, they have more in common than it appears.
For example, the extra is_cancelled
flag we added in the async
and await
desugaring looks an awful lot like a cancellation token.
I think it'd be worth exploring this connection in more depth.
The last idea I want to explore is request_cancellation
, which seems to have been first introduced in some early async vision notes by Niko Matsakis.
This is framed as a replacement Future
trait called Async
which includes a request_cancellation
method.
The idea is that after calling request_cancellation
on a future, subsequent calls to poll
would proceed along the cancellation path rather than the normal execution path.
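A simplified, runnable model of that shape might look like the sketch below. As elsewhere, `Pin` and `Context` are dropped, and the `Download` type and its `Option`-valued output are assumptions made for illustration; the vision notes leave open what a cancelled `poll` should ultimately return.

```rust
enum Poll<T> {
    Ready(T),
    Pending,
}

// Simplified model of the `Async` trait shape from the vision notes:
// one poll method, plus a way to flip into cancellation mode.
trait Async {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    fn request_cancellation(&mut self);
}

struct Download {
    cancelled: bool,
}

impl Async for Download {
    // Modeling choice: after cancellation, poll completes with None.
    type Output = Option<Vec<u8>>;

    fn poll(&mut self) -> Poll<Self::Output> {
        if self.cancelled {
            // Cancellation path: clean up, then report "no result".
            Poll::Ready(None)
        } else {
            // Normal path: pretend we're still waiting on the network.
            Poll::Pending
        }
    }

    fn request_cancellation(&mut self) {
        // Calling this again is harmless, which is one way this shape can
        // support the idempotent semantics discussed earlier.
        self.cancelled = true;
    }
}

fn main() {
    let mut dl = Download { cancelled: false };
    assert!(matches!(dl.poll(), Poll::Pending));
    dl.request_cancellation();
    // The same poll entry point now drives the cancellation path.
    assert!(matches!(dl.poll(), Poll::Ready(None)));
}
```

Because there is only one entry point, there is no way to accidentally poll "forward" after cancellation begins: every later poll is, by construction, a cancellation poll.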
This has a couple of strengths.
It avoids the possibility of calling poll
after calling poll_cancel
.
More importantly though, request_cancellation
can be used to support recursive cancellation.
After writing this post, I'm actually pretty excited about request_cancellation
because it seems strictly more powerful than poll_cancel
.
In this post we've made an in-depth exploration of how a poll_cancel
API would support cancellation handlers in Rust.
The design includes a prototype implementation which allows us to write real programs to get a feel for how cancellation behaves.
In the course of doing this, we realized that poll_cancel
has some significant shortcomings and is probably not the best mechanism for cancellation handlers going forward.
But, we also see promise for related proposals to address the specific shortcomings we've identified.
I'm using executor broadly here to basically mean "any code that calls poll
on futures directly." This obviously includes async runtimes, but also includes many future combinators like race
or join
.↩
This is a little bit of a lie. They desugar into generators and yield
expressions, which do involve a fair amount of compiler magic to implement. The key thing here is that we don't have to do much additional magic if we can rely on the compiler to give us support for generators.↩
Indeed, in the early days of async Rust, await!
was in fact implemented as a macro.↩
Well, not quite. Anything can panic, which you can treat as another final state for a function.↩
Option
would work just as well.↩
TC has also shown that we can emulate coroutines using async
/await
, so it's probably even possible to do all of this on stable Rust.↩
We could also add a Context::is_cancelled()
method and just pass one parameter. There are a lot of ways to plumb this around.↩
This is pseudo code. I'm assuming the pinning stuff just works. Also, my actual implementation had some transmute
crimes that I've left out here for clarity.↩
This "complete after cancel" case is one that could reasonably happen. For example, maybe you sent a request to a server, started to cancel it, but before you could the server sent back a response saying the request was completed. One possible behavior is to just drop the return value and say the cancellation was actually successful. In code this would mean replacing the panic!("future completed after being cancelled")
line with Poll::Ready(())
. The design in this post doesn't do this, but futures themselves are empowered to handle this case however they see fit.↩
If this were Scheme, I'd call this macro something like await/c
or await/cancel
, but Rust doesn't let us use /
in identifiers.↩
Incidentally, I'm also not entirely in love with defer {}
and async Drop
, but I think async Drop
in particular solves a lot of problems I don't know how to solve otherwise.↩
Sometimes I also find it helpful to think of combinators as mini executors, since combinators and executors both call poll functions on other futures directly.↩
I don't think it would take too much to extend this to support recursive cancellation, but that's left for another post or an exercise for the reader. I think they key thing is you need some way to tell how many times you've been cancelled. One way is to add a depth
or count
parameter to poll_cancel
. Another is to have cancelling a future destroy the old future and create a new future that represents the cancellation of the old one, which could itself be cancelled.β©
Whether we want to do it is a fair question though.β©
This is somewhat related to fusing futures and iterators. I haven't really touched on what happens if you call poll_cancel after the future is cancelled, but I think Boats' earlier proposed RFC on poll_drop_ready makes a pretty good case that poll_cancel should require fused semantics -- that is, that you can call poll_cancel again after it completes and nothing bad happens.
For example, I haven't seen a good way to run async destructors without introducing implicit await points. I like that right now we have the property that you can see anywhere an async fn might suspend by looking for await. Although, if I'm totally honest, this may not actually be that useful of a property.
The reason for this was to try to make it so we could get away with only having to deal with poll_cancel in the desugaring of await. Given the issue with FuturesUnordered, I don't think we can get away with only calling poll_cancel as part of await and will probably need some kind of compiler-generated drop glue cancellation path. Thus, it's probably simpler and better overall to have poll_cancel called even on futures that haven't been polled yet.
async Drop.
There are some tricky design questions to making this work well, and we need to start thinking about these now if we want to have something ready by 2027.
In this post, I'd like to explore a low level mechanism for how we might implement async cancellation.
The goal is to explore how an async executor1 would interact with cancellation, as well as to make sure that this mechanism would support reasonable surface-level semantics.
You can think of this as a kind of compilation target for higher level features, similar to how the Rust compiler lowers async fn into coroutines.
Let's use the program below as a running example.
In real life, this would probably return a Result<DataTable>, but I want to avoid the extra complexity around additional early exits.
```rust
async fn load_data(file: AsyncFile) -> DataTable {
    let mut data = Vec::new();
    let result = file.read_to_end(&mut data).await;
    result.unwrap(); // We're ignoring proper error handling
    parse_data(data)
}
```
For an async function's state machine, states are made up of the code between await points. Or alternatively, you can think of await points as edges between states. For this program, the state machine would look like this:
You might notice I pulled a bit of a fast one on you.
I said await points turn into edges in the state transition diagram, so we'd expect to see just one edge labeled await.
Instead, we have two edges labelled await and one without a label.
What's going on?
First, some conventions.
I realized it's helpful to see some of the traditional control flow in addition to suspension or await points.
I've represented these edges as a solid, unlabeled line.
These mean that control transfers from the previous state immediately to the second state without any suspension.
Our example is a relatively simple straight-line program, so the actual control flow graph isn't particularly interesting, but this will change a little when we look at cancellation.
The other edge we have in this graph is the await edge.
These edges are labeled await and are dotted lines to indicate that execution is interrupted---the future will suspend and give the executor the chance to switch to another future for a time.
Finally, I've introduced a couple of special states that do not exactly correspond to any code the user wrote.
These states are shown in orange.
Now let's turn our attention to why the diagram shows two await edges but the await keyword only shows up once in the program.
Every async fn has an implicit suspend point that represents the time between when the function is called and when it is first polled.
In this diagram, I've represented this as an await edge going from the start state to the first line of the function.
In general, you don't have to worry about this hidden initial suspend point too much because async function calls are almost always immediately awaited.
In other words, it's more common to see foo().await instead of let future = foo(); /* do some other stuff */; future.await.
The state machine we've looked at so far does not do a good job of representing cancellation. Let's try to extend it to do so.
Today in Rust, cancellation simply means you stop polling the future and drop it instead.
When dropping something like a closure or a future returned by an async fn, Rust needs to recursively drop the values stored in (in other words, captured by) the closure or future.
Depending on what state the future is in when it is dropped, there are different values that need to be captured.
In our example, if we drop the future before we poll it, we only need to drop the AsyncFile that was passed in as a parameter.
On the other hand, if the future is dropped at the await point, we also need to drop the Vec that we read the file contents into.
We can add some extra states to our graph to illustrate this.
I've specifically called out drop along the cancellation path, but Rust also drops values during the normal exit path. I've left the normal drops out for simplicity.
I like thinking of async functions this way because we can use it to make several observations about cancellation in Rust. Many of these seem rather obvious, but they raise important requirements for designing a system that can handle cancellation well.
Observation 1: Cancellation is a state change. When we cancel a future, it transitions from its normal running states to a cancellation path. Currently this happens implicitly when a future is dropped, but in the future we will probably want a way to explicitly transition a future to its cancellation path.
Observation 2: Async cancellation handlers1 require adding await points on the cancellation path. At the moment, cancelling futures is synchronous. This shows up in the async state graph in the fact that there are no await edges on the cancellation path. If we want to allow for cancellation handlers, we will need to add await points in the cancellation path. This may be obvious, but this also implies we need a way to make sure executors continue to poll futures that have been cancelled.
Observation 3: Cancellation is an alternate exit. An async function that has been cancelled does not exit through the normal return path.
From the perspective of an async function author, this shows up as the function not continuing to execute past an await point.
From a types standpoint, a function cannot exit normally in general because we may not yet have a value of the right type to return.
In our example we can see that the type of the function does not allow it to exit at the await point, because at that point we have not created a DataTable to return.
This observation has implications that will show up in the types of the API we eventually design for cancellation handlers.
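Observation 2 in particular implies executor changes. A hypothetical executor loop that honors it might look something like the following sketch; poll_cancel, the task states, and yield_to_scheduler are imagined APIs from this design space, not real Rust.

```rust
// Pseudo-code: keep polling a task even after it has been cancelled,
// so that async cleanup on the cancellation path can run to completion.
loop {
    match task.state() {
        TaskState::Running => match task.poll(cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => yield_to_scheduler(),
        },
        TaskState::Cancelled => match task.poll_cancel(cx) {
            Poll::Ready(()) => break, // cleanup finished
            Poll::Pending => yield_to_scheduler(),
        },
    }
}
```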
Another thing we can explore with a state graph is what behaviors are possible if a cancelled future is cancelled again.
One common way this could happen is if you have something like a race combinator that returns the value of the first future to complete and cancels the other one.
If the race combinator is itself cancelled while it is cancelling the slower sub-future, the slower sub-future would be cancelled twice.
FIXME: write out and explain a code example of this case.
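As a sketch of the scenario: race, timeout, fast_query, and slow_query below are all hypothetical, and async cancellation handlers do not exist in today's Rust.

```rust
// Pseudo-code: slow_query can end up cancelled twice.
async fn double_cancel_example() {
    let result = timeout(Duration::from_secs(1), async {
        // race cancels whichever future loses.
        race(fast_query(), slow_query()).await
    })
    .await;
    // Suppose fast_query wins: race begins running slow_query's
    // cancellation handler. If the outer timeout then fires while that
    // handler is still in progress, race itself is cancelled -- and
    // slow_query, already on its cancellation path, receives a second
    // cancellation.
}
```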
Let's look at this in the abstract with state machines.
There are a couple of possibilities for how to handle cancellation of cancellation. I'll consider three of them, inspired by the zero one infinity rule.
Once we have support for cancellation handlers, it will definitely be possible to write code that leads to trying to cancel a cancellation.
The race example we mentioned earlier is one such case.
So in this option, we would declare cancelling a cancellation to be an error.
We have some flexibility on what mechanism we'd use exactly, but I think the best option would be to panic.
I think in practice this option is not feasible. Cancellation flows from top to bottom (e.g. an executor decides to terminate a task early and so runs the task's cancellation handler), but the higher levels do not know anything about the internal behavior of futures. An executor that is cancelling a task does not know if one of the task's subfutures is trying to cancel a future already.
In this version, cancelling an already-cancelled future is basically a no-op. In state machines, it would look something like this:
The key point here is that all of the cancel states have a cancellation edge that comes back to the same state. In other words, cancelling once your future has already been cancelled means you stay in the same state and continue executing the cancellation handler as before.
What does this mean in practice?
It essentially means you can trust that your cleanup code in a cancellation handler will run to completion.
Admittedly, this might take additional rules, like we may want to declare it to be undefined behavior to not poll a cancelled future to completion2.
Scoped tasks would likely need this guarantee, but we could consider weaker ones, like that a "well-behaved" executor will poll cancelled futures to completion.
The "well-behaved" guarantee is roughly what we have today for Drop, so it might be similarly useful.
The downside is that this also means we can add cancellation behavior that can take arbitrarily or even infinitely long.3 We might decide instead that cancellation means something like "request graceful shutdown" but then forcibly terminate a future if it takes too long. For this we need recursive cancellation.
In this version, canceling an already cancelled future would transfer us to a separate cancellation path. That cancellation path could also be cancelled, and its cancellation could be cancelled, and so on. In pictures, recursive cancellation looks like this:
While an infinite regress of cancellations might seem ridiculous, there are some cases where it might be useful. There's also a nice regularity to it.4 One class of problems where this might be useful are cases where you have optional cleanup work to do but you can cancel it if needed for a more prompt shutdown. Of course, I'm not sure this is really all that useful in practice, and if you need it there might be other ways to do it.
More importantly, there are many cases where you absolutely do not want to cancel the cancellation. For example, maybe you have a transaction future whose cancellation path rolls back the transaction. You do not want to stop the rollback before it's complete, or else you've completely defeated the purpose of transactions.
That said, recursive cancellation appears to be strictly more powerful than idempotent cancellation because if you have recursive cancellation you should be able to implement idempotent cancellation where needed (basically, you just ignore the subsequent cancellation signals and stay in the same state you were in).
Seen this way, recursive cancellation gives us a lot of flexibility. It means individual futures can implement either behavior, according to what best fits their needs. The main thing the Rust language would need to do is design reasonable defaults and set expectations so people authoring futures can encapsulate their specialized behavior.
We've long talked about async functions as state machines, so in this post we looked at how you might draw a state transition diagram for async functions. This gave us a way to play with cancellation and look at what various cancellation semantics might imply in terms of the shape of the state transition diagram. I've found it really helpful to think about async cancellation this way, so I hope others find it useful as well!
This post was originally part of a larger post about implementing a prototype of async cancellation handlers. The larger post was taking a long time and I felt like the content in this post was useful on its own so I wanted to go ahead and publish it. While I no longer like to promise that a followup post is coming soon5, I do have most of the longer post drafted so chances are good I will get it out soon. Plus, I did commit to discussing it at the WG Async Reading Club next week, so there is a little pressure on.
Anyway, please reach out if you have any thoughts or questions!
I'm using "cancellation handlers" to refer broadly to mechanisms that allow running async code on the cancellation path. This would likely be async Drop, but I want to use a more general term to emphasize there are multiple possibilities here.
This will require us to mark something unsafe somewhere.
This is true of drop already today. I can write fn drop(&mut self) { loop {} } and my program will hang when the destructor tries to run.
One of the things that bugs me about the idempotent version of cancellation is that you can call any future from either a normal execution or cancellation path, but in the cancellation path they effectively become uncancellable. It's not actually a problem, since not cancelling a future is always a choice you're allowed to make, but the asymmetry still bothers me.
I'm sure you'll find plenty of examples on my blog of posts I said were coming soon that did not, in fact, come soon, if they ever came at all.
We have the beginnings of a proposal, but I wanted to write it up in my own blog to help make sure I understand it. Note that this is a draft proposal at best at this point and nothing is set in stone. I also want to recognize Jane's work in coming up with this process. This is largely based on her initial suggestion and I want to make sure I'm not taking credit for something I didn't come up with. But, any failings in this post should be viewed as my own and not hers.
Rust is split into two major organizations: the Rust Foundation and the Rust Project. The Foundation does a few things. It provides a legal structure to hold Rust's intellectual property. It provides an entity for organizations to contribute financially to support Rust. It supports the long term health of the Rust project. One way it does this is through the Community Grants Program.1
The Foundation is governed by a Board of Directors, and five of the seats on the Board of Directors are reserved for members of the Rust Project. These seats are known as Rust Project Directors. Project Directors are meant to serve for a term of two years with the hope that we can stagger terms and rotate out a subset of the directors this year.
Unfortunately, due to the lack of Rust Project governance over the better part of the last two years, we have not appointed new directors in place of those whose terms are completed. Instead, the Foundation Board has voted several times to extend the terms. Currently the terms are set to expire on September 21, 2023 so we'd like to be able to appoint new ones without having to ask for another extension.
There are a number of desired features and constraints on this process.
First of all, the Foundation bylaws state that Directors must be elected by those they represent.2 In the case of Project Directors, this means they must be elected by the Rust Project, and the Project governance is set up so that the electors will be the Rust Leadership Council. There is some flexibility in what counts as an election though. For example, we could follow some other selection process and the Council could then vote to ratify the results of that process.
So one possible process is to have the Council pick and vote on a set of directors without any input from the rest of the project. This would be a bad process. We want something that gives the Rust Project a chance to provide input to the process. And we want some transparency in how the Project Directors were elected.
Doing a more traditional election would also be at odds with Rust's culture, which tends to prefer consent based decision making rather than rule by majority or plurality.
With this background in mind, we can now discuss a possible process for selecting Project Directors. I've based this off of the notes here, and the process described there is heavily influenced by Sociocracy for All's Selection Process.
I think of the process as a bottom-up process, so I'm going to describe it in those terms. We start by soliciting nominations from each top level Rust team. These nominations go to the Council, which will do the final selection. Let's look at these in more detail.
When we kick off the process, we will start by telling all the Rust teams that they should begin nominating candidates for Rust Project Directors, with a deadline for when nominations will be closed.3 Teams should look at the role description and think of people they think would be a qualified candidate. These candidates will likely come from the team itself, but there's no requirement. Teams can nominate anyone who they believe meets the qualifications that will be set forth in the role description. We aren't planning to impose a requirement to nominate a certain number of candidates. It doesn't really make sense to nominate more than the number of vacancies, but there's no reason a team couldn't do that. Similarly, a team may choose not to nominate anyone, or they may do this by default if the deadline expires.
We plan to leave the process for nominating candidates up to the teams, with a strong suggestion to follow a miniature version of the process the Council will follow. For the purposes of accountability though, I think it makes sense to have the team's Council Representative drive the process, although "driving the process" might mean delegating to someone else who wants to run the process. The main reason for this default is to make sure someone is responsible for making progress.
Once the team has selected a set of candidates, they should report these to the Council. The team's council representative will be responsible for communicating these to the Council as a whole. One of the goals of this project is to gather feedback to help members of the project grow. Thus, I think it would make sense for the team to provide their nominees as a document (we might even provide a template) that lists the nominees and why they were chosen. It might also make sense to include a list of people who were considered but not nominated, and why they weren't nominated. I would hope this is a positive experience, so we don't say "we didn't nominate person X because they're terrible," but more as a way of highlighting rising stars. For example, we could say "Person Y was considered and shows promise, but we would like to see more growth in these areas first. Please consider them in the future."
The next step is for the Council to select the Project directors from the pool of nominees. There may not be much to decide here, since it's quite likely that we have exactly the number of nominees as there are openings. But even in this case, we want to have a defined process that we follow.
The draft proposal says the Council should select a facilitator to lead the process. The Council would then go through a round process, where each council member proposes a candidate from the nominees and explains why they think the candidate is a good choice. After the first round, the Council goes around again in a change round, which gives everyone the chance to change their nomination based on the discussion so far. Once this is done, the facilitator takes all of the suggestions and proposes one candidate. The Council then consents to this choice, or if there are objections then the facilitator proposes a new candidate.
The process as I've described it so far is an iterative process, meaning we'd run the process to select one candidate, then do it again to select the second, and so on until we've filled all the open seats (two or three, in this case). The subsequent rounds should be much faster than the first, because we can reuse most of the information gathered in the first phase.
An alternate way to do this would be as a batch process, where we pick the whole set of candidates at once. I think this would be my preference for a couple of reasons. First off, it's likely to be more time efficient, since we only have to do one process.4 Second, and more importantly to me, picking all the candidates at once allows us to more directly look at characteristics of the candidates as a group. We'll already need to account for employment constraints, but choosing the set of directors as a group also lets us more directly make sure we have broad representation within the project. In my mind, the thing we want to select for is a successful group more than any individual characteristics of the members.
Anyway, the batch process would be essentially the same, only instead of going through rounds proposing individuals, we'd propose a set of individuals to fill all the seats at once.
Once the Council has consented to a set of candidates, we'll have a vote to ratify the selection. Since we've already heard all objections and consented to the selection, this would be expected to be a unanimous vote. The main purpose here is to make sure we are meeting the requirement in the Foundation bylaws to elect the Project Directors.
After the vote passes, the process is complete. The Council would then announce the results and the new Project Directors would take office once the outgoing Directors' terms end.
I've described my interpretation of the process we have in mind so far, along with some of my own opinions and additions to it. My main goal here is to explain what I'm thinking so we can make sure the project director election group has a shared understanding of the proposal, and also to fold any relevant thoughts back into the proposal. But I'd also like this to serve as a chance to raise awareness and solicit feedback. If this is something you're interested in, please come see us at #council/project-director-election-proposal on Zulip.
To give us enough time to follow this process, we are going to try to reach consensus on the proposal within the not too distant future. Look for official communication from the Rust Leadership Council once that happens.
Thanks to Jane Losare-Lusby for reviewing this post.
This isn't an exhaustive list, and having a more complete understanding of everything the Foundation does is something I'm definitely working to build.
We've been interpreting this to mean we need to have a vote, but there's some ambiguity here. For example, maybe "selection" and "election" are just synonyms for each other. To play it safe though, we expect to have a ratification vote following the selection process we're currently designing.
The deadline is important here because we need to choose Project Directors by the time the current term ends on September 21.
Of course, it may turn out that picking three people at once is actually much, much harder than picking one person three times in a row.
One of the reasons I chose to come work for Microsoft is that they seem to be one of the few examples of a large, established organization that intentionally and dramatically changed their culture. I've been curious how they did that, and what other organizations can learn from that.
It seems like an important part of it is to have regular conversations about culture, such as by doing exercises like the one I'm going to discuss in this post. So with this background in mind, let's talk about the exercise.
At the start of the exercise, we were reminded of the company mission statement. Then we were asked to spend a couple of minutes coming up with a sentence explaining how we personally contribute to it.
According to Microsoft's About page, Microsoft's "mission is to empower every person and every organization on the planet to achieve more."
This mission actually speaks to me a lot. I feel like computers should be tools that help people do the things they want to do. Too often today, computers seem to actively work against their user instead. I have a kind of lengthy rant on this subject that I should probably write down someday, but that will be for another time.
So how did I respond to the exercise? I wrote:
In my work, I empower people to achieve more by creating powerful and accessible languages and APIs, and by helping to build a team that effectively does the same.
I felt rather proud of this answer, which is why I wanted to write about it in more depth. I want to do this by going into more detail about the main phrases I used.
The first is powerful and accessible. In my mind, Rust fits the bill really well here. Rust is an incredibly powerful language. Features like the borrow checker can ensure safe memory access for some complex patterns without any runtime overhead. The trait system provides incredible opportunities for abstraction. Rust provides good support for low level programming, while still including potent functional programming features like lambdas and algebraic data types. But what really impresses me about Rust is that it manages to make all this power usable to many programmers. Rust isn't the first language to do all these things, but some of the other languages essentially require a Ph.D. to use them effectively.1 On the other hand, things like Rust's extreme attention to detail in its error messages greatly increase the chance that programmers will be successful with such a powerful language.
The second is languages and APIs. I thought about writing "languages and libraries" because I like the alliteration, but libraries seem too broad. I don't often write an entire library, while I might add a single function to an existing library. I felt like "API" better expressed that scope, and I've even heard a library called an API at times so this can generalize if needed. The reason I mentioned these two together is that I believe it's best to consider a programming language and its standard library as a unit. As a language nerd, I can easily get caught up in the excitement around designing new language syntax and semantics. These language features need to be supported by a solid, well-rounded library. To make an analogy to spoken languages, I think of the programming language as the grammar and the standard library as the vocabulary. It's hard to say much of anything with only one.
Finally, I mentioned the importance of helping to build a team. This is the aspect I have the least experience in, but it's important and I'm trying to learn more about how to do it. A well-functioning team can accomplish far more than an individual! These teams don't form (or at least, aren't maintained) by accident. So it's important to do things like mentor new team members and to create a welcoming place so new people can feel comfortable joining the team. It's important to find ways to help grow members into new roles so that the team can outlive any one member. It's important to foster a sense of shared purpose and values so we can effectively work together. It's important to make space for disagreement, since often I find the best outcomes are a result of hashing out our differing perspectives.
Doing an exercise like this can feel pretty cheesy but this one resonated with me a lot. It was a chance to put into words why I do the things I do, both in my day job and how I bring those things into the Rust community. And now, having put this into words, I can use it as a guide when deciding how to approach my work in the future. While this post was specific to my role within Microsoft, I think doing a similar exercise with Rust's mission would be illuminating. Maybe I'll do that in another post. What about you? What values do you hold and how do you put those into action?
I feel like Rust does not need a Ph.D. to use well, but given that I have one, I may not be the most qualified person to make this claim.
The Leadership Council is new, and in many ways its first tasks will be to define what it is. We know it's sort of a replacement for the Core Team, but it's also supposed to be significantly different. A lot of our first tasks are going to seem relatively mundane: figuring out when we regularly meet, how to propose items to the agenda, how we communicate what we're working on, etc. After that, we can get on to the "more substantial" questions. One colleague of mine told me once that Rust has at least two years of governance debt, and given that they said that two years ago, at this point we probably have at least four years of governance debt!
While we figure these things out, I know there are a few things I can say about myself and values, and how I hope I can bring these to the Leadership Council. Keep in mind that these are my own opinions. I'm not speaking for the Leadership Council or the Compiler Team, so the priorities I suggest here will evolve over time.
When deciding whether to nominate myself for this role, I spent a lot of time thinking about why I wanted to do it. To me, the question came down to how much I wanted to do the work.
The best leaders I've seen in my life are the ones who saw their job as serving those they lead. I want to embody this mindset as I serve on the Leadership Council.
Most of my recent Rust contributions have been primarily within the Async Working Group. Lately, I've found myself thinking more about the project and community as a whole. For example, how do we make Rust more welcoming to those who want to contribute? Or how can we make sure all the components of Rust work together as a whole? How can we build excitement for contributing, including contributions we might think of as non-technical?
I realized that questions like these are the kinds of questions the Leadership Council should be thinking about.1 Given that I've already been thinking about these questions, joining the Leadership Council became a clear opportunity to actually get to work on these things.
One of the most important things I can do, especially at the start, is to listen. I will be actively reaching out to the leaders in the Rust community to find out what they need and how I can best serve them in particular and the community in general.
I am also going to make myself available for office hours. I have set up a Bookings page where you can schedule a 30 minute meeting with me. Please feel free to use this if you'd like to have a synchronous chat about something related to the Rust Leadership Council or the Rust project in general. To make this easier to find, I've added a new top level page to this site where I'll keep up to date information about how to book an office hours appointment with me.
In the conversations I've already had with folks around Rust governance, one of the clear themes that has come up over and over is that we need more transparency in Rust leadership. Fortunately, I believe all of us on the Council agree with this and are committed to improving transparency. I believe most of this transparency should come through official channels, such as published minutes from Leadership Council meetings. That said, I intend to supplement these official communications by sharing about my thinking as it relates to the Leadership Council. This post is an example, and I will continue with more like this.
I wanted to share some of how I'm thinking about my role on the Leadership Council, and the things I plan to do. I'll be honest, I'm a little scared even to post this, because if I fail at these goals it will be obvious. I think this accountability is good. If this is the last you hear from me about this, then I've failed as a leader, and people should know that.
But I also may not have the right priorities. There's a lot we don't know, and almost everything I've written here may need to change. When changes are needed and made, I promise to be transparent about them. Please help me to be a good servant and leader for the Rust community.
And with that, I want to close with an explicit call for feedback. What do you think of my priorities here? If I do these well, will you be happy to have had me on the Leadership Council? What are some things I've missed or should do instead?
Please send me your feedback, either by joining me in Office Hours, DMing me (eholk) on the rust-lang Zulip or Mastodon, or emailing me at eric@theincredibleholk.org.
I'm thrilled to have met the other members of the Leadership Council. I think we have a great group of people who all bring important background, perspectives, and skills to the team. I'm excited to work with them to make Rust the best language and community it can be!
This doesn't mean the Leadership Council is necessarily the right place to solve them. One of the main goals of the governance RFC was that the Council should primarily look to delegate to more suitable teams and to create those teams when they don't exist.β©
The Leadership Council is new, and in many ways its first tasks will be to define what it is. We know it's sort of a replacement for the Core Team, but it's also supposed to be significantly different. A lot of our first tasks are going to seem relatively mundane: figuring out when we regularly meet, how to propose items to the agenda, how we communicate what we're working on, etc. After that, we can get on to the "more substantial" questions. One colleague of mine told me once that Rust has at least two years of governance debt, and given that they said that two years ago, at this point we probably have at least four years of governance debt!
We've had a couple of ideas going around so far. One of the main ones is Return Type Notation (RTN), which Niko describes in his recent post. In my last post, I suggested that we could infer the necessary bounds in many cases.
While I was excited about inferring bounds at first, one major shortcoming is that it creates new semantic versioning hazards. The inference depends on the body of the function you've annotated, which means that when modifying the function you could easily add or remove bounds from the signature by accident.
In the discussions we've had since then, we have been converging on a solution that we expect will work in the common cases, but avoids both the verbosity inherent in RTN and the semver hazards with inferring bounds. This is the solution I'll be describing in this post.
One of the things I've realized is that there are two variants of the Send Bound Problem. I'll call these the Promise and Require variants.
The Promise variant is "how can I promise that my async function will always return a Send future?"
There are several subvariants.
We may want to define an async trait so that all implementors must always have Send implementations.
This is what the #[async_trait] macro does by default.
Or, even if the trait does not require it, we may want to make this promise in our implementation.
And finally, for just a bare async fn, we may want to be able to make the same promise.
The Require variant is "how can I require that the implementation I'm given can be used in a Send context?"
This is looking at the use side rather than the definition side.
Let's recall the do_health_check example:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    spawn(async move {
        health_check.check(server).await;
    });
}
We want to make sure that, although the HealthCheck trait does not require that implementations return Send futures, we can only call do_health_check with those that do, so that we can spawn the background task.
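The shape of this requirement already exists in today's Rust with closures and threads: std::thread::spawn demands Send + 'static for exactly the same reason our spawn does for futures. As a rough analogy (run_in_background is a made-up helper for illustration, not a real API):

```rust
use std::thread;

// Analogy for the Require variant: the generic job must be usable from
// another thread, so the Send + 'static bound appears at the use site,
// even though FnOnce itself says nothing about Send.
fn run_in_background<F>(job: F) -> String
where
    F: FnOnce() -> String + Send + 'static,
{
    thread::spawn(job).join().unwrap()
}

fn main() {
    let server = String::from("server-1");
    // This closure captures only Send data, so the bound is satisfied.
    let report = run_in_background(move || format!("checked {server}"));
    println!("{report}");
}
```

A closure that captured, say, an Rc would be rejected at the call site, which is precisely the behavior we want do_health_check to have for implementations whose check futures are not Send.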
My feeling so far has been that the Promise variant is the easier one to solve, so it is easy to accidentally start talking about that one, while the Require variant is the more important problem. In this post, I will only be talking about the Require variant, although I suspect the proposal may generalize to the Promise variant and I may speculate on that.
Without further ado, here is the proposal.
First, we require async traits to be declared as such.
This means the HealthCheck trait we've been talking about gains an additional async keyword:
trait async HealthCheck {
    async fn check(&mut self, server: Server);
}
For the most part, we can think of this new async as becoming part of the name of the trait.
It's no longer just HealthCheck, but async HealthCheck.
Declaring a trait with async means the trait is allowed to have async methods.1
Because we've changed the name of HealthCheck, we have to change where we use the trait as well:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: async HealthCheck + Send + 'static,
{
    spawn(async move {
        health_check.check(server).await;
    });
}
This new async keyword in the where clause does a couple of things.
First, it's a hint that the trait has async methods.
More importantly, it gives us a place to hang additional bounds if needed.
Because we are spawning a future that awaits calls from this trait, we need a Send bound.2
So, to notate this, we'd use async(Send) in the bound:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: async(Send) HealthCheck + Send + 'static,
{
    spawn(async move {
        health_check.check(server).await;
    });
}
The trait name async(Send) HealthCheck would mean the HealthCheck trait, with async methods, all of which have a Send bound on their returned futures.
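On stable Rust today, you can approximate this bound by desugaring the async method into an associated future type; the proposed async(Send) then corresponds to one extra where clause on that type. This is only a sketch of the desugaring under that assumption, not the proposed feature:

```rust
use std::future::Future;

// Desugared stand-in for `trait async HealthCheck`: the async method
// becomes a regular method returning an associated future type.
trait HealthCheck {
    type CheckFut: Future<Output = ()>;
    fn check(&mut self) -> Self::CheckFut;
}

// `H: async(Send) HealthCheck` corresponds to the extra bound
// `H::CheckFut: Send` in this desugaring.
fn do_health_check<H>(mut health_check: H) -> &'static str
where
    H: HealthCheck + Send + 'static,
    H::CheckFut: Send,
{
    let _task = health_check.check(); // would be spawned in real code
    "bounds satisfied"
}

struct Ping;
impl HealthCheck for Ping {
    type CheckFut = std::future::Ready<()>;
    fn check(&mut self) -> Self::CheckFut {
        std::future::ready(())
    }
}

fn main() {
    println!("{}", do_health_check(Ping));
}
```

The verbosity of spelling out CheckFut for every method is exactly what the proposed syntax would save us from.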
So that's the proposal in a nutshell.
One thing I'd like to point out is that although so far we've only talked about Send bounds, I'm imagining that the grammar would allow any bound on the async keyword (although it might make sense to limit it to auto traits).
For example, one could imagine writing:
fn foo<T>(x: T)
where
    T: async(Send + Clone + Debug) MyTrait,
{ ... }
In practice, it's probably hard to implement Debug on the future returned by an async method...3
There's a lot I like about this proposal.
It's relatively lightweight syntactically, but we expect it to be powerful enough to cover the common cases.
To be honest, we don't actually know how common it will be for users to want some methods that are Send and some that are not.
The fact that #[async_trait] works well suggests that the all-or-nothing approach should be fine in many cases.
If there are cases that users need to be more precise, however, we can still provide return type notation for those advanced use cases.
The semantics of these new bounds seem easy to explain. We don't have to talk about looking at function bodies, and we definitely don't have to mention anything about flow-sensitivity, as we might if we did something that relied on more inference. This helps keep Rust explicit and predictable as a language, without being burdensome.
This proposal also dovetails nicely with several others that are currently in progress, and is immediately open to more generalizations.
The syntax we've introduced here actually largely comes from the keyword generics initiative.
Although we have not talked about maybe-async bounds (written ?async), the syntax is completely consistent with what we've seen here.
I know I said I wasn't going to talk about the Promise variant of the Send bounds problem, but it's a relatively small change from what we have here to also allow trait modifiers anywhere else the async keyword is allowed.
For example, we could write the following to declare an async function whose returned future is guaranteed to be Send:
async(Send) fn foo() -> i32 { ... }
That said, I think there may be better ways to solve this problem,4 so I don't want to dwell too much on this just yet.
We still have a few open questions though. I'll briefly touch on these here.
Should methods be Send by default?
Either way is feasible.
For example, methods could be Send by default and you could use async(?Send) to opt out, or they could not be assumed to be Send by default and you would use async(Send) to opt in.
There are arguments for both, but fortunately it's a relatively minor detail and it would be easy to go either way.
How does this interact with supertraits and trait aliases? We have some time to figure this out for aliases, since trait aliases aren't a thing yet. For supertraits, we can probably start with a more conservative option and relax it later if needed.
Are non-async and async traits in the same namespace?
This question gets at whether you could define both trait Foo { ... } and trait async Foo { ... } in the same module.
While I won't explore it in this post, this has enough implications that it's probably worth spending some time on.
For example, if we get this one wrong, we might end up in a situation where users have to write async AsyncFoo, which would just be sad.
So anyway, that's the proposal. I want to give a big shout out to everyone in the Async Working Group, and Yosh Wuyts in particular since he largely came up with the final syntax presented here in conjunction with his work on keyword generics. Also, thanks to Nick Cameron for his early feedback on this post. The proposal presented in this post incorporates a lot of ideas from many different people, and it's really great to see everyone's input coming together towards a solution we can be happy about. We seem to be at a point where we've struck a nice balance for ergonomics, utility, and predictability. Of course, the best way to know for sure is to prototype something and play around with it!
I'm excited to see progress in this area and am eager to see async functions in traits become fully supported in Rust!
Technically, you wouldn't be required to have async methods, but we'd probably want to add a lint warning about unnecessary async keywords, just like we do for mut.↩
This is assuming we're using a multithreaded executor like Tokio. The spawn function from single-threaded executors, like many embedded async runtimes provide, would likely not require a Send bound.↩
This potentially opens up some really powerful features though. For example, one could imagine futures that implement serde::Serialize and serde::Deserialize to make futures that can move not just between threads but between nodes in a cluster, or web frameworks where you can await input from the client.↩
One example suggested by Josh Triplett is if you could explicitly refer to the return type in where clauses. Then you could say async fn foo() -> i32 where return: Send { ... }. This makes the scoping a little clearer around parameters (for example, in async(Send + 'a) fn foo<'a>() -> i32 { ... }, it'd be weird to refer to 'a before it's declared), but it's also less clear whether we are saying i32 is Send or that the hidden future that async fn desugars to is Send.↩
This post is about adding Send bounds to futures returned by async methods.
Niko Matsakis has been writing on this subject recently, so if you haven't seen his posts, definitely check them out!
His first post outlines the problem, while the second post introduces a possible solution: Return Type Notation (RTN).
I'm going to write this post assuming you're familiar with those.
I'm mostly a fan of the proposed return type notation.
It's a very powerful feature that gives a solution to the Send bound problem but is also generally useful in other cases.
There are some significant shortcomings though, so I don't think it should be the only or even primary solution people reach for to solve the Send bound question.
In this post I'd like to explore some more implicit approaches that address the issue with much lighter syntax.
As Niko points out in his conclusion, this notation is not without its drawbacks.
In particular, it can easily become quite verbose.
Consider a common trait like Read.
It has twelve methods!
Presumably once we have AsyncRead or similar, that trait will also be widely used and have a similar number of methods.
As a user this isn't a problem, and as a trait implementer it's not bad because only one of the methods is required---the rest have default implementations.
But consider if we wanted to write something like:
fn read_file_on_other_thread<R>(reader: R)
where
    R: Read,
{ ... }
To add all the Send bounds on an async version using RTN, we'd end up with something like this:
fn read_file_on_other_thread<R>(reader: R)
where
    R: AsyncRead,
    R::read(..): Send,
    R::read_vectored(..): Send,
    R::read_to_end(..): Send,
    R::read_to_string(..): Send,
    R::read_exact(..): Send,
    R::read_buf(..): Send,
    R::read_buf_exact(..): Send,
    R::by_ref(..): Send,
    R::bytes(..): Send,
    R::chain(..): Send,
    R::take(..): Send,
{ ... }
Now, I'm perhaps being a little unfair.
You don't have to list all the methods, only the ones you use.
I don't think I've ever used all twelve methods.
Usually I just use read_to_string or maybe read_to_end.
But these methods exist, and someone uses them---perhaps code in some high performance library that I've pulled in.
These bounds also end up being viral. I have to add the bounds needed by any function I call, or any function the callee might call, and so on. Trait aliases can help, but those aren't currently implemented and it'd be nice if we could solve the issue without relying on them.
While that kind of precision can be useful, I suspect those will be somewhat niche and advanced use cases. It would be nice to have something lighter weight that works in the common cases, while still keeping the more precise options when needed.
Perhaps one approach that will be fruitful is to take inspiration from the way auto traits are inferred.
Auto traits are traits where Rust automatically generates an implementation for you in most cases.
The Send trait is perhaps the most common example.
While you can implement Send yourself, in most cases the compiler implements it for you if it can.
For structs, enums, and similar types the rules are pretty simple: they implement an auto trait if all their constituent fields also implement the trait.
For closures, things are a little more subtle.
Closures implement an auto trait if all of the things the closure captures also implement the auto trait.1
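We can watch the closure rule in action with a small compile-time probe (is_send is a helper I'm making up for illustration):

```rust
// Compile-time probe: this only type-checks for Send values.
fn is_send<T: Send>(_: &T) -> bool {
    true
}

fn main() {
    // A closure is Send when everything it captures is Send.
    let msg = String::from("hello"); // String: Send
    let send_closure = move || println!("{msg}");
    assert!(is_send(&send_closure));

    // Capturing a std::rc::Rc (which is never Send) flips the verdict:
    // with `let n = std::rc::Rc::new(0);` captured, the call
    // `is_send(&move || println!("{n}"))` would fail to compile.
    println!("closure is Send");
}
```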
Async blocks are trickier still.
They work like closures in that you have to consider what they capture.
However, because async blocks can suspend at an await point, we must also consider all of the things that are live across an await point.2
We find these values in the generator interior analysis step in the compiler (because async functions and blocks desugar into generators).
Anyway, the point of this little detour is that we already have an analysis pass in the compiler that could be helpful for inferring Send bounds.
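Here's a small illustration of that analysis in today's compiler: the future below is Send because nothing non-Send is live across its await point (is_send is a made-up probe, and the Rc is deliberately confined to a synchronous helper):

```rust
use std::rc::Rc;

// Compile-time probe: this only type-checks for Send values.
fn is_send<T: Send>(_: &T) -> bool {
    true
}

// The Rc (which is !Send) lives only inside this synchronous helper,
// so it never appears in the future's interior state.
fn compute() -> i32 {
    let local = Rc::new(41);
    *local + 1
}

async fn health_value() -> i32 {
    let v = compute();
    // Only the i32 and a Ready<i32> are live across this await,
    // so the generator interior analysis concludes the future is Send.
    std::future::ready(v).await
}

fn main() {
    assert!(is_send(&health_value()));
    println!("health_value() returns a Send future");
}
```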
async fn
I was talking with Nick Cameron about this last week and we came up with an idea that I'd like to describe and flesh out here.
The idea is that you would somehow annotate an async function to indicate you want to guarantee it returns a Send future, and then the compiler infers whatever bounds are necessary to make this happen.
Let's look at an example inspired by Niko's post.
async fn do_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    info!("doing health check");
    health_check.check(server).await
}
This example is admittedly kind of pointless because you could just call .check directly rather than calling do_health_check.
I added the info! line to make this seem a little more plausible, because maybe you want to make sure you have uniform logging.
Still, I admit it's contrived.
Anyway, suppose we wanted to ensure that no matter what type you use for H, the future returned by do_health_check is Send.
With this proposal, we'd write something like this:3
async<Send> fn do_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    info!("doing health check");
    health_check.check(server).await
}
The compiler already has to infer whether do_health_check is Send.
Since Send is an auto trait, the compiler looks at what's live across an await point (in this case the future returned by check(..), since that's the thing we're awaiting) and decides whether do_health_check returns a Send future depending on whether check(..) does or not.
For this proposal, instead of just checking whether the resulting future is Send, the compiler would add any bounds necessary to make it so that do_health_check is always Send.
In our example, it would be as if you had written:
async fn do_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    <H as HealthCheck>::check(..): Send,
{
    info!("doing health check");
    health_check.check(server).await
}
This is exactly what we wanted!
Partially. I was really excited by this idea at first, but in thinking it through to write this post I've realized it solves at best a very tiny piece of the issue. That's kind of apparent in how contrived the example I used to introduce the feature was. But in looking at its shortcomings, maybe we can come up with something that works better.
The biggest issue is that it only works for cases where you await something, but as we saw in Niko's post, oftentimes we care that a future is Send even if we don't await it.
Let's try to fill in a possible body for the example in Niko's post:
fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    let task = async move {
        health_check.check(server).await;
    };
    spawn(task);
}
Here spawn would have a signature like fn spawn(task: impl Future<Output = ()> + Send + 'static).
The problem is, start_health_check is where we need to add the bounds, but it is not async, so we can't just change it to async<Send>.
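As an aside, those Send + 'static requirements on spawn aren't arbitrary. A toy spawn built on threads (a sketch I'm making up, nothing like a real executor) shows where they come from: the future is moved to, and polled on, another thread.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};
use std::thread;

// A waker that does nothing: fine for futures that resolve immediately.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy spawn with the signature from the post. Moving the future to a
// worker thread is exactly why it must be Send + 'static.
fn spawn(task: impl Future<Output = ()> + Send + 'static) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        let mut task = pin!(task);
        // Busy-poll; a real executor would park until woken.
        while task.as_mut().poll(&mut cx).is_pending() {}
    })
}

fn main() {
    let handle = spawn(async {
        println!("health check ran on a worker thread");
    });
    handle.join().unwrap();
}
```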
We could try using our do_health_check function with the async<Send> annotation, and we'd end up with something like:
fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    let task = do_health_check(health_check, server);
    spawn(task);
}
We'd probably get a different error, but it's essentially the same.
We still have no way to guarantee that any H will meet the requirements of this function.
But maybe this can still help.
A lot of futures that need to be Send will be composed of lots of calls to other futures.
These all will need to be Send as well.
Adding async<Send> to those inner futures can still save a lot of boilerplate.
We might still have to be more explicit at the boundary where a task is spawned, but we can avoid restating the bounds explicitly throughout the whole call tree.
In the Read example, this can be significant.
Furthermore, looking at some internal async code we have at work, it looks like we already have code that would benefit from exactly this async<Send> inference proposal.
Another thing I like about it is that it allows us to partially avoid a semver hazard.
Auto traits leak into public types, and because of the way inference works for async functions, one release could have a function that always returns Send futures while some later release, due to a subtle change in the body of the async function, no longer does.
What the async<Send> syntax proposed here does is allow you to explicitly promise that a function will always return Send futures.
Of course, that Sendness may still depend on which methods of a trait the function uses.
For example, maybe I switch from calling Read::read_to_string to calling Read::read_buf.
Still, this will catch some accidental semver breakages, and the ones that remain are more likely to be within the caller's power to fix.
So let's go back to the issue of start_health_check.
What if we could carry this idea of lifting bounds out of the body even further?
It might look something like this:
#[infer_bounds(Send)]
fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    let task = do_health_check(health_check, server);
    spawn(task);
}
This is just strawman syntax.
We may want to stick the annotation on H instead, or use real syntax rather than an attribute.
There's room to experiment with syntax, but the important thing is that somehow we opt in to a new inference behavior that I'll describe here.
For the sake of argument I'm going to assume do_health_check is still written as async<Send> as described above, but I think these two proposals can stand alone.
Without the #[infer_bounds(Send)] attribute (i.e. Rust's current behavior), we can't compile start_health_check because we get a trait error.
The type checker and trait system treat type parameters parametrically, meaning we can only use facts about H that are stated in the where clauses.
There are currently no facts that say the future returned by H::check is Send, which the compiler will inform us of in the error report.
Using RTN, we can add any extra where clauses needed to make this compile.
With #[infer_bounds], I'm proposing to let the compiler add these clauses for us.
We'll probably need to define a clear set of rules, but the gist of it is that if a missing bound would give us a type error, we instead add an implicit bound to the function.
In this case, because do_health_check requires H::check(..): Send, we'd add this bound to start_health_check as well.
I'm not sure. One issue is that I think this could become a global inference problem, which Rust for the most part avoids. For example, consider the following:
#[infer_bounds(Send)]
fn foo(x: impl MyAsyncFuture) {
    spawn(x.some_async_fn());
}

#[infer_bounds(Send)]
fn bar(x: impl MyAsyncFuture) {
    foo(x);
}
In this short program, we need to figure out foo's inferred where clauses before we can figure out bar's.
This means we can't check each function independently.
It's even tougher if we end up with a recursive cycle.
This might not be a problem though. Auto traits are coinductive, which seems like it exists to solve exactly these kinds of problems. I'll need to learn more about the trait solver to know whether this is actually a problem and whether it's solvable.
Of course, another downside is that this introduces additional semver hazards. On the bright side, they are opt-in, and arguably not much worse than the existing auto trait leakage hazards. Still, rather than creating new semver hazards using existing ones as precedent, it'd be better to go the other direction and remove old hazards across an edition, by making it ergonomic enough to be explicit about the requirements that we don't need auto trait inference on async functions. I don't think this proposal does that, but I'm hopeful that maybe it will spark some ideas that lead to something much better.
This post has been a bit of a roller coaster to write.
I started out thinking we had come up with a really tidy solution to most of our problems.
Then there was a moment of despair where it seemed like maybe the idea wasn't workable or didn't solve a useful part of the problem.
Now that I'm at the conclusion, I'm cautiously optimistic.
I think between async<Send> and #[infer_bounds(Send)] we might have something ergonomic that covers the most common cases.
Let's sum up by looking at some pros and cons of the two proposals together.
Pros:
- Makes Send-ness (and other auto traits, like Sync) explicit in the type signature

Cons:
- Introduces new inference and semver hazards, as discussed above
My main conclusion, though, is that we should be thinking more about approaches based on implicit or inferred bounds.
I think between RTN and implicit bounds there's a nice symmetry with functions that return -> impl Future and async functions.
and async functions.
In the cases where one wants more control, we provide a more verbose, explicit way of doing it.
On the other hand, if that's not needed, there's a simpler, more concise notation that lets the compiler handle a lot of details for you.
I think this is a promising direction, and I look forward to hearing how others can improve on the idea!
You can think of a closure as a struct that holds all of the captured fields along with an impl of one of the function traits like FnOnce and FnMut, so in that sense the rules for structs and closures are the same.↩
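To make that analogy concrete, here is a hand-written "desugaring" of a move closure capturing one String (Captured and call are illustrative names; real closure types implement the unstable Fn traits rather than an inherent method):

```rust
// The closure `move || format!("hello, {name}")` behaves like a struct
// holding its capture plus a call method.
struct Captured {
    name: String,
}

impl Captured {
    // Stands in for FnOnce::call_once, which we can't implement on stable.
    fn call(self) -> String {
        format!("hello, {}", self.name)
    }
}

fn main() {
    let name = String::from("world");
    let closure = move || format!("hello, {name}");

    let desugared = Captured { name: String::from("world") };
    // Both produce the same result, and both are Send because String is
    // Send, by the same auto trait rule that applies to any struct.
    assert_eq!(closure(), desugared.call());
    println!("closure and struct agree");
}
```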
While not completely accurate, I think of async blocks and generators as being like an enum with a variant for the beginning of the function, one for each await point, and one for the end. The fields of each variant then store the captures and anything live across the corresponding await point. So in this sense, the rules for auto traits on async blocks are similar to those for enums.↩
There are lots of other syntaxes we could use, like async(Send) or an attribute like #[require(Send)]. The point is to illustrate the idea, and we can bikeshed on syntax if we decide this is worth pursuing.↩