Lately there has been a lot of discussion about async cancellation in Rust. One thing I've noticed in these discussions is that cancellation often means different things to people. As a result, what is a concern for one person may not be a concern for another and it may not even be possible for one person to articulate the other's concern. I think perhaps the reason for this is that async cancellation has to be discussed from a certain perspective, and this perspective is often implied but not explicitly stated or shared. When looking from different perspectives, async cancellation can have different meanings, capabilities, and implications.

In this post, I'd like to make an attempt at cataloging and categorizing the various perspectives we might use when discussing async cancellation. I don't expect this to be the last word, but I hope it will be useful in clarifying the issues surrounding cancellation and serve as a starting point for more precise classifications.

What is Cancellation?

I don't know if this was intentional, but cancellation in Rust largely exists as a consequence of the fact that futures must be externally driven by calling the poll method. Whereas in most languages with async and cancellation, cancellation requires doing something such as calling .cancel() on the task, in Rust you cancel a future by simply not calling poll again.

Once it becomes clear that a future will never be polled again, such as when it is passed to drop or simply goes out of scope, Rust ensures things get cleaned up by running the future's destructor, as long as the program does not call mem::forget or otherwise leak the value.
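To make this concrete, here is a minimal sketch using only the standard library. It polls a future once by hand, drops it, and checks that a destructor inside the future ran. The names Guard, YieldOnce, NoopWaker, work, and poll_once_then_drop are made up for this illustration, and the do-nothing waker is only there so we can call poll without a real runtime.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Returns `Pending` on the first poll and `Ready` on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Sets a shared flag when dropped, so cleanup can be observed.
struct Guard(Rc<Cell<bool>>);
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

async fn work(cleaned_up: Rc<Cell<bool>>, finished: Rc<Cell<bool>>) {
    let _guard = Guard(cleaned_up);
    YieldOnce(false).await; // the future is suspended here
    finished.set(true); // not reached if the future is dropped first
}

/// Polls `work` once, drops it, and reports `(cleaned_up, finished)`.
fn poll_once_then_drop() -> (bool, bool) {
    let cleaned_up = Rc::new(Cell::new(false));
    let finished = Rc::new(Cell::new(false));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(work(cleaned_up.clone(), finished.clone()));
    assert!(fut.as_mut().poll(&mut cx).is_pending());
    assert!(!cleaned_up.get()); // the guard is still alive inside the future
    drop(fut); // cancel: never poll again, and run the destructor

    (cleaned_up.get(), finished.get())
}
```

Calling poll_once_then_drop() returns (true, false): the guard's destructor ran when the future was dropped, but the code after the .await never did.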

Cancellation Perspectives

So now we can start to categorize different perspectives on cancellation. I propose four perspectives for looking at async cancellation:

  1. From the canceled future
  2. From the parent future
  3. From the runtime
  4. From and between tasks

Let's dive into each of these in more detail.

From the Canceled Future

Looking at cancellation from this perspective means "what happens if you are the future being canceled?" Let's consider the following code as an example:

async fn cancel_me() {
    println!("1");
    some_other_async_fn().await;
    println!("2");
}

In a lot of ways, this is the most obvious example of cancellation. Cancellation can happen at .await points, since these are the places where the poll method of the async function's future can return Poll::Pending, or before execution even starts, since the future may be dropped before it is ever polled. If this future were canceled at the .await, we'd observe that "1" was printed, but "2" was not. The cancel_me function would not continue executing past the call to some_other_async_fn().

There are a few other things we could substitute for some_other_async_fn().await; that would have similar behavior. Maybe some_other_async_fn panics. Or maybe instead of or in addition to async/await, we are using Result and ?. What these examples all have in common is that execution may not continue after the line of code that calls some_other_async_fn.

One thing we need to be careful about when writing cancel_me is cleaning up and keeping the program in a sane state in the face of cancellation. Rust APIs are generally written in a way that helps here. For example, mutexes don't have to be manually unlocked because MutexGuard does this in its Drop implementation. Still, not all programs use this pattern, and not all invariants can be conveniently encoded into data structures and types.
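As a rough illustration of both halves of that point, the sketch below holds a std::sync::Mutex guard across an .await and then cancels the future. The lock is released by the guard's destructor, but a two-step update is left half done, which is the kind of invariant a type cannot restore for you. This version of cancel_me, along with Never, NoopWaker, and lock_is_released_on_cancel, is hypothetical and exists only for this example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, standing in for `some_other_async_fn()`.
struct Never;
impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Intends to bump the counter twice while holding the lock.
async fn cancel_me(counter: Arc<Mutex<u32>>) {
    let mut guard = counter.lock().unwrap();
    *guard += 1;
    Never.await; // suspended here, still holding the lock
    *guard += 1; // not reached if we are canceled
}

/// Returns `(locked_while_suspended, locked_after_cancel, final_value)`.
fn lock_is_released_on_cancel() -> (bool, bool, u32) {
    let counter = Arc::new(Mutex::new(0));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(cancel_me(counter.clone()));
    let _ = fut.as_mut().poll(&mut cx);
    // The suspended future owns the guard, so the mutex is locked.
    let locked_while_suspended = counter.try_lock().is_err();

    drop(fut); // cancel: the guard's destructor unlocks the mutex
    let locked_after_cancel = counter.try_lock().is_err();

    let final_value = *counter.lock().unwrap();
    (locked_while_suspended, locked_after_cancel, final_value)
}
```

The function returns (true, false, 1). The mutex was unlocked for us, but the counter was only incremented once, so any invariant that depended on both increments happening together is now broken.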

Manually Written Futures

Manually written futures are those that do not come from an async fn or async block but instead come from providing an impl Future for T. Essentially all async behavior is rooted in a manually implemented future, since the compiler will not create a return Pending on its own. It's worth considering what cancellation looks like in this case too. Let's start with an example of a very simple manually implemented future:

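use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

enum State {
    A, B, C
}

impl Future for State {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Each call to poll advances the state machine by one step.
        match *self {
            State::A => {
                *self = State::B;
                Poll::Pending
            },
            State::B => {
                *self = State::C;
                Poll::Pending
            },
            State::C => {
                // Final state: the future is complete.
                Poll::Ready(())
            },
        }
    }
}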

This shows a state machine that starts in state A, then moves to B, then finally ends in C. It has a few problems, most notably that it doesn't store a waker anywhere before returning Poll::Pending, meaning there's no way to tell the runtime to run the future again after it returns.
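For comparison, here is one way the waker problem could be addressed in a sketch. This version asks to be polled again immediately by calling wake_by_ref before returning Poll::Pending. That is a simplification: a real future would normally store the waker and wake it later, when whatever it is waiting for actually happens. CountingWaker and drive_to_completion are hypothetical helpers that stand in for a runtime so we can count polls and wake-ups.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

enum State {
    A,
    B,
    C,
}

impl Future for State {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let this = self.get_mut(); // allowed because `State` is `Unpin`
        match *this {
            State::A => {
                *this = State::B;
                cx.waker().wake_by_ref(); // ask to be polled again
                Poll::Pending
            }
            State::B => {
                *this = State::C;
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            State::C => Poll::Ready(()),
        }
    }
}

/// A waker that only counts how many times it was woken.
struct CountingWaker(AtomicUsize);
impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Polls the future until it is ready; returns `(polls, wakes)`.
fn drive_to_completion() -> (usize, usize) {
    let counter = Arc::new(CountingWaker(AtomicUsize::new(0)));
    let waker = Waker::from(counter.clone());
    let mut cx = Context::from_waker(&waker);

    let mut fut = State::A;
    let mut polls = 0;
    loop {
        polls += 1;
        if Pin::new(&mut fut).poll(&mut cx).is_ready() {
            break;
        }
    }
    (polls, counter.0.load(Ordering::SeqCst))
}
```

Driving the future this way takes three polls and produces two wake-ups, one for each Poll::Pending. Stopping the loop early and dropping fut would cancel it in whichever state it had reached.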

Leaving that aside, the key point from a cancellation perspective is that there are no .awaits in the code. In fact, poll is a synchronous function. Thus, cancellation in a manually written future only happens when poll is not called again, or at all.

From a Parent Future

This perspective looks at a future that may choose to drop a child future, thereby canceling it. A somewhat contrived case is:

async fn parent() {
    let cancel_future = cancel_me();
    let do_not_cancel_future = do_not_cancel_me();
    drop(cancel_future);
    do_not_cancel_future.await;
}

This example is arguably not cancellation, since cancel_future never gets polled at all, but you did call cancel_me and the function never actually got the chance to execute, which does look a little like cancellation.

A less contrived example is:

async fn parent2() {
    cancel_me()
        .race(or_maybe_cancel_me())
        .await;
}

Here the .race method composes two futures, runs them concurrently (by interleaving calls to their poll methods), and returns the value of the future that completed first, dropping (or canceling) the other.
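To show roughly what such a combinator does, here is a hand-rolled sketch. It is not the implementation of any particular library's race; Race, BoxFuture, Guard, Never, fast, slow, and run_race are all invented for this example, and the futures are boxed to sidestep pin projection. In this sketch the losing future is dropped when the Race value itself is dropped.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

/// Polls `a`, then `b`, and completes with whichever is ready first.
struct Race<T> {
    a: BoxFuture<T>,
    b: BoxFuture<T>,
}

impl<T> Future for Race<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let this = self.get_mut(); // fine: both fields are boxed, so `Race` is `Unpin`
        if let Poll::Ready(value) = this.a.as_mut().poll(cx) {
            return Poll::Ready(value);
        }
        this.b.as_mut().poll(cx)
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes.
struct Never;
impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Sets a shared flag when dropped.
struct Guard(Rc<Cell<bool>>);
impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

async fn fast() -> &'static str {
    "fast"
}

async fn slow(dropped: Rc<Cell<bool>>) -> &'static str {
    let _guard = Guard(dropped);
    Never.await; // never gets past this point
    "slow"
}

/// Returns the winner and whether the loser's guard was dropped.
fn run_race() -> (&'static str, bool) {
    let dropped = Rc::new(Cell::new(false));
    let mut race = Race {
        a: Box::pin(slow(dropped.clone())),
        b: Box::pin(fast()),
    };
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let winner = loop {
        if let Poll::Ready(value) = Pin::new(&mut race).poll(&mut cx) {
            break value;
        }
    };
    drop(race); // dropping the combinator drops (cancels) the loser
    (winner, dropped.get())
}
```

Here run_race() returns ("fast", true): slow was polled once, got as far as its .await, and was then canceled when the combinator was dropped.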

One thing to keep in mind in both of these examples is that since we are in the context of an async function, parent and parent2 may themselves both be canceled. This means this perspective also inherits all the issues of the previous section.

From the Runtime

We can also look at cancellation from the perspective of the runtime. In some ways this is similar to our previous section, in that runtimes keep track of and run many different futures at the same time. But there are some differences. The runtime serves as the boundary between sync and async code (the mayonnaise of the async sandwich, if you will). Runtimes present synchronous interfaces to create the runtime, spawn some number of tasks, and begin execution. On the other side, the runtime hosts the execution of async functions, makes progress by polling futures, and normally provides synchronization primitives to futures.

Because the runtime is responsible for starting off the chain of calls to poll, it seems to be the most obvious place to think about initiating cancellation. At a high level, runtimes do something like this:

1. Get next ready task, or wait if none is ready
2. Poll the ready task
3. Repeat

Cancellation shows up as a small change to step 2:

1. Get next ready task, or wait if none is ready
2. If should_cancel
   Then drop the ready task
   Else poll the ready task
3. Repeat
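The modified loop above might look something like the following toy executor. It is only a sketch under some loud assumptions: every queued task is treated as ready, wakers are ignored, and should_cancel is a made-up policy that cancels a task once it has used up a poll budget. Task, Never, NoopWaker, run, and demo are likewise hypothetical names.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, standing in for a stuck task.
struct Never;
impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// A task as the runtime sees it: an opaque future plus some metadata.
struct Task {
    name: &'static str,
    future: Pin<Box<dyn Future<Output = ()>>>,
    polls_left: u32,
}

/// Hypothetical policy: cancel a task once its poll budget is used up.
fn should_cancel(task: &Task) -> bool {
    task.polls_left == 0
}

/// Runs every task until it completes or is canceled; returns a log.
fn run(mut queue: VecDeque<Task>) -> Vec<String> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut log = Vec::new();

    // 1. Get the next ready task (this toy treats all tasks as ready).
    while let Some(mut task) = queue.pop_front() {
        // 2. Either drop the task or poll it.
        if should_cancel(&task) {
            log.push(format!("{} canceled", task.name));
            drop(task); // the future's destructor runs here
            continue;
        }
        task.polls_left -= 1;
        let poll = task.future.as_mut().poll(&mut cx);
        match poll {
            Poll::Ready(()) => log.push(format!("{} completed", task.name)),
            Poll::Pending => queue.push_back(task),
        }
        // 3. Repeat.
    }
    log
}

fn demo() -> Vec<String> {
    let mut queue = VecDeque::new();
    queue.push_back(Task { name: "quick", future: Box::pin(async {}), polls_left: 3 });
    queue.push_back(Task { name: "stuck", future: Box::pin(Never), polls_left: 2 });
    run(queue)
}
```

Running demo() logs "quick completed" followed by "stuck canceled". Note that run is an ordinary synchronous function, and that it decides to cancel based only on the task's metadata, never by looking inside the future.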

One key point about this perspective of cancellation is that the runtime is all synchronous code.[1] A lot of the complications that come up with async cancellation are not something that affect the runtime, even though the runtime ends up initiating at least some of the cancellations. The runtime also has a lot of visibility into the surrounding metadata for a future, but not a lot of visibility into the future itself.

Another difference between the runtime's view and the view from an async function is that in an async function, you don't directly poll a future. Instead, you await the future, which under the covers tries to poll the future to completion. Because the runtime directly interacts with poll, there is no requirement to poll the future to completion. This applies to manually implemented futures as well.[2]

Cancellation and Tasks

We sneakily started talking about tasks in the previous section, but tasks raise some issues that are worth calling out explicitly. The first is that tasks and futures in Rust are related but different. Rust has no first-class concept of tasks, only futures. Tasks are instead a concept provided by the runtime to manage the state of the various futures in progress.

Tasks are where things start to truly feel asynchronous. Tasks are often run on multiple threads, but even if they aren't, the runtime interleaves between them to create cooperative multitasking. This means other aspects of cancellation become meaningful. For example, we might have one task request the cancellation of another task. Or it may look to a task like cancellation has come from an outside source. This outside source may feel surprising, since unlike functions returning Result, it's not as clear from the types that failure could occur here.[3] In all our cases thus far, you are either canceling another future directly by dropping it, or you are the future being canceled, in which case you don't really get the chance to observe the cancellation.

Whether tasks are cancellable is up to the runtime, but runtimes that support cancellable tasks might provide an API sort of like this:

async fn spawn_and_cancel() {
    let task = spawn(async { ... });
    task.cancel();
    match task.join().await {
        Completed(return_value) => ...,
        Error(err) => ...,
        Canceled(reason) => ...,
    }
}

The idea here is that spawning a task gives you a handle that can be used to interact with the task, such as by canceling it or joining it. In our example, we request cancellation immediately after spawning the task by calling task.cancel().[4] We then call task.join().await, which blocks the current future until task has completed. We then inspect the return value, which gives us more information about how and why the task completed. In this hypothetical task API, there are several reasons the task might complete, such as running to completion (the Completed case above), panicking (the Error case above), or being canceled (the Canceled case above). Of course, being able to distinguish each of these cases may not actually be that useful in practice.
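For a feel of how a runtime might back such a handle, here is a deliberately tiny, single-threaded sketch. Everything in it is hypothetical (Runtime, TaskHandle, Shared, Outcome, and the outcome method), and it simplifies the API above in two ways: the Error case for panics is left out, and instead of an async join there is a synchronous outcome() that is read after the runtime has finished running.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes.
struct Never;
impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// How a task finished (the panic/`Error` case is omitted here).
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    Completed(T),
    Canceled,
}

/// State shared between the runtime's copy of a task and its handle.
struct Shared<T> {
    cancel_requested: Cell<bool>,
    outcome: RefCell<Option<Outcome<T>>>,
}

struct TaskHandle<T>(Rc<Shared<T>>);

impl<T> TaskHandle<T> {
    /// Only records the request; the runtime acts on it later.
    fn cancel(&self) {
        self.0.cancel_requested.set(true);
    }
    /// Takes the outcome, if the task has finished.
    fn outcome(&self) -> Option<Outcome<T>> {
        self.0.outcome.borrow_mut().take()
    }
}

struct Runtime<T> {
    queue: VecDeque<(Pin<Box<dyn Future<Output = T>>>, Rc<Shared<T>>)>,
}

impl<T: 'static> Runtime<T> {
    fn spawn(&mut self, fut: impl Future<Output = T> + 'static) -> TaskHandle<T> {
        let shared = Rc::new(Shared {
            cancel_requested: Cell::new(false),
            outcome: RefCell::new(None),
        });
        let fut: Pin<Box<dyn Future<Output = T>>> = Box::pin(fut);
        self.queue.push_back((fut, shared.clone()));
        TaskHandle(shared)
    }

    /// Runs until every task has completed or been canceled.
    fn run(&mut self) {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        while let Some((mut fut, shared)) = self.queue.pop_front() {
            if shared.cancel_requested.get() {
                drop(fut); // cancellation is just dropping the future
                *shared.outcome.borrow_mut() = Some(Outcome::Canceled);
                continue;
            }
            let poll = fut.as_mut().poll(&mut cx);
            match poll {
                Poll::Ready(v) => {
                    *shared.outcome.borrow_mut() = Some(Outcome::Completed(v));
                }
                Poll::Pending => self.queue.push_back((fut, shared)),
            }
        }
    }
}

fn demo() -> (Option<Outcome<u32>>, Option<Outcome<u32>>) {
    let mut rt = Runtime { queue: VecDeque::new() };
    let finishes = rt.spawn(async { 7 });
    let canceled = rt.spawn(async {
        Never.await;
        0
    });
    canceled.cancel(); // without this, `run` would spin forever on `Never`
    rt.run();
    (finishes.outcome(), canceled.outcome())
}
```

In this sketch demo() returns (Some(Completed(7)), Some(Canceled)). As in the example above, the canceled task is dropped before it is ever polled, so from the task's own point of view nothing observable happened at all.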

Conclusion

We've just finished looking at several different perspectives on async cancellation in Rust. We've looked at cancellation from the perspective of the future being canceled, from a future managing other futures, from the async runtime, and within the context of multiple tasks. In a future post, I'd like to use these perspectives to explore what async drop means and what it might mean to catch a cancellation.

I and (I assume) the rest of the async working group definitely don't have all the answers here or have it all figured out yet. Chances are your perspective on async cancellation is different than mine. If so, let me know! Hearing from others about how they use async and how they understand these things can help us build something that will be sound and a joy to use.

Thanks to Yosh Wuyts and Nick Cameron for their feedback on an earlier version of this post.


[1] I suppose you could write an async async runtime, but I'm not really sure why you'd do that...

[2] Manually written futures, especially those like race and join that let you compose futures, are sort of like mini runtimes. You might even call them an async async runtime...

[3] The same surprise exists with panic!, except even more because at least with cancellation you can see the .await in the code.

[4] This cancellation API is kind of weird in that it is a synchronous function that simply asks the runtime to cancel the task but does not wait for the cancellation to complete. For runtimes such as async-std, cancel has the signature async fn cancel(self), which consumes the task and blocks until the task is complete. If we wanted to have the request-but-do-not-wait-for-completion semantics in this post's example, more likely what we'd want to do is have something like task.get_cancellation_handle() which returns a Send value that can be used to request cancellation from another task, and even then we'd want cancel to be async since actually requesting the cancellation will probably require synchronizing on some shared resources.