In the sketch I laid out before, I expected the core idea of supporting cancellation during unwinding would be to have the executor, and any mini-executors like `race` and `join`, basically wrap calls to `poll` with `catch_unwind`, then in the `Err` case call `poll_cancel` to completion and then call `resume_unwind`.
In pseudo-code, that would look something like:
```rust
loop {
    match catch_unwind(|| task.poll(cx)) {
        Ok(Poll::Ready(x)) => return x,
        Ok(Poll::Pending) => continue,
        Err(panic) => {
            while let Poll::Pending = task.poll_cancel(cx) {}
            resume_unwind(panic);
        }
    }
}
```
Unfortunately this doesn't work. It turns out I had some inkling this might be the case when I wrote:
> There are other challenges though. One is that the `poll_cancel` functions will need to be written to be aware of the fact that they might be called during unwinding, which means the internal state for the future might be inconsistent.
To understand what's wrong, recall that I desugared cancellation-aware `async` blocks into coroutines. Rust coroutines only have one entry point, which is the `resume` method. I simulated two entry points (`poll` and `poll_cancel`) by passing another argument into `resume`.
The thing is, once `resume` panics, coroutines cannot be resumed again and they will panic if you try. Since `poll` and `poll_cancel` are backed by the same `resume` method, this means we can't call `poll_cancel` after `poll` panics.
Some of this is an artifact of the way this experiment is structured.
If we had proper compiler support for multiple entry points to a coroutine, we might be able to make this work.
But I think it's more composable, and more in line with existing precedent, to follow a rule where all unwinding or cancellation work needs to finish before a panic leaves the `poll` call.

This realization that we need to process any cancellations before unwinding out of `poll` felt constraining at first, but it actually simplifies a lot of the design.
I thought we'd need to wrap basically every call to `poll` in `catch_unwind`, but in most cases this is unnecessary and we can instead let the usual unwinding machinery proceed as normal. The places where we do care are when we know of multiple futures and, if one of them panics, we need to cancel the rest.

Let's use `on_cancel` as an example. While I don't think `on_cancel` would be a great API to support in production, it is useful for focusing on the specifics of cancellation behavior.
In the last post, I was thinking of `on_cancel` almost as an approximation of an exception handler. For our purposes today, I think it's more useful to think of it as a kind of future combinator. In this view, `on_cancel` produces a new future from two others: one that is the normal execution path, and another future that is run only when the future is cancelled.1

Looking at it this way, we can see what we should do when the `poll` function on the main future panics. We aren't allowed to poll the future that's panicking anymore, because its internal state might be inconsistent. We have to trust that, as `poll` was unwinding, the future ran any cancellation handlers that were on the stack. But, since we want cancel-on-unwind semantics, the `on_cancel` combinator needs to catch the panic, run the cancellation future to completion, and then resume unwinding.
Now let's see how to add cancellation-on-panic behavior to our existing `on_cancel` implementation. My last post didn't really go into the details of this, so let's start with a rough sketch of the previous `on_cancel` implementation. Throughout this section I'm going to ignore details like pinning and `unsafe` so we can focus on the main idea. I have a complete working implementation of the ideas in this section available at https://github.com/eholk/explicit-async-cancellation.

The `on_cancel` method returns a future that carries a cancellation handler. While the details are hidden in the surface API, the struct and future implementation returned look like this:
```rust
struct OnCancel<F, H> {
    future: F,
    on_cancel: Option<H>,
}

impl<F, H> Future for OnCancel<F, H>
where
    F: Future,
    H: Future<Output = ()>,
{
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        self.future.poll(cx)
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        // run the cancellation handler if it's still present
        if let Some(on_cancel) = self.on_cancel {
            match on_cancel.poll(cx) {
                // if cancellation is complete, clear the handler so we won't try to run it again
                Poll::Ready(()) => self.on_cancel = None,
                // cancellation is not finished, so yield to the caller
                Poll::Pending => return Poll::Pending,
            }
        }
        // run any cancellation handlers on the inner future
        self.future.poll_cancel(cx)
    }
}
```
The `poll` function is pretty uninteresting. We just forward it to the inner future. The `poll_cancel` function is a little more subtle. The main thing we need to do is run the cancellation handler, which we do by calling `poll` on it. However, the inner future might also have nested cancellation handlers, so we need to call `poll_cancel` on it as well. This is also why I chose to wrap the cancellation hook in an `Option`, since I can use that as a flag to indicate whether the cancellation hook is finished.
As an aside, I chose to do outside-in cancellation semantics here since drop also runs outside-in. I'm not sure this was the right choice. For example, unwinding is inside-out instead. I think it's worth thinking harder about what the right ordering is, but for now it's easy to change and independent of our focus today.
Okay, so now that we have a basic `on_cancel` implementation, let's handle what happens if the call to the nested future's `poll` panics. In short, we need to wrap the call to `poll` in `catch_unwind`.
```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    match catch_unwind(|| self.future.poll(cx)) {
        Ok(poll) => poll,
        Err(panic) => todo!("run the cancellation hook and then resume unwinding"),
    }
}
```
Now let's think about the `Err` case. Basically, we need to cancel ourselves, which we can do by calling `poll_cancel`. Then we need to resume unwinding. Because `poll_cancel` might take several tries to finish, we need to save the panic information so we can resume unwinding after it's done. So we'll add another field to `OnCancel` to optionally store the panic information.
```rust
struct OnCancel<F, H> {
    future: F,
    on_cancel: Option<H>,
    panic: Option<Box<dyn Any + Send + 'static>>,
}

impl<F, H> Future for OnCancel<F, H>
where
    F: Future,
    H: Future<Output = ()>,
{
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
        match catch_unwind(|| self.future.poll(cx)) {
            Ok(poll) => poll,
            Err(panic) => {
                self.panic = Some(panic);
                match self.poll_cancel(cx) {
                    Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
                    Poll::Pending => Poll::Pending,
                }
            }
        }
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        todo!("we'll come back to this in a minute")
    }
}
```
We're part of the way there, but we still have some problems. Assuming `poll_cancel` were correct (it's not, but we'll get there), we'd be okay if cancellation finished promptly. But if not, it will return `Pending`, which we'll bubble up to the caller. The caller doesn't know we're panicking, since we've hidden the panic information away in our `panic` field, so it will eventually call `poll` on us again. Unfortunately, this means we'll poll the inner future, which we've previously said is not allowed. So we need to make a small change to check if we're in the process of panicking when we're polled.
```rust
fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output> {
    if self.panic.is_some() {
        match self.poll_cancel(cx) {
            Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
            Poll::Pending => return Poll::Pending,
        }
    }
    match catch_unwind(|| self.future.poll(cx)) {
        Ok(poll) => poll,
        Err(panic) => {
            self.panic = Some(panic);
            match self.poll_cancel(cx) {
                Poll::Ready(()) => resume_unwind(self.panic.take().unwrap()),
                Poll::Pending => Poll::Pending,
            }
        }
    }
}
```
And now we're all set. If we're polled when there's panic information present, then we never get to the call to `self.future.poll(cx)`.

Now it's time to revisit `poll_cancel`. To share some logic, I had the panic path in `poll` call into `poll_cancel`, but this means we need to update `poll_cancel` to recognize that it can be called while panicking. Here's how:
```rust
fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
    // run the cancellation handler if it's still present (this part stays the same)
    if let Some(on_cancel) = self.on_cancel {
        match on_cancel.poll(cx) {
            // if cancellation is complete, clear the handler so we won't try to run it again
            Poll::Ready(()) => self.on_cancel = None,
            // cancellation is not finished, so yield to the caller
            Poll::Pending => return Poll::Pending,
        }
    }
    // if we aren't panicking, run any cancellation handlers on the inner future;
    // otherwise, resume unwinding
    match self.panic {
        None => self.future.poll_cancel(cx),
        Some(_) => resume_unwind(self.panic.take().unwrap()),
    }
}
```
The first part, where we run the cancellation hook, stays the same as before. In the second part, we would normally cancel the inner future, but remember that if we are panicking we aren't allowed to poll it again.
It's worth asking what we should do in the `Some` line, though. At this point we know we are in the process of unwinding, and all cleanup code has finished. One option is to return `Poll::Ready(())` here, and if we're called from `poll` then we could count on it calling `resume_unwind`. However, it could also be that while we were waiting on the cancellation to finish, the executor decided to cancel us. In this case, if we returned `Poll::Ready(())` then we would swallow the exception. So instead, the right answer is to `resume_unwind` here as well.
So there we have it: how to cancel a future when polling it panics.
We've shown that it's at least somewhat possible to support async cleanup code while unwinding. I'll admit, beyond a basic smoke test, I haven't really probed the limits of this design. For example, what happens if we panic while running the cancellation handler as a result of another panic? Or what actually happens if the executor cancels us while we are cleaning up before resuming a panic? If we were to RFC something like this, these are all questions that we'd need to explore.
The reason I decided to go ahead and write this post without answering those questions is that I think we've already learned enough to start evaluating this design and informing future options.
First of all, something about suspending while in the process of unwinding just feels fundamentally weird and uncomfortable. That said, I think we can develop a reasonable semantics for this behavior if we decide we want it.
But this also leads to a shortcoming that I'm not sure how to resolve. This prototype cannot work in `#[no_std]` environments, because `catch_unwind` and `resume_unwind` represent panic information as a `Box<dyn Any + Send + 'static>`, meaning we need an allocator. This is a non-starter for something that we'd want to consider building in as a core Rust language feature. The whole `async`/`await` system has been carefully designed not to need an allocator, and we need to preserve this property. After all, `async`/`await` has found a lot of success in microcontroller environments!
Is this necessary though?
Or is it an artifact of trying to prototype a system purely in library code without compiler support?
As an analogy, we could imagine prototyping destructors using `catch_unwind`, but rustc is able to generate code to run destructors during unwinding without needing to reify the exception.
Unfortunately I don't think we can avoid the issue in the same way.
The problem is that normal unwinding doesn't suspend the execution at all, while we very much need to be able to do that to `await` in the unwinding path.
This means the exception does need to be stored somewhere (presumably with the future), and we need to be able to resume unwinding later.
If you're using a work-stealing executor, this means it's even possible that your task could start unwinding on one thread and finish on another.
So we need somewhere to store the exception that's not ephemeral in the way that it is during the Rust-generated unwind code.
There might be other options that could work.
For example, the executor could reserve some space for each task that's large enough to hold most panics.
Most likely the way we'd accomplish this is by attaching something to the `Context` that gives access to it.
Maybe it'd be specific to panics, or maybe it'd be a more general task-local bump allocator or something like that.
At any rate, we could add API surface for a minimal allocator to support awaiting while unwinding without needing a full-blown global allocator.
These could be made optional, which would give executors the option of aborting if they cannot or don't want to support async unwinding.
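As a rough illustration of that direction, here's a sketch of storing the common panic payloads without a fresh allocation. The `StoredPanic` and `store` names are hypothetical, and this still starts from the `Box` that `catch_unwind` hands us; with compiler support, the payload could instead land directly in task-reserved space. The point is just that most panic payloads are a `&'static str` or a `String`, which don't need a new boxed allocation to keep around.

```rust
use std::any::Any;

// Hypothetical storage for a panic payload. The common payload types
// (`&'static str` from `panic!("...")`, `String` from formatted panics)
// can be held without a new allocation; anything else keeps its box.
enum StoredPanic {
    StaticStr(&'static str),
    Message(String),
    Other(Box<dyn Any + Send + 'static>),
}

fn store(payload: Box<dyn Any + Send + 'static>) -> StoredPanic {
    match payload.downcast::<&'static str>() {
        Ok(s) => StoredPanic::StaticStr(*s),
        Err(payload) => match payload.downcast::<String>() {
            Ok(s) => StoredPanic::Message(*s),
            Err(payload) => StoredPanic::Other(payload),
        },
    }
}

fn main() {
    // A literal panic message is a `&'static str` payload.
    let p = std::panic::catch_unwind(|| panic!("boom")).unwrap_err();
    match store(p) {
        StoredPanic::StaticStr(s) => assert_eq!(s, "boom"),
        _ => unreachable!("a literal panic message is a &'static str"),
    }
    println!("stored inline");
}
```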
Another option would be to have the compiler not automatically generate calls to `poll_cancel` while unwinding, and instead provide something like an async version of `catch_unwind`. I think something like this is what boats was proposing. The nice thing about this option is that we can completely give up on supporting `#[no_std]`. Furthermore, we don't have to worry about being "zero cost," since the fact that the user called `async_catch_unwind` signals that they're willing to pay the cost that's needed.

That said, it's not clear how that should interact with `do ... final` blocks if we were to add them.2 For example, the `final` block would presumably run during unwinding in sync code, so it seems like we'd also need to do it while unwinding in async code. Unfortunately, as far as I can tell, that will run into the same allocation problems.
So to go back to the question of whether we should do this, I think we need more exploration. There are some options, but from my exploration here it seems like it's hard to satisfy all our requirements. But maybe one of these, or some other option, can strike a decent compromise.
With a small tweak, we could approximate a `finally` clause, by making it so we run the cancellation future even if the main future completes successfully. ↩

I really like the idea of `do ... final`! I had hoped to explore that some in this post but I felt there was enough material here without it. ↩
> Language design tends to go in cycles: we grow the language to accommodate new functionality, then shrink the language as we discover ways in which the features can be orthogonally integrated into the rest of the system. Classes seem to me to be on the upward trajectory of complexity; now it's time to shrink them down. At the same time, we shouldn't sacrifice the functionality that they enable.

This cycle of growing and shrinking was a key part of the process in the early days of Rust. Upon reading this section, I found myself asking "how could we shrink Rust today?"
To be honest, I had forgotten Rust had `class`es at one point. I remembered resources and objects, but forgot we had a brief window where there were classes. Patrick's post explains what happened to them.
Essentially, once we added classes and a bunch of other features, we realized that classes combined five features that we could implement independently in a way that's more general.
These, along with their modern replacements in Rust, are:

- Nominal record types, replaced by `struct`.
- Constructors, replaced by `struct` literal syntax and plain functions that are conventionally called `new`.
- Attached methods, replaced by `impl`s.
- Field-level privacy, which was removed.
- Destructors, replaced by the `Drop` trait.

Some of these features weren't so much replaced as removed.
For example, it's hard to claim Rust has constructors today, other than by convention. Similarly, if I remember right, at the time Rust also had the `struct` keyword, so you used `struct` if you just wanted a nominal record, or `class` if you wanted the rest of these features. Or in the case of field-level privacy, we basically just decided this feature wasn't necessary.2
For the two features that had a clear replacement, by decoupling them from classes we gained a lot more power. You can attach methods to any type now, like `enum`s and even primitive types, not just classes. Destructors are much simpler now too, since you implement `Drop` just like any other trait.3
The end result was that we replaced a large feature, classes, with a handful of smaller, orthogonal features, something that composed better4 and gave us more power and flexibility.
To me the key takeaway, at least looking back from over a decade later, is that a big part of why Rust is the way it is today is that we were able to add a bunch of features and then pare them down once we got some experience. In Rust's history, it's had three different ways to do destructors, and while I don't recall exactly, I suspect at least two of these coexisted at some point.
It's somewhat harder to follow this model now. In the early days, we made breaking syntax changes sometimes multiple times a week.5 At that time, the Rust team was a handful of people, about as many interns, and some people who hung out on IRC. Today the community is much larger and people are using Rust in mission-critical projects where they can't afford to make weekly syntax updates. And of course, Rust 1.0 came with a promise that there would be no more breaking changes. You can rely on Rust to keep working tomorrow.
Rust is still able to grow, but shrinking is much harder, and as a result, we have to be much more conservative in how Rust grows. We have some ability to shrink through the editions system, but this is still not a great mechanism for rapidly iterating on designs.
Anyway, I don't really have a solution, or even necessarily a clearly defined problem. I mostly just wanted to observe that developing Rust is harder today because we mostly have to look at things incrementally. It's much harder to design a set of interrelated features that maybe by themselves wouldn't be particularly noteworthy but together are quite powerful.
Fortunately, Rust does have the nightly compiler, and a process for experiments. That seems like the right environment to do the kind of language experimentation today that was possible in the early days. This is the same codebase that becomes the stable compiler, so we still need to emphasize stability and maintainability, but liberal experimentation in the nightly compiler with many different Rust features at once seems like it has the possibility to do the same kind of broad scale language iteration that we did in the early days while staying true to Rust's stability promises.
I've since started calling this my Spiky Blog Theory of Programming Languages, but it deserves a post of its own. ↩

One way of looking at this is that classes included their own module or namespace, and this was seen as unnecessary complexity. ↩
It might seem nice to be able to make fields on a `struct` private today, but that requires us to pull in a number of other features. In particular, you need some methods that you can make public which do have access to the private fields. That's why there were attached methods before, and something like that could work with `impl`s, but it would be tricky since `impl`s are a lot more flexible. ↩

Early Rust had `resource` types, which were basically a wrapper around a type that included a destructor. In some ways it was nice because most things didn't have destructors, but it also meant when you needed one you had to put your code through some contortions to make it work well with an attached destructor. Also, while it's tempting to say `Drop` is just like any other trait, it's not really, because it has special meaning to the compiler. ↩

I expect had we kept classes it'd be common to have classes that just wrap an enum, since otherwise we wouldn't have had a way to attach methods to enums. Eventually we probably would have invented some kind of `enum class` syntax. ↩

This is a big part of why rustfmt is so good, because that was how we rewrote the whole compiler every time we had a major breaking syntax change, which was not uncommon. ↩
For background, top level functions in Rust look sort of like this:
```rust
fn foo(x: i32) -> i32 {
    x + 1
}
```
In Rust 2018, we added `async fn`:
```rust
async fn foo(x: i32) -> i32 {
    x + 1
}
```
While that one doesn't do anything particularly interesting, an async function gives you the ability to use `await` inside it. It also secretly changes the return type from an `i32` to an `impl Future<Output = i32>`. This is regarded by many to have been a mistake, and it's starting to cause issues now that we have async functions in traits, since there is no way to add additional bounds like `Send` to the return type. Anyway, `async fn foo` is mostly just syntactic sugar that desugars into:
```rust
fn foo(x: i32) -> impl Future<Output = i32> {
    async move { x + 1 }
}
```
It's likely that Rust will gain a whole bunch of new keywords we can stick in front of `fn` in the future.1 For example, nightly Rust just got support for `gen fn` and `async gen fn`. Those desugar similarly, by wrapping the return type in `impl Iterator` or `impl AsyncIterator` and wrapping the body in `gen { }` or `async gen { }`.
Another piece of sugar we could add is `try fn`, which is actually what started off the discussion thread today. Following the pattern we've had so far, we'd expect to be able to write something like:
```rust
try fn foo() -> i32 {
    let x = read_number()?;
    x
}
```
and have this desugar to:
```rust
fn foo() -> impl Try<Output = i32, Residual = ???> {
    try {
        let x = read_number()?;
        x
    }
}
```
The problem is we need a hint for the `Residual` type. The obvious thing to do would be to add something to the function header, like `try fn foo() -> i32 throws E`. But if you've ever looked at the `Residual` types for the `Try` impls in the standard library, you know that these can look pretty hairy and not particularly intuitive. For example, to make a function that returns an `Option`, we'd need to write:
```rust
try fn foo() -> i32 throws Option<Infallible> {
    let x = read_number()?;
    x
}
```
This would give the compiler enough information to find the `Try` impl for `Option`. But notice that we also could have just written `fn foo() -> Option<i32>`, which is shorter and you don't have to figure out why my fallible function has an `Infallible` in it.
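That shorter spelling works on stable Rust today; the `Option<Infallible>` residual only surfaces if you spell out the `Try` machinery. Here `read_number` is a stand-in for the fallible function from the discussion:

```rust
// Stand-in for the post's fallible `read_number`.
fn read_number() -> Option<i32> {
    Some(41)
}

// The ordinary-return-type version of the hypothetical
// `try fn foo() -> i32 throws Option<Infallible>`.
fn foo() -> Option<i32> {
    let x = read_number()?;
    Some(x + 1)
}

fn main() {
    assert_eq!(foo(), Some(42));
    println!("{:?}", foo());
}
```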
At this point, Lukas Wirth observed that they would rather see a shorthand for functions whose body is a single expression. If we did this, we could write `try fn` as:
```rust
fn foo() -> Option<i32> = try {
    let x = read_number()?;
    x
}
```
So that's pretty neat.
This also invites us to reconsider `async fn`. We could instead write:
```rust
fn foo() -> impl Future<Output = i32> = async {
    let x = read_number().await;
    x
}
```
That's not too bad, but `impl Future<Output = i32>` is a bit wordy. We could come up with some rules that would let you write `impl Future<i32>` instead, which honestly is how we usually read that out loud anyway. But then joboet and pitaj pointed out that we could treat `Trait -> Type` as shorthand for `Trait<Output = Type>`. TC pointed out that we could probably generalize this to support `yields T` as shorthand for `Iterator<Item = T>`. So if we combined a few of these ideas, we'd be able to write:
```rust
fn foo() -> impl Future -> i32 = async {
    let x = read_number().await;
    x
}
```
I think this shows a lot of potential. I want to try to generalize this a bit more, though. Instead of special-casing the `Output` associated type, we could create a set of attributes to indicate that an associated type can be used with trait keyword shorthands. For example, define the `Future` and `Iterator` traits like this:
```rust
trait Future {
    #[keyword(return)]
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}

trait Iterator {
    #[keyword(yields)]
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}
```
This would let us refer to `Future<Output = T>` as `Future -> T` and `Iterator<Item = T>` as `Iterator yields T`. We could even combine them:
```rust
trait Coroutine<R> {
    #[keyword(yields)]
    type Yield;
    #[keyword(return)]
    type Return;

    fn resume(self: Pin<&mut Self>, arg: R) -> CoroutineState<Self::Yield, Self::Return>;
}

fn coroutine() -> impl Coroutine<()> -> bool yields i32 = || {
    yield 42;
    true
}
```
This would also let us remove some of the special handling around the `Fn*` traits, and we could expose this functionality to users so libraries could use this sugar in their own traits.
At this point, I'd like to take a step back and think about plain `fn` functions. Notice that the following two would be equivalent:
```rust
fn foo() -> i32 {
    let number = read_number();
    number
}

fn foo() -> i32 = {
    let number = read_number();
    number
}
```
One way of thinking of this is that we've made the `=` optional. But I'd like to think of it a different way. Let's say instead we think of the `=` form as the standard function declaration syntax. Then, if the function body consists of a single block, we can use a compressed syntax. For a regular `{ }` block, that just looks like the function declaration syntax we're used to. But for blocks with a keyword in front, like `async { }` or `try { }`, we say the keyword moves all the way to the front of the function header. In addition, each block has a characteristic trait associated with it, so when we use the block shorthand for function declarations, we also wrap an `impl Trait` around the return type. Here are some examples:
```rust
// async ////////////////////////////////////////
async fn foo() -> i32 {
    let number = read_number().await;
    number
}
// desugars to:
fn foo() -> impl Future<Output = i32> = async {
    let number = read_number().await;
    number
}

// gen //////////////////////////////////////////
gen fn foo() -> i32 {
    yield 1;
    yield 2;
}
// desugars to:
fn foo() -> impl Iterator<Item = i32> = gen {
    yield 1;
    yield 2;
}
// assuming `Iterator` is defined like:
trait Iterator {
    #[keyword(return)]
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

// async gen ////////////////////////////////////
async gen fn foo() -> i32 {
    yield 1;
    yield 2;
}
// desugars to:
fn foo() -> impl AsyncIterator<Item = i32> = async gen {
    yield 1;
    yield 2;
}
```
I've left try out because it's complicated. You could technically do something like:
```rust
try fn foo() -> i32 throws Option<Infallible> {
    let number = read_number()?;
    number
}
```
but for `try`, users usually want to know the concrete type. So instead I'd expect most people to prefer the desugared form:
```rust
fn foo() -> Option<i32> = try {
    let number = read_number()?;
    number
}
```
Note that the "pulling the keyword forward" transformation doesn't work here, because this function returns a concrete type, and what I've proposed is that pulling the keyword forward always adds an `impl Trait` rather than a concrete type.
Anyway, I'm pretty excited about this idea.2 It feels like a consistent way to handle these connections between blocks, traits, and functions. It's backwards compatible with the syntax we have so far, but it gives us a lot more expressiveness in cases where we're currently missing it.
You can already do `unsafe fn` and `const fn` today, but these don't desugar in the same way as other proposed keywords here do. ↩

Of course, I also just started thinking about this today and cranked out a blog post, so I may hate it by Monday. ↩
`async Drop`.
There are some tricky design questions to making this work well, and we need to start thinking about these now if we want to have something ready by 2027.
In this post, I'd like to explore a low level mechanism for how we might implement async cancellation.
The goal is to explore both how an async executor1 would interact with cancellation, as well as to make sure that this mechanism would support reasonable surface-level semantics.
You can think of this as a kind of compilation target for higher level features, similar to how the Rust compiler lowers `async fn` into coroutines.
If you haven't read my last post on Cancellation and Async State Machines, I'd encourage you to do so. That post provides a kind of theoretical background for what we'll implement in this post.
## poll_cancel

Lately I've been working on a prototype implementation of `async`/`await`, as well as changes to `Future` and related traits, that supports more flexible cancellation. I'd like to discuss this prototype, the tradeoffs made, and what I've learned about cancellation from the exercise. Note that what I'm presenting here is α-equivalent to several previous proposals, including Boats' `poll_drop_ready` RFC and a proposal by tvalloton on IRLO. My main contribution here is a prototype implementation that lets us write examples and explore their behavior.
## Future

The core of the idea is to extend the `Future` trait with a new `poll_cancel` method that has a default implementation. The new trait would look like this:
```rust
pub trait Future {
    type Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Self::Output>;

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context) -> Poll<()> {
        Poll::Ready(())
    }
}
```
In this new trait, `poll` has the same semantics as before. The new `poll_cancel` method performs two operations. First, it transitions the future's state machine from its normal execution path to the correct cancellation state. Second, `poll_cancel` continues to advance the state machine until the cancellation is complete.

The fact that `poll` and `poll_cancel` return different types highlights the fact that cancellation is a different exit from the future. A cancelled future returns no value, so `poll_cancel` returns `Poll<()>` instead of `Poll<Self::Output>`. This matches what we saw in my previous post, where we had a different final state for a future that was cancelled versus one that completed normally.
There are some attractive properties about this approach. The default implementation of `poll_cancel` leads to the same behavior that we have for cancellation today, where cancelling a future just means synchronously dropping it. This suggests we can get a nice migration path, although adding a new default method to a trait is technically a breaking change.
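To see the migration property concretely, here's a cut-down model of the extended trait, with plain `&mut self` and no `Pin` or `Context`, purely for illustration. An existing future type compiles unchanged and inherits the synchronous-cancel default:

```rust
// Cut-down model of the extended trait; real code would use `Pin` and `Context`.
enum Poll<T> {
    Ready(T),
    Pending,
}

trait Future {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    // New method with a default: cancellation completes immediately,
    // matching today's "just drop it" semantics.
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
}

// A pre-existing future type: compiles without ever mentioning `poll_cancel`.
struct Ready(i32);

impl Future for Ready {
    type Output = i32;
    fn poll(&mut self) -> Poll<i32> {
        Poll::Ready(self.0)
    }
}

fn main() {
    let mut f = Ready(7);
    assert!(matches!(f.poll(), Poll::Ready(7)));
    // The default implementation says cancellation is immediately complete.
    assert!(matches!(f.poll_cancel(), Poll::Ready(())));
    println!("defaults work");
}
```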
There are significant shortcomings, which I'll discuss further down.
But first, I'd like to look at how `poll_cancel` works with `async` and `await`.
## async and await

Most people writing async Rust should not have to deal with `poll` directly.
Most of the time we use higher level constructs like `async` and `await` instead. The nice thing about `async` and `await` in Rust is that there's nothing particularly magical about them.2 They can be thought of as desugaring into lower level constructs, and this desugaring happens in a way that you could mostly implement them both as macros.3 The primary benefit of building them into the language is that we can have nicer syntax and nicer diagnostics.

The fact that we can think of `async` and `await` as macros that desugar into lower level concepts means we can experiment with cancellation by writing a new set of macros that call `poll_cancel` in the appropriate place.
Most of the action will be in the changes we make to `await`. The goal here is to come up with a desugaring that has predictable cancellation behavior that is also usually the desired behavior. The somewhat surprising thing to me is that `await` mostly just forwards calls to `poll`, but doesn't have a lot of interesting future behavior. The interesting behavior (such as making sure a `Waker` gets called sometime in the future) all happens in hand-written `Future` impls. We can see this in the approximate desugaring of `await` from the Rust Language Docs:
```rust
match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
            Poll::Ready(r) => break r,
            Poll::Pending => yield Poll::Pending,
        }
    }
}
```
This block of code runs when some code higher up the call stack calls our poll
method.
What this block of code is doing is basically calling the awaited future's poll
function in a loop.
If that future returns Pending
, we yield Pending
.
From this code, the compiler will generate a Future::poll
function that returns Pending
when the function would yield Pending
.
This happens deeper in the compiler than we can reach with macros, but we can approximate it a different way.
Originally, the compiler actually generated an object that implemented Generator
(now Coroutine
) and the standard library had a wrapper that adapted the Generator
into a Future
.
We'll use this approach for our prototype.
We'll want to handle cancellation similarly to how polling is handled, where await
also forwards calls to poll_cancel
along the await chain until we arrive at a future that knows how to do something interesting with cancellation.
Looking at how we might extend the desugaring of await
to support poll_cancel
, we need to distinguish whether we're on the cancel path or the normal execution path so we can call either poll_cancel
or poll
depending on the context.
We'll punt on this and assume we have a magic is_cancelled
variable that can tell us this, which is similar to the current_context
variable in the previous desugaring.
So let's see how this first step looks:
match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        if !is_cancelled {
            match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(r) => break r,
                Poll::Pending => yield Poll::Pending,
            }
        } else {
            match Pin::future::poll_cancel(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(()) => panic!("What do I do after cancelling?"),
                Poll::Pending => yield Poll::Pending,
            }
        }
    }
}
It's like before, but we check if we are cancelled first.
If we are not, we continue with the previous behavior, calling poll
, and breaking out of the loop if the future is Ready
or yielding Pending
otherwise.
If we are cancelled we do almost the same thing, except we call poll_cancel
instead.
If the cancellation is Pending
, we yield again.
But if the cancellation is complete, we have to decide what to do next.
In the normal case, we have break r
, which passes r
out to the surrounding context, which is expecting a value of whatever type r
is.
We can't do the same thing when the cancellation is complete because while r
might be type ()
, we can't rely on that.
For now we panic, since that type checks, but this obviously doesn't work.
We can get some inspiration from the state machines we saw earlier.
Cancellation effectively means we have two exit states for the function: normal return and cancelled.
But functions in Rust only have one exit state4, so we need to reify this into some data type that shows which final state you'd be in if you could have multiple final states.
It turns out the Rust standard library has one we can use for this purpose: Result
.5
So to report that an async fn
or async
block was successfully cancelled, we can return something like Err(Cancelled)
and Ok(T)
in the success case.
Factoring this into our approximate await
desugaring gives us:
match operand.into_future() {
    mut pinned => loop {
        let mut pin = unsafe { Pin::new_unchecked(&mut pinned) };
        if !is_cancelled {
            match Pin::future::poll(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(r) => break r,
                Poll::Pending => yield Poll::Pending,
            }
        } else {
            match Pin::future::poll_cancel(Pin::borrow(&mut pin), &mut current_context) {
                Poll::Ready(()) => return Err(Cancelled),
                Poll::Pending => yield Poll::Pending,
            }
        }
    }
}
In the desugaring of async {}
, we'll also need to wrap all the normal exit paths with Ok()
.
In the previous section I gave a rough sketch of how to desugar async
and await
into generators in a way that supports cancellation.
Now I want to fill in some of the details by looking at how this resulting generator becomes a future.
If we were implementing this for real in Rust, we'd probably just have the compiler implement Future
directly, like it currently does for async
blocks.
But, using generators lets us implement and experiment with this in a crate without having to modify the compiler.6
So, if we did everything right in the previous section, we should end up with a compiler-generated generator that implements Generator<(Context, bool), Yield = (), Return = Result<T, Cancelled>>
, where T
is the output type of the Future
and Cancelled
is just a marker tag like struct Cancelled
.
The argument to the resume
function, (Context, bool)
, is a tuple containing the Context
as well as a bool
indicating whether the future is cancelled.
This bool
would get bound to the is_cancelled
variable in the await
desugaring above.7
Now we can make these into futures as follows:8
impl<O, G> Future for G
where
    G: core::ops::Generator<(Context, bool), Yield = (), Return = Result<O, Cancelled>>,
{
    type Output = O;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        match self.resume((cx, false)) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(Ok(v)) => Poll::Ready(v),
            GeneratorState::Complete(Err(Cancelled)) => panic!("child future cancelled itself"),
        }
    }

    fn poll_cancel(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        match self.resume((cx, true)) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(Ok(_)) => {
                panic!("future completed after being cancelled")
            }
            GeneratorState::Complete(Err(Cancelled)) => Poll::Ready(()),
        }
    }
}
Our implementation needs to cover both poll
and poll_cancel
, but they are both pretty similar.
Each one forwards the call to the generator's resume
method and then adapts the result into something expected by the surrounding async code.
Generators only have a resume
method, but in this post we've extended Future
to have two methods.
So when we go from a call to poll
or poll_cancel
to a call to resume
, we need to tell resume
which version it is.
We do this by passing an extra boolean, which the generator uses to determine whether it should go along the normal execution path or the cancellation path.
Generators return either Yielded
or Complete
, which for futures correspond to Pending
and Ready
.
Because we've made resume
return a Result
to indicate whether the future was cancelled, we have some more cases to check.
We don't want to bubble the Result
out to user code; we want to keep it hidden inside the monad.
From the user's perspective, this is still just a future that evaluates to a T
, not a fallible future.
So we have this invariant, that in poll
, resume
should never return an Err(Cancelled)
and in poll_cancel
, resume
should never return Ok
.
The first case would mean that the future cancelled itself, which is not the way cancellation works in Rust.
The second case would mean the cancellation failed, that after being cancelled the future completed normally.
In this design we're also choosing not to model that case.9
In an ideal world, the compiler would be able to prove both of these cases are unreachable, or we'd design the API so that these cases aren't even possible to write.
Honestly, this is one of the aspects of this design that I'm least satisfied with.
I'd like to experiment with different factorings that would let us get rid of the panics.
Anyway, that's the rough idea of how this design works. I haven't written the complete implementation here because I find prose more informative than code, but I do have a prototype implementation at https://github.com/eholk/explicit-async-cancellation if you want to see the full details.
But for now, let's see what this lets us do.
My prototype includes a macro called async_cancel!
, which is similar to async {}
blocks, except with support for cancellation handlers.
This is meant to be paired with the awaitc!
macro, which is analogous to .await
, but with support for cancellation handlers.10
Because these are not built-in syntax, they are ugly and hard to read in the examples I've prototyped so far.
So in this section, I'll write out examples as if async
and await
supported cancellation handlers in the way described above.
First, I want to introduce a convenience called on_cancel
.
This gives us a way to run asynchronous code along the cancellation path.
This is important to show that everything actually works how we want, but I'm not really a fan of the API and would prefer it not be the standard way to run code on cancellation.
Think of this as a placeholder for something like defer {}
blocks or async Drop
.11
I've implemented on_cancel
as an extension method on futures that takes a future and runs that future on the parent future's cancellation path.
That's a little confusing to read, but in code it looks like this:
async {
    do_something().await;
    println!("all done!");
}.on_cancel(async {
    println!("cancelled!");
}).await;
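Under the hood, an `on_cancel` combinator could be shaped roughly like the sketch below. This uses a simplified model that drops `Pin` and `Context` so it runs standalone; the trait, `Pending`, and `Announce` types are invented stand-ins, not the prototype's actual code.

```rust
enum Poll<T> {
    Ready(T),
    Pending,
}

// Simplified model of the hypothetical Future trait with poll_cancel.
trait CancellableFuture: Sized {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }

    // Attach a handler future that runs on the cancellation path.
    fn on_cancel<C>(self, handler: C) -> OnCancel<Self, C>
    where
        C: CancellableFuture<Output = ()>,
    {
        OnCancel { inner: self, handler }
    }
}

struct OnCancel<F, C> {
    inner: F,
    handler: C,
}

impl<F, C> CancellableFuture for OnCancel<F, C>
where
    F: CancellableFuture,
    C: CancellableFuture<Output = ()>,
{
    type Output = F::Output;

    // Normal execution just forwards to the inner future.
    fn poll(&mut self) -> Poll<F::Output> {
        self.inner.poll()
    }

    // On cancellation, first let the inner future finish its own
    // cancellation, then drive the handler to completion.
    fn poll_cancel(&mut self) -> Poll<()> {
        match self.inner.poll_cancel() {
            Poll::Pending => Poll::Pending,
            Poll::Ready(()) => self.handler.poll(),
        }
    }
}

// A future that never completes, standing in for `pending()`.
struct Pending;
impl CancellableFuture for Pending {
    type Output = ();
    fn poll(&mut self) -> Poll<()> {
        Poll::Pending
    }
}

// A handler that prints a message and completes.
struct Announce(&'static str);
impl CancellableFuture for Announce {
    type Output = ();
    fn poll(&mut self) -> Poll<()> {
        println!("{}", self.0);
        Poll::Ready(())
    }
}

fn main() {
    let mut task = Pending.on_cancel(Announce("cancelled!"));
    assert!(matches!(task.poll(), Poll::Pending));
    // Give up on the task: run its cancellation path to completion.
    assert!(matches!(task.poll_cancel(), Poll::Ready(())));
}
```

Note the ordering choice in `poll_cancel`: the inner future gets to finish cancelling before the handler runs, which is one reasonable interpretation of "runs on the parent future's cancellation path."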
In my examples, I'll also make liberal use of futures like pending()
and ready()
, which never complete and immediately complete respectively.
The first thing we need is an executor that is aware of cancellation.
We'll make a simple one that runs a single task, similar to block_on
.
If for some reason the executor is dropped before the root task completes, then the executor's drop function will call poll_cancel
on the root task until it's complete.
In pseudo-code, our executor looks something like this (actual code is here):
impl<T> Executor<T> {
    /// Run the root task to completion
    fn run(&mut self) -> T {
        loop {
            match self.poll_once() {
                Poll::Pending => continue,
                Poll::Ready(result) => return result,
            }
        }
    }

    /// Poll the root task once
    fn poll_once(&mut self) -> Poll<T> {
        let context = self.context();
        self.root_task.poll(context)
    }

    // Definition of `context` is omitted
}

impl<T> Drop for Executor<T> {
    fn drop(&mut self) {
        let context = self.context();
        while let Poll::Pending = self.root_task.poll_cancel(context) {}
    }
}
This gives us just enough to experiment with cancellation behavior. We can run simple futures like this:
fn main() {
    let root_task = async { 42 };
    let mut exec = Executor::new(root_task);
    let result = exec.run();
    println!("the root task returned {result}");
}
This program would run and print out
the root task returned 42
We have some more power though.
Rather than using run
to poll to completion, we can use poll_once
some number of times to leave the future in an incomplete state.
If the executor is dropped before the future is complete, it will run the cancellation path in the executor's drop
function.
Here's a basic example showing cancellation:
fn main() {
    let root_task = async {
        pending().await;
        println!("all done!");
    }.on_cancel(async {
        println!("the task did not finish")
    });
    let mut exec = Executor::new(root_task);
    exec.poll_once(); // pending
    exec.poll_once(); // still pending
    drop(exec); // just give up
}
In this example, the root task blocks on pending()
, which will never finish.
But we attached a cancellation handler that runs when the executor is dropped before finishing the future.
Running this program produces:
the task did not finish
So we have the basics of cancellation support and cancellation handlers. Now let's see how this composes with more interesting futures.
I'm using "combinators" here to mean futures which combine or otherwise transform other futures in interesting ways.12
By this definition, we've already seen the on_cancel
combinator, which lets you override the cancellation behavior of a future.
Let's consider another one: race
.
We'll use a very simplified version of race
, which looks like a.race(b)
.
This takes a future a
and a future b
and runs them both concurrently.
When one finishes, race
will cancel the other and return the value from the one that finished first.
The code for this looks horrible, so I'll leave it out of the post and focus mainly on how it looks to use it.
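Since the real implementation is omitted, here is a rough sketch of the shape such a combinator might take, in the same simplified `Pin`-and-`Context`-free model used elsewhere in this post's examples. All names here are invented for illustration. The key point is that `race` refuses to report `Ready` until the losing future's `poll_cancel` has run to completion.

```rust
use std::mem;

enum Poll<T> {
    Ready(T),
    Pending,
}

// Simplified model of the hypothetical Future trait with poll_cancel.
trait CancellableFuture {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    fn poll_cancel(&mut self) -> Poll<()> {
        Poll::Ready(())
    }
}

struct Race<A: CancellableFuture, B> {
    a: A,
    b: B,
    state: RaceState<A::Output>,
}

enum RaceState<T> {
    Running,
    // One side won; hold its value while the loser's cancellation runs.
    CancellingA(T),
    CancellingB(T),
}

impl<A, B> CancellableFuture for Race<A, B>
where
    A: CancellableFuture,
    B: CancellableFuture<Output = A::Output>,
{
    type Output = A::Output;

    fn poll(&mut self) -> Poll<A::Output> {
        loop {
            match mem::replace(&mut self.state, RaceState::Running) {
                RaceState::Running => {
                    if let Poll::Ready(v) = self.a.poll() {
                        self.state = RaceState::CancellingB(v);
                        continue;
                    }
                    if let Poll::Ready(v) = self.b.poll() {
                        self.state = RaceState::CancellingA(v);
                        continue;
                    }
                    return Poll::Pending;
                }
                // Don't report Ready until the loser finishes cancelling.
                // (A full version would also move to a Done state here.)
                RaceState::CancellingB(v) => match self.b.poll_cancel() {
                    Poll::Ready(()) => return Poll::Ready(v),
                    Poll::Pending => {
                        self.state = RaceState::CancellingB(v);
                        return Poll::Pending;
                    }
                },
                RaceState::CancellingA(v) => match self.a.poll_cancel() {
                    Poll::Ready(()) => return Poll::Ready(v),
                    Poll::Pending => {
                        self.state = RaceState::CancellingA(v);
                        return Poll::Pending;
                    }
                },
            }
        }
    }

    fn poll_cancel(&mut self) -> Poll<()> {
        // Simplification: cancel both sides. A full version would track
        // which sides have already completed or finished cancelling.
        match (self.a.poll_cancel(), self.b.poll_cancel()) {
            (Poll::Ready(()), Poll::Ready(())) => Poll::Ready(()),
            _ => Poll::Pending,
        }
    }
}

struct Pending;
impl CancellableFuture for Pending {
    type Output = i32;
    fn poll(&mut self) -> Poll<i32> { Poll::Pending }
}

struct Now(i32);
impl CancellableFuture for Now {
    type Output = i32;
    fn poll(&mut self) -> Poll<i32> { Poll::Ready(self.0) }
}

fn main() {
    let mut r = Race { a: Pending, b: Now(42), state: RaceState::Running };
    // `b` wins; `a` is cancelled (instantly, via the default) first.
    assert!(matches!(r.poll(), Poll::Ready(42)));
}
```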
Here's an example using race
with a cancellation handler:
fn main() {
    let root_task = pending().on_cancel(async {
        println!("future `a` was cancelled");
    }).race(async { 42 });
    let mut exec = Executor::new(root_task);
    let result = exec.run();
    println!("result: {result}");
}
In this example, our root task consists of a race between pending()
and async { 42 }
.
The pending()
future never finishes.
We've attached a cancellation handler to it so we can see some indication that it was cancelled.
So the race combinator sees that the second future returns 42
while the first is still pending.
Before returning, it runs the first future's cancellation handler, printing future `a` was cancelled
.
Then it returns 42 as the overall value of the race future.
This program's output is:
future `a` was cancelled
result: 42
The poll_cancel
mechanism we're discussing is able to support what I earlier called idempotent cancellation.13
This means that if you cancel a future whose cancellation process has already started then the cancellation process continues as before.
To get a feel for how this works, let's look at a rather contrived example:
fn main() {
    // we'll use `done` to create a future that blocks until some other code
    // sets the `done` to true.
    let done = &RefCell::new(false);
    let root_task = async {
        // we're going to race `a` and `b`, so we'll create those two futures
        // separately.
        let a = async { 42 };
        // when b cancels, we want a cancellation handler that can print a
        // message for us the first time it's polled. We'll use
        // `cancel_started` to track that.
        let mut cancel_started = false;
        let b = pending().on_cancel(poll_fn(|_| {
            if !cancel_started {
                // print a message if it's our first time through.
                println!("begin cancelling `b`");
                cancel_started = true;
            }
            // Only complete if someone has set `done` to true.
            if *done.borrow() {
                println!("cancellation of `b` complete");
                Poll::Ready(())
            } else {
                Poll::Pending
            }
        }));

        a.race(b).on_cancel(async {
            println!("cancelling `race` future");
        }).await;
    }.on_cancel(async {
        println!("cancelling root future");
    });

    // Poll the futures a few times, then let the executor shut down
    let mut executor = Executor::new(root_task);
    let _ = executor.poll_once();
    let _ = executor.poll_once();
    let _ = executor.poll_once();
    *done.borrow_mut() = true;
}
The behavior here is pretty subtle, so let's see the output and break down why we get this behavior. The output from this program is:
begin cancelling `b`
cancelling root future
cancelling `race` future
cancellation of `b` complete
The core of this program is that we race two futures (line 28), one that returns immediately (line 8), and one that never completes (line 13). We've attached a bunch of cancellation handlers at various points so we can observe the behavior and the order that things happen in.
The cancellation handler on b
is pretty complex, but the idea here is to create a future that waits until some flag is set.
We wanted to simulate something that takes a little bit of time to complete, but not an unbounded amount, so that we can interrupt the cancellation.
So, we start running, the async { 42 }
completes immediately and then race
has to start cancelling b
.
This shows up in the line begin cancelling `b`
.
This cancellation does not complete, even though we poll a few more times, because no one has set done
to true.
The next step is to trigger the second cancellation of b
.
We do this by letting the executor go out of scope without completing, which means the destructor calls poll_cancel
on the root task.
This is when we see cancelling root future
appear.
This gets passed on to the race
future because of the way we've desugared await
, so we see the program print cancelling `race` future
.
In the implementation of race
, its poll_cancel
method cancels any futures that have not either completed or been cancelled.
In our case, this means we call poll_cancel
on b
again, but this time the call chain originates in the executor's destructor rather than the normal execution of race
.
Finally, since the done
flag has been set, b
's cancellation can complete and we see it print out cancellation of `b` complete
.
If we had instead supported recursive cancellation, we would have had the option of having b
's cancellation handler terminate early.
There are likely cases where both options would make sense, but here we've chosen to use idempotent cancellation semantics across the board.
This one is left as an exercise for the reader (or a future blog post here), but I don't see any fundamental reason why we can't do it.14
The gist of the idea is that anywhere we call poll
, we'd want to wrap that in catch_unwind
.
If the poll
function panics, we'd want to catch that, then call the future's poll_cancel
method to completion, and then call resume_unwind
to continue unwinding.
It will be annoying to have to do a poll
, catch_unwind
, poll_cancel
, resume_unwind
dance everywhere, but the basic idea should work.
There are other challenges though.
One is that the poll_cancel
functions will need to be written to be aware of the fact that they might be called during unwinding, which means the internal state for the future might be inconsistent.
Writing this post gave me the chance to thoroughly explore this design. I would say overall I think this design has enough shortcomings that I don't want to advocate it as the solution for async cancellation handlers. I still think this is useful because the shortcomings can help us find a design with fewer, or at least more acceptable, compromises. The fact that I've been able to implement this as a prototype means we can easily pivot and explore variations.
That said, I wouldn't have written so much about this design if I didn't think it had some merit. So now I'd like to discuss what I see as some of the greatest strengths and shortcomings.
In my mind, the biggest strength is that it feels like a relatively small extension to async Rust, but it still gives a lot of benefits.
It's basically one new method on the Future
trait, as well as a minor change to the way async
and await
desugar.
We can provide a default implementation of poll_cancel
which preserves the status quo semantics for cancellation and therefore makes the migration path pretty easy in most cases.
Of course, we're going to come back to this in the Weaknesses section because it's not all roses.
This design makes it clear what the responsibilities are for well-behaved executors (and executor-like things, like future combinators) to make sure cancellation behavior makes sense.
I think this design also works well with the requirement that futures are pinned.
For example, an alternate approach could be adding a method like fn cancel(self) -> impl Future<Output = ()>
.
The problem is that once a future has been pinned, you can't pass it as self
.
Instead, the signature would have to be something like fn cancel<'a>(self: Pin<&'a mut Self>) -> impl Future<Output = ()> + 'a
, which I think is going to be annoying for executors to work with in practice.
Cancelling in place strikes me as significantly simpler.
All of the benefits I've talked about in this post are available without what strike me as significantly more extensive language changes.
For example, this gives us some way to run code on cancellation paths without needing complete support for async Drop
.
Of course, this leads to significant shortcomings that we'll see in Weaknesses.
On the bright side, I think something like the poll_cancel
API can serve as a compilation target for cancellation, the same way that poll is a compilation target for await
.
The weaknesses in this design range from what to me seems rather tolerable to some that I find completely unacceptable.
On the more tolerable end of the spectrum, there's the fact that this API feels a little fragile.
We have a requirement that once you call poll_cancel
on a future you can never call poll
again, but the compiler can't do anything to prevent you from doing that.
This kind of requirement isn't unprecedented though.
For example, with futures you already aren't supposed to call poll
again after the future has completed, but the compiler doesn't stop you from doing that.
In both cases, we can mitigate this by treating await
as the normal interface to poll
and poll_cancel
and guaranteeing that those generate correct code.
Calling poll
and poll_cancel
directly would then be considered an advanced use case, so we can tolerate more complex requirements there.15
I'm slightly more concerned about the migration path.
As a strength, I mentioned that the default impl of poll_cancel
means without any additional action, futures will retain their present-day behavior.
In many cases, this is perfectly fine, but it's probably the wrong default for future combinators.
For example, suppose you were using an async IO crate that supported asynchronously cancelling operations in flight, but you put one of those futures behind an older version of race
that did not yet support poll_cancel
.
In this case, when the race future is cancelled, it would fall back on the default implementation, which says "ok, all good, nothing left to do," without calling poll_cancel
on the IO operation.
The result would be that the programmer has to be extremely careful to make sure that everything in their call chain handles cancellation correctly.
Cancellation would be best effort, at best.
You definitely could not rely on this for safety!
One possible way to avoid this might be to introduce poll_cancel
through a CancellableFuture
trait instead.
Doing this in a way that's backwards-compatible would be tricky though.
Related to this shortcoming, poll_cancel
puts a heavy burden on executor and future combinator authors.
It's already tricky to write a state machine that calls poll
. Having to add poll_cancel
calls to that state machine as well is going to be a lot of error-prone work.
We might be able to factor some of this work into common libraries that make it easier though.
But to me the most critical shortcoming of this design is that it's easy to forget to cancel a future.
Fortunately, as long as your future is always behind an await
, you should be okay.
On the other hand, there are common patterns that would now be error-prone.
For example, consider the following example with FuturesUnordered
:
let mut futures = FuturesUnordered::new();
futures.push(async { do_something().await; });
futures.push(async { do_something_else().await; });
futures.next().await;
drop(futures);
Here we've added two futures to a FuturesUnordered
collection.
When we call next()
, it will poll both futures until one of them completes, and then the next()
future will complete.
This means that futures
is still holding on to a partially completed future.
But, when we drop(futures)
, there's no way to run poll_cancel
because drop
must complete synchronously.
So, our only option right now is to just not cancel the future.
I suppose one way to work around this shortcoming is to try to argue that FuturesUnordered
is a bad API.
Maybe I could redefine what we mean by structured concurrency to say that FuturesUnordered
is unstructured and the cancellation mechanism we've described here only works for structured concurrency.
If I were to take this approach, our example would look more like this when using a redesigned FuturesUnordered
collection:
FuturesUnordered::with(async |futures| {
    futures.push(async { do_something().await; });
    futures.push(async { do_something_else().await; });
    futures.next().await;
}).await;
This solves the problem by making it so that FuturesUnordered::with
does no work until it's awaited, so there is never any partially completed future that's not under an await
point.
It's less than ideal for a few reasons though.
Stylistically, it adds more rightward drift.
But more importantly, this API makes it hard to put a FuturesUnordered
in another data structure, which can be quite useful in many situations.
Plus, in my subjective opinion, the original version feels more Rusty.
Without a solution, I think this issue will make cancellation handlers so unreliable as to not be useful.
In fact, they will likely do more harm than good.
This leaves me convinced that we need some more general solution, like async Drop
.
The key thing is to have some mechanism for the compiler to make sure, in an async function, that any values that need to be cancelled are cancelled.
To be honest, I'm a bit disappointed by this realization.
I haven't personally seen a design for async Drop
that I love16, so I was hoping that something like poll_cancel
would give us most of the benefits of async Drop
without having to wrestle with as many complex design issues.
That said, I think a design like poll_cancel
complements a higher level feature like async Drop
.
Even if we have an async Drop
, we need to figure out how these get run and whether we can get the properties we want in order to build on them.
I think a variation on poll_cancel
would give us a useful lower level target to build a more powerful feature like async Drop
on top of.
If you've been following this space for a while, the ideas I've discussed here probably sound very familiar. I wanted to take the time to both acknowledge the work that's come before, but also highlight the ways in which my proposal here differs from earlier work.
One of the earliest versions I'm aware of is the (now abandoned) poll_drop_ready
RFC from Boats.
One of the biggest differences is that the RFC focuses a lot on compiler-generated async drop glue to call poll_drop_ready
and make sure things are cleaned up well, while I've left that completely out of scope for this post.
I appreciated the RFC's careful consideration of issues around pinning and fusing poll_drop_ready
.
I've not really thought about these issues in my post, but I think we will need to if we move forward with this or a similar design.
I also appreciated that the RFC called out that the synchronous drop
would still be called after poll_drop_ready
returns Ready(())
.
That feature was implicit in my design as well, but I think it is better to call it out.
The most important distinction, however, is that I have focused mainly on cancellation semantics in this post (that is, what if a future is not polled to completion?), while it seems that poll_drop_ready
is called as part of the parent future completing normally through poll
.
In other words, it seems executors are not intended to call poll_drop_ready
directly.
This has some implications on when the programmer can assume poll_cancel
/poll_drop_ready
will be called.
There was another proposal on IRLO to add poll_cancel
to the Future
trait that is syntactically exactly the same as I've described here.
The semantics look essentially the same as I've described here as well, with perhaps some minor variations.
For example, in my design I've imagined you do not have to call poll_cancel
on a future that's never been polled.17
I think the guarantees on the contract in the IRLO post are stronger than I was hoping we'd need here---I imagined we could get away with saying something like "a well-behaved executor should..." rather than "you must."
In particular, I didn't have the requirement that "A polled future may not be dropped without poll_cancel
returning ready," and instead imagined such a thing would be impolite but not illegal.
I think the biggest contribution I've made in my post is showing how to adjust the desugaring of async
and await
to work with poll_cancel
, giving us an answer to how "to generate a state machine that can keep track of a future in mid cancellation as a possible state."
Another excellent contribution in this area is A case for CancellationTokens.
One of the things I really like about the post is the review of the major options in this space, including request_cancellation
, poll_cancel
, async fn cancel
and cancellation tokens.
If you haven't read it yet, that section alone is worth the read!
The main idea behind cancellation tokens is to have some bit of state that's carried along the await chain and futures can check whether they've been cancelled and activate the correct behavior in that case.
It has some nice benefits around composability, and seems to be better at traversing code that is not cancellation-aware, which is a major shortcoming of poll_cancel
as I've described it here.
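For a sense of the mechanism, a minimal cancellation token might look like the sketch below. The API here is invented for illustration and is far simpler than the one in that post: it's just shared state that the canceller flips and that code along the await chain can observe.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// A minimal cancellation token: cloneable shared state that can be set by
// the canceller and checked anywhere along the await chain.
#[derive(Clone)]
struct CancellationToken(Arc<AtomicBool>);

impl CancellationToken {
    fn new() -> Self {
        CancellationToken(Arc::new(AtomicBool::new(false)))
    }

    fn cancel(&self) {
        self.0.store(true, Ordering::Relaxed);
    }

    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

fn main() {
    let token = CancellationToken::new();
    let child = token.clone(); // handed down the await chain
    assert!(!child.is_cancelled());
    token.cancel();
    // A future (or its combinators) can now switch to the cancel path.
    assert!(child.is_cancelled());
}
```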
One thing I find interesting is that although on the surface cancellation tokens and poll_cancel
look like extremely different mechanisms, they have more in common than it appears.
For example, the extra is_cancelled
flag we added in the async
and await
desugaring looks an awful lot like a cancellation token.
I think it'd be worth exploring this connection in more depth.
The last idea I want to explore is request_cancellation
, which seems to have been first introduced in some early async vision notes by Niko Matsakis.
This is framed as a replacement Future
trait called Async
which includes a request_cancellation
method.
The idea is that after calling request_cancellation
on a future, subsequent calls to poll
would proceed along the cancellation path rather than the normal execution path.
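A simplified, runnable model of that shape might look like the sketch below. As elsewhere, `Pin` and `Context` are dropped, and the `Download` type and its `Option`-valued output are assumptions made for illustration; the vision notes leave open what a cancelled `poll` should ultimately return.

```rust
enum Poll<T> {
    Ready(T),
    Pending,
}

// Simplified model of the `Async` trait shape from the vision notes:
// one poll method, plus a way to flip into cancellation mode.
trait Async {
    type Output;
    fn poll(&mut self) -> Poll<Self::Output>;
    fn request_cancellation(&mut self);
}

struct Download {
    cancelled: bool,
}

impl Async for Download {
    // Modeling choice: after cancellation, poll completes with None.
    type Output = Option<Vec<u8>>;

    fn poll(&mut self) -> Poll<Self::Output> {
        if self.cancelled {
            // Cancellation path: clean up, then report "no result".
            Poll::Ready(None)
        } else {
            // Normal path: pretend we're still waiting on the network.
            Poll::Pending
        }
    }

    fn request_cancellation(&mut self) {
        // Calling this again is harmless, which is one way this shape can
        // support the idempotent semantics discussed earlier.
        self.cancelled = true;
    }
}

fn main() {
    let mut dl = Download { cancelled: false };
    assert!(matches!(dl.poll(), Poll::Pending));
    dl.request_cancellation();
    // The same poll entry point now drives the cancellation path.
    assert!(matches!(dl.poll(), Poll::Ready(None)));
}
```

Because there is only one entry point, there is no way to accidentally poll "forward" after cancellation begins: every later poll is, by construction, a cancellation poll.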
This has a couple of strengths.
It avoids the possibility of calling poll
after calling poll_cancel
.
More importantly though, request_cancellation
can be used to support recursive cancellation.
After writing this post, I'm actually pretty excited about request_cancellation
because it seems strictly more powerful than poll_cancel
.
In this post we've made an in-depth exploration of how a poll_cancel
API would support cancellation handlers in Rust.
The design includes a prototype implementation which allows us to write real programs to get a feel for how cancellation behaves.
In the course of doing this, we realized that poll_cancel
has some significant shortcomings and is probably not the best mechanism for cancellation handlers going forward.
But, we also see promise for related proposals to address the specific shortcomings we've identified.
I'm using executor broadly here to basically mean "any code that calls poll
on futures directly." This obviously includes async runtimes, but also includes many future combinators like race
or join
.↩
This is a little bit of a lie. They desugar into generators and yield
expressions, which do involve a fair amount of compiler magic to implement. The key thing here is that we don't have to do much additional magic if we can rely on the compiler to give us support for generators.↩
Indeed, in the early days of async Rust, await!
was in fact implemented as a macro.↩
Well, not quite. Anything can panic, which you can treat as another final state for a function.↩
Option
would work just as well.↩
TC has also shown that we can emulate coroutines using async
/await
, so it's probably even possible to do all of this on stable Rust.↩
We could also add a Context::is_cancelled()
method and just pass one parameter. There are a lot of ways to plumb this around.↩
This is pseudo code. I'm assuming the pinning stuff just works. Also, my actual implementation had some transmute
crimes that I've left out here for clarity.↩
This "complete after cancel" case is one that could reasonably happen. For example, maybe you sent a request to a server, started to cancel it, but before you could the server sent back a response saying the request was completed. One possible behavior is to just drop the return value and say the cancellation was actually successful. In code this would mean replacing the panic!("future completed after being cancelled")
line with Poll::Ready(())
. The design in this post doesn't do this, but futures themselves are empowered to handle this case however they see fit.↩
If this were Scheme, I'd call this macro something like await/c
or await/cancel
, but Rust doesn't let us use /
in identifiers.↩
Incidentally, I'm also not entirely in love with defer {}
and async Drop
, but I think async Drop
in particular solves a lot of problems I don't know how to solve otherwise.↩
Sometimes I also find it helpful to think of combinators as mini executors, since combinators and executors both call poll functions on other futures directly.↩
I don't think it would take too much to extend this to support recursive cancellation, but that's left for another post or an exercise for the reader. I think they key thing is you need some way to tell how many times you've been cancelled. One way is to add a depth
or count
parameter to poll_cancel
. Another is to have cancelling a future destroy the old future and create a new future that represents the cancellation of the old one, which could itself be cancelled.β©
Whether we want to do it is a fair question though.β©
This is somewhat related to fusing futures and iterators. I haven't really touched on what happens if you call poll_cancel after the future is cancelled, but I think Boats' earlier proposed RFC on poll_drop_ready makes a pretty good case that poll_cancel should require fused semantics -- that is, that you can call poll_cancel again after it completes and nothing bad happens.
For example, I haven't seen a good way to run async destructors without introducing implicit await points. I like that right now we have the property that you can see anywhere an async fn might suspend by looking for await. Although, if I'm totally honest, this may not actually be that useful of a property.
The reason for this was to try to make it so we could get away with only having to deal with poll_cancel in the desugaring of await. Given the issue with FuturesUnordered, I don't think we can get away with only calling poll_cancel as part of await and will probably need some kind of compiler-generated drop glue cancellation path. Thus, it's probably simpler and better overall to have poll_cancel called even on futures that haven't been polled yet.
async Drop.
There are some tricky design questions to making this work well, and we need to start thinking about these now if we want to have something ready by 2027.
In this post, I'd like to explore a low level mechanism for how we might implement async cancellation.
The goal is to explore how an async executor1 would interact with cancellation, as well as to make sure that this mechanism would support reasonable surface-level semantics.
You can think of this as a kind of compilation target for higher level features, similar to how the Rust compiler lowers async fn into coroutines.
Let's use the program below as a running example.
In real life, this would probably return a Result<DataTable>, but I want to avoid the extra complexity around additional early exits.
```rust
async fn load_data(file: AsyncFile) -> DataTable {
    let mut data = Vec::new();
    let result = file.read_to_end(&mut data).await;
    result.unwrap(); // We're ignoring proper error handling
    parse_data(data)
}
```
For an async function's state machine, states are made up of the code between await points. Or alternatively, you can think of await points as edges between states. For this program, the state machine would look like this:
You might notice I pulled a bit of a fast one on you.
I said await points turn into edges in the state transition diagram, so we'd expect to see just one edge labeled await.
Instead, we have two edges labelled await and one without a label.
What's going on?
First, some conventions.
I realized it's helpful to see some of the traditional control flow in addition to suspension or await points.
I've represented these edges as a solid, unlabeled line.
These mean that control transfers from the previous state immediately to the second state without any suspension.
Our example is a relatively simple straight-line program, so the actual control flow graph isn't particularly interesting, but this will change a little when we look at cancellation.
The other edge we have in this graph is the await edge.
These edges are labeled await and are dotted lines to indicate that execution is interrupted---the future will suspend and give the executor the chance to switch to another future for a time.
Finally, I've introduced a couple of special states that do not exactly correspond to any code the user wrote.
These states are shown in orange.
Now let's turn our attention to why the diagram shows two await edges but the await keyword only shows up once in the program.
Every async fn has an implicit suspend point that represents the time between when the function is called and when it is first polled.
In this diagram, I've represented this as an await edge going from the start state to the first line of the function.
In general, you don't have to worry about this hidden initial suspend point too much because async function calls are almost always immediately awaited.
In other words, it's more common to see foo().await instead of let future = foo(); /* do some other stuff */; future.await.
The state machine we've looked at so far does not do a good job of representing cancellation. Let's try to extend it to do so.
Today in Rust, cancellation simply means you stop polling the future and drop it instead.
When dropping something like a closure or a future returned by an async fn, Rust needs to recursively drop the values stored in (in other words, captured by) the closure or future.
Depending on what state the future is in when it is dropped, there are different values that need to be captured.
In our example, if we drop the future before we poll it, we only need to drop the AsyncFile that was passed in as a parameter.
On the other hand, if the future is dropped at the await point, we also need to drop the Vec that we read the file contents into.
We can add some extra states to our graph to illustrate this.
I've specifically called out drop along the cancellation path, but Rust also drops values during the normal exit path. I've left the normal drops out for simplicity.
I like thinking of async functions this way because we can use it to make several observations about cancellation in Rust. Many of these seem rather obvious, but they raise important requirements for designing a system that can handle cancellation well.
Observation 1: Cancellation is a state change. When we cancel a future, it transitions from its normal running states to a cancellation path. Currently this happens implicitly when a future is dropped, but in the future we will probably want a way to explicitly transition a future to its cancellation path.
Observation 2: Async cancellation handlers1 require adding await points on the cancellation path. At the moment, cancelling futures is synchronous. This shows up in the async state graph in the fact that there are no await edges on the cancellation path. If we want to allow for cancellation handlers, we will need to add await points in the cancellation path. This may be obvious, but this also implies we need a way to make sure executors continue to poll futures that have been cancelled.
Observation 3: Cancellation is an alternate exit. An async function that has been cancelled does not exit through the normal return path.
From the perspective of an async function author, this shows up as the function not continuing to execute past an await point.
From a types standpoint, a function cannot exit normally in general because we may not yet have a value of the right type to return.
In our example we can see that the type of the function does not allow it to exit at the await point, because at that point we have not created a DataTable to return.
This observation has implications that will show up in the types of the API we eventually design for cancellation handlers.
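Observation 2 in particular implies executor changes. A hypothetical executor loop that honors it might look something like the following sketch; poll_cancel, the task states, and yield_to_scheduler are imagined APIs from this design space, not real Rust.

```rust
// Pseudo-code: keep polling a task even after it has been cancelled,
// so that async cleanup on the cancellation path can run to completion.
loop {
    match task.state() {
        TaskState::Running => match task.poll(cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => yield_to_scheduler(),
        },
        TaskState::Cancelled => match task.poll_cancel(cx) {
            Poll::Ready(()) => break, // cleanup finished
            Poll::Pending => yield_to_scheduler(),
        },
    }
}
```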
Another thing we can explore with a state graph is what behaviors are possible if a cancelled future is cancelled again.
One common way this could happen is if you have something like a race combinator that returns the value of the first future to complete and cancels the other one.
If the race combinator is itself cancelled while it is cancelling the slower sub-future, the slower sub-future would be cancelled twice.
FIXME: write out and explain a code example of this case.
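As a sketch of the scenario: race, timeout, fast_query, and slow_query below are all hypothetical, and async cancellation handlers do not exist in today's Rust.

```rust
// Pseudo-code: slow_query can end up cancelled twice.
async fn double_cancel_example() {
    let result = timeout(Duration::from_secs(1), async {
        // race cancels whichever future loses.
        race(fast_query(), slow_query()).await
    })
    .await;
    // Suppose fast_query wins: race begins running slow_query's
    // cancellation handler. If the outer timeout then fires while that
    // handler is still in progress, race itself is cancelled -- and
    // slow_query, already on its cancellation path, receives a second
    // cancellation.
}
```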
Let's look at this in the abstract with state machines.
There are a couple of possibilities for how to handle cancellation of cancellation. I'll consider three of them, inspired by the zero one infinity rule.
Once we have support for cancellation handlers, it will definitely be possible to write code that leads to trying to cancel a cancellation.
The race example we mentioned earlier is one such case.
So in this option, we would declare cancelling a cancellation to be an error.
We have some flexibility on what mechanism we'd use exactly, but I think the best option would be to panic.
I think in practice this option is not feasible. Cancellation flows from top to bottom (e.g. an executor decides to terminate a task early and so runs the task's cancellation handler), but the higher levels do not know anything about the internal behavior of futures. An executor that is cancelling a task does not know if one of the task's subfutures is trying to cancel a future already.
In this version, cancelling an already-cancelled future is basically a no-op. In state machines, it would look something like this:
The key point here is that all of the cancel states have a cancellation edge that comes back to the same state. In other words, cancelling once your future has already been cancelled means you stay in the same state and continue executing the cancellation handler as before.
What does this mean in practice?
It essentially means you can trust that your cleanup code in a cancellation handler will run to completion.
Admittedly, this might take additional rules, like we may want to declare it to be undefined behavior to not poll a cancelled future to completion2.
Scoped tasks would likely need this guarantee, but we could consider weaker ones, like that a "well-behaved" executor will poll cancelled futures to completion.
The "well-behaved" guarantee is roughly what we have today for Drop, so it might be similarly useful.
The downside is that this also means we can add cancellation behavior that can take arbitrarily or even infinitely long.3 We might decide instead that cancellation means something like "request graceful shutdown" but then forcibly terminate a future if it takes too long. For this we need recursive cancellation.
In this version, canceling an already cancelled future would transfer us to a separate cancellation path. That cancellation path could also be cancelled, and its cancellation could be cancelled, and so on. In pictures, recursive cancellation looks like this:
While an infinite regress of cancellations might seem ridiculous, there are some cases where it might be useful. There's also a nice regularity to it.4 One class of problems where this might be useful are cases where you have optional cleanup work to do but you can cancel it if needed for a more prompt shutdown. Of course, I'm not sure this is really all that useful in practice, and if you need it there might be other ways to do it.
More importantly, there are many cases where you absolutely do not want to cancel the cancellation. For example, maybe you have a transaction future whose cancellation path rolls back the transaction. You do not want to stop the rollback before it's complete, or else you've completely defeated the purpose of transactions.
That said, recursive cancellation appears to be strictly more powerful than idempotent cancellation because if you have recursive cancellation you should be able to implement idempotent cancellation where needed (basically, you just ignore the subsequent cancellation signals and stay in the same state you were in).
Seen this way, recursive cancellation gives us a lot of flexibility. It means individual futures can implement either behavior, according to what best fits their needs. The main thing the Rust language would need to do is design reasonable defaults and set expectations so people authoring futures can encapsulate their specialized behavior.
We've long talked about async functions as state machines, so in this post we looked at how you might draw a state transition diagram for async functions. This gave us a way to play with cancellation and look at what various cancellation semantics might imply in terms of the shape of the state transition diagram. I've found it really helpful to think about async cancellation this way, so I hope others find it useful as well!
This post was originally part of a larger post about implementing a prototype of async cancellation handlers. The larger post was taking a long time and I felt like the content in this post was useful on its own so I wanted to go ahead and publish it. While I no longer like to promise that a followup post is coming soon5, I do have most of the longer post drafted so chances are good I will get it out soon. Plus, I did commit to discussing it at the WG Async Reading Club next week, so there is a little pressure on.
Anyway, please reach out if you have any thoughts or questions!
I'm using "cancellation handlers" to refer broadly to mechanisms that allow running async code on the cancellation path. This would likely be async Drop, but I want to use a more general term to emphasize there are multiple possibilities here.
This will require us to mark something unsafe somewhere.
This is true of drop already today. I can write fn drop(&mut self) { loop {} } and my program will hang when the destructor tries to run.
One of the things that bugs me about the idempotent version of cancellation is that you can call any future from either a normal execution or cancellation path, but in the cancellation path they effectively become uncancellable. It's not actually a problem, since not cancelling a future is always a choice you're allowed to make, but the asymmetry still bothers me.
I'm sure you'll find plenty of examples on my blog of posts I said were coming soon that did not, in fact, come soon, if they ever came at all.
We have the beginnings of a proposal, but I wanted to write it up in my own blog to help make sure I understand it. Note that this is a draft proposal at best at this point and nothing is set in stone. I also want to recognize Jane's work in coming up with this process. This is largely based on her initial suggestion and I want to make sure I'm not taking credit for something I didn't come up with. But, any failings in this post should be viewed as my own and not hers.
Rust is split into two major organizations: the Rust Foundation and the Rust Project. The Foundation does a few things. It provides a legal structure to hold Rust's intellectual property. It provides an entity for organizations to contribute financially to support Rust. It supports the long term health of the Rust project. One way it does this is through the Community Grants Program.1
The Foundation is governed by a Board of Directors, and five of the seats on the Board of Directors are reserved for members of the Rust Project. These seats are known as Rust Project Directors. Project Directors are meant to serve for a term of two years with the hope that we can stagger terms and rotate out a subset of the directors this year.
Unfortunately, due to the lack of Rust Project governance over the better part of the last two years, we have not appointed new directors in place of those whose terms are completed. Instead, the Foundation Board has voted several times to extend the terms. Currently the terms are set to expire on September 21, 2023 so we'd like to be able to appoint new ones without having to ask for another extension.
There are a number of desired features and constraints on this process.
First of all, the Foundation bylaws state that Directors must be elected by those they represent.2 In the case of Project Directors, this means they must be elected by the Rust Project, and the Project governance is set up so that the electors will be the Rust Leadership Council. There is some flexibility in what counts as an election though. For example, we could follow some other selection process and the Council could then vote to ratify the results of that process.
So one possible process is to have the Council pick and vote on a set of directors without any input from the rest of the project. This would be a bad process. We want something that gives the Rust Project a chance to provide input to the process. And we want some transparency in how the Project Directors were elected.
Doing a more traditional election would also be at odds with Rust's culture, which tends to prefer consent based decision making rather than rule by majority or plurality.
With this background in mind, we can now discuss a possible process for selecting Project Directors. I've based this off of the notes here, and the process described there is heavily influenced by Sociocracy for All's Selection Process.
I think of the process as a bottom-up process, so I'm going to describe it in those terms. We start by soliciting nominations from each top level Rust team. These nominations go to the Council, which will do the final selection. Let's look at these in more detail.
When we kick off the process, we will start by telling all the Rust teams that they should begin nominating candidates for Rust Project Directors, with a deadline for when nominations will be closed.3 Teams should look at the role description and think of people they think would be a qualified candidate. These candidates will likely come from the team itself, but there's no requirement. Teams can nominate anyone who they believe meets the qualifications that will be set forth in the role description. We aren't planning to impose a requirement to nominate a certain number of candidates. It doesn't really make sense to nominate more than the number of vacancies, but there's no reason a team couldn't do that. Similarly, a team may choose not to nominate anyone, or they may do this by default if the deadline expires.
We plan to leave the process for nominating candidates up to the teams, with a strong suggestion to follow a miniature version of the process the Council will follow. For the purposes of accountability though, I think it makes sense to have the team's Council Representative drive the process, although "driving the process" might mean delegating to someone else who wants to run the process. The main reason for this default is to make sure someone is responsible for making progress.
Once the team has selected a set of candidates, they should report these to the Council. The team's council representative will be responsible for communicating these to the Council as a whole. One of the goals of this project is to gather feedback to help members of the project grow. Thus, I think it would make sense for the team to provide their nominees as a document (we might even provide a template) that lists the nominees and why they were chosen. It might also make sense to include a list of people who were considered but not nominated, and why they weren't nominated. I would hope this is a positive experience, so we don't say "we didn't nominate person X because they're terrible," but more as a way of highlighting rising stars. For example, we could say "Person Y was considered and shows promise, but we would like to see more growth in these areas first. Please consider them in the future."
The next step is for the Council to select the Project directors from the pool of nominees. There may not be much to decide here, since it's quite likely that we have exactly the number of nominees as there are openings. But even in this case, we want to have a defined process that we follow.
The draft proposal says the Council should select a facilitator to lead the process. The Council would then go through a round process, where each council member proposes a candidate from the nominees and explains why they think the candidate is a good choice. After the first round, the Council goes around again in a change round, which gives everyone the chance to change their nomination based on the discussion so far. Once this is done, the facilitator takes all of the suggestions and proposes one candidate. The Council then consents to this choice, or if there are objections then the facilitator proposes a new candidate.
The process as I've described it so far is an iterative process, meaning we'd run the process to select one candidate, then do it again to select the second, and so on until we've filled all the open seats (two or three, in this case). The subsequent rounds should be much faster than the first, because we can reuse most of the information gathered in the first phase.
An alternate way to do this would be as a batch process, where we pick the whole set of candidates at once. I think this would be my preference for a couple of reasons. First off, it's likely to be more time efficient, since we only have to do one process.4 Second, and more importantly to me, picking all the candidates at once allows us to more directly look at characteristics of the candidates as a group. We'll already need to account for employment constraints, but choosing the set of directors as a group also lets us more directly make sure we have broad representation within the project. In my mind, the thing we want to select for is a successful group more than any individual characteristics of the members.
Anyway, the batch process would be essentially the same, only instead of going through rounds proposing individuals, we'd propose a set of individuals to fill all the seats at once.
Once the Council has consented to a set of candidates, we'll have a vote to ratify the selection. Since we've already heard all objections and consented to the selection, this would be expected to be a unanimous vote. The main purpose here is to make sure we are meeting the requirement in the Foundation bylaws to elect the Project Directors.
After the vote passes, the process is complete. The Council would then announce the results and the new Project Directors would take office once the outgoing Directors' terms end.
I've described my interpretation of the process we have in mind so far, along with some of my own opinions and additions to it. My main goal here is to explain what I'm thinking so we can make sure the project director election group has a shared understanding of the proposal, and also to fold any relevant thoughts back into the proposal. But I'd also like this to serve as a chance to raise awareness and solicit feedback. If this is something you're interested in, please come see us at #council/project-director-election-proposal on Zulip.
To give us enough time to follow this process, we are going to try to reach consensus on the proposal within the not too distant future. Look for official communication from the Rust Leadership Council once that happens.
Thanks to Jane Losare-Lusby for reviewing this post.
This isn't an exhaustive list, and having a more complete understanding of everything the Foundation does is something I'm definitely working to build.
We've been interpreting this to mean we need to have a vote, but there's some ambiguity here. For example, maybe "selection" and "election" are just synonyms for each other. To play it safe though, we expect to have a ratification vote following the selection process we're currently designing.
The deadline is important here because we need to choose Project Directors by the time the current term ends on September 21.
Of course, it may turn out that picking three people at once is actually much, much harder than picking one person three times in a row.
One of the reasons I chose to come work for Microsoft is that they seem to be one of the few examples of a large, established organization that intentionally and dramatically changed their culture. I've been curious how they did that, and what other organizations can learn from that.
It seems like an important part of it is to have regular conversations about culture, such as by doing exercises like the one I'm going to discuss in this post. So with this background in mind, let's talk about the exercise.
At the start of the exercise, we were reminded of the company mission statement. Then we were asked to spend a couple of minutes coming up with a sentence explaining how we personally contribute to it.
According to Microsoft's About page, Microsoft's "mission is to empower every person and every organization on the planet to achieve more."
This mission actually speaks to me a lot. I feel like computers should be tools that help people do the things they want to do. Too often today, computers seem to actively work against their user instead. I have a kind of lengthy rant on this subject that I should probably write down someday, but that will be for another time.
So how did I respond to the exercise? I wrote:
In my work, I empower people to achieve more by creating powerful and accessible languages and APIs, and by helping to build a team that effectively does the same.
I felt rather proud of this answer, which is why I wanted to write about it in more depth. I want to do this by going into more detail about the main phrases I used.
The first is powerful and accessible. In my mind, Rust fits the bill really well here. Rust is an incredibly powerful language. Features like the borrow checker can ensure safe memory access for some complex patterns without any runtime overhead. The trait system provides incredible opportunities for abstraction. Rust provides good support for low level programming, while still including potent functional programming features like lambdas and algebraic data types. But what really impresses me about Rust is that it manages to make all this power usable to many programmers. Rust isn't the first language to do all these things, but some of the other languages essentially require a Ph.D. to use them effectively.1 On the other hand, things like Rust's extreme attention to detail in its error messages greatly increase the chance that programmers will be successful with such a powerful language.
The second is languages and APIs. I thought about writing "languages and libraries" because I like the alliteration, but libraries seem too broad. I don't often write an entire library, while I might add a single function to an existing library. I felt like "API" better expressed that scope, and I've even heard a library called an API at times so this can generalize if needed. The reason I mentioned these two together is that I believe it's best to consider a programming language and its standard library as a unit. As a language nerd, I can easily get caught up in the excitement around designing new language syntax and semantics. These language features need to be supported by a solid, well-rounded library. To make an analogy to spoken languages, I think of the programming language as the grammar and the standard library as the vocabulary. It's hard to say much of anything with only one.
Finally, I mentioned the importance of helping to build a team. This is the aspect I have the least experience in, but it's important and I'm trying to learn more about how to do it. A well-functioning team can accomplish far more than an individual! These teams don't form (or at least, aren't maintained) by accident. So it's important to do things like mentor new team members and to create a welcoming place so new people can feel comfortable joining the team. It's important to find ways to help grow members into new roles so that the team can outlive any one member. It's important to foster a sense of shared purpose and values so we can effectively work together. It's important to make space for disagreement, since often I find the best outcomes are a result of hashing out our differing perspectives.
Doing an exercise like this can feel pretty cheesy but this one resonated with me a lot. It was a chance to put into words why I do the things I do, both in my day job and how I bring those things into the Rust community. And now, having put this into words, I can use it as a guide when deciding how to approach my work in the future. While this post was specific to my role within Microsoft, I think doing a similar exercise with Rust's mission would be illuminating. Maybe I'll do that in another post. What about you? What values do you hold and how do you put those into action?
I feel like Rust does not need a Ph.D. to use well, but given that I have one, I may not be the most qualified person to make this claim.
The Leadership Council is new, and in many ways its first tasks will be to define what it is. We know it's sort of a replacement for the Core Team, but it's also supposed to be significantly different. A lot of our first tasks are going to seem relatively mundane: figuring out when we regularly meet, how to propose items to the agenda, how we communicate what we're working on, etc. After that, we can get on to the "more substantial" questions. One colleague of mine told me once that Rust has at least two years of governance debt, and given that they said that two years ago, at this point we probably have at least four years of governance debt!
While we figure these things out, I know there are a few things I can say about myself and values, and how I hope I can bring these to the Leadership Council. Keep in mind that these are my own opinions. I'm not speaking for the Leadership Council or the Compiler Team, so the priorities I suggest here will evolve over time.
When deciding whether to nominate myself for this role, I spent a lot of time thinking about why I wanted to do it. To me, the question came down to how much I wanted to do the work.
The best leaders I've seen in my life are the ones who saw their job as serving those they lead. I want to embody this mindset as I serve on the Leadership Council.
Most of my recent Rust contributions have been primarily within the Async Working Group. Lately, I've found myself thinking more about the project and community as a whole. For example, how do we make Rust more welcoming to those who want to contribute? Or how can we make sure all the components of Rust work together as a whole? How can we build excitement for contributing, including contributions we might think of as non-technical?
I realized that questions like these are the kinds of questions the Leadership Council should be thinking about.1 Given that I've already been thinking about these questions, joining the Leadership Council became a clear opportunity to actually get to work on these things.
One of the most important things I can do, especially at the start, is to listen. I will be actively reaching out to the leaders in the Rust community to find out what they need and how I can best serve them in particular and the community in general.
I am also going to make myself available for office hours. I have set up a Bookings page where you can schedule a 30 minute meeting with me. Please feel free to use this if you'd like to have a synchronous chat about something related to the Rust Leadership Council or the Rust project in general. To make this easier to find, I've added a new top level page to this site where I'll keep up to date information about how to book an office hours appointment with me.
In the conversations I've already had with folks around Rust governance, one of the clear themes that has come up over and over is that we need more transparency in Rust leadership. Fortunately, I believe all of us on the Council agree with this and are committed to improving transparency. I believe most of this transparency should come through official channels, such as published minutes from Leadership Council meetings. That said, I intend to supplement these official communications by sharing about my thinking as it relates to the Leadership Council. This post is an example, and I will continue with more like this.
I wanted to share some of how I'm thinking about my role on the Leadership Council, and the things I plan to do. I'll be honest, I'm a little scared even to post this, because if I fail at these goals it will be obvious. I think this accountability is good. If this is the last you hear from me about this, then I've failed as a leader, and people should know that.
But I also may not have the right priorities. There's a lot we don't know, and almost everything I've written here may need to change. When changes are needed and made, I promise to be transparent about them. Please help me to be a good servant and leader for the Rust community.
And with that, I want to close with an explicit call for feedback. What do you think of my priorities here? If I do these well, will you be happy to have had me on the Leadership Council? What are some things I've missed or should do instead?
Please send me your feedback, either by joining me in Office Hours, DMing me (eholk) on the rust-lang Zulip or Mastodon, or emailing me at eric@theincredibleholk.org.
I'm thrilled to have met the other members of the Leadership Council. I think we have a great group of people who all bring important background, perspectives, and skills to the team. I'm excited to work with them to make Rust the best language and community it can be!
This doesn't mean the Leadership Council is necessarily the right place to solve them. One of the main goals of the governance RFC was that the Council should primarily look to delegate to more suitable teams and to create those teams when they don't exist.β©
The Leadership Council is new, and in many ways its first tasks will be to define what it is. We know it's sort of a replacement for the Core Team, but it's also supposed to be significantly different. A lot of our first tasks are going to seem relatively mundane: figuring out when we regularly meet, how to propose items to the agenda, how we communicate what we're working on, etc. After that, we can get on to the "more substantial" questions. One colleague of mine told me once that Rust has at least two years of governance debt, and given that they said that two years ago, at this point we probably have at least four years of governance debt!
We've had a couple of ideas going around so far. One of the main ones is Return Type Notation (RTN), which Niko describes in his recent post. In my last post, I suggested that we could infer the necessary bounds in many cases.
While I was excited about inferring bounds at first, one major shortcoming is that it creates new semantic versioning hazards. The inference depends on the body of the function you've annotated, which means that when modifying the function you could easily add or remove bounds from the signature by accident.
In the discussions we've had since then, we have been converging on a solution that we expect will work in the common cases, but avoids both the verbosity inherent in RTN and the semver hazards with inferring bounds. This is the solution I'll be describing in this post.
One of the things I've realized is that there are two variants of the Send Bound Problem. I'll call these the Promise and Require variants.
The Promise variant is "how can I promise that my async function will always return a Send future?"
There are several subvariants.
We may want to define an async trait so that all implementors must always have Send implementations.
This is what the #[async_trait] macro does by default.
Or, even if the trait does not require it, we may want to make this promise in our implementation.
And finally, for just a bare async fn, we may want to be able to make the same promise.
The Require variant is "how can I require that the implementation I'm given can be used in a Send context?"
This is looking at the use side rather than the definition side.
Let's recall the do_health_check example:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    spawn(async move {
        health_check.check(server).await;
    });
}
We want to make sure that, although the HealthCheck trait does not require that implementations return Send futures, we can only call do_health_check with those that do, so that we can spawn the background task.
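The shape of this requirement already exists in today's Rust with closures and threads: std::thread::spawn demands Send + 'static for exactly the same reason our spawn does for futures. As a rough analogy (run_in_background is a made-up helper for illustration, not a real API):

```rust
use std::thread;

// Analogy for the Require variant: the generic job must be usable from
// another thread, so the Send + 'static bound appears at the use site,
// even though FnOnce itself says nothing about Send.
fn run_in_background<F>(job: F) -> String
where
    F: FnOnce() -> String + Send + 'static,
{
    thread::spawn(job).join().unwrap()
}

fn main() {
    let server = String::from("server-1");
    // This closure captures only Send data, so the bound is satisfied.
    let report = run_in_background(move || format!("checked {server}"));
    println!("{report}");
}
```

A closure that captured, say, an Rc would be rejected at the call site, which is precisely the behavior we want do_health_check to have for implementations whose check futures are not Send.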
My feeling so far has been that the Promise variant is the easier one to solve, so it is easy to accidentally start talking about that one, while the Require variant is the more important problem. In this post, I will only be talking about the Require variant, although I suspect the proposal may generalize to the Promise variant and I may speculate on that.
Without further ado, here is the proposal.
First, we require async traits to be declared as such.
This means the HealthCheck trait we've been talking about gains an additional async keyword:
trait async HealthCheck {
    async fn check(&mut self, server: Server);
}
For the most part, we can think of this new async as becoming part of the name of the trait.
It's no longer just HealthCheck, but async HealthCheck.
Declaring a trait with async means the trait is allowed to have async methods.1
Because we've changed the name of HealthCheck, we have to change where we use the trait as well:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: async HealthCheck + Send + 'static,
{
    spawn(async move {
        health_check.check(server).await;
    });
}
This new async keyword in the where clause does a couple of things.
First, it's a hint that the trait has async methods.
More importantly, it gives us a place to hang additional bounds if needed.
Because we are spawning a future that awaits calls from this trait, we need a Send bound.2
So, to notate this, we'd use async(Send) in the bound:
fn do_health_check<H>(mut health_check: H, server: Server)
where
    H: async(Send) HealthCheck + Send + 'static,
{
    spawn(async move {
        health_check.check(server).await;
    });
}
The trait name async(Send) HealthCheck would mean the HealthCheck trait, with async methods, all of which have a Send bound on their returned futures.
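On stable Rust today, you can approximate this bound by desugaring the async method into an associated future type; the proposed async(Send) then corresponds to one extra where clause on that type. This is only a sketch of the desugaring under that assumption, not the proposed feature:

```rust
use std::future::Future;

// Desugared stand-in for `trait async HealthCheck`: the async method
// becomes a regular method returning an associated future type.
trait HealthCheck {
    type CheckFut: Future<Output = ()>;
    fn check(&mut self) -> Self::CheckFut;
}

// `H: async(Send) HealthCheck` corresponds to the extra bound
// `H::CheckFut: Send` in this desugaring.
fn do_health_check<H>(mut health_check: H) -> &'static str
where
    H: HealthCheck + Send + 'static,
    H::CheckFut: Send,
{
    let _task = health_check.check(); // would be spawned in real code
    "bounds satisfied"
}

struct Ping;
impl HealthCheck for Ping {
    type CheckFut = std::future::Ready<()>;
    fn check(&mut self) -> Self::CheckFut {
        std::future::ready(())
    }
}

fn main() {
    println!("{}", do_health_check(Ping));
}
```

The verbosity of spelling out CheckFut for every method is exactly what the proposed syntax would save us from.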
So that's the proposal in a nutshell.
One thing I'd like to point out is that although so far we've only talked about Send bounds, I'm imagining that the grammar would allow any bound on the async keyword (although it might make sense to limit it to auto traits).
For example, one could imagine writing:
fn foo<T>(x: T)
where
    T: async(Send + Clone + Debug) MyTrait,
{ ... }
In practice, it's probably hard to implement Debug on the future returned by an async method...3
There's a lot I like about this proposal.
It's relatively lightweight syntactically, but we expect it to be powerful enough to cover the common cases.
To be honest, we don't actually know how common it will be for users to want some methods that are Send and some that are not.
The fact that #[async_trait] works well suggests that the all-or-nothing approach should be fine in many cases.
If there are cases that users need to be more precise, however, we can still provide return type notation for those advanced use cases.
The semantics of these new bounds seem easy to explain. We don't have to talk about looking at function bodies, and we definitely don't have to mention anything about flow-sensitivity, as we might if we did something that relied on more inference. This helps keep Rust explicit and predictable as a language, without being burdensome.
This proposal also dovetails nicely with several others that are currently in progress, and is immediately open to more generalizations.
The syntax we've introduced here actually largely comes from the keyword generics initiative.
Although we have not talked about maybe-async bounds (written ?async), the syntax is completely consistent with what we've seen here.
I know I said I wasn't going to talk about the Promise variant of the Send bounds problem, but it's a relatively small change from what we have here to also allow trait modifiers anywhere else the async keyword is allowed.
For example, we could write the following to declare an async function whose returned future is guaranteed to be Send:
async(Send) fn foo() -> i32 { ... }
That said, I think there may be better ways to solve this problem,4 so I don't want to dwell too much on this just yet.
We still have a few open questions though. I'll briefly touch on these here.
Should methods be Send by default?
Either way is feasible.
For example, methods could be Send by default and you could use async(?Send) to opt out, or they could not be assumed to be Send by default and you would use async(Send) to opt in.
There are arguments for both, but fortunately it's a relatively minor detail and it would be easy to go either way.
How does this interact with supertraits and trait aliases? We have some time to figure this out for aliases, since trait aliases aren't a thing yet. For supertraits, we can probably start with a more conservative option and relax it later if needed.
Are non-async and async traits in the same namespace?
This question gets at whether you could define both trait Foo { ... } and trait async Foo { ... } in the same module.
While I won't explore it in this post, this has enough implications that it's probably worth spending some time on.
For example, if we get this one wrong, we might end up in a situation where users have to write async AsyncFoo, which would just be sad.
So anyway, that's the proposal. I want to give a big shout out to everyone in the Async Working Group, and Yosh Wuyts in particular since he largely came up with the final syntax presented here in conjunction with his work on keyword generics. Also, thanks to Nick Cameron for his early feedback on this post. The proposal presented in this post incorporates a lot of ideas from many different people, and it's really great to see everyone's input coming together towards a solution we can be happy about. We seem to be at a point where we've struck a nice balance for ergonomics, utility, and predictability. Of course, the best way to know for sure is to prototype something and play around with it!
I'm excited to see progress in this area and am eager to see async functions in traits become fully supported in Rust!
Technically, you wouldn't be required to have async methods, but we'd probably want to add a lint warning about unnecessary async keywords, just like we do for mut.↩
This is assuming we're using a multithreaded executor like Tokio. The spawn function from single-threaded executors, like many embedded async runtimes provide, would likely not require a Send bound.↩
This potentially opens up some really powerful features though. For example, one could imagine futures that implement serde::Serialize and serde::Deserialize to make futures that can move not just between threads but between nodes in a cluster, or web frameworks where you can await input from the client.↩
One example suggested by Josh Triplett is if you could explicitly refer to the return type in where clauses. Then you could say async fn foo() -> i32 where return: Send { ... }. This makes the scoping a little clearer around parameters (for example, in async(Send + 'a) fn foo<'a>() -> i32 { ... }, it'd be weird to refer to 'a before it's declared), but it's also less clear whether we are saying i32 is Send or that the hidden future that async fn desugars to is Send.↩
This post is about adding Send bounds to futures returned by async methods.
Niko Matsakis has been writing on this subject recently, so if you haven't seen his posts, definitely check them out!
His first post outlines the problem, while the second post introduces a possible solution: Return Type Notation (RTN).
I'm going to write this post assuming you're familiar with those.
I'm mostly a fan of the proposed return type notation.
It's a very powerful feature that gives a solution to the Send bound problem but is also generally useful in other cases.
There are some significant shortcomings though, so I don't think it should be the only or even primary solution people reach for to solve the Send bound question.
In this post I'd like to explore some more implicit approaches that address the issue with much lighter syntax.
As Niko points out in his conclusion, this notation is not without its drawbacks.
In particular, it can easily become quite verbose.
Consider a common trait like Read.
It has twelve methods!
Presumably once we have AsyncRead or similar, that trait will also be widely used and have a similar number of methods.
As a user this isn't a problem, and as a trait implementer it's not bad because only one of the methods is required---the rest have default implementations.
But consider if we wanted to write something like:
fn read_file_on_other_thread<R>(reader: R)
where
    R: Read,
{ ... }
To add all the Send bounds on an async version using RTN, we'd end up with something like this:
fn read_file_on_other_thread<R>(reader: R)
where
    R: AsyncRead,
    R::read(..): Send,
    R::read_vectored(..): Send,
    R::read_to_end(..): Send,
    R::read_to_string(..): Send,
    R::read_exact(..): Send,
    R::read_buf(..): Send,
    R::read_buf_exact(..): Send,
    R::by_ref(..): Send,
    R::bytes(..): Send,
    R::chain(..): Send,
    R::take(..): Send,
{ ... }
Now, I'm perhaps being a little unfair.
You don't have to list all the methods, only the ones you use.
I don't think I've ever used all twelve methods.
Usually I just use read_to_string or maybe read_to_end.
But these methods exist, and someone uses them---perhaps code in some high performance library that I've pulled in.
These bounds also end up being viral. I have to add the bounds needed by any function I call, or any function the callee might call, and so on. Trait aliases can help, but those aren't currently implemented and it'd be nice if we could solve the issue without relying on them.
While that kind of precision can be useful, I suspect those will be somewhat niche and advanced use cases. It would be nice to have something lighter weight that works in the common cases, while still keeping the more precise options when needed.
Perhaps one approach that will be fruitful is to take inspiration from the way auto traits are inferred.
Auto traits are traits where Rust automatically generates an implementation for you in most cases.
The Send trait is perhaps the most common example.
While you can implement Send yourself, in most cases the compiler implements it for you if it can.
For structs, enums, and similar types the rules are pretty simple: they implement an auto trait if all their constituent fields also implement the trait.
For closures, things are a little more subtle.
Closures implement an auto trait if all of the things the closure captures also implement the auto trait.1
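We can watch the closure rule in action with a small compile-time probe (is_send is a helper I'm making up for illustration):

```rust
// Compile-time probe: this only type-checks for Send values.
fn is_send<T: Send>(_: &T) -> bool {
    true
}

fn main() {
    // A closure is Send when everything it captures is Send.
    let msg = String::from("hello"); // String: Send
    let send_closure = move || println!("{msg}");
    assert!(is_send(&send_closure));

    // Capturing a std::rc::Rc (which is never Send) flips the verdict:
    // with `let n = std::rc::Rc::new(0);` captured, the call
    // `is_send(&move || println!("{n}"))` would fail to compile.
    println!("closure is Send");
}
```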
Async blocks are trickier still.
They work like closures in that you have to consider what they capture.
However, because async blocks can suspend at an await point, we must also consider all of the things that are live across an await point.2
We find these values in the generator interior analysis step in the compiler (because async functions and blocks desugar into generators).
Anyway, the point of this little detour is that we already have an analysis pass in the compiler that could be helpful for inferring Send bounds.
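Here's a small illustration of that analysis in today's compiler: the future below is Send because nothing non-Send is live across its await point (is_send is a made-up probe, and the Rc is deliberately confined to a synchronous helper):

```rust
use std::rc::Rc;

// Compile-time probe: this only type-checks for Send values.
fn is_send<T: Send>(_: &T) -> bool {
    true
}

// The Rc (which is !Send) lives only inside this synchronous helper,
// so it never appears in the future's interior state.
fn compute() -> i32 {
    let local = Rc::new(41);
    *local + 1
}

async fn health_value() -> i32 {
    let v = compute();
    // Only the i32 and a Ready<i32> are live across this await,
    // so the generator interior analysis concludes the future is Send.
    std::future::ready(v).await
}

fn main() {
    assert!(is_send(&health_value()));
    println!("health_value() returns a Send future");
}
```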
async fn
I was talking with Nick Cameron about this last week and we came up with an idea that I'd like to describe and flesh out here.
The idea is that you would somehow annotate an async function to indicate you want to guarantee it returns a Send future, and then the compiler infers whatever bounds are necessary to make this happen.
Let's look at an example inspired by Niko's post.
async fn do_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    info!("doing health check");
    health_check.check(server).await
}
This example is admittedly kind of pointless because you could just call .check directly rather than calling do_health_check.
I added the info! line to make this seem a little more plausible, because maybe you want to make sure you have uniform logging.
Still, I admit it's contrived.
Anyway, suppose we wanted to ensure that no matter what type you use for H, the future returned by do_health_check is Send.
With this proposal, we'd write something like this:3
async<Send> fn do_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    info!("doing health check");
    health_check.check(server).await
}
The compiler already has to infer whether do_health_check is Send.
Since Send is an auto trait, the compiler looks at what's live across an await point (in this case the future returned by check(..), since that's the thing we're awaiting) and decides whether do_health_check returns a Send future depending on whether check(..) does or not.
For this proposal, instead of just checking whether the resulting future is Send, the compiler would add any bounds necessary to make it so that do_health_check is always Send.
In our example, it would be as if you had written:
async fn do_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
    <H as HealthCheck>::check(..): Send,
{
    info!("doing health check");
    health_check.check(server).await
}
This is exactly what we wanted!
Partially. I was really excited by this idea at first, but in thinking it through to write this post I've realized it solves at best a very tiny piece of the issue. That's kind of apparent in how contrived the example I used to introduce the feature was. But in looking at its shortcomings, maybe we can come up with something that works better.
The biggest issue is that it only works for cases where you await something, but as we saw in Niko's post, oftentimes we care that a future is Send even if we don't await it.
Let's try to fill in a possible body for the example in Niko's post:
fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    let task = async move {
        health_check.check(server).await;
    };
    spawn(task);
}
Here spawn would have a signature like fn spawn(task: impl Future<Output = ()> + Send + 'static).
The problem is, start_health_check is where we need to add the bounds, but it is not async, so we can't just change it to async<Send>.
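As an aside, those Send + 'static requirements on spawn aren't arbitrary. A toy spawn built on threads (a sketch I'm making up, nothing like a real executor) shows where they come from: the future is moved to, and polled on, another thread.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Wake, Waker};
use std::thread;

// A waker that does nothing: fine for futures that resolve immediately.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy spawn with the signature from the post. Moving the future to a
// worker thread is exactly why it must be Send + 'static.
fn spawn(task: impl Future<Output = ()> + Send + 'static) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let waker = Waker::from(Arc::new(NoopWaker));
        let mut cx = Context::from_waker(&waker);
        let mut task = pin!(task);
        // Busy-poll; a real executor would park until woken.
        while task.as_mut().poll(&mut cx).is_pending() {}
    })
}

fn main() {
    let handle = spawn(async {
        println!("health check ran on a worker thread");
    });
    handle.join().unwrap();
}
```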
We could try using our do_health_check function with the async<Send> annotation, and we'd end up with something like:
fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    let task = do_health_check(health_check, server);
    spawn(task);
}
We'd probably get a different error, but it's essentially the same.
We still have no way to guarantee that any H will meet the requirements of this function.
But maybe this can still help.
A lot of futures that need to be Send will be composed of lots of calls to other futures.
These all will need to be Send as well.
Adding async<Send> to those inner futures can still save a lot of boilerplate.
We might still have to be more explicit at the boundary where a task is spawned, but we can avoid restating the bounds explicitly throughout the whole call tree.
In the Read example, this can be significant.
Furthermore, looking at some internal async code we have at work, it looks like we already have code that would benefit from exactly this async<Send> inference proposal.
Another thing I like about it is that it allows us to partially avoid a semver hazard.
Auto traits leak into public types, and because of the way inference works for async functions, one release could have a function that always returns Send futures while some later release, due to a subtle change in the body of the async function, no longer does.
What the async<Send> syntax proposed here does is allow you to explicitly promise that a function will always return Send futures.
Of course, that Sendness may still depend on which methods of a trait the function uses.
For example, maybe I switch from calling Read::read_to_string to calling Read::read_buf.
Still, this will catch some accidental semver breakages, and the ones that remain are more likely to be within the caller's power to fix.
So let's go back to the issue of start_health_check.
What if we could carry this idea of lifting bounds out of the body even further?
It might look something like this:
#[infer_bounds(Send)]
fn start_health_check<H>(health_check: H, server: Server)
where
    H: HealthCheck + Send + 'static,
{
    let task = do_health_check(health_check, server);
    spawn(task);
}
This is just strawman syntax.
We may want to stick the annotation on H instead, or use real syntax rather than an attribute.
There's room to experiment with syntax, but the important thing is that somehow we opt in to a new inference behavior that I'll describe here.
For the sake of argument I'm going to assume do_health_check is still written as async<Send> as described above, but I think these two proposals can stand alone.
Without the #[infer_bounds(Send)] attribute (i.e. Rust's current behavior), we can't compile start_health_check because we get a trait error.
The type checker and trait system treat type parameters parametrically, meaning we can only use facts about H that are stated in the where clauses.
There are currently no facts that say the future returned by H::check is Send, which the compiler will inform us of in the error report.
Using RTN, we can add any extra where clauses needed to make this compile.
With #[infer_bounds], I'm proposing to let the compiler add these clauses for us.
We'll probably need to define a clear set of rules, but the gist of it is that if a missing bound would give us a type error, we instead add an implicit bound to the function.
In this case, because do_health_check requires H::check(..): Send, we'd add this bound to start_health_check as well.
I'm not sure. One issue is that I think this could become a global inference problem, which Rust for the most part avoids. For example, consider the following:
#[infer_bounds(Send)]
fn foo(x: impl MyAsyncFuture) {
    spawn(x.some_async_fn());
}

#[infer_bounds(Send)]
fn bar(x: impl MyAsyncFuture) {
    foo(x);
}
In this short program, we need to figure out foo's inferred where clauses before we can figure out bar's.
This means we can't check each function independently.
It's even tougher if we end up with a recursive cycle.
This might not be a problem though. Auto traits are coinductive, which seems like it exists to solve exactly these kinds of problems. I'll need to learn more about the trait solver to know whether this is actually a problem and whether it's solvable.
Of course, another downside is that this introduces additional semver hazards. On the bright side, they are opt-in, and arguably not much worse than the existing auto trait leakage hazards. Still, rather than creating new semver hazards using existing ones as precedent, it'd be better to go the other direction and remove old hazards across an edition, by making it ergonomic enough to be explicit about the requirements that we don't need auto trait inference on async functions. I don't think this proposal does that, but I'm hopeful that maybe it will spark some ideas that lead to something much better.
This post has been a bit of a roller coaster to write.
I started out thinking we had come up with a really tidy solution to most of our problems.
Then there was a moment of despair where it seemed like maybe the idea wasn't workable or didn't solve a useful part of the problem.
Now that I'm at the conclusion, I'm cautiously optimistic.
I think between async<Send> and #[infer_bounds(Send)] we might have something ergonomic that covers the most common cases.
Let's sum up by looking at some pros and cons of the two proposals together.
Pros:
- Makes Send-ness (and other auto traits, like Sync) explicit in the type signature

Cons:
- Introduces new inference and semver hazards, as discussed above
My main conclusion, though, is that we should be thinking more about approaches based on implicit or inferred bounds.
I think between RTN and implicit bounds there's a nice symmetry with functions that return -> impl Future and async functions.
and async functions.
In the cases where one wants more control, we provide a more verbose, explicit way of doing it.
On the other hand, if that's not needed, there's a simpler, more concise notation that lets the compiler handle a lot of details for you.
I think this is a promising direction, and I look forward to hearing how others can improve on the idea!
You can think of a closure as a struct that holds all of the captured fields along with an impl of one of the function traits like FnOnce and FnMut, so in that sense the rules for structs and closures are the same.↩
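To make that analogy concrete, here is a hand-written "desugaring" of a move closure capturing one String (Captured and call are illustrative names; real closure types implement the unstable Fn traits rather than an inherent method):

```rust
// The closure `move || format!("hello, {name}")` behaves like a struct
// holding its capture plus a call method.
struct Captured {
    name: String,
}

impl Captured {
    // Stands in for FnOnce::call_once, which we can't implement on stable.
    fn call(self) -> String {
        format!("hello, {}", self.name)
    }
}

fn main() {
    let name = String::from("world");
    let closure = move || format!("hello, {name}");

    let desugared = Captured { name: String::from("world") };
    // Both produce the same result, and both are Send because String is
    // Send, by the same auto trait rule that applies to any struct.
    assert_eq!(closure(), desugared.call());
    println!("closure and struct agree");
}
```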
While not completely accurate, I think of async blocks and generators as being like an enum with a variant for the beginning of the function, one for each await point, and one for the end. The fields of each variant then store the captures and anything live across the corresponding await point. So in this sense, the rules for auto traits on async blocks are similar to those for enums.↩
There are lots of other syntaxes we could use, like async(Send) or an attribute like #[require(Send)]. The point is to illustrate the idea, and we can bikeshed on syntax if we decide this is worth pursuing.↩