One page of async Rust
2026-02-17 19:42
https://dotat.at/@/2026-02-16-async.html
I'm writing a simulation, or rather, I'm procrastinating, and this blog post is the result of me going off on a side-track from the main quest.
The simulation involves a bunch of tasks that go through a series of
steps with delays in between, and each step can affect some shared
state. I want it to run in fake virtual time so that the delays are
just administrative updates to variables without any real
sleep()ing, and I want to ensure that the mutations happen in the
right order.
I thought about doing this by representing each task as an enum
State with a big match on the state to handle each step. But then I
thought, isn't async supposed to be able to write the enum State and
the match for me? And then I wondered how much the simulation
would be overwhelmed by boilerplate if I wrote it using async.
Rather than digging around for a crate that solves my problem, I thought I would use this as an opportunity to learn a little about lower-level async Rust.
Turns out, if I strip away as much as possible, the boilerplate can fit on one side of a sheet of paper if it is printed at a normal font size. Not too bad!
But I have questions...
no subject
Date: 2026-02-17 22:38 (UTC)

I will hopefully come back and read a little bit more about the specifics.
I *think* you know all this better than me; FWIW, and/or to check my understanding, my impression of async Rust has been:
* The thing that most people want async for is to write code with lots of little tasks that all work through some code interspersed with system calls. Rust does some super clever but obscure and a bit cheaty magic in how it defines Futures and poll() to make this viable.
* In practice, people can define async functions and don't need to think at all about the ways that the compiler uses futures in order to implement them.
* The thing that guides to Rust async unforgivably refuse to tell you is that although the Rust language provides async functions, implemented in terms of futures, to do anything with them you need an async executor library, which is not built into the language, and almost everyone uses tokio.
* The thing *I* (and Simon) most commonly want to do is to write functions as coroutines for clarity, but with no need for anything but running them in whatever order they want to run in on one thread. As a spin-off of the async code, Rust eventually (mostly) implemented a "yield" statement that turns a function into an iterator. Futures returned by async functions are implemented using the same mechanism, but with all the poll machinery that my code doesn't need.
Similarly to my experience, does your code actually benefit from the async framework? It feels like it should. But the effort with poll is to make it possible for the executor to listen to an OS callback to know when a task (might) be waking up from a syscall. But your code probably knows exactly which task to run: the task with the least "virtual time", something like that. And mediating your virtual time through the OS might add extra unnecessary weight. So you might be able to follow a "how to write an executor" guide and leave most of the poll functions pretty trivial. But maybe a set of coroutine iterators sorted by time is all you need, and would be simpler?
A final rant, not specifically related to rust. I feel like the ".await" syntax is unnecessarily clunky -- could we not have ".await" as the default and a special syntax for "Return an unexecuted future, which I can put in an executor in a minute"...
no subject
Date: 2026-02-18 11:42 (UTC)

I think you overlooked the link to the unabbreviated blog post which addresses many of your observations :-)
In particular "one page" is the size of the super-simple async executor that I wrote, to explore what is the least machinery necessary to get it working. I guess generators will eventually be a nicer solution to this problem, but async is all I have until generators are stabilized.
no subject
Date: 2026-02-19 11:02 (UTC)

My expectation was that if you don't need to interact with the operating system, using gen+yield even though unstable would be a better fit for the problem than working with the async syntax, but now you do have a grasp on the under-the-hood of async, I don't know how you feel about it.
ETA: Or in other words, I think you were more leaning to "see what I can do with async" and not necessarily "what's the best way of implementing my virtual-time code" and if so I didn't realise that, sorry :) I am reading through your expt more slowly and it is v v informative, although I don't know if I will finish
no subject
Date: 2026-02-20 12:31 (UTC)

I basically treat unstable Rust as nonexistent and unlikely to be usable any time soon. Generators have been unstable for years so they are likely to remain unstable for years. So if I want coroutines I have to hack them up using async/await, and the point of my post was to explore how simple an async executor can be. Turns out it can be simple, at the cost of some unsafe code that subverts its assumptions about what an executor looks like.
no subject
Date: 2026-02-22 19:04 (UTC)

Reading through *has* been really helpful for grokking more about what rust futures end up doing.
no subject
Date: 2026-02-18 14:45 (UTC)

no subject
Date: 2026-02-18 15:09 (UTC)

PuTTY is full of C preprocessor coroutines, but they're used for a purpose that looks much more like modern languages' async than pure computation. If I were to rewrite PuTTY from the ground up in Rust – which I wouldn't, because it would be way too much work and the general wisdom seems to be that it's not even a good idea, but I keep coming back to it as an interesting thought experiment – then surely the sensible thing would be to build it on something like tokio, and the SSH protocol layers and proxy-server interaction coroutines would all become standard async Rust functions.
Whereas spigot, also full of C preprocessor coroutines, is exactly what you say: the coroutines are purely computational, and just a way of expressing a very complicated calculation in the way that's most natural for that calculation to be written down. If I rewrote that in Rust, it would probably look async-ish, but with some kind of radically simplified executor. Perhaps even something very like
(And, unlike PuTTY, I am seriously considering rewriting spigot in Rust. It wants a major restructuring already to remove infinite loops, and if you're rewriting a ton of the code anyway then changing language isn't such a huge cost.)
no subject
Date: 2026-02-19 10:54 (UTC)

I had been thinking of a conversation where you talked about spigot-like things :)
At some point, I tried to write up a venn diagram of the key concepts and what different names people call them (including the same name for different concepts or with different emphases, often...). Something like:
* "A function which can be passed around like an object" (Big partial overlap with "function defined within another function, automatically capturing any outer-scope variables used as inputs")
* "A function which can be passed around like an object, but return half way through and maintain its state and continue where it left off"
* "A function like that, used to implement an iterator"
* (Maybe) "A function like that used to 'pipe' between a producer and consumer"
* "A function like that, designed for an executor object to multiplex running many of them in the same thread"
* "Functions like that, where the executor has a way to know when one would block and run the others instead."
But I doubt I could make that helpful to anyone else.
no subject
Date: 2026-02-19 12:00 (UTC)

There's some discussion of it in my coroutines philosophy article from a year or two ago, giving a general flavour of the kind of thing PuTTY uses coroutines for: implementing layers of the SSH network protocol, in which the sequence of messages exchanged by the two ends has a "control flow nature" (sometimes you diverge into different packet sequences depending on how you negotiated a thing, sometimes you loop, sometimes you finish the loop and go on to a new phase of the protocol). That definitely seems to me like the kind of setup a Rust programmer would see as an obvious candidate for an async tokio function.
On the other hand, to immediately contradict myself :-), I also wonder if that kind of protocol layer would be better not to tie too closely into the I/O system, because the more self-contained it is, the more it can be tested in isolation. It would be pretty nice if each of PuTTY's protocol-layer coroutines could be run 100% standalone, with two PacketQueue objects for their input and output, and a coroutine.resume() method. Then you could have unit tests which manually stuck some packets on the input queue, resumed the coroutine, and checked that the expected packets had appeared on the output queue.

(In Rust, you might also improve the unit testing story by replacing all the cryptography with a stub version that doesn't do any hard CPU work – say, you have a trait implemented by both your real crypto and your stub mock crypto, each SSH protocol layer type is parametrised by a type implementing that trait, and for safety's sake, the stub mock crypto type only exists at all under #[cfg(test)] to make sure it can't accidentally be instantiated in live code. Of course you'd have to do real crypto while unit-testing the crypto functions, but there's no reason you should have to do it while unit-testing everything else too.)

I don't know if tokio provides any sensible way to decouple one async function from the rest of the system for that kind of purpose.
I tried to write up a venn diagram of the key concepts and what different names people call them
In case you hadn't seen it, my coroutines philosophy piece also contains a list of different ways of thinking about what a coroutine even is. Interesting to compare your list with mine :-)
no subject
Date: 2026-02-19 14:42 (UTC)

I'm not sure if I have this right, I'm trying to mentally compare your work to what I know of rust. I'm imagining your conception of a coroutine as a functiony thing that has one (or another number of) input streams and one (or another number of) output streams, that could be connected up in different ways. Like one layer of the ssh protocol might have input and output "network packets", in streams arranged by a higher layer.
But an async function might be more like, having dozens of input/output streams, to every different syscall it might call. But with the expectation that it will call "async read" and that will just happen, the executor won't decide what the input is, it will only run some parallel tasks until the read completes and it continues the first task.
Does that sound right...?
no subject
Date: 2026-02-19 15:13 (UTC)

I think part of the terminological problem is that "coroutine" is one of those words with a broad sense and a narrow sense. In the broad sense, it means anything that looks like an ordinary function in the source code (containing ifs and whiles and breaks and continues etc) but some things in the middle of the function cause the whole thing to be suspended for later resumption. That's exactly the thing that my C preprocessor machinery implements: it's not concerned with input and output data streams at all, only with suspending and resuming. Data flow is built on top of that, at the programmer's whim.
In the narrower sense, coroutines come with built-in data flow of some kind – the yield statement in Python, or co_yield in C++20, or similar – and some kind of a notion of specifying which other coroutine they need to run next. For example, if you insert a third "adapter" coroutine in between the producer and consumer in the classic example, then the adapter has two semantically distinct ways to suspend itself: it can yield to the producer when it wants a new item, or yield to the consumer and give it an item. I guess this is where your description of having a fixed number of data streams fits in.

An async function is a coroutine in the broad sense, but not the narrow sense. In place of a small set of fixed data streams, its suspend operation involves specifying under what circumstances it next wants to be resumed, in the form of whatever expression you wrote .await after. And that might be different every time. For example, I have a small HTTP proxy server I wrote as a combination of Baby's First Async Rust Program and a testing harness for PuTTY's proxy support, and within a single async task it will .await all sorts of things: await the socket from its client being readable (to find out what it's been asked to connect to), await a socket connect operation (once it knows, and tries to actually connect to there), await the client socket being writable (when it sends responses back).

That's where the executor comes in: all the async things currently in progress have to be managed by a central component which knows what set of them exist, and what each one is currently waiting for. And quite often most of them will be waiting for external things like network activity or user input, so the async executor probably also wants to manage the top-level event loop that's listening to all of those sources to see which one does something next – and when there is network or GUI or terminal input, probably the executor's response will simply be to wake up one particular suspended async function which was waiting for that particular stimulus.
If two async tasks want to pass data items between themselves in the style of sensu stricto coroutines, then they'd probably do it by creating some kind of queue object: one task places items into the queue as it produces them, and the other takes them out. When the receiver finds the queue is empty, it uses .await on the queue object itself (or rather, on its read end), asking the executor to wake it up the next time the queue is non-empty; conversely, when the producer has an item to put in the queue but finds it's full, it .awaits the write end of the queue, asking the executor to wake it up the next time the queue has space to accept another item. So you can implement this kind of data transfer within the general async framework, but it's just one among many things you can await.

no subject
Date: 2026-02-22 19:58 (UTC)

More I'm working out a picture as I go along!...
Hm. Does this get closer to a taxonomy?
1. Name a "Lowest common denominator coroutine" a 'function' which can contain points at which its execution can be paused and later resumed. This is only *useful* with one of the features 2 or 3 below, hence people tending to think of "coroutine" as 1+2, or 1+3 or 1+4 depending..
This is also the concept that Rust implemented in the compiler as a way of implementing async, but didn't stabilise any way of using it other than async. IIUC for a long time Rust called this "generator", but more recently renamed it to "coroutine", freeing up "generator" as a name for a putative stable syntax for exposing a subset of that machinery.
2. Rust (and other implementations) have some fancy compilation to work out what local memory a coroutine needs, so you don't need an arbitrary-sized stack for each one. This makes coroutines a lot more practical if there might be a lot of them (as with async code with lots of parallel tasks, but it could apply to #3 situations too).
3. One significant way that coroutines can pause, is to read an extra value from an input or write an extra value from an output. One of the simplest cases and fairly common is a generator (coroutine yielding an output stream one item at a time), but there are lots of generalisations of this. I think of this as control-flow untangling, the sort of thing you talk about in your article with a consumer and a producer both written as functions with yield. The sort of thing I remember talking about before, where you have cpu code and a single thread and determinism and no blocking, but what's interesting is the clearest way of running it in the necessary order.
4. Another significant way that coroutines can pause is what async code does, of pausing when it needs input from the outside world, and wanting to be resumed at some unknown point later.
async does #4, but may not have a way of doing #3. generators do #3, but may not have a way of doing #4.
Your articles are quite old, they may not represent what you think now! But when I read them, most of the examples seemed like examples of #3 (and which was what drew my interest). When I said "some number of input or output streams", I didn't mean a particular number, I meant, if they have *no* input or output streams, then they're not doing #3.
And it seems like by default async can yield "blocked on external IO, I know where to go next but maybe run some unrelated code until I'm ready?" and generators can yield "yo, intermediate value N ready, here, I know where to go next, run me again when you want the next one". But you can't do both unless your coroutine can intermediate-return both kinds of value. (The Internet says Python didn't originally have yield from async functions but introduced it later.)
It feels like doing #4 ought to give you #3 too, but I'm not sure it naturally does, because the poll return doesn't inherently have a way of returning a value, even though the executor is ready to re-run the task immediately without any need for a waker to be deferred. Unless the implementation *also* has yield, or you can hack in a return value where the implementation doesn't expect one, or you set up a pipe or otherwise mediate the control flow through the OS or a separate queue structure.
Although I also had in mind, that you could view the places where async code accesses the network or file system or keyboard etc as all being separate input or output streams, just ones that are chosen by the async function itself. And if the calling function wants to inject those outside world things itself, it needs some other way of providing those inputs.
However, I may have got the wrong idea of what the coroutine code actually used in PuTTY does, as I haven't read it yet, only the examples in the articles. The article talked about the calling code providing packets of input to a coroutine implementing one layer of an SSH protocol, which made me think of the coroutine not doing networking itself, but that may not be true in the real code, or may not be relevant.
no subject
Date: 2026-02-20 12:55 (UTC)

Yeah, the reason for the "command" notion in my article is roughly along the lines of the sans-io pattern (which seems most popular in the Python world), or algebraic effects in functional programming, both of which allow you to decouple protocol logic from the specifics of IO. The IO is dealt with by some outer layer, unlike dependency injection or mocking where the protocol still calls down to an inner IO layer, though it can be swapped for another IO implementation.