The Iron Code


Async/await - parallel execution with no threads attached

written on 2022-10-01 15:37

Ye ye we get it, memory safety is great

So we know that Rust is great for parallel execution of code for several reasons. The most obvious is that, by enforcing strict memory safety, it rules out data races in safe code at compile time. Other usual suspects such as deadlocks and more general race conditions are still possible, but the ownership rules make them a lot less likely.

Why though?

But with async/await, which entered the Rust ecosystem in version 1.39 (in November '19), we now have even more tricks up our sleeves. It's a system which implements cooperative scheduling for parallel execution using coroutines. The reasoning for this tooling mainly stems from the difficulty of managing the lifetimes of threads. Threads are the basic parallelization units in classic parallel programming. Libraries like Intel's TBB for C++ or rayon for Rust usually take care of the total number of threads: it should never exceed a certain number, or your computation may become a lot slower. They introduce a global thread pool and hand out threads to work packages. After processing, the thread returns to the pool and can be reused. Imagine an algorithm running on 8 threads, each of which queries a value which needs to be recalculated and whose calculation is again spread over 8 threads. Done naively, this would create 64 threads. With increasing complexity this problem may get out of hand, but it's solvable using traditional multithreading techniques.
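To make the thread pool idea a bit more concrete, here is a minimal sketch using only the standard library. It is not how TBB or rayon are implemented (both use much smarter work-stealing schedulers); `ThreadPool` and `run_packages` are names I made up for illustration. The point is only that 64 work packages can be processed by at most 8 threads.

```rust
use std::collections::HashSet;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical minimal pool: a fixed set of workers pulling jobs from one shared queue.
struct ThreadPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl ThreadPool {
    fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock is held only while taking one job off the queue.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // queue closed: the pool is shutting down
                    }
                })
            })
            .collect();
        ThreadPool { sender: Some(sender), workers }
    }

    /// Queue a work package; some idle worker will pick it up.
    fn execute(&self, job: impl FnOnce() + Send + 'static) {
        self.sender.as_ref().unwrap().send(Box::new(job)).unwrap();
    }

    /// Close the queue and wait until all workers have finished.
    fn join(mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().unwrap();
        }
    }
}

/// Runs `jobs` work packages on `threads` workers and returns
/// (number of packages run, number of distinct threads that ran them).
fn run_packages(threads: usize, jobs: usize) -> (usize, usize) {
    let pool = ThreadPool::new(threads);
    let seen = Arc::new(Mutex::new(Vec::<thread::ThreadId>::new()));
    for _ in 0..jobs {
        let seen = Arc::clone(&seen);
        pool.execute(move || {
            seen.lock().unwrap().push(thread::current().id());
        });
    }
    pool.join();
    let ids = seen.lock().unwrap();
    let distinct: HashSet<_> = ids.iter().collect();
    (ids.len(), distinct.len())
}
```

Calling `run_packages(8, 64)` runs all 64 packages, but never on more than 8 distinct threads. How many of the 8 workers actually get a package depends on the scheduling.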

It becomes more troublesome when you try to manage several long-lived worker threads. The flow of control and the lifetimes become increasingly hard to understand, and keeping everything synchronized without wasting resources can become a nightmare. You usually set aside the main thread to coordinate the workers; advanced solutions try to involve the main thread in the calculations and let it process UI events at fixed intervals. Wasting a complete thread is not ideal, but making the main thread partake in computations is difficult in complex projects, as you have to guarantee enough callbacks for the UI in all cases.

Finally, a more intricate problem occurs when you can't know when an event will happen that you want to respond to in its own thread. The classic example, also used by Jon Gjengset, is input from a terminal vs. input from the network. We would have to run a loop, query whether something has happened and respond accordingly. The problem with the code below is that it keeps the current CPU core busy with unbounded polling. This wastes resources and makes handling the created threads complicated.

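// Sketch only: EventSource, Event, spawn_network and spawn_terminal are placeholders.
fn responder(mut event_source: impl EventSource) {
  loop {
    // poll() returns immediately, so this loop spins even when nothing happens
    if let Some(event) = event_source.poll() {
      match event {
        Event::Network(e) => spawn_network(e),
        Event::Terminal(e) => spawn_terminal(e),
      }
    }
  }
}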
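For comparison, here is a sketch of the traditional way around the busy loop above, using only the standard library: every event source gets its own thread and sends into a channel, and the responder blocks on `recv()` instead of spinning. The `Event` enum and the `run_demo` helper are assumptions of mine for illustration, and the handlers just record a string instead of spawning anything.

```rust
use std::sync::mpsc;
use std::thread;

enum Event {
    Network(String),
    Terminal(String),
}

/// Handles events until every sender is gone and the channel is closed.
fn responder(events: mpsc::Receiver<Event>) -> Vec<String> {
    let mut handled = Vec::new();
    // recv() puts the thread to sleep while the queue is empty, so no CPU is burned.
    while let Ok(event) = events.recv() {
        match event {
            Event::Network(e) => handled.push(format!("network: {}", e)),
            Event::Terminal(e) => handled.push(format!("terminal: {}", e)),
        }
    }
    handled
}

/// One producer thread per event source, one blocking responder.
fn run_demo() -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let net_tx = tx.clone();
    let net = thread::spawn(move || net_tx.send(Event::Network("packet".into())).unwrap());
    let term = thread::spawn(move || tx.send(Event::Terminal("keypress".into())).unwrap());
    net.join().unwrap();
    term.join().unwrap();
    let mut handled = responder(rx);
    handled.sort(); // arrival order depends on thread scheduling
    handled
}
```

This no longer wastes CPU time, but it costs one mostly sleeping thread per event source, and those threads are exactly the long-lived workers whose lifetimes are hard to manage.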

Can we improve?

So we identified two main problems:

  1.) The flow of control and the lifetimes of long-lived threads usually become hard to manage.
  2.) The main thread is blocked doing nothing useful (and so can't participate in meaningful calculations).

For the second, the solution is cooperative scheduling: let the main thread contribute, as long as there are enough events left to handle the GUI and the other tasks. The devil is in the details with this one. It's very easy to starve some tasks of events, or to switch back to the GUI too often or too rarely.

Rust's solution was to introduce async/await. It is more descriptive (or declarative, as one might say):

Try to formulate "what should be done in parallel" instead of specifying "what should be run on which thread"

async means that the function does not immediately return a value but a future, an object which at some point in time will yield the result of the function. Futures are lazy: until a future is awaited (or otherwise polled by a runtime), the function body is not even executed. The await guarantees that after this line the future has been resolved to its embedded value.

Sounds like blocking execution, just in another thread? Well, no: when we await a future that is not ready yet, we yield up the call stack to the runtime, which can then run the next top-level future of the program. Our thread can maybe do some meaningful work there. As soon as the result of our awaited future is ready, the scheduler (or runtime, as it is called in the Rust context) is notified via a waker object. We then get control back as soon as possible and can continue execution. The waker objects do the magic of telling the runtime that this future can make progress.

As an example I like to use the loading of a large file. If we just want all contents to be available in memory, we can ask the operating system for it and tell it to notify us when the loading is finished. The waker is lurking in the shadows, waiting for the ready signal from the operating system, however long it may take.

use tokio::io::AsyncReadExt; // brings read_to_string into scope

async fn load_file() -> String {
  let mut file = tokio::fs::File::open("/my/large_file.txt").await.unwrap();

  // This may take long, but fear not!
  // await yields out of this and gives our thread to another future that can do stuff in the meantime
  // as soon as the file is ready, we will be scheduled again and can continue! :)
  let mut string = String::new();
  file.read_to_string(&mut string).await.unwrap();
  string
}

Pretty neat stuff!
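To make the roles of future, await and waker a bit more tangible, here is a toy runtime built only from the standard library. It is a sketch under my own assumptions and not how tokio works internally: `block_on`, `ThreadWaker`, `SlowLoad` and `load_demo` are made-up names, and a helper thread that sleeps for 50 ms plays the part of the operating system finishing the file load.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

/// Waker that wakes up the thread running `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy runtime for a single future: poll it, then sleep until the waker fires.
/// Returns the result and how often the future was polled.
fn block_on<F: Future>(future: F) -> (F::Output, usize) {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return (value, polls),
            Poll::Pending => thread::park(), // a real runtime would run other tasks here
        }
    }
}

/// State shared between the future and the pretend operating system.
struct Shared {
    result: Option<String>,
    waker: Option<Waker>,
}

/// Stand-in for a slow file load.
struct SlowLoad {
    shared: Arc<Mutex<Shared>>,
    started: bool,
}

impl SlowLoad {
    fn new() -> Self {
        SlowLoad {
            shared: Arc::new(Mutex::new(Shared { result: None, waker: None })),
            started: false,
        }
    }
}

impl Future for SlowLoad {
    type Output = String;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<String> {
        if !self.started {
            // Nothing happens before the first poll: futures are lazy.
            self.started = true;
            let shared = Arc::clone(&self.shared);
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(50));
                let mut shared = shared.lock().unwrap();
                shared.result = Some("file contents".to_string());
                if let Some(waker) = shared.waker.take() {
                    waker.wake(); // the ready signal
                }
            });
        }
        let mut shared = self.shared.lock().unwrap();
        let result = shared.result.take();
        match result {
            Some(text) => Poll::Ready(text),
            None => {
                // Not ready yet: leave our waker behind and give up the thread.
                shared.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Counts how often the body of `load_demo` has actually started running.
static RUNS: AtomicUsize = AtomicUsize::new(0);

async fn load_demo() -> String {
    RUNS.fetch_add(1, Ordering::SeqCst);
    SlowLoad::new().await
}
```

Calling `load_demo()` on its own only creates the future and leaves `RUNS` at 0. The body runs once the future is handed to `block_on(...)`, which parks the thread while the load is pending and polls again after the waker has fired.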

Downsides

Obviously there's no such thing as a free lunch! So what's the price of these cool tools? There are a few:

  • More dependencies mean longer build times and a larger binary (tokio and its features and deps need to be built)
  • Some say: more complex flow of control in general
  • Difficult debugging, stepping over await is always a pleasure
  • Difficult to get some performance metrics (tracing provides some useful tooling though)

To be completely honest, neither I nor most other devs or architects ever care about build times or binary sizes in times of 4k splash-screen PNGs and incremental builds. For the second point, I really can't follow the argument. Sure, the console output may come in a different order than people are used to and is a bit harder to read, but that's because we actually save time. The third point is a real thing, but let me begin by saying that the debugging experience in Rust is not optimal anyway, so I am not terribly frightened by adding more breakpoints and using continue instead of stepping over. I can totally see that it's a nuisance, though.

For the next part, we will use async/await together with egui/eframe, an immediate mode GUI toolkit, to create a simple client for this blog. Since immediate mode GUIs require near-non-blocking execution in their update function, tokio+imgui/egui sounds like a match made in heaven!

So long!

Tagged:
General
Programming