Limiting the number of open sockets in a tokio-based TCP listener

I learned quite a bit today about how to think about concurrency in Rust. I was trying to use a Semaphore to limit how many open sockets my TCP listener allowed, and I had real trouble making it work. It either didn’t actually work, allowing any number of clients to connect, or the compiler told me I couldn’t do what I wanted to do, because the lifetime of my Semaphore was not 'static. Here’s the journey I took towards working code that I think is correct (feedback welcome).

Motivation

In the tokio tutorial there is a short section entitled “Backpressure and bounded channels” (partway down the Channels page). It contains this statement:

…take care to ensure total amount of concurrency is bounded. For example, when writing a TCP accept loop, ensure that the total number of open sockets is bounded.

Obviously, when I started work on a TCP accept loop, I wanted to follow this advice.

Like many things in my journey with Rust, it was harder than I expected, and eventually enlightening.

The code

Here is a short Rust program that listens on a TCP port and accepts incoming connections.

Cargo.toml:

[package]
name = "tcp-listener-example"
version = "1.0.0"
edition = "2018"
include = ["src/"]

[dependencies]
tokio = { version = ">=1.0.1", features = ["full"] }

src/main.rs:

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        tokio::spawn(async move {
            let mut buf: [u8; 1024] = [0; 1024];
            loop {
                let n = tcp_stream.read(&mut buf).await.unwrap();
                if n == 0 {
                    return;
                }
                print!("{}", String::from_utf8_lossy(&buf[0..n]));
            }
        });
    }
}

This program listens on port 8080, and every time a client connects, it spawns an asynchronous task to deal with it.

If I run it with:

cargo run

It starts, and I can connect to it from multiple other processes like this:

telnet 0.0.0.0 8080

Anything I type into the telnet terminal window gets printed out in the terminal where I ran cargo run. The program works: it listens on TCP port 8080 and prints out all the messages it receives.

So what’s the problem?

The problem is that this program can be overwhelmed: if lots of processes connect to it, it will accept all the connections, and eventually run out of sockets. This might prevent other things working right on the computer, or it might crash our program, or something else. We need some kind of sensible limit, as the tokio tutorial mentions.

So how do we limit the number of people allowed to connect at the same time?

Just use a semaphore, dummy

A semaphore does exactly what we need here – it keeps a count of how many people are doing something, and prevents that number getting too big. So all we need to do is restrict the number of clients that we allow to connect using a semaphore.

Here was my first attempt:

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    let sem = Semaphore::new(2);

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        // Don't copy this code: it doesn't work
        let aq = sem.try_acquire();
        if let Ok(_guard) = aq {
            tokio::spawn(async move {
                let mut buf: [u8; 1024] = [0; 1024];
                loop {
                    let n = tcp_stream.read(&mut buf).await.unwrap();
                    if n == 0 {
                        return;
                    }
                    print!("{}", String::from_utf8_lossy(&buf[0..n]));
                }
            });
        } else {
            println!("Rejecting client: too many open sockets");
        }
    }
}

This compiles fine, but it doesn’t do anything! Even though we called Semaphore::new with an argument of 2, intending to allow only 2 clients to connect, in fact I can still connect more times than that. It looks like our code changes had no effect at all.

What we were hoping to happen was that every time a client connected, we created _guard, which is a SemaphoreGuard, that occupies one of the slots in the semaphore. We were expecting that guard to live until the client disconnects, at which point the slot will be released.

Why doesn’t it work? It’s easy to understand when you think about what tokio::spawn does. It creates a task and asks for it to be executed in the future, but it doesn’t actually run it. So tokio::spawn returns immediately, and _guard is dropped, before the code that handles the request is executed. So, obviously, our change doesn’t actually restrict how many requests are being handled because the semaphore slot is freed up before the request is processed.

Just hold the guard for longer, dummy

So, let’s hold on to the SemaphoreGuard for longer:

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    let sem = Semaphore::new(2);

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        let aq = sem.try_acquire();
        if let Ok(guard) = aq {
            tokio::spawn(async move {
                let mut buf: [u8; 1024] = [0; 1024];
                loop {
                    let n = tcp_stream.read(&mut buf).await.unwrap();
                    if n == 0 {
                        drop(guard);
                        return;
                    }
                    print!("{}", String::from_utf8_lossy(&buf[0..n]));
                }
            });
        } else {
            println!("Rejecting client: too many open sockets");
        }
    }
}

The idea is to pass the SemaphoreGuard object into the code that actually deals with the client request. The way I’ve attempted that is by referring to guard somewhere within the async move closure. What I’ve actually done is tell it to drop guard when we are finished with the request, but actually any mention of that variable within the closure would have been enough to tell the compiler we want to move it in, and only drop it when we are done.

It all sounds reasonable, but actually this code doesn’t compile. Here’s the error I get:

error[E0597]: `sem` does not live long enough
  --> src/main.rs:12:18
   |
12 |         let aq = sem.try_acquire();
   |                  ^^^--------------
   |                  |
   |                  borrowed value does not live long enough
   |                  argument requires that `sem` is borrowed for `'static`
...
29 | }
   | - `sem` dropped here while still borrowed

What the compiler is saying is that our SemaphoreGuard is referring to sem (the Semaphore object), but that the guard might live longer than the semaphore.

Why? Surely sem is held within a scope that includes the whole of the client-handling code, so it should live long enough?

No. Actually, the async move closure that we are passing to tokio::spawn is being added to a list of tasks to run in the future, so it could live much longer. The fact that we are inside an infinite loop confused me further here, but the principle still remains: whenever we make a closure like this and pass something into it, the closure must own it, or if we are borrowing it, it must live forever (which is what a 'static lifetime means).

The code above passes ownership of guard to the closure, but guard itself is referring to (borrowing) sem. This is why the compiler says that “sem is borrowed for 'static“.

Wrong things I tried

Because I didn’t understand what I was doing, I tried various other things like making sem an Arc, making guard an Arc, creating guard inside the closure, and even trying to make sem actually have 'static storage by making it a constant. (That last one didn’t work because only very simple types like numbers and strings can be constants.)

Solution: Share the Semaphore in an Arc

After what felt like too much thrashing around, I found what I think is the right answer:

use std::sync::Arc;
use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    let sem = Arc::new(Semaphore::new(2));

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        let sem_clone = Arc::clone(&sem);
        tokio::spawn(async move {
            let aq = sem_clone.try_acquire();
            if let Ok(_guard) = aq {
                let mut buf: [u8; 1024] = [0; 1024];
                loop {
                    let n = tcp_stream.read(&mut buf).await.unwrap();
                    if n == 0 {
                        return;
                    }
                    print!("{}", String::from_utf8_lossy(&buf[0..n]));
                }
            } else {
                println!("Rejecting client: too many open sockets");
            }
        });
    }
}

This code:

  • Creates a Semaphore and stores it inside an Arc, which is a reference-counting pointer that can be shared between tasks. This means it will live as long as someone holds a reference to it.
  • Clones the Arc so we have a copy that can be safely moved into the async move closure. We can’t move sem in to the closure because it’s going to get used again the next time around the loop. We can move sem_clone in to the closure because it’s not used anywhere else. sem and sem_clone both refer to the same Semaphore object, so they agree on the count of clients that are connected, but they are different Arc instances, so one can be moved into the closure.
  • Only aquires the SemaphoreGuard once we’re inside the closure. This way we’re not doing something difficult like borrowing a reference to something that lives outside the closure. Instead, we’re borrowing a reference via sem_clone, which is owned by the closure which we are inside, so we know it will live long enough.

It actually works! After two clients are connected, listener.accept actually opens a socket to any new client, but because we return almost immediately from the closure, we only hold it open very briefly before dropping it. This seemed preferable to refusing to open it at all, which I thought would probably leave clients hanging, waiting for a connection that might never come.

Lifetimes are cool, and tricky

Once again, I have learned a lot about what my code is really doing from the Rust compiler. I find this stuff really confusing, but hopefully by writing down my understanding in this post I have helped my current and future selves, and maybe even you, be clearer about how to share a semaphore between multiple asynchronous tasks.

It’s really fun and empowering to write code that I am reasonably confident is correct, and also works. The sense that “the compiler has my back” is strong, and I like it.

Shutdown order consistency: how Rust helps

Some Java code with bugs

Here’s my main method (in Java). Can you guess the bug?

Db db = new Db();
Monitoring monitoring = new Monitoring();
Monitoring mon2 = new Monitoring();
Billing billing = new Billing(db, monitoring);
monitoring.setDb(db);

runMainLoop(billing, mon2);

db.stop();
billing.stop();
monitoring.stop();

If you would like to hunt down the 2 bugs manually, try reading the full code here: ShutdownOrder.java

But maybe you have an idea already? Maybe you’ve seen code like this before? If you have, you probably have an instinct that there’s some kind of bug, even if you can’t say for sure what it is. Code like this almost always has bugs!

This code compiles fine, but it contains two bugs.

First, we forgot to setDb() on mon2. This causes a NullPointerException, because Monitoring expects always to have a working Db.

Second, and in general harder to spot, we shut down our services in the wrong order. It turns out that Monitoring uses its Db during shutdown, so we get an exception. Even worse, if some other code needed to run after monitoring.stop(), it won’t, because the exception prevents us getting any further.

Of course, this is toy code, but this kind of problem is common (and much harder to spot) in real-life code. In fact, my team dealt with a similar bug this week.

It’s fundamentally hard to figure out your shutdown order. It’s complicated further if classes have start() methods too, which I have seen in lots of Java code.

Given that this is just a hard problem, maybe there’s no point looking for tools to make it easier?

Some Rust code without those bugs

Let’s try writing this code in Rust. Here’s the main method:

let db = Db::new();
let monitoring = Monitoring::new(&db);
let mon2 = Monitoring::new(&db);
let billing = Billing::new(&db, &monitoring);

run_main_loop(&billing, &mon2);

// drop() is called automatically on all objects here

Here’s the full code: shutdown_order.rs

This code shuts down all the services automatically at the end, and any mistakes we make in the order are compile errors, not things we find later when our code is running.

The code to shut down each service looks like this:

impl Drop for Monitoring<'_> {
    fn drop(&mut self) {
        // [Disconnect from monitoring API]
        self.db.add_record("MonitorShutDown");
    }
}

This is us implementing the Drop trait for the struct Monitoring (traits are a bit like Java Interfaces). The Drop trait is special: it indicates what to do when an instance of this struct is dropped. In Rust, this is guaranteed to happen when the instance goes out of scope, which is why our comment at the end of the main method sounds so confident.

Furthermore, Rust’s compiler shuts down everything in the reverse order in which it was created, and guarantees that nothing gets used after it has been dropped.

Rust’s lovely world gives us two relevant treats: no unexpected nulls, and lifetimes.

Treat number 1: no unexpected nulls

First, in Rust, like in other modern languages like Kotlin, we have to be explicit about items that could be missing. In our example, we were able to re-arrange the code so that db can never be missing (or null), and the compiler encouraged us to do so. If we really needed it to be missing some of the time, we could have used the Option type, and the compiler would have forced us to handle the case when it was missing, instead of unexpectedly getting a NullPointerException like we did in Java. (In fact, if we’d structured our code to use final in as many places as possible, we could have been encouraged towards basically the same solution in Java too.)

Treat number 2: lifetimes

Second, if you look a bit more closely at the full code of shutdown_order.rs you’ll see lots of confusing-looking annotations like <'a> and &'a:

struct Monitoring<'a> {
    db: &'a Db,
}

The approximate meaning of those annotations is: a Monitoring holds a reference to a Db, and that Db must last longer than the Monitoring.

This “lasts longer than” wording is what Rust Lifetimes are for. Lifetimes are a way of saying how long something lasts.

Lifetimes are really confusing when you start with Rust, and have caused me a lot of pain. Code like this is where they are both most painful and most helpful. As I mentioned earlier, the problem of shutdown order is fundamentally hard. Rust gives you that pain at the beginning, and until you understand what’s going on, the pain is very confusing and acute. But, once your code compiles, it is correct, at least as far as problems like this are concerned.

I love the sense of security it gives me to write Rust code and know the compiler has checked my code for this kind of problem, meaning it can’t crop up at 3am on Christmas Day…

Final note/caveat

This Rust code is probably over-simplified, because all the references are immutable (you can’t change the objects they point to). In practice, we may well have mutable references, and if we do we’re going have to deal with the further difficulty that Rust won’t allow two different objects to hold references to an object if any of those references are mutable. So it would object to Billing and Monitoring using the Db object at the same time. We’d need to make it immutable (as we have here), or find a different way of structuring the code: for example, we could hold the Db instance only within the run_main_loop code, and pass it in temporarily to the Billing and Monitoring objects when we called their methods. A large part of the art, fun and pain of learning Rust is finding new patterns for your code that do what you need to do and also keep the compiler happy. When you manage it, you get amazing benefits!

Edge computing providers

I’m looking into Edge computing at work. By Edge computing I mean running WASM programs in lots and lots of smallish computers in places near to actual people (rather than in huge cloud data centres). I think it’s cool because I love Rust, and Rust is the leading language to compile to WASM.

Here are some companies providing Edge computing services:

  • Fastly – good links with WASM community (hired Mozilla devs), and early adopters – custom WASM engine wasmtime.
  • Cloudflare – huge, and early adopters – WASM engine is Google V8.
  • AWS Lambda@Edge – docs are light on detail, but it looks like a real offering, probably.

Also-rans:

Who did I miss?

short – command line tool to truncate lines to fit in the terminal

Sometimes I run grep commands that search files with hugely-long lines. If those lines match, they are printed out and spam my terminal with huge amounts of information, that I probably don’t need.

I couldn’t find a tool that limits the line-length of its output, so I wrote a tiny one.

It’s called short.

You use it like this (my typical usage):

grep foo myfile.txt | short

Or specify the column width like this:

short -w 5 myfile.txt

It’s written in Rust. Feel free to add features, fix bugs and package it for your operating system/distribution at gitlab.com/andybalaam/short.

Creating a tiny Docker image of a Rust project

I am building a toy project in Rust to help me learn how to deploy things in AWS. I’m considering using Elastic Beanstalk (AWS’s platform-as-a-service) and also Kubernetes. Both of these support deploying via Docker containers, so I am learning how to package a Rust executable as a Docker image.

My program is a small web site that uses Redis as a back end database. It consists of some Rust code and a couple of static files.

Because Rust has good support for building executables with very few dependencies, we can actually build a Docker image with almost nothing in it, except my program and the static files.

Thanks to Alexander Brand’s blog post How to Package Rust Applications Into Minimal Docker Containers I was able to build a Docker image that:

  1. Is very small
  2. Does not take too long to build

The main concern for making the build faster is that we don’t download and build all the dependencies every time. To achieve that we make sure there is a layer in the Docker build process that includes all the dependencies being built, and is not re-built when we only change our source code.

Here is the Dockerfile I ended up with:

# 1: Build the exe
FROM rust:1.42 as builder
WORKDIR /usr/src

# 1a: Prepare for static linking
RUN apt-get update && \
    apt-get dist-upgrade -y && \
    apt-get install -y musl-tools && \
    rustup target add x86_64-unknown-linux-musl

# 1b: Download and compile Rust dependencies (and store as a separate Docker layer)
RUN USER=root cargo new myprogram
WORKDIR /usr/src/myprogram
COPY Cargo.toml Cargo.lock ./
RUN cargo install --target x86_64-unknown-linux-musl --path .

# 1c: Build the exe using the actual source code
COPY src ./src
RUN cargo install --target x86_64-unknown-linux-musl --path .

# 2: Copy the exe and extra files ("static") to an empty Docker image
FROM scratch
COPY --from=builder /usr/local/cargo/bin/myprogram .
COPY static .
USER 1000
CMD ["./myprogram"]

The FROM rust:1.42 as build line uses the newish Docker feature multi-stage builds – we create one Docker image (“builder”) just to build the code, and then copy the resulting executable into the final Docker image.

In order to allow us to build a stand-alone executable that does not depend on the standard libraries in the operating system, we use the “musl” target, which is designed to be statically linked.

The final Docker image produced is pretty much the same size as the release build of myprogram, and the build is fast, so long as I don’t change the dependencies in Cargo.toml.

A couple more tips to make the build faster:

1. Use a .dockerignore file. Here is mine:

/target/
/.git/

2. Use Docker BuildKit, by running the build like this:

DOCKER_BUILDKIT=1 docker build  .