Shutdown order consistency: how Rust helps
December 22, 2020 [Java, Programming, Programming Languages, Rust, Tech]
Some Java code with bugs
Here's my main method (in Java). Can you spot the bugs?
Db db = new Db();
Monitoring monitoring = new Monitoring();
Monitoring mon2 = new Monitoring();
Billing billing = new Billing(db, monitoring);
monitoring.setDb(db);
runMainLoop(billing, mon2);
db.stop();
billing.stop();
monitoring.stop();
If you would like to hunt down the 2 bugs manually, try reading the full code here: ShutdownOrder.java
But maybe you have an idea already? Maybe you've seen code like this before? If you have, you probably have an instinct that there's some kind of bug, even if you can't say for sure what it is. Code like this almost always has bugs!
This code compiles fine, but it contains two bugs.
First, we forgot to call setDb() on mon2, the instance we pass to runMainLoop(). This causes a NullPointerException, because Monitoring always expects to have a working Db.
Second, and in general harder to spot, we shut down our services in the wrong order. It turns out that Monitoring uses its Db during shutdown, so we get an exception. Even worse, if some other code needed to run after monitoring.stop(), it won't, because the exception prevents us getting any further.
Of course, this is toy code, but this kind of problem is common (and much harder to spot) in real-life code. In fact, my team dealt with a similar bug this week.
It's fundamentally hard to figure out your shutdown order. It's complicated further if classes have start() methods too, which I have seen in lots of Java code.
Given that this is just a hard problem, maybe there's no point looking for tools to make it easier?
Some Rust code without those bugs
Let's try writing this code in Rust. Here's the main method:
let db = Db::new();
let monitoring = Monitoring::new(&db);
let mon2 = Monitoring::new(&db);
let billing = Billing::new(&db, &monitoring);
run_main_loop(&billing, &mon2);
// drop() is called automatically on all objects here
Here's the full code: shutdown_order.rs
This code shuts down all the services automatically at the end, and any mistakes we make in the order are compile errors, not things we find later when our code is running.
The code to shut down each service looks like this:
impl Drop for Monitoring<'_> {
    fn drop(&mut self) {
        // [Disconnect from monitoring API]
        self.db.add_record("MonitorShutDown");
    }
}
This is us implementing the Drop trait for the struct Monitoring (traits are a bit like Java Interfaces). The Drop trait is special: it indicates what to do when an instance of this struct is dropped. In Rust, this is guaranteed to happen when the instance goes out of scope, which is why our comment at the end of the main method sounds so confident.
Furthermore, Rust drops local variables in the reverse of the order in which they were declared, and the compiler guarantees that nothing gets used after it has been dropped.
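To see the reverse-order shutdown in action, here is a minimal sketch. It is not the code from shutdown_order.rs: the Db below is a hypothetical stand-in that only collects records in memory, and the "BillingShutDown" record, into_records() and shutdown_records() are names invented for this illustration.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the article's Db: it only collects records in memory.
struct Db {
    log: RefCell<Vec<String>>,
}

impl Db {
    fn new() -> Self {
        Db { log: RefCell::new(Vec::new()) }
    }

    fn add_record(&self, record: &str) {
        self.log.borrow_mut().push(record.to_string());
    }

    // Invented helper: consume the Db and hand back everything it recorded.
    fn into_records(self) -> Vec<String> {
        self.log.into_inner()
    }
}

struct Monitoring<'a> {
    db: &'a Db,
}

impl<'a> Monitoring<'a> {
    fn new(db: &'a Db) -> Self {
        Monitoring { db }
    }
}

impl Drop for Monitoring<'_> {
    fn drop(&mut self) {
        self.db.add_record("MonitorShutDown");
    }
}

struct Billing<'a> {
    db: &'a Db,
    _monitoring: &'a Monitoring<'a>,
}

impl<'a> Billing<'a> {
    fn new(db: &'a Db, monitoring: &'a Monitoring<'a>) -> Self {
        Billing { db, _monitoring: monitoring }
    }
}

impl Drop for Billing<'_> {
    fn drop(&mut self) {
        // Assumed record name, chosen to mirror "MonitorShutDown".
        self.db.add_record("BillingShutDown");
    }
}

// Creates the services, lets them go out of scope, and returns what the Db saw.
fn shutdown_records() -> Vec<String> {
    let db = Db::new();
    {
        let monitoring = Monitoring::new(&db);
        let _billing = Billing::new(&db, &monitoring);
        // End of scope: _billing is dropped first, then monitoring.
    }
    db.into_records()
}

fn main() {
    for record in shutdown_records() {
        println!("{}", record);
    }
}
```

Billing was declared last, so it shuts down first, and the Db is still alive while both services write their final records.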
Rust's lovely world gives us two relevant treats: no unexpected nulls, and lifetimes.
Treat number 1: no unexpected nulls
First, in Rust, as in other modern languages such as Kotlin, we have to be explicit about items that could be missing. In our example, we were able to re-arrange the code so that db can never be missing (or null), and the compiler encouraged us to do so. If we really needed it to be missing some of the time, we could have used the Option type, and the compiler would have forced us to handle the case when it was missing, instead of unexpectedly getting a NullPointerException like we did in Java. (In fact, if we'd structured our code to use final in as many places as possible, we could have been encouraged towards basically the same solution in Java too.)
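As a rough sketch of the Option route, suppose we kept the Java-style "set the Db later" design. Everything here is assumed for illustration: the name field on Db, set_db(), status() and statuses() do not appear in the original code.

```rust
// Hypothetical Db with just enough state to show which one we are talking to.
struct Db {
    name: String,
}

struct Monitoring<'a> {
    // The Db may be missing, and the type says so.
    db: Option<&'a Db>,
}

impl<'a> Monitoring<'a> {
    fn new() -> Self {
        Monitoring { db: None }
    }

    fn set_db(&mut self, db: &'a Db) {
        self.db = Some(db);
    }

    fn status(&self) -> String {
        // The compiler will not let us reach the Db without covering None.
        match self.db {
            Some(db) => format!("recording to {}", db.name),
            None => String::from("no db configured"),
        }
    }
}

// Mirrors the Java main method: monitoring gets a Db, mon2 is forgotten.
fn statuses() -> (String, String) {
    let db = Db { name: String::from("main-db") };
    let mut monitoring = Monitoring::new();
    monitoring.set_db(&db);
    let mon2 = Monitoring::new();
    (monitoring.status(), mon2.status())
}

fn main() {
    let (first, second) = statuses();
    println!("{}", first);
    println!("{}", second);
}
```

Forgetting set_db() on mon2 is still possible here, but it becomes a case we were made to write down rather than an exception at runtime.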
Treat number 2: lifetimes
Second, if you look a bit more closely at the full code of shutdown_order.rs you'll see lots of confusing-looking annotations like <'a> and &'a:
struct Monitoring<'a> {
    db: &'a Db,
}
The approximate meaning of those annotations is: a Monitoring holds a reference to a Db, and that Db must last longer than the Monitoring.
This "lasts longer than" wording is what Rust Lifetimes are for. Lifetimes are a way of saying how long something lasts.
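A small sketch of what "lasts longer than" means in practice, again with a hypothetical in-memory Db. The report(), len() and record_count() names are assumptions made for this example, not part of the original code.

```rust
use std::cell::RefCell;

// Hypothetical in-memory Db.
struct Db {
    records: RefCell<Vec<String>>,
}

impl Db {
    fn new() -> Self {
        Db { records: RefCell::new(Vec::new()) }
    }

    fn add_record(&self, record: &str) {
        self.records.borrow_mut().push(record.to_string());
    }

    fn len(&self) -> usize {
        self.records.borrow().len()
    }
}

// A Monitoring borrows a Db for 'a, so the Db must last at least as long.
struct Monitoring<'a> {
    db: &'a Db,
}

impl Monitoring<'_> {
    fn report(&self, event: &str) {
        self.db.add_record(event);
    }
}

fn record_count() -> usize {
    let db = Db::new();
    let monitoring = Monitoring { db: &db };
    monitoring.report("started");

    // Uncommenting the next line makes the compiler reject the program:
    // `db` would be moved away while `monitoring`, used below, still borrows it.
    // drop(db);

    monitoring.report("stopped");
    db.len()
}

fn main() {
    println!("{} records", record_count());
}
```

The commented-out drop(db) is the Rust equivalent of calling db.stop() too early in the Java version, except that here the mistake never gets as far as running.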
Lifetimes are really confusing when you start with Rust, and have caused me a lot of pain. Code like this is where they are both most painful and most helpful. As I mentioned earlier, the problem of shutdown order is fundamentally hard. Rust gives you that pain at the beginning, and until you understand what's going on, the pain is very confusing and acute. But, once your code compiles, it is correct, at least as far as problems like this are concerned.
I love the sense of security it gives me to write Rust code and know the compiler has checked my code for this kind of problem, meaning it can't crop up at 3am on Christmas Day...
Final note/caveat
This Rust code is probably over-simplified, because all the references are immutable (you can't change the objects they point to). In practice, we may well have mutable references, and if we do we're going to have to deal with the further difficulty that Rust won't allow two different objects to hold references to an object if any of those references are mutable. So it would object to Billing and Monitoring using the Db object at the same time. We'd need to make the references immutable (as we have here), or find a different way of structuring the code: for example, we could hold the Db instance only within the run_main_loop code, and pass it in temporarily to the Billing and Monitoring objects when we called their methods. A large part of the art, fun and pain of learning Rust is finding new patterns for your code that do what you need to do and also keep the compiler happy. When you manage it, you get amazing benefits!
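One possible shape for the "pass it in temporarily" idea is sketched below. It assumes a Db whose add_record() needs mutable access, and the charge() and heartbeat() methods are invented for the example, so treat it as one option rather than the way to do it.

```rust
// Hypothetical Db whose add_record() needs mutable access.
struct Db {
    records: Vec<String>,
}

impl Db {
    fn add_record(&mut self, record: &str) {
        self.records.push(record.to_string());
    }
}

// The services hold no reference to the Db at all.
struct Billing;
struct Monitoring;

impl Billing {
    // Invented method: borrows the Db mutably only for the duration of the call.
    fn charge(&self, db: &mut Db, customer: &str) {
        db.add_record(&format!("Charged {}", customer));
    }
}

impl Monitoring {
    // Invented method: same idea, a short-lived mutable borrow.
    fn heartbeat(&self, db: &mut Db) {
        db.add_record("Heartbeat");
    }
}

// The Db lives only inside the main loop and is lent out one call at a time.
fn run_main_loop(billing: &Billing, monitoring: &Monitoring) -> Vec<String> {
    let mut db = Db { records: Vec::new() };
    billing.charge(&mut db, "alice");
    monitoring.heartbeat(&mut db);
    db.records
}

fn main() {
    for record in run_main_loop(&Billing, &Monitoring) {
        println!("{}", record);
    }
}
```

Each mutable borrow ends when its method returns, so the two services never hold the Db at the same time. The trade-off is that a Drop implementation on a service can no longer write a final record, because it has no Db to write to.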