Llogiq on stuff

Comparing Rust and Java

This post compares Rust-1.8.0 nightly to OpenJDK-1.8.0_60

It may not be obvious from my other blog entries, but I work as a Java developer. I also freely confess to enjoying it, which probably makes me part of some minority.

However, if you’ve been reading some of my other posts, you’ll be hard-pressed to overlook the fact that I really enjoy programming in Rust, too. So, given that I am both Rust- and Java-savvy, why not compare both and see where we end up?

History

Java began its life as Oak. Early versions which already carried the Java moniker floated around in 1995; the 1.0 release is dated January 1996. This makes Java more than twenty years old now! The early goals were (mostly) portability, simplicity and robustness. It was conceived as a low-level, garbage-collected, bytecode-interpreted, object-oriented language. Early implementations were slow as molasses and ate a lot of memory. The former problem can be considered mostly solved by now.

Those were the early days of the internet, and Java was billed as “the programming language for the Web”, applets and all. This didn’t quite work out as planned, though the name of the language that runs on web clients today used to start with ‘Java’. Even so, a lot of the infrastructure of the Web we have today runs on Java server-side.

Rust 1.0 was a good bit longer in development (and most of the history can be seen on GitHub, which goes back to 2010) and was released in May 2015. Graydon Hoare who started the project has gone on record saying he actually started working on Rust in private around 2008. Since 2010, Mozilla has taken it upon themselves to sponsor the development, along with setting up the Servo project, which aims to create a modern browser engine in Rust.

The Rust developers iterated a lot on the design; lots of things were thrown in and fell out again. Earlier versions of Rust had green threads, and Garbage Collection was removed from the language during the 0.8 cycle if I recall correctly. While the fast iteration was off-putting to early users, together with the open discussion culture it enabled a very thorough and considerate design.

The Runtime

The first obvious difference between Rust and Java is that the latter runs on the JVM, so it’s just-in-time compiled. This means that Java can benefit from profile-based optimizations that in theory allow better performance than compile-time optimized code for some workloads. For my job, I’ve rewritten the hottest code from my in-house tool in C and found that the performance difference was too small to be measurable. So while some benchmarks show Rust comfortably in the lead (in fact I wrote the only Rust benchmark on that page that is slower than the Java version, and have long since suggested a much faster version, which alas has not been included on the site at the time of this writing), I would not bet on measurable differences for any workload prior to measurement.

One thing that I will bet on is that for most workloads that matter, a typical Rust program will consume orders of magnitude less memory than a typical Java program.

Still, having similar performance in many workloads despite Java having a fat runtime shows that the JVM team has some serious engineering chops.

As it stands now, Java doesn’t monomorphize (at least not at compile time, though the JIT may synthesize specialized versions of hot code), while Rust does. This leads to smaller binaries for Java, although that advantage is usually eaten up by runtime overhead as well as the kitchen-sink approach of popular libraries.

Java’s GC is very optimized, and should be considered world-class. While it doesn’t solve all problems Rust’s ownership system is designed to solve, it makes programming fairly painless. Tuning it is a complex exercise, however. Also in hot code, High Performance Java developers usually try hard not to allocate to keep the runtime impact of the GC manageable, sometimes resulting in byzantine code.

Update: Redditor pjmlp notes that there are ahead-of-time compilers for Java, notably Android’s ART and some commercial offerings. As far as I know, the latter aren’t widely used.

Rust on the other hand has a zero-sized runtime, for some large values of zero. Actually there is a runtime, but it consists of setting up landing pads for panics, and even that can be overridden for e.g. embedded or OS development.

As a fully compiled language, Rust isn’t as portable as Java (in theory), but as it is LLVM based, backends for many targets can be obtained with reasonable effort. More, the absence of a fat runtime and garbage collector makes Rust suitable for targets which are deemed too small for the JVM.

Lifetimes and Ownership

On the other hand, Rust can get by without a GC because of its lifetime and ownership rules, which are upheld by the borrow checker, sometimes affectionately called borrowck.

This is something Java doesn’t have and it gives Rustaceans a whole new set of both benefits and headaches. The former because the compiler will ensure freedom from data-races, ConcurrentModificationExceptions and other things that can plague Java codebases. The latter because at some point everyone learning Rust will bash their head against the borrow checker and ask the Rust gods why their finely handcrafted lifetime annotations fail to pass its muster.

The rules are actually quite simple: You can have a plain value (which you can do anything to, except dropping it while borrowed), exactly one mutable xor as many immutable borrows as you like. Borrows end in the reverse order of taking them, which means sometimes switching statements can appease the borrow checker.
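To make the rules concrete, here is a minimal sketch. The helper names `total` and `push_twice` are made up for illustration, and the commented-out line marks a spot where the borrow checker would object.

```rust
/// Takes a shared (immutable) borrow; any number of these may coexist.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Takes a mutable borrow, which must be the only borrow while it is in use.
fn push_twice(values: &mut Vec<i32>, x: i32) {
    values.push(x);
    values.push(x);
}

fn main() {
    let mut v = vec![1, 2, 3];

    {
        let a = &v;
        let b = &v; // a second shared borrow is fine
        // v.push(4); // rejected here: `a` and `b` are still used on the next line
        println!("{} {}", total(a), total(b));
    } // both shared borrows have ended at this point

    push_twice(&mut v, 4); // now a single mutable borrow is allowed
    println!("{:?}", v);
}
```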

It all gets a bit more complex when you take references as part of types, because then the types usually have to be generic over some lifetimes. I won’t delve into that here; others have written about it at length and eloquently. Just remember that when you see 'a: 'b, it means that whatever has the lifetime 'a must live at least as long as anything that has the lifetime 'b.
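A small sketch of both ideas, with invented names (`Word`, `first_word`, `shorten`): a type that holds a reference and is therefore generic over a lifetime, and a function with an explicit 'a: 'b bound.

```rust
/// `'a: 'b` reads as "'a outlives 'b", so a `&'a str` may be handed out as a `&'b str`.
fn shorten<'a: 'b, 'b>(long: &'a str) -> &'b str {
    long
}

/// A struct that stores a reference has to be generic over that reference's lifetime.
struct Word<'a> {
    text: &'a str,
}

/// The returned `Word` borrows from `s` and cannot outlive it.
fn first_word<'a>(s: &'a str) -> Word<'a> {
    Word {
        text: s.split_whitespace().next().unwrap_or(""),
    }
}

fn main() {
    let sentence = String::from("borrow checker says no");
    let word = first_word(&sentence);
    println!("{}", shorten(word.text));
}
```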

Unfortunately, while the concepts have been around for at least two decades, it is new to have them formalized in such a stringent fashion. So while the error messages are somewhat helpful (and for the record, their helpfulness wanes with the introduction of more generics), it takes some time to understand them.

Even with the steeper learning curve and the marginally longer compile times, borrow checking is a great win, because while it may seem overly restrictive and ceremony-laden at times, it catches real bugs that you could conceivably miss in Java even with a good test suite.

I’d like to add that despite those seemingly large differences, the memory models of Java and Rust are surprisingly similar.

Types

Java’s primitive types are a subset of Rust’s (there are no unsigned integer types, and Rust gives us more leeway to include e.g. SIMD types). Having all objects stored as references leads to pointer chasing galore, though the well-implemented escape analysis will reduce some of the pain in practice. This also explains why Java utilizes the heap much more than Rust, because Objects usually live on the heap (barring off-heap stuff that has become en vogue in certain circles).

Java’s integral operations are wrapping (and there is no overflow check), whereas Rust’s are checked in debug mode and wrapping on overflow in release mode. This allows the benefits of checking during testing and the speed of wrapping operations on release builds.
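When the build-mode-dependent default is not what you want, the integer types also offer explicit methods that behave the same in every build. A short sketch follows; `checked_sum` is a hypothetical helper, the methods it calls are from the standard library.

```rust
/// Adds up all values, returning `None` as soon as the sum would overflow a `u8`.
fn checked_sum(values: &[u8]) -> Option<u8> {
    values.iter().try_fold(0u8, |acc, &v| acc.checked_add(v))
}

fn main() {
    let x: u8 = 250;
    println!("{}", x.wrapping_add(10)); // wraps around in debug and release builds alike
    println!("{:?}", x.checked_add(10)); // reports the overflow as `None`
    println!("{}", x.saturating_add(10)); // clamps at the maximum value instead
    println!("{:?}", checked_sum(&[100, 100, 100]));
}
```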

Rust has tuple types built in, which makes it easy to return multiple values without overhead. In Java, returning a Pair<A, B> is always a bit icky (and comes with reference-chasing overhead). Value types in Java are slated to bring it more in line with the lower level languages, but I hear they’ve been pushed to Java 10, if not later.
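As an illustration, a function returning two values at once might look like the following sketch (`min_max` is an invented name). The tuple is a plain value, so no wrapper object needs to be allocated.

```rust
/// Returns the smallest and largest element, or `None` for an empty slice.
fn min_max(values: &[i32]) -> Option<(i32, i32)> {
    let first = *values.first()?;
    Some(
        values
            .iter()
            .fold((first, first), |(lo, hi), &v| (lo.min(v), hi.max(v))),
    )
}

fn main() {
    // The tuple can be destructured right at the call site.
    if let Some((lo, hi)) = min_max(&[3, -1, 7]) {
        println!("min = {}, max = {}", lo, hi);
    }
}
```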

Rust’s array types carry their size in the type. However, creating arrays of some size determined at runtime isn’t possible without some evil hacks. Rustaceans tend to use Vecs for this, which are comparable to Java’s ArrayList (if only those would work for primitive types).

Both Rust and Java keep their generics to compile time. Rust’s type system is a lot mightier than Java’s; in fact it’s Turing capable (one can build arbitrary type structures out of zero-sized-types). Rust also uses its generics to communicate lifetime information.

Java has pervasive nulls – every non-primitive Object can be null simply due to the fact that they are referenced. Again, value types will reduce the scope of possible nullness, but we’ll be waiting for them for some time.

Rust’s enums are sum types, whereas Java’s enums are simple values akin to integers. On the other hand, Java’s enum values are really singleton classes under the hood – so one can define an abstract method in the enum to implement differently in each value, something that would require a match expression (or hash-table, whose construction at compile time some enterprising Rustaceans have built a crate for) in Rust.
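A sketch of what that looks like in practice, using a made-up `Shape` type: each variant carries its own data, and the per-variant behavior that Java would express with an abstract method lives in a single match.

```rust
/// A sum type: a value is either a circle or a rectangle, each with its own fields.
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

impl Shape {
    /// One method, with the per-variant behavior spelled out in a `match`.
    fn area(&self) -> f64 {
        match *self {
            Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
            Shape::Rect { width, height } => width * height,
        }
    }
}

fn main() {
    let shapes = [
        Shape::Circle { radius: 1.0 },
        Shape::Rect { width: 2.0, height: 3.0 },
    ];
    for shape in shapes.iter() {
        println!("{}", shape.area());
    }
}
```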

Java’s primitive types have automatic widening coercion, so that you can put an int into a method that takes long. Rust only has “deref coercion” which means that on dereferencing (which the dot operator will do implicitly, or you can do explicitly with the * prefix operator) the trait implementations are queried for a Deref implementation with an appropriate Target type.
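For illustration, a minimal sketch of both sides: deref coercion turning a &String (or even a &Box<String>) into a &str, and a widening conversion that has to be written out. `shout` and `widen` are hypothetical helpers.

```rust
/// Accepts a string slice; callers holding other string-like references rely on deref coercion.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Widening from `i32` to `i64` is lossless, but it must still be requested explicitly.
fn widen(n: i32) -> i64 {
    i64::from(n)
}

fn main() {
    let owned = String::from("hi");
    let boxed = Box::new(String::from("there"));

    println!("{}", shout(&owned)); // &String coerces to &str
    println!("{}", shout(&boxed)); // &Box<String> coerces to &String, then to &str
    println!("{}", widen(42)); // no implicit int-to-long style widening
}
```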

However, the Rust community has declared excessive use of this an antipattern, and Rust code usually has explicit type conversions. Consequently, it is usually easy to determine the type of any expression without looking too far, despite method-wide type inference, which also has the effect that you sometimes see absurdly long method signatures for a one-liner method. I think the locality of semantics is worth it, though. You usually need not look far to understand a Rust method, whereas Java code can sometimes look quite opaque.

All in all, the more thorough type system, borrow- and other checks, along with immutability by default and the absence of a good number of footguns mean Rust code is usually more robust than Java code written in roughly the same time. In other words: It may be harder to write Rust code than Java code, but it’s a lot harder to write incorrect Rust code than incorrect Java code.

This has the effect that when rustc compiles code, it usually runs on first try. Besides Rust, only OCaml or Haskell give me such confidence in my code, and both are markedly higher level.

Java has Class, Rust has Trait

Java has class. I’d go as far as saying it has class. classes, even. It also has interfaces, which with the latest version have gained the ability to define default methods.

I probably don’t need to reiterate that Java’s classes bind data and behavior together (encapsulation), control visibility, inherit from other classes, implement interfaces and so on. Everything in Java (apart from primitive types and possibly null) is an Object (and thus belongs to a Class). Even if it’s just a bag of static methods.

Also classes correspond to types. This multitude of responsibilities classes have makes them the center piece of Java’s structure.

Rust on the other hand has traits, which are eerily similar to Java 8’s interfaces. Then it has types (usually structs and/or enums) and implementations of traits for types. It also has inherent implementations (of types for themselves). Finally the visibility is usually determined by the module (which is akin to a Java package, though the latter is only a collection of classes and possibly sub-packages).

This separation of data and behavior may seem strange at first, but it’s actually pretty clever, because it makes composition of data types very natural, and it’s possible to create new traits that add behavior to existing types, which is impossible in Java.
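Here is a small sketch of adding behavior to a type from the standard library. The `WordCount` trait is invented for this example; the point is only that str gains a new method without being modified or subclassed.

```rust
/// A new trait, defined in our own code.
trait WordCount {
    fn word_count(&self) -> usize;
}

/// Implemented for an existing standard-library type.
impl WordCount for str {
    fn word_count(&self) -> usize {
        self.split_whitespace().count()
    }
}

fn main() {
    // The method is available wherever the trait is in scope.
    println!("{}", "traits add behavior to existing types".word_count());
}
```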

Rust also has free-standing functions, which just live in their module. This means less ceremony when writing procedural code. No more public class HelloWorld.

Rust’s separation of data and behavior promotes a sort of data-oriented programming where you first set up your data structures and then structure the behavior around it.

Patterns and SOLID

Both Rust and Java lack named arguments like in Python, so both use the Builder pattern instead. Apart from that, Java seems more pattern-happy than Rust, presumably because the latter is still young and hasn’t grown so many patterns. Also many of the amusing patterns of yore (AbstractCompositeStrategyBeanFactoryFactory anyone?) are no longer relevant to Java thanks to the recent functional influences in Java 8.

Many Java programmers embrace the SOLID principles, which still require some patterns to be adhered to in Java. To reiterate: Single-Responsibility, Open(to extension)-Closed(to modification), Liskov Substitution, Interface segregation and Dependency inversion.

At times, this can make applications and libraries seem over-engineered, where the call graph zig-zags back and forth through multiple levels of objects that each mediate some part of the functionality. Especially in enterprise-y code, there is a pull in that direction. Luckily the pendulum currently swings the other way, favoring simplicity over code reuse.

Another current trend in the Java world is to forgo implementation inheritance in favor of composition of objects and delegating the relevant methods. Though it produces more boilerplate code, it allows for finer control over the API and Projects like lombok greatly reduce the pain by auto-generating the boilerplate following an annotation (for example @Data creates a constructor, getters and setters for all attributes, equals(_), .hashCode() and .toString(), all in all quite a bit of code).

In Rust, trait coherence (and especially the orphan rule) basically mandates the open-closed principle by design. Interface segregation and Dependency inversion can be done with traits but often requires either generics, which make the code more complex or trait objects, which have a runtime overhead that Rustaceans aren’t usually willing to pay. Single responsibility and Liskov substitution rely on the vigilance of the programmer in both languages.
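The trade-off between generics and trait objects can be sketched as follows. All names are made up, and the `dyn` keyword assumes a newer compiler than the one this post was written against.

```rust
trait Greeter {
    fn greet(&self) -> String;
}

struct English;
struct German;

impl Greeter for English {
    fn greet(&self) -> String {
        String::from("Hello")
    }
}

impl Greeter for German {
    fn greet(&self) -> String {
        String::from("Hallo")
    }
}

/// Generic version: a separate copy is compiled per concrete type (static dispatch).
fn greet_static<G: Greeter>(greeter: &G) -> String {
    greeter.greet()
}

/// Trait-object version: one compiled function, with calls going through a vtable.
fn greet_dynamic(greeters: &[Box<dyn Greeter>]) -> Vec<String> {
    greeters.iter().map(|g| g.greet()).collect()
}

fn main() {
    println!("{}", greet_static(&English));
    let mixed: Vec<Box<dyn Greeter>> = vec![Box::new(English), Box::new(German)];
    println!("{:?}", greet_dynamic(&mixed));
}
```

The generic version keeps the call cheap but makes the signature noisier; the trait-object version allows a mixed collection at the cost of an indirect call.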

Flow control

The following table shows how different flow control constructs relate between the languages:

Rust                                | Java
break / break 'label                | break / break label
continue / continue 'label          | continue / continue label
for i in 0..n { _ } ¹               | for (int i = 0; i < n; i++) { _ }
for i in _ { _ }                    | for (X i : _) { _ }
if _ { _ } else { _ }               | if (_) { _ } else { _ }
if let _ = _ { _ } else { _ }       | if (_ = _) { _ } else { _ }
loop { _ }                          | while (true) { _ }
loop { .. ; if _ { break; }}        | do { .. } while (_);
match _ { .. }                      | switch (_) { .. } ²
return _                            | return _
while _ { _ }                       | while (_) { _ }
while let _ = _ { _ }               | while (_ = _) { _ }

¹ This obviously only works for simple ranges. Otherwise we’d probably be writing a while loop.

² Java’s switch statement is both less powerful than the fully destructuring pattern matching of Rust’s match and has surprising fallthrough between case statements (I have my IDE warn on this to mitigate the surprise).

Note that this list is incomplete, and some things don’t completely match for all cases. Most of Rust’s constructs are in fact syntactic sugar for various combinations of loop and match statements. In Java, the for-each loops compile to iterator-based for-loops in the style of for (Iterator<_> i = _; i.hasNext();) { _ v = i.next(); _ }.
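To illustrate the sugar, here is a for loop next to a hand-written loop/match version that behaves the same. This is only an approximation of what the compiler does, and both function names are invented.

```rust
/// The sugared form.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

/// Roughly what the `for` loop above stands for: an iterator driven by `loop` and `match`.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match iter.next() {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("{} {}", sum_with_for(&data), sum_desugared(&data));
}
```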

So in short, Rust has done away with the C-like for loops. Since assignments don’t return a value (a = b = c is not valid in Rust), the if let and while let forms take care of this use case. This makes those cases more obvious and less prone to if (x = y) errors (though in fairness, most Java IDEs catch those, too).
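A brief sketch of the two forms, with hypothetical helper names: the pattern on the left of the = has to match for the body to run, which replaces the assign-and-test idiom.

```rust
/// Pops values until the stack is empty, adding them up along the way.
fn drain_sum(stack: &mut Vec<i32>) -> i32 {
    let mut sum = 0;
    while let Some(top) = stack.pop() {
        sum += top;
    }
    sum
}

/// Runs the first branch only if parsing succeeded, binding the number to `n`.
fn describe(input: &str) -> String {
    if let Ok(n) = input.parse::<i32>() {
        format!("number {}", n)
    } else {
        String::from("not a number")
    }
}

fn main() {
    let mut stack = vec![1, 2, 3];
    println!("{}", drain_sum(&mut stack));
    println!("{}", describe("42"));
    println!("{}", describe("forty-two"));
}
```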

Rust’s syntactic sugar along with the ridiculously powerful destructuring match makes it feel a bit higher-level than Java in some places, but the compiler still manages to produce very tight code based on it, mostly thanks to LLVM, which the Rust compiler uses to produce code.

Error handling

Java has Exceptions, which come in two flavors: Checked and Unchecked. The former are meant for probable failure modes that can be either directly handled by the caller or bubbled up (by declaring your method throws them) the call chain. The latter are meant for programmer error, which is usually seen as unrecoverable.

One instance of the latter is the often dreaded NullPointerException or the only slightly less terrible IndexOutOfBoundsException (including specializations): Every operation that dereferences an Object can throw NullPointerException. It’s so bad that “NPE” is a known abbreviation to many working in Java. Mostly, it’ll be clear which operation caused it, but sometimes long lines may obscure the offending operation.

Every Exception can be caught, which can lead newbies to catch Throwable {} just to get rid of the annoying stack traces. Which of course is the wrong thing to do; as I said before, runtime exceptions should usually not be caught.

Some people think that checked exceptions are bad and that all exceptions should be unchecked. I personally disagree, but I won’t waste time arguing.

Rust as it is now has thread-bounded “panics” which can be considered RuntimeExceptions that will kill the thread and should only be “caught” from another thread. In recent Rust versions, there’s std::panic::recover(_), which can call a closure and return a Result, converting any panic into an Err.

However, the function is still unstable and can only be used in nightly Rust.

Update: Actually the function has been stabilized as std::panic::catch_unwind(_).
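A minimal sketch of the stabilized function in use. `safe_div` is a made-up helper; it assumes the default unwinding panic strategy (with panic=abort nothing can be caught), and the panic message is still printed to stderr.

```rust
use std::panic;

/// Divides `a` by `b`, turning the panic from an invalid division into `None`.
fn safe_div(a: i32, b: i32) -> Option<i32> {
    // `catch_unwind` returns `Err` if the closure panicked; `.ok()` discards the payload.
    panic::catch_unwind(move || a / b).ok()
}

fn main() {
    println!("{:?}", safe_div(6, 3));
    println!("{:?}", safe_div(1, 0));
}
```

This is meant for boundaries such as FFI or thread pools rather than as a general try/catch replacement; ordinary failures are better expressed as Result values.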

There is also some syntactic sugar to work with Result types, which closely resembles the monadic error handling known from Haskell and other functional languages.

The upside is that handling errors becomes a much more specific thing – one can see at a glance which expressions are possibly error bearing (formerly those would be wrapped in a try!(_) macro invocation; recently an RFC with syntactic sugar was accepted, so those will look like _? in the near future).

It’s also possible to unwrap a Result to convert any errors into panics. This is often used during prototyping, but is frowned upon in production code (there is even a third party lint against it).

The downside is that “bubbling up” errors is no longer as easy as slapping a throws SomeException on the function declaration. The return type of the function must be changed to some Result<T, E> type (where E is the error if the function fails and T is the result otherwise). All error-bearing functions must be invoked with a try!(_) macro (which expands to a match over the Result plus an early return on error), and some error types may be incompatible, leading to either boxed errors (which are basically trait objects of the std::error::Error trait) or wrappers upon wrappers that must be destructured to get at the cause.
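A rough sketch of bubbling up two unrelated error types through a boxed error. It is written with the _? sugar and the `dyn` keyword, so it assumes a newer compiler than the one this post was written against; with try!(_) the shape is the same. `NegativeInput` and `parse_positive` are invented for illustration.

```rust
use std::error::Error;
use std::fmt;

/// A custom error type for inputs that parse fine but are not acceptable.
#[derive(Debug)]
struct NegativeInput(i64);

impl fmt::Display for NegativeInput {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} is negative", self.0)
    }
}

impl Error for NegativeInput {}

/// Both the parse error and our own error fit into the boxed trait object.
fn parse_positive(s: &str) -> Result<i64, Box<dyn Error>> {
    let n: i64 = s.trim().parse()?; // early return on a parse error, boxed automatically
    if n < 0 {
        return Err(Box::new(NegativeInput(n)));
    }
    Ok(n)
}

fn main() {
    for input in ["42", "-1", "x"].iter() {
        match parse_positive(input) {
            Ok(n) => println!("ok: {}", n),
            Err(e) => println!("error: {}", e),
        }
    }
}
```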

In practice it appears to work out quite well.

Functions && Closures

Java has lambdas! Finally! They don’t look as powerful as Rust’s closures, which can modify the captured environment in accordance with Rust’s ownership rules. Still, they work reasonably well for most cases. The same goes for function handles. Interfaces with one method are automatically implemented by all functions whose types match that method, which is nice (apart from some wrinkles).

Rust’s functions implicitly implement some Fn*() -> _ types and so can be used in various settings without even requiring heap allocation. The caller has to work with the given type bounds, which usually requires some generics gymnastics. Still, one can call the strategy more principled than Java’s.
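A small sketch of those bounds at work, with made-up names: plain functions and closures both satisfy Fn, and a closure that mutates its environment needs FnMut. No boxing is involved in any of the calls.

```rust
/// Accepts anything callable as `Fn(i32) -> i32`, be it a function or a closure.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

/// `FnMut` allows the closure to modify what it captured.
fn call_n<F: FnMut()>(n: usize, mut f: F) {
    for _ in 0..n {
        f();
    }
}

fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    let offset = 10;
    println!("{}", apply_twice(double, 3)); // a free-standing function
    println!("{}", apply_twice(|x| x + offset, 1)); // a closure capturing `offset`

    let mut calls = 0;
    call_n(3, || calls += 1); // mutates the captured counter
    println!("{}", calls);
}
```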

Java’s streams offer a low-cost way to data-parallelize computations. Rust itself doesn’t have this, but Rayon offers parallel iterators that have comparable cost-benefit characteristics. There are many other third-party crates aimed at parallelism and concurrency.

Java has variadic functions, which internally use arrays. While there are some wrinkles, this allows for nicer interfaces in some situations. Rust can at least emulate this with macros or use a slice argument. Perhaps one day Rust will get variadics, too, but it’s not high on the developers’ priority list.
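Both emulations fit in a few lines. In this sketch, `sum_all` and the `sum_of!` macro are hypothetical names; the macro merely forwards its arguments to the slice version.

```rust
/// The slice variant: callers pass `&[a, b, c]`.
fn sum_all(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// The macro variant: accepts any number of comma-separated expressions.
macro_rules! sum_of {
    ($($x:expr),*) => {
        sum_all(&[$($x),*])
    };
}

fn main() {
    println!("{}", sum_all(&[1, 2, 3]));
    println!("{}", sum_of!(1, 2, 3, 4));
}
```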

Java dispatches functions based on argument types. This is done internally by mangling the function name to include the signature, like e.g. next()Lllogiq.example.Example. Rust doesn’t do this: A function always takes one set of arguments, though generics can widen the possible set of types in the signature, e.g. some_func<S: Into<String>>(s: S).
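Spelled out, the generic signature from the text might be used like this (`greeting` is an invented function). One definition serves both borrowed and owned strings, where Java would typically reach for two overloads.

```rust
/// Accepts anything that can be converted into a `String`.
fn greeting<S: Into<String>>(name: S) -> String {
    let mut out = String::from("Hello, ");
    out.push_str(&name.into());
    out
}

fn main() {
    println!("{}", greeting("Rust")); // a &str
    println!("{}", greeting(String::from("Java"))); // an owned String
}
```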

I think that the Rust designers have made a good call here: Having methods of the same name that do completely different things based on type is generally frowned upon. If I do have multiple versions of a method in Java, I usually want them to do roughly the same. For those cases, it’s simple enough to either figure out the generics or have differently-named functions. On the other hand, dispatching methods based on types can lead to confusing interactions. For example, just recently I had a bug, because an output utility class was silently coercing shorts to ints on writing.

Conversely, Java shuns operator overloading, while Rust implements it using traits (in std::ops), which removes some avenues for error (oneString == otherString anyone?), while introducing others (is a == &b the same as &a == b? What about &a == &b?). The set of operators is basically the same in Rust and Java (apart from Java’s >>>, which Rust has no need for because it has unsigned integers), though Rust has shuffled the precedence a bit to be less surprising with regards to &, | and comparisons.
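A sketch of both points, with made-up names: + implemented for a custom type through std::ops::Add, and the effect of Java’s >>> obtained by shifting an unsigned value.

```rust
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Vec2 {
    x: i32,
    y: i32,
}

/// Implementing `Add` is what makes `a + b` compile for `Vec2`.
impl Add for Vec2 {
    type Output = Vec2;

    fn add(self, other: Vec2) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

/// Shifting an unsigned value fills with zeros, which is what `>>>` does in Java.
fn logical_shr(x: i32, n: u32) -> i32 {
    ((x as u32) >> n) as i32
}

fn main() {
    let sum = Vec2 { x: 1, y: 2 } + Vec2 { x: 3, y: 4 };
    println!("{:?}", sum);
    println!("{} {}", -8i32 >> 1, logical_shr(-8, 1)); // arithmetic vs. logical shift
}
```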

Metaprogramming

Rust has both procedural and bang-macros. The former are Rust programs that rewrite token trees, whereas the latter are a form of quasiquoting template language. Also as described above, the type system can be misused to do interesting things.

In contrast, Java has a few libraries to create byte code at runtime which can then be fed to the class loader. Some of them are even quite nice to use (cue shout-out to Byte Buddy creator Rafael Winterhalter). Considering that Java can synthesize code at runtime, it’s surprising that it isn’t done more often. Then again, a lot of code runs well without any byte code injection.

As for annotations, Java has better support for program metadata than Rust for now. This ties into the tooling, which we will come to later. Time will tell if Rust catches up.

Java also has runtime reflection, which is clunky and slow; bytecode wrangling is almost always faster. Rust only makes you pay for what you need, so you can implement whatever reflective capabilities you need using macros. It’s more work, but you have full control over everything.

Interfacing with other languages

In a perfect world, every language could simply call every other. This is obviously not the case, but the C Application Binary Interface (ABI) has emerged as a common denominator most languages can target. As does Java, with its Java Native Interface (JNI). There’s a javah tool that will generate C headers and stubs from a Java class with “native” methods. Some work needs to be done to adapt C interfaces to what JNI requires, and there is the persistent rumor that it was designed to keep developers from reaching down to native code too often. Also there is some overhead related to GC (because the function has to give objects it no longer uses back to the GC, lest the object is lost in a memory leak). Native code must live on the Java library path to be loaded into an application.

Rust can more or less interface directly with C by allowing to define extern "C" functions. The compiler will then assume C ABI for those. There are some wrinkles related to ownership, lifetimes and types, because native code by definition cannot uphold Rust’s guarantees by itself, so usually there is some wrapping going on, to present a safe and rustic interface. There also is a crate that allows embedding some subset of C++ in Rust directly, but I have neither found the time nor the need to test it.
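As a small, self-contained sketch: a function declared extern "C" uses the C calling convention and can be passed around as a C-compatible function pointer. To avoid linking against a real C library, the "C side" here (`call_with`) is a stand-in written in Rust, and both names are invented.

```rust
use std::os::raw::c_int;

/// Compiled with the C calling convention, so C code could call it through a function pointer.
extern "C" fn triple(x: c_int) -> c_int {
    x * 3
}

/// Stand-in for a C library entry point that takes a callback.
/// It is written in Rust only to keep this sketch runnable on its own.
fn call_with(callback: extern "C" fn(c_int) -> c_int, value: c_int) -> c_int {
    callback(value)
}

fn main() {
    println!("{}", call_with(triple, 14));
}
```

Calling into an actual C library additionally needs a declaration of the foreign function and an unsafe block at the call site, which is where the safe wrappers mentioned above come in.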

Rust obviously has the benefit of being lower level and needing less gymnastics to interface with C. If the Java designers really feared that people would go native too often, their worry was probably unfounded, because Java does pretty well by itself.

The Standard Library

Java’s standard library contains a great deal of stuff from Annotations to ZipOutputStreams – and beyond. While only almost as batteries included as Python, you can do a great many things using nothing but java.* and javax.* (and some org.*, which is also included).

Java never throws things away. Thus the API has three UI toolkits (AWT, Swing and JavaFX), both Enumeration and Iterator interfaces (which mostly do the same thing), two sets of IO classes (java.io and java.nio, though I’ll admit the latter builds upon the former) and other interesting thingamajigs.

On the plus side, this makes Java code extremely long-lived – during JavaLand 2015, Marcus Lagergren showed a Java 1.0 applet that was still running (though with Java 9, this example will no longer work, because applets are finally on the way out).

Java’s official APIs tend to be surprise-free and are usually extremely thoroughly documented. Rust is close behind however, thanks to the amazing work of Steve Klabnik who has been hired by Mozilla as a documentation maven for Rust.

Rust’s library is lean and keen. There are a few collection classes, a fair amount of string handling, smart references and cells, basic concurrency support, some IO/network and minimal OS integration. That’s it. The result is that Rust code will usually rely on a lot of third-party libraries, which however are very easily obtained and managed. The good thing here is that no one needs to download a set of MIDI classes just because they want to write a JSON parser — or vice versa.

Owing to Rust’s low-level nature, it often also has to split up operations which are the same in Java, because it matters if something is owned or borrowed (or can be owned, or needs to be owned, etc.) – this leads to a diversity of iterators, for example: Where Java Iterables have an .iterator() method, Rust will have .iter() (iterate immutably borrowed), .iter_mut() (iterate mutably borrowed), .into_iter() (iterate by value thus consuming the collection) and sometimes even .drain(..) (iterate by value, optionally removing or replacing elements). There also are a number of helper traits to make the most of the type system (I’ve written about them already).
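The four flavors side by side, in a sketch with made-up helper names (`scale`, `take_front`):

```rust
/// `.iter_mut()` hands out mutable references, so elements can be changed in place.
fn scale(values: &mut Vec<i32>, factor: i32) {
    for v in values.iter_mut() {
        *v *= factor;
    }
}

/// `.drain(..n)` moves the first `n` elements out while the vector itself stays usable.
fn take_front(values: &mut Vec<i32>, n: usize) -> Vec<i32> {
    let n = n.min(values.len());
    values.drain(..n).collect()
}

fn main() {
    let mut v = vec![1, 2, 3, 4];

    let sum: i32 = v.iter().sum(); // `.iter()` only borrows immutably
    scale(&mut v, 10);
    let front = take_front(&mut v, 2);

    // `.into_iter()` consumes the vector; `v` cannot be used afterwards.
    let rest: Vec<String> = v.into_iter().map(|x| x.to_string()).collect();

    println!("{} {:?} {:?}", sum, front, rest);
}
```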

Rust’s API docs allow offline keyword search, which is nice if you know what to look for. That many types mediate their behavior through a dozen traits hinders discoverability somewhat. On the other hand, once you know your traits, you can do amazing things with them.

One is also usually able to combine Rust’s standard types; it’s not uncommon to see a Rc<RefCell<Vec<T>>>. Type aliases are used in many places to reduce the amount of boilerplate. Manish Goregaokar has written a good piece on how to choose the right combination of wrapper types.
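For a feel of how such a combination reads, here is a sketch with a type alias. `SharedLog` and `record` are invented names: Rc provides shared ownership, RefCell moves the borrow check to runtime, and Vec holds the data.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// The alias keeps the nested wrapper types out of every signature.
type SharedLog = Rc<RefCell<Vec<String>>>;

/// Appends through a shared handle; `borrow_mut` panics if the log is already borrowed.
fn record(log: &SharedLog, message: &str) {
    log.borrow_mut().push(message.to_string());
}

fn main() {
    let log: SharedLog = Rc::new(RefCell::new(Vec::new()));
    let other_handle = Rc::clone(&log); // a second owner of the same vector

    record(&log, "first");
    record(&other_handle, "second");

    println!("{:?}", log.borrow());
}
```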

One wrinkle that many have stumbled over is the fact that implementations over arbitrary-sized arrays (not vectors, though) or tuples cannot be expressed in current Rust. As a compromise, there are implementations of the most needful traits for arrays up to size 32 and for tuples with up to 12 elements. One has to create a wrapper type to implement the traits e.g. for a 33-element array. There are proposals to deal with this problem, but apparently it is a lot of work and no suitable implementation has yet emerged.

Despite being dwarfed in size by the Java APIs, the Rust standard library is already surprisingly capable. A portion of the API is marked as unstable, which means it’s only usable with a nightly compiler and some #![feature(_)] annotation. This allows the library team to iterate on API design quickly while upholding stability guarantees. On the other hand, it also limits the usefulness of e.g. BTreeMaps, as there are a good number of methods missing in release-version Rust. This will very probably improve with time.

Tooling

Java’s tooling has matured for decades, so it’s top notch as should be expected. There are uncountable IDEs, build tools, code analysis tools, deployment and operations tools, profilers, coverage collectors, benchmarking frameworks, documentation, debuggers, etc., many of them free as in speech or at least as in beer.

However, this also leads to some fragmentation – take three Java coders and ask them to show their coding setup and you will likely see wildly different environments. Some companies mandate IDEs and a set of libraries/frameworks to use to counteract this.

Rust hasn’t yet had much time to fragment (though some crates are distinctly similar), and the tooling isn’t as mature as with Java. Still, Cargo gets build and package management just right – I wish I could use it in Java. With racer, I can at least get code completion, though the Rust core team has touted this year as the year of Rust IDE. We’ll see how this pans out.

Community and Development

One thing to say about the Java community is that it’s HUGE. There are so many people working with it that at least here in Europe you will struggle to find an acre of land without at least one Java developer in it. The same goes for the ecosystem: Whatever you need, chances are someone already wrote a library to do it. Java’s ecosystem is quite framework-happy, which to some is a downside.

Java has an air of professionalism. Big companies use it. You can comfortably (and profitably!) program it while wearing a suit and tie (source: I did for some years). That doesn’t mean that fun is prohibited, though, and Java folks usually tend to be a cheerful bunch. But when you talk business, we’re on it.

In many discussions invariably some troll will enter to declare Java dead, outdated or legacy software. Well, I gotta say from here it looks pretty healthy. By the same token, there are some who argue that Rust will never be ready for prime time. More than 25 million downloaded crates on crates.io speak a very different language.

I immensely enjoy taking part in the Rust community. The community is much smaller than Java’s, but there are so many adroit, friendly, helpful, savvy and funny people that brighten my day with every interaction, it doesn’t matter. Some say they have been repelled by the adherence to a Code of Conduct, but I have yet to see an objection to any particular part of the code and I believe the results speak for themselves.

The development of Java is mostly directed by Oracle. Given that Java is used in production in a lot of big companies, the development pace is fairly good, though because they need to care for so many different usage scenarios it understandably cannot match the speed that Rust is developed with. Delays between major versions vary, the longest was five years before 1.7 arrived, while nowadays we see new versions roughly every two years. In contrast, new Rust versions arrive every six weeks (though the changes are usually not as dramatic as between Java versions).

Rust is the underdog in this comparison, but they’ve put up a great fight so far. Though much smaller in size, the Rust community makes up for lack of numbers through agility, smarts and focus. That they have a great language at their disposal also helps.

Summing Up

Java has a lot going for it, and I probably will keep using it for some time. Likewise I’ll be a Rustacean for the foreseeable future. Both have their respective strengths and weaknesses, both have a great future ahead and (I think) both communities can learn from each other.


Discuss this on rust-users or /r/rust!