Llogiq on stuff

Holy std::borrow::Cow! – Redux

Last time I used a very helpful Cow to get around the need to clone a string just to borrow it. This was my first application of lifetime annotations, and it felt like quite the achievement. :-)

On /r/rust, user Artemciy asked the following very good question

[…] How does it work? There seem to be some magic here. I mean, how String becomes str?

And also offered a bit of insight:

P.S. Looks like there’s IntoCow’ implementation in String which turns it into str, but looking at the implementation - it’s still the same magic. Is String secretly a str?

(emphasis mine)

Good question. In fact, you may want to read the whole exchange for some good inquiry, insight, investigation, invalid solutions and internet drama. But alas! who has time? So I’m trying to give the gist from top to bottom.

The first clue is in the Cow itself. If we look into the beef code, we find the following (modulo documentation and some annotations):

pub enum Cow<'a, B: ?Sized + 'a> where B: ToOwned {
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned)
}

I will not discuss the various Into and From implementations that tie together all the Cows, Strings and other types; you can look them up in the library documentation. This stuff is really genius and whoever came up with it probably deserves a Turing Award.

As we know from last time, our type was Cow<'a, str> for a given lifetime 'a, which we took from our default argument. Now this doesn’t say anything about str or String at all! Where do we get the implementation from?

A hint was the ToOwned trait bound, which is implemented for str. Looking into it yields the following (abridged here) definition:

impl ToOwned for str {
    type Owned = String;
    fn to_owned(&self) -> String { ... }
}

Now here lies one big piece of the puzzle. The definition has an associated type called Owned. In the str case, the Owned type is…

String. Ta-dah!

It also defines how to get a String from a str, which we will come to later. Now with this definition our Cow<'a, str> can become one of:

  • Borrowed(&'a str)
  • Owned(String)

which is exactly what we want. If we Deref or Borrow our Cow, we will get a &'a str out of it (in the owned case, it will just borrow our wrapped owned instance), and if we asked for .to_owned(), we will always get a String (as per the ToOwned implementation above).

Nice. But we’re not done yet. We haven’t answered the question if our handy dandy String is actually leading a double life, and becoming a str at full moon, at dawn, or whenever the mood strikes. Or as Artemciy put it: Is String secretly a str?

I believe that everyone should have the choice to come out on their own terms, so it seems a bit rude to meddle in the affairs of Rust types. However, String is certainly acting suspicious, and there’s nothing like a good detective story. Just don’t tell String where you got it’s little secret from, should we actually find something.

Starting our investigation, we do have some clues:

  • Whenever we borrow (i.e. reference) a String, we get &str. Wait. Eddy B. points out that I am mistaken: Any &String is of type &String. It’s actually the Deref Coercion that coerces it into a &str, because of a Deref implemented on String whose associated target type is str
  • Whenever we want an owned str, we actually get a String
  • Both dereferencing a &str and a &String gives us the same str

Now what is a str? And what is a String?

The documentation tells us the following of str:

Rust’s str type is one of the core primitive types of the language. &str is the borrowed string type. This type of string can only be created from other strings, unless it is a &'static str (see below). It is not possible to move out of borrowed strings because they are owned elsewhere.

Looking into the source, str.rs, Line 47 confirms that str is a primitive type. This means we don’t get a definition like a struct or enum, nor a type alias. The compiler apparently knows str more intimately than it readily admits.

Further down in the docs, we get a section on representation (and without taxation! Rust certainly is a republican utopia!) containing the following sentence:

The actual representation of strs have direct mappings to slices: &str is the same as &[u8].

So a str is really just a sequence of bytes that are guaranteed to be valid UTF-8. &str thus is a borrowed sequence of bytes, while String is…here we draw a blank. We have one piece of the puzzle, but for the second one we’ll have to look in collections::string, which actually defines String:

pub struct String {
    vec: Vec<u8>,
}

So a String is just a wrapped Vec of bytes, where the available constructors ensure that the contents are always valid UTF-8.

Now we see that our suspicion was unfounded: str and String are pretty closely related (in fact they share the same relation as a slice’s contents to a Vec), but they are not the same person. Case closed!

So, no harm no foul, right? You haven’t told String of our little investigation, have you? Please tell me you didn’t…

Fine. Not only does lifetime elision want to bite me in the ass, also String is now out for revenge. If you don’t hear from me until next week, tell my wife I love her.

Bonus: Master investigator diwic gives us the result of his inquiry regarding the shocking connection between Borrow and ToOwned in a /r/rust comment.