Holy std::borrow::Cow! – Redux
Last time I used a very helpful Cow to get
around the need to clone a string just to borrow it. This was my
first application of lifetime annotations, and it felt like quite the
achievement. :-)
On /r/rust, user Artemciy asked the following very good question
[…] How does it work? There seem to be some magic here. I mean, how String becomes str?
And also offered a bit of insight:
P.S. Looks like there’s
IntoCow’ implementation inStringwhich turns it intostr, but looking at the implementation - it’s still the same magic. Is String secretly a str?
(emphasis mine)
Good question. In fact, you may want to read the whole exchange for some good inquiry, insight, investigation, invalid solutions and internet drama. But alas! who has time? So I’m trying to give the gist from top to bottom.
The first clue is in the Cow itself. If we look into the beef
code,
we find the following (modulo documentation and some annotations):
pub enum Cow<'a, B: ?Sized + 'a> where B: ToOwned {
Borrowed(&'a B),
Owned(<B as ToOwned>::Owned)
}
I will not discuss the various Into and From implementations that
tie together all the Cows, Strings and other types; you can look
them up in the library documentation. This stuff is really genius and
whoever came up with it probably deserves a Turing Award.
As we know from last time, our type was Cow<'a, str> for a given
lifetime 'a, which we took from our default argument. Now this
doesn’t say anything about str or String at all! Where do we get
the implementation from?
A hint was the ToOwned trait bound, which is implemented for str.
Looking into it
yields the following (abridged here) definition:
impl ToOwned for str {
type Owned = String;
fn to_owned(&self) -> String { ... }
}
Now here lies one big piece of the puzzle. The definition has an
associated type called Owned. In the str case, the Owned type
is…
String. Ta-dah!
It also defines how to get a String from a str, which we will come
to later. Now with this definition our Cow<'a, str> can become one of:
Borrowed(&'a str)Owned(String)
which is exactly what we want. If we Deref or Borrow our Cow, we
will get a &'a str out of it (in the owned case, it will just borrow
our wrapped owned instance), and if we asked for .to_owned(), we will
always get a String (as per the ToOwned implementation above).
Nice. But we’re not done yet. We haven’t answered the question if our
handy dandy String is actually leading a double life, and becoming a
str at full moon, at dawn, or whenever the mood strikes. Or as
Artemciy put it: Is String secretly a str?
I believe that everyone should have the choice to come out on their own
terms, so it seems a bit rude to meddle in the affairs of Rust types.
However, String is certainly acting suspicious, and there’s nothing
like a good detective story. Just don’t tell String where you got
it’s little secret from, should we actually find something.
Starting our investigation, we do have some clues:
- Whenever we borrow (i.e. reference) a
String, we get&str. Wait. Eddy B. points out that I am mistaken: Any&Stringis of type&String. It’s actually the Deref Coercion that coerces it into a&str, because of aDerefimplemented onStringwhose associatedtargettype isstr - Whenever we want an owned
str, we actually get aString - Both dereferencing a
&strand a&Stringgives us the samestr
Now what is a str? And what is a String?
The documentation tells us the following of str:
Rust’s
strtype is one of the core primitive types of the language.&stris the borrowed string type. This type of string can only be created from other strings, unless it is a&'static str(see below). It is not possible to move out of borrowed strings because they are owned elsewhere.
Looking into the source,
str.rs, Line 47
confirms that str is a primitive type. This means we don’t get a
definition like a struct or enum, nor a type alias. The compiler
apparently knows str more intimately than it readily admits.
Further down in the docs, we get a section on representation (and without taxation! Rust certainly is a republican utopia!) containing the following sentence:
The actual representation of strs have direct mappings to slices:
&stris the same as&[u8].
So a str is really just a sequence of bytes that are guaranteed to be
valid UTF-8. &str thus is a borrowed sequence of bytes, while
String is…here we draw a blank. We have one piece of the puzzle,
but for the second one we’ll have to look in
collections::string,
which actually defines String:
pub struct String {
vec: Vec<u8>,
}
So a String is just a wrapped Vec of bytes, where the available
constructors ensure that the contents are always valid UTF-8.
Now we see that our suspicion was unfounded: str and String are
pretty closely related (in fact they share the same relation as a
slice’s contents to a Vec), but they are not the same person. Case
closed!
So, no harm no foul, right? You haven’t told String of our little
investigation, have you? Please tell me you didn’t…
Fine. Not only does lifetime elision want to bite me in the ass, also
String is now out for revenge. If you don’t hear from me until next
week, tell my wife I love her.
Bonus: Master investigator diwic gives us the result of his
inquiry regarding the shocking connection between Borrow and
ToOwned in a
/r/rust comment.