Holy std::borrow::Cow! – Redux
Last time I used a very helpful Cow
to get
around the need to clone a str
ing just to borrow it. This was my
first application of lifetime annotations, and it felt like quite the
achievement. :-)
On /r/rust, user Artemciy asked the following very good question
[…] How does it work? There seem to be some magic here. I mean, how String becomes str?
And also offered a bit of insight:
P.S. Looks like there’s
IntoCow
’ implementation inString
which turns it intostr
, but looking at the implementation - it’s still the same magic. Is String secretly a str?
(emphasis mine)
Good question. In fact, you may want to read the whole exchange for some good inquiry, insight, investigation, invalid solutions and internet drama. But alas! who has time? So I’m trying to give the gist from top to bottom.
The first clue is in the Cow
itself. If we look into the beef
code,
we find the following (modulo documentation and some annotations):
pub enum Cow<'a, B: ?Sized + 'a> where B: ToOwned {
Borrowed(&'a B),
Owned(<B as ToOwned>::Owned)
}
I will not discuss the various Into
and From
implementations that
tie together all the Cow
s, String
s and other types; you can look
them up in the library documentation. This stuff is really genius and
whoever came up with it probably deserves a Turing Award.
As we know from last time, our type was Cow<'a, str>
for a given
lifetime 'a
, which we took from our default argument. Now this
doesn’t say anything about str
or String
at all! Where do we get
the implementation from?
A hint was the ToOwned
trait bound, which is implemented for str
.
Looking into it
yields the following (abridged here) definition:
impl ToOwned for str {
type Owned = String;
fn to_owned(&self) -> String { ... }
}
Now here lies one big piece of the puzzle. The definition has an
associated type called Owned
. In the str
case, the Owned
type
is…
String
. Ta-dah!
It also defines how to get a String
from a str
, which we will come
to later. Now with this definition our Cow<'a, str>
can become one of:
Borrowed(&'a str)
Owned(String)
which is exactly what we want. If we Deref
or Borrow
our Cow, we
will get a &'a str
out of it (in the owned case, it will just borrow
our wrapped owned instance), and if we asked for .to_owned()
, we will
always get a String
(as per the ToOwned
implementation above).
Nice. But we’re not done yet. We haven’t answered the question if our
handy dandy String
is actually leading a double life, and becoming a
str
at full moon, at dawn, or whenever the mood strikes. Or as
Artemciy put it: Is String secretly a str?
I believe that everyone should have the choice to come out on their own
terms, so it seems a bit rude to meddle in the affairs of Rust types.
However, String
is certainly acting suspicious, and there’s nothing
like a good detective story. Just don’t tell String
where you got
it’s little secret from, should we actually find something.
Starting our investigation, we do have some clues:
- Whenever we borrow (i.e. reference) a
String
, we get&str
. Wait. Eddy B. points out that I am mistaken: Any&String
is of type&String
. It’s actually the Deref Coercion that coerces it into a&str
, because of aDeref
implemented onString
whose associatedtarget
type isstr
- Whenever we want an owned
str
, we actually get aString
- Both dereferencing a
&str
and a&String
gives us the samestr
Now what is a str
? And what is a String
?
The documentation tells us the following of str:
Rust’s
str
type is one of the core primitive types of the language.&str
is the borrowed string type. This type of string can only be created from other strings, unless it is a&'static str
(see below). It is not possible to move out of borrowed strings because they are owned elsewhere.
Looking into the source,
str.rs, Line 47
confirms that str
is a primitive type. This means we don’t get a
definition like a struct
or enum
, nor a type
alias. The compiler
apparently knows str
more intimately than it readily admits.
Further down in the docs, we get a section on representation (and without taxation! Rust certainly is a republican utopia!) containing the following sentence:
The actual representation of strs have direct mappings to slices:
&str
is the same as&[u8]
.
So a str
is really just a sequence of bytes that are guaranteed to be
valid UTF-8. &str
thus is a borrowed sequence of bytes, while
String
is…here we draw a blank. We have one piece of the puzzle,
but for the second one we’ll have to look in
collections::string,
which actually defines String
:
pub struct String {
vec: Vec<u8>,
}
So a String
is just a wrapped Vec
of bytes, where the available
constructors ensure that the contents are always valid UTF-8.
Now we see that our suspicion was unfounded: str
and String
are
pretty closely related (in fact they share the same relation as a
slice’s contents to a Vec
), but they are not the same person. Case
closed!
So, no harm no foul, right? You haven’t told String
of our little
investigation, have you? Please tell me you didn’t…
Fine. Not only does lifetime elision want to bite me in the ass, also
String
is now out for revenge. If you don’t hear from me until next
week, tell my wife I love her.
Bonus: Master investigator diwic gives us the result of his
inquiry regarding the shocking connection between Borrow
and
ToOwned
in a
/r/rust comment.