Be Careful Zero-Copying Strings with serde
When deserializing a string using serde, it is possible to use a borrowed &str instead of an owned String:
use serde::Deserialize;
use serde_json;

#[derive(Deserialize)]
struct Foo<'a> {
    // This string is borrowed.
    text: &'a str,
}

fn main() {
    let json = r#"{ "text": "Hello, world!" }"#;
    let foo: Foo = serde_json::from_str(json).unwrap();
    println!("{}", foo.text); // Hello, world!
}
The borrowed string is a reference to a portion of the original serialized data. In this case, foo.text refers to the Hello, world! slice of the json variable.
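To make "a slice of the original data" concrete, here is a small standard-library sketch that does not involve serde at all. The is_subslice helper is hypothetical (it is not part of serde or serde_json); it simply compares address ranges, and the borrowed slice is cut out by hand to stand in for what zero-copy deserialization would hand back.

```rust
/// Hypothetical helper: returns true if `inner` points into the memory of `outer`.
fn is_subslice(outer: &str, inner: &str) -> bool {
    let outer_start = outer.as_ptr() as usize;
    let outer_end = outer_start + outer.len();
    let inner_start = inner.as_ptr() as usize;
    let inner_end = inner_start + inner.len();
    outer_start <= inner_start && inner_end <= outer_end
}

fn main() {
    let json = r#"{ "text": "Hello, world!" }"#;

    // A hand-made borrowed slice of `json`, standing in for a zero-copy field.
    let start = json.find("Hello").unwrap();
    let borrowed: &str = &json[start..start + "Hello, world!".len()];
    assert!(is_subslice(json, borrowed));

    // An owned copy lives in its own allocation, outside of `json`.
    let owned: String = borrowed.to_owned();
    assert!(!is_subslice(json, &owned));
}
```

If you have serde_json available, passing json and foo.text from the example above to a helper like this should report that the field lies inside the input buffer.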
This process is called zero-copy deserialization, and it can be more efficient than allocating a new String and copying the data into it. Be warned, however: some strings cannot be deserialized into &str and must be deserialized into a String instead.
The specific case where I found this out was when I was deserializing text with backslashes in it:
let json = r#"{ "text": "Go to C:\\Users\\bd\\Desktop" }"#;
let foo: Foo = serde_json::from_str(json).unwrap();
println!("{}", foo.text);
Instead of printing Go to C:\Users\bd\Desktop as I expected, it panicked!
thread 'main' panicked at src/main.rs:12:47:
called `Result::unwrap()` on an `Err` value: Error("invalid type: string "Go to C:\\Users\\bd\\Desktop", expected a borrowed string", line: 1, column: 34)
When deserializing the text, serde_json needs to convert the escaped Go to C:\\Users\\bd\\Desktop into Go to C:\Users\bd\Desktop. The decoded text is not a contiguous slice of the input, so the only way to produce it is by allocating a new string. serde_json can’t do that here, however, because we told it not to by using zero-copy deserialization!
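The following standard-library sketch illustrates why escapes and borrowing do not mix. It is not how serde_json is implemented; borrow_json_str and decode_json_str are hypothetical stand-ins for a borrow-only and an owning string deserializer, and the decoder only handles a handful of escapes (no \uXXXX).

```rust
/// Hypothetical stand-in for a zero-copy string deserializer: it may only
/// return a slice of `raw`, never a new allocation.
fn borrow_json_str(raw: &str) -> Result<&str, String> {
    if raw.contains('\\') {
        Err(format!("cannot borrow {:?}: escapes need a new buffer", raw))
    } else {
        Ok(raw)
    }
}

/// Hypothetical owning counterpart: decodes a few common JSON escapes
/// into a freshly allocated String.
fn decode_json_str(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            // Covers \\, \" and \/ (other escapes are not handled in this sketch).
            Some(other) => out.push(other),
            // Trailing lone backslash: kept as-is in this sketch.
            None => out.push('\\'),
        }
    }
    out
}

fn main() {
    // The characters between the quotes in the JSON source, escapes still encoded.
    let raw = r"Go to C:\\Users\\bd\\Desktop";

    // Borrow-only decoding has to give up.
    assert!(borrow_json_str(raw).is_err());

    // Owning decoding succeeds by building a new string.
    let decoded = decode_json_str(raw);
    assert_eq!(decoded, r"Go to C:\Users\bd\Desktop");

    // The decoded text appears nowhere in the source, so no slice could represent it.
    assert!(!raw.contains(decoded.as_str()));

    // Without escapes, borrowing works fine.
    assert_eq!(borrow_json_str("Hello, world!"), Ok("Hello, world!"));
}
```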
To fix this, replace the borrowed &str with an owned String. It can be slower than zero-copy deserialization, but it supports all possible data inputs:
use serde::Deserialize;
use serde_json;

#[derive(Deserialize)]
struct Foo {
    // This string is owned, so escapes can be decoded into it.
    text: String,
}

fn main() {
    let json = r#"{ "text": "Go to C:\\Users\\bd\\Desktop" }"#;
    let foo: Foo = serde_json::from_str(json).unwrap();
    println!("{}", foo.text); // Go to C:\Users\bd\Desktop
}
This kind of issue will arise when deserializing other escape codes in JSON, such as \n and \t. It can also occur when using other types that can be zero-copied, such as &Path [1]. Next time you consider using zero-copy deserialization, be sure you’re OK with limiting what data you can support.
Addendum
As @korrat has helpfully pointed out, Cow<str> can be used as a compromise between &str and String. If you annotate a field with #[serde(borrow)], it will first try to zero-copy deserialize the string, but will fall back to cloning the data if it needs to be modified.
As a result, Cow<str> should be preferred, as it offers performance improvements without restricting what data can be deserialized:
use serde::Deserialize;
use serde_json;
use std::borrow::Cow;

#[derive(Deserialize)]
struct Foo<'a> {
    // Try to borrow the string when possible, but clone it when necessary.
    #[serde(borrow)]
    text: Cow<'a, str>,
}

fn main() {
    // No changes need to be made, so the string can be borrowed.
    let json = r#"{ "text": "Hello, world!" }"#;
    let foo: Foo = serde_json::from_str(json).unwrap();
    assert!(matches!(foo.text, Cow::Borrowed(_)));

    // Changes need to be made, so the string must be owned.
    let json = r#"{ "text": "Hello,\nworld!" }"#;
    let foo: Foo = serde_json::from_str(json).unwrap();
    assert!(matches!(foo.text, Cow::Owned(_)));
}
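Once a field is a Cow<str>, code that consumes it rarely needs to care which variant it got. The sketch below uses only the standard library; shout is a hypothetical consumer, and the Cow is built by hand rather than by serde. It shows a Cow being read through deref, cloned lazily on the first write, and turned into a String.

```rust
use std::borrow::Cow;

/// Hypothetical consumer: takes &str, so both Cow variants work via deref.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let source = String::from("Hello, world!");

    // Starts out borrowed, as a zero-copied field would.
    let mut text: Cow<str> = Cow::Borrowed(&source);
    assert_eq!(shout(&text), "HELLO, WORLD!");
    assert!(matches!(text, Cow::Borrowed(_)));

    // The first mutation clones the data; the source stays untouched.
    text.to_mut().push_str(" Bye!");
    assert!(matches!(text, Cow::Owned(_)));
    assert_eq!(source, "Hello, world!");

    // into_owned yields a String regardless of the variant.
    let owned: String = text.into_owned();
    assert_eq!(owned, "Hello, world! Bye!");
}
```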
In the initial release of this post, I incorrectly wrote that Cow<str> does not support zero-copy deserialization. This isn’t the case; you just need to annotate the Cow<str> field with #[serde(borrow)]. Don’t forget that attribute, or Cow<str> will just be equivalent to String! See the serde docs for more information and examples.
[1] Be especially careful about using this type. Since it cannot deserialize escaped backslashes, you’re essentially eliminating support for Windows paths.