Rust Pointer Metadata

Welcome back! In my last blog post I talked about overriding the global allocator. It was reasonably successful, even getting mentioned in This Month in Rust OSDev without me suggesting it! That success has inspired me to write about another really interesting feature in the Rust programming language called pointer metadata.

WARNING

Some examples in this blog post use the unstable feature ptr_metadata with Nightly Rust. I have tested all of them with rustc 1.72.0-nightly (101fa903b 2023-06-04), but they may fail to compile if you use a different version. For more information, please see the Rustup documentation.

What is a pointer?

A pointer is simply a number that points to a location in memory. There are two different types in Rust: references and raw pointers. References uphold certain restrictions that prevent programmers from making common mistakes as seen in the C and C++ languages[1]. Raw pointers have none of these restrictions, and thus require unsafe to access the underlying memory.

Once compiled, references and raw pointers have the same representation and look the same in memory. Only the compiler treats them differently. From this point onward when I refer to a "pointer," I will be talking about both references and raw pointers.
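To make that concrete, here is a minimal sketch on stable Rust. The helper names same_address and read_via_raw are my own, made up for illustration. It shows that a reference and the raw pointer created from it hold the same address, and that only the raw pointer needs unsafe to be read.

```rust
use std::ptr;

/// Returns true if a reference and the raw pointer made from it hold the same address.
fn same_address(value: &u32) -> bool {
    // A reference coerces to a raw pointer without any `unsafe`.
    let raw: *const u32 = value;
    ptr::eq(value, raw)
}

/// Reads a value through a raw pointer derived from a reference.
fn read_via_raw(value: &u32) -> u32 {
    let raw: *const u32 = value;
    // SAFETY: `raw` was just created from a live reference, so it is valid to read.
    unsafe { *raw }
}

fn main() {
    let value: u32 = 42;

    // Same address, different rules enforced by the compiler.
    assert!(same_address(&value));
    assert_eq!(read_via_raw(&value), 42);
}
```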

Data Representation

All pointers are unsigned integers that represent a location in memory. (I fear the day when someone decides to represent them with floating point numbers.) The number of bits they use is platform-dependent, but guaranteed to always be the same size as a usize.

QUOTE

"The pointer-sized unsigned integer type."

~ Rust usize Documentation

I use an Apple M1 CPU, which has a 64-bit architecture. This means that all usizes compiled on my computer will be 64 bits long. 64-bit pointers are the most common nowadays, but some older computers (specifically Windows ones) may use 32-bit pointers instead.
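If you want to check this on your own machine, something like the following should work on stable Rust. The pointer_bits helper is a name I made up for this sketch; usize::BITS and the target_pointer_width cfg are the real pieces.

```rust
use std::mem::size_of;

/// Number of bits in a `usize` on the compilation target.
fn pointer_bits() -> u32 {
    usize::BITS
}

fn main() {
    // A thin raw pointer is the same size as a `usize`.
    assert_eq!(size_of::<*const u8>() * 8, pointer_bits() as usize);

    // The pointer width can also be inspected at compile time.
    if cfg!(target_pointer_width = "64") {
        assert_eq!(size_of::<usize>(), 8);
    } else if cfg!(target_pointer_width = "32") {
        assert_eq!(size_of::<usize>(), 4);
    }

    println!("pointers are {} bits wide on this target", pointer_bits());
}
```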

All that I've said so far about the data representation of pointers is mostly language-independent, but what comes next is Rust specific. Pointers in Rust can be either one usize or two usizes wide. This means that any pointer can be twice the normal size, depending on the type of data it points to.

use std::mem::size_of;

fn main() {
    // `usize` and pointers are the same size...
    assert_eq!(size_of::<usize>(), size_of::<&bool>());
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());

    // ...until they're not?!
    assert_ne!(size_of::<usize>(), size_of::<&str>());
}

In the above example, &str is not the size of a usize! On my machine, where a usize is 8 bytes[2], the size of &str is 16 bytes. This sudden increase in size comes from a special property called pointer metadata.

Intro to Pointer Metadata

Pointer metadata is an optional feature that allows pointers to store additional data about themselves. Where a normal pointer just stores its address, a pointer with metadata stores its address and an extra usize of information.

A brief terminology break: pointers with metadata are called fat pointers, while ones without are called thin pointers.

Due to their increase in size, Rust only uses fat pointers in places where it cannot infer certain information at compile time. The most popular, and currently only, use of pointer metadata is when combined with dynamically sized types.
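Here is a rough way to see which pointers are thin and which are fat, using only stable Rust. The is_fat helper is hypothetical, and it assumes a fat pointer is exactly two usizes wide, which is what current compilers produce in practice.

```rust
use std::mem::size_of;

/// True if a reference to `T` occupies two `usize`s (a "fat" pointer).
/// Assumption: fat pointers are exactly two `usize`s wide on this compiler.
fn is_fat<T: ?Sized>() -> bool {
    size_of::<&T>() == 2 * size_of::<usize>()
}

fn main() {
    // The length of `[u8; 3]` is part of the type, so the pointer stays thin.
    assert!(!is_fat::<[u8; 3]>());
    assert!(!is_fat::<bool>());

    // The length of a slice is only known at runtime, so the pointer is fat.
    assert!(is_fat::<[u8]>());
    assert!(is_fat::<str>());

    // The same holds for owning pointers such as `Box`.
    assert_eq!(size_of::<Box<u8>>(), size_of::<usize>());
    assert_eq!(size_of::<Box<[u8]>>(), 2 * size_of::<usize>());
}
```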

DSTs

Dynamically sized types, or DSTs, fit the requirements for pointer metadata perfectly. Their size is unknown to the compiler, so it has to be tracked at runtime instead. Here is a list of commonly used DSTs:

  • Slices ([T])
  • String slices (str)
  • Trait objects (dyn Trait)

All DSTs are required to be used behind a pointer so that the compiler can embed the metadata necessary to actually use them.

fn main() {
    // Has a known size to the compiler.
    let source: [u8; 5] = [2, 4, 8, 16, 32];

    // Has an unknown size to the compiler.
    let dst: &[u8] = &source[0..3];

    // And yet we can somehow still find its size at runtime.
    assert_eq!(dst.len(), 3);
}

NOTE

You may argue that the compiler could infer that dst is 3 bytes, but imagine scenarios where this wouldn't be possible. A good one that I can think of is taking a &[u8] as a function argument. The function needs to work with any slice, no matter its size.
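As a small sketch of that scenario, the hypothetical sum_bytes function below accepts slices of any length. It is compiled once, and the length it iterates over arrives at runtime as part of the &[u8] argument.

```rust
/// Sums the bytes of any slice; the length travels with the pointer at runtime.
fn sum_bytes(bytes: &[u8]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let source: [u8; 5] = [2, 4, 8, 16, 32];

    // The same function handles a whole array, a sub-slice, and heap data.
    assert_eq!(sum_bytes(&source), 62);
    assert_eq!(sum_bytes(&source[0..3]), 14);

    let heap: Vec<u8> = vec![1, 2];
    assert_eq!(sum_bytes(&heap), 3);

    // Even an empty slice works: its length is simply 0.
    assert_eq!(sum_bytes(&[]), 0);
}
```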

What metadata is stored?

I've evaded this question long enough. What metadata is actually being stored in fat pointers to facilitate DSTs? This is type dependent, so here's a quick list:

  • For slices, the metadata is a number describing the length.
  • For str slices, the metadata is a number describing the length, specifically in bytes.
  • For trait objects, the metadata is a pointer to a static VTable used in dynamic dispatch.
  • For structs that wrap DSTs[3], their metadata is the metadata of the wrapped DST.

And that's it! For all slices, the metadata can be thought of as the length. Trait objects are a bit more complex, but I hope to cover them in a later blog post.
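The slice cases can be poked at on stable Rust without reading the metadata directly. In this sketch (round_trip is a name I made up), a slice is split into its address and length and then rebuilt from those two parts. It also shows that the length of a str counts bytes, not characters.

```rust
use std::slice;

/// Splits a slice into its address and length, then rebuilds it from those two parts.
fn round_trip(bytes: &[u8]) -> &[u8] {
    let address: *const u8 = bytes.as_ptr();
    let length: usize = bytes.len();
    // SAFETY: `address` and `length` come from a valid slice that outlives the result.
    unsafe { slice::from_raw_parts(address, length) }
}

fn main() {
    let data = [1u8, 2, 3];
    assert_eq!(round_trip(&data), &data[..]);

    // "héllo": U+00E9 (é) takes two bytes in UTF-8, so the byte length
    // is one more than the number of characters.
    let text = "h\u{e9}llo";
    assert_eq!(text.len(), 6);
    assert_eq!(text.chars().count(), 5);
}
```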

Reading Metadata

The unstable std::ptr::metadata function can be used to read the metadata of any pointer.

#![feature(ptr_metadata)]

use std::ptr::metadata;

fn main() {
    let slice: &[u8] = &[1, 2, 3];
    assert_eq!(metadata(slice), 3);

    let string = "Boo!";
    assert_eq!(metadata(string), 4);

    // Create a DST wrapper type.
    struct Wrapper<T: ?Sized> {
        foo: bool,
        bar: T,
    }

    // The metadata is the length of `bar`.
    let wrapper: &Wrapper<[bool]> = &Wrapper {
        foo: true,
        bar: [false, true],
    };
    // `bar` has a length of 2.
    assert_eq!(metadata(wrapper), 2);

    // Thin pointers have no metadata, so their metadata is the unit type.
    let thin: u8 = 2;
    assert_eq!(metadata(&thin), ());
}
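If you are stuck on stable Rust, roughly the same information is reachable through len(). This is only a sketch and not a replacement for the metadata function. The Wrapper type mirrors the one above, and bar_len is a hypothetical helper.

```rust
use std::mem::size_of_val;

/// A DST wrapper type mirroring the one used with `metadata`.
struct Wrapper<T: ?Sized> {
    foo: bool,
    bar: T,
}

/// Length of the wrapped slice, recovered at runtime from the fat pointer.
fn bar_len(wrapper: &Wrapper<[bool]>) -> usize {
    wrapper.bar.len()
}

fn main() {
    let slice: &[u8] = &[1, 2, 3];
    assert_eq!(slice.len(), 3);
    // Three `u8` elements occupy three bytes.
    assert_eq!(size_of_val(slice), 3);

    let string = "Boo!";
    assert_eq!(string.len(), 4);

    let wrapper: &Wrapper<[bool]> = &Wrapper {
        foo: true,
        bar: [false, true],
    };
    assert!(wrapper.foo);
    assert_eq!(bar_len(wrapper), 2);
}
```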

metadata works in combination with the Pointee trait, which supplies an associated type for the metadata.

pub trait Pointee {
    type Metadata: Copy + Send + Sync + Ord + Hash + Unpin;
}

Pointee is implemented for every type. For most, Metadata will be the unit () type. It only differs for dynamically sized types, which range from just the raw usize to the more complicated DynMetadata<Dyn> type.
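DynMetadata itself is unstable, but its effect can be observed on stable Rust. The sketch below assumes, as before, that a fat pointer is two usizes wide. The describe function is a hypothetical example of a call that gets dispatched through the vtable carried by a &dyn Display.

```rust
use std::fmt::Display;
use std::mem::size_of;

/// Formats any `Display` value through a trait object. The concrete `fmt`
/// implementation is found at runtime through the pointer's vtable.
fn describe(value: &dyn Display) -> String {
    format!("value = {value}")
}

fn main() {
    // Address plus vtable pointer: two `usize`s on current compilers.
    assert_eq!(size_of::<&dyn Display>(), 2 * size_of::<usize>());

    // One function, two different concrete types behind the trait object.
    assert_eq!(describe(&7_u8), "value = 7");
    assert_eq!(describe(&"Boo!"), "value = Boo!");
}
```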

Conclusion

Hey, you made it! Congratulations! I hope you enjoyed reading this article and learned a little more about Rust. I found pointer metadata interesting because it exposes fun implementation details that aren't known to the average Rustacean. As per usual, here are a few more links to dive into related to pointer metadata:

If you have any questions, feel free to comment on my post in the Rust Users' Forum or contact me by creating an issue in my GitHub repository. The source code for all examples in this post can be found here.

Cheers!

Footnotes

  1. See the Book on the Rust Borrow Checker. I'd hope that the reader would know about Rust ownership rules, but I don't really know who my audience is yet. :P
  2. Or 64 bits, as stated previously. bits = 8 * bytes
  3. See object safety in the Rust Reference. You can create structures that wrap DSTs as long as they don't contain more than one DST (along with a bunch of other restrictions).