Intercepting Allocations with the Global Allocator

PREFACE

This post is directed towards programmers experienced with the Rust programming language that are interested in manual memory management. I will not explain some concepts like pointers, statics, etc. If you want to learn how to program in Rust, you can find a helpful page here.

It is possible to replace the global heap allocator used by Box<T>, Vec<T>, and more. This is useful if you want to use a custom allocator like jemalloc for the features it provides, or if you are working in a #![no_std] context without an OS allocator.

This article is going to cover:

  • The GlobalAlloc trait
  • The System allocator
  • Wrapping the System allocator with custom code
  • The wg-allocators' plans for the future

The GlobalAlloc Trait

GlobalAlloc is a trait that was stabilized in Rust 1.28. It is meant to be implemented to any struct that wishes to replace the default allocator. Here's a simplified version of its definition:

pub unsafe trait GlobalAlloc {
    // Required methods
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    // Provided methods (out of scope for this article)
    ...
}

Any struct that wants to become a global allocator has to be able to both allocate and deallocate regions of memory. It can approach this in many different ways, but all implementations use a pointer and a size to represent memory.

Take alloc() for example. It takes a Layout, which represents a size and alignment1 for a block of memory. It then returns a pointer to the first byte (u8) of the freshly allocated memory.

dealloc() is the inverse of this. It takes the pointer and the layout, then internally deallocates the memory block based on the given information.

The System Allocator

If you are running on a computer that the standard library supports, then you have access to the System default global allocator. If you don't specify a global allocator, then this will be used instead.

TIP

System uses libc's malloc and free functions on Unix platforms, while it uses HeapAlloc and associated functions on Windows.

Lucky for us, System already implements GlobalAlloc. Without changing anything, here's an overly verbose program that manually registers the System allocator:

// Import the System allocator.
use std::alloc::System;

// This attribute registers the global allocater.
#[global_allocator]
static A: System = System;

fn main() {
    // Let's allocate on the heap to prove that it works.
    let _ = Box::new(103);
}

To register a global allocator, you initialize it in a static item and annotate it with the #[global_allocator] attribute. There may only be one global allocator in a binary, but there is no restriction on where the static is and what visibility it has.

WARNING

It is actually possible for a dependency crate to register a global allocator without the consumer knowing of it. This is very bad practice, as the end-user has no control over it. Please don't do this, or at least feature-gate it.

Wrapping System

Since GlobalAlloc is a public trait, it is very possible to define our own allocator. Instead of going into the nitty-gritty of writing one from scratch, let's instead pass these allocation calls to System. This can let us intercept information, which we will see in the next section. Here's our new code:

use std::alloc::{GlobalAlloc, Layout, System};

// This is our custom allocator!
pub struct MyAlloc;

unsafe impl GlobalAlloc for MyAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Pass everything to System.
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// Register our custom allocator.
#[global_allocator]
static A: MyAlloc = MyAlloc;

fn main() {
    // String, like Box, also allocates on the heap.
    let _ = String::from("Boo!");
}

When running this program, you'll notice that it functions no different than before. We still use System, but we can now run our own code for each allocation. (Which is exactly what we're going to do next!)

Counting Allocations

Say you are profiling some code and want to track how many heap allocations were made. This can easily be implemented with a custom allocator.

use std::{
    alloc::{GlobalAlloc, Layout, System},
    sync::atomic::{AtomicU64, Ordering},
};

// This is our counting allocator. It wraps a u64 that stores our actual count.
pub struct Counter(AtomicU64);

impl Counter {
    // A const initializer that starts the count at 0.
    pub const fn new() -> Self {
        Counter(AtomicU64::new(0))
    }

    // Returns the current count.
    pub fn count(&self) -> u64 {
        // We're using Relaxed since there is only 1 synchronization primitive.
        self.0.load(Ordering::Relaxed)
    }
}

unsafe impl GlobalAlloc for Counter {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Increment our counter by 1. See other comment on Ordering.
        self.0.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // No modifications here! :)
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counter = Counter::new();

fn main() {
    // Track initial count and count after allocating once.
    let count = A.count();
    let _ = Box::new(1);
    let new_count = A.count();

    // count = 3, new_count = 4.
    dbg!(count);
    dbg!(new_count);
}

Wait, wait! Don't go! I know it uses atomics, but we can work through this together.

This allocator keeps a bit of internal state, a u64 number to be exact. This number starts at 0 and increments by 1 every time alloc() is called. In order main() function, we track how many allocations were made before and after a Box was constructed. We know our allocator works because new_count is greater by 1 than count.

Realistically that is all the knowledge you need to understand this allocator. You may have spotted the AtomicU64 though with its load() and fetch_add() calls. This is an advanced synchronization primitive that prevents data races in concurrency. Since the global allocator is used by the entire program, it is perfectly possible for it to be used by multiple threads. Our AtomicU64 ensures that every allocation is counted, no matter where it was called from.

As much as I would love to talk about them, threads and data races are out of the scope of this article. (It's getting long enough as it is!) If you want to learn more about this fascinating subject, please see the std::thread module and Mara Bos's fantastic book.

NOTE

If we wanted to follow best practices, the AtomicU64 would be stored in a separate static and Counter would remain a zero-sized-type (ZST). I felt that the example provided would be easier for a beginner to understand, even if it is not perfect.

Towards the Future

And that's it! Thank you for reading this article, I hope you found it interesting. Global allocators can be a handy tool, though they do cover a relatively niche surface of Rust programming. There is a lot of work being put in by the wg-allocators working group to streamline this process and other related parts. Here are a few links that you may find interesting:

If you have any questions, feel free to comment on my post in the Rust Users' Forum or create a new issue in my Github repository. The source code for all examples is available here.

Have a great day!

Footnotes

  1. Alignment is used for structure packing. While it can be useful for implementing an allocator from scratch, that is out of scope for this article. For more information, you may find this Wikipedia article interesting.