Safe API around splitting a `Vec` into initialized elements and reserved capacity
⚓ Rust 📅 2025-09-19 👤 surdeus 👁️ 7I was writing some code where I was allocating groups of items into a single Vec. Specifically, I was building a computation graph.
Feel free to skip, but for those interested in building some context around my question, here is the code that's building fully connected "dense" layer in a multi layer perceptron:
vars.reserve((batch_count, output_count).product());
for (batch_index, output_index) in (batch_count, output_count).indices() {
let weights = View::new(&vars[0..bias_offset], weight_dims);
let biases = View::new(&vars[bias_offset..output_offset], bias_dims);
let output = /* ... */
vars.push(output);
}
What I don't particularly like is that we need to recreate these multi-dimensional View<&[T]> objects in the inner loop iteration because they reference parts of the vars: Vec<_> that were already inserted (outside of this new chunk of outputs).
Any implementation that would provide this type of API would have to be mindful that Vec is allowed to grow it's internal buffer, and such a reallocation could invalidate references that you've handed out to already initialized elements. However, it does seem like it should be possible to split a Vec into (&'a mut [T], Spare<'a, T>) where Spare only allows appending elements up to the already reserved capacity.
We do some searching and find Vec::split_at_spare_mut which returns (&mut [T], &mut [MaybeUninit<T>]). The tracking issue was created in 2021 and unfortunately for us it is currently still unstable.
Even if it was stable, you can safely write into the &mut [MaybeUninit<T>] but you would also need to tell the owning Vec<T> that you've added some elements for it, and Vec::set_len is unsafe!
I did not find any crates, though they probably exist, that provide this API so I went ahead and tried to implement it.
Here's the trait to be implemented on collection types that could support it:
pub trait SplitExtend<T> {
type Spare<'s>: ::std::iter::Extend<T>
where
Self: 's;
fn split_extend<'s>(&'s mut self) -> (&'s mut [T], Self::Spare<'s>);
/// Convenience function to reserve and then split_extend.
fn reserve_split_extend<'s>(&'s mut self, additional: usize) -> (&'s mut [T], Self::Spare<'s>);
}
When implemented for Vec<T>, it allows something like:
let mut vec = vec![1, 2, 3];
let (init, mut spare) = vec.reserve_split_extend(3);
assert_eq!(init, &[1, 2, 3]);
// Append elements derived from previous ones!
spare.extend(init.iter().copied().map(|i| i + init.len()));
assert_eq!(vec, &[1, 2, 3, 4, 5, 6]);
Here's an (seemingly functional but possibly unsound) implementation for Vec<T>
use std::marker::PhantomData;
use std::ptr::NonNull;
mod set_len_on_drop {
use std::{marker::PhantomData, ptr::NonNull};
// Similar to the std lib's but we have to take a Vec ptr rather than usize because we can not access private fields in
// a Layout-stable manner.
pub(super) struct SetLenOnDrop<'a, T> {
vec: NonNull<Vec<T>>,
len: usize,
_marker: PhantomData<&'a mut Vec<T>>,
}
impl<'a, T> SetLenOnDrop<'a, T> {
#[inline]
pub(super) fn new(vec: NonNull<Vec<T>>) -> Self {
SetLenOnDrop {
vec,
len: unsafe { vec.as_ref().len() },
_marker: PhantomData,
}
}
#[inline]
pub(super) fn add(&mut self, increment: usize) {
self.len += increment;
}
#[inline]
pub(super) fn get(&self) -> usize {
self.len
}
}
impl<T> Drop for SetLenOnDrop<'_, T> {
#[inline]
fn drop(&mut self) {
unsafe { self.vec.as_mut().set_len(self.len) }
}
}
}
use set_len_on_drop::SetLenOnDrop;
#[cold]
fn panic_exceeded_capacity<T>(vec: &Vec<T>) -> ! {
let spare = vec.capacity().wrapping_sub(vec.len());
panic!("Attempted to write more than {spare} elements exceeding the reserved capacity!")
}
pub struct Spare<'a, T> {
// Represented as a pointer because we can not use Vec operations that could reallocate. A reallocation would
// invalidate other mutable references we hand out.
vec: NonNull<Vec<T>>,
_marker: PhantomData<&'a mut Vec<T>>,
}
impl<'a, T> Spare<'a, T> {
fn new(vec: NonNull<Vec<T>>) -> Self {
Self {
vec,
_marker: PhantomData,
}
}
pub fn push(&mut self, item: T) {
unsafe {
let vec = self.vec.as_mut();
let len = vec.len();
let ptr = vec.as_mut_ptr();
let cap = vec.capacity();
if len >= cap {
panic_exceeded_capacity(vec)
}
std::ptr::write(ptr.add(len), item);
vec.set_len(len + 1)
}
}
}
impl<T> std::iter::Extend<T> for Spare<'_, T> {
fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
unsafe {
let vec = self.vec.as_mut();
let ptr = vec.as_mut_ptr();
let cap = vec.capacity();
let mut len = SetLenOnDrop::new(self.vec);
for item in iter {
if len.get() >= cap {
panic_exceeded_capacity(vec)
}
std::ptr::write(ptr.add(len.get()), item);
len.add(1)
}
}
}
}
impl<T> crate::SplitExtend<T> for Vec<T> {
type Spare<'a>
= Spare<'a, T>
where
Self: 'a;
fn split_extend<'s>(&'s mut self) -> (&'s mut [T], Self::Spare<'s>) {
let initialized = unsafe { std::slice::from_raw_parts_mut(self.as_mut_ptr(), self.len()) };
let spare = Spare::new(unsafe { NonNull::new_unchecked(self as *mut _) });
(initialized, spare)
}
fn reserve_split_extend<'s>(&'s mut self, additional: usize) -> (&'s mut [T], Self::Spare<'s>) {
self.reserve(additional);
self.split_extend()
}
}
#[cfg(test)]
mod tests {
use crate::SplitExtend;
#[test]
fn extend() {
let mut vec = vec![1, 2, 3];
let (init, mut spare) = vec.reserve_split_extend(3);
assert_eq!(init, &[1, 2, 3]);
spare.extend(init.iter().copied().map(|i| i + init.len()));
assert_eq!(vec, &[1, 2, 3, 4, 5, 6]);
}
#[test]
#[should_panic(
expected = "Attempted to write more than 0 elements exceeding the reserved capacity!"
)]
fn exceed() {
let mut vec: Vec<i32> = Vec::new();
let (init, mut extend) = vec.split_extend();
assert_eq!(init, &[]);
extend.extend([1, 2, 3].iter().copied());
}
}
Finally, here are some questions I'd like to get feedback on:
- any crates that implement this?
- how is my use of
PhantomDataandNonNull(lifetimes, variance, ...)? - how can I build confidence that this is sound?
4 posts - 2 participants
🏷️ Rust_feed