Basics
NARRATOR: This section has a looming fundamental error in it, because that's the whole point of the book. However once we start using
unsafe
it's possible to do things wrong and still have everything compile and seemingly work. The fundamental mistake will be identified in the next section. Don't actually use the contents of this section in production code!
Alright, back to basics. How do we construct our list?
Before we just did:
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: None }
}
}
But we're not using Option for the tail
anymore:
> cargo build
error[E0308]: mismatched types
--> src/fifth.rs:15:34
|
15 | List { head: None, tail: None }
| ^^^^ expected *-ptr, found
| enum `std::option::Option`
|
= note: expected type `*mut fifth::Node<T>`
found type `std::option::Option<_>`
We could use an Option, but unlike Box, *mut
is nullable. This means it
can't benefit from the null pointer optimization. Instead, we'll be using null
to represent None.
So how do we get a null pointer? There's a few ways, but I prefer to use
std::ptr::null_mut()
. If you want, you can also use 0 as *mut _
, but that
just seems so messy.
use std::ptr;
// defns...
impl<T> List<T> {
pub fn new() -> Self {
List { head: None, tail: ptr::null_mut() }
}
}
cargo build
warning: field is never used: `head`
--> src/fifth.rs:4:5
|
4 | head: Link<T>,
| ^^^^^^^^^^^^^
|
= note: #[warn(dead_code)] on by default
warning: field is never used: `tail`
--> src/fifth.rs:5:5
|
5 | tail: *mut Node<T>,
| ^^^^^^^^^^^^^^^^^^
warning: field is never used: `elem`
--> src/fifth.rs:11:5
|
11 | elem: T,
| ^^^^^^^
warning: field is never used: `head`
--> src/fifth.rs:12:5
|
12 | head: Link<T>,
| ^^^^^^^^^^^^^
shush compiler, we will use them soon.
Alright, let's move on to writing push
again. This time, instead of grabbing
an Option<&mut Node<T>>
after we insert, we're just going to grab a
*mut Node<T>
to the insides of the Box right away. We know we can soundly do
this because the contents of a Box has a stable address, even if we move the
Box around. Of course, this isn't safe, because if we just drop the Box we'll
have a pointer to freed memory.
How do we make a raw pointer from a normal pointer? Coercions! If a variable is declared to be a raw pointer, a normal reference will coerce into it:
let raw_tail: *mut _ = &mut *new_tail;
We have all the info we need. We can translate our code into, approximately, the previous reference version:
pub fn push(&mut self, elem: T) {
let mut new_tail = Box::new(Node {
elem: elem,
next: None,
});
let raw_tail: *mut _ = &mut *new_tail;
// .is_null checks for null, equivalent to checking for None
if !self.tail.is_null() {
// If the old tail existed, update it to point to the new tail
self.tail.next = Some(new_tail);
} else {
// Otherwise, update the head to point to it
self.head = Some(new_tail);
}
self.tail = raw_tail;
}
> cargo build
error[E0609]: no field `next` on type `*mut fifth::Node<T>`
--> src/fifth.rs:31:23
|
31 | self.tail.next = Some(new_tail);
| ----------^^^^
| |
| help: `self.tail` is a raw pointer;
| try dereferencing it: `(*self.tail).next`
Huh? We have a pointer to a Node, why can't we get the next
field?
Rust is kinda a jerk when you use raw pointers. To access the contents of a raw pointer, it insists that we manually deref them, because it's such an unsafe operation. So let's do that:
*self.tail.next = Some(new_tail);
> cargo build
error[E0609]: no field `next` on type `*mut fifth::Node<T>`
--> src/fifth.rs:31:23
|
31 | *self.tail.next = Some(new_tail);
| -----------^^^^
| |
| help: `self.tail` is a raw pointer;
| try dereferencing it: `(*self.tail).next`
Uuuugh operator precedence.
(*self.tail).next = Some(new_tail);
> cargo build
error[E0133]: dereference of raw pointer is unsafe and requires
unsafe function or block
--> src/fifth.rs:31:13
|
31 | (*self.tail).next = Some(new_tail);
| ^^^^^^^^^^^^^^^^^ dereference of raw pointer
|
= note: raw pointers may be NULL, dangling or unaligned;
they can violate aliasing rules and cause data races:
all of these are undefined behavior
THIS. SHOULDN'T. BE. THIS. HARD.
Remember how I said Unsafe Rust is like an FFI language for Safe Rust? Well, the
compiler wants us to explicitly delimit where we're doing this FFI-ing. We have
two options. First, we can mark our entire function as unsafe, in which case
it becomes an Unsafe Rust function and can only be called in an unsafe
context. This isn't great, because we want our list to be safe to use. Second,
we can write an unsafe
block inside our function, to delimit the FFI boundary.
This declares the overall function to be safe. Let's do that one:
pub fn push(&mut self, elem: T) {
let mut new_tail = Box::new(Node {
elem: elem,
next: None,
});
let raw_tail: *mut _ = &mut *new_tail;
if !self.tail.is_null() {
// Hello Compiler, I Know I Am Doing Something Dangerous And
// I Promise To Be A Good Programmer Who Never Makes Mistakes.
unsafe {
(*self.tail).next = Some(new_tail);
}
} else {
self.head = Some(new_tail);
}
self.tail = raw_tail;
}
> cargo build
warning: field is never used: `elem`
--> src/fifth.rs:11:5
|
11 | elem: T,
| ^^^^^^^
|
= note: #[warn(dead_code)] on by default
Yay!
It's kind've interesting that that's the only place we've had to write an unsafe block so far. We do raw pointer stuff all over the place, what's up with that?
It turns out that Rust is a massive rules-lawyer pedant when it comes to
unsafe
. We quite reasonably want to maximize the set of Safe Rust programs,
because those are programs we can be much more confident in. To accomplish this,
Rust carefully carves out a minimal surface area for unsafety. Note that all
the other places we've worked with raw pointers has been assigning them, or
just observing whether they're null or not.
If you never actually dereference a raw pointer those are totally safe things to do. You're just reading and writing an integer! The only time you can actually get into trouble with a raw pointer is if you actually dereference it. So Rust says only that operation is unsafe, and everything else is totally safe.
Super. Pedantic. But technically correct.
NARRATOR: Somewhere on the other side of the world, a hardware engineer feels a shiver down her spine — someone must be insisting pointers are just integers again. She looks down at her proposal for a new hardware pointer authentication scheme and sheds a single tear. The compiler engineer next door feels nothing — they long ago learned to always wear a heavy sweater.
Having only some of the pointer operations be actually unsafe raises an
interesting problem: although we're supposed to delimit the scope of the
unsafety with the unsafe
block, it actually depends on state that was
established outside of the block. Outside of the function, even!
This is what I call unsafe taint. As soon as you use unsafe
in a module,
that whole module is tainted with unsafety. Everything has to be correctly
written in order to make sure all invariants are upheld for the unsafe code.
This taint is manageable because of privacy. Outside of our module, all of our struct fields are totally private, so no one else can mess with our state in arbitrary ways. As long as no combination of the APIs we expose causes bad stuff to happen, as far as an outside observer is concerned, all of our code is safe! And really, this is no different from the FFI case. No one needs to care if some python math library shells out to C as long as it exposes a safe interface.
Anyway, let's move on to pop
, which is pretty much verbatim the reference
version:
pub fn pop(&mut self) -> Option<T> {
self.head.take().map(|head| {
let head = *head;
self.head = head.next;
if self.head.is_none() {
self.tail = ptr::null_mut();
}
head.elem
})
}
Again we see another case where safety is stateful. If we fail to null out the
tail pointer in this function, we'll see no problems at all. However
subsequent calls to push
will start writing to the dangling tail!
Let's test it out:
#[cfg(test)]
mod test {
use super::List;
#[test]
fn basics() {
let mut list = List::new();
// Check empty list behaves right
assert_eq!(list.pop(), None);
// Populate list
list.push(1);
list.push(2);
list.push(3);
// Check normal removal
assert_eq!(list.pop(), Some(1));
assert_eq!(list.pop(), Some(2));
// Push some more just to make sure nothing's corrupted
list.push(4);
list.push(5);
// Check normal removal
assert_eq!(list.pop(), Some(3));
assert_eq!(list.pop(), Some(4));
// Check exhaustion
assert_eq!(list.pop(), Some(5));
assert_eq!(list.pop(), None);
// Check the exhaustion case fixed the pointer right
list.push(6);
list.push(7);
// Check normal removal
assert_eq!(list.pop(), Some(6));
assert_eq!(list.pop(), Some(7));
assert_eq!(list.pop(), None);
}
}
This is just the stack test, but with the expected pop
results flipped around.
I also added some extra steps at the end to make sure that tail-pointer
corruption case in pop
doesn't occur.
cargo test
running 12 tests
test fifth::test::basics ... ok
test first::test::basics ... ok
test fourth::test::basics ... ok
test fourth::test::peek ... ok
test second::test::basics ... ok
test fourth::test::into_iter ... ok
test second::test::into_iter ... ok
test second::test::iter ... ok
test second::test::iter_mut ... ok
test second::test::peek ... ok
test third::test::basics ... ok
test third::test::iter ... ok
test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured
Gold Star!
NARRATOR: Here it comes...