Sunday, June 26, 2022

Type-checked keypaths in Rust

Like a number of other folks, I have recently been exploring some slightly less well-worn corners of the Rust type system. In my particular case, this involves revisiting Swift-style keypaths. My previous experiments in this direction (which began at the recurse center) have never really gone anywhere, but recently I’ve come up with a new approach that feels promising; a preliminary sketch of this work is available as the keypath crate. With this crate, you can write the following:

#[derive(Keyable)]
struct Playlist {
    songs: Vec<Song>,
}

#[derive(Keyable)]
struct Song {
    name: String,
    artist: String,
    length: u32,
    favourite: bool,
}

fn main() {
    let playlist = make_sample_playlist();
    let first_song_name = keypath!(Playlist.songs[0].name);
    println!("first song title: {}", playlist[&first_song_name]);
}

Motivation

The idea behind a keypath is that it provides an uninvoked reference to a field on some type, and that field can be arbitrarily deep in a graph of objects.

This is particularly interesting to me in the context of my work on Druid, and more generally exploring patterns for building GUIs in Rust. Recently I have been spending quite a bit of time studying SwiftUI, where keypaths are used heavily. They can be helpful in a number of ways: for instance if you make a caller mutate something by passing a value and a keypath, you can apply custom update logic before or after the mutation (such as invalidating some dependency, or scheduling a redraw); similarly by abstracting away field access from the fields themselves, you can do fancy things like transparently delegate certain properties to an inner type. More prosaically, they provide a simple and ergonomic way to declare data relationships. For instance, imagine we have a have a Checkbox widget; we want this widget to toggle a bool when it is clicked, but we do not otherwise care about the data that bool comes from. We might have something like,

let checkbox = Checkbox::new(keypath!(Playlist.now_playing.favourite));

Without keypaths, the main way you might do something like this would probably be with closures, and you’d need separate closures for reading and writing:

let checkbox: Checkbox<Playlist> = Checkbox::new(
    |playlist| &playlist.now_playing.favourite,
    |playlist| &mut playlist.now_playing.favourite);

In this simple case, keypaths are easier to use and easier to reason about.

Implementation

I think it’s helpful to explain the implementation in two parts: the path traversal part, which is relatively straight-forward, and the type checking part, which is hackier.

Path traversal

Ultimately, the path traversal component of this work is not especially complicated. Types that wish to participate in keypaths have to implement the following trait:

/// A trait for types that expose their properties via keypath.
pub trait RawKeyable: 'static {
    fn as_any(&self) -> &dyn Any;
    fn as_any_mut(&mut self) -> &mut dyn Any;
    fn get_field(&self, path: &[PathComponent])
        -> Result<&dyn RawKeyable, FieldError>;
    fn get_field_mut(&mut self, path: &[PathComponent])
        -> Result<&mut dyn RawKeyable, FieldError>;
}

Where PathComponent is an enum of different possible ‘members’ a type might have, which currently looks like this (although it may certainly expand or change in the future):

/// A component of a keypath.
#[derive(Debug, Clone, Copy)]
pub enum PathComponent {
    /// An unnamed field, such as on a tuple or tuple struct
    Unnamed(usize),
    /// A named field.
    Named(&'static str),
    /// An index into a sequence, such as a vec.
    IndexInt(usize),
    /// An index into a map with string keys.
    IndexStr(&'static str),
}

In any case, all of the work here is done in the get_field[_mut] methods. These are called recursively, starting at the top object; at each layer, the implementor looks at the path slice. If it is empty, it means we’re at the end of the path, and we return self. Otherwise, we split off the first item in the path, and see if we have have a corresponding ‘member’. If we do, we call that member with the rest of the path, and if not we return an error.

As a concrete example, here is what an implementation might look like for a simple type (ignoring the _mut methods, which are trivially similar to their non-mut equivalents):

struct Person {
    name: String,
    friends: Vec<String>,
}

impl RawKeyable for Person {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn get_field(&self, path: &[PathComponent]) -> Result<&dyn RawKeyable,
    FieldError> {
        let (head, rest) = match path.split_first() {
            None => return Ok(self),
            Some(tup) => tup,
        };

        match head {
            PathComponent::Named("name") => self.name.get_field(rest),
            PathComponent::Named("friends") => self.friends.get_field(rest),
            _ => Err(FieldError::some_error())
        }
    }
}

This fairly simple trait gets us pretty far: we can make a slice of PathComponents, pass it to the appropriate type, and if the path exists we will get back a &dyn RawKeyable, which we can then try to downcast to a concrete type, by going through as_any. Putting this all together, we could write something like,

let person = Person {
    name: "coco".to_string(),
    friends: vec!["kiki".to_string(), "jojo".to_string()],
}

let path = &[PathComponent::Named("friends"), PathComponent::IndexInt(0)];
let kiki = person.get_field(&path).unwrap().as_any().downcast_ref::<String>();
assert_eq!(kiki.unwrap(), "kiki");

What is especially compelling about this approach is that (at least in clear cases like this) the compiler is smart enough to figure out what we’re doing, and when building for --release this generates the same code as direct field access (see godbolt).

Compile-time type-checking

While path traversal is important, it is the least exciting of the two parts of the story. The bit I find especially interesting is built on top of that traversal, and lets us validate at compile time that a given keypath exists, and that the to generate a strongly-typed KeyPath object that represents that known route, as well as the types of the first object and the final value. We then take advantage of this with a new trait, that allows infallible access to the values of known keypaths, which lets us write code like this:

let kiki: Keypath<Person, String> = keypath!(Person.friends_names[0]);
assert_eq!(person[&kiki], "kiki");

Implementation

Getting this working was… a journey. Beyond just stretching my comfort with Rust, it stretches my ability to communicate about Rust.

Our goal is simple enough: we want a way to generate code, at compile time, that can verify that a particular path exists, starting at a base type (the root) and ending up at some other type (the value). Importantly, we need to do this with only access to types; we can’t work with actual instances of those types. This sort of type-level programming is tricky in Rust.

To explain the eventual solution, I will first sketch out what an ‘ideal’ solution might look like: by this I mean a fairly clear, plausible-seeming implementation strategy.

What we want

Let’s start with a KeyPath type, that looks something like:

KeyPath<Root, Value> {
    _root: PhantomData<Root>,
    _value: PhantomData<Value>,
    path: Vec<PathComponent>,
}

For the purpose of these examples, let’s work with the following structs:

struct Person {
    name: String,
    stats: Stats,
}

struct Stats {
    // in furlongs per fortnight
    speed: f32,
    // in kilograms per meter per second
    viscosity: f32,
}

We would like to be able to write keypath!(Person.stats.speed), and generate a KeyPath<Person, f32>; and we would like keypath!(Person.stats.missing_field) to fail to compile. What should this look like?

As a first approach, maybe we could do all of the work in the KeyPath type, by defining the routes in impl blocks there; and then we need some way of getting a ‘root’ keypath to begin from. For instance, here, we might have:

impl Person {
    fn __keypath_root() -> KeyPath<Person, Person> {
        KeyPath::infer_types(vec![])
    }
}

impl<T> KeyPath<T, Person> {
    fn name(self) -> KeyPath<Person, String> {
        let mut path = self.path;
        path.append(PathComponent::Named("name"));
        KeyPath::infer_types(path)
    }
    fn stats(self) -> KeyPath<Person, Stats> { /* generate keypath */ }
}

impl<T> KeyPath<T, Stats> {
    fn speed(self) -> KeyPath<Person, f32> { /* generate keypath */ }
    fn viscosity(self) -> KeyPath<Person, f32> { /* generate keypath */ }
}

And then we could imagine generating code that looked like,

let valid_keypath = Person::__keypath_root().stats().speed();

That looks nice! Unfortunately, this approach does not work. The first problem is with the ‘inherent impl’ that we declare for the different KeyPath types; that is, the routes. Rust does not allow you to make ‘inherent impl’ for a type in an external crate; in this case, the KeyPath type is declared in the keypath crate, which means it is external to wherever this code is being generated. We could work around this by requiring the user to declare an appropriate KeyPath struct in their crate root, but then we bump into the second problem: we’re also using inherent impls to generate the root keypaths, so there would be no way to add get the root keypath for a std type like a HashMap or a Vec, and that does not feel like an acceptable limitation.

What we get

After experimenting with a bunch of ways to try and make this work, I settled on something that is slightly less elegant; this involves generating ‘mirror’ structs. The basic idea is that each participating type implements a trait that has an associated type, the mirror.

trait Keyable {
    type Mirror;
    fn mirror() -> Self::Mirror;
}

The mirror type itself is a way of inspecting the types of the fields on the main struct; the mirror has the same fields as the parent type, but those fields themselves return the mirror of the type of the parent’s corresponding field.

struct Stats {
    // in furlongs per fortnight
    speed: f32,
    // in kilograms per meter per second
    viscosity: f32,
}

struct StatsMirror {
    speed: <f32 as Keyable>::Mirror,
    viscosity: <f32 as Keyable>::Mirror,
}

Aside: it was only when writing this up that I realized the above syntax was even legal; prior to this I was doing another weird trick to declare the fields.

(For non-fields (indexes into collections) mirrors have specially-named methods that return the mirror of the collection’s member type.)

This gives us a nice way to check at compile-time that fields exist; the final step is to create the correct KeyPath type once we’ve verified the path. This is simple enough; we know the root type already, and we know the type of the mirrored object when we generate the mirror; so we can just add a method to each mirror type that looks like,

impl StatsMirror {
    fn to_key_path_with_root<Root>(self, fields: &[PathComponent])
        -> KeyPath<Root, Stats> { /* */ }
}

Put this all together, and the code let path = keypath!(Person.stats.speed); ends up generating (approximately) the following code:

let path = Person::mirror()
    .size
    .height
    .to_key_path_with_root::<Person>(&[
        PathComponent::Named("stats"),
        PathComponent::Named("speed")
    ]);

And that works just fine. A nice quality of the mirror objects is that they are all zero-sized types, and shouldn’t have any impact on the generated binary; they’re only used at compile-time, to ensure that paths are valid. (Impact on compile time is still possible, and I haven’t investigated that enough to say anything useful).

Codegen and performance

Despite all of the use of Any and all the generated types, the generated code can be quite efficient. In particular the ‘mirror’ types are all zero-sized, and don’t make it into the final binary; and in many cases the traversal of the fields can be inlined, and the generated code for accessing a keypath looks identical to direct field access.

The main overhead is in the additional code generated for the Keyable trait, but this should be modest in comparison to the code that is generated for things like serde.

I haven’t investigated performance as deeply as I hope to, but the preliminary indications are encouraging.

Some possible next steps

My focus thus far has been on proving out the concept: figuring out how to traverse the object graph, and then figuring out how to generate type information at compile time. This seems to work, but I haven’t put much work into making it useful, although I have some ideas in this direction.

refine the API

An important next step is going to be figuring out what the API should look like. Currently, we only support strongly-typed keys, but we may want to consider a sort of hierarchy of types. Swift distinguishes between AnyKeyPath (with neither root nor value types known) PartialKeyPath (with only the root type known) and KeyPath, which knows both. In particular, PartialKeyPath may be useful in that you could store paths with different value types in a single collection. It might also be interesting to do things like allow types to inspect and override paths, or to provide custom ‘fields’ backed by functions, etcetera. Similarly the derive macro could be improved with the addition of various attributes and customizations, and it could be possible to support keypaths for enums, with collection-like behaviour; if a given path is only valid for a subset of enum variants, it could resolve to Option<T>.

Better collection support

Currently collection access will panic at runtime if a key or index doesn’t exist in the collection. I would like to refine this, and either make collection access always return some sort of Option type, or else introduce syntax into the macro to allow the user to opt-in to cascading optionals. For instance, we might have syntax like,

let key: KeyPath<Person, u16> = keypath!(Person.friends[0].age);
let opt_key: KeyPath<Person, Option<u16>> = keypath!(Person.friends[0]?.age);

where explicitly providing a ? in the keypath allows the collection access to return an Option<T> that is propagated through to the user.

In addition, it may be interesting to provide an additional API for working with types like serde_json::Value, which contain heterogeneous data. The idea here is that it would be nice to be able to retrieve specific forms of data from these sorts of collections, so that you can attempt to retrieve a key as a u64, instead of as a serde_json::Number.

Fun stuff: predicates, orderings, etc

Another possible case for keypaths is as a DSL for creating predicates and orderings for things like constructing database queries or filtering collections. We could imagine writing something like,

let keypath: KeyPath<Person, DateTime> = keypath!(Person.registration.date);
let is_old_user: Predicate<Person> = keypath.less_than(REFERENCE_DATE);
let is_precocious: Predicate<Person> = keypath!(Person.info.age).less_than(3);
// oh no:
let extremely_precocious = is_old_user && is_precocious;

for person in data.people.iter().filter_by_predicate(&extremely_precocious) {
    /* hihi */
}

Or for orderings,

let ordering: Ordering<Person> = keypath!(Person.rank)
    .descending()
    .then(keypath!(Person.registration.signup_date).ascending())

data.people.sort_by_ordering(&ordering);

In any case this is all very speculative, and intended primarily as an illustration of some of the things that this might make possible.

conclusions

This work is not quite developed enough to be currently useful, but I do think it’s at a point where it’s worth sharing. In particular, I think there are a bunch of possible directions for this work that could be interesting, and I’m sure there are lots of good ideas that just haven’t occurred to me personally. I also have some reservations about the implementation, and I’m curious if there are problems with my approach that I haven’t identified yet.



from Hacker News https://ift.tt/4NEpwPc

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.