Tuesday, April 4, 2023

Calling Purgatory from Heaven: Binding to Rust in Haskell

Calling hell from heaven and heaven from hell is a classic paper from the previous century, introducing the Haskell foreign function interface (FFI). It describes the facilities that Haskell offers for calling functions written in C (and vice versa). In this blog post, we will consider how to call functions written in Rust instead: not quite hell, but not quite heaven either.

We will make use of two libraries that we wrote to streamline this process: a Haskell-side library called foreign-rust, and a Rust-side library called haskell-ffi. We developed these libraries as part of the development of Be, a (smart) contract platform; we are thankful to them for making these libraries open source. That said, this blog post should also be useful for people who do not want to use these libraries, and indeed, we will also show examples of interop that do not rely on them. The source code for the examples discussed in this blog post can be found at https://github.com/well-typed/blogpost-purgatory.

Getting started

Binding to Rust functions from Haskell is not quite as convenient as binding to C functions. The common denominator between Rust and Haskell is C, and so we have to do two things: we have to write a Rust-side wrapper that exposes the functionality we want to bind against as C functions, and then write Haskell-side bindings against these C functions.

Our running example in this blog post will therefore consist of a Rust library which we will call rust-wrapper, and a Haskell library which we will call haskell-wrapper. To get us started, let’s see if we can pass two numbers from Haskell to Rust, add them Rust-side, and then print the result Haskell-side.

Rust

Create a new Rust crate for our new rust-wrapper library, and then add the following to the Cargo.toml file:

[dependencies]
haskell-ffi.git = "https://github.com/BeFunctional/haskell-rust-ffi.git"
haskell-ffi.rev = "2bf292e2e56eac8e9fb0fb2e1450cf4a4bd01274"

[features]
capi = []

[package.metadata.capi.library]
versioning = false

After declaring the dependency on the haskell-ffi library, the features and package.metadata.capi.library sections are for the benefit of cargo cbuild; we will see momentarily how to use this tool. First, however, add this function to the library’s lib.rs:

#[no_mangle]
pub extern "C" fn rust_wrapper_add(x: u64, y: u64) -> u64 {
    x + y
}

The extern "C" directive tells the Rust compiler that this function should use the C calling convention. The no_mangle attribute ensures that the Rust compiler won’t change the name of our function to something unrecognizable, so that we know what the function is called in our Haskell bindings. This does mean that the function name should be unique across any C libraries that we might link against, which is why we will prefix the names of all external functions with rust_wrapper_. (See Calling Rust code from C from the Rustonomicon for more details.)

Now compile the library with

cargo cbuild

(you might need to install the cargo-c applet for cargo first.) This will create a bunch of files, but three are of particular interest:

  • target/<arch>/debug/librust_wrapper.so: this is the shared object that our Haskell application will need to link against.

  • target/arch>/debug/rust_wrapper.h: this is the C header file that we will need to compile our Haskell-side bindings. For our running example so far, this header will contain

    uint64_t rust_wrapper_add(uint64_t x, uint64_t y);
  • target/<arch>/debug/rust_wrapper-uninstalled.pc: this is a pkg-config file which contains the C compiler and linker flags (including paths) that we will need Haskell-side to know where the .so and .h files that we just described are.

Finally, we will need to set two environment variables; the first will ensure that we can find the pkg-config file, and the second ensures that when we run our application (after building and linking it), the .so file can still be found:

export PKG_CONFIG_PATH=<path>/rust-wrapper/target/<arch>/debug
export LD_LIBRARY_PATH=<path>/rust-wrapper/target/<arch>/debug

Haskell

On the Haskell side, create a new package, and then add this to the library section of the .cabal file:

build-depends: .., foreign-rust
build-tool-depends: c2hs:c2hs
pkgconfig-depends: rust_wrapper-uninstalled

The first declares a dependency on c2hs; this is a preprocessor that we will use to write our bindings; the second declares the dependency on the Rust library; cabal will use pkg-config to figure out which compiler and linker flags are required (thereby also figuring out where that Rust library is). While the library is not yet released to Hackage, we’ll need to add the repo to our cabal.project file:

source-repository-package
  type: git
  location: https://github.com/BeFunctional/haskell-foreign-rust.git
  tag: 90b1c210ae4e753c39481a5f3b141b74e6b6d96e

For this simple example we don’t benefit much from c2hs, but will nonetheless use it to bind to our add function, to give us a chance to introduce its basic syntax. (For a detailed discussion of the syntax of c2hs, see the c2hs User Guide.)

module C.GettingStarted (rustWrapperAdd) where

#include "rust_wrapper.h"

import Data.Word

{# fun pure unsafe rust_wrapper_add as rustWrapperAdd
     { `Word64'
     , `Word64'
     }
  -> `Word64'
#}

This declares a function which

  • is called rust_wrapper_add C-side, but should be called rustWrapperAdd Haskell-side

  • has two arguments, both of type Word64

  • has a result also of type Word64

  • is pure: the signature of the function should be

    rustWrapperAdd :: Word64 -> Word64 -> Word64

    rather than

    rustWrapperAdd :: Word64 -> Word64 -> IO Word64 -- unnecessary

    (because calling the function twice with the same inputs will give the same results)

  • uses an unsafe call: unsafe calls can be used for foreign functions that do not call back into the Haskell runtime; this gives the RTS some guarantees which it can take advantage of to make the foreign call more efficient (see also Guaranteed call safety in the ghc manual)

This should be sufficient; if we now start a repl (cabal repl) and import our module, we should be able to test our function:

ghci> rustWrapperAdd 2 3
5

Marshalling data

It is easy enough to ferry individual Word64 over and back, but Haskell and Rust are both languages with rich type systems. If we want to transfer more complex values across the language barrier, we have two choices: we can either serialize and deserialize, or we can use pointers to the data. The first option is the easier and less fragile, as it avoids Haskell-side managing of values that live on the Rust-side heap; it is this approach that the haskell-ffi and foreign-rust libraries aim to streamline. We will consider the second option in section Avoiding serialization.

For a more realistic example, therefore, we will consider binding against a Rust function that constructs a self-signed x509 certificate with corresponding secret key, given a list of domain names:

fn generate_simple_self_signed(alt_names: Vec<String>) -> (Certificate, SecretKey)

Interlude: (no) orphans in Rust

Although of course the precise definitions differ, the basic concept of an orphan instance is similar in Haskell and in Rust. An orphan instance is an instance of some type class (or trait) C for some type T

or

which is not “bundled with” either the definition of C or the definition of T (where “bundled with” means “same module” in Haskell, and “same package” in Rust). This ensures coherence: it can never happen that we get two different instances in scope (for the same C and T) when we import two different modules.

However, where the introduction of an orphan instance in Haskell is merely a compiler warning, which we can choose to ignore (thereby taking on the responsibility ourselves to ensure coherence), in Rust it is an error: we really cannot introduce an orphan instance.

This can be quite a serious limitation. For example, suppose we want to have a (Rust side) type class for “things we can marshall to Haskell.” If this type class is defined in an external package, and we want to marshall a type defined in a different package, unless there is instance already defined in either of these two packages, we are stuck: we cannot provide an instance ourselves (since it would be an orphan). The haskell-ffi library therefore adopts a workaround, which we will discuss now.

Marshalling in haskell-ffi

Central to the haskell-ffi library is the definition of two traits (type classes), for data that can be marshalled to and from Haskell respectively:

pub type Error     = Box<dyn std::error::Error + Send + Sync>;
pub type Result<T> = core::result::Result<T, Error>;

pub trait ToHaskell<Tag> {
    fn to_haskell<W: Write>(&self, writer: &mut W, tag: PhantomData<Tag>) -> Result<()>;
}

pub trait FromHaskell<Tag>: Sized {
    fn from_haskell(buf: &mut &[u8], tag: PhantomData<Tag>) -> Result<Self, Error>;
}

These are similar to the BorshSerialize and BorshDeserialize traits from the borsh crate (package), and indeed, ToHaskell and FromHaskell have all the standard instances that make it compatible with the Borsh binary serialization format.

The definition of these two traits might look a bit obscure to a Haskell programmer; a rough Haskell translation might be

class ToHaskell tag a where -- illustration only
  toHaskell :: forall w. Write w => a -> IORef w -> Proxy tag -> IO ()

class Sized a => FromHaskell tag a where -- illustration only
  fromHaskell :: IORef (Vector Word8) -> Proxy tag -> IO a

Some points:

  • The definition of Error is describing a boxed value of existential type, which is required to satisfy a few instance (aka implement a few traits), most notably std::error::Error; this corresponds nearly 1:1 with SomeException in Haskell:

    data SomeException = forall e. Exception e => SomeException e
  • W: Write is known as a trait bound in Rust, and corresponds to ad-hoc polymorphism in Haskell.

  • Rust does not really have multi-parameter type classes; the additional Tag parameter is an example of what is (confusingly) called generics in Rust, and corresponds roughly to parametric polymorphism in Haskell (although the two concepts don’t align perfectly).

  • PhantomData in Rust, like Proxy in Haskell, serves only as a hint to the type checker: here, to determine the type Tag.

The Tag argument allows us to work around the no-orphans limitation. The haskell-ffi library can introduce instances that are polymorphic in a choice of tag, such as

impl<Tag, T: ToHaskell<Tag>> ToHaskell<Tag> for Option<T>

corresponding to

instance ToHaskell tag t => ToHaskell tag (Maybe t) -- illustration only

but, as we shall see momentarily, we can also introduce additional instances in other libraries (such as our rust-wrapper library), provided that we choose a specific tag.

Example: Rust

Let’s now get back to our example. Recall that we want to bind to a Rust function with this signature:

fn generate_simple_self_signed(alt_names: Vec<String>) -> (Certificate, SecretKey)

To do that, we need to write a wrapper function that we expose as a C function. The wrapper will have two arguments for each argument of the function we are wrapping, as well as two arguments for the result:

#[no_mangle]
pub extern "C" fn rust_wrapper_generate_simple_self_signed(
    alt_names: *const u8,
    alt_names_len: usize,
    out: *mut u8,
    out_len: &mut usize,
) {
    ..
}

For each argument of the original function, we have a pair of C arguments: the first is a pointer to a buffer containing a serialized form of the argument, and the second is the length of that buffer. For the result of the original function we likewise have a pair of C arguments: the first points to a buffer that the result will be serialized to, and the second must initially contain the size of that buffer, and is overwritten to contain the required size of the buffer when the function returns (so that the caller can try again if the buffer is too small; see Using the C function, below).

Before we can write the body of the wrapper, we need to choose a Tag to use (see previous section):

pub enum RW {}
pub const RW: PhantomData<RW> = PhantomData;

RW (for rust-wrapper) is just an empty type; it will only serve as a type-level tag. The body of our wrapper function is now simple:

pub extern "C" fn rust_wrapper_generate_simple_self_signed(
    alt_names: *const u8,
    alt_names_len: usize,
    out: *mut u8,
    out_len: &mut usize,
) {
    let alt_names: Vec<String> = marshall_from_haskell_var(alt_names, alt_names_len, RW);
    let result = generate_simple_self_signed(alt_names);
    marshall_to_haskell_var(&result, out, out_len, RW);
}

We will discuss the _var suffix in section on bounded size data.

Example: Haskell

On the Haskell side, we first have to decide what we want to do with the serialized data we get from Rust. We can try to deserialize it, or we could just leave it in serialized form, relying on Rust-side functions to interact with it. The foreign-rust library helps us with deserialization if we choose to do so, and provides tools for working with serialized data if we choose not to.

For our example it is simplest to just leave the data in serialized form:

newtype Certificate = Certificate Strict.ByteString
  deriving newtype (BorshSize, ToBorsh, FromBorsh)
  deriving newtype (IsRaw)
  deriving (Show, Data.Structured.Show, IsString) via AsBase64 Certificate

newtype SecretKey = SecretKey (FixedSizeArray 32 Word8)
  deriving newtype (BorshSize, BorshMaxSize, ToBorsh, FromBorsh)
  deriving newtype (IsRaw)
  deriving (Show, Data.Structured.Show, IsString) via AsBase64 SecretKey

Some comments:

  • BorshSize, ToBorsh, FromBorsh and BorshMaxSize come from the Haskell borsh library. We will see the use of BorshSize and BorshMaxSize when we discuss bounded size data.

  • IsRaw is a type class from the foreign-rust library capturing “raw” values: values that are essentially just bytestrings:

    class IsRaw a where
      rawSize :: a -> Word32
      toRaw   :: a -> Lazy.ByteString
      fromRaw :: Lazy.ByteString -> Either String a
  • FixedSizeArray is a datatype from the Haskell borsh package which corresponds to bytestrings of a known, static, length; FixedSizeArray 32 Word8 is a precise analogue of [u8; 32] in Rust.

  • Data.Structured.Show, from foreign-rust, is like Show from the prelude, but producing a structured value, which can be pretty-printed a bit nicer. It’s similar to the PrettyVal class from the pretty-show package, but unlike PrettyVal (and like Show from the prelude), its pretty-printed values are valid Haskell.

  • Finally, AsBase64 is a newtype that can be used to conveniently derive Show, Data.Structured.Show and IsString instances, all using a base-64 encoding. Similarly foreign-haskell also provides AsBase16, AsBase58, and AsDecimal (list of decimal values).

With our datatypes defined, we can now bind our function:

{# fun unsafe rust_wrapper_generate_simple_self_signed as rustWrapperSelfSigned
     { toBorshVar*  `[Text]'&
     , getVarBuffer `Buffer (Certificate, SecretKey)'&
     }
  -> `()'
#}

This is not really any more difficult than the function which just passed numbers around: c2hs provides explicit support for arguments that correspond to a single argument Haskell-side and two arguments C-side (that’s what the ampersand & means), and it allows us to define specific marshalling functions; we use toBorshVar and getVarBuffer, both from foreign-rust. The syntax is a bit arcane, but the good news is that all functions you wrap will look very similar.

Using the C function

The signature of the Haskell function that c2hs made for us is not quite as convenient as we might like:

rustWrapperSelfSigned :: [Text] -> Buffer (Certificate, SecretKey) -> IO ()

When we discussed the Rust-side function, we mentioned that it expects a buffer to write its output to, along with the size of that buffer. Since we are trying to avoid managing memory allocated Rust-side in Haskell, or vice versa, we will create that buffer Haskell side; but what size buffer should we allocate? The generated function just punts on this question, and doesn’t allocate a buffer at all.

But not to worry, foreign-rust has us covered. The main function it provides here is withBorshVarBuffer:

withBorshVarBuffer :: (FromBorsh a, ..) => (Buffer a -> IO ()) -> IO a

It will allocate a 1kB buffer and then call the function; if it turns out this buffer is not large enough, the Rust-side function will tell it what the right size buffer is, and so it will just try once more with a larger buffer. We can use this to provide a selfSigned function with the signature we’d expect:

selfSigned :: [Text] -> IO (Certificate, SecretKey)
selfSigned = withBorshVarBuffer . rustWrapperSelfSigned

We can try this now in ghci:

ghci> selfSigned ["example.com"]
("MIIB..uZ04","mjAEvFcSD1DD8ZTf9hCSbCJjA259wI+rmlXQA5JU8Oc=")

Working with foreign values

We now have a Haskell side representation of the Rust Certificate type but we can’t yet do much with it; in this section we will therefore explore this a bit more.

Binding another function

Let’s bind another function, which returns the certificate’s “subject.” Rust-side, we can define

#[no_mangle]
pub extern "C" fn rust_wrapper_get_certificate_subject(
    cert: *const u8,
    cert_len: usize,
    out: *mut u8,
    out_len: &mut usize,
) {
    let cert: Certificate = marshall_from_haskell_var(cert, cert_len, RW);
    let result = format!("{}", cert.tbs_certificate.subject);
    marshall_to_haskell_var(&result, out, out_len, RW);
}

This function has exactly the same shape as the previous we wrote (indeed, an important goal of the haskell-ffi/foreign-rust library pair is precisely to make this kind of work as “boring” as possible). The c2hs declaration Haskell-side is also very similar:

{# fun unsafe rust_wrapper_get_certificate_subject as rustWrapperCertificateSubject
     { toBorshVar*  `Certificate'&
     , getVarBuffer `Buffer Text'&
     }
  -> `()'
#}

Unlike selfSigned, however, which really must live in IO (each time the function is called, it produces a different certificate), this function is morally pure:

certificateSubject :: Certificate -> Text
certificateSubject = withPureBorshVarBuffer . rustWrapperCertificateSubject

We can try it out:

ghci> (cert, pkey) <- selfSigned ["example.com"]
ghci> certificateSubject cert
"CN=rcgen self signed cert"

Using the IsString instance

We mentioned above that foreign-rust introduces Data.Structured.Show, to replace PrettyVal from pretty-show, in order to ensure that pretty-printed values are still valid Haskell. Indeed, we derived IsString for Certificate above, which means that we can denote Rust-side values in our Haskell code:

ghci> certificateSubject "MIIB..uZ04" -- same string that we got above
"CN=rcgen self signed cert"

This can be very useful when experimenting, in (regression) tests, etc.

Annotations

It’s nice that we can show and even denote Rust-side values in Haskell, but a long base-64 string is not the most informative. To make things like debugging a little easier, foreign-rust therefore provides a way to annotate values:

class CanAnnotate a where
  type Annotated a :: Type
  annotate       :: a -> Annotated a
  dropAnnotation :: Annotated a -> a

In many (but by no means all) cases, an annotated form of a value just pairs that value with some additional value; we can use this for Certificate:

deriving
  via PairWithAnnotation Certificate
  instance CanAnnotate Certificate

type instance Annotation Certificate = Text

instance ComputeAnnotation Certificate where
  computeAnnotation = certificateSubject

Trying it out:

ghci> (cert, pkey) <- selfSigned ["example.com"]
ghci> annotate cert
WithAnnotation {value = "MIIB..uZ04", annotation = "CN=rcgen self signed cert"}

or we can use Data.Structured to make this a little cleaner:

ghci> Data.Structured.print $ annotate cert
WithAnnotation {
  value = "MIIB..uZ04"
, annotation = "CN=rcgen self signed cert"
}

Annotations are designed to be “transitive” (and there is support for generically deriving CanAnnotate for your own types if you just want to transitively get all annotations). As a simple example, here’s what we get if we annotate something of type [Maybe Certificate]:

ghci> Data.Structured.print $ annotate [Just cert]
WithAnnotation {
  value = [
      Just
        WithAnnotation {
          value = "MIIB..uZ04"
        , annotation = "CN=rcgen self signed cert"
        }
    ]
, annotation = Length 1
}

This can be very helpful during debugging (there is also dropAnnotation which goes in the opposite direction).

Fixed size data

When we discussed binding rust_wrapper_self_signed, we said that withBorshVarBuffer will allocate an initial buffer of a certain size, then call the Rust function, hoping the buffer will be big enough, and then call the Rust function a second time if it turns out the buffer was too small after all.

If we know ahead of time how big the value will be, however, we can do better. For example, we know that the size of (this type of) a secret key is always 32 bytes; indeed, we said so right in the type:

newtype SecretKey = SecretKey (FixedSizeArray 32 Word8)

Rust

Let’s consider binding to a Rust function which constructs an example key, generated from a PRNG with specified seed:

#[no_mangle]
pub extern "C" fn rust_wrapper_example_key(seed: u64, out: *mut u8, out_len: usize) {
    let mut prng: StdRng = StdRng::seed_from_u64(seed);
    let result: SecretKey = SecretKey::random(&mut prng);
    marshall_to_haskell_fixed(&result, out, out_len, RW);
}

The seed is a simple C value so no need for any serialization. This is not true for the result of the function, but unlike before, the size of the output is statically known. This means we can use marshall_to_haskell_fixed Rust-side, instead of marshall_to_haskell_var; usage is almost identical, except that the out_len is now a simple usize, rather than a pointer to a usize: the haskell-ffi Rust code will verify the size of the buffer allocated Haskell-side, and panic if it’s not of the right size (this would be a bug), but there is no need for it to communicate a new size back to the Haskell code.

This depends on an additional trait which provides the size:

pub trait HaskellSize<Tag> {
    fn haskell_size(tag: PhantomData<Tag>) -> usize;
}

This class comes with all the instances we’d expect for the Borsh serialization format; for example, we have

impl<Tag, T: HaskellSize<Tag>, const N: usize> HaskellSize<Tag> for [T; N] {
    fn haskell_size(tag: PhantomData<Tag>) -> usize {
        T::haskell_size(tag) * N
    }
}

(This instance uses what is known as “const generics” in Rust parlance; in the Haskell world that const N: usize parameter would correspond to a KnownNat n constraint.) There is also a macro available you can use to derive HaskellSize for your own structs (enums do not have a statically known size).

Haskell

The c2hs declaration of this function looks like this:

{# fun pure unsafe rust_wrapper_example_key as exampleKey
     {                   `Word64'
     , allocFixedBuffer- `SecretKey'& fromBorsh*
     }
  -> `()'
#}

Since we are sure we only need to call the Rust function once (with an appropriately size buffer), we can do everything right within the c2hs incantation: allocFixedBuffer will allocate the appropriate buffer before calling the function, and fromBorsh will get the value from the buffer afterwards. Moreover, since this function is now morally pure, we can use the c2hs keyword for pure functions, and the signature of the function constructed by c2hs is exactly what we’d expect, with no further wrapping required:

exampleKey :: Word64 -> SecretKey

Bounded size data

For the case where there is no fixed size, but there is a bound on the size, we have marshall_to_haskell_max on the Rust side (depending on a HaskellMaxSize trait) and allocMaxBuffer on the Haskell side (depending on a BorshMaxSize class). The most important use case for this is Rust’s Option type, corresponding to Maybe in Haskell. For example, here is a Rust function which deserializes a secret key in PEM format:

#[no_mangle]
pub extern "C" fn rust_wrapper_key_from_pem(
    key: *const u8,
    key_len: usize,
    out: *mut u8,
    out_len: usize,
) {
    let key: String = marshall_from_haskell_var(key, key_len, RW);
    let result: Option<SecretKey> = match elliptic_curve::SecretKey::from_sec1_pem(&key) {
        Ok(key) => Some(key),
        Err(elliptic_curve::Error) => None,
    };
    marshall_to_haskell_max(&result, out, out_len, RW);
}

and here is the corresponding c2hs binding:

{# fun pure unsafe rust_wrapper_key_from_pem as fromPem
     { toBorshVar*     `Text'&
     , allocMaxBuffer- `Maybe SecretKey'& fromBorsh*
     }
  -> `()'
#}

As before, no additional wrapping is necessary Haskell-side; the signature of the function constructed by c2hs is

fromPem :: Text -> Maybe SecretKey

Composite types

Suppose we have this datatype Rust-side:

#[derive(serde::Serialize, serde::Deserialize, BorshSerialize, BorshDeserialize, HaskellSize)]
pub struct Color {
    r: f64,
    g: f64,
    b: f64,
}

We can piggyback on the BorshSerialize and BorshDeserialize instances derived by macros from the borsh crate to define our FromHaskell and ToHaskell instances:

impl<Tag> ToHaskell<Tag> for Color {
    fn to_haskell<W: Write>(&self, writer: &mut W, _tag: PhantomData<Tag>) -> Result<()> {
        self.serialize(writer)?;
        Ok(())
    }
}

impl<Tag> FromHaskell<Tag> for Color {
    fn from_haskell(buf: &mut &[u8], _tag: PhantomData<Tag>) -> Result<Self> {
        let x = Color::deserialize(buf)?;
        Ok(x)
    }
}

Here is a simple function that constructs the “red” color:

#[no_mangle]
pub extern "C" fn rust_wrapper_red(out: *mut u8, out_len: usize) {
    let result = Color {
        r: 1.0,
        g: 0.0,
        b: 0.0,
    };
    marshall_to_haskell_fixed(&result, out, out_len, RW);
}

We now have two choices how we represent this datatype Haskell-side: we can represent it by a proper Haskell value, or we can leave the Haskell-side representation opaque. We will consider these separately.

Structured Haskell representation

The cleanest representation of this datatype is of course the corresponding Haskell datatype

data Color = Color { r :: Double, g :: Double, b :: Double }
  deriving stock (Show, GHC.Generic)
  deriving anyclass (SOP.Generic, SOP.HasDatatypeInfo)
  deriving anyclass (Data.Structured.Show)
  deriving CanAnnotate via NoAnnotation Color
  deriving (BorshSize, ToBorsh, FromBorsh) via AsStruct Color

After we have derived the necessary instances, interacting with Rust is trivial; for example, here’s how we can bind the red function:

{# fun pure unsafe rust_wrapper_red as red
     { allocFixedBuffer- `Color'& fromBorsh*
     }
  -> `()'
#}

Nothing else to do.

ghci> red
Color {r = 1.0, g = 0.0, b = 0.0}

Opaque Haskell representation

However, in some cases we don’t want to try and parse the value Haskell-side; perhaps it’s just unnecessarily difficult, or perhaps we consider the exact serialized form of the Rust value an implementation detail of the Rust code. Perhaps we don’t even have a Haskell-side representation, and all we have is a pointer to a value on the Rust heap (see section Avoiding serialization).

For example, we might represent Color as

newtype Color = Color (FixedSizeArray 24 Word8)
  deriving newtype (BorshSize, ToBorsh, FromBorsh)
  deriving newtype (IsRaw)

Even with this representation, however, we might still prefer a more informative Show instance. The final trick that foreign-rust has up its sleeve is support for “Rust side JSON serialization/deserialization.” This works as follows. First, we define functions Rust-side that convert a value to and from JSON. In most cases, this is easy, because we can derive serde Serialize and Deserialize instances, and then use serde_json:

#[no_mangle]
pub extern "C" fn rust_wrapper_color_from_json(
    json: *const u8,
    json_len: usize,
    out: *mut u8,
    out_len: &mut usize,
) {
    let json: String = marshall_from_haskell_var(json, json_len, RW);
    let result: core::result::Result<Color, serde_json::Error> = serde_json::from_str(&json);
    marshall_result_to_haskell_var(&result, out, out_len, RW);
}

#[no_mangle]
pub extern "C" fn rust_wrapper_color_to_json(
    color: *const u8,
    color_len: usize,
    out: *mut u8,
    out_len: &mut usize,
) {
    let color: Color = marshall_from_haskell_var(color, color_len, RW);
    let json: String = serde_json::to_string(&color).unwrap();
    marshall_to_haskell_var(&json, out, out_len, RW);
}

Binding to these functions follows the now-familiar pattern:

{# fun unsafe rust_wrapper_color_from_json as rustWrapperColorFromJSON
     { toBorshVar*  `Text'&
     , getVarBuffer `Buffer (Either Text Color)'&
     }
  -> `()'
#}

{# fun unsafe rust_wrapper_color_to_json as rustWrapperColorToJSON
     { toBorshFixed* `Color'&
     , getVarBuffer  `Buffer Text'&
     }
  -> `()'
#}

with corresponding wrappers:

colorToJSON :: Color -> JSON
colorToJSON = withPureBorshVarBuffer . rustWrapperColorToJSON

colorFromJSON :: HasCallStack => JSON -> Either Failure Color
colorFromJSON = first mkFailure . withPureBorshVarBuffer . rustWrapperColorFromJSON

where JSON is a newtype wrapper around a lazy bytestring, and Failure pairs a Text error message with a CallStack. With these functions defined, we can provide instances for ToJSON and FromJSON instances from Foreign.Rust.External.JSON:

instance External.ToJSON Color where
  toJSON = colorToJSON

instance External.FromJSON Color where
  fromJSON = colorFromJSON

This gives us two things: we can, if we wish, derive standard aeson FromJSON and ToJSON instances; but we can also use JSON in our Show instance:

deriving via UseExternalJSON Color instance Aeson.ToJSON   Color
deriving via UseExternalJSON Color instance Aeson.FromJSON Color

deriving via AsJSON Color instance Show                 Color
deriving via AsJSON Color instance Data.Structured.Show Color

Trying it out:

ghci> red
asJSON @Color
  [aesonQQ|
    {
      "b": 0.0
    , "g": 0.0
    , "r": 1.0
    }
  |]

As mentioned above, the library always attempts to ensure that Show produces valid Haskell expressions. If we are using JSON, it does this by using the aesonQQ quasi-quoter, along with a wrapper and a type annotation, to avoid ambiguous type errors.

Avoiding serialization

Finally, we will consider when we don’t want to use serialization to transfer values between Haskell and Rust (because it’s too expensive), or we can’t use it (perhaps it’s a value that cannot be serialized). For example, suppose we have this Rust-side type of handles:

pub struct Handle(usize);
impl Drop for Handle
pub fn new_handle() -> Handle

We can expose C functions in our Rust code that allocate a handle, query a handle’s ID, and free a handle:

#[no_mangle]
pub extern "C" fn rust_wrapper_new_handle() -> *mut Handle {
    let handle: Box<Handle> = Box::new(new_handle());
    Box::into_raw(handle)
}

#[no_mangle]
pub extern "C" fn rust_wrapper_handle_id(handle: *mut Handle) -> usize {
    let handle: &Handle = unsafe { &*handle };
    handle.0
}

#[no_mangle]
pub extern "C" fn rust_wrapper_free_handle(handle: *mut Handle) {
    let _handle: Box<Handle> = unsafe { Box::from_raw(handle) };
}

On the Haskell side, we can use c2hs to create a newtype around a pointer to a handle, expose C functions that allocate a handle and query its ID, and use rust_wrapper_free_handle as the finalizer (called by the garbage collector):

{#pointer *Handle foreign finalizer rust_wrapper_free_handle newtype #}

{# fun unsafe rust_wrapper_new_handle as newHandle
      {
      }
   -> `Handle'
#}

{# fun pure unsafe rust_wrapper_handle_id as handleId
      { `Handle'
      }
   -> `Word64'
#}

The signatures generated by c2hs are

newtype Handle = Handle (ForeignPtr Handle)

newHandle :: IO Handle
handleId  :: Handle -> Word64

The biggest drawback of this approach is that we now no longer have any representation of these values Haskell-side; we cannot provide a “legal” Show instances. This can be quite inconvenient, especially in tests. Managing values on the Rust heap through the Haskell GC (even if we are using the Rust-side deallocator) is also simply more error prone, and if things go wrong, hard to debug. It is probably only the better choice if serialization is either impossible or prohibitively expensive.

Efficiency

The design of haskell-ffi and foreign-rust is optimized for ease of integration, not necessarily for optimal performance. This is almost certainly fine for most applications, but you probably don’t want the overhead of serialization and deserialization when doing FFI calls to Rust in a tight Haskell loop (of course, it is almost never a good idea to do that anyway).

Alongside withBorshVarBuffer, foreign-rust offers withBorshBufferOfInitSize which can be used to specify a different initial buffer size, which can be used to reduce the probability that a second round-trip is necessary (in the case that the initial buffer was not big enough). In principle, you could use this in conjunction with a Rust-side function that computes the required buffer size, but there isn’t much point: this would still require two FFI calls, with the same parameters; if there is a cheap way Rust-side to compute the necessary buffer size, then that behaviour should just be baked into the one Rust function: check if the allocated buffer is big enough before doing anything else. The standard marshalling functions offered by haskell-ffi do not do this, since in general it is difficult to know exactly how large the serialized form of some data is without actually serializing it.

In the case where a Rust function must really only be called once (perhaps because it has side effects), you can choose to forgo serialization altogether, as we described above in Avoiding serialization. Alternatively, foreign-rust offers a hybrid approach, where we allocate a buffer Rust-side, pass a pointer to the buffer to Haskell, deserialize it Haskell-side, and then free the buffer when no longer required. For our example where we convert a secret key to PEM, the Rust-side wrapper would look like this:

#[no_mangle]
pub extern "C" fn rust_wrapper_key_to_pem_external(key: *const u8, key_len: usize) -> *mut Vec<u8> {
    // .. construction of `result` exactly as before
    marshall_to_haskell_external(result, RW)
}

with Haskell counter-part:

{# fun pure unsafe rust_wrapper_key_to_pem_external as toPemExternal
     { toBorshFixed* `SecretKey'&
     }
  -> `Text' fromExternalBorsh*
#}

The type of the function constructed by c2hs is then SecretKey -> Text. The advantages of this approach is that no initial buffer size needs to be estimated (we just use whatever buffer was allocated Rust-side), a second round-trip is guaranteed not to be needed, and we avoid copying the buffer. We still have the serialization/deserialization overhead, of course, and—perhaps more importantly—it is difficult to predict quite how long we will hold on to that Rust-allocated buffer. The deserializer might return values that directly or indirectly point to that buffer, and since these buffers are allocated on the Rust heap, not the Haskell heap, memory profiling might be difficult. In most cases, this approach is therefore probably not the right choice.

Conclusions

This was a long blog post, so let’s summarize:

  • Expose extern "C" functions in your Rust-code; you can use the Rust library haskell-ffi to serialize and deserialize data in a convenient manner.
  • Build your Rust library with cargo cbuild, to generate a header file and a pkg-config file.
  • Declare a pkg-config dependency on the Rust library in your cabal file, as well as a dependency on the build tool (preprocessor) c2hs.
  • Use c2hs to add bindings to the C functions; the Haskell library foreign-rust is a companion library to haskell-ffi that makes this process very streamlined.
  • For data types with a fixed size encoding, the c2hs declaration might be all you need; otherwise you will write a simple wrapper function, again using functionality from foreign-rust.
  • To Show Rust-side values, foreign-rust offers various ways, which show a value in base-16, base-58, base-64, or JSON format; each of these generate valid Haskell, so that you can denote Rust-side values within your Haskell source code.
  • In addition, foreign-rust offers functionality for annotating values with additional information, which can be quite helpful to get further information about Rust-side values during debugging.
  • Finally, if serialization of Rust-side values is undesirable or impossible, you can just pass pointers back and forth, using the Haskell garbage collector to call the Rust-side deallocator when a value is no longer in use. However, when you do this, you will have no way of denoting these values Haskell-side.


from Hacker News https://ift.tt/W2h6vUz

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.