Calling hell from heaven and heaven from hell is a classic paper from the previous century, introducing the Haskell foreign function interface (FFI). It describes the facilities that Haskell offers for calling functions written in C (and vice versa). In this blog post, we will consider how to call functions written in Rust instead: not quite hell, but not quite heaven either.
We will make use of two libraries that we wrote to streamline this process: a Haskell-side library called foreign-rust, and a Rust-side library called haskell-ffi. We developed these libraries as part of the development of Be, a (smart) contract platform; we are thankful to them for making these libraries open source. That said, this blog post should also be useful for people who do not want to use these libraries, and indeed, we will also show examples of interop that do not rely on them. The source code for the examples discussed in this blog post can be found at https://github.com/well-typed/blogpost-purgatory.
Getting started
Binding to Rust functions from Haskell is not quite as convenient as binding to C functions. The common denominator between Rust and Haskell is C, and so we have to do two things: we have to write a Rust-side wrapper that exposes the functionality we want to bind against as C functions, and then write Haskell-side bindings against these C functions.
Our running example in this blog post will therefore consist of a Rust library which we will call rust-wrapper
, and a Haskell library which we will call haskell-wrapper
. To get us started, let’s see if we can pass two numbers from Haskell to Rust, add them Rust-side, and then print the result Haskell-side.
Rust
Create a new Rust crate for our new rust-wrapper
library, and then add the following to the Cargo.toml
file:
[dependencies]
haskell-ffi.git = "https://github.com/BeFunctional/haskell-rust-ffi.git"
haskell-ffi.rev = "2bf292e2e56eac8e9fb0fb2e1450cf4a4bd01274"
[features]
capi = []
[package.metadata.capi.library]
versioning = false
After declaring the dependency on the haskell-ffi
library, the features
and package.metadata.capi.library
sections are for the benefit of cargo cbuild; we will see momentarily how to use this tool. First, however, add this function to the library’s lib.rs
:
#[no_mangle]
pub extern "C" fn rust_wrapper_add(x: u64, y: u64) -> u64 {
x + y
}
The extern "C"
directive tells the Rust compiler that this function should use the C calling convention. The no_mangle
attribute ensures that the Rust compiler won’t change the name of our function to something unrecognizable, so that we know what the function is called in our Haskell bindings. This does mean that the function name should be unique across any C libraries that we might link against, which is why we will prefix the names of all external functions with rust_wrapper_
. (See Calling Rust code from C from the Rustonomicon for more details.)
Now compile the library with
cargo cbuild
(you might need to install the cargo-c applet for cargo
first.) This will create a bunch of files, but three are of particular interest:
-
target/<arch>/debug/librust_wrapper.so
: this is the shared object that our Haskell application will need to link against. -
target/arch>/debug/rust_wrapper.h
: this is the C header file that we will need to compile our Haskell-side bindings. For our running example so far, this header will containuint64_t rust_wrapper_add(uint64_t x, uint64_t y);
-
target/<arch>/debug/rust_wrapper-uninstalled.pc
: this is a pkg-config file which contains the C compiler and linker flags (including paths) that we will need Haskell-side to know where the.so
and.h
files that we just described are.
Finally, we will need to set two environment variables; the first will ensure that we can find the pkg-config
file, and the second ensures that when we run our application (after building and linking it), the .so
file can still be found:
export PKG_CONFIG_PATH=<path>/rust-wrapper/target/<arch>/debug
export LD_LIBRARY_PATH=<path>/rust-wrapper/target/<arch>/debug
Haskell
On the Haskell side, create a new package, and then add this to the library
section of the .cabal
file:
build-depends: .., foreign-rust
build-tool-depends: c2hs:c2hs
pkgconfig-depends: rust_wrapper-uninstalled
The first declares a dependency on c2hs; this is a preprocessor that we will use to write our bindings; the second declares the dependency on the Rust library; cabal
will use pkg-config
to figure out which compiler and linker flags are required (thereby also figuring out where that Rust library is). While the library is not yet released to Hackage, we’ll need to add the repo to our cabal.project
file:
source-repository-package
type: git
location: https://github.com/BeFunctional/haskell-foreign-rust.git
tag: 90b1c210ae4e753c39481a5f3b141b74e6b6d96e
For this simple example we don’t benefit much from c2hs
, but will nonetheless use it to bind to our add function, to give us a chance to introduce its basic syntax. (For a detailed discussion of the syntax of c2hs
, see the c2hs User Guide.)
module C.GettingStarted (rustWrapperAdd) where
#include "rust_wrapper.h"
import Data.Word
{# fun pure unsafe rust_wrapper_add as rustWrapperAdd
{ `Word64'
, `Word64'
}
-> `Word64'
#}
This declares a function which
-
is called
rust_wrapper_add
C-side, but should be calledrustWrapperAdd
Haskell-side -
has two arguments, both of type
Word64
-
has a result also of type
Word64
-
is pure: the signature of the function should be
rustWrapperAdd :: Word64 -> Word64 -> Word64
rather than
rustWrapperAdd :: Word64 -> Word64 -> IO Word64 -- unnecessary
(because calling the function twice with the same inputs will give the same results)
-
uses an unsafe call: unsafe calls can be used for foreign functions that do not call back into the Haskell runtime; this gives the RTS some guarantees which it can take advantage of to make the foreign call more efficient (see also Guaranteed call safety in the ghc manual)
This should be sufficient; if we now start a repl (cabal repl
) and import our module, we should be able to test our function:
ghci> rustWrapperAdd 2 3
5
Marshalling data
It is easy enough to ferry individual Word64
over and back, but Haskell and Rust are both languages with rich type systems. If we want to transfer more complex values across the language barrier, we have two choices: we can either serialize and deserialize, or we can use pointers to the data. The first option is the easier and less fragile, as it avoids Haskell-side managing of values that live on the Rust-side heap; it is this approach that the haskell-ffi
and foreign-rust
libraries aim to streamline. We will consider the second option in section Avoiding serialization.
For a more realistic example, therefore, we will consider binding against a Rust function that constructs a self-signed x509 certificate with corresponding secret key, given a list of domain names:
fn generate_simple_self_signed(alt_names: Vec<String>) -> (Certificate, SecretKey)
Interlude: (no) orphans in Rust
Although of course the precise definitions differ, the basic concept of an orphan instance is similar in Haskell and in Rust. An orphan instance is an instance of some type class (or trait) C
for some type T
or
which is not “bundled with” either the definition of C
or the definition of T
(where “bundled with” means “same module” in Haskell, and “same package” in Rust). This ensures coherence: it can never happen that we get two different instances in scope (for the same C
and T
) when we import two different modules.
However, where the introduction of an orphan instance in Haskell is merely a compiler warning, which we can choose to ignore (thereby taking on the responsibility ourselves to ensure coherence), in Rust it is an error: we really cannot introduce an orphan instance.
This can be quite a serious limitation. For example, suppose we want to have a (Rust side) type class for “things we can marshall to Haskell.” If this type class is defined in an external package, and we want to marshall a type defined in a different package, unless there is instance already defined in either of these two packages, we are stuck: we cannot provide an instance ourselves (since it would be an orphan). The haskell-ffi
library therefore adopts a workaround, which we will discuss now.
Marshalling in haskell-ffi
Central to the haskell-ffi
library is the definition of two traits (type classes), for data that can be marshalled to and from Haskell respectively:
pub type Error = Box<dyn std::error::Error + Send + Sync>;
pub type Result<T> = core::result::Result<T, Error>;
pub trait ToHaskell<Tag> {
fn to_haskell<W: Write>(&self, writer: &mut W, tag: PhantomData<Tag>) -> Result<()>;
}
pub trait FromHaskell<Tag>: Sized {
fn from_haskell(buf: &mut &[u8], tag: PhantomData<Tag>) -> Result<Self, Error>;
}
These are similar to the BorshSerialize and BorshDeserialize traits from the borsh crate (package), and indeed, ToHaskell
and FromHaskell
have all the standard instances that make it compatible with the Borsh binary serialization format.
The definition of these two traits might look a bit obscure to a Haskell programmer; a rough Haskell translation might be
class ToHaskell tag a where -- illustration only
toHaskell :: forall w. Write w => a -> IORef w -> Proxy tag -> IO ()
class Sized a => FromHaskell tag a where -- illustration only
fromHaskell :: IORef (Vector Word8) -> Proxy tag -> IO a
Some points:
-
The definition of
Error
is describing a boxed value of existential type, which is required to satisfy a few instance (aka implement a few traits), most notablystd::error::Error
; this corresponds nearly 1:1 withSomeException
in Haskell:data SomeException = forall e. Exception e => SomeException e
-
W: Write
is known as a trait bound in Rust, and corresponds to ad-hoc polymorphism in Haskell. -
Rust does not really have multi-parameter type classes; the additional
Tag
parameter is an example of what is (confusingly) called generics in Rust, and corresponds roughly to parametric polymorphism in Haskell (although the two concepts don’t align perfectly). -
PhantomData
in Rust, likeProxy
in Haskell, serves only as a hint to the type checker: here, to determine the typeTag
.
The Tag
argument allows us to work around the no-orphans limitation. The haskell-ffi
library can introduce instances that are polymorphic in a choice of tag, such as
impl<Tag, T: ToHaskell<Tag>> ToHaskell<Tag> for Option<T>
corresponding to
instance ToHaskell tag t => ToHaskell tag (Maybe t) -- illustration only
but, as we shall see momentarily, we can also introduce additional instances in other libraries (such as our rust-wrapper
library), provided that we choose a specific tag.
Example: Rust
Let’s now get back to our example. Recall that we want to bind to a Rust function with this signature:
fn generate_simple_self_signed(alt_names: Vec<String>) -> (Certificate, SecretKey)
To do that, we need to write a wrapper function that we expose as a C function. The wrapper will have two arguments for each argument of the function we are wrapping, as well as two arguments for the result:
#[no_mangle]
pub extern "C" fn rust_wrapper_generate_simple_self_signed(
alt_names: *const u8,
alt_names_len: usize,
out: *mut u8,
out_len: &mut usize,
) {
..
}
For each argument of the original function, we have a pair of C arguments: the first is a pointer to a buffer containing a serialized form of the argument, and the second is the length of that buffer. For the result of the original function we likewise have a pair of C arguments: the first points to a buffer that the result will be serialized to, and the second must initially contain the size of that buffer, and is overwritten to contain the required size of the buffer when the function returns (so that the caller can try again if the buffer is too small; see Using the C function, below).
Before we can write the body of the wrapper, we need to choose a Tag
to use (see previous section):
pub enum RW {}
pub const RW: PhantomData<RW> = PhantomData;
RW
(for rust-wrapper
) is just an empty type; it will only serve as a type-level tag. The body of our wrapper function is now simple:
pub extern "C" fn rust_wrapper_generate_simple_self_signed(
alt_names: *const u8,
alt_names_len: usize,
out: *mut u8,
out_len: &mut usize,
) {
let alt_names: Vec<String> = marshall_from_haskell_var(alt_names, alt_names_len, RW);
let result = generate_simple_self_signed(alt_names);
marshall_to_haskell_var(&result, out, out_len, RW);
}
We will discuss the _var
suffix in section on bounded size data.
Example: Haskell
On the Haskell side, we first have to decide what we want to do with the serialized data we get from Rust. We can try to deserialize it, or we could just leave it in serialized form, relying on Rust-side functions to interact with it. The foreign-rust
library helps us with deserialization if we choose to do so, and provides tools for working with serialized data if we choose not to.
For our example it is simplest to just leave the data in serialized form:
newtype Certificate = Certificate Strict.ByteString
deriving newtype (BorshSize, ToBorsh, FromBorsh)
deriving newtype (IsRaw)
deriving (Show, Data.Structured.Show, IsString) via AsBase64 Certificate
newtype SecretKey = SecretKey (FixedSizeArray 32 Word8)
deriving newtype (BorshSize, BorshMaxSize, ToBorsh, FromBorsh)
deriving newtype (IsRaw)
deriving (Show, Data.Structured.Show, IsString) via AsBase64 SecretKey
Some comments:
-
BorshSize
,ToBorsh
,FromBorsh
andBorshMaxSize
come from the Haskell borsh library. We will see the use ofBorshSize
andBorshMaxSize
when we discuss bounded size data. -
IsRaw
is a type class from theforeign-rust
library capturing “raw” values: values that are essentially just bytestrings:class IsRaw a where rawSize :: a -> Word32 toRaw :: a -> Lazy.ByteString fromRaw :: Lazy.ByteString -> Either String a
-
FixedSizeArray
is a datatype from the Haskellborsh
package which corresponds to bytestrings of a known, static, length;FixedSizeArray 32 Word8
is a precise analogue of[u8; 32]
in Rust. -
Data.Structured.Show
, fromforeign-rust
, is likeShow
from the prelude, but producing a structured value, which can be pretty-printed a bit nicer. It’s similar to thePrettyVal
class from the pretty-show package, but unlikePrettyVal
(and likeShow
from the prelude), its pretty-printed values are valid Haskell. -
Finally,
AsBase64
is a newtype that can be used to conveniently deriveShow
,Data.Structured.Show
andIsString
instances, all using a base-64 encoding. Similarlyforeign-haskell
also providesAsBase16
,AsBase58
, andAsDecimal
(list of decimal values).
With our datatypes defined, we can now bind our function:
{# fun unsafe rust_wrapper_generate_simple_self_signed as rustWrapperSelfSigned
{ toBorshVar* `[Text]'&
, getVarBuffer `Buffer (Certificate, SecretKey)'&
}
-> `()'
#}
This is not really any more difficult than the function which just passed numbers around: c2hs
provides explicit support for arguments that correspond to a single argument Haskell-side and two arguments C-side (that’s what the ampersand &
means), and it allows us to define specific marshalling functions; we use toBorshVar
and getVarBuffer
, both from foreign-rust
. The syntax is a bit arcane, but the good news is that all functions you wrap will look very similar.
Using the C function
The signature of the Haskell function that c2hs
made for us is not quite as convenient as we might like:
rustWrapperSelfSigned :: [Text] -> Buffer (Certificate, SecretKey) -> IO ()
When we discussed the Rust-side function, we mentioned that it expects a buffer to write its output to, along with the size of that buffer. Since we are trying to avoid managing memory allocated Rust-side in Haskell, or vice versa, we will create that buffer Haskell side; but what size buffer should we allocate? The generated function just punts on this question, and doesn’t allocate a buffer at all.
But not to worry, foreign-rust
has us covered. The main function it provides here is withBorshVarBuffer
:
withBorshVarBuffer :: (FromBorsh a, ..) => (Buffer a -> IO ()) -> IO a
It will allocate a 1kB buffer and then call the function; if it turns out this buffer is not large enough, the Rust-side function will tell it what the right size buffer is, and so it will just try once more with a larger buffer. We can use this to provide a selfSigned
function with the signature we’d expect:
selfSigned :: [Text] -> IO (Certificate, SecretKey)
selfSigned = withBorshVarBuffer . rustWrapperSelfSigned
We can try this now in ghci
:
ghci> selfSigned ["example.com"]
("MIIB..uZ04","mjAEvFcSD1DD8ZTf9hCSbCJjA259wI+rmlXQA5JU8Oc=")
Working with foreign values
We now have a Haskell side representation of the Rust Certificate
type but we can’t yet do much with it; in this section we will therefore explore this a bit more.
Binding another function
Let’s bind another function, which returns the certificate’s “subject.” Rust-side, we can define
#[no_mangle]
pub extern "C" fn rust_wrapper_get_certificate_subject(
cert: *const u8,
cert_len: usize,
out: *mut u8,
out_len: &mut usize,
) {
let cert: Certificate = marshall_from_haskell_var(cert, cert_len, RW);
let result = format!("{}", cert.tbs_certificate.subject);
marshall_to_haskell_var(&result, out, out_len, RW);
}
This function has exactly the same shape as the previous we wrote (indeed, an important goal of the haskell-ffi
/foreign-rust
library pair is precisely to make this kind of work as “boring” as possible). The c2hs
declaration Haskell-side is also very similar:
{# fun unsafe rust_wrapper_get_certificate_subject as rustWrapperCertificateSubject
{ toBorshVar* `Certificate'&
, getVarBuffer `Buffer Text'&
}
-> `()'
#}
Unlike selfSigned
, however, which really must live in IO (each time the function is called, it produces a different certificate), this function is morally pure:
certificateSubject :: Certificate -> Text
certificateSubject = withPureBorshVarBuffer . rustWrapperCertificateSubject
We can try it out:
ghci> (cert, pkey) <- selfSigned ["example.com"]
ghci> certificateSubject cert
"CN=rcgen self signed cert"
Using the IsString
instance
We mentioned above that foreign-rust
introduces Data.Structured.Show
, to replace PrettyVal
from pretty-show
, in order to ensure that pretty-printed values are still valid Haskell. Indeed, we derived IsString
for Certificate
above, which means that we can denote Rust-side values in our Haskell code:
ghci> certificateSubject "MIIB..uZ04" -- same string that we got above
"CN=rcgen self signed cert"
This can be very useful when experimenting, in (regression) tests, etc.
Annotations
It’s nice that we can show and even denote Rust-side values in Haskell, but a long base-64 string is not the most informative. To make things like debugging a little easier, foreign-rust
therefore provides a way to annotate values:
class CanAnnotate a where
type Annotated a :: Type
annotate :: a -> Annotated a
dropAnnotation :: Annotated a -> a
In many (but by no means all) cases, an annotated form of a value just pairs that value with some additional value; we can use this for Certificate
:
deriving
via PairWithAnnotation Certificate
instance CanAnnotate Certificate
type instance Annotation Certificate = Text
instance ComputeAnnotation Certificate where
computeAnnotation = certificateSubject
Trying it out:
ghci> (cert, pkey) <- selfSigned ["example.com"]
ghci> annotate cert
WithAnnotation {value = "MIIB..uZ04", annotation = "CN=rcgen self signed cert"}
or we can use Data.Structured
to make this a little cleaner:
ghci> Data.Structured.print $ annotate cert
WithAnnotation {
value = "MIIB..uZ04"
, annotation = "CN=rcgen self signed cert"
}
Annotations are designed to be “transitive” (and there is support for generically deriving CanAnnotate
for your own types if you just want to transitively get all annotations). As a simple example, here’s what we get if we annotate something of type [Maybe Certificate]
:
ghci> Data.Structured.print $ annotate [Just cert]
WithAnnotation {
value = [
Just
WithAnnotation {
value = "MIIB..uZ04"
, annotation = "CN=rcgen self signed cert"
}
]
, annotation = Length 1
}
This can be very helpful during debugging (there is also dropAnnotation
which goes in the opposite direction).
Fixed size data
When we discussed binding rust_wrapper_self_signed
, we said that withBorshVarBuffer
will allocate an initial buffer of a certain size, then call the Rust function, hoping the buffer will be big enough, and then call the Rust function a second time if it turns out the buffer was too small after all.
If we know ahead of time how big the value will be, however, we can do better. For example, we know that the size of (this type of) a secret key is always 32 bytes; indeed, we said so right in the type:
newtype SecretKey = SecretKey (FixedSizeArray 32 Word8)
Rust
Let’s consider binding to a Rust function which constructs an example key, generated from a PRNG with specified seed:
#[no_mangle]
pub extern "C" fn rust_wrapper_example_key(seed: u64, out: *mut u8, out_len: usize) {
let mut prng: StdRng = StdRng::seed_from_u64(seed);
let result: SecretKey = SecretKey::random(&mut prng);
marshall_to_haskell_fixed(&result, out, out_len, RW);
}
The seed is a simple C value so no need for any serialization. This is not true for the result of the function, but unlike before, the size of the output is statically known. This means we can use marshall_to_haskell_fixed
Rust-side, instead of marshall_to_haskell_var
; usage is almost identical, except that the out_len
is now a simple usize
, rather than a pointer to a usize
: the haskell-ffi
Rust code will verify the size of the buffer allocated Haskell-side, and panic if it’s not of the right size (this would be a bug), but there is no need for it to communicate a new size back to the Haskell code.
This depends on an additional trait which provides the size:
pub trait HaskellSize<Tag> {
fn haskell_size(tag: PhantomData<Tag>) -> usize;
}
This class comes with all the instances we’d expect for the Borsh serialization format; for example, we have
impl<Tag, T: HaskellSize<Tag>, const N: usize> HaskellSize<Tag> for [T; N] {
fn haskell_size(tag: PhantomData<Tag>) -> usize {
T::haskell_size(tag) * N
}
}
(This instance uses what is known as “const generics” in Rust parlance; in the Haskell world that const N: usize
parameter would correspond to a KnownNat n
constraint.) There is also a macro available you can use to derive HaskellSize
for your own structs (enums do not have a statically known size).
Haskell
The c2hs
declaration of this function looks like this:
{# fun pure unsafe rust_wrapper_example_key as exampleKey
{ `Word64'
, allocFixedBuffer- `SecretKey'& fromBorsh*
}
-> `()'
#}
Since we are sure we only need to call the Rust function once (with an appropriately size buffer), we can do everything right within the c2hs
incantation: allocFixedBuffer
will allocate the appropriate buffer before calling the function, and fromBorsh
will get the value from the buffer afterwards. Moreover, since this function is now morally pure, we can use the c2hs
keyword for pure functions, and the signature of the function constructed by c2hs
is exactly what we’d expect, with no further wrapping required:
exampleKey :: Word64 -> SecretKey
Bounded size data
For the case where there is no fixed size, but there is a bound on the size, we have marshall_to_haskell_max
on the Rust side (depending on a HaskellMaxSize
trait) and allocMaxBuffer
on the Haskell side (depending on a BorshMaxSize
class). The most important use case for this is Rust’s Option
type, corresponding to Maybe
in Haskell. For example, here is a Rust function which deserializes a secret key in PEM format:
#[no_mangle]
pub extern "C" fn rust_wrapper_key_from_pem(
key: *const u8,
key_len: usize,
out: *mut u8,
out_len: usize,
) {
let key: String = marshall_from_haskell_var(key, key_len, RW);
let result: Option<SecretKey> = match elliptic_curve::SecretKey::from_sec1_pem(&key) {
Ok(key) => Some(key),
Err(elliptic_curve::Error) => None,
};
marshall_to_haskell_max(&result, out, out_len, RW);
}
and here is the corresponding c2hs
binding:
{# fun pure unsafe rust_wrapper_key_from_pem as fromPem
{ toBorshVar* `Text'&
, allocMaxBuffer- `Maybe SecretKey'& fromBorsh*
}
-> `()'
#}
As before, no additional wrapping is necessary Haskell-side; the signature of the function constructed by c2hs
is
fromPem :: Text -> Maybe SecretKey
Composite types
Suppose we have this datatype Rust-side:
#[derive(serde::Serialize, serde::Deserialize, BorshSerialize, BorshDeserialize, HaskellSize)]
pub struct Color {
r: f64,
g: f64,
b: f64,
}
We can piggyback on the BorshSerialize
and BorshDeserialize
instances derived by macros from the borsh crate to define our FromHaskell
and ToHaskell
instances:
impl<Tag> ToHaskell<Tag> for Color {
fn to_haskell<W: Write>(&self, writer: &mut W, _tag: PhantomData<Tag>) -> Result<()> {
self.serialize(writer)?;
Ok(())
}
}
impl<Tag> FromHaskell<Tag> for Color {
fn from_haskell(buf: &mut &[u8], _tag: PhantomData<Tag>) -> Result<Self> {
let x = Color::deserialize(buf)?;
Ok(x)
}
}
Here is a simple function that constructs the “red” color:
#[no_mangle]
pub extern "C" fn rust_wrapper_red(out: *mut u8, out_len: usize) {
let result = Color {
r: 1.0,
g: 0.0,
b: 0.0,
};
marshall_to_haskell_fixed(&result, out, out_len, RW);
}
We now have two choices how we represent this datatype Haskell-side: we can represent it by a proper Haskell value, or we can leave the Haskell-side representation opaque. We will consider these separately.
Structured Haskell representation
The cleanest representation of this datatype is of course the corresponding Haskell datatype
data Color = Color { r :: Double, g :: Double, b :: Double }
deriving stock (Show, GHC.Generic)
deriving anyclass (SOP.Generic, SOP.HasDatatypeInfo)
deriving anyclass (Data.Structured.Show)
deriving CanAnnotate via NoAnnotation Color
deriving (BorshSize, ToBorsh, FromBorsh) via AsStruct Color
After we have derived the necessary instances, interacting with Rust is trivial; for example, here’s how we can bind the red
function:
{# fun pure unsafe rust_wrapper_red as red
{ allocFixedBuffer- `Color'& fromBorsh*
}
-> `()'
#}
Nothing else to do.
ghci> red
Color {r = 1.0, g = 0.0, b = 0.0}
Opaque Haskell representation
However, in some cases we don’t want to try and parse the value Haskell-side; perhaps it’s just unnecessarily difficult, or perhaps we consider the exact serialized form of the Rust value an implementation detail of the Rust code. Perhaps we don’t even have a Haskell-side representation, and all we have is a pointer to a value on the Rust heap (see section Avoiding serialization).
For example, we might represent Color
as
newtype Color = Color (FixedSizeArray 24 Word8)
deriving newtype (BorshSize, ToBorsh, FromBorsh)
deriving newtype (IsRaw)
Even with this representation, however, we might still prefer a more informative Show
instance. The final trick that foreign-rust
has up its sleeve is support for “Rust side JSON serialization/deserialization.” This works as follows. First, we define functions Rust-side that convert a value to and from JSON. In most cases, this is easy, because we can derive serde Serialize
and Deserialize
instances, and then use serde_json:
#[no_mangle]
pub extern "C" fn rust_wrapper_color_from_json(
json: *const u8,
json_len: usize,
out: *mut u8,
out_len: &mut usize,
) {
let json: String = marshall_from_haskell_var(json, json_len, RW);
let result: core::result::Result<Color, serde_json::Error> = serde_json::from_str(&json);
marshall_result_to_haskell_var(&result, out, out_len, RW);
}
#[no_mangle]
pub extern "C" fn rust_wrapper_color_to_json(
color: *const u8,
color_len: usize,
out: *mut u8,
out_len: &mut usize,
) {
let color: Color = marshall_from_haskell_var(color, color_len, RW);
let json: String = serde_json::to_string(&color).unwrap();
marshall_to_haskell_var(&json, out, out_len, RW);
}
Binding to these functions follows the now-familiar pattern:
{# fun unsafe rust_wrapper_color_from_json as rustWrapperColorFromJSON
{ toBorshVar* `Text'&
, getVarBuffer `Buffer (Either Text Color)'&
}
-> `()'
#}
{# fun unsafe rust_wrapper_color_to_json as rustWrapperColorToJSON
{ toBorshFixed* `Color'&
, getVarBuffer `Buffer Text'&
}
-> `()'
#}
with corresponding wrappers:
colorToJSON :: Color -> JSON
colorToJSON = withPureBorshVarBuffer . rustWrapperColorToJSON
colorFromJSON :: HasCallStack => JSON -> Either Failure Color
colorFromJSON = first mkFailure . withPureBorshVarBuffer . rustWrapperColorFromJSON
where JSON
is a newtype wrapper around a lazy bytestring, and Failure
pairs a Text
error message with a CallStack
. With these functions defined, we can provide instances for ToJSON
and FromJSON
instances from Foreign.Rust.External.JSON
:
instance External.ToJSON Color where
toJSON = colorToJSON
instance External.FromJSON Color where
fromJSON = colorFromJSON
This gives us two things: we can, if we wish, derive standard aeson FromJSON
and ToJSON
instances; but we can also use JSON in our Show
instance:
deriving via UseExternalJSON Color instance Aeson.ToJSON Color
deriving via UseExternalJSON Color instance Aeson.FromJSON Color
deriving via AsJSON Color instance Show Color
deriving via AsJSON Color instance Data.Structured.Show Color
Trying it out:
ghci> red
asJSON @Color
[aesonQQ|
{
"b": 0.0
, "g": 0.0
, "r": 1.0
}
|]
As mentioned above, the library always attempts to ensure that Show
produces valid Haskell expressions. If we are using JSON, it does this by using the aesonQQ quasi-quoter, along with a wrapper and a type annotation, to avoid ambiguous type errors.
Avoiding serialization
Finally, we will consider when we don’t want to use serialization to transfer values between Haskell and Rust (because it’s too expensive), or we can’t use it (perhaps it’s a value that cannot be serialized). For example, suppose we have this Rust-side type of handles:
pub struct Handle(usize);
impl Drop for Handle
pub fn new_handle() -> Handle
We can expose C functions in our Rust code that allocate a handle, query a handle’s ID, and free a handle:
#[no_mangle]
pub extern "C" fn rust_wrapper_new_handle() -> *mut Handle {
let handle: Box<Handle> = Box::new(new_handle());
Box::into_raw(handle)
}
#[no_mangle]
pub extern "C" fn rust_wrapper_handle_id(handle: *mut Handle) -> usize {
let handle: &Handle = unsafe { &*handle };
handle.0
}
#[no_mangle]
pub extern "C" fn rust_wrapper_free_handle(handle: *mut Handle) {
let _handle: Box<Handle> = unsafe { Box::from_raw(handle) };
}
On the Haskell side, we can use c2hs
to create a newtype
around a pointer to a handle, expose C functions that allocate a handle and query its ID, and use rust_wrapper_free_handle
as the finalizer (called by the garbage collector):
{#pointer *Handle foreign finalizer rust_wrapper_free_handle newtype #}
{# fun unsafe rust_wrapper_new_handle as newHandle
{
}
-> `Handle'
#}
{# fun pure unsafe rust_wrapper_handle_id as handleId
{ `Handle'
}
-> `Word64'
#}
The signatures generated by c2hs
are
newtype Handle = Handle (ForeignPtr Handle)
newHandle :: IO Handle
handleId :: Handle -> Word64
The biggest drawback of this approach is that we now no longer have any representation of these values Haskell-side; we cannot provide a “legal” Show
instances. This can be quite inconvenient, especially in tests. Managing values on the Rust heap through the Haskell GC (even if we are using the Rust-side deallocator) is also simply more error prone, and if things go wrong, hard to debug. It is probably only the better choice if serialization is either impossible or prohibitively expensive.
Efficiency
The design of haskell-ffi
and foreign-rust
is optimized for ease of integration, not necessarily for optimal performance. This is almost certainly fine for most applications, but you probably don’t want the overhead of serialization and deserialization when doing FFI calls to Rust in a tight Haskell loop (of course, it is almost never a good idea to do that anyway).
Alongside withBorshVarBuffer
, foreign-rust
offers withBorshBufferOfInitSize
which can be used to specify a different initial buffer size, which can be used to reduce the probability that a second round-trip is necessary (in the case that the initial buffer was not big enough). In principle, you could use this in conjunction with a Rust-side function that computes the required buffer size, but there isn’t much point: this would still require two FFI calls, with the same parameters; if there is a cheap way Rust-side to compute the necessary buffer size, then that behaviour should just be baked into the one Rust function: check if the allocated buffer is big enough before doing anything else. The standard marshalling functions offered by haskell-ffi
do not do this, since in general it is difficult to know exactly how large the serialized form of some data is without actually serializing it.
In the case where a Rust function must really only be called once (perhaps because it has side effects), you can choose to forgo serialization altogether, as we described above in Avoiding serialization. Alternatively, foreign-rust
offers a hybrid approach, where we allocate a buffer Rust-side, pass a pointer to the buffer to Haskell, deserialize it Haskell-side, and then free the buffer when no longer required. For our example where we convert a secret key to PEM, the Rust-side wrapper would look like this:
#[no_mangle]
pub extern "C" fn rust_wrapper_key_to_pem_external(key: *const u8, key_len: usize) -> *mut Vec<u8> {
// .. construction of `result` exactly as before
marshall_to_haskell_external(result, RW)
}
with Haskell counter-part:
{# fun pure unsafe rust_wrapper_key_to_pem_external as toPemExternal
{ toBorshFixed* `SecretKey'&
}
-> `Text' fromExternalBorsh*
#}
The type of the function constructed by c2hs
is then SecretKey -> Text
. The advantages of this approach is that no initial buffer size needs to be estimated (we just use whatever buffer was allocated Rust-side), a second round-trip is guaranteed not to be needed, and we avoid copying the buffer. We still have the serialization/deserialization overhead, of course, and—perhaps more importantly—it is difficult to predict quite how long we will hold on to that Rust-allocated buffer. The deserializer might return values that directly or indirectly point to that buffer, and since these buffers are allocated on the Rust heap, not the Haskell heap, memory profiling might be difficult. In most cases, this approach is therefore probably not the right choice.
Conclusions
This was a long blog post, so let’s summarize:
- Expose
extern "C"
functions in your Rust-code; you can use the Rust libraryhaskell-ffi
to serialize and deserialize data in a convenient manner. - Build your Rust library with
cargo cbuild
, to generate a header file and apkg-config
file. - Declare a
pkg-config
dependency on the Rust library in yourcabal
file, as well as a dependency on the build tool (preprocessor)c2hs
. - Use
c2hs
to add bindings to the C functions; the Haskell libraryforeign-rust
is a companion library tohaskell-ffi
that makes this process very streamlined. - For data types with a fixed size encoding, the
c2hs
declaration might be all you need; otherwise you will write a simple wrapper function, again using functionality fromforeign-rust
. - To
Show
Rust-side values,foreign-rust
offers various ways, which show a value in base-16, base-58, base-64, or JSON format; each of these generate valid Haskell, so that you can denote Rust-side values within your Haskell source code. - In addition,
foreign-rust
offers functionality for annotating values with additional information, which can be quite helpful to get further information about Rust-side values during debugging. - Finally, if serialization of Rust-side values is undesirable or impossible, you can just pass pointers back and forth, using the Haskell garbage collector to call the Rust-side deallocator when a value is no longer in use. However, when you do this, you will have no way of denoting these values Haskell-side.
from Hacker News https://ift.tt/W2h6vUz
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.