Monday, February 24, 2020

Strings in WebAssembly


Wasm “Hello World!” example

Given all of this information, how would we write a “Hello World!” application in Wasm, for the Web? For example, how would we pass strings back and forth between the user’s interface and the Wasm execution environment?

“Here’s the big crux … WebAssembly needs to play well with JavaScript …we need to work with and pass JavaScript objects into WebAssembly, but WebAssembly doesn’t support that at all. Currently, WebAssembly only supports integers and floats” (Williams, 2019).

Shoehorning JavaScript objects into u32 values for Wasm to use is going to take a bit of grappling.
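To make the limitation concrete, here is a minimal sketch that is not part of the original walkthrough. It hand-assembles a tiny Wasm module that exports one function of type (i32, i32) -> i32. The module bytes and the names wasmBytes and add are assumptions chosen purely for illustration. Passing a string where an i32 is expected does not hand the string to Wasm; JavaScript coerces it to an integer first.

```javascript
// Hand-assembled Wasm binary (illustrative): exports add(i32, i32) -> i32.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code: local.get 0, local.get 1, i32.add
]);

// Synchronous compilation is fine for a module this small.
const instance = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes));
const { add } = instance.exports;

const sum = add(2, 3);
// The string never reaches Wasm: it is converted to a number (NaN), then to the i32 value 0.
const coerced = add("Hello World!", 1);

console.log(sum, coerced);
```

The second call silently loses the string, which is why some extra machinery is needed before a "Hello World!" can cross the boundary.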

Wrestling pictorial that looks surprisingly like a Crustacean. Coincidence? I think not.

Bindgen

Wasm-bindgen is a build-time dependency for Rust. It generates Rust and JavaScript code at compile time. It can also be used as an executable, called wasm-bindgen on the command line. Essentially, the wasm-bindgen tool allows JavaScript and Wasm to communicate high-level JavaScript objects like strings, as opposed to exclusively communicating number data types (Rustwasm.github.io, 2019).

How is this achieved?

Memory

“The main storage of a WebAssembly program is a large array of raw bytes, the linear memory or simply memory” (Rossberg et al., 2018).
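As a rough illustration of what working directly with linear memory involves, the sketch below writes a string into a WebAssembly.Memory as UTF-8 bytes and reads it back using only two numbers, an offset and a length. The helper names writeString and readString and the offset 16 are hypothetical; in a real module the offset would come from an allocator exported by the Wasm code.

```javascript
// One 64 KiB page of linear memory, standing in for a module's memory.
const memory = new WebAssembly.Memory({ initial: 1 });
const memoryBytes = new Uint8Array(memory.buffer);

// Encode the string as UTF-8, copy it into memory at `ptr`, return its byte length.
function writeString(text, ptr) {
  const encoded = new TextEncoder().encode(text);
  memoryBytes.set(encoded, ptr);
  return encoded.length;
}

// Decode `len` bytes starting at `ptr` back into a JavaScript string.
function readString(ptr, len) {
  return new TextDecoder().decode(memoryBytes.subarray(ptr, ptr + len));
}

const ptr = 16; // arbitrary offset chosen for this sketch
const len = writeString("Hello World!", ptr);
const roundTripped = readString(ptr, len);

console.log(ptr, len, roundTripped);
```

Only ptr and len are plain integers, so they are the only parts of this exchange that could be passed to a Wasm function directly.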

The wasm-bindgen tool abstracts away linear memory, and allows the use of native data structures between Rust and JavaScript (Wasm By Example, 2019). The current strategy is for wasm-bindgen to maintain a “heap”. This “heap” is a module-local variable which is created by wasm-bindgen, inside a wasm-bindgen-generated JavaScript file.

This next bit might seem a little confusing, so just hang in there. It turns out that the first slots in this “heap” are treated as a stack. This stack, like typical program execution stacks, grows down.

Temporary JS objects on the “stack”

Short-term JavaScript objects are pushed onto the stack, and their indices (position in the stack, and length) are passed to Wasm. A stack pointer is maintained to figure out where the next item is pushed (GitHub — RustWasm, 2020).

Removal is simply storing undefined/null. Because of the “stack-y” nature of this scheme, it only works when Wasm doesn’t hold onto a JavaScript object (GitHub — RustWasm, 2020).

JsValue

The wasm-bindgen library’s own Rust codebase uses a special JsValue type. A hand-written exported function, like the one shown below, can take a reference to this special JsValue.

#[wasm_bindgen]
pub fn foo(a: &JsValue) {
    // ...
}

wasm-bindgen generated Rust

The Rust code that #[wasm_bindgen] generates, in relation to the hand-written Rust above, looks something like this.

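#[export_name = "foo"]
pub extern "C" fn __wasm_bindgen_generated_foo(arg0: u32) {
    // Wrap the integer index in a JsValue. ManuallyDrop keeps Rust from
    // running the JsValue destructor when this function returns.
    let arg0 = unsafe {
        ManuallyDrop::new(JsValue::__from_idx(arg0))
    };
    // Borrow the JsValue and pass the reference to the hand-written foo.
    let arg0 = &*arg0;
    foo(arg0);
}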

The externally callable identifier is still foo, but what the Wasm module actually exports is the wasm-bindgen-generated Rust function __wasm_bindgen_generated_foo. This generated function takes an integer argument and wraps it in a JsValue.

It is important to remember that, because of Rust’s ownership rules, the reference to JsValue cannot persist past the lifetime of the function call. Therefore the wasm-bindgen-generated JavaScript needs to free the stack slot that was created as part of this function’s execution. Let’s look at the generated JavaScript next.

wasm-bindgen generated JavaScript

// foo.js
import * as wasm from './foo_bg';

const heap = new Array(32).fill(undefined);
heap.push(undefined, null, true, false);
let stack_pointer = 32;

function addBorrowedObject(obj) {
    stack_pointer -= 1;
    heap[stack_pointer] = obj;
    return stack_pointer;
}

export function foo(arg0) {
    const idx0 = addBorrowedObject(arg0);
    try {
        wasm.foo(idx0);
    } finally {
        heap[stack_pointer++] = undefined;
    }
}

The heap

As we can see, the JavaScript file imports from the Wasm file.

Then we can see that the aforementioned “heap” module-local variable is created. It is important to remember that this JavaScript is generated by Rust code. If you would like to see how this is done, see line 747 in this mod.rs file. Below is a snippet of the Rust code that generates the JavaScript.

self.global(&format!("const heap = new Array({}).fill(undefined);", INITIAL_HEAP_OFFSET));

The INITIAL_HEAP_OFFSET is hard-coded to 32 in the Rust file, so the array starts with 32 slots by default.

Once created in JavaScript, this heap variable stores all of the JavaScript values that can be referenced from Wasm at execution time.
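The resulting layout can be checked with a short sketch. It simply replays the two initialisation lines from the generated JavaScript, with INITIAL_HEAP_OFFSET written out as a local constant for illustration: the first 32 slots are the empty stack region, and the four pushed values land in slots 32 to 35.

```javascript
const INITIAL_HEAP_OFFSET = 32; // the hard-coded value mentioned above

// Slots 0..31: the stack region, initially empty.
const heap = new Array(INITIAL_HEAP_OFFSET).fill(undefined);
// Slots 32..35: the four values pushed by the generated code.
heap.push(undefined, null, true, false);

const pushedValues = heap.slice(INITIAL_HEAP_OFFSET);

console.log(heap.length);   // total number of slots
console.log(pushedValues);  // the values stored after the stack region
```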

If we look again at the generated JavaScript, we can see that the exported function foo takes an arbitrary argument, arg0. The foo function calls addBorrowedObject, passing arg0 into it. The addBorrowedObject function decrements stack_pointer by 1 (it was 32, now 31), stores the object at that position, and returns that position to the calling foo function.

The stack position is stored as a const called idx0. Then idx0 is passed to the wasm-bindgen-generated Wasm so that Wasm can operate with it (GitHub — RustWasm, 2020).

As we mentioned, we are still talking about temporary JS objects on the “stack”. If we look at the finally block of the generated JavaScript, we will see that the heap slot at the stack_pointer position is set to undefined, and then (thanks to the post-increment ++ operator) the stack pointer is incremented back to its original value.
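The whole round trip can be traced with a small runnable sketch. It reuses the generated JavaScript but swaps the real Wasm import for a hypothetical stand-in object, also named wasm, whose foo only records what it sees. The trace array and the stand-in are assumptions added for illustration and are not part of what wasm-bindgen generates.

```javascript
const heap = new Array(32).fill(undefined);
heap.push(undefined, null, true, false);
let stack_pointer = 32;

function addBorrowedObject(obj) {
  stack_pointer -= 1;
  heap[stack_pointer] = obj;
  return stack_pointer;
}

// Hypothetical stand-in for the real Wasm export: records the index it
// receives, the object stored at that index, and the current stack pointer.
const trace = [];
const wasm = {
  foo(idx) {
    trace.push({ idx, value: heap[idx], stack_pointer });
  },
};

function foo(arg0) {
  const idx0 = addBorrowedObject(arg0);
  try {
    wasm.foo(idx0);
  } finally {
    // Clear the borrowed slot, then move the stack pointer back up.
    heap[stack_pointer++] = undefined;
  }
}

foo("Hello World!");

console.log(trace[0]);                // what the stand-in saw during the call
console.log(stack_pointer, heap[31]); // state after the call
```

During the call the stand-in receives index 31 and finds the string there; after the call the slot is empty again and the stack pointer is back at 32.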

So far, we have covered objects that are only temporarily used i.e. only live during one function call. Let’s look at long-lived JS objects next.



from Hacker News https://ift.tt/2v2U7eP
