Emscripten and WebAssembly

Posted at — Jan 26, 2020

WebAssembly

WebAssembly is now supported by all major browsers. Because it is a low-level binary format executed on the client side, it enables higher performance than plain JavaScript for compute-intensive work.

WebAssembly uses modules as the distributable, loadable, and executable unit of code. Multiple module instances can access the same shared state, which is the basis for dynamic linking in WebAssembly (source).

Emscripten is a toolchain for compiling to asm.js and WebAssembly, built using LLVM, that lets you run C and C++ on the web at near-native speed without plugins (source).

Modern compilers are split into front ends, which support different languages and all generate the same intermediate bitcode, and a back end, which compiles that bitcode to machine code. For example, Clang is the front end for C, C++ and several other languages, and it uses the LLVM back end. For more details, this video is worth watching.

Emscripten is not a frontend compiler and does not try to do what Clang does. Instead, it uses Clang in its toolchain so that C/C++ code can be deployed to WASM or asm.js. When generating LLVM bitcode, emcc and clang produce essentially the same bitcode, as in this hello world example compiled and disassembled using clang+llvm tools (left) and emscripten+llvm tools (right).

Installing Emscripten

The default installation is quite easy and can be found here.

Alternatively, you can always use a Docker image to run Emscripten:

docker run --rm -v $(pwd):/src -u $(id -u):$(id -g) -w /src -ti trzeci/emscripten emcc -o <output file> -s WASM=1 <c source code>

If you are not used to docker, check this reference.

Example

Before your attention span is gone, let's see it working. Since it seems everybody is raytracing, let's do it too, because we can. I followed Raytracing in One Weekend and got this nice DEMO working in WebAssembly.

My source code is ugly, but it's quite easy to follow if you've read Peter's book. It is not really meant to be read, but there is always someone who may be curious.

Press the render button below and the page will become unresponsive for a few moments. I should have used an asynchronous JS call, but that is out of scope for this post.

It should generate an output like this:

Raytracing can generate realistic images because it relies on a “simulation” of how light interacts with objects, but that will be the subject of another post. In summary, for every pixel we shoot a ray and check whether it hits a surface or not. So for a 640x480 image we will shoot more than 300K rays at minimum, and when a ray hits an object it scatters in different directions, possibly with different intensities, and may hit other objects, bounce and scatter again. Still with me? What I mean is that it is compute-intensive.
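To make "shoot a ray and check whether it hits a surface" concrete, here is a minimal C sketch of the first step from Raytracing in One Weekend: a ray-sphere test plus one primary ray per pixel. The names (vec3, hit_sphere, count_primary_hits) and the fixed scene (one sphere of radius 0.5 at (0, 0, -1), a 4 x 3 viewport at z = -1) are assumptions for illustration, not the demo's actual source.

```c
/* Minimal 3-D vector type; vec3, sub and dot are names chosen for this sketch. */
typedef struct { double x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b) {
    vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static double dot(vec3 a, vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns 1 if the line origin + t*dir meets the sphere for some real t.
   |origin + t*dir - center|^2 = radius^2 is a quadratic a*t^2 + b*t + c = 0,
   so a hit exists when the discriminant is non-negative. A full tracer would
   also require t > 0 so that objects behind the camera are ignored. */
int hit_sphere(vec3 center, double radius, vec3 origin, vec3 dir) {
    vec3 oc = sub(origin, center);
    double a = dot(dir, dir);
    double b = 2.0 * dot(oc, dir);
    double c = dot(oc, oc) - radius * radius;
    return b * b - 4.0 * a * c >= 0.0;
}

/* Shoots one primary ray through the centre of every pixel of a viewport
   spanning x in [-2, 2] and y in [-1.5, 1.5] at z = -1, and counts how many
   rays hit a sphere of radius 0.5 centred at (0, 0, -1). */
long count_primary_hits(int width, int height) {
    vec3 origin = { 0.0, 0.0, 0.0 };
    vec3 center = { 0.0, 0.0, -1.0 };
    long hits = 0;
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            vec3 dir = { -2.0 + 4.0 * (i + 0.5) / width,
                         -1.5 + 3.0 * (j + 0.5) / height,
                         -1.0 };
            hits += hit_sphere(center, 0.5, origin, dir);
        }
    }
    return hits;
}
```

For a 640x480 image, count_primary_hits(640, 480) already runs the test 640 * 480 = 307,200 times, before any scattering or bouncing is considered.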

Due to its nature, raytracing maps well to GPUs. CPUs can run multiple threads, but they don't have nearly enough raw parallelism to give every ray its own thread. In the source code for this example, the native build uses OpenMP (OMP) and some other trickery to make it faster.
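The native build's OpenMP code is not shown here, so the following is only a hypothetical C sketch of the usual approach: every pixel is independent, so the outer loop over rows can be split across threads with `#pragma omp parallel for`. The function shade_pixel is a stand-in that returns a simple gradient instead of tracing rays, and render is a made-up name.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-pixel ray tracing work:
   returns a gradient from 0.0 (top-left) to 1.0 (bottom-right). */
static double shade_pixel(int i, int j, int width, int height) {
    return 0.5 * ((double)i / (width - 1) + (double)j / (height - 1));
}

/* Fills a caller-provided buffer of width*height values.
   Built with -fopenmp, the pragma distributes rows across threads
   (dynamic scheduling helps when some rows are more expensive);
   without -fopenmp the pragma is ignored and the loop runs sequentially. */
void render(double *image, int width, int height) {
    #pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; i++) {
            image[(size_t)j * width + i] = shade_pixel(i, j, width, height);
        }
    }
}
```

Because no two iterations write to the same pixel, no locking is needed; that independence is exactly what makes raytracing "embarrassingly parallel".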

Now, try to imagine running this in pure JavaScript, with its many layers of indirection. That's why I chose the raytracing algorithm for this example.

Now back to the theory.

More about Emscripten and wasm

Emscripten supports several outputs, such as asm.js and WASM, and can use a template to create a page that loads the generated module. Older versions of Emscripten generated asm.js by default, while recent versions generate WASM by default; setting WASM=1 as an argument to emcc requests WASM output explicitly. For more details on asm.js vs WASM, check this StackOverflow discussion.

WebAssembly Text Format

This post will not go into the details of the S-expr language used as text format for WebAssembly. More information here and here.

In a few words, you can write wasm modules in an S-expression language and assemble them with the WebAssembly Binary Toolkit (WABT). Here is an example from the fantastic post by Colin Eberhardt at Scott Logic, which you definitely should read for more information:

;; hello.wat
;; .wat (text) → wat2wasm → .wasm
(module
  (func (result i32)
    (i32.const 42)
  )
  (export "helloWorld" (func 0))
)

Let’s start simple:

//hello.c
int main(){
    return 0;
}

Compiling it with Emscripten as:

docker run --rm -ti -v $(pwd):/src -u $(id -u):$(id -g) -w /src trzeci/emscripten emcc -o hello.bt -s WASM=1 hello.c

The output file is LLVM IR bitcode, which we can inspect with the LLVM toolchain. llvm-nm lists the names of the symbols in LLVM bitcode files. As expected, for now there is only one symbol, for the main function (T marks a symbol in the code (text) section).

$ llvm-nm hello.bt
-------- T main

It is possible to disassemble the llvm bitcode with llvm-dis.

$ llvm-dis hello.bt

The first part of the generated hello.bt.ll file contains the main function:

; ModuleID = 'hello'
source_filename = "hello.c"
target datalayout = "e-p:32:32-i64:64-v128:32:128-n32-S128"
target triple = "asmjs-unknown-emscripten"
; Function Attrs: noinline nounwind
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  ret i32 0
}

For more details on what is happening, check the LLVM Language Reference. In summary, this code allocates 4 bytes for the unnamed variable %1 (the slot clang uses at -O0 for main's return value), stores the integer 0 at the %1 address, and finally returns 0 from main. Extending this example with a simple add operation:

//hello.c
int main(void){
    int c = 1 + 2;
    return 0;
}

Generates the following LLVM Bitcode:

; Function Attrs: noinline nounwind
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 3, i32* %2, align 4
  ret i32 0
}

This disassembly is easy to follow, since it has the same structure as the previous one: %2 is the stack slot for c, and clang has already computed 1 + 2 = 3 at compile time. The next step is to understand how LLVM bitcode passes variables to another function, which will be useful later for understanding WASM.

//hello.c
int sum(int a, int b) { 
    int c = a + b;
    return c;
}
int main(void){
    int x = sum(3, 4);
    return 0;
}

And the disassembled code:

; Function Attrs: noinline nounwind
define i32 @sum(i32, i32) #0 {
  %3 = alloca i32, align 4        ; memory for variable a
  %4 = alloca i32, align 4        ; memory for variable b
  %5 = alloca i32, align 4        ; memory for variable c
  store i32 %0, i32* %3, align 4  ; copy variable at %0 to %3
  store i32 %1, i32* %4, align 4  ; copy variable at %1 to %4
  %6 = load i32, i32* %3, align 4 ; now load %3 at %6
  %7 = load i32, i32* %4, align 4 ; and %4 at %7
  %8 = add nsw i32 %6, %7         ; adds %6 and %7 to %8
  store i32 %8, i32* %5, align 4  ; save the content of %8 to %5
  %9 = load i32, i32* %5, align 4 ; put the content at %9
  ret i32 %9                      ; phew, finally return %9
}
; Function Attrs: noinline nounwind
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %3 = call i32 @sum(i32 3, i32 4)
  store i32 %3, i32* %2, align 4
  ret i32 0
}

Since arguments are passed by value into the function's scope (%0 and %1), they are first stored into the stack slots for a and b; then the addition is done, the result is saved to c, and c is returned. I will not go into detail here, but the last step, printing something, should be easy to understand:

//hello.c
#include <stdio.h>
int main(void){
    printf("Hello world!");
    return 0;
}

LLVM disassembled bitcode:

@.str = private unnamed_addr constant [13 x i8] c"Hello world!\00", align 1
; Function Attrs: noinline nounwind
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}
declare i32 @printf(i8*, ...) #1

Calling a function from another library gives a very similar result (e.g. sqrt from math.h, which needs to be linked with -lm):

#include <stdio.h>
#include <math.h>
int main(void){
    int x = sqrt(4);
    printf("%d", x);
    return 0;
}

generates:

@.str = private unnamed_addr constant [3 x i8] c"%d\00", align 1
; Function Attrs: noinline nounwind
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %3 = call double @llvm.sqrt.f64(double 4.000000e+00)
  %4 = fptosi double %3 to i32
  store i32 %4, i32* %2, align 4
  %5 = load i32, i32* %2, align 4
  %6 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i32 0, i32 0), i32 %5)
  ret i32 0
}
; Function Attrs: nounwind readnone speculatable
declare double @llvm.sqrt.f64(double) #1
declare i32 @printf(i8*, ...) #2
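The fptosi line is how clang lowers the implicit double-to-int conversion in int x = sqrt(4);. It truncates toward zero rather than rounding, which is harmless here because sqrt(4) is exactly 2.0, but it matters for other values. A small C sketch (both function names are hypothetical) that avoids libm, so it builds without -lm:

```c
/* Same conversion as `int x = sqrt(4);`: clang emits fptosi, which
   discards the fractional part (truncation toward zero). */
int to_int_truncating(double d) {
    return (int)d;
}

/* Hypothetical alternative that rounds to nearest, halves away from zero,
   without libm. Like the cast above, it is only defined for results that
   fit in an int. */
int to_int_rounding(double d) {
    return (int)(d < 0.0 ? d - 0.5 : d + 0.5);
}
```

With libm linked, lround() from math.h is the standard way to get rounding instead of truncation.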

WASM and Javascript Module

So far Emscripten has been generating LLVM bitcode, but if the output file given to -o has a .js extension, emcc emits a .wasm binary plus a loadable JavaScript module.

$ docker run --rm -ti -v $(pwd):/src -u $(id -u):$(id -g) -w /src trzeci/emscripten emcc -lm -o hello.js hello.c
$ ls
hello.c  hello.js  hello.wasm

This “glue” code is about 2.5k lines long. It exports a Module variable that can be accessed from HTML and allows running the wasm code, among other things. Emscripten can also easily generate glue code to be used from Node:

docker run --rm -v $(pwd):/src -u $(id -u):$(id -g) -w /src -ti trzeci/emscripten emcc -o hello.js -s WASM=1 -s ENVIRONMENT=node hello.c

To use the generated code from Node, it is enough to load the glue code as a module.

//index.js
const hello = require('./hello');

Without the glue code generated by the Emscripten toolchain, we would need Node's fs module to read the .wasm binary into a buffer and then use the WebAssembly API to instantiate the module, as shown in The Code Barbarian post. In many cases the boilerplate generated by Emscripten is not a real problem, but sometimes it is, so it is worth checking the alternatives. By default, loading the Emscripten glue code calls the main() function.

$ node index.js 
hello world

When emcc runs, it inlines several functions and removes the ones that are never called. But we’ll get there. If we inspect the hello.js glue code, we’ll find an object named Module that contains a function named callMain.

Module['callMain'] = function callMain(args) {
  // ...
  try {
    var ret = Module['_main'](argc, argv, 0);
    // if we're not running an evented main loop, it's time to exit
    exit(ret, /* implicit = */ true);
  }
  // ...

The call stack goes something like this:

run() →
   doRun() →
        Module['callMain'](args) →
            Module['_main'](argc, argv)

After main() is called, the wasm code is executed and then terminated. If the memory state of the wasm module must be kept and execution should not be terminated, we should provide emcc with -s NO_EXIT_RUNTIME=1.
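A minimal sketch of the kind of state that NO_EXIT_RUNTIME=1 is meant to preserve: a global counter living in the module's linear memory, exposed with EMSCRIPTEN_KEEPALIVE. The function name increment is hypothetical, and the #ifdef guard (based on the __EMSCRIPTEN__ macro that emcc defines) only exists so the same file also compiles natively.

```c
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
#define EMSCRIPTEN_KEEPALIVE /* no-op when compiled natively */
#endif

/* Global state stored in the wasm module's linear memory. */
static int counter = 0;

/* Each call sees the value left by the previous call, as long as
   the runtime has not been shut down after main() returned. */
EMSCRIPTEN_KEEPALIVE int increment(void) {
    return ++counter;
}
```

From Node, calling hello._increment() twice should return 1 and then 2 while the runtime stays alive.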

You can skip this digression:

When comparing the generated files, I saw no difference when building with NO_EXIT_RUNTIME, even when using global variables, keepalive functions, HTML output, etc.

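At first I guessed emscripten was smart enough to know when you want to keep state alive. A more likely explanation is that not exiting the runtime is already the default in recent Emscripten versions (the setting is now called EXIT_RUNTIME and defaults to 0), so NO_EXIT_RUNTIME=1 changes nothing. What I did notice is that setting NO_EXIT_RUNTIME=0 adds a call to a function named callRuntimeCallbacks(__ATEXIT__) to the JS glue code.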

__ATEXIT__ is an array of callbacks, used as in the command pattern: it holds finalization tasks that run when the exit() method is called. The call stack goes something like:

Module['callMain'] →
    exit(ret, true) →
        callRuntimeCallbacks(__ATEXIT__) →
            // consume every callback from __ATEXIT__ array
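To make the command-pattern idea concrete, here is a hypothetical C analogue of __ATEXIT__ and callRuntimeCallbacks: commands (a function pointer plus its argument) are queued and later consumed. The names push_at_exit, call_runtime_callbacks and increment_int are made up for this sketch; it is not Emscripten's code. This sketch runs commands in the order they were queued, whereas standard C atexit() handlers run in reverse order of registration.

```c
#include <stddef.h>

/* A command: a callback plus the data it operates on. */
typedef void (*callback_fn)(void *arg);

typedef struct {
    callback_fn fn;
    void *arg;
} command;

#define MAX_COMMANDS 16
static command at_exit_queue[MAX_COMMANDS];
static size_t at_exit_count = 0;

/* Queue a command; returns 0 on success, -1 if the queue is full. */
int push_at_exit(callback_fn fn, void *arg) {
    if (at_exit_count == MAX_COMMANDS) return -1;
    at_exit_queue[at_exit_count].fn = fn;
    at_exit_queue[at_exit_count].arg = arg;
    at_exit_count++;
    return 0;
}

/* Consume every queued command in insertion order, then empty the queue. */
void call_runtime_callbacks(void) {
    for (size_t i = 0; i < at_exit_count; i++) {
        at_exit_queue[i].fn(at_exit_queue[i].arg);
    }
    at_exit_count = 0;
}

/* Example command: increments the int that arg points to. */
void increment_int(void *arg) {
    ++*(int *)arg;
}
```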

ccall, cwrap and exported functions

As stated before, when there is a main() function, it gets called by default. There are also some ways of calling other functions at the wasm level with Emscripten's help. According to the Emscripten documentation:

- ccall() calls a compiled C function with specified parameters and returns the result, while
- cwrap() "wraps" a compiled C function and returns a JavaScript function you can call normally. 
- cwrap() is therefore more useful if you plan to call a compiled function a number of times.

In other words, cwrap gives you a JavaScript function that you can call directly and repeatedly, while ccall performs a single call and requires you to pass the return type, argument types and arguments every time. To use either of them, we need to add some extra parameters when generating the wasm with emscripten:

docker run --rm -v $(pwd):/src -u $(id -u):$(id -g) -w /src -ti trzeci/emscripten emcc -o hello.js \
    -s WASM=1 -s ENVIRONMENT=node \
    -s "EXTRA_EXPORTED_RUNTIME_METHODS=['ccall','cwrap']" \
    hello.c

According to this GitHub thread: “Emscripten toolchain slimmed down its footprint a while back, and one of the things it dropped was cwrap”. I wondered whether security was also part of the decision not to enable them by default, since I thought ccall and cwrap give access to non-exported functions. However, they look functions up among the module's exports, so they can only call functions that are already exported.

To call a C function other than main from JavaScript using ccall or cwrap, the function must be marked with EMSCRIPTEN_KEEPALIVE. As mentioned earlier, functions that are never called are stripped out of the output.

#include <stdio.h>
#include <emscripten.h>
int EMSCRIPTEN_KEEPALIVE add(int a, int b) {
 return a + b;
}
int main(){
 printf("Hello World!\n");
 return 0;
}

EMSCRIPTEN_KEEPALIVE also exports the function, as if it had been added to EXPORTED_FUNCTIONS, which we will see soon.

Some quick examples in the node REPL:

> const hello = require('./hello');
 
> hello._add(3,5);
8

> let func_add = hello.cwrap('add', 'number', ['number','number']);
> func_add(3,5)
8

> hello.ccall('add', 'number', ['number','number'], [3,5]);
8

Another digression:

You can avoid the EMSCRIPTEN_KEEPALIVE directive by exporting the functions with -s "EXPORTED_FUNCTIONS=['_<function name>']". It seems to work the same way.
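A minimal sketch of the EXPORTED_FUNCTIONS route, assuming the flags shown in the comment: plain C with no emscripten.h, where the function (a hypothetical multiply) is kept and exported because it is named, with a leading underscore, on the emcc command line.

```c
/* No emscripten.h needed. Build with something like:

     emcc -o hello.js -s WASM=1 -s ENVIRONMENT=node \
          -s "EXPORTED_FUNCTIONS=['_main','_multiply']" \
          -s "EXTRA_EXPORTED_RUNTIME_METHODS=['ccall','cwrap']" \
          hello.c

   '_main' is listed as well, since overriding EXPORTED_FUNCTIONS may
   otherwise drop main() from the exports (this has varied across
   Emscripten versions). */
int multiply(int a, int b) {
    return a * b;
}
```

In the node REPL it should then be callable as hello._multiply(3, 5), or through hello.ccall('multiply', 'number', ['number','number'], [3,5]).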

ENVIRONMENT=web vs ENVIRONMENT=node

According to the Emscripten FAQ, it is possible to reduce the size of the glue code by specifying the target environment. Indeed, if no library is used and no other special feature is enabled, setting ENVIRONMENT=web shrinks the glue code by almost 100 lines compared with not specifying an environment, and setting it to node shrinks it by almost the same amount. In each case it strips code that only the other environment would use, so it may be helpful in some situations.

.html output

When the output is an .html file, emcc uses a pre-made template from the SDK (unless another one is specified). It includes a canvas, an output text block and some controls. At the core of this HTML, a Module object is declared. It takes care of things such as setting the output to the correct element, creating the WebGL context and managing the loading of the wasm module in the browser.

Conclusion

In this post we explored WebAssembly, and Emscripten as a way of making it easy to use. Then we looked at how it works the hard way, by checking the generated code and how emscripten glues everything together.

I thought about doing a series on this, but there are so many details, and my curiosity about WebAssembly is satisfied for now. There are so many good sources for understanding the details and the implementation of wasm and emscripten that I don’t feel compelled to go further. You should definitely check out Colin Eberhardt's posts at Scott Logic. If you’re like me and have reached this last line, you will probably love Colin’s work.

fsan