Compiling a face detector written in C to WebAssembly


In our last post, we described pico.js, a library for real-time face detection written in 200 lines of JavaScript. The original implementation of pico is written in C: https://github.com/nenadmarkus/pico. Here we show how to compile its runtime part to WebAssembly.

Highlights of this post:

About WebAssembly

The official page of the WebAssembly project contains the following definition:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

The Wikipedia article mentions that Wasm should complement JavaScript to speed up performance-critical parts of web applications and, later on, to enable web development in languages other than JavaScript.

The main advantages that Wasm aims to deliver are

This sounds doable and very promising. After all, Wasm is a binary format that is intended to be compiled to machine code.

Since the end of 2017, all major browsers support Wasm (see this blog post).

Let us now go through how to port the official (C-based) pico to Wasm.

Porting pico to Wasm

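The following subsections talk about the Emscripten toolchain, wrapping the official pico code and compiling it to Wasm, and running wasmpico in your JavaScript program.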

All are more-or-less self-contained, so you can skip some parts if they introduce something you already know.

The Emscripten toolchain

We will use the Emscripten toolchain for compiling C to Wasm. The official website introduces it as follows:

Emscripten is a toolchain for compiling to asm.js and WebAssembly, built using LLVM, that lets you run C and C++ on the web at near-native speed without plugins.

It is an impressive piece of software that enabled enthusiasts to port a lot of C/C++ apps to the web environment.

Our initial goal is to set up the Emscripten C compiler, emcc. This can be done by installing the Emscripten SDK: https://developer.mozilla.org/en-US/docs/WebAssembly/C_to_wasm#Emscripten_Environment_Setup. A nice thing about this SDK is that it's portable: everything is contained in a single folder and it does not mess with the configuration of your system.

The following sections assume that emcc is available on the system.

Wrapping the official pico code and compiling it to Wasm

Let us start by downloading the runtime from the official repo:

wget https://github.com/nenadmarkus/pico/raw/346881039e5d1f5abe64733a49886bdfd5ab2d51/rnt/picornt.c

This small C file contains a function find_objects that can be used to detect faces in images when invoked with proper parameters. One of the parameters that needs to be passed to this function is the detection cascade. The detection cascade can be seen as a small brain that is able to distinguish objects of interest from the image background. One such detection cascade, designed to find faces in images, is named facefinder. Let us download it from the official repository:

wget https://github.com/nenadmarkus/pico/raw/346881039e5d1f5abe64733a49886bdfd5ab2d51/rnt/cascades/facefinder

Note that its size is around 250kB and that this will roughly equal the size of the output Wasm file as we will plug it in directly. However, before we can do that, we need to transform it into a C-compatible hexadecimal array with the following command:

cat facefinder | hexdump -v -e '16/1 "0x%x," "\n"' > facefinder.hex

You can open facefinder.hex with a text editor and view its contents.
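If hexdump is not available on your system, roughly the same output can be produced with a few lines of JavaScript. The following is a minimal sketch; toHexArray is a hypothetical helper (not part of pico), and the commented-out usage assumes a Node.js environment.

```javascript
// Format a sequence of bytes as comma-terminated hexadecimal literals,
// 16 per line, similar to the output of the hexdump command above.
function toHexArray(bytes) {
    var lines = [];
    for (var i = 0; i < bytes.length; i += 16) {
        var line = '';
        var end = Math.min(i + 16, bytes.length);
        for (var j = i; j < end; ++j)
            line += '0x' + bytes[j].toString(16) + ',';
        lines.push(line);
    }
    return lines.join('\n') + '\n';
}

// Hypothetical usage under Node.js (assumes 'facefinder' is in the working directory):
// var fs = require('fs');
// fs.writeFileSync('facefinder.hex', toHexArray(fs.readFileSync('facefinder')));
```

The trailing comma after the last value is harmless, since C allows it in array initializers.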

Next, let us make a simple wrapper main.c around the pico runtime:

#include "picornt.c"

int find_faces(
    float rcsq[],
    int maxndetections,
    unsigned char pixels[],
    int nrows,
    int ncols,
    int ldim,
    float scalefactor,
    float shiftfactor,
    float minfacesize,
    float maxfacesize
)
{
    static unsigned char facefinder[] = {
        #include "facefinder.hex"
    };

    return find_objects(
            rcsq, maxndetections,
            facefinder,
            0.0f,
            pixels, nrows, ncols, ldim,
            scalefactor, shiftfactor,
            minfacesize, maxfacesize
    );
}

Notice that facefinder.hex will be included into this C program via a preprocessor #include directive.

The large number of parameters required by find_objects might look confusing. Please take a look at the official sample for more details. For our purposes, it suffices to say that rcsq is an array of 4*maxndetections floats (four values per detection) that will hold the detection results after pico finishes processing the image. We can set maxndetections to some reasonable number, e.g., $1024$, as we do not expect more faces than that. The array pixels holds the grayscale pixel values of the image, stored row by row, and ldim is the distance in bytes between the starts of consecutive rows (equal to ncols for a tightly packed image). Both rcsq and pixels need to be allocated in advance by the caller. The parameters minfacesize and maxfacesize bound the size of the faces to search for and can be set to, e.g., $100$ and $1000$, respectively, for real-time performance. A good value for scalefactor is $1.1$ and shiftfactor can be set to $0.1$. Thus, if we assume that the image is of size $480\times 640$ (rows times columns), we can invoke find_faces as follows:

int nfaces = find_faces(rcsq, 1024, pixels, 480, 640, 640, 1.1f, 0.1f, 100, 1000);

The variable nfaces now contains the number of faces found in the image and rcsq is filled with their positions, sizes and detection quality.

The idea is to expose the find_faces function to JavaScript and invoke it from there. We will do this through WebAssembly. The Emscripten C compiler will help us with this:

emcc main.c -o wasmpico.js -O3 -s EXPORTED_FUNCTIONS="['_find_faces', '_cluster_detections', '_malloc', '_free']" -s WASM=1

This will generate two files: wasmpico.js and wasmpico.wasm. The file wasmpico.js contains the boilerplate code to load wasmpico.wasm. Since we have passed the -s EXPORTED_FUNCTIONS="['_find_faces', '_cluster_detections', '_malloc', '_free']" flag to emcc, the following functions will be available to JavaScript through the Module object: _find_faces, _cluster_detections, _malloc and _free.
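The exported _malloc and _free are meant to be used in pairs: whatever is allocated inside the Wasm heap should eventually be released. As a rough sketch of one way to keep them paired, the hypothetical helper withWasmBuffer below (not part of pico or Emscripten) runs a callback with a temporary buffer and frees it afterwards. It assumes that _malloc returns 0 on failure, which may depend on the build settings.

```javascript
// Run `fn` with a temporary buffer of `nbytes` bytes allocated inside the
// Wasm heap and release the buffer afterwards, even if `fn` throws.
// `module` is any object exposing _malloc and _free, such as the
// Emscripten Module object produced by the build above.
function withWasmBuffer(module, nbytes, fn) {
    var ptr = module._malloc(nbytes);
    // assumption: a null pointer (0) signals a failed allocation
    if (ptr === 0)
        throw new Error('allocation of ' + nbytes + ' bytes failed');
    try {
        return fn(ptr);
    } finally {
        module._free(ptr);
    }
}
```

For buffers that are reused for every video frame, allocating once and keeping the pointer around is likely the simpler choice.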

Let us now see how to use wasmpico from JavaScript.

Running wasmpico in your JavaScript program

First, include the following boilerplate code:

fetch('wasmpico.wasm').then(function(response)
{
    response.arrayBuffer().then(function(buffer)
    {
        WebAssembly.compile(buffer).then(function()
        {
            // the script 'wasmpico.js' will instantiate the object Module once the 'wasmpico.wasm' loads
            var script  = document.createElement('script');
            script.src  = 'wasmpico.js';
            script.type = 'text/javascript';
            script.defer = true;
            document.getElementsByTagName('head').item(0).appendChild(script);

            console.log('* wasm loaded');
        })
    })
})

The whole thing loads asynchronously, so you cannot use its functionality instantly. We assume in the following text that this initialization process has finished and the object Module is available for use.
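One way to find out when the initialization has finished is the onRuntimeInitialized callback that Emscripten invokes on the Module object. The sketch below assumes that the generated wasmpico.js picks up a global Module object defined before the script is loaded (this is the usual behaviour of Emscripten's glue code, but it may depend on the version and build settings). The helper createModuleConfig is hypothetical.

```javascript
// Build a configuration object for Emscripten together with a small state
// record that tells whether the runtime has finished initializing.
function createModuleConfig(onReady) {
    var state = { ready: false };
    var config = {
        // Emscripten calls this hook once the Wasm runtime is ready for use
        onRuntimeInitialized: function () {
            state.ready = true;
            if (typeof onReady === 'function')
                onReady();
        }
    };
    return { config: config, state: state };
}

// Hypothetical usage in the browser, before the wasmpico.js script is appended:
// var setup = createModuleConfig(function () { console.log('* runtime ready'); });
// window.Module = setup.config;
```

Code that calls Module._find_faces can then check setup.state.ready (or run from within the callback) instead of guessing when loading has completed.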

We draw the image onto the canvas to retrieve its RGBA pixel values:

// we assume these are the height and width of our image
var nrows=480, ncols=640;

var ctx = document.getElementsByTagName('canvas')[0].getContext('2d');
ctx.drawImage(image, 0, 0);
var rgba = ctx.getImageData(0, 0, ncols, nrows).data;

Next, we need to move this data into the memory of our Wasm module. This can be done as follows:

// allocate memory inside the Wasm module
var ppixels = Module._malloc(nrows*ncols);

// move the RGBA pixels to the Wasm memory as grayscale values
var pixels = new Uint8Array(Module.HEAPU8.buffer, ppixels, nrows*ncols);

for(var r=0; r<nrows; ++r)
    for(var c=0; c<ncols; ++c)
        // gray = 0.2*red + 0.7*green + 0.1*blue
        pixels[r*ncols + c] = (2*rgba[r*4*ncols+4*c+0]+7*rgba[r*4*ncols+4*c+1]+1*rgba[r*4*ncols+4*c+2])/10;
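The conversion loop can also be packaged as a standalone function, which makes it easy to try out without a Wasm module. The following is a minimal sketch using the same weights as above; rgbaToGray is a hypothetical name, and the output array is assumed to be a Uint8Array with nrows*ncols elements (fractional values are truncated on assignment).

```javascript
// Convert tightly packed RGBA pixels to grayscale using the approximation
// gray = 0.2*red + 0.7*green + 0.1*blue, writing the result into `gray`.
function rgbaToGray(rgba, nrows, ncols, gray) {
    for (var r = 0; r < nrows; ++r)
        for (var c = 0; c < ncols; ++c) {
            var k = 4 * (r * ncols + c); // index of the red channel of pixel (r, c)
            gray[r * ncols + c] = (2 * rgba[k] + 7 * rgba[k + 1] + 1 * rgba[k + 2]) / 10;
        }
    return gray;
}
```

With the variables from the snippet above, the call would be rgbaToGray(rgba, nrows, ncols, pixels), where pixels is the view into the Wasm memory.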

Before invoking the face detector, we need to allocate a buffer that will be used to store detection results:

// support a maximum of 1024 face detections per image (more than plenty in our case)
var maxndetections = 1024;
// each detection takes 4 floats of 4 bytes each
var prcsq = Module._malloc(4*4*maxndetections);
var rcsq = new Float32Array(Module.HEAPU8.buffer, prcsq, 4*maxndetections);
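Note that _malloc takes a size in bytes, while the third argument of the Float32Array constructor is a length in elements. With four floats per detection, a view covering all results spans 4*maxndetections elements. The sketch below illustrates the difference with a plain ArrayBuffer standing in for Module.HEAPU8.buffer; floatView, heap, maxdets and the offset 16 are made up for the illustration.

```javascript
// Create a Float32Array view of `count` floats starting at byte offset `ptr`.
function floatView(buffer, ptr, count) {
    // typed array views of floats must start at a multiple of 4 bytes
    if (ptr % Float32Array.BYTES_PER_ELEMENT !== 0)
        throw new Error('pointer is not 4-byte aligned');
    return new Float32Array(buffer, ptr, count);
}

// a plain ArrayBuffer stands in for the Wasm heap here
var heap = new ArrayBuffer(1024);
var maxdets = 8;

// size in bytes: 4 floats per detection, 4 bytes per float
var nbytes = 4 * Float32Array.BYTES_PER_ELEMENT * maxdets;

// length in elements: 4 floats per detection
var view = floatView(heap, 16, 4 * maxdets);
```

Here view.byteLength equals nbytes, so the view covers exactly the region that a matching allocation would reserve.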

Finally, we can now invoke the face detector and draw the found faces:

// run the detector across the image
var ndetections = Module._find_faces(prcsq, maxndetections, ppixels, nrows, ncols, ncols, 1.1, 0.1, 100, 1000);

// cluster overlapping detections
ndetections = Module._cluster_detections(prcsq, ndetections);

// draw detections
for(var i=0; i<ndetections; ++i)
    // check the detection score
    // if it's above the (empirical) threshold, draw it
    if(rcsq[4*i+3]>3.0)
    {
        ctx.beginPath();
        ctx.arc(rcsq[4*i+1], rcsq[4*i+0], rcsq[4*i+2]/2, 0, 2*Math.PI, false);
        ctx.lineWidth = 3;
        ctx.strokeStyle = 'red';
        ctx.stroke();
    }
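As the drawing code suggests, each detection occupies four consecutive floats in rcsq: row, column, size and detection score. If you prefer working with objects rather than raw indices, the layout can be unpacked as in the following sketch. The function decodeDetections and the field names are hypothetical; the threshold plays the same role as the empirical value 3.0 above.

```javascript
// Unpack the flat detection array into a list of objects, keeping only
// the detections whose score exceeds `threshold`.
function decodeDetections(rcsq, ndetections, threshold) {
    var faces = [];
    for (var i = 0; i < ndetections; ++i) {
        var score = rcsq[4 * i + 3];
        if (score > threshold)
            faces.push({
                row: rcsq[4 * i + 0],   // vertical position of the center
                col: rcsq[4 * i + 1],   // horizontal position of the center
                size: rcsq[4 * i + 2],  // diameter of the detected region
                score: score            // detection quality
            });
    }
    return faces;
}
```

With the variables from the snippet above, decodeDetections(rcsq, ndetections, 3.0) would return the faces that the drawing loop circles.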

Be sure to look at the source code of the webcam-based demo if something is not clear after this exposition. The demo is pretty well commented.

Final notes

The potential of WebAssembly is huge. Some preliminary experiments show that wasmpico is two times faster than pico.js. We will write about this in a future post.

Note that the performance advantage of Wasm over plain JavaScript might grow further in the future.

