Wasm threading: parallel without paying postMessage

The saving grace of low-end devices is that they still have several cores. Wasm takes better advantage of them than JavaScript does.

In Rust + Wasm with wasm-bindgen-rayon, every thread is a Web Worker instantiated over the same WebAssembly.Memory. A &mut [f32] handed to Rayon's par_iter_mut is just a pointer and a length into that shared linear memory, so every worker thread can read and write it directly. There is no copy. The standard library's std::sync::Mutex, RwLock, mpsc channels, and atomics all just work.

In JavaScript, the equivalent fan-out is postMessage to a pool of Web Workers. The benchmarks below show two flavours of it: structured cloning, which copies every payload, and transferables, which avoid copying as far as possible.
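The difference between the two memory models can be seen on a single thread. The following is a minimal sketch, not part of the benchmark: the two views stand in for a main thread and a worker (the names are made up for illustration), and structuredClone is used because it applies the same copy semantics as a plain postMessage.

```typescript
// Sketch: shared-memory semantics vs copy semantics, on one thread.

// A SharedArrayBuffer is the kind of buffer that backs a shared WebAssembly.Memory.
const sharedBytes = new SharedArrayBuffer(4 * Float32Array.BYTES_PER_ELEMENT);

// Two views over the same bytes (hypothetical stand-ins for two threads).
const mainView = new Float32Array(sharedBytes);
const workerView = new Float32Array(sharedBytes);

workerView[0] = 1.5; // written through one view...
const seenThroughShared = mainView[0]; // ...and visible through the other

// Structured clone produces an independent copy of the data.
const original = new Float32Array([1, 2, 3, 4]);
const cloned = structuredClone(original);
cloned[0] = 99; // only the copy changes
const originalAfterClone = original[0];

console.log(seenThroughShared, originalAfterClone); // 1.5 1
```

With shared memory a write is visible everywhere at once; with structured cloning each side holds its own copy, so results have to be sent back explicitly.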

The workload

The lightest meaningful parallel map: SAXPY, out[i] = a * x[i] + y[i]. Per element it's one f32 multiply + one f32 add. The arithmetic is deliberately cheap, so the benchmark measures messaging overhead rather than computation speed.

We test 5 variants, as detailed below:

1. JavaScript variant (single-threaded)

A baseline comparison that simply executes everything on one thread.

const n = x.length;
for (let i = 0; i < n; i++) {
  output[i] = a * x[i] + y[i];
}
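For reference, the loop above can be wrapped as a self-contained function. This is a sketch: the function name, the length check, and the sample values are assumptions added for illustration, not the benchmark harness.

```typescript
// Single-threaded SAXPY: output[i] = a * x[i] + y[i].
function saxpy(a: number, x: Float32Array, y: Float32Array, output: Float32Array): void {
  if (x.length !== y.length || x.length !== output.length) {
    throw new RangeError("x, y and output must have the same length");
  }
  const n = x.length;
  for (let i = 0; i < n; i++) {
    output[i] = a * x[i] + y[i];
  }
}

// Small worked example with values that are exact in f32.
const xs = new Float32Array([1, 2, 3]);
const ys = new Float32Array([10, 20, 30]);
const result = new Float32Array(xs.length);
saxpy(2, xs, ys, result);
console.log(Array.from(result)); // [12, 24, 36]
```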

2. JavaScript worker (structured clone)

A persistent pool of K workers, where K = navigator.hardwareConcurrency, capped at 8.

On each call:

- the main thread posts {x_chunk, y_chunk, a} to each worker with postMessage (structured clone: an allocation plus a memcpy).
- the worker allocates an output Float32Array.
- the worker performs the computation.
- the worker posts the output back (another structured clone).
- the main thread glues the K output chunks together.
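The main-thread side of these steps, splitting the input and gluing the results, could look roughly like the sketch below. The helper names (chunkRanges, glueChunks, workerSaxpy) are hypothetical, and the worker is replaced by an inline function so the example runs without a Worker pool; a real pool would post each chunk and await the replies.

```typescript
// Hypothetical helper: split [0, n) into k nearly equal half-open ranges.
function chunkRanges(n: number, k: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  const base = Math.floor(n / k);
  const extra = n % k; // the first `extra` chunks take one more element
  let start = 0;
  for (let i = 0; i < k; i++) {
    const end = start + base + (i < extra ? 1 : 0);
    ranges.push([start, end]);
    start = end;
  }
  return ranges;
}

// Hypothetical helper: concatenate per-worker outputs in chunk order.
function glueChunks(chunks: Float32Array[], n: number): Float32Array {
  const out = new Float32Array(n);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Inline stand-in for the worker body (allocate, then compute).
function workerSaxpy(a: number, x: Float32Array, y: Float32Array): Float32Array {
  const output = new Float32Array(x.length);
  for (let i = 0; i < x.length; i++) output[i] = a * x[i] + y[i];
  return output;
}

const poolSize = 3; // the text's K; fixed here so the sketch is deterministic
const xAll = new Float32Array(10);
const yAll = new Float32Array(10);
for (let i = 0; i < xAll.length; i++) {
  xAll[i] = i;
  yAll[i] = 1;
}

// slice() copies each chunk into its own buffer, much as a cloned message would.
const partials = chunkRanges(xAll.length, poolSize).map(([start, end]) =>
  workerSaxpy(2, xAll.slice(start, end), yAll.slice(start, end))
);
const glued = glueChunks(partials, xAll.length);
console.log(Array.from(glued)); // [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

Every chunk is copied on the way out and every result on the way back, which is the overhead this variant is meant to expose.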

self.onmessage = (event: MessageEvent<SaxpyCloneRequest>) => {
    const { requestId, a, x, y } = event.data;
    const n = x.length;
    const output = new Float32Array(n);
    for (let i = 0; i < n; i++) {
        output[i] = a * x[i] + y[i];
    }
    const response: SaxpyCloneResponse = { requestId, output };
    self.postMessage(response);
};

3. JavaScript worker (transferables)

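The same as the structured-clone version, except that the buffers travel as transferables. Ownership of each ArrayBuffer moves to the receiving thread instead of being copied, and the worker writes into an output buffer it was handed rather than allocating one. The worker sends x, y and output back the same way, so the main thread regains all three.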

self.onmessage = (event: MessageEvent<SaxpyTransferRequest>) => {
    const { requestId, a, x, y, output } = event.data;
    const n = x.length;
    for (let i = 0; i < n; i++) {
        output[i] = a * x[i] + y[i];
    }
    const response: SaxpyTransferResponse = { requestId, x, y, output };
    self.postMessage(response, [x.buffer, y.buffer, output.buffer]);
};
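The ownership move behind transferables can be observed without a worker. The sketch below is an illustration rather than benchmark code: it uses structuredClone with a transfer list, which applies the same transfer semantics as postMessage(message, [buffer]).

```typescript
// Sketch: transferring an ArrayBuffer detaches it on the sending side.
const payloadBuffer = new ArrayBuffer(4 * Float32Array.BYTES_PER_ELEMENT);
const senderView = new Float32Array(payloadBuffer);
senderView.set([1, 2, 3, 4]);

// The receiver gets a view over the moved buffer; no bytes are copied.
const receiverView = structuredClone(senderView, { transfer: [payloadBuffer] });

console.log(payloadBuffer.byteLength); // 0: the sender's buffer is detached
console.log(senderView.length); // 0: views over a detached buffer are empty
console.log(Array.from(receiverView)); // [1, 2, 3, 4]: the receiver owns the data
```

Because the sender loses access after each transfer, a pool that reuses its buffers across calls needs them posted back, as the worker above does.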

4. Rust scalar (single-threaded)

The same loop as variant 1, but compiled entirely to Wasm.

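use wasm_bindgen::prelude::*;

// SAXPY_X, SAXPY_Y and SAXPY_OUT are thread-local, RefCell-wrapped f32 buffers
// defined elsewhere (not shown here).
#[wasm_bindgen]
pub fn saxpy_scalar(n: u32, a: f32) {
    let n = n as usize;
    SAXPY_X.with(|x| {
        SAXPY_Y.with(|y| {
            SAXPY_OUT.with(|o| {
                let x = x.borrow();
                let y = y.borrow();
                let mut o = o.borrow_mut();
                // Slice all three buffers to the first n elements up front.
                let x = &x[..n];
                let y = &y[..n];
                let o = &mut o[..n];
                for i in 0..n {
                    o[i] = a * x[i] + y[i];
                }
            })
        })
    });
}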

5. Rust parallel (Rayon + Atomics)

Multithreading here is built on the SharedArrayBuffer and Atomics web APIs: Rayon's worker threads all operate on one shared linear memory. Zero bytes cross any boundary, because the buffers already live where every thread can access them.

use rayon::prelude::*;
use wasm_bindgen::prelude::*;

// SAXPY_X, SAXPY_Y and SAXPY_OUT are thread-local, RefCell-wrapped f32 buffers
// defined elsewhere (not shown here).
#[wasm_bindgen]
pub fn saxpy_parallel(n: u32, a: f32) {
    let n = n as usize;
    SAXPY_X.with(|x| {
        SAXPY_Y.with(|y| {
            SAXPY_OUT.with(|o| {
                let x = x.borrow();
                let y = y.borrow();
                let mut o = o.borrow_mut();
                let x = &x[..n];
                let y = &y[..n];
                let o = &mut o[..n];
                o.par_iter_mut()
                    // Do not split work into pieces smaller than 8192 elements.
                    .with_min_len(8192)
                    .zip(x.par_iter())
                    .zip(y.par_iter())
                    .for_each(|((out, &xv), &yv)| {
                        *out = a * xv + yv;
                    });
            })
        })
    });
}

RustWeek 2026
