WebAssembly Binary Format Review

In this blog post, we will review the main concepts of the WebAssembly binary format, a dense linear encoding of the abstract syntax of WebAssembly modules. WebAssembly (abbreviated as Wasm) is a binary instruction format for a stack-based virtual machine, designed as a portable compilation target for programming languages. It is designed to run at near-native speed by taking advantage of common hardware capabilities available on a wide range of platforms.

The binary format for WebAssembly modules is defined by an attribute grammar whose only terminal symbols are bytes. A byte sequence is a well-formed encoding of a module if and only if it is generated by the grammar. The grammar specifies how to encode each syntactic construct of WebAssembly. Integers, including all counts, sizes, and indices, use LEB128, a variable-length encoding in which each byte carries seven bits of the value and the high bit signals whether another byte follows.
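Since LEB128 comes up in almost every part of the format, a minimal sketch of the unsigned variant may help. The function names `encode_u32` and `decode_u32` are made up for this post, and the sketch does not enforce the 32-bit range limit that a real decoder would check.

```python
def encode_u32(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_u32(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode unsigned LEB128 at data[pos:]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


print(encode_u32(1).hex())    # small values fit in one byte
print(encode_u32(128).hex())  # 128 needs two bytes
```

Values below 128 occupy a single byte, which is why most counts and indices in small modules look like plain bytes.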

The binary format has several advantages over a textual format. It is more compact, reducing the size of modules and improving loading times. It is also more efficient to parse and validate, as it can be done in a single pass over the bytes. Moreover, it is designed to be easy to generate and manipulate by compilers and tools.

A binary module is laid out as follows:

  • A module header that identifies the file as a WebAssembly module and indicates the version of the format.
  • A sequence of sections that contain the actual data of the module, such as types, functions, tables, memories, globals, etc. Each section begins with its own id and size, so there is no separate section table.
  • Optionally, custom sections placed among the other sections. One of them, the name section, provides human-readable names for the elements of the module.

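Each section starts with a one-byte id, followed by the size of its contents in bytes (an unsigned LEB128 integer) and the contents themselves, whose layout depends on the section type. The predefined sections are all optional, but each may appear at most once and they must appear in the order fixed by the specification. Custom sections (id 0) are the exception: they may appear anywhere between the other sections, any number of times. They can be used to store additional information that is not part of the core specification, such as debugging symbols or source maps.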
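Because every section announces its own size, a tool can list the sections of a module without understanding their contents. The sketch below illustrates this, assuming a small hand-assembled module (one function that returns the constant 1); `list_sections`, `read_u32`, and `EXAMPLE` are names invented for this post.

```python
import struct

MAGIC = b"\x00asm"  # 0x00 0x61 0x73 0x6d


def read_u32(data: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def list_sections(module: bytes) -> list[tuple[int, int]]:
    """Return (section id, content size) for every section, in file order."""
    if module[:4] != MAGIC:
        raise ValueError("bad magic number")
    (version,) = struct.unpack_from("<I", module, 4)  # little-endian u32
    if version != 1:
        raise ValueError(f"unsupported version {version}")
    sections = []
    pos = 8
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32(module, pos + 1)
        if pos + size > len(module):
            raise ValueError("section runs past end of module")
        sections.append((section_id, size))
        pos += size  # skip the contents without interpreting them
    return sections


# Hand-assembled module with one function that returns the constant 1.
EXAMPLE = bytes([
    0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00,  # magic, version 1
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7F,        # type section
    0x03, 0x02, 0x01, 0x00,                          # function section
    0x0A, 0x06, 0x01, 0x04, 0x00, 0x41, 0x01, 0x0B,  # code section
])

print(list_sections(EXAMPLE))  # [(1, 5), (3, 2), (10, 6)]
```

The sketch does not check section ordering or uniqueness; a real decoder would reject a module whose predefined sections are repeated or out of order.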

The following diagram shows an example of a WebAssembly binary module with three sections: type, function, and code.

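Part              Bytes                  Meaning
Module header     0x00 0x61 0x73 0x6d    magic number "\0asm"
                  0x01 0x00 0x00 0x00    version 1 (little-endian)
Type section      0x01 0x05              section id 1, size 5
                  0x01                   one function type
                  0x60 0x00 0x01 0x7f    function type: no parameters, one i32 result
Function section  0x03 0x02              section id 3, size 2
                  0x01 0x00              one function, type index 0
Code section      0x0a 0x06              section id 10, size 6
                  0x01 0x04              one function body, body size 4
                  0x00                   no local variables
                  0x41 0x01              i32.const 1
                  0x0b                   end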

The module header consists of the four-byte magic number 0x00 0x61 0x73 0x6d, which is the string “\0asm” (a NUL byte followed by “asm” in ASCII), followed by four bytes that indicate the version number in little-endian order. In this case, the version is 1.

There is no separate section table. Instead, every section carries its own header: a single byte for the section id, followed by the size of the section contents in bytes, encoded as an unsigned LEB128 integer. A decoder can use the size to skip sections it does not need. In this case, there are three sections: type (id = 1, size = 5), function (id = 3, size = 2), and code (id = 10, size = 6).

The type section contains a vector of function types. A vector is encoded as its length (an unsigned LEB128 integer, which fits in a single byte for lengths below 128), followed by the encoding of each element. A function type is encoded as the byte 0x60, followed by two vectors of value types that give the parameter types and the result types respectively. A value type is encoded as a single byte: 0x7f for i32, 0x7e for i64, 0x7d for f32, and 0x7c for f64. In this case, there is one function type, () -> (i32), which takes no parameters and returns a single i32; its encoding is 0x60 0x00 0x01 0x7f.
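To make the nesting of vectors concrete, here is a rough sketch of a decoder for the contents of a type section. It assumes only the four numeric value types (vector and reference types are left out), and the names `decode_type_section` and `read_value_types` are invented for illustration.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_u32(data: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def read_value_types(data: bytes, pos: int) -> tuple[list[str], int]:
    """Read a vector of value types: a length, then one byte per type."""
    count, pos = read_u32(data, pos)
    types = []
    for _ in range(count):
        types.append(VALUE_TYPES[data[pos]])  # KeyError on unknown types
        pos += 1
    return types, pos


def decode_type_section(payload: bytes) -> list[tuple[list[str], list[str]]]:
    """Decode section contents into a list of (params, results) pairs."""
    count, pos = read_u32(payload, 0)
    function_types = []
    for _ in range(count):
        if payload[pos] != 0x60:
            raise ValueError("expected function type marker 0x60")
        params, pos = read_value_types(payload, pos + 1)
        results, pos = read_value_types(payload, pos)
        function_types.append((params, results))
    return function_types


# One function type with no parameters and a single i32 result.
print(decode_type_section(bytes([0x01, 0x60, 0x00, 0x01, 0x7F])))
```

The input here is the section contents only, that is, the bytes after the section id and size.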

The function section contains a vector of type indices, one per function defined in the module. Each type index is an unsigned LEB128 integer that refers to an entry in the type section. In this case, there is one function with type index 0.

The code section contains a vector of function bodies, one for each entry in the function section. A function body is encoded as its size in bytes (an unsigned LEB128 integer), followed by a vector of local variable declarations, followed by the instruction sequence, which is terminated by an end instruction. A local variable declaration consists of a count and a value type, and declares that many locals of that type. An instruction is encoded as an opcode, usually a single byte, followed by zero or more immediate operands; integer immediates use LEB128 (signed for i32.const, unsigned for indices such as the operand of call). In this case, there is one function body with size 4, no local variables, and two instructions: i32.const 1 (opcode = 0x41, operand = 1) and end (opcode = 0x0b), which closes the function body. A second end, or a call 0 that leaves an extra value on the stack, would make the module malformed or invalid.
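The layout of a function body can be illustrated with a deliberately tiny disassembler. This is only a sketch: it understands just three opcodes (i32.const, call, end), and the names `decode_code_section`, `read_u32`, and `read_s32` are hypothetical helpers written for this post.

```python
def read_u32(data: bytes, pos: int) -> tuple[int, int]:
    """Read an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def read_s32(data: bytes, pos: int) -> tuple[int, int]:
    """Read a signed LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            if byte & 0x40:           # sign bit of the final byte
                result -= 1 << shift  # sign-extend
            return result, pos


def decode_code_section(payload: bytes) -> list[tuple[list, list]]:
    """Decode section contents into (local declarations, instructions) pairs."""
    count, pos = read_u32(payload, 0)
    bodies = []
    for _ in range(count):
        size, pos = read_u32(payload, pos)
        end = pos + size  # the body occupies payload[pos:end]
        group_count, pos = read_u32(payload, pos)
        local_groups = []
        for _ in range(group_count):
            n, pos = read_u32(payload, pos)
            local_groups.append((n, payload[pos]))  # (count, value type byte)
            pos += 1
        instructions = []
        while pos < end:
            opcode = payload[pos]
            pos += 1
            if opcode == 0x41:
                value, pos = read_s32(payload, pos)
                instructions.append(("i32.const", value))
            elif opcode == 0x10:
                index, pos = read_u32(payload, pos)
                instructions.append(("call", index))
            elif opcode == 0x0B:
                instructions.append(("end",))
            else:
                raise ValueError(f"opcode 0x{opcode:02x} not handled in this sketch")
        bodies.append((local_groups, instructions))
    return bodies


# One body of size 4: no locals, i32.const 1, end.
print(decode_code_section(bytes([0x01, 0x04, 0x00, 0x41, 0x01, 0x0B])))
```

Using the declared body size to find the end of the instruction stream is what lets a decoder skip or lazily compile individual functions.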

This example illustrates how the WebAssembly binary format encodes modules in a compact and efficient way. For more details on the binary format and its grammar rules, you can refer to the official specification.