In this blog post, I explain the process I followed to implement the most memory-efficient and performant encoding library for the SSZ encoding protocol in Go.
A few years back, the Prysmatic Labs team, developers of Prysm, an Ethereum Beacon chain client, opened a bounty to replace their implementation (go-ssz) of the SSZ encoding protocol.
Having worked with encoding libraries and high-performance systems in the blockchain space, I had some ideas I wanted to try out.
After some successful experiments and a little bit of coding, this was the result:
BenchmarkMarshalGoSSZ-4 753160 ns/op 115112 B/op 8780 allocs/op
BenchmarkUnMarshalGoSSZ-4 1395097 ns/op 144608 B/op 8890 allocs/op
BenchmarkMarshal_Fast-8 4088 ns/op 8192 B/op 1 allocs/op
BenchmarkMarshal_SuperFast-8 1354 ns/op 0 B/op 0 allocs/op
BenchmarkUnMarshal_Fast-8 17614 ns/op 11900 B/op 210 allocs/op
BenchmarkHashTreeRoot_Fast-8 45932 ns/op 0 B/op 0 allocs/op
For those of you who are not used to Go benchmarks, this is roughly a 500x speedup for message encoding (753160 ns/op down to 1354 ns/op) over the original (go-ssz) implementation.
In this post, I want to walk through the optimizations behind the library and some of the design decisions. I will focus mostly on encoding and hashing (and a bit less on decoding), since that is where most of the high-performance strategies are located.
Simple Serialize (SSZ) is a binary data serialization format used in the Ethereum Beacon chain. It replaces the RLP serialization used in the execution clients. Unlike RLP, which only specifies the encoding format, SSZ also defines how objects are Merkleized efficiently.
SSZ is deterministic and not self-describing: to decode a blob of binary data, the message schema must be known in advance. This differs from Protobuf, which tags each field in the message itself with its field number and wire type.
Encoding in SSZ is similar in nature to the ABI format used to interact with smart contracts. The resulting byte stream is divided into two parts: a fixed area and a heap area. Basic and static values (i.e. integers, bools, fixed-size byte arrays) are appended to the fixed area, while dynamic types are written to the heap area, recursively following the same rules. For each dynamic value, an offset pointing to where the value starts in the heap is written in the fixed area.
SSZ objects can also be transformed into a Merkle-tree representation. The values of the object are recursively split into chunks of 32 bytes and hashed into a Merkle-tree. Additional empty leaves might be included so that the total number of leaves is a power of 2 and a single hash-tree-root is produced.
Even though Go does support generic types now, it is still not possible to create a generic data serialization library without the use of the reflect package, a meta-programming tool in Go to determine the shape of the objects at runtime.
A simple encoding library with reflection might look like this:
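// requires imports: "encoding/binary", "reflect"
func encode(i interface{}) []byte {
	res := []byte{}
	// Inspect the dynamic type of i at runtime.
	switch v := reflect.ValueOf(i); v.Kind() {
	case reflect.Int64:
		// encode int64 as 8 little-endian bytes
		res = binary.LittleEndian.AppendUint64(res, uint64(v.Int()))
	case reflect.String:
		// encode string as its raw bytes
		res = append(res, v.String()...)
	}
	return res
}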
However, reflection comes at a cost: it is hard to debug, error-prone, and inefficient.