The F3 Journey: Testing Insights and Timeline Adjustments

<aside> 👉

This was the draft that ultimately became https://medium.com/@filoz/the-f3-journey-testing-insights-and-timeline-adjustments-f43acfd4ea72

</aside>

In our last blog post, Finality Unveiled: Passive testing to Mainnet Launch of F3, we shared the potential of Fast Finality (F3) and outlined a timeline targeting deployment in early 2025. Since then, our engineering team has conducted intensive passive testing on mainnet. This was the first time F3 had been tested under real-world conditions at 100% of mainnet scale, and it produced promising results along with important lessons about network behavior, participation requirements, and the need for some additional optimizations.

This blog update serves two purposes: first, to share the technical insights we gained from the testing efforts, and second, to outline the necessary adjustments to our deployment timeline. These changes reflect our commitment to ensuring F3's stability and effectiveness when it goes live on mainnet.

🔍 Testing Insights: Successes & Areas for Improvement

Over four weeks, we conducted 47 rounds of passive testing, gradually scaling from a small percentage of network participation to full (100%) participation. We would like to give a big shoutout to the whole Filecoin community for asking great questions, notifying us about excessive logs, and raising potential concerns.

A lot of developer energy went into optimizing and monitoring metrics, tweaking parameters, and edge-case testing of starting and stopping the system. We also published daily passive testing reports in the GitHub Discussion thread. Our testing focused on two critical phases of F3 operation:

The Bootstrap Phase: Where F3 catches up to the latest state
The Steady State Phase: The ongoing finalization of the chain after the initial Bootstrap phase

✅ Bootstrapping - Metrics are Looking Good

During the Bootstrap phase, nodes in the network work together to catch up to the most recently published epoch. They do this in groups of 100 epochs at a time until they reach the most recent ones. When F3 starts the Bootstrap phase, it typically needs to process about 900 blocks to get up to speed. Below is a picture of the Bootstrap phase from one of our passive testing rounds.

The Y-axis shows how many epochs F3 (Fast Finality) was behind the latest epoch, while the X-axis represents time.
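The staircase shape follows directly from catching up in fixed-size groups. Here is a minimal sketch, assuming a starting lag of 900 epochs (treating the "about 900 blocks" above as epochs), groups of 100, and no new epochs arriving while catching up. The function name is hypothetical and is not part of any F3 API.

```python
def bootstrap_lag_steps(initial_lag: int = 900, batch_size: int = 100) -> list[int]:
    """Return the lag (epochs behind the latest epoch) after each catch-up step.

    Simplifying assumption: the chain head does not advance during bootstrap.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    lags = [initial_lag]
    lag = initial_lag
    while lag > 0:
        # Each step finalizes up to one group of epochs.
        lag = max(0, lag - batch_size)
        lags.append(lag)
    return lags


if __name__ == "__main__":
    steps = bootstrap_lag_steps()
    print(steps)           # lag after each step, i.e. the Y-axis of the chart
    print(len(steps) - 1)  # number of "stairs"
```

Under these assumptions the chart has nine stairs. In practice the chain keeps producing epochs during bootstrap, so the real staircase is slightly longer.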

Our goal for the Bootstrap phase during passive testing was to tweak the parameters and knobs available in F3 (such as how long we wait for messages to arrive and how often we retry sending messages) so that each "stair" in this chart is as short as possible and the descent happens as fast as possible. But there is a big caveat: if we make the staircase too steep, nodes use more bandwidth, which could put them under significant stress. So there is a fine balance here.

After some time observing and tweaking these metrics, we successfully optimized this phase. At 100% scale of the network, we were able to finish the Bootstrap phase in around 1 hour and 10 minutes, while node bandwidth usage averaged less than 10 MiB/s download and 10 MiB/s upload. Definitely a big success!
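To put those figures in perspective, here is a rough back-of-the-envelope calculation. It treats the reported 10 MiB/s as an upper bound on the average rate in each direction, and it assumes nine catch-up steps (about 900 epochs in groups of 100). Both are simplifications for illustration, not measured values.

```python
BOOTSTRAP_SECONDS = (1 * 60 + 10) * 60  # 1 hour 10 minutes, from the test run
MAX_AVG_RATE_MIB_S = 10                 # reported ceiling on the average rate, per direction
ASSUMED_STEPS = 9                       # assumption: ~900 epochs / 100 epochs per group


def max_transfer_gib(duration_s: int, rate_mib_s: float) -> float:
    """Upper bound on data moved in one direction, in GiB."""
    return duration_s * rate_mib_s / 1024


def minutes_per_step(duration_s: int, steps: int) -> float:
    """Average wall-clock time spent on each catch-up step, in minutes."""
    return duration_s / 60 / steps


if __name__ == "__main__":
    print(f"{max_transfer_gib(BOOTSTRAP_SECONDS, MAX_AVG_RATE_MIB_S):.1f} GiB per direction (upper bound)")
    print(f"{minutes_per_step(BOOTSTRAP_SECONDS, ASSUMED_STEPS):.1f} minutes per step")
```

Under these assumptions, a node moves at most about 41 GiB in each direction during bootstrap, and each stair takes a little under 8 minutes on average.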

Climbing stairs is easy peasy!

😬 Steady State Challenges

With the Bootstrap phase of F3 in the rearview mirror, our next objective during the passive testing rounds was to make sure that the steady state of F3 was “good enough” to activate on Mainnet. By “good enough”, we mean that finalization progress is consistent over time and that the chain is finalised relatively fast.

Over the course of multiple rounds of testing, we realised that there were some additional knobs and parameters we wished we could tune, which could allow us to achieve this “good enough” state.

⚠️ First challenge: progress is not consistent enough

We observed periods where F3 was not consistent enough in the steady state. This inconsistency has multiple downsides: builders building apps on top of the F3 APIs need those APIs to be consistent, otherwise people can get the impression that things are “broken”. Additionally, such inconsistency leads to periods of higher bandwidth usage and spikes, which is not great for node operators.

We need a more consistent “Zig-Zag” pattern!
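One way to make "consistent" concrete is to flag stretches where the finalization lag climbs above a chosen ceiling instead of dropping back down. The sketch below is a hypothetical monitoring helper, not something taken from the F3 codebase. The threshold and the sample values are made up for illustration.

```python
def find_stalls(lags: list[int], max_lag: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) index ranges where the lag exceeds max_lag.

    `lags` is a series of "epochs behind the latest epoch" samples taken at
    regular intervals. A healthy zig-zag stays at or below max_lag.
    """
    stalls = []
    start = None
    for i, lag in enumerate(lags):
        if lag > max_lag:
            if start is None:
                start = i  # a stall begins at this sample
        elif start is not None:
            stalls.append((start, i - 1))  # the stall ended at the previous sample
            start = None
    if start is not None:
        # The series ended while still stalled.
        stalls.append((start, len(lags) - 1))
    return stalls


if __name__ == "__main__":
    # Illustrative samples only: two clean zig-zags, then a stretch where lag keeps climbing.
    samples = [2, 4, 6, 2, 4, 6, 8, 10, 12, 3, 5]
    print(find_stalls(samples, max_lag=6))
```

With these illustrative samples, the helper reports a single stall covering the three samples where the lag keeps rising past the ceiling before finalization catches up again.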