
Since the birth of computing, engineers and researchers have explored how to push computing resources to their performance limits, maximizing efficiency while minimizing the latency of computing tasks. These two pillars, high performance and low latency, have shaped the development of computer science across fields ranging from CPUs and FPGAs to database systems and, more recently, artificial intelligence infrastructure and blockchain systems. In the pursuit of high performance, pipelining has become an indispensable tool. Since its introduction in the IBM System/360 in 1964 [1], it has been at the core of high-performance system design and has driven key discussions and innovations in the field.
Pipelining is not limited to hardware; it is also widely used in databases. For example, David DeWitt and Jim Gray described pipelined parallelism in their paper "Parallel Database Systems: The Future of High Performance Database Systems" [2]. This approach breaks a complex query into multiple stages (operators) that run concurrently, each streaming its output to the next, thereby improving efficiency and performance. Pipelining is also crucial in artificial intelligence, notably in the widely used deep learning framework TensorFlow, which uses data pipelines to parallelize data preprocessing and loading. This keeps data flowing smoothly for training and inference, making AI workflows faster and more efficient [3].
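As a rough illustration of pipelined parallelism, here is a minimal Python sketch (not code from [2] or from any database engine) in which three hypothetical operators, `scan`, `filter_stage`, and `project`, run in separate threads connected by small bounded queues. Each operator starts on a row as soon as its upstream neighbor emits it, instead of waiting for the whole intermediate result.

```python
import queue
import threading

SENTINEL = object()  # marks the end of a stream


def scan(rows, out_q):
    """Stage 1: emit rows one at a time, like a table scan."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def filter_stage(pred, in_q, out_q):
    """Stage 2: forward each row that satisfies the predicate as soon as it arrives."""
    while (row := in_q.get()) is not SENTINEL:
        if pred(row):
            out_q.put(row)
    out_q.put(SENTINEL)


def project(fn, in_q, results):
    """Stage 3: transform each surviving row and collect the output."""
    while (row := in_q.get()) is not SENTINEL:
        results.append(fn(row))


def run_pipeline(rows, min_amount=100):
    # Bounded queues apply backpressure, so no stage runs far ahead of the next.
    q1, q2 = queue.Queue(maxsize=4), queue.Queue(maxsize=4)
    results = []
    stages = [
        threading.Thread(target=scan, args=(rows, q1)),
        threading.Thread(target=filter_stage, args=(lambda r: r["amount"] > min_amount, q1, q2)),
        threading.Thread(target=project, args=(lambda r: r["id"], q2, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results


rows = [{"id": i, "amount": i * 30} for i in range(10)]
print(run_pipeline(rows))  # [4, 5, 6, 7, 8, 9]
```

Threads are used here only to show the structure. In CPython the global interpreter lock limits CPU-bound parallelism, so CPU-heavy operators would need separate processes to actually run side by side.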
Blockchain is no exception. Its core functionality resembles that of a database, processing transactions and updating state, but with the added challenge of Byzantine fault-tolerant consensus. The key to increasing blockchain throughput (transactions per second) and reducing latency (time to final confirmation) is to optimize how the different stages (ordering, executing, committing, and synchronizing transactions) interact under high load. This challenge is particularly acute in high-throughput scenarios, where traditional designs struggle to maintain low latency.
To explore these ideas, let’s revisit a familiar analogy: the automobile factory. Understanding how the assembly line revolutionized manufacturing helps us appreciate how the blockchain pipeline has evolved, and why next-generation designs such as Zaptos [8] are pushing blockchain performance to new heights.
From car factories to blockchain
Imagine you are the owner of a car factory with two main goals:
Maximize throughput: Assemble as many cars as possible per day.
Minimize delays: Reduce the build time of each car.
Now, imagine three types of factories:
Simple Factory
In a simple factory, a team of multiskilled workers assembles a car step by step. One worker builds the engine, the next installs the wheels, and so on, one car at a time.
The problem? Much of the workforce is often left waiting, and overall productivity is low, because no one works on a different part of the same car, or on another car, at the same time.
Ford Factory
Enter the Ford assembly line [4]! Here, each worker focuses on a single task. Cars move along a conveyor belt, and as each car passes, a dedicated worker adds their part.
The result? Multiple cars are in various stages of assembly at once, and every worker stays busy. Throughput increases dramatically, but each car still passes through every worker in turn, so the time to build any single car stays the same.
Magic Factory
Now imagine a magic factory where all workers can work on the same car at the same time! Instead of the car moving from one station to the next, every part of it is built simultaneously.
The result? Each car is assembled in record time, with every step happening in parallel. This is the ideal scenario: high throughput and low latency at once.
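The three factories can be compared with a back-of-the-envelope model. The sketch below assumes each stage has a fixed duration and that the Ford line admits a new car no faster than its slowest stage (so no queue builds up); the stage times are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    latency: float     # time to build one car, start to finish
    throughput: float  # cars completed per unit time in steady state


def simple_factory(stage_times):
    """One car at a time: stages run back to back and never overlap."""
    total = sum(stage_times)
    return Metrics(latency=total, throughput=1 / total)


def ford_factory(stage_times):
    """Assembly line: each stage works on a different car, so a car leaves every
    max(stage) time units, yet every car still visits every stage in turn.
    Assumes cars enter no faster than the slowest stage, so no queue builds up."""
    return Metrics(latency=sum(stage_times), throughput=1 / max(stage_times))


def magic_factory(stage_times):
    """Idealized: all stages work on the same car at once, then move to the next car."""
    slowest = max(stage_times)
    return Metrics(latency=slowest, throughput=1 / slowest)


stages = [2.0, 3.0, 1.0]  # hypothetical hours for engine, body, wheels
for build in (simple_factory, ford_factory, magic_factory):
    m = build(stages)
    print(f"{build.__name__:15} latency={m.latency:.1f} h  throughput={m.throughput:.3f} cars/h")
```

For stages of 2, 3, and 1 hours, the simple and Ford factories both take 6 hours per car, but the Ford line finishes a car every 3 hours instead of every 6; only the magic factory also cuts per-car latency, to 3 hours.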
OK, enough about car factories. What about blockchain? It turns out that designing a high-performance blockchain isn’t that different from optimizing an assembly line.
Blockchain as a car factory
In blockchain, processing a block is similar to assembling a car. The analogy is as follows:
Workers = Validator resources
Car = one block
Assembly tasks = consensus, execution, and commit phases
Just as a simple factory works on only one car at a time, a blockchain that processes only one block at a time underutilizes its resources. Modern blockchain designs instead strive to work like a Ford assembly line, processing different stages of multiple blocks at the same time. This is where pipelining comes in.
The Evolution of the Blockchain Pipeline
Traditional architecture: Sequential blockchain
Imagine a blockchain that processes blocks in order. Validators need to:
1. Receive block proposals.
2. Execute the block to update the blockchain state.
3. Reach consensus on the resulting state.
4. Persist the state to the database.
5. Start consensus on the next block.
What's the problem?
Execution and commit sit on the critical path of the consensus process.
Each consensus instance must wait for the previous one to complete before it can start.
This setup is like a pre-Fordist factory: workers (resources) are often idle while focusing on just one block (car) at a time. Unfortunately, many existing blockchains still fall into this category, resulting in low throughput and high latency.
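The cost of keeping every step on the critical path is easy to see in a toy timeline. The sketch below uses hypothetical step durations in milliseconds and models the loop above literally: block i+1 cannot start until block i has been executed, agreed on, and persisted.

```python
def sequential_timeline(n_blocks, execute_ms, consensus_ms, persist_ms):
    """Return (start, finish) times for each block in a strictly sequential validator.

    Durations are hypothetical; receiving the proposal is folded into execute_ms.
    """
    per_block = execute_ms + consensus_ms + persist_ms  # no step overlaps any other
    timeline = []
    clock = 0
    for _ in range(n_blocks):
        timeline.append((clock, clock + per_block))
        clock += per_block  # the next block's consensus waits for this one to finish
    return timeline


blocks = sequential_timeline(3, execute_ms=300, consensus_ms=200, persist_ms=100)
print(blocks)  # [(0, 600), (600, 1200), (1200, 1800)]
print(1000 / (blocks[1][0] - blocks[0][0]), "blocks per second")
```

Throughput is capped at one block per 600 ms, even though each resource (CPU for execution, network for consensus, disk for persistence) sits idle for most of that window.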
Aptos: Parallelizing Performance
Diem introduced a pipelined architecture, later adopted by Aptos, that decouples execution and commit from consensus, while the consensus stage itself is also pipelined.
Asynchronous execution and commit [5]: Validators first reach consensus on a block and then execute it on top of its parent block's state. Once the resulting state has been signed and certified by a quorum of validators, it is persisted to storage.
Pipelined consensus (Jolteon [6]): A new consensus instance can start before the previous one has completed, much like a moving assembly line.
Letting different blocks occupy different stages simultaneously improves throughput, and pipelined consensus cuts block time to just 2 message delays. However, Jolteon is leader-based, so the leader can become a bottleneck when it is overloaded with disseminating transaction data.
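A generic pipeline model shows why this decoupling pays off. The sketch below is not Aptos's scheduler; it assumes each stage (for example consensus, execution, commit) handles one block at a time with hypothetical fixed durations, and that all blocks are ready at time zero. A block enters a stage once it has left the previous stage and that stage has finished the previous block.

```python
def pipelined_timeline(n_blocks, stage_times):
    """Return the time each block leaves the last stage of an idealized pipeline.

    stage_times: hypothetical per-stage durations, e.g. [consensus, execution, commit].
    """
    n_stages = len(stage_times)
    finish = [[0] * n_stages for _ in range(n_blocks)]
    for b in range(n_blocks):
        for s in range(n_stages):
            ready = finish[b][s - 1] if s > 0 else 0  # block b has left stage s-1
            free = finish[b - 1][s] if b > 0 else 0   # stage s is done with block b-1
            finish[b][s] = max(ready, free) + stage_times[s]
    return [row[-1] for row in finish]


print(pipelined_timeline(3, [200, 300, 100]))  # [600, 900, 1200]
```

With stages of 200, 300, and 100 ms, blocks finish at 600, 900, and 1200 ms: after the first block, one block completes every 300 ms (the slowest stage) instead of every 600 ms as it would if the three steps ran back to back.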
Aptos further optimizes the pipeline with Quorum Store [7]. Instead of relying on a single leader to broadcast large blocks of transaction data inside the consensus protocol, Quorum Store separates data dissemination from metadata ordering: validators disseminate data asynchronously and in parallel, and consensus orders only the metadata. This design leverages the aggregate bandwidth of all validators, effectively eliminating the leader bottleneck in consensus.
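The core idea of Quorum Store, ordering small digests instead of full transaction payloads, can be sketched as follows. This is a simplified model, not the Aptos implementation or API: signatures are abstracted as validator IDs, the quorum is assumed to be 2f+1 out of n = 3f+1 validators, and `Batch`, `proof_of_store`, and `build_proposal` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Batch:
    """A batch of transactions disseminated by one validator (hypothetical model)."""
    author: int
    txns: list[bytes]

    def digest(self) -> str:
        h = hashlib.sha256(self.author.to_bytes(8, "big"))
        for tx in self.txns:
            h.update(hashlib.sha256(tx).digest())  # hash each tx so boundaries are unambiguous
        return h.hexdigest()


def quorum_size(n: int) -> int:
    """2f + 1 out of n validators, where f = floor((n - 1) / 3) may be Byzantine."""
    f = (n - 1) // 3
    return 2 * f + 1


def proof_of_store(batch: Batch, acks: list[int], n: int) -> str | None:
    """Return the batch digest once enough distinct validators acknowledge storing it."""
    return batch.digest() if len(set(acks)) >= quorum_size(n) else None


def build_proposal(certified: list[str], max_batches: int) -> list[str]:
    """The leader orders only digests (metadata), never the transaction payloads."""
    return certified[:max_batches]


# Four validators each disseminate their own batch in parallel.
batches = [Batch(author=i, txns=[f"tx-{i}-{j}".encode() for j in range(1000)]) for i in range(4)]
proofs = [proof_of_store(b, acks=[0, 1, 2], n=4) for b in batches]
proposal = build_proposal([d for d in proofs if d is not None], max_batches=10)
print(len(proposal), "digests stand in for", sum(len(b.txns) for b in batches), "transactions")
```

The leader's proposal grows with the number of batches, not with their size, which is why dissemination bandwidth can scale with the number of validators.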

At this point, Aptos has built the "Ford factory" of blockchains. Just as Ford's assembly line revolutionized car production by carrying out different stages of different cars simultaneously, Aptos processes different stages of different blocks simultaneously. Each validator's resources are kept busy, so no part of the process sits idle waiting for another. The result is a high-throughput system that processes blockchain transactions efficiently and at scale.

While throughput is critical, end-to-end latency (the time from transaction submission to final confirmation) is equally important. For applications such as payments, decentralized finance (DeFi), and gaming, every millisecond counts. Yet users often see high latency during traffic spikes, because each transaction must pass through a series of stages in sequence: client-to-full-node-to-validator communication, consensus, execution, state certification, commit, and full node synchronization. Under high load, stages such as execution and full node synchronization add even more latency.

This is the Ford factory problem: even though the assembly line maximizes overall throughput, each car still passes through every worker in turn, so building any one car still takes a long time. To truly push blockchain performance to its limits, we need a “magic factory” in which these stages run in parallel.
Zaptos: Towards Optimal Blockchain Latency
Zaptos [8] further reduces latency without sacrificing throughput through three key optimizations:
Optimistic execution: Reduces pipeline latency by starting execution as soon as a block proposal arrives. Validators add each proposed block to the pipeline immediately and execute it speculatively once its parent block has been executed. Full nodes also execute optimistically after receiving proposals from validators, and use the results to verify state proofs.
Optimistic commit: Writes the state to storage immediately after a block is executed, even before that state is certified. When the validators finally certify the state, only minimal updates are needed to complete the commit. If a block is ultimately not ordered, its optimistically committed state is rolled back to maintain consistency.
Fast certification: Validators send certification messages in parallel with the final consensus round, so they can start certifying the state of executed blocks without waiting for consensus to complete. In the common case, this removes one round from the pipeline latency.
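Optimistic execution and optimistic commit can be sketched with a small in-memory store. This is an illustrative model, not the Zaptos code: transactions are hypothetical `(account, delta)` balance updates, blocks form a single linear chain, and fast certification is not modeled. Readers see optimistic state immediately; certification only moves a block's already-written state into the certified set, and rollback discards a block together with every later block built on top of it.

```python
from dataclasses import dataclass, field


def execute(parent_state, txns):
    """Speculatively apply (account, delta) updates on top of the parent block's state."""
    writes = {}
    for account, delta in txns:
        writes[account] = writes.get(account, parent_state.get(account, 0)) + delta
    return writes


@dataclass
class OptimisticStore:
    certified: dict = field(default_factory=dict)  # state backed by a quorum certificate
    pending: dict = field(default_factory=dict)    # block_id -> optimistically committed writes
    view: dict = field(default_factory=dict)       # latest state, including optimistic writes

    def optimistic_commit(self, block_id, writes):
        """Persist a block's writes right after execution, before certification."""
        self.pending[block_id] = dict(writes)
        self.view.update(writes)

    def finalize(self, block_id):
        """Certification arrived: move the already-written state into the certified set."""
        self.certified.update(self.pending.pop(block_id))

    def rollback(self, block_id):
        """Block was never ordered: drop it and every later pending block (linear chain)."""
        ids = list(self.pending)
        for bid in ids[ids.index(block_id):]:
            del self.pending[bid]
        self.view = dict(self.certified)
        for writes in self.pending.values():  # dicts keep insertion (block) order
            self.view.update(writes)


store = OptimisticStore()
store.optimistic_commit(1, execute(store.view, [("alice", 10)]))             # on proposal of block 1
store.optimistic_commit(2, execute(store.view, [("alice", 5), ("bob", 3)]))  # block 2 builds on block 1
store.finalize(1)   # block 1 certified
store.rollback(2)   # block 2 turns out not to be ordered
print(store.view)   # {'alice': 10}
```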

With these optimizations, Zaptos hides the latency of the other pipeline stages within the consensus stage. As a result, if the blockchain uses a consensus protocol with optimal latency, its end-to-end latency can approach the optimum as well!
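A toy latency model captures what "hiding" stages inside consensus means. All durations below are hypothetical, and the model simplifies Zaptos: it assumes execution, certification, and commit run as one chain alongside consensus, while client communication and full node synchronization stay outside it.

```python
# Hypothetical per-stage durations in milliseconds for one transaction (illustrative only).
STAGES = {
    "client_to_validator": 50,
    "consensus": 400,
    "execution": 150,
    "certification": 100,
    "commit": 80,
    "full_node_sync": 120,
}

# Stages assumed to start before consensus finishes (optimistic execution/commit, fast certification).
OVERLAPPED = ("execution", "certification", "commit")


def sequential_latency(stages):
    """Each stage waits for the previous one, so latency is the plain sum."""
    return sum(stages.values())


def overlapped_latency(stages, overlapped):
    """Overlapped stages run as a chain alongside consensus and only add latency
    when that chain takes longer than consensus itself."""
    chain = sum(stages[name] for name in overlapped)
    rest = sum(d for name, d in stages.items()
               if name != "consensus" and name not in overlapped)
    return rest + max(stages["consensus"], chain)


print(sequential_latency(STAGES))              # 900
print(overlapped_latency(STAGES, OVERLAPPED))  # 570
```

If execution grew to 300 ms under heavy load, the overlapped chain (480 ms) would exceed consensus (400 ms) and the extra 80 ms would surface in end-to-end latency, which is qualitatively the pattern the breakdown shows beyond 10k TPS.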
Talk is cheap: let the data speak
We evaluate the end-to-end performance of Zaptos through geo-distributed experiments, using Aptos as a high-performance baseline. See the paper [8] for more details.
On Google Cloud, we simulated a global decentralized network of 100 validators and 30 full nodes, distributed across 10 regions, using commodity-grade machines similar to the Aptos deployment.
Throughput-Latency

The figure above compares the end-to-end latency and throughput of the two systems. In both, latency rises gradually as load increases and spikes sharply at maximum capacity, but Zaptos consistently shows lower, more stable latency before reaching peak throughput: 160 milliseconds lower under low load and more than 500 milliseconds lower under high load.
Notably, Zaptos achieves sub-second latency at 20k TPS in this mainnet-like, geo-distributed setting, opening the door to real-world applications that demand both speed and scalability.
Latency breakdown


The latency breakdown chart details the duration of each pipeline stage for both validators and full nodes. Key insights include:
Up to 10k TPS: Zaptos’ overall latency is almost equal to its consensus latency, since the optimistic execution, certification, and optimistic commit phases are effectively “hidden” within the consensus phase.
Beyond 10k TPS: As optimistic execution and full node synchronization take longer, the non-consensus phases become more significant. Nonetheless, Zaptos still significantly reduces overall latency by overlapping most phases. For example, at 20k TPS the baseline's total latency is 1.32 seconds (0.68 seconds for consensus plus 0.64 seconds for the other phases), while Zaptos's is 0.78 seconds (0.67 seconds for consensus plus 0.11 seconds for the other phases).
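The 20k TPS figures above can be checked with a few lines of arithmetic. The numbers are the ones quoted in the text; the script only adds them up and computes the reduction and the share of latency spent outside consensus.

```python
# Latency breakdown at 20k TPS in seconds, as quoted in the text.
baseline = {"consensus": 0.68, "other_phases": 0.64}
zaptos = {"consensus": 0.67, "other_phases": 0.11}

total_baseline = sum(baseline.values())
total_zaptos = sum(zaptos.values())
saved = total_baseline - total_zaptos

print(f"baseline total: {total_baseline:.2f} s")  # 1.32 s
print(f"Zaptos total:   {total_zaptos:.2f} s")    # 0.78 s
print(f"reduction:      {saved:.2f} s ({saved / total_baseline:.0%})")  # 0.54 s (41%)
print(f"non-consensus share: {baseline['other_phases'] / total_baseline:.0%} -> "
      f"{zaptos['other_phases'] / total_zaptos:.0%}")
```

Almost the entire saving comes from the non-consensus phases, whose share of total latency drops from about 48% to about 14%, while consensus latency is essentially unchanged.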
In conclusion
The evolution of blockchain architecture mirrors the transformation of manufacturing, from simple sequential workflows to highly parallel pipelines. Aptos' pipelined design significantly increased throughput, and Zaptos goes a step further, cutting latency to sub-second levels while maintaining high TPS. Just as modern computing architectures exploit parallelism to maximize efficiency, blockchain designs must continually eliminate unnecessary latency. By optimizing the blockchain pipeline end to end for minimal latency, Zaptos paves the way for real-world blockchain applications that require both speed and scale.
References
[1] Gene M. Amdahl, Gerrit A. Blaauw, and Frederick P. Brooks. 1964. "Architecture of the IBM System/360." IBM Journal of Research and Development. https://doi.org/10.1147/rd.82.0087
[2] David DeWitt and Jim Gray. 1992. "Parallel Database Systems: The Future of High Performance Database Systems." Communications of the ACM. https://doi.org/10.1145/129888.129894
[3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin et al. 2016. "TensorFlow: A System for Large-Scale Machine Learning." In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). https://arxiv.org/abs/1605.08695
[4] The Moving Assembly Line and the Five-Dollar Workday. https://corporate.ford.com/articles/history/moving-assembly-line.html
[5] Zekun Li and Yu Xia. 2021. "DIP-213: Decoupled Execution." https://github.com/diem/dip/blob/7dc44ee57bb7efe76559f05dcc6851d97e2d3149/dips/dip-213.md
[6] Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xiang. 2022. "Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback." In International Conference on Financial Cryptography and Data Security (FC). https://arxiv.org/abs/2106.10362
[7] Quorum Store: How Consensus Horizontally Scales on the Aptos Blockchain. https://medium.com/aptoslabs/quorum-store-how-consensus-horizontally-scales-on-the-aptos-blockchain-988866f6d5b0
[8] Zhuolun Xiang, Zekun Li, Balaji Arun, Teng Zhang, and Alexander Spiegelman. 2025. "Zaptos: Towards Optimal Blockchain Latency." arXiv preprint arXiv:2501.10612. https://arxiv.org/abs/2501.10612