Summary of the first interruption of Sui mainnet


Chainfeeds Summary:

When the issue occurred, the Sui engineering team quickly diagnosed the problem and released a fix, which the validator nodes then deployed, minimizing network downtime.

Source:

https://mp.weixin.qq.com/s/6ycV6FKCL26Qu3NiM29jRw

Author:

Sui


Perspective:

Sui: The object-oriented architecture of the Sui network allows transactions from different users to be processed in parallel at large scale, something most other networks cannot do. However, when multiple transactions write to the same shared object, those transactions must be executed sequentially, so there is an upper limit on how many transactions touching that specific object can be processed. The throttling control system keeps the network from being overloaded by limiting the rate of transactions that write to the same shared object.

We recently upgraded the throttling control system to improve shared-object utilization by estimating transaction complexity more accurately. A bug in the code for the new TotalGasBudgetWithCap mode caused this outage. Once the issue was identified, the code fix was straightforward (see PR #20365). The fix has been deployed to mainnet (v1.37.4) and testnet (v1.38.1), and with the active response of the validator community, the Sui network returned to normal operation within 15 minutes of the fix being released.

What we learned from this process: 1) The incident detection and response system worked well: automated alerts and community reports arrived almost simultaneously, and the team was quickly mobilized to diagnose and fix the issue. 2) The validator community performed excellently: the network resumed normal operation almost immediately after the fix was released.

Preventive measures include: 1) Improving the test suite by adding more adversarial transaction types, similar to the one that triggered this crash, to surface latent issues; 2) Optimizing the build process to speed up debugging and release of binaries, further reducing response time; part of the downtime in this incident came from waiting for the build to be released.
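To make the throttling idea concrete, below is a minimal Rust sketch of per-shared-object rate limiting in which a transaction's cost is estimated as its gas budget capped at a fixed value, in the spirit of the TotalGasBudgetWithCap mode mentioned above. This is not Sui's actual implementation: the struct and method names, the cap value, the per-object budget, and the "commit window" framing are all illustrative assumptions.

```rust
use std::collections::HashMap;

// Illustrative per-object throttling sketch (assumed names and values, not Sui's code).
// Each shared object gets a budget per commit window; a transaction's cost is
// estimated as min(gas_budget, per-transaction cap).
type ObjectId = [u8; 32];

struct SharedObjectThrottle {
    // Maximum estimated gas that may be scheduled against one object per window.
    per_object_budget: u64,
    // Cap applied to each transaction's declared gas budget when estimating cost.
    per_tx_cap: u64,
    // Gas already accumulated against each shared object in the current window.
    accumulated: HashMap<ObjectId, u64>,
}

impl SharedObjectThrottle {
    fn new(per_object_budget: u64, per_tx_cap: u64) -> Self {
        Self { per_object_budget, per_tx_cap, accumulated: HashMap::new() }
    }

    // Estimated cost of a transaction: its gas budget, capped so one huge
    // budget cannot consume an object's entire window on its own.
    fn estimated_cost(&self, gas_budget: u64) -> u64 {
        gas_budget.min(self.per_tx_cap)
    }

    // Try to admit a transaction that writes the given shared objects.
    // If any object would exceed its budget, the transaction is deferred.
    fn try_admit(&mut self, writes: &[ObjectId], gas_budget: u64) -> bool {
        let cost = self.estimated_cost(gas_budget);
        // Check all objects first, then commit, to avoid partially charging objects.
        if writes.iter().any(|obj| {
            self.accumulated.get(obj).copied().unwrap_or(0) + cost > self.per_object_budget
        }) {
            return false; // defer to a later window
        }
        for obj in writes {
            *self.accumulated.entry(*obj).or_insert(0) += cost;
        }
        true
    }

    // Reset at the start of each new commit window.
    fn reset(&mut self) {
        self.accumulated.clear();
    }
}

fn main() {
    let hot_object: ObjectId = [1; 32];
    let mut throttle = SharedObjectThrottle::new(1_000_000, 400_000);
    // A transaction with an enormous gas budget only counts as the cap (400_000),
    // so two of them still fit within the window...
    assert!(throttle.try_admit(&[hot_object], 10_000_000));
    assert!(throttle.try_admit(&[hot_object], 10_000_000));
    // ...but a third would exceed the per-object budget and is deferred.
    assert!(!throttle.try_admit(&[hot_object], 10_000_000));
    throttle.reset();
}
```

The capping step is the part that changed in the upgrade described above: estimating cost from a capped gas budget lets more transactions share a hot object than charging the full declared budget would, at the price of more complex accounting code.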

Source

https://chainfeeds.substack.com
