Sniping the Airdrop Farmers: How Can AI Technology Find 90% of Sybil Addresses?

Binance's risk control team, in collaboration with the academic community, has proposed a new system based on "AI + Blockchain Graph Analysis" for detecting Sybil addresses.

Written by: Nicky, Foresight News

This article is compiled based on the paper "Detecting Sybil Addresses in Blockchain Airdrops: A Subgraph-based Feature Propagation and Fusion Approach"

Recently, Binance's risk control department, in collaboration with the Zand AI department and ZEROBASE, published a paper on Sybil attacks. To help readers quickly understand the core content of the paper, the author has summarized the key points after carefully reading the paper.

In cryptocurrency airdrops, there is always a group of special players operating in the shadows. They are not ordinary users; they use automated scripts to create hundreds or even thousands of fake addresses, the notorious "Sybil addresses". These addresses latch onto the airdrops of well-known projects such as Starknet and LayerZero. They drain project budgets, dilute the rewards of real users, and undermine the fairness of the blockchain ecosystem.

Facing this ongoing technological cat-and-mouse game, Binance's risk control team, in collaboration with academic institutions, developed an AI detection system called "Subgraph-based lightGBM", which identified more than 90% of Sybil addresses in tests on real data.

The Three "Identification Cards" of Sybil Addresses

Why can these cheating addresses be precisely located? The research team, by analyzing transaction records of 193,701 real addresses (of which 23,240 were confirmed as Sybil addresses), discovered three types of behavioral traces:

The time fingerprint is the first giveaway. Sybil addresses show an eerily "precisely timed" pattern: the key steps, from first receiving gas fees to completing the first transaction to participating in the airdrop, are typically completed within an extremely short window. In contrast, real users' operation times are spread out irregularly, because no ordinary user creates an address for a single airdrop and then abandons it immediately.
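The time fingerprint described above can be sketched as a few timestamp gaps per address. This is only an illustration under stated assumptions: the `AddressTimeline` fields, the `looks_rushed` rule, and its one-hour threshold are hypothetical and are not taken from the paper, and the example timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class AddressTimeline:
    """Unix timestamps (seconds) of three lifecycle events for one address."""
    first_gas_received: int
    first_transaction: int
    airdrop_claimed: int


def time_fingerprint(t: AddressTimeline) -> dict:
    """Return the gaps, in seconds, between the lifecycle events."""
    return {
        "gas_to_first_tx": t.first_transaction - t.first_gas_received,
        "first_tx_to_claim": t.airdrop_claimed - t.first_transaction,
        "gas_to_claim": t.airdrop_claimed - t.first_gas_received,
    }


def looks_rushed(t: AddressTimeline, max_total_seconds: int = 3600) -> bool:
    """Hypothetical rule: flag an address whose whole lifecycle fits in the threshold."""
    return 0 <= time_fingerprint(t)["gas_to_claim"] <= max_total_seconds


# Made-up example values, for illustration only.
scripted = AddressTimeline(1_700_000_000, 1_700_000_090, 1_700_000_240)
organic = AddressTimeline(1_700_000_000, 1_700_400_000, 1_703_000_000)

print(time_fingerprint(scripted))
print(looks_rushed(scripted), looks_rushed(organic))
```

In a learned model such gaps would more likely be fed in as numeric features than used as a hard cutoff; the threshold here only makes the idea concrete.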

The fund trajectory reveals the economic motivation. These addresses keep a balance that is just "enough to get by": slightly above the minimum amount required for the airdrop (to save on funding costs), with the funds quickly transferred out once rewards are received. More tellingly, when addresses are operated in batches, their transfer amounts are highly consistent, unlike real user transactions, which vary naturally.
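One simple way to quantify "highly consistent transfer amounts" and "a balance just above the minimum" is sketched below. The coefficient of variation and the margin ratio are illustrative choices, not the paper's stated features, and all amounts are made-up values.

```python
from statistics import mean, pstdev


def coefficient_of_variation(amounts):
    """Population standard deviation divided by the mean; 0.0 means identical amounts."""
    avg = mean(amounts)
    if avg == 0:
        return 0.0
    return pstdev(amounts) / avg


def balance_margin(balance, airdrop_minimum):
    """Fraction by which a balance exceeds the airdrop's minimum requirement."""
    return (balance - airdrop_minimum) / airdrop_minimum


# Made-up example values, for illustration only.
batch_transfers = [0.0105, 0.0105, 0.0105, 0.0105]   # scripted batch funding
organic_transfers = [0.02, 0.35, 0.07, 1.4]          # ordinary user activity

print(coefficient_of_variation(batch_transfers))
print(coefficient_of_variation(organic_transfers))
print(balance_margin(0.0105, 0.01))
```

A near-zero coefficient of variation across a group of addresses, combined with a small positive margin over the minimum, matches the pattern the article describes.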

The relationship network provides the ultimate evidence. By constructing a transaction graph, the team observed three typical topological structures:

  • Star network: A "command center" distributes funds to dozens of sub-addresses.

  • Chain structure: Funds are passed like a relay baton between addresses to forge active records.

  • Tree-like diffusion: Using multi-layer branch structures to attempt to evade detection.

These patterns expose the coordinated nature of scripted operations, and they are also the features that traditional detection methods find hardest to capture.
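The three structures above can be told apart from degree counts on a small directed funding graph. The sketch below is an assumption-laden illustration: the function name `classify_topology` and its rules are hypothetical, and the paper's model learns from features rather than applying hand-written rules like these.

```python
from collections import defaultdict


def classify_topology(edges):
    """Classify a small directed funding graph as 'star', 'chain', 'tree', or 'other'.

    edges is a list of (sender, receiver) pairs.
    """
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    children = defaultdict(list)
    nodes = set()
    for sender, receiver in edges:
        out_deg[sender] += 1
        in_deg[receiver] += 1
        children[sender].append(receiver)
        nodes.update((sender, receiver))

    # A rooted tree has exactly one unfunded root; every other node is funded once.
    roots = [n for n in nodes if in_deg[n] == 0]
    if len(roots) != 1:
        return "other"
    root = roots[0]
    if any(in_deg[n] != 1 for n in nodes if n != root):
        return "other"

    # Every node must be reachable from the root (rules out detached cycles).
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    if seen != nodes:
        return "other"

    if all(out_deg[n] <= 1 for n in nodes):
        return "chain"   # funds relayed address to address
    if all(out_deg[n] == 0 for n in nodes if n != root):
        return "star"    # one hub funds every leaf directly
    return "tree"        # multi-layer branching


print(classify_topology([("hub", "a"), ("hub", "b"), ("hub", "c")]))
print(classify_topology([("a", "b"), ("b", "c"), ("c", "d")]))
print(classify_topology([("r", "a"), ("r", "b"), ("a", "c"), ("a", "d")]))
```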

Two-Layer Relationship Network: AI Detective's Crime-Solving Tool

Tracking transaction data across the entire blockchain is like finding a needle in a haystack. The research team used a two-layer transaction subgraph model - like a detective investigating not just the target individual (Address A), but also their direct contacts (addresses that transferred to A, addresses A transferred to) and the connections of these contacts (second-degree relationships).
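A two-layer subgraph of this kind can be extracted with a two-hop neighbourhood lookup. The following is a minimal sketch, assuming transfers are available as (sender, receiver) pairs; the function name `two_layer_subgraph` and the toy addresses are hypothetical.

```python
from collections import defaultdict


def two_layer_subgraph(transfers, target):
    """Return (first_layer, second_layer) address sets around target.

    transfers is an iterable of (sender, receiver) pairs. Direction is ignored
    here, so both funders and recipients count as contacts.
    """
    neighbours = defaultdict(set)
    for sender, receiver in transfers:
        neighbours[sender].add(receiver)
        neighbours[receiver].add(sender)

    # Layer 1: addresses that transferred to the target or received from it.
    first_layer = set(neighbours[target]) - {target}

    # Layer 2: contacts of those contacts, excluding anything already counted.
    second_layer = set()
    for address in first_layer:
        second_layer |= neighbours[address]
    second_layer -= first_layer | {target}
    return first_layer, second_layer


# Made-up toy graph: F funds A and C, A pays B, B pays D, D pays E.
toy_transfers = [("F", "A"), ("A", "B"), ("F", "C"), ("B", "D"), ("D", "E")]
layer1, layer2 = two_layer_subgraph(toy_transfers, "A")
print(sorted(layer1), sorted(layer2))
```

Address E sits three hops from A, so it falls outside the subgraph; bounding the search this way is what keeps the per-address cost small.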

More importantly, they created an innovative "feature fusion technique": the system aggregates the behavioral characteristics of neighboring addresses into a target address's "behavioral profile". For example, calculating the minimum, maximum, average, and volatility of transfer amounts for all associated addresses to form a composite indicator describing fund flow patterns; or calculating the in-degree and out-degree (number of associated addresses) to judge network density. This design allows the system to remain efficient when analyzing over 5.8 million transactions, avoiding the computational disaster of traditional methods tracking network-wide data.
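The aggregation step can be illustrated as follows. This is a sketch of the general idea only: the exact feature set, the scope of "associated addresses", and the name `fused_features` are assumptions, and volatility is approximated here by the population standard deviation.

```python
from statistics import mean, pstdev


def fused_features(transfers, target):
    """Aggregate neighbourhood statistics into one feature dict for target.

    transfers is a list of (sender, receiver, amount) tuples.
    """
    senders = {s for s, r, _ in transfers if r == target and s != target}
    receivers = {r for s, r, _ in transfers if s == target and r != target}
    scope = senders | receivers | {target}

    # Amounts of every transfer that touches the target or a direct neighbour.
    amounts = [a for s, r, a in transfers if s in scope or r in scope]

    features = {
        "in_degree": len(senders),      # distinct addresses funding the target
        "out_degree": len(receivers),   # distinct addresses the target pays
    }
    if amounts:
        features.update(
            amount_min=min(amounts),
            amount_max=max(amounts),
            amount_mean=mean(amounts),
            amount_std=pstdev(amounts),
        )
    else:
        features.update(amount_min=0.0, amount_max=0.0, amount_mean=0.0, amount_std=0.0)
    return features


# Made-up example: a hub funds A, B and C equally, then A forwards to a sink.
toy = [("hub", "A", 1.0), ("hub", "B", 1.0), ("hub", "C", 1.0), ("A", "sink", 0.9)]
print(fused_features(toy, "A"))
```

Each address ends up with one fixed-length row of numbers, which is the shape a gradient-boosted tree model such as LightGBM expects as input.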

Practical Test: Capturing "Ghosts" in Binance's Airdrop

The system was tested on real airdrop data from Binance's Soul-Bound Token (BAB). BAB, launched by Binance in 2022, verifies that a holder is a real user who has completed KYC, making it an ideal testing ground for detecting Sybil behavior.

The team first manually analyzed and clustered suspicious addresses, then set up an appeal review mechanism to confirm the final Sybil labels. When cleaning the data, they excluded institutional addresses (such as exchange hot wallets), smart contracts, and addresses older than one year (Sybil operators often abandon old addresses to avoid detection), ensuring the purity of the dataset.
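The cleaning rules described above amount to a simple filter. The sketch below assumes each address already carries the relevant flags; the `AddressRecord` fields, the `clean_dataset` name, and the 365-day definition of a year are hypothetical, and how the team actually identified institutional addresses is not covered in this summary.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a year is counted as 365 days


@dataclass
class AddressRecord:
    address: str
    is_contract: bool        # smart contract rather than a user wallet
    is_institutional: bool   # e.g. an exchange hot wallet
    first_seen: int          # Unix timestamp of the first on-chain activity


def clean_dataset(records, snapshot_time):
    """Keep only user wallets that are at most one year old at snapshot_time."""
    return [
        r for r in records
        if not r.is_contract
        and not r.is_institutional
        and snapshot_time - r.first_seen <= SECONDS_PER_YEAR
    ]


# Made-up records, for illustration only.
snapshot = 1_700_000_000
records = [
    AddressRecord("0xuser", False, False, snapshot - 30 * 24 * 3600),
    AddressRecord("0xcontract", True, False, snapshot - 30 * 24 * 3600),
    AddressRecord("0xexchange", False, True, snapshot - 30 * 24 * 3600),
    AddressRecord("0xold", False, False, snapshot - 2 * SECONDS_PER_YEAR),
]
print([r.address for r in clean_dataset(records, snapshot)])
```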

The results showed high precision in identifying three types of cheating networks:

  • Star network identification rate of 99% (previous methods' maximum was 95%)

  • Chain structure identification rate of 100% (previous methods' maximum was 95%)

  • Tree-like diffusion identification rate of 97% (previous methods' maximum was 95%)

All four core metrics exceeded 0.9: precision reached 0.943 (the previous best model scored 0.796), recall reached 0.918 (meaning over 91% of Sybil addresses were caught), the F1 score reached 0.930, and the AUC reached 0.981 (near-perfect classification). This means project teams can close cheating loopholes while significantly reducing the risk of penalizing real users.
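The reported figures are internally consistent: F1 is the harmonic mean of precision and recall, so it can be recomputed from the two values quoted above. The short check below uses only the numbers from the article; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.943
reported_recall = 0.918

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.3f}")  # prints 0.930
```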

Technical Boundaries and Future Battlefields

The current technology is mainly suited to long-running airdrop scenarios (such as phased soul-bound token distribution), because these campaigns accumulate enough labeled data for the AI to learn from. In terms of blockchain compatibility, it supports Ethereum Virtual Machine (EVM) compatible chains (such as BNB Chain and Polygon) and is not currently suitable for UTXO-model chains like Bitcoin. The paper notes, however, that high gas costs mean airdrops are rarely conducted on UTXO chains, so the practical impact of this limitation is small.

The research team emphasizes that the potential of this technology extends far beyond the airdrop domain. By identifying abnormalities through transaction networks and behavioral patterns, it can also be applied to:

  • Detecting market manipulation behaviors (such as coordinated addresses in pump and dump schemes).

  • Assessing token liquidity risks (identifying fake trading pairs).

  • Constructing on-chain credit scoring systems.

As Sybil attack strategies continue to evolve, this technological arms race to protect Web3 fairness will push detection systems to become more intelligent and more general-purpose.
