Dragonfly Partner: Applying “Don’t Trust, Verify” to Decentralized Inference

Blockchain and machine learning clearly have a lot in common.

Written by Haseeb Qureshi

Translated by: TechFlow

You want to run a large language model like Llama 2–70B. Such a large model requires over 140GB of memory, which means you cannot run the original model on a home computer. So what are your options? You might turn to a cloud service provider, but you might be less willing to trust a single centralized company to handle this workload for you and collect all of your usage data. What you need then is decentralized inference, which allows you to run machine learning models without being tied to any single provider.

Trust issues

In a decentralized network, it is not enough to simply run the model and trust the output. Suppose I ask the network to analyze a governance dilemma using Llama 2-70B. How do I know it didn't actually use Llama 2-13B, give me a worse analysis, and pocket the difference?

In a centralized world, you might trust a company like OpenAI to be honest because its reputation is at stake (and, to some extent, LLM quality speaks for itself). But in a decentralized world, honesty is not the default; it has to be verified.

This is where verifiable inference comes in. Along with a response to the query, the network also needs to prove that the response was actually produced by the model you requested. But how?

The simplest way is to run the model on-chain as a smart contract. This certainly guarantees that the output is verified, but it's extremely impractical. GPT-3 uses an embedding of dimension 12,288 to represent words. If you did a single matrix multiplication of this size on-chain, it would cost about $10 billion based on current gas prices, and this calculation would fill every block for about a month.
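To see where a number like that comes from, here is a rough back-of-the-envelope sketch in Python. Every constant below (gas per multiply-add, gas price, ETH price, block gas limit) is an assumed illustrative value rather than a measurement, so only the order of magnitude is meaningful.

```python
# Back-of-the-envelope cost of one d x d matrix multiplication on-chain.
# All constants below are assumptions chosen for illustration only.

d = 12_288                     # GPT-3 embedding dimension
mul_adds = d ** 3              # naive matmul needs ~d^3 multiply-adds

gas_per_op = 10                # assumed gas per fixed-point multiply-add
gas_price_gwei = 30            # assumed gas price
eth_price_usd = 2_000          # assumed ETH price
block_gas_limit = 30_000_000   # Ethereum block gas limit
block_time_s = 12              # seconds per block

total_gas = mul_adds * gas_per_op
cost_usd = total_gas * gas_price_gwei * 1e-9 * eth_price_usd
days_of_blocks = total_gas / block_gas_limit * block_time_s / 86_400

print(f"~${cost_usd:,.0f} and ~{days_of_blocks:,.0f} days of full blocks")
```

Plugging in different assumptions moves the exact figures around, but the answer stays in the "billions of dollars and weeks-to-months of block space" regime, which is why this approach is a non-starter.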

So, we need to take a different approach.

Surveying the field, I see three main approaches to verifiable inference: zero-knowledge proofs, optimistic fraud proofs, and cryptoeconomics. Each has its own security and cost implications.

1. Zero-knowledge proof (ZK ML)

Imagine being able to prove that you ran a massive model, while the proof itself stays effectively fixed in size no matter how big the model is. That is the promise of ZK ML, achieved through ZK-SNARKs.

While it sounds elegant in principle, compiling a deep neural network into a zero-knowledge circuit and then proving it is extremely difficult. It is also extremely expensive: at a minimum, you are probably looking at a 1,000x increase in inference cost and a 1,000x increase in latency (the time it takes to generate the proof), not to mention compiling the model into a circuit in the first place. Ultimately, that cost has to be passed on to the end user.
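To make that overhead concrete, here is a trivial sketch of what a ~1,000x multiplier implies for a single query; the baseline cost and latency are assumed placeholder numbers, not benchmarks of any real system.

```python
# Illustration only: what a ~1000x proving overhead implies for one query.
# The baseline numbers below are assumptions, not measurements.

baseline_cost_usd = 0.01     # assumed cost of one plain Llama 2-70B inference
baseline_latency_s = 5.0     # assumed latency of one plain inference
zk_overhead = 1_000          # the ~1000x figure quoted above

print(f"ZK ML cost per query:    ~${baseline_cost_usd * zk_overhead:,.2f}")
print(f"ZK ML latency per query: ~{baseline_latency_s * zk_overhead / 3600:.1f} hours")
```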

On the other hand, this is the only method that is cryptographically guaranteed to be correct. With ZK, no matter how hard the model provider tries, there is no way to cheat. But proving costs are so high that this will remain impractical for large models for the foreseeable future.

Examples: EZKL, Modulus Labs, Giza

2. Optimistic Fraud Proof (Optimistic ML)

The optimistic approach is to trust, but verify. We assume an inference is correct unless proven otherwise. If a node tries to cheat, "observers" in the network can point out the cheating and challenge it with a fraud proof. These observers must watch the chain at all times and re-run the inferences on their own models to make sure the outputs are correct.

These fraud proofs are Truebit-style interactive challenge-response games, in which you repeatedly bisect the model's execution trace on-chain until you find the error.
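Here is a toy sketch of the bisection idea behind such a challenge-response game: both parties commit to intermediate states of the execution trace and repeatedly halve the disputed range until a single step remains, which a real system would then re-execute on-chain to settle the dispute. The traces here are plain Python lists standing in for state commitments; this is not a real fraud-proof implementation.

```python
# Toy bisection game over an execution trace. In a real system each entry
# would be a commitment to the full model state, and only the one disputed
# step would ever be re-executed on-chain.

def bisect_dispute(honest_trace, claimed_trace):
    """Find the first disputed step by halving the range: O(log n) rounds."""
    lo, hi = 0, len(honest_trace) - 1      # both sides agree on the input state
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if claimed_trace[mid] == honest_trace[mid]:
            lo = mid                       # agreement so far; dispute lies later
        else:
            hi = mid                       # divergence is at or before mid
    return hi                              # the single step to re-execute on-chain

# Example: a 16-step trace where the asserter starts cheating at step 11.
honest = [f"state_{i}" for i in range(16)]
claimed = honest[:11] + [f"bogus_{i}" for i in range(11, 16)]
print(bisect_dispute(honest, claimed))     # -> 11
```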

If this ever actually happened, it would be incredibly expensive, because these programs are enormous and have enormous internal state; a single GPT-3 inference takes about 1 petaFLOP (10¹⁵ floating-point operations). But game theory says it should almost never happen (fraud proofs are also notoriously hard to write correctly, because the code is almost never exercised in production).

The upside of optimistic ML is that it is secure as long as a single honest observer is paying attention. The cost is cheaper than ZK ML, but keep in mind that every observer in the network re-runs every query. In equilibrium, this means that if there are 10 observers, that security cost has to be passed on to users, so they end up paying more than 10x the cost of inference (or however many observers there are).

The downside is that, as with optimistic rollups, you have to wait for the challenge period to pass before you can be sure a response has been verified. Depending on how the network's parameters are set, though, you might only have to wait minutes rather than days.

Examples: Ora, Gensyn

3. Cryptoeconomics (Cryptoeconomic ML)

Here we drop all the fancy techniques and do something simple: stake-weighted voting. The user decides how many nodes should run their query; each node reveals its response, and if the responses disagree, the odd one out gets slashed. This is standard oracle machinery: a straightforward approach that lets users set their desired level of security, balancing cost against trust. If Chainlink were doing ML, this is how they would do it.

The latency here is fast: you just need a commit and a reveal from each node. If the results are written to the blockchain, then technically this can happen within two blocks.
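A minimal sketch of that commit-reveal quorum, with SHA-256 hashes standing in for on-chain commitments; stake weighting and slashing are reduced to simply flagging whichever nodes disagree with the majority answer. The node names and helper functions are illustrative, not part of any real protocol.

```python
import hashlib
from collections import Counter

def commit(response: str, salt: str) -> str:
    """Phase 1: each node posts a hash commitment so it cannot copy others."""
    return hashlib.sha256((salt + response).encode()).hexdigest()

def resolve(commitments: dict, reveals: dict):
    """Phase 2: nodes reveal (salt, response); the majority answer wins and
    nodes whose commitment or response disagrees get flagged for slashing."""
    valid = {node: resp for node, (salt, resp) in reveals.items()
             if commit(resp, salt) == commitments[node]}
    majority, _ = Counter(valid.values()).most_common(1)[0]
    cheaters = [node for node in reveals if valid.get(node) != majority]
    return majority, cheaters

# Example with a quorum of n = 3, where node_c returns a different answer.
salts = {"node_a": "s1", "node_b": "s2", "node_c": "s3"}
answers = {"node_a": "output_X", "node_b": "output_X", "node_c": "output_Y"}
commitments = {n: commit(answers[n], salts[n]) for n in answers}
reveals = {n: (salts[n], answers[n]) for n in answers}
print(resolve(commitments, reveals))   # -> ('output_X', ['node_c'])
```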

However, the security here is the weakest. A majority of nodes could rationally choose to collude. As a user, you have to reason about how much stake these nodes have at risk and what cheating would cost them. That said, with EigenLayer-style restaking and attributable security, the network could effectively provide insurance in the event of a security failure.

But the nice thing about this system is that users can specify how much security they want. They can choose 3 nodes or 5 nodes in their quorum, or every node in the network. And if they want to take risks, they can even choose n = 1. The cost function here is simple: users pay for however many nodes are in their quorum. If you choose 3, you pay 3x the inference cost.

The tough question is: can you make n = 1 safe? In a naive implementation, a lone node should cheat every time, since no one is checking. But I suspect that if you encrypt the query and submit payment via intents, you might be able to hide from the node the fact that it is the only one responding to the task. In that case, you might be able to charge the average user less than 2x the cost of inference.

Ultimately, the cryptoeconomic approach is the simplest, easiest, and probably cheapest, but in principle it is the least compelling and least secure. But as always, the devil is in the details.

Examples: Ritual, Atoma Network

Why verifiable ML is hard

You might be wondering: why don't we have all of this already? After all, at the end of the day, ML models are just very large computer programs, and proving that programs executed correctly has always been at the core of blockchains.

Notice how these three verification approaches mirror the ways blockchains secure their block space: ZK rollups use ZK proofs, optimistic rollups use fraud proofs, and most L1 blockchains use cryptoeconomics. It is no surprise that we arrive at essentially the same solutions. So what makes this hard when applied to ML?

ML is unique in that ML computations are typically represented as dense computational graphs designed to run efficiently on GPUs. They are not designed to be proven. So if you want to prove ML computations in a ZK or optimistic setting, they have to be recompiled into a format that makes that possible, which is very complex and expensive.

The second fundamental difficulty with machine learning is nondeterminism. Program verification assumes that a program's outputs are deterministic. But if you run the same model on a different GPU architecture or CUDA version, you will get different outputs. Even if you force every node to use the same architecture, you still have to deal with the randomness used by the algorithms themselves (the noise in diffusion models, or token sampling in LLMs). You can fix that randomness by controlling the random seed. But even then, you are left with one last troubling problem: the inherent non-determinism of floating-point arithmetic.
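Here is a short PyTorch sketch of pinning down the controllable sources of randomness (seeds and deterministic kernels); as the next paragraph explains, even this does not guarantee bit-identical outputs across different GPU architectures or CUDA versions.

```python
import random
import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    """Pin the controllable randomness: Python, NumPy and PyTorch RNGs,
    and ask PyTorch to prefer deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.use_deterministic_algorithms(True)  # raises on known non-deterministic ops
    torch.backends.cudnn.benchmark = False    # avoid autotuned, order-varying kernels

make_deterministic(42)

# With seeds fixed, token sampling is reproducible on this machine and setup,
# but the same code on different hardware can still produce slightly different
# logits, and therefore (occasionally) different tokens.
probs = torch.softmax(torch.randn(10), dim=0)
token = torch.multinomial(probs, num_samples=1)
print(token.item())
```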

Almost all GPU operations are performed on floating-point numbers. Floating-point numbers are tricky because they are not associative: (a + b) + c is not always equal to a + (b + c). Since GPUs are highly parallelized, the order of additions and multiplications can differ from one execution to the next, which can produce small differences in the output. This is unlikely to change the output of an LLM, given the discrete nature of words, but for an image model it can make pixel values subtly different, so two images never match exactly.

This means you either need to avoid using floating point numbers, which will take a huge performance hit, or you need to allow some flexibility when comparing outputs. Either way, the details are fiddly and you can't quite abstract them away. (This is why the Ethereum Virtual Machine does not support floating point numbers, although some blockchains such as NEAR do.)
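A small NumPy demonstration of both points: the same additions performed in a different order can give a different floating-point result, and the practical workaround is to compare outputs with a tolerance rather than exact equality. The tolerance values are arbitrary illustrative choices.

```python
import numpy as np

# Floating-point addition is not associative: regrouping the same three
# numbers changes the result.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(0.5)
print((a + b) + c)   # 0.5
print(a + (b + c))   # 0.0  (the 0.5 is lost when added to -1e8 first)

# Summing the same data in a different order (as a parallel GPU reduction
# might) typically gives a slightly different answer.
x = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
s1, s2 = x.sum(), x[::-1].sum()
print(s1 - s2)                                       # usually tiny but nonzero

# Workaround: tolerant comparison instead of bit-exact equality.
print(np.allclose(s1, s2, rtol=1e-3, atol=1e-2))     # True
```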

In short, decentralized inference is hard because all the details matter, and reality has a surprising amount of detail.

Summary

Blockchain and machine learning clearly have a lot in common: one is a technology that creates trust, the other is a technology that desperately needs it. While each approach to decentralized inference has its own trade-offs, I am very interested to see how entrepreneurs use these tools to build the best possible networks.
