SeeDAO | Introduction to Zero-Knowledge Machine Learning (ZKML)

05-21


Zero-knowledge machine learning (ZKML) is a recent area of research and development that has been making a lot of noise in the cryptography community. But what is it? And why is it useful? First, let’s break the term into two parts, zero-knowledge (ZK) and machine learning (ML), and explain what each is.

Zero-knowledge proofs (ZKPs) are a type of cryptographic protocol in which one party (the prover) can prove to another party (the verifier) that a given statement is true without revealing any information beyond the fact that the statement is true. The field has made significant progress on all fronts, from research to protocol implementations and applications.

Zero-knowledge cryptography offers two main "primitives" (building blocks). The first is the ability to create proofs of computational integrity for a given computation, where verifying the proof is much easier than performing the computation itself (we call this property "succinctness"). The second is the option to hide parts of the computation while still guaranteeing that it was performed correctly (we call this property "zero knowledge").
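To make the "zero knowledge" property concrete, here is a minimal sketch of a classic proof of knowledge: the Schnorr protocol, made non-interactive with the Fiat-Shamir heuristic. It is not one of the SNARK/STARK systems discussed in this article and gives no succinctness for general computations; it only shows a prover convincing a verifier that it knows a secret x with y = G^x mod P without revealing x. The tiny parameters P, Q and G are illustrative assumptions and completely insecure.

```python
import hashlib
import secrets

# Toy group parameters (hypothetical, far too small to be secure):
# P = 2*Q + 1 is a safe prime, and G = 4 generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def challenge(y: int, r: int) -> int:
    """Fiat-Shamir: replace the verifier's random challenge with a hash of the transcript."""
    data = f"{G}|{y}|{r}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(x: int) -> tuple[int, int, int]:
    """Prove knowledge of x with y = G^x mod P, revealing only (y, r, s)."""
    y = pow(G, x, P)                    # public statement
    k = secrets.randbelow(Q - 1) + 1    # fresh random nonce; reusing it would leak x
    r = pow(G, k, P)                    # commitment
    c = challenge(y, r)
    s = (k + c * x) % Q                 # response: x is masked by the random k
    return y, r, s


def verify(y: int, r: int, s: int) -> bool:
    """Accept iff G^s == r * y^c (mod P); the verifier never learns x."""
    c = challenge(y, r)
    return pow(G, s, P) == (r * pow(y, c, P)) % P


if __name__ == "__main__":
    y, r, s = prove(777)
    print(verify(y, r, s))              # True
    print(verify(y, r, (s + 1) % Q))    # False: a tampered response fails
```

The check works because G^s = G^(k + c*x) = G^k * (G^x)^c = r * y^c, while s on its own looks random to anyone who does not know k.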

Generating zero-knowledge proofs is computationally expensive, many times more expensive than the original computation. This means that for some computations, generating zero-knowledge proofs is infeasible because it would take an impractical amount of time on the best hardware. However, recent advances in cryptography, hardware, and distributed systems have made more intensive zero-knowledge proof computations feasible. These advances make it possible to build protocols that use computationally intensive proofs, expanding the design space for new applications.

Zero-knowledge cryptography is one of the most popular technologies in Web3 because it allows developers to build scalable and private applications. As zero-knowledge technology matures, there will likely be a Cambrian explosion of new applications, because the tools for building them will require less domain expertise and become easier for developers to use. Here are some real-world examples (many of these projects are still under development).

ZK rollups for scaling Ethereum

Distributed systems such as public blockchains have limited computing power because every participating node (computer) must verify the computations in each block by itself. With ZK proofs, these computations can be performed off-chain, a ZK proof generated for them, and only the proof verified on-chain, achieving scalability without sacrificing decentralization or security. Example applications:

• Starknet has launched its mainnet and uses its own ZK-STARK proof system. Compared with ZK-SNARKs, ZK-STARKs rely on weaker cryptographic assumptions, are believed to be resistant to attacks by quantum computers, and require no trusted setup when the system is first initialized; the trade-off is higher overhead, with larger proofs that take longer to verify. Because one proof covers a whole batch, the average cost per transaction decreases as the number of transactions grows.

• Scroll, currently in its testnet phase, is building an EVM-equivalent (Ethereum Virtual Machine) ZK rollup, aiming for closer EVM compatibility than other Ethereum L2 scaling solutions.

• Polygon zkEVM has launched on mainnet as an Ethereum L2 scaling solution built on ZK-STARKs. Polygon Miden is still under development and aims to be a high-throughput, privacy-focused chain.

• zkSync has launched on mainnet and, at the time of writing, is the most active ZK-proof-based L2. It currently uses a ZK-SNARK proof system and plans to migrate to ZK-STARKs in the future.
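The Starknet item notes that the average cost per transaction falls as the number of transactions grows. A simple model shows why: a rollup pays a roughly fixed cost to verify one proof on L1 per batch, shared by every transaction in that batch, plus a per-transaction cost (for example, for posting data). The cost figures below are hypothetical placeholders, not measured gas numbers for any real rollup.

```python
def cost_per_tx(verify_cost: float, per_tx_cost: float, batch_size: int) -> float:
    """Average L1 cost per transaction when one proof covers batch_size transactions.

    verify_cost: one-off cost of verifying the batch proof on L1 (hypothetical units)
    per_tx_cost: cost every transaction still pays individually (hypothetical units)
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return verify_cost / batch_size + per_tx_cost


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # Hypothetical figures: 500,000 units to verify a proof, 2,000 units per transaction.
        print(n, cost_per_tx(500_000, 2_000, n))
    # 1 502000.0
    # 10 52000.0
    # 100 7000.0
    # 1000 2500.0
```

The fixed verification share shrinks as 1/n, so the average approaches the per-transaction floor as batches grow.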

Building privacy-preserving applications

The zero-knowledge property allows parts of the proven computation to remain hidden, which is very useful for building applications that use cryptographic proofs to protect user privacy and personal data. Example applications:

• Semaphore

• MACI

• Penumbra

• Aztec Network is building a private scaling solution (ZK Rollup) for Ethereum where users’ balances and transactions are completely hidden from external observers.
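Semaphore, listed above, lets a member of a group prove membership without revealing which member they are. As a rough sketch under stated assumptions, the statement such a proof enforces looks like the check below: the prover knows a secret whose hash is a leaf of a public Merkle tree, and publishes a nullifier derived from that secret so the same member cannot act twice in the same context. Semaphore itself uses the Poseidon hash inside a SNARK circuit and a richer identity format; here SHA-256 stands in for it, all function names are hypothetical, and the check runs in the clear, so it shows what is proven rather than the zero-knowledge proof itself.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over the joined parts (stand-in for Semaphore's Poseidon hash)."""
    return hashlib.sha256(b"|".join(parts)).digest()


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, leaves first; assumes len(leaves) is a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def merkle_path(levels: list[list[bytes]], index: int) -> list[tuple[bytes, int]]:
    """Sibling hashes from leaf to root, each paired with 1 if our node is a right child."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index % 2))
        index //= 2
    return path


def membership_statement(secret: bytes, path: list[tuple[bytes, int]], root: bytes,
                         external: bytes, nullifier: bytes) -> bool:
    """Public inputs: root, external (e.g. a poll id), nullifier. Private witness: secret, path."""
    node = h(secret)  # identity commitment = leaf
    for sibling, is_right in path:
        node = h(sibling, node) if is_right else h(node, sibling)
    return node == root and h(secret, external) == nullifier


if __name__ == "__main__":
    members = [b"alice", b"bob", b"carol", b"dave"]
    levels = build_tree([h(s) for s in members])
    root = levels[-1][0]
    path = merkle_path(levels, 1)  # bob's leaf
    print(membership_statement(b"bob", path, root, b"poll-42", h(b"bob", b"poll-42")))  # True
```

Because the nullifier depends only on the secret and the external context, a second vote in "poll-42" by the same member would produce the same nullifier and could be rejected, without anyone learning which leaf it came from.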

Identity and data proofs

• WorldID: Worldcoin is building WorldID, a privacy-preserving proof-of-personhood protocol. It allows anyone with a WorldID to issue a cryptographic statement that they are a unique human being and have not already performed a given action (like signing up for a social network), without revealing their identity.

• Sismo

• Clique

• Axiom

L1 blockchains

Since zero-knowledge proofs can help with outsourcing computation and keeping computation private, it is possible to create L1 blockchains that are private and/or succinct (small proof size, easy to verify). Example applications:

• Zcash

• Mina

Machine learning is a subfield of artificial intelligence concerned with developing and applying algorithms that let computers learn and adapt from data on their own, improving their performance through iteration. Large language models such as GPT-4 and Bard are state-of-the-art natural language processing systems that generate human-like text after training on massive amounts of data, while text-to-image models such as DALL-E 2, Midjourney, and Stable Diffusion turn text descriptions into realistic images. Rapid advances in machine learning are expected to help solve complex challenges in fields as diverse as healthcare, finance, and transportation, improving decision-making and outcomes through data-driven insights and predictions. As these models become more sophisticated, they are expected to transform many industries and change the way we live, work, and interact with technology.

In a world where AI-generated content increasingly resembles human-created content, zero-knowledge cryptography could help us verify that a particular piece of content was produced by applying a particular model to a particular input. If zero-knowledge circuits were created for large language models (like GPT-4), text-to-image models (like DALL-E 2), or any other model, the outputs of these models could be verified. The zero-knowledge property of these proofs would also allow us to hide the inputs or parts of the model when needed. A good example is applying a machine learning model to sensitive data, where the user obtains the model's inference result without revealing their input to any third party (for example, in the healthcare industry).

Note: When we talk about zero-knowledge machine learning, we are referring to creating zero-knowledge proofs for the inference step of a machine learning model, not the training step (which itself is computationally expensive).
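To make the idea concrete, the sketch below spells out, in plain Python, the kind of statement a ZKML proof of inference attests to. The model owner publishes a commitment (here a salted SHA-256 hash) to the weights; the prover then claims that the committed model maps a public input to a public output. In a real system this check would be compiled into a circuit and proven, so the verifier learns only that it holds; here it simply runs in the clear. The toy one-layer model, the commitment scheme and the public/private split are illustrative assumptions, not the design of any particular ZKML library.

```python
import hashlib
import json


def commit(weights: list[float], salt: bytes) -> str:
    """Hash-based commitment to model weights (a simple stand-in for real commitment schemes)."""
    return hashlib.sha256(json.dumps(weights).encode() + salt).hexdigest()


def model(weights: list[float], x: list[float]) -> float:
    """Toy 'model': a single linear layer with no bias."""
    return sum(w * xi for w, xi in zip(weights, x))


def inference_relation(public: dict, private: dict) -> bool:
    """The statement a ZKML proof would attest to, checked here in the clear.

    public:  model_commitment, input, output
    private: weights, salt (the prover's secret witness)
    """
    return (
        commit(private["weights"], private["salt"]) == public["model_commitment"]
        and model(private["weights"], public["input"]) == public["output"]
    )


if __name__ == "__main__":
    w, salt = [0.5, -2.0, 1.5], b"random-salt"
    public = {"model_commitment": commit(w, salt), "input": [4.0, 1.0, 2.0], "output": 3.0}
    print(inference_relation(public, {"weights": w, "salt": salt}))  # True
```

To hide the input instead, as in the healthcare example, the input would move into the private witness and only a commitment to it would be public.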

Current state-of-the-art zero-knowledge proof systems, even paired with high-performance hardware, are still several orders of magnitude away from being able to prove today's large language models (LLMs), but there has been some progress in proving smaller models.

We examine the state-of-the-art in zero-knowledge cryptography for creating proofs for machine learning models and bring together relevant research, articles, applications, and code repositories. Resources on zero-knowledge machine learning can be found in the awesome-zkml repository of the ZKML community on GitHub.

The Modulus Labs team recently published a paper titled "The Cost of Intelligence" in which they benchmarked existing zero-knowledge proof systems on models of various sizes. Currently, with a proof system like plonky2, creating a proof for a model with about 18 million parameters takes about 50 seconds on a powerful AWS machine. The following figure shows how the proving time of different proof systems grows as the number of neural network parameters increases:

Source: The Cost of Intelligence: Proving Machine Learning Inference with Zero-Knowledge Proofs, Modulus Labs

Zkonduit’s ezkl library is another initiative aimed at improving zero-knowledge machine learning systems. It lets you create zero-knowledge proofs for machine learning models exported in the ONNX format, so any machine learning engineer can create zero-knowledge proofs for the inference step of their model and prove the output to any verifier.
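SNARK proof systems generally operate on integers modulo a large prime rather than on floating-point numbers, so ZKML tooling has to turn a model's floating-point arithmetic into fixed-point arithmetic in that field. The sketch below is a generic illustration of that idea for a single dense layer, not ezkl's actual API or quantization scheme; the prime 2**61 - 1 and the 8-bit scale are arbitrary choices for illustration (real systems use the scalar field of their proving curve).

```python
SCALE_BITS = 8          # fixed-point scale: real value is approx. integer / 2**SCALE_BITS
FIELD = 2**61 - 1       # a Mersenne prime, used here only as an example modulus


def quantize(x: float, bits: int = SCALE_BITS) -> int:
    """Map a real number to a fixed-point integer."""
    return round(x * (1 << bits))


def to_field(v: int) -> int:
    """Embed a signed integer in the field (negatives wrap around the modulus)."""
    return v % FIELD


def from_field(v: int) -> int:
    """Read a field element back as a signed integer (upper half = negative)."""
    return v - FIELD if v > FIELD // 2 else v


def dense_in_field(weights: list[float], bias: float, inputs: list[float]) -> float:
    """Evaluate w·x + b using only modular integer arithmetic, then decode the result."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc = (acc + to_field(quantize(w)) * to_field(quantize(x))) % FIELD
    # Products carry scale 2**(2*SCALE_BITS), so the bias is quantized at that scale too.
    acc = (acc + to_field(quantize(bias, 2 * SCALE_BITS))) % FIELD
    return from_field(acc) / (1 << (2 * SCALE_BITS))


if __name__ == "__main__":
    w, b, x = [0.5, -1.25, 2.0], 0.1, [1.0, 0.75, -0.3]
    exact = sum(wi * xi for wi, xi in zip(w, x)) + b
    print(exact, dense_in_field(w, b, x))  # close, but not identical
```

The small gap between the two printed values is the rounding error introduced by the 8-bit scale; choosing the scale trades accuracy against the size of the numbers the circuit has to handle.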

Several teams are working to improve zero-knowledge technology and optimize hardware to speed up the computation of zero-knowledge proofs, especially for resource-intensive tasks such as prover and verifier algorithms. As zero-knowledge technology matures, it will be possible to prove larger models in less time on less powerful machines, thanks to improvements in specialized hardware, proof system architecture (proof size, verification time, proof generation time, etc.), and more powerful zero-knowledge protocol implementations. We hope that these advances will lead to new zero-knowledge machine learning applications and use cases.

To determine whether zero-knowledge machine learning is appropriate for a particular application, we can use a Venn diagram to compare the characteristics of zero-knowledge cryptography to the use case requirements.

A Venn diagram explaining how ZK fits together with ML and other technologies

Heuristic algorithms: Use rules of thumb or "heuristics" to find good solutions to problems that are difficult to solve with traditional optimization methods. Rather than trying to find the optimal solution to a problem, heuristic optimization methods try to find a good or "good enough" solution in a reasonable amount of time, based on the problem's relative importance to the overall system and the difficulty of optimization.

Fully homomorphic encryption (FHE) machine learning: Fully homomorphic encryption allows developers to perform computations on encrypted data; decrypting the result yields the output of the same computation performed on the original unencrypted input. It can run model inference in a privacy-preserving way (complete data privacy, unlike zero-knowledge machine learning, where the prover needs access to all the data); however, it cannot cryptographically prove that the computation was performed correctly, as zero-knowledge proofs can. For example, the Zama team is building an FHE machine learning framework called Concrete ML.

Zero-knowledge proof vs. validity proof: These two terms are often used interchangeably in the industry, since a validity proof is a zero-knowledge proof that does not hide any part of the computation or its results. Most current zero-knowledge machine learning applications leverage the validity-proof aspect of zero-knowledge proofs.

Validity machine learning: SNARK/STARK proofs of machine learning models where all computations are publicly visible to the verifier. Any verifier can check the computational correctness of the machine learning model.

Zero-knowledge machine learning: Zero-knowledge proofs of machine learning models where the computation is hidden from the verifier (using the zero-knowledge property). The prover can prove the correctness of the computation of the machine learning model without revealing any other information.
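The FHE machine learning entry above describes inference on encrypted data. Concrete ML's API is not reproduced here; instead, the sketch swaps in the much simpler Paillier cryptosystem, which is only additively homomorphic (not fully homomorphic) but is enough to evaluate a linear model with plaintext, non-negative integer weights on an encrypted input. The key is a toy value and insecure. Note what is missing compared with ZKML: the client gets a correct result only if the server ran the computation honestly, because nothing here proves that it did.

```python
import math
import secrets

# Toy Paillier key (hypothetical, insecure): real keys use primes hundreds of digits long.
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1                        # common simplified generator choice
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)             # valid shortcut because G = N + 1


def encrypt(m: int) -> int:
    """Enc(m) = G^m * r^N mod N^2 with fresh randomness r (needs only the public key)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Client side: m = L(c^LAM mod N^2) * MU mod N, where L(u) = (u - 1) // N."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def encrypted_linear(weights: list[int], bias: int, enc_x: list[int]) -> int:
    """Server side: returns Enc(w·x + b mod N) without ever seeing x."""
    acc = encrypt(bias)
    for w, c in zip(weights, enc_x):
        # Enc(a) * Enc(b) = Enc(a + b) and Enc(a)^w = Enc(w * a)
        acc = (acc * pow(c, w, N2)) % N2
    return acc


if __name__ == "__main__":
    x = [2, 7, 5]                                   # the client's private input
    enc_x = [encrypt(v) for v in x]
    result = encrypted_linear([3, 1, 4], 10, enc_x)
    print(decrypt(result))                          # 43 = 3*2 + 1*7 + 4*5 + 10
```

Fully homomorphic schemes support both addition and multiplication on ciphertexts, which is what models with non-linear layers need.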

Specific use cases

Computational integrity (validity machine learning)

Validity proofs (SNARKs/STARKs) can be used to prove the correctness of certain computations. In the context of machine learning, this means proving that a model inference was performed correctly, i.e. that a specific model produced a certain output from a specific input.

For example, Modulus Labs, a startup focused on zero-knowledge machine learning, is building these use cases:

• RockyBot, an on-chain verifiable machine learning trading bot

• Self-improving blockchains (examples):

• Adding intelligent features to Lyra Finance’s options automated market maker (AMM) protocol

• Creating a transparent AI-based reputation system for Astraly (a zero-knowledge oracle)

• Researching technical breakthroughs to enable machine learning-based contract-level compliance tools for Aztec Protocol (zk-rollup with privacy features)

Validity proofs also make it easy to prove and verify that an output was produced by a given model and input. This allows machine learning models to run off-chain on specialized hardware while their zero-knowledge proofs are easily verified on-chain. For example, Giza is helping Yearn (a DeFi yield aggregation protocol) prove that certain complex yield strategies using machine learning are executed correctly on-chain.

Machine Learning as a Service (MLaaS) Transparency

When companies offer machine learning models as a service through their APIs, it is difficult for users to know whether the provider is really serving the model it claims to provide, because the API is a black box. Machine learning model APIs accompanied by validity proofs would give users transparency, since users could verify which model they are actually using.

Zero-knowledge anomaly and fraud detection

Create zero-knowledge proofs of exploitability or fraud. Anomaly detection models could be trained on smart contract data and used by decentralized autonomous organization (DAO) protocols as signals to automatically trigger security procedures, such as proactively pausing contract operations. Some startups are already working on using machine learning models for security analysis of smart contracts, and zero-knowledge proofs of anomaly detection may be the next step.

Privacy (Zero-knowledge Machine Learning)

In addition to validity proofs, we can also hide some computational details to support privacy-preserving machine learning applications. Here are a few examples:

• Decentralized Kaggle: Prove that the model’s accuracy on some test data is greater than x% without revealing the model’s weights.

• Privacy-preserving model inference: a patient's private data is fed into a medical diagnosis model, and the sensitive inference result (such as a cancer detection result) is sent only to the patient. (Source: vCNN paper, page 2/16)

• Worldcoin
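For the Decentralized Kaggle item above, the statement being proven is a threshold on accuracy rather than a single inference. The sketch below checks that statement in the clear: a committed model (a salted SHA-256 commitment, which is an assumption) classifies a public test set, and the claim is that it gets at least threshold_pct percent of the labels right. In an actual deployment this check would run inside a proof so the weights stay hidden; the toy integer classifier and all names here are hypothetical.

```python
import hashlib
import json


def commit(weights: list[int], salt: bytes) -> str:
    """Salted hash commitment to the model weights."""
    return hashlib.sha256(json.dumps(weights).encode() + salt).hexdigest()


def predict(weights: list[int], x: list[int]) -> int:
    """Toy linear classifier: label 1 if w·x > 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def accuracy_relation(public: dict, private: dict) -> bool:
    """Statement: the committed model scores at least threshold_pct% on the public test set."""
    w = private["weights"]
    if commit(w, private["salt"]) != public["model_commitment"]:
        return False
    correct = sum(predict(w, x) == y for x, y in public["test_set"])
    # Integer comparison instead of division, since circuits work with integers, not floats.
    return correct * 100 >= public["threshold_pct"] * len(public["test_set"])


if __name__ == "__main__":
    w, salt = [2, -1], b"salt"
    test_set = [([1, 0], 1), ([0, 1], 0), ([3, 1], 1), ([1, 3], 0), ([1, 1], 0)]
    public = {"model_commitment": commit(w, salt), "test_set": test_set, "threshold_pct": 75}
    print(accuracy_relation(public, {"weights": w, "salt": salt}))  # True (4 of 5 correct)
```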

Potential use cases for zero-knowledge machine learning in Worldcoin

One potential use of zero-knowledge machine learning in Worldcoin is upgradeable iris codes. World ID users could self-custody their signed biometrics in encrypted storage on their mobile device, download the machine learning model used to generate the iris code, and locally create a zero-knowledge proof that their iris code was indeed generated from the signed image using the correct model. Because a smart contract can verify this proof, and thereby the correct generation of the iris code, the iris code could be inserted into the set of registered Worldcoin users permissionlessly. This means that if Worldcoin upgrades the algorithm for creating iris codes in a way that breaks compatibility with previous versions, users would not need to visit an Orb (the dedicated iris-scanning device) again; they could simply upgrade on their own device.

Author: worldcoin
Translation: Backdoor
Proofreading: Johnny Jiang
Typography: Bo
Audit: Bo

Original link 👇

https://worldcoin.org/blog/engineering/intro-to-zkml

SeeDAO is a digital city-state based on blockchain. It is a decentralized digital network (SeeDAO Network) and physical locations (Seeshore) mapped around the world, which are jointly built, governed and shared by SeeDAO members. Currently, it has formed several guilds such as the Translation Guild, Investment Research Guild, and R&D Guild, with more than 10,000 members. It aims to help the birth of high-quality Web3 projects from multiple aspects such as education, information, and activities.

Official website: https://seedao.xyz
Chinese Twitter: https://twitter.com/see_dao
Global Twitter: https://twitter.com/en_SeeDAO
Discord: https://discord.com/invite/seedao-xyz
Notion: https://seedao.notion.site
Mirror: https://seedao.mirror.xyz
Telegram: https://t.me/theseedao

Add the assistant WeChat to enter the SeeDAO Newbie Camp: seedao2023
