Tracing bad funds through shielded pools

I want to explain an unhinged idea for tracing bad funds through a shielded pool, whilst keeping as much as possible private.

Logline

Dr Evil stole one-hundred billion dollars and fed it into a privacy protocol.

Somehow, the theft and the deposits went unnoticed for a while, and the deposit delay window has now passed. The money’s in the system. Private transfers ensue. No one knows good notes from bad notes. Times are bad.

Eventually, someone realises the theft and sounds the alarm.

How do users identify contaminated notes in their balance? How do we maximize privacy? How can we identify contaminated withdrawals (and should we)?

Headline aim: Enable a note owner to identify how much of their note is descended from “bad” deposits, without leaking anything else to anyone.

Super high-level

A user has a note. The note contains encrypted information that they cannot decrypt, about the deposits which contributed towards their note.

```
notes[i][j] = {
    owner,
    amount,
    randomness,
    deposits: [
        { <unintelligible ciphertext>, <unintelligible ciphertext> },
        { <unintelligible ciphertext>, <unintelligible ciphertext> },
        { <unintelligible ciphertext>, <unintelligible ciphertext> },
        { <unintelligible ciphertext>, <unintelligible ciphertext> },
        ...
    ],
}
```

Everything is nice and private.

If, some time later, a deposit is blacklisted, and if this note is a descendant of that “bad” deposit, then the user will suddenly learn this information:

```
notes[i][j] = {
    owner,
    amount,
    randomness,
    deposits: [
        { <unintelligible ciphertext>, <unintelligible ciphertext> },
        { <unintelligible ciphertext>, <unintelligible ciphertext> },
        {
            The bad deposit_id,
            Some fraction to deduce how much of that 'bad' deposit
            is contained within this note,
        },
        { <unintelligible ciphertext>, <unintelligible ciphertext> },
        ...
    ],
}
```

The user can then execute a dramatically-named “decontamination circuit” to create a new note which does not contain this bad deposit_id.

Caveats

No one’s read this yet.

I’m not really advocating for this approach, but it’s a fun design which attempts to solve a problem that I haven’t seen get solved before.

I can think of some strong criticisms, which are laced throughout this doc (and summarised at the end).

It might just serve as a reference for posterity.

Hopefully it’s an interesting read :slight_smile:

Intro

If you want a really eloquent intro to this topic, read sections I and II of the Privacy Pools paper.

Here’s my tl;dr:

Some people would prefer privacy when transacting on a blockchain, so teams are exploring how to enable private blockchain transactions. A common pattern is as follows:

A user can deposit their tokens into a so-called “Shielded Token Pool”. Some time later, they can withdraw their tokens to another address. The aim of such protocols is to break the observable link between the deposit address and the withdrawal address.

Some protocols also enable private transfers of the tokens that have been deposited into the pool, whereby observers won’t be able to deduce who sent what to whom. It’s these deposit-transfer-withdraw protocols that I find most interesting.

A problem with all of these pools is that bad actors can also use them, and indeed there have been instances of bad actors sending ill-gotten funds into such pools. Once deposited, it becomes harder to trace the bad deposits, and it becomes harder to distinguish good withdrawals from bad withdrawals.

And so, various projects have sought to resolve this problem.

Approaches include:

  • Identifying bad deposits before they can enter the pool, by introducing a deposit delay.
  • Limiting the amount of money that can be deposited into the pool over some time period.
  • Only allowing deposits from reputable actors.
  • Enabling a user to provide proofs of disassociation from bad actors, or proofs of association with only good actors.
  • Encrypting the details of every transaction to a 3rd party (such as a regulator or auditor or accountant), or sharing viewing keys with a 3rd party.
  • Relaxing some privacy guarantees to enable some degree of traceability of all funds through the pool.

These approaches all have interesting trade-offs which we’ll touch on below.

The idea I’ve been toying with is as follows:

When a user is sent a note, it would contain a list of re-randomized encryptions of all upstream deposit_ids that contributed any value towards that note, along with encryptions of how much value each of those deposits contributed towards the note.

Under ideal circumstances, a user’s note wouldn’t be the descendant of any “bad” deposits, and in such cases they should have privacy comparable to zcash: the user wouldn’t be able to learn which deposits have contributed towards their note.

If a deposit is publicly deemed “bad”, however, then the design enables all downstream note owners to be able to learn how much of that bad deposit has flowed into their note. Crucially, no further information is leaked to a note owner beyond “X amount of my note is considered to have originated from this bad deposit_id”. Observers of the pool should not learn any information.

Ideally, a bad deposit would be identified and flagged as “bad” before it can enter the system, e.g. through some delay period. This way, it wouldn’t have the chance to “corrupt” honest users’ funds. But what if the delay isn’t long enough? Or what if due diligence missed something? I thought it’d be interesting to explore how to identify bad funds that manage to slip through such a deposit-delay net, without leaking the transaction graph of users.

Edit after writing: Railgun actually experienced this problem recently: https://x.com/dethective/status/1994397800847589736?s=20

We’ll hit some scalability problems along the way, and explore how to mitigate those.


Background

Here’s a quick reminder of utxo-based privacy protocols, if only to align on terminology.

You can skip to the next section “Prior Art”, if you want.

Deposits

Conceptually, a deposit looks something like this:

A user deposits some public tokens into the system, and a note containing the deposited amount and an owner (an address or some public key) is created. The note serves as a representation of the user’s ownership of the underlying tokens, which are held by the system in a pool until the eventual owner chooses to withdraw or transfer some balance.

If you want more detail, here’s a diagram which illustrates the deposit process. Different implementations differ around the edges in many ways, but the crux is this.

Transfers

Conceptually, a transfer looks something like this:

A sender spends (or “nullifies”) two of their notes, creates a note for the transfer recipient, and creates a change note for themselves. Two notes are destroyed, two nullifiers are emitted, and two new notes are created and emitted.
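To make the 2-in / 2-out shape concrete, here’s a minimal sketch in Python. The names and fields are purely illustrative (real notes also carry randomness, and spending emits nullifiers, which we’re ignoring here):

```python
from dataclasses import dataclass

# Illustrative note: a real note also carries randomness, and spending
# a note emits a nullifier (omitted here, as in the diagrams below).
@dataclass
class Note:
    owner: str
    amount: int

def transfer(in1: Note, in2: Note, recipient: str, amount: int):
    """Nullify two input notes; create a recipient note and a change note."""
    total = in1.amount + in2.amount
    assert amount <= total, "insufficient balance"
    out = Note(owner=recipient, amount=amount)
    change = Note(owner=in1.owner, amount=total - amount)
    return out, change

out, change = transfer(Note("alice", 5), Note("alice", 10), "bob", 2)
assert (out.amount, change.amount) == (2, 13)
```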

Am I going to be bold enough to not explain what a nullifier is? I think I am!

Ah, actually, I’ll need to say the following about nullifiers because it’ll help me criticise an approach in the next section:

Nullifiers are used in private payment protocols to break the link between the tx which created a note and the tx which later spends that note. Without this “tx unlinkability” property that nullifiers provide, the whole transaction graph is visible to observers, and users can glean information that oughtn’t be leaked.

For the rest of this doc I’ll omit nullifiers from diagrams, because they just cause clutter. If you see a note being fed into a circuit, assume it is being nullified.

Also, I’ll omit the change note from diagrams to reduce clutter:

De-cluttered: no nullifiers are shown; no change notes are shown.

Withdrawals

Conceptually, a withdrawal looks something like this:

A user burns a note containing some value, and the system – which holds all publicly deposited funds whilst users transact privately within the system – transfers that value to the user’s specified address.


Prior art

As is traditional in papers, before we can dive into the detail of the new idea, we have to brutally criticise all existing ideas so as to justify this document’s existence.

I’ll be gentle though, because:

  • Existing ideas are actually pretty good, and have been turned into actual working products with actual pmf. That’s awesome.
  • You never know when you might end up working with someone in future;
  • My idea might contain bugs;
  • Even if the idea isn’t buggy, I can think of some brutal criticisms which I’ll share as we go, and which you might already be thinking.

Stay with me.

All of the products seek some form of compliant private tokens. Some are simple deposit-withdraw schemes, and some are deposit-transfer-withdraw schemes. They all essentially follow the approach outlined in the “Background” section, but with different approaches to cope with bad actors.

Payy

When you say the name of this product verbally, you have to say “Pay with two y’s” or you’ll be met with blank stares. Or you have to say “Payyyyyyy” and really stress the pronunciation.

Payy is a deposit-transfer-withdraw protocol.

Payy takes a very pragmatic and simple approach to cope with bad actors: they do away with nullifiers from their protocol:

Source

Harking back to the basics of shielded protocols, nullifiers provide “tx unlinkability”. Without nullifiers, the input notes to a transfer tx are publicly visible. I.e. the output note of every tx can later be observed as the input note to some other tx. This means on Payy the entire transaction graph is observable.

If a bad deposit enters the system, you can trace exactly where that money flowed, and you can see which withdrawals from the system are descended from it. The recipient of a note can immediately identify that a note is downstream of a bad deposit, although they won’t know how much of that deposit contributed to their note.

So Payy sacrifices some privacy to enable traceability of bad actors. Let’s talk about the extent of the privacy sacrifice.

Off the top of my head, here are the privacy leakages that Payy’s design introduces:

The first transfer after a deposit is publicly identifiable as coming from the depositor. I.e. the sender is leaked.

The last transfer before a withdrawal is publicly identifiable as being to the withdrawer. I.e. the recipient is leaked.

If A deposits, then A → B (read: “A transfers to B”), then B withdraws, the world sees that A transacted with B. The world might even learn the exact amount that A sent to B if there were no other transactions in that subgraph.

If E → B → C → E, then E learns that B and C transacted. E learns that B knows C. That’s an unexpectedly large leakage. Love affairs have probably come to light with that much leakage.

If E is a large entity such as an exchange, they might be able to infer a significant amount of the transaction graph from this kind of leakage. They could piece together who knows who, and potentially deduce the amounts being paid between participants.

Those are leakages that I’m seeing from 5 minutes thinking about it. Some advanced team of nerds might be able to infer more.

Railgun

Railgun is a deposit-transfer-withdraw protocol. Very nice.

There is a one-hour delay on all deposits into the system, so that ‘bad’ deposits may be rejected.

Once in the system, money can be privately transferred between users.

The problem – and it’s the main problem that prompted me to explore and write this entire post – is that one hour isn’t a very long time, and it’s technically possible that a bad actor could ‘slip through the net’ and deposit ‘bad’ funds into the pool. Once in the pool, there would be no way to catch them.

Edit after writing all this: It happened!!!
https://x.com/dethective/status/1994397800847589736?s=20
If only I’d finished this doc before it happened. I’d have been seen as prescient.

Anyway, I set out to solve the problem of tracing ‘bad’ funds through a pool which enables private transfers, even if we don’t know the funds are ‘bad’ when they enter the pool, whilst preserving as much privacy as possible for all users. I’ll be more rigid about requirements after I mention a few more projects.

EYBlockchain

Back in 2019, I was in a small research team that explored this problem: How to design a privacy pool where bad deposits could be traced – all whilst ensuring good privacy.

I recall for a while we went down the obvious – and today well-known – dead end of ascribing a deposit_id to every newly deposited note. Then, with each transfer, the output note would contain the union of all deposit_ids of all the input notes. This leads to an exponentially growing list, which ~doubles in size with each transfer. It’s not a sustainable approach. I don’t think we published anything.

Actually, I did do some public talks in 2019 which solved the easier problem of tracing a stolen NFT through a shielded pool of NFTs. The non-fungibility of NFTs means you don’t get the exponentially growing list problem.

There was a lot of ahead-of-its-time research in that small team, but we didn’t publish the half-baked ideas, which is a shame in hindsight.

However… see later in this doc where we try to resolve that problem by occasionally bootstrapping (resetting) that list.

A common criticism of any approach that tracks deposit_ids is that notes cease to be fungible: the deposit_ids serve as markers on the notes. People will be able to see patterns in the notes they send and receive.

A trivial leakage is Eve->Bob->Eve, or Eve->Bob->Charlie->Eve. Or even Eve->…->Eve. Eve will be able to spot the common deposit_ids in the notes she sent and the notes she received, and she might be able to infer information from that. E.g. in the case of E->B->C->E, Eve can infer that Bob knows Charlie, and that Bob sent Charlie money.

This doc tries to solve this problem – of deposit_ids serving as markers – by re-randomising the list of deposit_ids with every transfer. This way, Eve would not be able to identify any commonality between the notes she sent and the notes she later receives.

Privacy Pools

The Privacy Pools product is a deposit-withdraw protocol.

My first criticism would be “It doesn’t support transfers”, but the Privacy Pools paper does talk about an approach that would enable private transfers.

Edit after chatting with them at devconnect: v2 of Privacy Pools apparently will enable private transfers, but it’s different from the approach explained in the privacy pools paper. Here’s my understanding of what they’re building for v2:

For every deposit, the resulting note will contain a deposit_id. Transfers can happen within the system, but all the input notes must have the same deposit_id, and all output notes are given that same deposit_id.

It’s certainly a neat way of avoiding the “exponentially growing list” problem: a note will always contain 1 deposit_id.

A problem with this approach might be as follows. If a user needs to send 100 ETH, they will need two notes with the same deposit_id whose values sum to >= 100. But what if they don’t have two notes with the same deposit_id? They’d need to send multiple transactions; one for each distinct deposit_id of the notes they intend to spend.

I suspect with sufficient deposit activity, it will become uncommon for a user to receive notes that contain the same deposit_ids. If that were true, it would therefore be uncommon for a user to join together two notes (since they must share the same deposit_id). And I suspect this could lead to a system which tends towards “dust”; where notes contain tiny and impractical values, since “split” operations would far exceed “join” operations.
Perhaps users could interact: a would-be recipient could request notes containing certain deposit_ids from a sender; a bit like the card game Go Fish: “Got any deposit_id = 1234?”.

The proposal in this post seeks to enable deposit_ids to be intermingled.

Another criticism of this v2 approach is – as mentioned in the previous subsection also – that deposit_ids serve as markers which spoil fungibility. This doc seeks to solve this problem by re-randomising the deposit_ids with every transfer.

My second (weaker) criticism of privacy pools is that there’s a variable-length delay before a deposit is accepted into the pool. Maybe that’s fine, but since I wanted to explore what happens if bad funds get into the pool, I don’t want the answer to be “just wait longer to perform more due diligence”. I’ll claim that maybe the due diligence process could make mistakes, and we’d like to be able to retroactively flag bad deposits as bad even if they’re accidentally let in.

The approach described in the privacy pools paper is as follows:

A criticism would be that there’s arguably too much information leakage; information that shielded pools traditionally like to keep private. Lots of private data is being shared between users, here, until the deposits are finally whitelisted.

Blanksquare

You probably won’t believe me, but I started writing the ideas of this post in September/October. I just got a bit busy with other things, like my day job, a delightful family holiday, and devconnect.

Then, whilst I was a +1 at a wedding on Saturday (late November), I got sent this link in a Signal group:

It suggests encrypting deposit_ids – an idea which also features in this doc. Great minds!
Anyway, I’m not copying; they’re independent realisations of the same thing. I have timestamped receipts!

This doc expands on that idea much further, so I hope you’ll find it interesting. It also plugs some of the criticisms of that idea, which we’ll touch on now.

Briefly, the idea was for some entity to generate a new keypair for every deposit into the system. Each deposit_id could then be encrypted with its corresponding public key. Any later, corresponding withdrawal from the system would be forced to signal the same encrypted deposit_id (which would be salted so that the withdrawal signal looks unrelated from the deposit). On the happy path, observers would never learn that the withdrawal relates to the deposit. But if a deposit is later deemed ‘bad’, the secret key could be revealed by the holder of the decryption keys, which would enable all observers to decrypt the encrypted deposit_id of both the deposit and withdrawal transactions, and ultimately link the withdrawal to the deposit.
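Here’s a toy sketch of my understanding of that scheme (the group parameters and the choice of Elgamal are my own assumptions for illustration; the real proposal’s scheme may differ). The deposit and withdrawal signals carry the same plaintext under different salts, so they look unrelated until the secret key is revealed:

```python
import secrets

# Assumed toy Schnorr group (q = 11, p = 23, g = 4) purely for illustration.
q, p, g = 11, 23, 4

def encrypt(Y, m, r):
    """Elgamal encryption of group element m to public key Y, with salt r."""
    return (pow(g, r, p), (m * pow(Y, r, p)) % p)

def decrypt(x, ct):
    c1, c2 = ct
    return (c2 * pow(c1, q - x, p)) % p   # c2 / c1^x

x = secrets.randbelow(q - 1) + 1
Y = pow(g, x, p)
deposit_id = pow(g, 3, p)                 # deposit_id 3, encoded as g^3

deposit_sig = encrypt(Y, deposit_id, 2)   # emitted at deposit time
withdraw_sig = encrypt(Y, deposit_id, 9)  # same plaintext, fresh salt

# If the deposit is blacklisted and x revealed, anyone can link the two:
assert decrypt(x, deposit_sig) == decrypt(x, withdraw_sig) == deposit_id
```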

Problems with this approach:

The holder of the decryption keys is a point of failure for the system. If the keys are leaked (intentionally or otherwise), then people will be able to link all deposits and withdrawals of all users. There are suggestions to use TEEs or to secret-share the decryption keys through a collective of MPC nodes. Certainly, secret-sharing of decryption keys (and of other secrets) is a common approach that you’ll find in FHE and CoSnark settings, so some might consider this acceptable.

Vitalik certainly doesn’t like the notion of “rugpulling privacy”:

The approach I suggest in this post still has a controversial MPC collective who generate keys, but I try to greatly stem the worst-case leakage in the event that all of the decryption keys are somehow leaked.

With the Blanksquare approach:

  • What is leaked, in the worst case?
    • Every withdrawal is linkable to every deposit.
  • Who is it leaked to?
    • Everyone in the world (every blockchain observer).

With the approach in this doc:

  • What is leaked, in the worst case?
    • The deposit_ids that contributed to a given user’s note.
    • The amount of each deposit_id that “contributed to” the note.
  • Who is it leaked to?
    • Only to the owner of the note.

The leakage is localised.

Not terrible, eh?

At least we’re not rugpulling privacy to the extent that would make Vitalik sad (I hope).

In the worst case, the protocol in this doc collapses to the approach where a note owner can see the deposit_ids that contributed to their note. It’s not ideal (see earlier sections which complain about deposit_ids serving as non-fungible “markers” on notes), but it felt like an advancement worth describing to you all.

In the happy case, privacy is actually quite nice.

The idea

Finally we can discuss the idea.

Requirements

What features is it seeking:

  • A shielded pool that supports private transfers.
  • On the happy path, deposit_ids (within notes) are randomised, to prevent advanced note owners from using them as “markers” to spot patterns in activity.
  • If a bad deposit is accidentally allowed into the pool, it can be later flagged as ‘bad’, and all users of the pool can identify: whether their note contains any ‘bad’ value, and how much of their note’s value is deemed ‘bad’.
  • Even in the worst case (of leakage of this protocol’s decryption keys), blockchain observers do not learn whose notes contain ‘bad’ value.
  • Avoiding retroactive privacy rugpulls of historic withdrawals: This system can be designed to not leak information for withdrawals that already took place, even in the worst case of the system’s decryption keys being leaked.
  • When performing a transfer or withdrawal, a user can prove that their input notes are not descended from ‘bad’ deposits.
  • If a user’s note is descended from a bad deposit, the user can ‘decontaminate’ their note by splitting it into ‘good’ and ‘bad’ notes.
  • I make no promises relating to constraint counts :slight_smile:

High level

The low-level section is next

The basis is a simple deposit-transfer-withdraw protocol.

A blacklist of bad deposit_ids is maintained by some Blacklist Maintainer.

There can be a delay period before deposits can enter the set, but we’re interested in machinery to catch deposits that slip through such a delay period.

In a huge preprocessing step, a long list of deposit_id keypairs is generated: one for every future deposit into the protocol.

$[(x_1, Y_1), (x_2, Y_2), \ldots, (x_{1000000}, Y_{1000000})]$

where the x_i’s are secret keys and the Y_i’s are public keys.

The list can be periodically replenished whenever it runs low.

The list could be generated by a single actor, but you probably won’t like the sound of that very much, so let’s say the list is generated by a large MPC collective. Let’s call them the Key Holder(s). The Key Holders could be the same as the Blacklist Maintainer, or they might be different entities. We’ll come back to this worrying topic soon.

When a new deposit i is made, it will be publicly associated with public key Y_i.

A pre-generated list enables a user to deposit non-interactively into the protocol without waiting for the Key Holders to generate and publish a new public key.

We will encrypt information to this public key Y_i. Specifically, every note that is “downstream” of this deposit will contain:

  • An encryption of the deposit_id i, encrypted to Y_i;
  • An encryption of the fraction of that deposit that contributed towards this note, encrypted to Y_i.

We’ll go slowly through the technical details in a later section.

Since this system supports transfers, a note might be the descendant of multiple deposits. That’s ok: the note will contain the list of all upstream deposits. Astute readers will notice that such a list would grow exponentially and unsustainably – doubling with each transfer. We’ll come back to this later.

Under normal circumstances, the owner of a downstream note will not be able to decrypt the encrypted deposit information that is contained within their note, because nobody (hopefully not even the collective of Key Holders) knows the decryption key x_i, and so they won’t know which deposits have flowed into their note. The aim is for the privacy of the system to be as close to zcash as possible, where observers don’t know what’s happening inside the pool, the tx graph is not known, value is fungible, and the owners of notes can’t trace their notes’ lineages.

If deposit i is later deemed to be “bad”, then the Blacklist Maintainer will flag deposit i onchain. Upon seeing i get blacklisted, the Key Holder(s) are tasked with revealing the corresponding secret key x_i onchain. If the Key Holders are an MPC collective, they’ll need to work together to learn what x_i actually is, e.g. through some threshold scheme.

This publication of i and x_i will trigger a flurry of activity: all note owners in the entire system can attempt to decrypt all of the encrypted deposit_ids that are contained within their notes. If their note is a descendant of deposit i, then they will successfully decrypt one of their note’s encrypted deposit_id fields to yield i.

Let’s suppose their note does contain “bad” funds from deposit i. Uh oh. To learn how much, the revealed x_i can also be used to decrypt the corresponding “encrypted fraction” that exists within their note. This will reveal what fraction of the original deposit is deemed to be contained within their note.

A new and dramatically-named “decontamination circuit” will enable the user to “carve out” the bad funds from their note.

It’s important to note that a note owner won’t be able to decrypt any of the other encrypted deposit_id fields of their note. All that will have been leaked to the note owner is:

“My note is ‘downstream’ of deposit i; the protocol considers my note to contain X amount from that bad deposit”.

Or, more succinctly:

“X amount of my note is considered to have originated from bad deposit i”.

The note owner won’t learn any other information about their note’s lineage, including who the value passed-through before it reached them.

Outside observers won’t learn which notes are contaminated, and won’t learn meaningful information about the tx graph.

By this point, there are probably some burning criticisms raging in your head. Let’s list some:

  • If the Key Holders collude, they could learn the x_i's.
  • The Blacklist Maintainer might turn evil and censor people by adding them to the blacklist.
  • The Key Holders might not do their job of revealing x_i once i is blacklisted.
  • It doesn’t seem fair for an honest user to receive a note without being able to see which deposits have contributed to it, only to later discover (upon a blacklisting event) that their note is actually contaminated with some “bad” value.
  • How does the protocol determine what proportion of a deposit flows into each note?
  • If you track a list of deposits, then it will double with every ‘transfer’ tx. That’s not sustainable.

We’ll talk about all that soon.

Step by step

Setup

In a huge preprocessing step, a long list of deposit_id keypairs is generated: one for every future deposit into the protocol.

$[(x_1, Y_1), (x_2, Y_2), \ldots, (x_{1000000}, Y_{1000000})]$

where the x_i’s are secret keys and the Y_i’s are public keys.

The list can be periodically replenished whenever it runs low.

Slightly more formally (but still being pretty loose with notation):

$G \in \mathbb{G}$: a generator point.

$x_i \in \mathbb{F}$: a secret key for deposit_id $i$, in the scalar field of the group $\mathbb{G}$.

$Y_i := x_i \cdot G$: the corresponding public key for deposit_id $i$.
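A minimal sketch of this preprocessing step, using a toy Schnorr group for illustration (the parameters q = 11, p = 23, g = 4 are my own assumptions; a real deployment would use a standard elliptic curve, with generation happening inside an MPC ceremony):

```python
import secrets

# Toy Schnorr group: g generates the order-q subgroup of Z_p^*.
q = 11        # order of the subgroup (the "scalar field")
p = 23        # prime modulus, p = 2q + 1
g = 4         # generator of the order-q subgroup mod p

def gen_deposit_keypairs(n):
    """Generate n (secret, public) keypairs, one per future deposit."""
    pairs = []
    for _ in range(n):
        x = secrets.randbelow(q - 1) + 1   # secret key x_i in [1, q-1]
        Y = pow(g, x, p)                   # public key Y_i = g^{x_i}
        pairs.append((x, Y))
    return pairs

keypairs = gen_deposit_keypairs(5)
```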

Deposit

A user deposits some amount as part of deposit i. This deposit is associated with public key Y_i.

A new note is created, containing this data:

```
notes[i][1] = {
    owner,
    amount,
    randomness,
    deposits: [
        {
            encrypted_deposit_id = "i",
            encrypted_fraction = "1",
        },
        {},
        ...
        {}
    ],
}
```

where:

we’ll sometimes use the notation "x" to lazily mean “the encryption of the value x”.

encrypted_deposit_id = "i" = enc_{Y_i}(i), an encryption of the deposit_id i, encrypted to the public key Y_i.
This encryption scheme must support re-randomisation of ciphertexts. Elgamal is a good choice (see the Appx).

encrypted_fraction = "1" = enc_{Y_i}(1), an encryption of the fraction of this deposit that has flowed into this note, which is initially an encryption of 1.
This encryption scheme must support scalar-multiplication of a ciphertext by a known scalar. We’ll explain why in a sec. Paillier encryption is a cool choice, but it’s quite inefficient within a snark circuit (see the Appx).
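To make the re-randomisation property concrete, here’s a toy Elgamal sketch (the tiny Schnorr group q = 11, p = 23, g = 4 is an assumption purely for illustration; a real system would use a standard curve). The deposit_id i is encoded as the group element g^i:

```python
import secrets

# Assumed toy Schnorr group, for illustration only.
q, p, g = 11, 23, 4

def elgamal_encrypt(Y, m):
    r = secrets.randbelow(q - 1) + 1
    return (pow(g, r, p), (m * pow(Y, r, p)) % p)

def rerandomise(Y, ct):
    """Multiply in a fresh encryption of 1: the ciphertext changes form,
    but still decrypts to the same plaintext."""
    c1, c2 = ct
    s = secrets.randbelow(q - 1) + 1
    return ((c1 * pow(g, s, p)) % p, (c2 * pow(Y, s, p)) % p)

def elgamal_decrypt(x, ct):
    c1, c2 = ct
    return (c2 * pow(c1, q - x, p)) % p   # c2 / c1^x

x = 7
Y = pow(g, x, p)
m = pow(g, 3, p)                          # deposit_id 3, encoded in the group
ct = rerandomise(Y, elgamal_encrypt(Y, m))
assert elgamal_decrypt(x, ct) == m        # still decrypts after re-randomising
```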

DIAGRAM TIME!

enc_dep_id_i_j is the “encrypted deposit_id” of deposit i after the jth hop (transfer) from the deposit.
enc_frac_i_j is the “encrypted fraction” of the value of deposit i that has contributed to this note (after the jth hop (transfer) from the deposit).
Users won’t actually see j; it’s just to help diagram readers.
Paillier encryption isn’t set in stone; there are apparently much more-efficient alternatives for multiplication of a ciphertext by a known scalar.
Recall, I’m using quotes for legibility and laziness: "x" means “the encryption of x”.

Transfer

Example

A private transfer destroys two notes, and creates one for the recipient and a change note for the sender. (We’ll ignore the change note for brevity).

Suppose our original depositor of 5 ETH wants to send 1.5 ETH to someone.

Suppose – along with their 5 ETH note – they have a 10 ETH note.

They can nullify those two notes – collectively worth 15 ETH – and create a new note worth 1.5 ETH.

How much of the 5-ETH-deposit and how much of the 10-ETH-deposit ended up inside the new 1.5-ETH note?

It’s actually difficult to say how much of a deposit contributes to a downstream note, given the fluidity of money. It’s like trying to figure out how much of a tomato contributed to a particular spoonful of soup.

But there is a mathematically-natural approach one can take, and that’s to deem the contributions to be proportionate to the input notes.

In our example, we can deem that $5 \cdot \frac{1.5}{15} = 0.5$ came from the 5-ETH-deposit, and $10 \cdot \frac{1.5}{15} = 1$ came from the 10-ETH-deposit. In other words, we can apply a fraction of $\frac{15}{150}$ to both of the input deposits.
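This proportional-attribution rule is easy to sanity-check with exact rational arithmetic. A small sketch of the example above:

```python
from fractions import Fraction

# Two input notes (from the 5-ETH and 10-ETH deposits, 15 ETH total)
# fund a 1.5-ETH output note, so each deposit contributes proportionally.
out, total = Fraction(3, 2), Fraction(15)   # 1.5 ETH out of 15 ETH
frac = out / total                          # = 15/150 = 1/10

assert frac == Fraction(15, 150)
assert Fraction(5) * frac == Fraction(1, 2)   # 0.5 ETH from the 5-ETH deposit
assert Fraction(10) * frac == Fraction(1)     # 1 ETH from the 10-ETH deposit
```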

In theory, a user could attribute contributions differently: e.g. allocating all of a particular deposit_id into their change note, and none of it into their intended recipient’s note. But since our protocol hides which deposit_ids are contained within a note (in the case that the deposit_id is ‘good’), there’s really no point; the user wouldn’t know which deposit_ids they’re even allocating! And in the event that a deposit_id is learned to be ‘bad’, there’s a separate decontamination circuit that can separate-out the ‘bad’ value.

This diagram illustrates how fractions can be applied to the deposit_ids that contributed to a particular note:

Recall, I’m using quotes for legibility and laziness: "x" means “the encryption of x”.

We can consider a subsequent transfer which takes as input this 1.5-ETH note. Suppose we’re transferring 20 ETH, and the other input note has 100 ETH in value.

Then the next fraction to apply to all deposits that contributed to the 20-ETH output note is: $\frac{200}{1015}$.

Notation within a diagram is always cumbersome.

You can see I got lazy with the notation within that 100-ETH note. E.g. enc_frac_3_j = "prod(f_3_j)" is notation to mean “the encryption of the fraction of deposit 3 that contributed towards this note”, and we’ll lazily say there were j transfers since deposit 3 to get to this note.

If you look at the 20-ETH note, it should start to show the pattern of the deposit data that we’re tracking.

The data being tracked for the 5-ETH deposit is:

```
deposits: [
    {
        enc_dep_id_1_3 = "1",
        enc_frac_1_3 = "1 * (15/150) * (200/1015)",
    },
    ...
]
```

Now, the owner of this 20-ETH note can’t actually decrypt these ciphertexts enc_dep_id_1_3 and enc_frac_1_3, because they don’t know the decryption key x_1x1.

But if suddenly the Blacklist Maintainer were to blacklist deposit 11, and if the Key Holder(s) did their job of then deriving and publishing x_1x1, then the holder of this 20-ETH note would be able to decrypt ciphertexts enc_dep_id_1_3 and enc_frac_1_3. (Note: they would still not be able to decrypt the other ciphertexts that are stored within their note, because they each are encrypted to different public keys).

The note owner would learn “Oh look, I was able to use the newly-published x_1 to decrypt enc_dep_id_1_3 to yield 1, implying that my note is ‘contaminated’ with some value from deposit 1. I’ll now decrypt enc_frac_1_3 to yield the fraction 1 * (15/150) * (200/1015). I know how much was deposited as part of deposit 1 (because deposit amounts are public): it was 5 ETH. So I’ll compute 5 * (1 * (15/150) * (200/1015)) = 0.098522 ETH. Therefore the protocol deems 0.098522 ETH of my note to be from the ‘bad’ deposit 1.”
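To make the arithmetic in that monologue concrete, here's a minimal Python sketch (the deposit amount and fractions are the ones from the example):

```python
# The contaminated value is the (public) deposit amount times the
# product of the per-transfer fractions tracked inside the note.
deposit_amount = 5                         # ETH, public for this deposit
fractions = [1, 15 / 150, 200 / 1015]      # one factor per hop since the deposit
carried = 1.0
for f in fractions:
    carried *= f
contaminated = deposit_amount * carried
assert round(contaminated, 6) == 0.098522  # ETH deemed 'bad' in the 20-ETH note
```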

Ignoring any criticisms in your head: that’s pretty cool, right? A load of gibberish, unintelligible, encrypted data can exist within hundreds of notes in the network, without leaking the transaction graph, and then suddenly a blacklisting of a deposit can reveal a very thin slice of useful information, and only to the owners of downstream notes.

Paillier

We can encrypt the fraction using Paillier encryption, so that we can repeatedly multiply the resulting ciphertext by a new fraction with every future transfer transaction, without ever learning what the resulting fraction is.

I.e. if I give you enc(m), you can compute enc(km) for a known scalar k, without learning what m is.
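Here's a toy Paillier sketch of exactly that property (tiny demo primes, for illustration only; real deployments use ~1024-bit primes and constant-time arithmetic):

```python
from math import gcd
import random

# Toy Paillier keypair, with g = n + 1 (a standard simplification).
p, q = 1000003, 1000033              # demo primes; far too small for real use
n = p * q
n2 = n * n
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow(lam, -1, n)                 # valid because g = n + 1

def encrypt(m):
    r = random.randrange(2, n)       # fresh randomness per ciphertext
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Scalar multiplication on the ciphertext: enc(m)^k = enc(k*m).
c = encrypt(7)
c_scaled = pow(c, 5, n2)
assert decrypt(c_scaled) == 35       # the holder of c never learned m = 7
```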

See the Appendix for a huge explanation of Paillier encryption. Independently of the rest of this doc, it’s really cool. Apparently there are much more efficient approaches to achieve this “scalar multiplication” property within a snark using FHE schemes. Interested readers can explore those :slight_smile:

Looking back at our ongoing example, we started with:

enc(1)

The next transfer circuit scalar-multiplied this ciphertext by a known scalar 15/150 (represented as an integer, rounded to 6 d.p., as 15/150 * 10^6 = 100,000). If using Paillier, the operation would be enc(1)^{100000} = enc(1 * 100000). The recipient of the note won’t know which deposit_id(s) flowed into the note, nor what fraction of each deposit_id contributed to their note: they’ll just see the newly computed ciphertext.

The next transfer circuit scalar-multiplied this ciphertext by a known scalar 200/1015 (represented as an integer, rounded to 6 d.p., as 200/1015 * 10^6 = 197,044). If using Paillier, the operation would be enc(1 * 100000)^{197044} = enc(1 * 100000 * 197044).

When decrypted, the true fraction can be derived as 1 * 100000 * 197044 / (10^6)^2 = 0.0197044.

If the decryption key is ever revealed, the user will be able to decrypt this fraction and apply it to the original deposit amount (in our example 5 ETH) to discover that their note is deemed to contain 0.0197044 * 5 = 0.098522 ETH.

Some comments on the Example

Some comments before we go further.

Fractions → Decimals

In the diagrams, we were tracking encrypted fractions, but we don’t actually want to do that for two reasons:

Encrypting a fraction, and iteratively multiplying-in new fractions, would require both the numerator and denominator to be encrypted separately (at least with the encryption schemes I’m familiar with), and that’s a waste of constraints.

The numerator and denominator of a fraction are leaky:
The numerator can be factorised to infer the possible amounts that were transferred with each ‘transfer’ tx since the deposit. The denominator can be factorised to infer the sizes of the notes that were held by each intermediate note owner since the deposit.

Instead, we can convert each fraction into a decimal, which can be (crucially and necessarily) rounded to a certain number of decimal places. The rounding reduces the amount that can be inferred through factorising (especially if users make interesting choices of transfer amounts, such that the resulting fraction does incur rounding).
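For instance (illustrative Python, reusing the running example's numbers), converting the two fractions into 6-d.p. fixed-point integers looks like this:

```python
# Each fraction becomes a scaled, rounded integer; the rounding destroys
# the clean factorisations that a raw numerator/denominator would leak.
SCALE = 10**6   # 6 decimal places

def to_fixed(numerator, denominator):
    return round(numerator * SCALE / denominator)

f1 = to_fixed(15, 150)     # 100000 (divides cleanly)
f2 = to_fixed(200, 1015)   # 197044 (doesn't divide cleanly, so it rounds)
acc = 1 * f1 * f2          # the product the transfer circuits accumulate
true_fraction = acc / SCALE**2
assert abs(true_fraction * 5 - 0.098522) < 1e-6   # ETH from the 5-ETH deposit
```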

Why are the deposit_ids being encrypted at all?

Suppose Eve sends to Bob, who sends to Eve (E → B → E). If the deposit_ids aren’t encrypted, then Eve would notice the common deposit_ids between the note she sent and the note she received.

Possibly worse, suppose Eve sends to Bob, who sends to Charlie, who sends to Eve (E → B → C → E). If the deposit_ids aren’t encrypted, then Eve would notice the common deposit_ids of the note she sent to Bob and the note that Charlie sent to her. Eve might deduce things like “Bob knows Charlie” and “Bob sent money to Charlie” and possibly even the amounts that they sent to each other. That’s leakiness that we can avoid by re-encrypting the deposit_ids.

But if the encrypted deposit_ids don’t change between transfers, then those ciphertexts effectively become alternative identifiers for the deposits.

That’s why we re-randomise the encrypted deposit_ids with every transfer. This way, the list of deposit_ids is a different list of random-looking numbers in every single note! It’s why Elgamal is a nice choice of encryption scheme. See the Appx for how Elgamal and re-randomisation work.
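Here's a minimal sketch of that re-randomisation using textbook multiplicative Elgamal (toy parameters, and a single key for brevity; in the protocol each deposit_id would be encrypted to its own key x_i):

```python
import random

# Toy Elgamal: re-randomising a ciphertext yields a fresh-looking
# ciphertext that still decrypts to the same plaintext.
p = 0xFFFFFFFFFFFFFFC5              # 2^64 - 59, a demo prime; real systems use proper groups
g = 5
x = random.randrange(2, p - 1)      # decryption key (held by the Key Holder)
h = pow(g, x, p)                    # public key

def encrypt(m):
    r = random.randrange(2, p - 1)
    return (pow(g, r, p), m * pow(h, r, p) % p)

def rerandomise(c):
    s = random.randrange(2, p - 1)  # fresh randomness; plaintext untouched
    c1, c2 = c
    return (c1 * pow(g, s, p) % p, c2 * pow(h, s, p) % p)

def decrypt(c):
    c1, c2 = c
    return c2 * pow(c1, p - 1 - x, p) % p   # c2 / c1^x

c = encrypt(42)                  # 42 standing in for a deposit_id
c_new = rerandomise(c)           # looks unrelated to c...
assert c_new != c
assert decrypt(c_new) == 42      # ...but decrypts to the same deposit_id
```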

Why are the fractions (or decimals) being encrypted at all?

It’s similar to why the deposit_ids are encrypted.

An entity who appears multiple times in the tx graph might spot patterns in the fractions contained within their notes, and piece-together the tx graph, or make inferences similar to the above sections.

It’s bad that the user got rugged of 0.098522 ETH, in the example

In the example, a portion (0.098522 ETH) of the user’s note was deemed ‘bad’, some time after the user came into possession of the note. The user is honest and wasn’t aware of the ‘badness’ of the note when they received it.

Ideally, the blacklisting would happen before any transfers happen in the system, e.g. during a deposit delay period. In such cases, the secret key x_i for the bad deposit would already be known to all users before any transfers can occur. If the baddy tried to transfer money to an honest user, that user would be able to tell straight away.

Also, there’s an interesting rabbit hole of what should happen in the real world analogue of th
