Author: RareSkills
The Pedersen commitment allows us to represent arbitrarily large vectors using an elliptic curve point, while optionally hiding any information about the vector. It gives us an important component that allows us to prove the vector encoded by the point without revealing the vector itself.
motivation
When discussing Bulletproot zero-knowledge proof technology, people often say, "We have two vectors, and their inner product is c." This seems very common, but in fact, you can use this mechanism to prove very complex claims. We will expand on this later.
But in order for such a proof system to work, these vectors cannot be known only to the prover - otherwise the prover can change them at will. They must be mathematical entities in the real world. Generally speaking, the prover certainly does not want to hand these two vectors directly to the verifier, but still needs to "pass something" to the verifier to indicate that he has chosen a pair of vectors and cannot change them anymore.
This is where we need Pedersen's commitment.
Prerequisites
We assume that the reader is familiar with elliptic curve point addition and multiplication, and what "a point on the curve" means. If you are not familiar with it, please refer to the first four chapters of our " Zero-Knowledge Proof Book ".
As for notation, we use [A] to represent an elliptic curve (EC) point, a to represent an element in a finite field , and a[A] to represent the point multiplication of the finite field element a and the elliptic curve point [A] . The expression [A] + [B] represents the point addition of these two points.
Traditional commitment scheme
When designing a promise reveal function in a smart contract, we usually use something like this:
commit = hash(value, salt)The "salt" is a random value used to prevent attackers from brute-force guessing. For example, if we are committing to a vote, since the options for the vote are limited, then who we voted for can be found by trying all the options and checking the hash value matches. (With salt, such brute-force searches will not work.)
The use of salt in this scenario has an academic term called " blinding factor". Because it is random, the attacker is like blind and cannot guess what the promised value (the "value" in the above formula) is.
Because an attacker cannot guess the value behind the "promise", we say that this commitment scheme has a hiding effect .
During the reveal phase, the promiser reveals the value and salt (in the literature, they are called " openings "), and the other party (or smart contract) can verify that they match the original promise. If it is impossible to obtain the same promise with another pair (value, salt) in the same commitment scheme, we say that the scheme has a binding effect - once the promiser reveals the promise, it is impossible to change the promised value (that is, the two are bound).
Terminology Summary
- A commitment scheme with a hidden effect does not allow the adversary to know the value chosen by the committer. This is usually achieved by adding a random number that the attacker cannot guess.
- The blinding factor is a random number that makes the promised value impossible to obtain through brute force search. In some scenarios where we don't care about zero-knowledge (but only simplicity), the blinding factor may not be used.
- The opportunity is to calculate the value of the promise (the promised value and the salt).
- A binding commitment scheme does not allow the promisers to compute the same promise using different opportunities. That is, they should not be able to find two pairs
(value, salt)that compute the same promise (in this case, the hash value).
Pedersen Promise
The Pedersen commitment scheme is no different, except that it uses elliptic curve groups instead of cryptographic hash functions.
Under the assumption that “discrete logarithms on elliptic curves are hard”, given elliptic curve points [U] and [G] , we cannot calculate u such that [U] = u[G] .
In Python language code, it is:
from py_ecc.bn128 import G1, multiplyu = 569723450 # chosen randomlyU = multiply(G1, u)print(U, "cannot compute the discrete log of U") We say that u is the discrete logarithm of [U] . Even though we cannot compute u , we still call it the discrete logarithm of [U] because we know it exists. All (cryptographic) elliptic curve points have a discrete logarithm, even though we cannot compute it.
In this sense, elliptic curve point multiplications are like hash functions. They are binding, as long as we only allow opportunities within the order of the curve.
However, although it has a binding effect, we cannot invert its discrete logarithm, and if the range of u is small, we will encounter exactly the same problem as in the voting case above. The adversary can guess our x by traversing all possible x and calculating x[G] .
We can implement "Pedersen-style hiding" in the following way:
commitment = v[G] + s[Q] Here v is the value we want to commit to, s is the salt (blinding factor), and Q is another elliptic curve point whose discrete logarithm is unknown to the committer.
We want to emphasize that, although their discrete logarithms are unknown, [G] and [Q] are public and known to both the verifier and the committer.
Why can't the promiser know the discrete logarithm of [Q]
Assume that the promisor knows the discrete logarithm [Q] = d[G] behind [Q] . This allows the promisor to find opportunities for two people to have the same promise. The principle is as follows.
[C] = v[G] + s[Q] = 10[G] + 15[Q] = 10[G] + 15[dG] [C′] = 11[G] + (15d−1)[G] [C] = [C′] The reader can manually substitute any number for d to see how it works.
The committers cannot know the discrete logarithm relationship of the elliptic curve points they use.
One way to ensure this is to have the verifier provide the promisor with the elliptic curve point. However, a simpler approach is to select an elliptic curve point in a random and transparent way, for example, pseudo-randomly. Given a random elliptic curve point, we do not know its discrete logarithm.
For example, we can start with a generator ( [G] ), hash its x and y coordinates, and then feed it into a pseudo-random but deterministic function to find the next point.
How useful is the Pedersen Promise?
It seems like Pedersen commitments are just regular commitment schemes with another hash function, so why is that useful?
This approach has many benefits.
Homomorphic Additive
Given a generator [G] , we can add two commitments together: a[G] + b[G] = (a + b)[G] . Even if we add random blinding privacy, we can still make a valid opportunity:
$$
C_1 = a[G] + s_1[Q] \
C_2 = b[G] + s_2[Q] \
C_3 = C_1 + C_2 \
Promise Revealed\
(a, b, s_1 + s_2) \
Verifier Check\
C_3 ?= a[G] + b[G] + (s_1 + s_2)[Q]
$$
Conventional cryptographic hash functions (such as SHA256) cannot do this.
(In a Pedersen commitment) b[G] and s[Q] do not "conflict" with each other when added.
Do you think the following equation holds true?
(a[G]+b[H]) + (c[G]+d[H]) = (a+c)[G] + (c+d)[H] (Here [G] and [H] are two elliptic curve points with unknown discrete logarithms).
You can think of elliptic curve points as a basis for a linear combination of orthogonal dimensions.
When there are finite field elements $a_1, a_2, b_1, b_2$, and elliptic curve points [G] and [H] , and [G] is not equal to [H] , $a_1 \ne a_2$, $b_1 \ne b_2$, even if $a_1[G] + b_1[H] = a_2[G] + b_2[H]$, it is still impossible to bypass the discrete logarithm problem and solve $(a_1, a_2, b_1, b_2)$.
Moreover, when the order of our elliptic curve group is large enough, finding such a match by luck is even more unlikely.
In other words, suppose two people compute respective commitments [C] = a[G] + b[H] and [C'] = a'[G] + b'[H] with $a \ne a'$ and $b \ne b'$. As long as the discrete logarithms of [G] and [H] are unknown, it is unlikely that [C] will equal [C'] .
Pedersen commitments are zk-friendly
It is relatively easy to create circuits for addition and multiplication on elliptic curves, since the only requirements are regular arithmetic operations; however, regular hash functions use bitshifting and XOR operations, which require strict constraints to be encoded into zero-knowledge proof circuits.
We can encode any number of points into one point
Our example using [G] and [Q] can also be considered as a 2D vector commitment without a blinding factor. But we can also add any number of elliptic curve points $[G_1, G_2,…, G_n]$ and commit to any number of scalars.
Pedersen vector commitment
We take the above approach a step further and commit to a set of values, rather than a single value and a blinding token.
Vector Commitment Scheme
Assuming we have a set of random elliptic curve points ($[G_1, G_2,…, G_n]$) (whose discrete logarithms are unknown to us, then we can do this:
$$\underbrace{[C] = v_1[G_1] + v_2[G_2] + … + v_n[G_n]} {committed value} + \underbrace{r[Q]} {blinding factor}$$
This allows us to promise many values with a single promise [C] , hiding them with r .
Since the promisers do not know the discrete logarithm behind any of the generators, they will not be able to know the discrete logarithm of [C] . Therefore, this scheme has a binding effect: they can only generate the promise [C] using $[v_1, v_2,…, v_n]$, and cannot do so with another set of vectors.
Vector promises can be aggregated
We can add two Pedersen vector commitments together to get a promise for two vectors. The committer can still reveal it using the original vectors. An important implementation detail here is that the original two-vector commitment must be generated using two different sets of elliptic curve points.
$$
[C_1] = v_1[G_1] + v_2[G_2] + … + v_n[G_n] + r[Q] \
[C_2] = w_1[H_1] + w_2[H_2] + … + w_n[H_n] + s[Q] \
[C_3] = [C_1] + [C_2]
$$
Here, r[Q] and s[Q] are blinding factors. Even if the committer does not have a commitment vector, the entire promise still looks like a random point.
The promiser can then reveal the original vectors $[v_1, v_2,…, v_n]$ and $[w_1, w_2,…, w_n]$, as well as the blinding factor r + s . This also has a binding effect, and the same promise cannot be made with another set of vectors and blinding factors.
The fact that we use $[G_1, G_2,…, G_n]$ for one set of vectors and $[H_1, H_2,…, H_n]$ for another set does not imply any special relationship between these [G_i] generators and [H_i] generators. All these points need to be chosen pseudo-randomly. They are just a coincidence of notations that allows us to directly say "these elliptic curve point vectors go with this set of finite field element vectors, and those elliptic curve point vectors go with those finite field element vectors".
There is no inherent upper limit on the number of vectors we can commit to.
Question for the readers : If we use the same generator $[G_1, G_2,…, G_n]$ for two commitments and then add them together, how can the committer reveal two different sets of vectors for $[C_3]$? For example, can this be avoided by using another set of elliptic curve points $[H_1, H_2,…, H_n]$?
Question for the readers : What happens if the promiser switches the positions of the promised values when revealing them?
For example, he promised:
$$[C_1] = v_1[G_1] + v_2[G_2] + … + v_n[G_n] + r[Q]$$
However, the opportunity for disclosure is:
$$[v_2, v_1, v_3, v_4,…,v_n]$$
That is, the committer swaps the positions of the first two elements, but leaves everything else the same. Assume that the vector $[G_1, G_2,…, G_n]$ is non-commutative.
Generate random points transparently
How can we generate these random elliptic curve points? An obvious solution is to use a trusted bootstrap setup, but this is not necessary. The committer can do this by transparently picking some random points whose discrete logarithm we do not know.
They can first choose a generator [G] , mix it with a publicly chosen random number, and then hash the result (modulo field_modulus) to get another value. If this result can fall on the elliptic curve as an x value, then use it as the next generator and hash its (x, y) coordinate value again. On the other hand, if it does not have a corresponding point on the elliptic curve as an x value, then we increment the x value until it has a corresponding elliptic curve point. Because the committers did not generate these points, they cannot know the discrete logarithm of these points. The implementation details of this algorithm are left as an exercise for the reader.
In any case, you should not generate a point by multiplication, because that would result in us knowing its discrete logarithm. You need to pseudo-randomly pass a hash function to come up with an x value, and then see if it falls on the elliptic curve.
It is also possible to start with the generator [G] (whose discrete logarithm is 1, as we all know), and then generate the other points.
Exercise for the reader : Suppose we commit to a 2D vector using the points [G1] and [G2] . The discrete logarithm of [G1] is known, but the discrete logarithm of [G2] is unknown. (We will ignore the blinding factor for now.) Can the committer reveal the commitment using another 2D vector? Why?
(over)