DEV Community

Cover image for Accountable Privacy in Web3 (3/4)
Manmit Singh
Manmit Singh

Posted on

Accountable Privacy in Web3 (3/4)

Now that we've covered the key cryptographic concepts in Web3, let's try and implement them in code! We'll start the article by setting up an Elliptic Curve, a key primitive in many modern encryption schemes. Then, we will use the curve to code a simple commitment scheme. Finally, we will implement a very simple Merkle tree since it is a crucial technique used by blockchains to record state.

Please remember that none of the implementations below are suitable for production. They are rife with over-simplifications and the code has certainly not been written for efficiency. You will also need to install SageMath, a computer algebra system, to run the following Python code.

Moreover, I would like to recognize the masterful explanations and coding implementations by Emil Bay. They were crucial in helping me learn this content. This post is meant to assist beginners in their educational journey.

1 - Elliptic Curve

An elliptic curve is a type of mathematical curve defined by an equation of the form:

y2=x3+ax+b y^2 = x^3 + ax + b

Elliptic curves are fundamental to Elliptic Curve Cryptography (ECC), which is used by several blockchain systems for signing and securing transactions. For example, Bitcoin uses the secp256k1 elliptic curve, defined by the equation:

y2=x3+7 y^2 = x^3 + 7

Let's set this up as over the same finite field used by Bitcoin, as follows -

# Using same params as Bitcoin
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
F = FiniteField(p)

a = 0
b = 7
E = EllipticCurve(F, [a, b])

# Creating fixed generator element
G_x = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
G = B = E.lift_x(G_x)
Enter fullscreen mode Exit fullscreen mode

Essentially, a user's private key is a randomly chosen 256-bit integer. This number must be chosen from the domain set of the above curve. Once chosen, the number is multiplied by a publicly known generator element G, which is an (x,y) point on the curve. The resulting (x,y) point is the user's public key.

# Numb of valid pts on the curve (order)
N = G.order()
N_field = FiniteField(N)

def keypair (private = None):
    if private is None:
        private = randint(1, N-1)
    public = private * G
    return (public, private)
Enter fullscreen mode Exit fullscreen mode

You might be wondering why someone can't just take the private key, divide it by the generator, and retrieve the private key? The answer - it's just not possible! Yay cryptography. The mathematical algorithm(s) required to calculate the private key, given G and the public key, are computationally infeasible on modern computers.

To learn more, you can read about the mathematics involved here.

Now that we have extracted a public and private key pair from this curve, let's use it to sign an arbitrary message m:

import hashlib

def sign (m: bytes, private: int):
    m_hash = int(hashlib.sha256(m).hexdigest(), 16) % N
    k = randint(1, N-1)
    R = k * G 
    r = int(R.xy()[0]) % N
    if r == 0:
        return sign(m, private)  # Retry if r == 0 (rare case)
    s = (m_hash + private*r) * pow(k, -1, N) % N
    if s == 0:
        return sign(m, private)  # Retry if s == 0 (rare case)
    return (r,s)
Enter fullscreen mode Exit fullscreen mode

Here, we are essentially hashing the message using the collision-resistant SHA256 algorithm, generating a random number k, and then running these values through an algorithm to produce two encrypted outputs. These outputs are then checked against the message, and the signer's public key, to verify the authenticity of the signature.

def verify (sig, m: bytes, public):
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    m_hash = int(hashlib.sha256(m).hexdigest(), 16) % N
    u1 = m_hash * pow(s, -1, N) % N
    u2 = r * pow(s, -1, N) % N
    R = u1 * G + u2 * public
    v = int(R.xy()[0]) % N
    return v == r
Enter fullscreen mode Exit fullscreen mode

You can also run the following code block to ensure the above implementation actually works as desired:

public_key, private_key = keypair()
message = b"Hello, ECDSA!"
signature = sign(message, private_key)

print("Public Key:", public_key)
print("Signature:", signature)
print("Verification:", verify(signature, message, public_key))
Enter fullscreen mode Exit fullscreen mode

2 - Pedersen Commitment

As covered earlier, the Pedersen commitment is a cryptographic scheme that allows a value to be committed to while keeping it hidden, but still allowing it to be verified later. It's widely used in privacy-focused blockchain applications.

Given an elliptic curve with a generator point G, and a second randomly chosen generator H, a commitment to a value x is computed as:

C=xG+rH C = xG + rH

Here, r is also a randomly chosen integer.

First, let's implement the setup algorithm we covered in the last article. In the setup, we will be creating the second random generator from our secp256k1 curve:

def setup():
    x = randint(1, N-1)
    H = x * G
    return H
Enter fullscreen mode Exit fullscreen mode

Now, let's implement the commitment on an arbitrary message m with a blinding factor t:

def commit(m: bytes, H):
    s = int(hashlib.sha256(m).hexdigest(), 16) % N
    t = randint(1, N - 1)
    C = s*G + t*H
    return (C, t, s) 

def open(C, t, m: bytes, H):
    s = int(hashlib.sha256(m).hexdigest(), 16) % N
    Cprime = s*G + t*H
    return C == Cprime
Enter fullscreen mode Exit fullscreen mode

Moreover, you may remember that Pedersen commitments are additively homomorphic. Let's see what that looks like:

H = setup()

x1 = b"Hello"
x2 = b"World"
C1, t1, s1 = commit(x1, H)
C2, t2, s2 = commit(x2, H)

s_combined = (s1 + s2) % N
t_combined = (t1 + t2) % N
C_combined = s_combined * G + t_combined * H

print("First secret value:", open(C1, t1, x1, H))
print("First secret value:", open(C2, t2, x2, H))
print("Additive homomorphism:", C_combined == C1 + C2)
Enter fullscreen mode Exit fullscreen mode

Great! Now we have the basic primitives (digital signatures, commitments) to implement a very basic non-interactive zero-knowledge proof (NIZK) scheme.

3 - Merkle Tree (Accumulator)

An illustration of a merkle tree

Fig 1: Source: https://en.wikipedia.org/wiki/Merkle_tree

A Merkle tree is a simple data structure where each leaf node represents the hash of a data value, and each node above it is a hash of two lower leaves. The final node at the top of the tree is called the "root", and is generated by recursively hashing all the lower nodes together in pairs. This root, which is essentially a constant-size hexadecimal , serves as a unique commitment to the entire dataset.

Let's start by implementing the hash and tree generator functions -

import hashlib

def hash_data(data):
    return hashlib.sha256(data.encode()).hexdigest()

def merkle_tree(raw_data_list):
    if not raw_data_list:
        raise ValueError("Data list cannot be empty.")

    # Hash all the elements before tree construction
    level = [hash_data(data) for data in raw_data_list]
    tree_levels = [level]

    while len(level) > 1:
        new_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                combined_hash = hash_data(level[i] + level[i + 1])
            else:
                combined_hash = hash_data(level[i] + level[i])  # Duplicate if odd
            new_level.append(combined_hash)

        tree_levels.append(new_level) # Store this layer
        level = new_level  # Move to the next level

    return tree_levels, level 
Enter fullscreen mode Exit fullscreen mode

Note that we are not only generating a root, but also creating a nested array to store information on every consecutive layer of the tree. You can see what this looks like for yourself by printing out the following -

# Example usage
data = ["D1", "D2", "D3", "D4"]
tree, root = merkle_tree(data)

print("Merkle Tree:", tree)
print("Merkle Root:", root)
Enter fullscreen mode Exit fullscreen mode

As we had covered in a previous article, data accumulators like Merkle trees are particularly useful because they allow user to efficiently verify that a certain item exists in a data structure, without knowing the entire structure. This is done using something called a Merkle Proof. Let's implement one.

def get_merkle_proof(data_list, target_data):
    """Generates a Merkle proof for a given data item."""
    tree_levels, merkle_root = merkle_tree(data_list)
    hashed_data = hash_data(target_data)

    if hashed_data not in tree_levels[0]:  # Check if data is in tree
        raise ValueError("Target data not found in Merkle tree.")

    index = tree_levels[0].index(hashed_data)  # Find the index of the leaf
    proof_path = []
    proof_order = []  # Store whether the target is on the left or right

    for level in tree_levels[:-1]:  # Exclude root level
        sibling_index = index + 1 if index % 2 == 0 else index - 1
        if sibling_index < len(level):  # Check if sibling exists
            proof_path.append(level[sibling_index])
            proof_order.append(index % 2 == 0)  # True if left, False if right

        index //= 2  # Move up the tree

    return proof_path, proof_order, hashed_data, merkle_root[0]  # Return proof path
Enter fullscreen mode Exit fullscreen mode

This function generates the list of hashes a verifier needs to know in order to verify any one item, target_data in this case. Now this output can be fed to the verifier as follows -

def verify_merkle_proof(leaf_hash, proof_path, proof_order, merkle_root):
    """Verifies a Merkle proof by reconstructing the Merkle root."""
    computed_hash = leaf_hash

    for sibling, is_left in zip(proof_path, proof_order):
        if is_left:
            computed_hash = hash_data(computed_hash + sibling)  # Target was on the left
        else:
            computed_hash = hash_data(sibling + computed_hash)  # Target was on the right

    return computed_hash == merkle_root
Enter fullscreen mode Exit fullscreen mode

All we are doing here is simply retracing the hash path from the leaf to the root of the tree, using only those elements (proof path) that we need to do so. Knowing whether a certain element is on the 'left' or 'right' of the binary tree is crucial in order to correctly regenerate the tree from any given leaf.

target = "D3"

# Generate Merkle proof
proof_path, proof_order, leaf_hash, merkle_root = get_merkle_proof(data, target)

# Verify the proof
is_valid = verify_merkle_proof(leaf_hash, proof_path, proof_order, merkle_root)

# Print results
print("\n--- Merkle Proof Verification ---")
print("Target Leaf Hash:", leaf_hash)
print("Merkle Proof Path:", proof_path)
print("Proof Order:", proof_order)
print("Merkle Root:", merkle_root)
print("Merkle Proof Valid:", is_valid)
Enter fullscreen mode Exit fullscreen mode

Hopefully, when you run this entire code block, you should see that the value "D3" indeed exists in this data structure and can be proven simply by knowing 3, not all 7 elements of the original Merkle tree!

Conclusion

By now, you should be familiar with foundational cryptographic primitives—Elliptic curves, Pedersen commitments, and Merkle trees. Together, these concepts are essential for securing blockchain networks and enabling privacy-preserving applications.

In the next article, we will tie these concepts back to the real-world design space for privacy preserving blockchain applications.

Top comments (0)