This is a series of articles about Cairo usually mentioned in the context of web3 and Starknet smart contracts but here we would like to expose it as a language for creating provable and verifiable programs which allows you to outsource the computation in a trustless manner1.
It is a work-in-progress. Updates are expected. Experienced Caironautes are welcome to point out where I am wrong.
Why Cairo when Rust is already so awesome
Let's say, you have a task of finding arbitrage opportunities between different currency markets2. You can model this problem via finding the negative cycle in the graph where vertices are currencies and edges are pair markets. The graph is directed and weighted where weights are logarithms of exchange rates. Bellman-Ford shortest path algorithm is usually applied in this case (we will focus on verifiable computation here and on the implementation in the next article so don't worry about the algorithm now).
The time complexity of the algorithm is O(|V||E|)
(where V
is a set of vertices and E
is a set of edges) so you're going to outsource this computation for large graphs to a cloud provider but due to the money-related nature of the problem you want to make it reliably - you do want to verify the result without re-running the algorithm (i.e. be sure that the provider actually ran the binary with the algorithm). In other words, you want to ensure a computational integrity via a succinct verification of the output3.
Hm... How would we make it?
The conventional thinking would suggest us to use hashing of binaries (actually hashes are the right direction) but wouldn't guarantee the unchanged runtime (memory can be changed), inputs can be changed. So traditional approaches focus on how to protect the binary and input from being tampered in any way. But there is always some room for cheating.
To our relief brainy cryptographers have come up with various proof schemes. In our example, the cloud provider is a prover who runs the expensive computation and delivers the output and the proof4 to the verifier (us). One of the latest state-of-the-art proof schemes is STARK which in layman's (my) terms basically records the computation execution trace which is then randomly sampled by verifier and checked against predefined constraints5.
It is possible to integrate this execution tracing into an existing Rust program using some of general STARK libraries (e.g. Winterfell) but it would require knowing the low-level things very well (see the example for very simple computation) and be sure that all computations are traced (and all constraints are enforced).
Would it be nice to abstract away the low-level complexities and work with a familiar Rust syntax? Welcome to Cairo!
Cairo
Cairo is a language for creating programs which are allow to verify computations without re-running the program. Cairo is heavily inspired by Rust6 (move semantics and ownership, traits, core types, tooling) but due to the nature of verifiable computation the underlying machinery (memory model and a non-deterministic execution) is completely different. Our knowledge of Rust should help to quickly get used to the syntax and control flow but we should be careful when thinking over the computation.
Mind map
Rust | Cairo | Purpose |
---|---|---|
rustc | cairo | compiler |
cargo | scarb | multitool (dependencies, build, test) |
Cargo.toml | Scarb.toml | project configuration |
rustup | asdf | toolchain management |
crates.io | scarbs.xyz | package registry |
Rust Playground | cairovm.codes | online compilation |
Setup
I expect that you have Rust already installed. Let's also install scarb (asdf
is a recommended way) and make a project7:
$ scarb new bellman_ford_cairo
If we open the generated repository (there is the official VS code extension8 to highlight the syntax), we will see the familiar structure.
Input and output
Let think on the input of data. In the context of a pricing data and different currency pairs we can expect a sparse graph. So a lean representation would be to use a vector of tuples Vec<(String, String, f32)>
where strings denote currency names (like USD or EUR) and f32
is an exchange rate. All sounds good except... except some minor things:
- There are no utf8 String's in Cairo - alternatives are short strings (up to 31 ascii bytes) and long strings (byte arrays of ascii bytes).
- There is no builtin dynamically growing modifiable array like
Vec
in Rust (but with Cairo 2.7.0 it is expected). There is an appendable-only (and poppable from the front)Array
which can serve as a FIFO-queue andFelt252Dict<T>
which can serve as a basis to implement library-definedVec
. - There are no floats in Cairo. So either we can revert to library-defined fixed-point numbers or multiply input floats in advance by some integer constant and drop the fractional part.
And we also may stuck how to provide the input as there is no such thing as std wrappers to libc to read stdin or a file. It is like Rust in #[no_std]
. Let's investigate which options we have:
$ scarb cairo-run --help
There is a section Arguments
which says that a program argument is "a JSON array of numbers, decimal bigints or recursive arrays of those" and the main function should have a corresponding signature (there is an example [1, 2, [3, 4, 5]]` to `fn main(t: (u64, u64), v: Array<u64>)
). So let's remove an example code from lib.cairo
and try the following main
function:
fn main(data: Array<(u16, u16, felt252)>) {
println!("number of edges: {}", data.len());
}
Here we encode currency names with u16
(they represent graph edges as a pair of vertices) and felt252
represents an exchange rate (a weight of the edge). Here we should talk a little bit more about the CPU architecture which is expected by Cairo program. The machine word is 252 bits wide and is represented by felt252
type (Field Elemen*t*, because of the underlying math). So every type is Serialization of Cairo types eventually represented via felt252
. So basically the maximum number we can get is 252 bits and if we are going to encode floats by multiplying them with some integer constant, felt252
seems to be a right choice.
Let's compile and run the program (think cargo run
):
$ scarb cairo-run "[[1, 2, 300]]"
Output:
number of edges: 1
Run completed successfully, returning []
It works! Ok, what about an output? returning []
hints that we can try the same way as an input. We expect that the algorithm returns a path corresponding to the arbitrage opportunity. Let's try:
fn main(data: Array<(u16, u16, felt252)>) -> Array<u16> {
println!("number of edges: {}", data.len());
array![1, 2, 3, 4] // path 1 -> 2 -> 3 -> 4
}
The output:
number of edges: 1
Run completed successfully, returning [2837, 2841]
What??? By the way, you may get another numbers as I did in other runs as the execution is non-deterministic. Let's first inspect array!
macro. We see that it is a code-generation plugin to the compiler which basically instantiate an array ArrayTrait::new()
and appends every element to it. Why ArrayTrait
and not just Array
? Digging further leads us to Array
implementation.
We cannot find a familiar trait ArrayTrait<T>
but there is again a macro:
#[generate_trait]
pub impl ArrayImpl<T> of ArrayTrait<T> {
Searching over the codebase of Cairo - and here it is. What does it generate? In Cairo defining methods for a type requires a trait as methods are defined only for a trait. So #[generate_trait]
reduces a boilerplate generating a corresponding trait definition out of the implementation for a type. You can always define a corresponding trait manually (e.g. check the library-implemented Queue). Thus the associated (i.e. doesn't require an instance) method new
is called on ArrayTrait
.
Let's come back to the output. We do specify the output as Array<u16>
. And the output of 2 numbers instead of 4 suggests that we deal with some compact serialization scheme.
Let's test this hypothesis with a bigger int and larger array:
fn main(data: Array<(u16, u16, felt252)>) -> Array<u128> {
println!("number of edges: {}", data.len());
// u128::MAX but there is not MAX const
// for unsigned integers
let elem = 340282366920938463463374607431768211455;
array![elem, elem, elem, elem, elem, elem]
}
It again returns two numbers:
number of edges: 1
Run completed successfully, returning [2845, 2851]
Damn! Notice that 2851 - 2845 = 6
while earlier it was 2841 - 2837 = 4
so it looks like slice bounds as it corresponds to the number of integers we hope to output. Let's find Run completed successfully, returning
in the compiler source code. It seems that scarb
runs this binary. And the output is Vec<Felt252>
.
/// The ran function return value.
#[derive(Debug, Eq, PartialEq, Clone)]
pub enum RunResultValue {
/// Run ended successfully, returning the memory of the non-implicit returns.
Success(Vec<Felt252>),
/// Run panicked, returning the carried error data.
Panic(Vec<Felt252>),
}
"returning the memory of the non-implicit returns"... Here the hint - it's a memory. Cairo's memory model is a contiguous immutable array. Perhaps, these two felt252
numbers denote the start and end indices in the memory? scarb cairo-run --help
suggests an argument --print-full-memory
. Let's try that for the original program:
fn main(data: Array<(u16, u16, felt252)>) -> Array<u16> {
println!("number of edges: {}", data.len());
array![1, 2, 3, 4]
}
and run
$ scarb cairo-run "[[1, 2, 300]]" --print-full-memory
The output
number of edges: 1
Run completed successfully, returning [2837, 2841]
Full memory: [_, 290341444919459839, 1,
...
6744073709483335, 4130, 594, 4166, 500, 1, 2, 300, 49, 1997209042069643135709344952807065910992472029923670688473712229447419591075,
0, 2463311330861459210825190905860592116187541770, 19,
1, 2, 3, 4, ]
Hm... Let's try to use 2837
as an index to this "contiguous" array of memory. Just replace "_" with 0 (it is just to feed it into python) and let's feed the memory array into a python interpreter:
>>> memory = [0, 290341444919459839, 1,
...
6744073709483335, 4130, 594, 4166, 500, 1, 2, 300, 49, 1997209042069643135709344952807065910992472029923670688473712229447419591075,
0, 2463311330861459210825190905860592116187541770, 19,
1, 2, 3, 4]
>>> memory[2837]
1
>>> memory[2838]
2
>>> memory[2839]
3
>>> memory[2840]
4
>>> memory[2841]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Nice. At least now we understand what these two numbers in the output mean - they denote output start and end indices in a memory array. Ok, but let's remember our initial goal - we need to outsource the computation and then be able to verify it. Perhaps, for that thing there is a more convenient way to get the output?9
Provers
There are several provers for STARKs and some convenient wrappers for Cairo programs. I'm going to use Stone as it is the one used by Starknet. Let's try to prove and verify our program. The readme provides an example.
The prover requires a memory dump, a trace file and an AIR10 input after the execution but cairo-run
cli doesn't expose these arguments.
We haven't yet checked how our compiled "binary" looks like. Let's inspect target
directory. It contains bellman_ford_cairo.sierra.json
. It is our program as a Sierra intermediate representation, think as of LLVM IR. I wouldn't go into any details, just recommend the articles by Mathieu Saugier
We cannot run a json file. It should be compiled to a target:
- cairo assembly (casm) interpreted by a cairo virtual machine
- wasm executed by wasm vm
- native host assembly - see cairo native
If we check what cairo-run
uses, it is a Rust vm. Let's clone the vm repository and run
$ cargo install --bin cairo1-run --path cairo1-run
then we get a cairo vm interpreter and let's run this interpreter in our repository:
$ cairo1-run src/lib.cairo \
--layout=small \
--air_public_input=public_input.json \
--air_private_input=private_input.json \
--trace_file=trace.bin \
--memory_file=memory.bin \
--proof_mode
We need to specify:
- the layout as we need an output
- the public input (our graph) as the provider should prove to verifier that he/she knows an optimal route to the specific graph and not that he/she knows an optimal route to some undisclosed graph. In other words, our input is public.
If we run the command above... we get an error:
thread 'main' panicked at cairo1-run/src/main.rs:181:18:
called `Result::unwrap()` on an `Err` value: Failed to find development corelib.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
So 1) there is some form of linking 2) if there is a development corelib, then there is a release one. There is an instruction in README for scarb projects - we can provide a Sierra program to the interpreter and add
[cairo]
enable-gas = false
to Scarb.toml
11.
So let's try
$ scarb --release build
$ cairo1-run target/release/bellman_ford_cairo.sierra.json \
--layout=small \
--air_public_input=public_input.json \
--air_private_input=private_input.json \
--trace_file=trace.bin \
--memory_file=memory.bin \
--proof_mode
And we get... Error: Runner(MissingMain)
. >_<. Let's try the dev mode again:
$ scarb build
$ cairo1-run target/dev/bellman_ford_cairo.sierra.json \
--layout=small \
--air_public_input=public_input.json \
--air_private_input=private_input.json \
--trace_file=trace.bin \
--memory_file=memory.bin \
--proof_mode
The error Error: IlegalInputValue
. Ah, we should provide an input:
$ cairo1-run target/dev/bellman_ford_cairo.sierra.json \
--layout=small \
--air_public_input=public_input.json \
--air_private_input=private_input.json \
--trace_file=trace.bin \
--memory_file=memory.bin \
--proof_mode \
--args "[[1, 2, 300]]"
Again in the readme "only allowed Array<felt252>
as return and input value in the proof mode". Ok, let's change the code:
fn main(data: Array<felt252>) -> Array<felt252> {
println!("number of edges: {}", data.len());
array![1, 2, 3, 4]
}
$ scarb build
$ cairo1-run target/dev/bellman_ford_cairo.sierra.json \
--layout=small \
--air_public_input=public_input.json \
--air_private_input=private_input.json \
--trace_file=trace.bin \
--memory_file=memory.bin \
--proof_mode \
--args "[1 2 300]"
And it works!!! Hooray!! There two other useful arguments: --print_output
and --args_file
that allow a readable output and a large input graph.
$ cat graph.txt
[1 2 300]
$ cairo1-run target/dev/bellman_ford_cairo.sierra.json \
--layout=small \
--air_public_input=public_input.json \
--air_private_input=private_input.json \
--trace_file=trace.bin \
--memory_file=memory.bin \
--proof_mode \
--args_file graph.txt \
--print_output
[DEBUG] (raw: 1997209042069643135709344952807065910992472029923670688473712229447419591075)
[DEBUG] (raw: 0)
[DEBUG] number of edges: 3
(raw: 2463311330861459210825190905860592116187542282)
[DEBUG] (raw: 19 )
Program Output : [1 2 3 4]
Yeeeeee! Before finishing this section let's fix the release mode. As the dev profile works, it seems that there is a difference in profiles. It seems that sierra-replace-ids affects this.
In Scarb.toml
we have:
[cairo]
enable-gas = false
sierra-replace-ids = true
and
$ scarb --release build
$ cairo1-run target/release/bellman_ford_cairo.sierra.json \
--layout=small \
--air_public_input=public_input.json \
--air_private_input=private_input.json \
--trace_file=trace.bin \
--memory_file=memory.bin \
--proof_mode \
--args_file graph.txt \
--print_output
[DEBUG] (raw: 1997209042069643135709344952807065910992472029923670688473712229447419591075)
[DEBUG] (raw: 0)
[DEBUG] number of edges: 3
(raw: 2463311330861459210825190905860592116187542282)
[DEBUG] (raw: 19 )
Program Output : [1 2 3 4]
The output is the same. Note [DEBUG]
output is related to cairo1-run
. In the repository we should now have files:
memory.bin
private_input.json
public_input.json
trace.bin
It seems that we have all the artifacts to generate a proof.
Proof
I've prepared a ready-to-use (and arm64 compatible!) stone prover with Docker (thus you outsourced me the prover's compilation and I outsourced Docker image building to Github in a trusted way :)).
Then in our repository run:
$ docker run --rm -it -v $(pwd):/tmp maksimryndin/starknet-stone:latest prover \
-logtostderr \
--out_file=/tmp/proof.json \
--private_input_file=/tmp/private_input.json \
--public_input_file=/tmp/public_input.json \
--prover_config_file=/tmp/cpu_air_prover_config.json \
--parameter_file=/tmp/cpu_air_params.json
The files cpu_air_prover_config.json
and cpu_air_params.json
I took from Stone e2e test.
The output:
what(): src/starkware/stark/stark.cc:187: Fri parameters do not match stark degree bound. Expected FRI degree from FriParameters: 8192. STARK: 131072
Stack trace (most recent call last):
<...>
Do you have any idea what it means? Me - no :)). Let's try to explore the source code of the prover. The number 131072
denotes the actual degree - trace_length
attribute of Air
class which seems to be instantiated from the statement. The statement is a combination of public and private inputs, air parameters file. 8192
denotes the expected which is calculated from cpu_air_params.json
with the function GetFriExpectedDegreeBound
. The sample cpu_air_params.json
is:
{
"field": "PrimeField0",
"stark": {
"fri": {
"fri_step_list": [
0,
4,
3
],
"last_layer_degree_bound": 64,
"n_queries": 18,
"proof_of_work_bits": 24
},
"log_n_cosets": 4
},
"use_extension_field": false
}
So GetFriExpectedDegreeBound
calculates it like 8192 = last_layer_degree_bound * 2.pow(0) * 2.pow(4) * 2.pow(3)
or in logarithm base 2 terms: log(trace_length) = log(last_layer_degree_bound) + ∑fri_step_list
(this is almost the note). The error raises on testing an equality between an actual and an expected degrees. To tune the air configuration in a meaningful way we have to learn some theory but for now let's just add an additional FRI step of 4 to hold the equation:
{
"field": "PrimeField0",
"stark": {
"fri": {
"fri_step_list": [
0,
4,
3,
4
],
"last_layer_degree_bound": 64,
"n_queries": 18,
"proof_of_work_bits": 24
},
"log_n_cosets": 4
},
"use_extension_field": false
}
If we run the prover again (make sure to update memory and trace paths in private_input.json
), it outputs:
I0730 10:39:04.829474 1 profiling.cc:58] Prover started
I0730 10:39:05.030071 1 memory_cell.inl:121] Filled 12256 vacant slots in memory: 171 holes and 12085 spares.
I0730 10:39:19.560891 1 stark.cc:425] Trace cells count:
Log number of rows: 17, first trace n_cols: 23, interaction trace n_cols: 2, Total trace cells: 3276800
I0730 10:40:03.643604 1 prover_main_helper_impl.cc:178] Byte count: 88424
Hash count: 1344
Commitment count: 6
Field element count: 1419
Data count: 1
I0730 10:40:03.658634 1 profiling.cc:85] Prover finished in 58.8287 sec
Note that total number of rows (2.pow(17)
) is 131072
. And we can verify the computation:
$ docker run --rm -it -v $(pwd):/tmp maksimryndin/starknet-stone:latest verifier \
-logtostderr \
--in_file=/tmp/proof.json
I0730 11:14:42.393836 1 cpu_air_verifier_main.cc:39] Proof verified successfully.
Conclusion
In this article we learnt about verifiable computations with Cairo, compiled a basic program, explored input/output for Cairo programs, and proved/verified the basic program. As a result of our trial-and-error path, the convenient wrapper tools were born which we will use further. All the source code is in the repository.
Now it's time to digest it and learn some cool things from the references 8 - 12 (the order is a recommended way).
In the next article(s) we will implement Bellman-Ford algorithm, try to break/attack the proof scheme, investigate the time complexity of the verification, an overhead over the Rust version.
References
- Cairo book (accessed in July 2024)
- Introducing Hints in Cairo programming language (accessed in July 2024)
- Cairo0 docs (accessed in July 2024)
- Stwo Prover: The next-gen of STARK scaling is here (accessed in July 2024)
- Under the hood of Cairo 1.0: Exploring Sierra (accessed in July 2024)
- Zero Knowledge Systems You Can Trust (accessed in July 2024)
- Unleashing the power of the Stone Prover (accessed in July 2024)
- STARK Math: The Journey Begins (accessed in July 2024)
- Arithmetization I (accessed in July 2024)
- Arithmetization II (accessed in July 2024)
- Low Degree Testing (accessed in July 2024)
- Anatomy of a STARK (accessed in July 2024)
-
The phrase "reliable outsourcing of computation" credits to Lurk language. Volunteers are invited to create an article "Lurk for List/Clojure devs". It would be interesting to compare both languages for verifiable computation. ↩
-
in these articles we aren't tailored to web3 at all but they are within my ongoing effort to seize arbitrage opportunities in crypto. For those interested there is the Telegram group uniting MEV (re)searchers. ↩
-
Another prominent use case in the context of web3: we can move some computation from onchain to offchain in a trustless manner (Starknet is all around this). ↩
-
A verifier is actually protected against a computationally bound prover so it is not a strict proof, rather an argument of knowledge but I will stick with the word "proof" to avoid over-complication. ↩
-
Besides a computational integrity, there can be another (optional) requirement of zero-knowledge (ZK) when the prover doesn't reveal some information about the input. The rabbit hole is deep that there is no single article - follow the suggested reading. By the way, many STARK-provers still (July 2024) do not implement perfect ZK, e.g. Winterfell, miniSTARK, Stark Platinum Prover. Also refer to the article. ↩
-
Actually, there is Cairo0 which was the first version of a language for STARK-verifiable computation. It is basically an assembly language for STARK CPU. You may still encounter it in some projects. In this article and in general, mentioning Cairo means talking about Cairo1, a Rust-like language. ↩
-
It seems that a separation between binary and library crates in Cairo is via inclusion/exclusion of
main
function inlib.cairo
. ↩ -
Make sure to choose Cairo 1.0 from Starkware Industries. ↩
-
There are also hints which were first introduced in Cairo0 and haven't yet implemented in Cairo1 in the official compiler. And they can possibly allow for more interactions with the outer world. ↩
-
See Stark math articles on Arithmetization. ↩
-
Gas metering should protect a prover from infinite loops as relations between a prover and a verifier are trustless. But for the moment I have no idea why "cairo1-run skips gas checks when running". ↩
Top comments (0)