跳到主要内容

4 篇博文 含有标签「project」

查看所有标签

PSE Core Program 2024 Capstone Project

· 阅读需 7 分钟
Jimmy Chu
Site Author

PSE Core Program

In the past two months, I got into PSE Core Program and have intensively studied zero-knowledge cryptography. In the first five weeks, we have a curated list of materials to study, ranging from cryptographic primitives such as Merkle tree, symmetric and asymmetric encryption to going more in-depth about various zk-SNARK protocols such as Groth16 and PLONK, and finally being introduced to the frontier of cryptography, or so they are called, including multi-party computation (MPC) and fully homomorphic encryption (FHE).

In the last three weeks of the program, we are open to scoping and working on our project to put the previous learning into practice. We can either tackle issues from various PSE projects or develop our idea. I decided to build a dApp end-to-end, meaning:

  1. Having a zk-SNARK component generating proofs on the client side (front end),
  2. Having a smart contract component that verifies the proof on-chain and
  3. Building the front end that interacts with users and connects the above two components.

This way, I will be able to see how they connect and how each component interfaces with the others. So let this be the goal!

Capstone Project: Number Guessing Game 🎮

I decided to build a Number Guessing Game. In this game, the goal of each player is to guess the closest to the mean of submissions from all participating players. Each game is divided into multiple rounds. Players first commit to a guess between 1 and 100. The guess value and a randomly generated value, aka nullifier, are processed through a zk-SNARK circuit on the client side, and the commitment [Poseidon(guess, nullifier), Poseidon(nullifier)] and proof are then submitted on-chain.

After all players have submitted their commitments and proofs, they open and reveal their commitments. Then, the game host will conclude the round. The mean of all guesses will be computed (on-chain), and the player who has submitted the guess closest to the mean will win the round. The player who wins three rounds first wins the game.

The following is the game flow:

Game Flowchart

  • Alice (the first player) initiates a game and becomes the game host.
  • In this phase, Other players can join the game. In this case, Bob, Charlie, and Dave join.
  • Alice starts the game, and the first game round begins. No more players will be able to join.
  • Players submit their commitment on-chain.
  • Once all players finish submitting their commitments. They open the commitment. After all players open their commitments, the game host concludes the round. The mean of all submissions is computed, and the round winner is determined.
  • If the winner has won three rounds, the game ends. Otherwise, another round begins.

The project is finally built, and you can try it out at:

You can see the demo of a game walkthrough below:

Some Technicalities ⚙️

  1. The zero-knowledge part kicks in when a player submits a guess on the chain. A 128-bit random value, a.k.a. nullifier, is generated. The guess and nullifier are passed into a wasm zk-circuit with a return value of [Poseidon(guess, nullifier), Poseidon(nullifier)]. The circuit performs a range check that the guess is between 1 and 100.

    We need a nullifier here to expand the preimage domain. We know the guess is a value between 1 and 100. Suppose we store Poseidon(guess) as a commitment. In that case, an attacker/cheater can Poseidon hash all values between 1 to 100 and compare the digest with the one on-chain and be able to figure out what other players' preimages (their guess value) are. So we add a 128-bit nullifier to expand the preimage domain.

  2. This is how the player commitment is stored on-chain.

    Inspecting on-chain storage on commitment

  3. On the front end, upon a round conclusion, only the winning guess is revealed but not the actual mean. This gives the game a "fog of truth" element and will make the game a bit more interesting. But the player openings are actually stored as plain numbers on-chain. So, tech-savvy players can still inspect the smart contract storage to compute the mean. I would love to explore further ZK protocols (multi-party computation, maybe?) that could be used even to hide these openings and be able to compute the mean and the distances between users' guesses and the mean. This will then truly make this game zero-knowledge!

  4. UX matters a lot. The following is my development time allocation for this project.

    • 15% on circuit development, thanks to circomkit library that eases my development workflow.
    • 30% on smart contract development and proof parameter interfacing. The time spent on writing the smart contract logic and comprehensive test cases is about 50-50.
    • 45% is spent on front-end development. The front end involves interacting with various components, e.g., user wallets, loading wasm and generating proofs, transforming proofs to a format appropriate for on-chain submission, etc.

    Many details on the front end must be done right to deliver a great UX. Right now, there is still a lot of room for improvement: e.g., the wallet display on the navbar keeps reloading when navigating within the site, and the page has to be manually refreshed to get the latest game status instead of subscribing to the smart contract events. I truly appreciate front-end engineers who build a great user experience on a Web3 app or web app.

Standing on the Shoulders of Many Giants

Standing on the Shoulder of Giant

In the past eight weeks I have been truly standing on the shoulders of, not one, but many giants enabling me to deliver this capstone project.

I am grateful PSE Team has selected me for their Core Program and has an excellent curation of study materials. One can find a lot of materials on zero-knowledge cryptography on the internet and be easily overwhelmed by them. The curated study materials, with mentors and tutors always available to answer your questions and teammates holding regular weekly study groups/presentations, definitely helped push me through the study barrier.

Kudo to the Semaphore project team too. Studying the source code of the Semaphore Protocol and its Boilerplate has significantly contributed to this project. In fact, my project is a fork of the Boilerplate template, which is a well-architected demonstration of how a React (Next.js) front-end interfaces with the zk circuit and smart contract. I have also learned a lot of coding and optimization tricks from them.

The project has been made feasible with countless libraries that I used. For example, circom, Circomkit, snark.js, Hardhat, Next.js, Wagmi and Viem, Web3Modal etc, just to name a few. Of course, I also deployed it to the Optimism ecosystem.

Finally, I stand in awe when looking at all the researches that have gone through on pushing the boundary of zero-knowledge protocols and the field of programmable cryptography.

This is just the beginning of my BUIDLing journey, and hopefully I will share more projects with you soon.

Semaphore - part 1

· 阅读需 3 分钟
Jimmy Chu
Site Author

I have been studying about zero-knowledge proof in the past eight weeks. It is hard to grasp how a software project could integrate zero-knowledge proof. Then, through some community chat groups, I came across Privacy & Scaling Explorations group and one of their projects, Semaphore. Semaphore is a very educational project. In this project, I see how ZKP is being applied (in a fashion I understand) and observe excellent software engineering inside.

Semaphore allows users belonging to a certain group to vote without revealing which member they are, aka anonymous voting. There is also a mechanism to prevent users from double-voting, which is done by posting some "residues" generated by the zero-knowledge proof protocol on-chain.

But what is great about the project is that it is very beginner-friendly, and I say this with great respect. The whole project is architected very nicely. It can even be viewed as a model project for how software and packages are structured in modern-day dApps. I have learned a lot from their software structure and encapsulation.

  • The core software project: Semaphore protocol, with a nice structure of:

    • contracts package - the smart contract that verify the proof generated offchain.

    • proof package - the off-chain logics that generate the ZKP with wasm code generated by an arithmetic circuit.

    • circuits package - contains the arithmetic circuit written in circom that generates and verifies proof.

    • hardhat package - a package that allows developers to deploy the core semaphore contract via hardhat without downloading the semaphore project.

  • A template to see how the protocol can be incorporated into other projects. It consists of:

  • The project leverages the Merkle tree structure to verify that a member belongs to a group, with all the tree leaves being the users' identity commitment. The user will generate a Merkle proof to show that he belongs to the group. The core team implements the Merkle tree structure separately as a Lean Incremental Merkle Tree (with a summary note).

  • The project has excellent documentation and website as well, including as two different packages inside the source code.

Studying the project taught me a lot of optimization tricks I can apply in the zk-battleship codebase. I also saw how they used to save on-chain gas costs, particularly using emitted events to store voting results (the user feedback messages) instead of on-chain storage to reduce gas costs further. This is something that never crossed my mind before.

There are two feature enhancements I can help implement on the current Semaphore basis to further my engineering understanding in zk-app development.

  1. Users can later prove they have voted without revealing their voting option, which can become proof of participation for them.

  2. The voting result is hidden during the voting phase until it ends. In this way, the choices made by late voters won't be affected by the given voting result so far.

Cangjie & Chu Bong Foo - part 1

· 阅读需 4 分钟
Jimmy Chu
Site Author

On one occasion in Taiwan, people around me were amazed that we Hong Kong people type Chinese using the Cangjie (倉頡) method. I realize it is uncommon for people to learn how to deconstruct Chinese characters and type them with the Cangjie encoding. Further thinking about it, it is a brilliant idea that Chinese characters can be deconstructed this way. It makes me wonder how the inventor came up with this input methodology. So I dug deeper into its history and found an inspiring story.

The architect Chu Bong Foo (朱邦復), during the 1960s, was working in a publishing house in Brazil. He saw that the publishing house could finish an English book publication in 3 to 5 days. From his youth experience, he knew it would take about two to three months to get a Chinese book published, converting from the author's manuscript to a publishing form. This is because, at that time, there was no systematic way to index Chinese characters. English words are composed of 26 alphabets, and how to index them is well-established. However, there is no systematic way of indexing Chinese characters, and there are over 14k+ most frequently used characters and 40k+ frequently used characters.

So Chu Bong Foo first studied the most frequently used characters, analyzing their shape and deconstructing them into 24 basic shapes (radicals) and 76 auxiliary shapes, often rotated or transposed versions of the basic shapes. Then, he used the following character roots to represent them.

Philosophical Group

  • A: 日 - Sun
  • B: 月 - Moon
  • C: 金 - Gold
  • D: 木 - Wood
  • E: 水 - Water
  • F: 火 - Fire
  • G: 土 - Earth

Stroke Group

  • H: 竹 - apostrophe / slant
  • I: 戈 - dot
  • J: 十 - cruciform
  • K: 大 - cross
  • L: 中 - vertical
  • M: 一 - horizontal
  • N: 弓 - hook

Body Parts Group

  • O: 人 - person
  • P: 心 - heart
  • Q: 手 - hand
  • R: 口 - mouth

Character Shape Group

  • S: 尸 - corpse
  • T: 廿 - twin
  • U: 山 - mountain
  • V: 女 - female
  • W: 田 - field
  • Y: 卜 - fortune telling

Collision / Difficult Group

  • X: 難 - difficult
  • Z: 重 - unknown/collision

Each Chinese character can be represented by a sequence of one to five character roots above, and the duplication rate (given a valid sequence returning more than one character) is less than 10%. Cangjie encoding is the most efficient way of inputting Chinese characters.

With such a system, publication houses can now input Chinese characters much like an English word, and Chinese characters can be sorted in a particular order and indexed efficiently. This has increased the Chinese publication process.

Later in Mr. Chu Bong Foo's life, he was also involved in the following projects:

  • Chinese integration in modern computer systems, back in the 1980s. When the computer itself is not too popularized yet.

  • Launched the first Chinese e-book system in 2001, when the first version of Kindle Reader was launched six years later in 2007.

  • Launched an AI system that takes a Chinese poem text as input and generates a 3D animation, all without human intervention. The process involves natural language processing, Chinese character interpretation, and generative AI, which we talk about today, all back in 2011! For details, refer to his announcement post at that time and the animation 記承天寺 (original download link, or the Youtube video).

Mr. Chu Bong Foo is truly a visionary.

References

Advent of Code 2021

· 阅读需 6 分钟
Jimmy Chu
Site Author

In the past three months I have been working on solving Advent of Code 2021 puzzles. I probably have spent 80+ hours in solving all 25 puzzles. Yes, that is pretty much a whole two weeks of full-time work cumulatively, and that is with the help of this subreddit forum and a lot of online resources 😳.

My solutions are here, with a corresponding website.

For those who don't know, Advent of Code are a series of programming puzzles released every year before Christmas, from Dec 1st till the Christmas day 25th to be exact, one problem a day. The author Eric Wastl has been doing this since 2015, and the programming puzzles have since used in company training, university coursework, and practice problems.

Coming out from the rabbit hole of solving these puzzles, there are three problems that stand out from the rest, in my opinion. So I want to document them down and reflect on what I have learned from them.

Spoiler Alert

For those who want to solve the puzzles yourselves, please stop reading here 😄

Day 19: Beacon Scanner

The original problem. In solving this puzzle, the first issue is to clarify what the 24 orientations are. The way I eventually visualize it is with a Blu Tack and three piece of papers.

So we can have X, Y, Z axes like that, and we can rotate them to have three different axes arrangement:

image

Now we can have axes pointing at different directions as follows:

image

With eight different set of axes directions and each direction with three axes arrangement, we get a total of 24 orientations.

I then learn that we can rotate a coordinate XYZ based on these orientations by matrix transformation(multiplication). The 24 matrics are listed here at the end.

Now with all beacon coordinates relative to a scanner location, and the scanner orientation be one of the 24 orientations, how are we going to solve the problem? At this point I am out of my wit and look for clues in the subreddit forum. The key is by comparing the distance of beacons between different scanner report.

Yes, Of course! That is the only invariant we have here! And I recall what Bezos have famously said "What is not going to change?".

Indeed, if a scanner have scanned the same set of beacons as another scanner, the distance between any pair within that beacon set are not going to change, no matter where the scanner is (its x, y, z offset), and what orientation the scanner faces (the matrix transformation).

With this insight we proceed to the next problem. How could we determine if two scanners have scanned the same set of twelve beacons, as specified by the problem?

Here we need to get to a concept in graph theory - complete graph. It is a set of nodes with each node have edges connect to every other nodes. So if two scanners have scanned the same set of 12 beacons, we should be able to build a complete graph from these 12 beacons (nodes) and get all of its edge length matches. Now how many edge lengths should we be counting here? It is 12 chooses 2, so 66. So if we build an edge table (with all edge length between any two beacons) for the two scanner reports, and find there are 66 entries in the two tables matches, we can claim there is a very high likelihood that the two scanners have scanned the same set of twelve beacons.

Once we identify the set of 12 beacons, we can determine the orientation of the second scanner. Here we always use the first scanner as the baseline, meaning it is located at (0, 0, 0) with the [[1, 0, 0], [0, 1, 0], [0, 0, 1]] orientation matrix.) We apply all 24 orientations to four beacon coordinate and see which one will result to having the same vector displacement of the first scanner, then we can determine the location and orientation of the second scanner.

Using this methodology we can map out all beacon and scanner locations and the orientation of scanners iteratively.

Day 22: Reactor Reboot

The original problem. I learned two concepts in solving this puzzle. The first one is how to compute the area covered by multiple sets, without double counting, when they overlap with each others.

For instance, the following:

image

We know the total of area are:

Area A + Area B - Area A^B

Now with the following:

image

Area A + Area B + Area C - Area A^B - Area A^C - Area B^C + Area A^B^C

So with N set, following the pattern, it is:

Area A + Area B + .... Area N - Area A^B - Area A^C - .. Area A^N - Area B^C - ... + Area A^B^C + Area A^B^D + .... - ...

I then learned that this is the Inclusion-exclusion principle.

I stumbled upon another sub-problem when solving this puzzle, that is, given a set of items, efficiently generate all its combination and compute its n choose k combinations. I thought it will be fun to code that, so using the algorithm from Dr. James McCaffrey, I coded the algorithm and also the front end Combination App (source code) that you can play with.

Day 23: Amphipod

The original problem. This is a fun puzzle. I learn from sub-reddit that this is basically a shortest path problem and can be solved with Dijkstra's algorithm. So it is a refresher on coding that algorithm and encoding various game states into keys of nodes, with edge extended from the node as all its next possible moves, with the edge length as the move cost.

But in implementing Dijkstra's algorithm, with a large amount of possible states and edges, how do you efficiently get the next shortest path to try? This is when I learn the data structure about Indexed Priority Queue (IPQ) from this Youtube video.

The overall structure is shown as follow:

image

You use an index to represent its nodes' keys and corresponding edge costs. Then during the heap operations (swimming up or down), we just move the index around. So here is the implementation of indexed priority queue with test cases.


That's it on Advent of Code 2021. Overall it is a fun and intellectually challenging experience, and I am glad to finish all 2021 problems 😉.