Today I am going to take you on a journey through bitcoin’s build system, show you how to verify that your bitcoin download is the right one, and show you some improvements in the works. Here’s where we come back to Nick’s journey. You might ask, where do these compilers and other things in the toolchain come from? In other words, we might be reproducible but we might also be reproducibly malicious. Similarly, the way you make a toolchain is you give toolchain source code to a toolchain you already have and make more toolchains. Similarly, in “reflections on rusting trust”, someone was asking about the notoriously hard-to-bootstrap rust compiler. The existing malicious compiler had another trick up its sleeve. The compiler was somehow able to replicate and poison successive generations of itself. After his epiphany, he got an AT&T tech to show up with 3.5-inch floppies and loaded the proper compiler and linker source so they could recompile the compiler, thus making the yogurt from yogurt.
These .assert files are generated whenever we do a gitian build from source. We need to minimize our trusted set of binaries as much as possible, and have an easily auditable path from those toolchains to what we use to build bitcoin. So what can we do about the fact that our toolchain can have a bunch of trusted binaries that can be reproducibly malicious? We should know how these tools are built and exactly how we can go through the process of building them again, preferably from a much smaller set of trusted binaries. We all know that software was source code at some point. How do we know that whoever uploaded this binary didn’t modify it to steal your coins and upload your keys? Gitian ensures that given identical source code, we get identical binary outputs. So obviously there’s something wrong with the source code. He thought that would solve it, but he was wrong. You take some milk, and add it to some existing yogurt you already have to make more yogurt. We need to be more than reproducible.
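To make the verification idea concrete: if builds are reproducible, independent builders should all arrive at the same hash for the same release, and you can compare your download against what they attested. The following is only a minimal sketch of that comparison. It does not parse real gitian .assert files; the `attested` mapping, the builder names, and the `min_matches` threshold are all hypothetical stand-ins for illustration.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_attestations(binary: bytes,
                         attested: Dict[str, str],
                         min_matches: int = 2) -> bool:
    """Check a downloaded binary against hashes attested by builders.

    attested maps a (hypothetical) builder name to the SHA-256 hex
    digest that builder reported for the release. The binary is
    accepted only if every builder agrees with our digest and at
    least min_matches builders attested it.
    """
    digest = sha256_hex(binary)
    agreeing = [name for name, h in attested.items() if h == digest]
    return len(agreeing) >= min_matches and len(agreeing) == len(attested)


# Hypothetical example data: two builders who built the same bytes.
release = b"pretend this is the release binary"
attested = {
    "builder-a": sha256_hex(release),
    "builder-b": sha256_hex(release),
}
```

With this toy data, `matches_attestations(release, attested)` accepts the download, while a modified binary is rejected. Note that this only tells you the builders agree with each other; it says nothing about whether the toolchain they all used was honest.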
We need to be bootstrappable. Thank you to Chaincode Labs for funding this, and Cory Fields for tolerating my questions, ryanofsky for telling me about bootstrappable package managers, and the good folks on IRC in bootstrappable, and guix people, and my friend Tales from the Crypt for giving me this awesome microphone setup. Every package built on guix can be traced back to a small set of trusted binaries. The problem with gitian is that although Bitcoin Core binaries can be reproducibly built, the tools to build that binary are hard to audit and difficult to make reproducible, resulting in a possibly malicious bitcoin binary. Using guix to build our toolchain means we can audit how each tool in our toolchain was built and easily bootstrap them from a small set of trusted binaries. This is in stark contrast to our current gitian process, where we build bitcoin from trusted Debian binaries that we pull from an Ubuntu package repository.
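One way to picture "traced back to a small set of trusted binaries" is as a walk over a build dependency graph: follow each package's build inputs until you reach things that have no recipe and are simply taken on trust. The sketch below is a toy model only. The graph, the package names, and the `trusted_roots` helper are hypothetical and do not reflect how guix actually represents its packages.

```python
from typing import Dict, List, Set


def trusted_roots(graph: Dict[str, List[str]], target: str) -> Set[str]:
    """Return the prebuilt binaries that target ultimately depends on.

    graph maps a package name to the list of packages needed to build
    it. A package with no build inputs is treated as a prebuilt binary
    that has to be trusted as-is.
    """
    seen: Set[str] = set()
    roots: Set[str] = set()
    stack = [target]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        inputs = graph.get(pkg, [])
        if inputs:
            stack.extend(inputs)  # built from source: keep tracing
        else:
            roots.add(pkg)        # nothing to trace: trusted binary
    return roots


# Hypothetical toy graph, not real package metadata.
toy_graph = {
    "bitcoind": ["gcc", "glibc"],
    "glibc": ["gcc"],
    "gcc": ["bootstrap-gcc"],
    "bootstrap-gcc": [],
}
```

Here `trusted_roots(toy_graph, "bitcoind")` comes back as just `{"bootstrap-gcc"}`. The goal of bootstrappability is to make that set as small and as auditable as possible.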
Even if the build is reproducible, the toolchain that gitian downloads, trusts, and uses to build the bitcoin source code can still be malicious. He tried some other techniques, like recompiling the standard library, and even learning assembly language. We should be using functional package managers like guix, which is a package manager where bootstrappability and reproducibility are fundamental tenets. Nick’s story reveals why reproducibility is not enough. I am here to tell you that reproducibility is not enough. Completely clean source code is also not enough. Somehow the source code had the offending lines re-inserted. There’s also hex0, which can be the only trusted binary, in a few hundred bytes of source code. But with the current version of mes, in guix’s core-updates branch, we can eliminate gcc as a trusted binary, mitigating the “trusting trust” attack, and bringing our bootstrap collection down to 131 MB. Before reproducibility, we had to trust that the bitcoin binary was not malicious. I also want to talk about the ongoing work to have a reduced binary seed bootstrap in guix.