Following on from my previous post … Here’s a simple puzzle challenge. The SHA256 hashing algorithm (see earlier post: Immutable data) converts any string of text (integers or letters) into a (near) unique 64 character long string of numbers and letters. Using this algorithm the string “301253” (someone’s birthdate) translates to the string of characters
bad69ad103cbb4dbf57f608802ab93498422f14a276eb4b2f668c8fed254802d
I explained the process in the post: Immutable data. The resultant character string (not normally meant to be read by humans) is known as a hash, which has various uses. The algorithm for making such conversions is complicated, but well known to programmers, and is coded into lots of online digital security systems.
To reverse the process, and to unpack the original string from the hash would be a mammoth computational task, requiring software to generate every combination of characters, of different lengths, and calculate the SHA256 hash of each to see if they mach the target hash string. Even then, there may be several strings that produce the same hash. So you can’t be sure you’ve found the right one.
The SHA256 algorithm is designed to make this reverse engineering virtually impossible. Unlike various computational puzzle solving tasks the algorithm would have no way of detecting that it is getting closer to a solution. It might as well iterate every combination of characters for the input at random. It would be like a room full of monkeys typing out the Lords Prayer by hitting keys arbitrarily.
There are simpler challenges: find an input string that generates a hash with certain characteristics, e.g. the hash starts with a zero.
I messed about with “301253” by changing, adding and deleting characters, and eventually and randomly came up with “301252” that generates a hash
0135ce9c1094a7b56219acca34ba9e3bb68bd883cf842577c1e8c107346af173
I did this manually by pasting strings into the HSA256 calculator at xorbin.com. But it could be automated.
A hypothetical multi-user game puzzle
Imagine this is a contest in an online multi-user game. The first person to “solve” the puzzle gets a game reward, points, etc. Once you guessed it, you would send around your input string, and the rest of the players could easily verify that you came up with an answer. They would run the string you distributed through their own copy of the SHA256 algorithm to see if it did indeed generate a hash starting with zero.
In the case above the challenge was too easy, computationally. You could set a harder challenge: what input string will generate a hash string that starts “01010”, “ababab”, “0000” or some other arbitrary sequence? Aiming for a hash that starts with a row of zeros is as good as any target, as it’s easy to spot by eye if needed, and the degree of difficulty will be obvious from the number of zeros required.
One leading zero presents a trivial challenge; 4 zeros increases the degree of difficulty and the number of iterations required considerably. If the challenge appears to be too easy, solved too quickly, then some algorithm shared across the network will increase the degree of difficulty of the challenge by increasing the number of zeros required.
It seems that cryptocurrency systems that use this method expect solving the challenge to take about 10 minutes of dedicated CPU time, which is a lot of computation. As faster CPUs enter the network then an algorithm will increase the degree of difficulty to match the capability of the computers. The degree of difficulty for the bitcoin system is currently set at a hash with 19 leading zeros!
What is a nonce?
Here’s another sophistication to the puzzle. Instead of requiring just any character string as input, start with some known data, and ask, what sequence of characters would I need to add to that in order to generate a hash that begins with 19 zeros?
So you have any block of data, and the challenge is to work out what string you must append to that to achieve the target hash string. The string you add on is called a “nonce,” and has no use other than to modify the hash output: <data>+<nonce>. The puzzle-solving algorithm doesn’t manipulate the data, just the nonce.
Here’s some data: “NeZha conquers the dragon King.” That yields a SHA256
685db16fb573bc842a5a3aa3338a98e99cba49355d822c9ce7f7422489d3f407
Add an arbitrary nonce to that; keep adjusting the nonce, and eventually “NeZha conquers the dragon King.4532” yields a hash starting with 1 zero:
0efb8584a11494583e645a838b662d0613ccab1675022971901b05a8bea35c76
That was easy and only took a bit of adding, deleting and shuffling characters in the nonce. Aiming for a hash output starting with 19 zeros would take considerably longer.
Turning a page of transactions into a puzzle
In the previous post I demonstrated how a page of monetary transactions can also be hashed. The page starts with a hash of the previous page followed by a list of transactions, and a nonce. But before the nonce there’s another transaction offering the puzzle solver some cash, if they solve the puzzle before anyone else.
The challenge is to produce an HSA256 hash of that whole page that begins with 19 zeros just by adjusting the nonce.
This puzzle-solving is used to verify and embed a set of transactions. Cryptocurrencies, such as bitcoin store linked pages (blocks) of transactions, and distribute these to all the users (or their surrogates) who are connected together on a vast peer-to-peer network.
Any node on the network, usually a self-appointed subset with adequate computing power, can use this game to bed down a set of transactions. The node must collect all the transactions buzzing across a network since the last page was formed, check they are consistent and legal, put them into a virtual ledger page (called a block) in the format as described above, and calculate the nonce that will result in the page having an HSA256 hash starting with 19 zeros. Lots of the independent nodes will be doing this as the same time, incentivised by the potential financial reward. So it’s a contest.
As soon as one of these nodes generates a solution it broadcasts the result to all the other nodes, who quickly verify that the answer is correct. The block of the winning node then gets added to the set of all approved ledger pages, called a block chain, which is in turn distributed around the network as the approved set of transactions making up the correct and current state of the ledger.
Proof of work
The work required to bed down the pages in the ledger (i.e. verify, approve and seal the blocks in the block chain, is known as a proof of work.
Were any potential hacker try to overwrite any transactions or otherwise doctor the ledger, from the most recent page going back many other pages, then they would have to adjust the whole sequence of hashes in all the blocks. According to the seminal paper on the technique,
“To modify a past block, an attacker would have to redo the proof-of-work of the block and all blocks after it and then catch up with and surpass the work of the honest nodes. … the probability of a slower attacker catching up diminishes exponentially as subsequent blocks are added” (3).
There are other sophistications to the process, and I’ve simplified some steps and concepts, but I think the explanation gives me at least an understanding of the purpose of setting arbitrary puzzles in keeping transactions safe from tampering, a necessity where ledgers are open, duplicated and distributed across networks, breaking the need for a single “trusted” custodian (e.g. a bank) of the database.
The deliberate and extravagant “waste” of CPU time is put to a purpose. It’s part of the cost of data security in a distributed database. Security does not come cheap.
Reference
- Nakamoto, S. (2008), ‘Bitcoin: A Peer-to-Peer Electronic Cash System’. Bitcoin. Available online: https://bitcoin.org/bitcoin.pdf (accessed 19 June 2017).
Notes
- You can see the real time state of the bitcoin blockchain here: https://blockchain.info/blocks. When blocks of transactions are approved and bedded into the blockchain they are said to be “mined,” so-called as the node that verifies and beds down the block gets a fee, which incentivises the miner, and effectively adds more currency to the system, as if the node is mining for gold.
- Though the ledger can be viewed by anyone, the senders and receivers of digital cash are anonymised. According to the paper by Satoshi Nakamoto,
“The traditional banking model achieves a level of privacy by limiting access to information to the parties involved and the trusted third party. The necessity to announce all transactions publicly precludes this method, but privacy can still be maintained by breaking the flow of information in another place: by keeping public keys anonymous. The public can see that someone is sending an amount to someone else, but without information linking the transaction to anyone. This is similar to the level of information released by stock exchanges, where the time and size of individual trades, the ‘tape’, is made public, but without telling who the parties were” (6).
- Image: Devil’s Bridge, Kiev, Ukraine.
2 Comments