Immutable data

An application on the exorbin.com website can convert any block of data into a near-unique string of 64 characters.

Fun with SHA256

So “abc” converts to
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

My name, “Richard Coyne”, converts to
ba7c5c9e90f05540084e798effd0389b60ec98677662017065d34a0b880778a7
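These conversions don't need a website. Here is a minimal sketch using Python's standard hashlib module, assuming the text is encoded as UTF-8 before hashing (the helper name sha256_hex is mine, not the website's). The value for “abc” is the standard SHA256 test result; the digest for any other text depends on the exact characters and encoding fed in.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text (UTF-8 encoded) as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

print(sha256_hex("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# Whatever the input length, the output is always 64 characters long.
print(len(sha256_hex("Richard Coyne")))  # 64
```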

Here’s some Shakespeare

“To be, or not to be — that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune
Or to take arms against a sea of troubles,
And by opposing end them. To die — to sleep —
No more; and by a sleep to say we end
The heartache, and the thousand natural shocks
That flesh is heir to. ‘Tis a consummation
Devoutly to be wish’d. To die — to sleep.
To sleep — perchance to dream: ay, there’s the rub!”

That translates to
ff5397a0736b5a58b3e4a5950982296967ce1c9ad26c4277b0094acbb3196edb

The arbitrary-looking 64-character output serves as a fixed-size, condensed and near-unique representation of the original data. Take out the first comma in the Shakespeare above and you end up with a very different output string:
e1ad69b86b1e897c34ad111959543515f2f5b729bf359646b3fad999d2a3b24c
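The effect of a one-character change is easy to try. The sketch below uses only the opening words of the speech as a stand-in, because reproducing the exact hashes above would require the same quotation marks, dashes and line breaks that were pasted into the website. The helper name is an assumption of mine.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text (UTF-8 encoded) as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

original = "To be, or not to be"
edited = "To be or not to be"  # the first comma removed

h1 = sha256_hex(original)
h2 = sha256_hex(edited)

# Count the positions at which the two hex strings disagree.
differing = sum(a != b for a, b in zip(h1, h2))

print(h1)
print(h2)
print(f"{differing} of {len(h1)} characters differ")
```

Running it shows two strings with no visible family resemblance, even though the inputs differ by a single comma.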

The conversion is not a method for compressing or encrypting data. There’s no algorithm to unpack the 64 character string to recover the original data. The output string is for machine consumption rather than for human beings to read, check and ponder.

The output string is called a hash. The SHA256 algorithm used by that website is not the newest hash function (the SHA-3 family was standardised later), but it is very widely used, and nobody has publicly demonstrated “a collision,” i.e. two different sets of data that have the same hash. The conversion provided by the exorbin.com website application is just a demonstrator of processes normally invisible to users.

Why hash?

What is the use of this (near) unique hash representation of a block of data? It’s used to check that data is authentic and intact. As well as transmitting the segment of Shakespeare, transmit the hash too, perhaps in a different channel. The receiver software then runs the data through the SHA256 algorithm. If the hash generated by the receiver does not match the hash sent with the data, that alerts the receiver that the data has been altered en route, whether by tampering or by a transmission error. The mismatch won’t tell you what was changed, or by how much, but the receiver software can raise an alert or ask the sender software to transmit the data again.
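The receiver's check might look something like this. It is a sketch rather than any particular protocol: the function names are hypothetical, and it assumes the hash reaches the receiver by a channel that an attacker cannot also alter.

```python
import hashlib
import hmac

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()

def is_intact(received_data: bytes, expected_hash: str) -> bool:
    """Re-hash the received data and compare it with the hash that was sent."""
    # compare_digest compares the two strings in constant time.
    return hmac.compare_digest(fingerprint(received_data), expected_hash)

message = b"To be, or not to be"
sent_hash = fingerprint(message)  # computed by the sender

print(is_intact(message, sent_hash))                 # True
print(is_intact(b"To be or not to be", sent_hash))  # False: data was altered
```

A False result says only that something changed, which is the cue to raise an alert or request the data again.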

A hash can be used to transmit a password over the Internet. My online banking service has my password in a database saved as a hash. So if I enter my (not really) password “appLeTr33” via the banking website then the transmission software converts that to the hash
80dfcee4ead6ef8cd1ba4edfc782d5c4b4505affc223d1345bcce51c9818fc61
and sends the hash through the Internet. The server software receives the hash and looks it up to access my account. If the hash fell into the wrong hands, there would be almost no way to work backwards from the hash string to the original password. So that’s one of several measures for keeping passwords safe in transit.
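Here is a toy version of that exchange. The account name and the lookup table are hypothetical, and the sketch follows only the simplified scheme described above; as far as I know, real services layer further measures on top, such as a per-user salt, a deliberately slow hashing function and an encrypted connection.

```python
import hashlib

def hash_password(password: str) -> str:
    """Return the SHA-256 hash of a password as 64 hex characters."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical server-side table: account name -> stored password hash.
# The server never needs to store the password itself.
stored_hashes = {"example-user": hash_password("appLeTr33")}

def hash_matches(account: str, submitted_hash: str) -> bool:
    """Check a submitted hash against the hash stored for the account."""
    return stored_hashes.get(account) == submitted_hash

print(hash_matches("example-user", hash_password("appLeTr33")))  # True
print(hash_matches("example-user", hash_password("appletr33")))  # False
```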

Hacking ledgers

Another use of the SHA256 hash algorithm is to retain the integrity of a ledger of data, e.g. a sequence of monetary transactions.

Here’s a page from a spreadsheet of transactions. I’ve added a column in which each entry is the SHA256 hash of the data in that row combined with the hash from the previous row. Yes, you can have a hash of a hash.

If someone changes the content of one of the rows in the ledger then that changes the hash values for the transactions that follow, as shown here.

With access to the SHA256 algorithm, someone who wanted to doctor the ledger could recompute the hash value of that transaction and of those that follow. But if the ledger has already been shared with others and verified, then the discrepancy will be obvious, and an auditing algorithm will reject the doctored copy as invalid.
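A spreadsheet column like this can be imitated in a few lines. In this sketch the transactions are invented, and I assume that "combined" means joining the previous hash and the row text before hashing; a spreadsheet might combine them differently.

```python
import hashlib

def chain_hashes(rows):
    """Hash each row together with the hash of the row before it."""
    hashes = []
    previous = ""  # the first row has no predecessor
    for row in rows:
        previous = hashlib.sha256((previous + row).encode("utf-8")).hexdigest()
        hashes.append(previous)
    return hashes

# Invented example ledger.
ledger = ["Alice pays Bob 10", "Bob pays Carol 4",
          "Carol pays Dave 7", "Dave pays Alice 2"]
shared = chain_hashes(ledger)  # the copy already shared and verified

doctored = list(ledger)
doctored[1] = "Bob pays Carol 40"  # alter the second row
recomputed = chain_hashes(doctored)

# Rows whose hash no longer matches the shared copy.
changed = [i for i, (a, b) in enumerate(zip(shared, recomputed)) if a != b]
print(changed)  # [1, 2, 3]
```

The altered row and every row after it fail the comparison with the shared copy, while the row before the change still matches.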

Rock solid

In fact, this method of verification via hashing works on whole pages of transactions as well as for individual transactions. One of the best descriptions of this method of verification that I have found comes from a blog by Antony Lewis.

A traditional paper ledger consists of pages of transactions. Instead of page numbers at the top of each page (the header), print a hash of the previous page. At the bottom of the page, print a hash of the page (including its header). Lewis calls the hash a unique fingerprint of the page. The page is a block, and the whole set of pages linked in this way is a chain.

“By using a fingerprint instead of a timestamp or a numerical sequence, you also get a nice way of validating the data. In any blockchain, you can generate the block fingerprints yourself by using some algorithms. If the fingerprints are consistent with the data, and the fingerprints join up in a chain, then you can be sure that the blockchain is internally consistent. If anyone wants to meddle with any of the data, they have to regenerate all the fingerprints from that point forwards and the blockchain will look different.”
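The page-and-fingerprint scheme can be sketched as follows. The Page structure, the all-zero header for the first page and the way the page text is joined before hashing are my assumptions for illustration, not details from Lewis or from any real blockchain.

```python
import hashlib
from dataclasses import dataclass

FIRST_HEADER = "0" * 64  # assumed placeholder header for the first page

@dataclass
class Page:
    header: str          # fingerprint of the previous page
    transactions: list   # the lines written on this page
    footer: str          # fingerprint of this page, header included

def page_fingerprint(header, transactions):
    """Hash the header and the transactions of one page together."""
    text = header + "\n" + "\n".join(transactions)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_chain(pages_of_transactions):
    """Link pages so that each header carries the previous page's footer."""
    chain, previous = [], FIRST_HEADER
    for transactions in pages_of_transactions:
        footer = page_fingerprint(previous, transactions)
        chain.append(Page(previous, list(transactions), footer))
        previous = footer
    return chain

def is_consistent(chain):
    """Check that fingerprints match the data and join up in a chain."""
    previous = FIRST_HEADER
    for page in chain:
        if page.header != previous:
            return False
        if page.footer != page_fingerprint(page.header, page.transactions):
            return False
        previous = page.footer
    return True

chain = build_chain([["A pays B 10", "B pays C 4"], ["C pays D 7"], ["D pays A 2"]])
original_last = chain[-1].footer
print(is_consistent(chain))  # True

chain[0].transactions[0] = "A pays B 1000"  # meddle with the first page
print(is_consistent(chain))  # False

# Regenerating every fingerprint restores consistency, but the chain now
# ends in a different fingerprint from the one others already hold.
rebuilt = build_chain([page.transactions for page in chain])
print(is_consistent(rebuilt))                 # True
print(rebuilt[-1].footer == original_last)    # False
```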

The term immutable is commonly used to describe this state of a data set. Each page of transactions gets embedded in the transactions that follow. So you end up with a massive data set that cannot be altered without the change showing up. It’s a bit like sedimentary rock formations where the newest strata depend for their integrity on older formations below, which in turn require more effort to penetrate (i.e. alter).

This “immutability” is a key feature of distributed digital ledgers as used in cryptocurrencies, where the ledger is duplicated and shared across a large number of users, and there is no centralised keeper of the data, or auditing authority such as a bank. Also see my posts Digital money and Wasting time in the bit economy.

Note

  • Picture is of South Stack, Holy Island, Wales.
