Foreword (2018-03-31): I wrote this years ago, and it sat in draft because I intended to write an OSS library to do this (my employer at the time did not want to open-source the version I wrote). It turns out I never did write an OSS version, but I still think the content is valid, and maybe it will inspire somebody to write one.
For several years now, I have had numerous assignments to write an electronic agreement system for one use case or another. After four rounds of it, I am seeing some trends in how things should best be organized. I am also seeing pitfalls where the whole thing could fall apart. Since the whole point of these things is to be a legally enforceable agreement, terms and conditions, signature, or whatever the flavor-of-the-month word is, we need to make sure it is bulletproof. Typically this is accomplished by leveraging a service like EchoSign.
In my typical challenge-everything way of thinking, I decided there must be a better way. Essentially you are paying quite a bit of money to EchoSign and when you boil it all down, you end up with a 3rd party who is vouching for the integrity of the agreement. This means, at the end of the day, either the custodian of the record cannot manipulate the agreement record without invalidating the record (my way), or the third party has a vested and audited interest in maintaining the integrity (EchoSign's way). So this got me thinking about the following question. How can I write an open-source, free solution which enables anybody to verify the validity of the information in question? Oh, and I need to make it happen in 2 days since I am under a deadline.
The solution requires that a snapshot of the relevant pieces of information at a certain time be hashed, and that the hash then be published to an open, independent, and reliable 3rd party. The answer to where to publish it hit me like a ton of bricks: the Bitcoin blockchain! Bitcoin does a damn good job of ensuring the integrity of its blockchain, because without that integrity the currency would not be trusted and the double-spending problem would rear its ugly head.
Now that I had my solution determined, I needed to figure out how to make it happen. After Googling a bit, I ran across this awesome site, http://www.originstamp.org/, which allows you to prove time-stamped possession of anything by hashing whatever you send (or taking a hash you send) and then embedding the hash in the blockchain once per day. As an added bonus, it immediately tweets the hash as well, which means we are doubly publishing the fact that we had the given hash at a certain point in time. Since I am on a deadline, I decided this service would be much easier to use than writing my own integration with the blockchain. Naturally I will abstract this functionality so additional adapters can be written in the future.
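One way to picture that abstraction is a small adapter interface. This is only a sketch, written in Python for brevity rather than the PHP stack described here; `TimestampAdapter`, `InMemoryAdapter`, and `publish_everywhere` are hypothetical names, and no real OriginStamp API call is shown because its request format is not covered in this post.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class TimestampAdapter(ABC):
    """Hypothetical interface: publish a hex digest to an independent third party."""

    @abstractmethod
    def publish(self, hex_digest):
        """Submit the digest and return a receipt describing the submission."""


class InMemoryAdapter(TimestampAdapter):
    """Stand-in for tests; a real adapter would call the timestamping service."""

    def __init__(self):
        self.published = []

    def publish(self, hex_digest):
        receipt = {
            "hash": hex_digest,
            "submitted_at": datetime.now(timezone.utc).isoformat(),
        }
        self.published.append(receipt)
        return receipt


def publish_everywhere(hex_digest, adapters):
    """Send the same digest to every configured adapter and collect the receipts."""
    return [adapter.publish(hex_digest) for adapter in adapters]
```

With this shape, swapping the timestamping service or adding a second one means writing another adapter class, and the code that hashes agreements never has to change.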
Now that we have the third-party thing sorted out, let's take a look at how we need to set up our application to work with this. I am currently using PHP, MongoDB, and Symfony2 for this exercise, as that is the technology stack we are working on. The same techniques should apply to any other language, though.
The first thing we need to make verifiable is the agreement itself. This represents the text and any rules associated with the agreement, as well as a name. It should never be updated, only replaced with a new version. So in the simplest terms, we have an agreement which has a date created, some text, a name, and a version. A version is specific to the name. For example, if the agreement is used when trading your first-born for a new TV, I could give the agreement the name “FIRST_BORN_FOR_TV”. The important part is that the name does not change, and that it is not the same as the name of any other agreement you may be keeping track of. If you are a normalization Nazi, then you probably want the name to be a UUID. Now when some lawyer gets a hold of our agreement and invalidates it because we are inciting slavery, we need to update our agreement for all future customers. Perhaps we change the wording so it has the same net effect, but now we are not owning a person, merely being the recipient and director of all labor for the rest of the first-born's life. So we create a new agreement using the same “FIRST_BORN_FOR_TV” name and increment it to the next version. We create a SHA-256 hash of the name, version, UTC timestamp, and the text of the agreement, and store it along with the agreement. At the time of creation, we also publish the hash to OriginStamp via their API. You can see my first test agreement published to the Bitcoin blockchain and on Twitter at the links below.
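As a rough sketch of that hashing step (in Python for illustration, not the PHP I actually used): the four fields are serialized deterministically and run through SHA-256. The canonical JSON encoding, the field names, and the sample timestamps and texts are assumptions for the example; the essential point is that whatever encoding you pick must be fixed and reproducible.

```python
import hashlib
import json


def canonical_bytes(fields):
    """Serialize fields deterministically so the same data always hashes the same."""
    return json.dumps(
        fields, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def agreement_hash(name, version, created_utc, text):
    """SHA-256 over the agreement's name, version, UTC timestamp, and text."""
    payload = {
        "name": name,
        "version": version,
        "created_utc": created_utc,
        "text": text,
    }
    return hashlib.sha256(canonical_bytes(payload)).hexdigest()


# Placeholder values: two versions of the same named agreement.
v1 = agreement_hash("FIRST_BORN_FOR_TV", 1, "2015-04-05T02:00:00Z", "Original wording.")
v2 = agreement_hash("FIRST_BORN_FOR_TV", 2, "2015-04-06T09:30:00Z", "Revised wording.")
```

Each version gets its own 64-character hex digest, and that digest is what would be stored with the agreement and sent off for timestamping.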
Looking at the hash, you say, “So what? This doesn't tell me anything.” You would be correct in that it is not much use other than being able to prove that the creator of the hash (me) was in possession of the hash by 2015-04-05 02:22:45 UTC at the latest. This means I did not generate the hash later than that time, which in turn means the data I stored can be shown not to have changed since then. A third party can take the data we stored as the agreement, run it through the same hash generator, and get the same hash. If anything had changed, the same hash would not be produced, and the information would be suspect.
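The third-party check can be sketched in a few lines of Python. This assumes the verifier knows the exact serialization used when the hash was first created (canonical JSON here, which is my assumption for the example); `record_hash` and `verify_record` are hypothetical names and the record contents are placeholders.

```python
import hashlib
import json


def record_hash(record):
    """Recompute the SHA-256 digest of a stored record using a fixed serialization."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify_record(record, published_hash):
    """True only if the stored record still produces the hash that was timestamped."""
    return record_hash(record) == published_hash.lower()


stored = {
    "name": "FIRST_BORN_FOR_TV",
    "version": 1,
    "created_utc": "2015-04-05T02:00:00Z",  # placeholder timestamp
    "text": "Original wording.",
}
published = record_hash(stored)  # stands in for the hash found on the blockchain

# Any edit to the stored data, however small, breaks the match.
tampered = dict(stored, text="Original wording, quietly edited.")
```

Here `verify_record(stored, published)` succeeds while `verify_record(tampered, published)` fails, which is exactly the property that makes the custodian unable to alter the record unnoticed.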
In order to capture the event of somebody signing the agreement, we need to create a signature entity. This needs to have a reference to the agreement, the signer's full name, IP address, and email address, and the date signed. We then hash all of this info (using the integrity hash from the agreement to represent the agreement), which gives us a signature integrity hash. We then publish this hash as well.
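A minimal Python sketch of the signature hash follows. The field names, the canonical JSON encoding, and the sample signer details are all assumptions for illustration; the one thing taken from the description above is that the agreement is represented by its own integrity hash.

```python
import hashlib
import json


def signature_hash(agreement_integrity_hash, full_name, ip_address, email, signed_utc):
    """SHA-256 over the signature fields, with the agreement represented by its hash."""
    payload = {
        "agreement_hash": agreement_integrity_hash,
        "full_name": full_name,
        "ip_address": ip_address,
        "email": email,
        "signed_utc": signed_utc,
    }
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Placeholder agreement hash and signer details.
agreement_digest = hashlib.sha256(b"example agreement record").hexdigest()
sig = signature_hash(
    agreement_digest, "Jane Doe", "203.0.113.7", "jane@example.com", "2015-04-05T03:00:00Z"
)
```

Because the agreement hash is an input, the signature hash pins down not just who signed and when, but exactly which version of the text they signed.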
One issue which may arise is changes made to the system itself. That is, how do you deploy changes to the hashing system (like capturing additional pieces of data)? This is also solved through hashing. Conveniently, we use Git for our code repository, which has a unique hash for every commit. All we need to do is capture the Git commit hash at the time of the signature, store it with the signature record, and include it in the data being hashed. In this way, we know the code used, the data, and the point in time. This can all be verified by a third party, who can check out the code at that commit, repeat the same hashing steps, and arrive at the same hash, thus proving that we captured the exact data at the time the agreement was signed and not after the fact.
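Sketched in Python, capturing the commit might look like the following. `git rev-parse HEAD` is the standard way to read the current commit hash; `current_commit` and `seal_with_commit` are hypothetical helpers, and reading the commit from a working copy at signing time is an assumption (a deployment might instead bake the commit hash into its build).

```python
import hashlib
import json
import subprocess


def current_commit(repo_dir="."):
    """Return the commit hash of the checked-out code via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def seal_with_commit(signature_fields, code_commit):
    """Add the code commit to the signature data, then hash the combined record."""
    payload = dict(signature_fields, code_commit=code_commit)
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(canonical).hexdigest()
    # The commit is stored next to the hash so a verifier knows which code to check out.
    return dict(payload, integrity_hash=digest)
```

A call such as `seal_with_commit(fields, current_commit())` yields a record whose hash changes if either the data or the recorded code version changes.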
My wife has suggested we call this technique HashCash. Between the long explanations of Bitcoin and hashes, I think she tuned me out, and this was her way of showing she was still listening.