Hashing to Reduce Work

When writes are expensive relative to reads, hashing can help reduce work. The example we'll follow is writing n objects to a web service over an API that requires one call per object to write (e.g. a POST or PUT for a REST API), but offers a way to read all n objects in fewer than n calls (e.g. paginated GETs for a REST API). In addition to the time taken to process the request, each API call incurs a network round trip, which can often take on the order of hundreds of milliseconds, so making n such calls can be prohibitively expensive depending on the size of n.

Now, suppose we only need to write an object if it differs from the instance currently stored on the server. The key insight is to hash the data of the object and include this hash alongside the object data written. Then, before writing the set of n objects, we first read all n objects with fewer than n API calls, compare the hash of each current object on the server (if it exists) to the hash of the new instance to be written, and only write the new one if the hashes differ. Note that this only reduces work when, on a typical run, most objects are unchanged; if most objects change between runs, the extra reads and hashing add overhead without saving many writes.
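The read-compare-write flow can be sketched as follows. This is a minimal illustration, not a definitive implementation: `read_all_hashes`, `write_one`, and the object-id scheme are hypothetical stand-ins for the API described above.

```python
def sync_objects(new_objects, read_all_hashes, write_one, hash_fn):
    """Write only the objects whose hash differs from the server's copy.

    new_objects:     dict mapping object id -> serialized bytes to write
    read_all_hashes: returns dict mapping object id -> stored hash
                     (cheap: paginated reads, fewer than n calls)
    write_one:       writes a single object along with its hash
                     (expensive: one API call per object)
    hash_fn:         hashes the serialized bytes
    Returns the number of writes actually performed.
    """
    server_hashes = read_all_hashes()
    writes = 0
    for obj_id, data in new_objects.items():
        new_hash = hash_fn(data)
        # Write only if the object is missing or its stored hash differs.
        if server_hashes.get(obj_id) != new_hash:
            write_one(obj_id, data, new_hash)
            writes += 1
    return writes
```

With this shape, a run where nothing changed costs only the cheap paginated reads plus local hashing, and zero expensive write calls.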

Beyond the assumption that reads are cheap relative to writes, the time taken to generate and compare hashes must be much less than the time taken to make an API call. Applying a non-cryptographic hash function like the Fowler-Noll-Vo (FNV) algorithm to the bytes representing the object is probably the simplest and fastest way to compute the hash, and the result of algorithms like FNV is just an unsigned integer, which makes comparison trivial.
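A minimal FNV-1a (64-bit) sketch in Python illustrates how little work is involved; it uses the published 64-bit offset basis and prime, and the serialization of an object to bytes is assumed to happen beforehand.

```python
def fnv1a_64(data: bytes) -> int:
    """Hash a byte string with FNV-1a, returning an unsigned 64-bit integer."""
    FNV_OFFSET_BASIS = 14695981039346656037
    FNV_PRIME = 1099511628211
    h = FNV_OFFSET_BASIS
    for byte in data:
        # FNV-1a: XOR the byte in first, then multiply by the prime.
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep the result in 64 bits
    return h
```

Two caveats: the object must be serialized to bytes deterministically (e.g. stable key ordering for JSON), or identical objects will produce different hashes; and since FNV is not cryptographic, it protects against accidental mismatches, not adversarial collisions, which is fine for this use case.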