“Apology-based computing”

I came across this phrase some time back and was instantly intrigued by it. So, like any good Samaritan, let me share what I could make of it for the greater good of humankind! While I present my assessment, I’ll also highlight the aspects that make it viable in most computing contexts. We will also delve into how this phrase, at some level, touches on aspects of modern-day computing such as eventual consistency, the tradeoff between performance and correctness, and Amdahl’s law.

First, let’s understand what the phrase means.

Let me make a claim: we come across apology-based computing in our day-to-day digital goings-on, be it shopping on e-commerce websites, chatting on our phone apps, or general browsing.

So what is it?!

It merely points to the fact that in this age of highly distributed systems¹, it is, in a majority of situations, OK for messages to be delayed, go undelivered, or just go awry! Note that “messages” here means any sort of communication between two or more systems¹.

If this sounds a bit overbearing and leads to squinting of eyes, you might want to recall situations in which you had to “refresh the page” to see an update, or times when your text or chat messages did not get delivered and a cute little (!) sign appeared beside them, or (for the technically inclined) when you had to explicitly invalidate a cache so that updated data was reflected quickly. Also recall that you were sort of okay with it and, in a majority of such scenarios, did not complain.

And, that’s precisely it!

Over time, as systems have grown and apps have proliferated into almost every aspect of our lives, we have become more and more OK with systems not behaving as expected every once in a while. This is not just philosophical: we humans did not become more patient overnight! Rather, something interesting has happened over the years: our response as intermediate or ultimate users of these systems has evolved. So much so that Grace Hopper’s famous quip about getting past bureaucracy,

It’s easier to ask for forgiveness than to get permission.

https://en.wikiquote.org/wiki/Grace_Hopper

has found a benign home in computing, and forgiveness, or apology, has become the order of the day!

Why did this happen, you ask? Because there’s no other option!

Something of this sort is what David Ungar deliberates upon, and argues for, in his iconic talk “Everything You Know (about Parallel Programming) Is Wrong!”

In this talk, and very much in the spirit of the quote above, David Ungar highlights how computing is leaning (or should lean) towards what he refers to as “end-to-end nondeterminism”, or “race-and-repair”, rather than towards correctness.
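To make the accept-first, apologise-later idea concrete, here is a toy, single-process sketch in Python. It is not Ungar’s code: the `Shop` class, its methods, and the refund message are hypothetical, and the “race” is only simulated by confirming orders without checking stock. The “repair” happens later, when orders are reconciled against the real stock and everyone beyond it gets an apology.

```python
from dataclasses import dataclass, field


@dataclass
class Shop:
    """Hypothetical shop: confirms orders optimistically (race) and
    reconciles them against real stock later (repair)."""
    stock: int
    accepted: list[str] = field(default_factory=list)

    def place_order(self, customer: str) -> str:
        # Race: confirm immediately, without locking or checking stock,
        # so the customer never waits on coordination.
        self.accepted.append(customer)
        return f"Order confirmed for {customer}"

    def reconcile(self) -> list[str]:
        # Repair: fulfil orders in arrival order up to the real stock level
        # and apologise (refund) to everyone beyond it.
        fulfilled = self.accepted[: self.stock]
        overflow = self.accepted[self.stock:]
        self.stock -= len(fulfilled)
        self.accepted = []
        return [f"Sorry {c}, the item sold out; refund issued" for c in overflow]


shop = Shop(stock=2)
for customer in ["ana", "bo", "cy"]:
    print(shop.place_order(customer))
apologies = shop.reconcile()
print(apologies)
```

In a real system the race would be between replicas or threads accepting orders concurrently; the point is only that confirmation stays fast and coordination-free, and correctness is restored, with an apology, afterwards.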

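Correctness, or determinism, comes at a cost. Enforcing it requires coordination (locks, barriers, consensus), and coordination serializes work. Within the confines of a single system, that serialized portion caps the achievable speedup no matter how hard we try, as Amdahl’s law tells us. The moment a process (or transaction) crosses the boundary of one system, we additionally run into the CAP theorem and the other vagaries of distributed systems.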
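As a quick worked example, Amdahl’s law gives the speedup on N workers, when a fraction p of the work can run in parallel, as S(N) = 1 / ((1 − p) + p / N). The sketch below assumes, purely for illustration, that 5% of the work must be serialized (say, to coordinate for correctness).

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative assumption: 95% of the work parallelizes, 5% must be serialized.
for n in (1, 8, 64, 1024):
    print(f"{n:>5} workers -> {amdahl_speedup(0.95, n):.2f}x")
```

However many workers we add, the speedup can never exceed 1 / (1 − p), which is 20 here; shrinking the serialized, coordination-bound fraction is the only way to raise that ceiling.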

So, what do we do?

Distributed systems engineers, for one, are well aware that

Failure is the norm rather than the exception.

which, if you think about it, is a paradigm shift from conventional thinking, where we as programmers or system architects were taught to treat failure as catastrophic! Treating failure as the norm in modern-day computing, however, leads us to something very practical: what we call “designing for failure”.

That is to say, we need to build systems with better resiliency, quick failure detection, and fault tolerance, and, with CAP in mind, be willing to trade immediate correctness for availability by being eventually correct! That takes care of most scenarios. (Of course, we’re not talking about mission-critical systems or transactions, where correctness and/or availability is indispensable whatever the cost!)
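One common way to favour availability is to serve the last known value when the source of truth is unreachable, and to fail only when nothing usable is cached. The sketch below assumes a hypothetical `fetch` callable that raises `ConnectionError` on failure and an arbitrary five-minute staleness limit; it illustrates the idea rather than any particular library.

```python
import time
from typing import Callable, Optional


class StaleTolerantReader:
    """Hypothetical reader: prefers a fresh value from the primary source,
    but serves the last known (possibly stale) value when the primary fails."""

    def __init__(self, fetch: Callable[[], str], max_staleness_s: float = 300.0):
        self.fetch = fetch
        self.max_staleness_s = max_staleness_s
        self._cached: Optional[str] = None
        self._cached_at = 0.0

    def read(self) -> tuple[str, bool]:
        """Return (value, is_fresh). Raises only if nothing usable is cached."""
        try:
            value = self.fetch()
        except ConnectionError:
            age = time.monotonic() - self._cached_at
            if self._cached is not None and age <= self.max_staleness_s:
                return self._cached, False  # available, but possibly stale
            raise  # nothing safe to serve: surface the failure (think 503)
        self._cached, self._cached_at = value, time.monotonic()
        return value, True


# Hypothetical flaky primary: succeeds once, then becomes unreachable.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] > 1:
        raise ConnectionError("primary unreachable")
    return "price=42"


reader = StaleTolerantReader(flaky_fetch)
first = reader.read()   # fresh value from the primary
second = reader.read()  # primary is down, so the cached value is served
print(first, second)
```

The `is_fresh` flag is what lets a UI show a gentle “refresh to update” hint instead of an error page.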

So, going back to the scenarios above: by making us ‘refresh’ the browser window or log back in, the systems are catching up on correctness after the fact. They stayed available, and a refresh brings us to the correct state eventually. The other option, of course, would have been an HTTP 503 (Service Unavailable) or something like it, which leaves far more painful memories!

¹ Systems = processes