On the Importance of Purity

The benefits of advanced programming languages are sometimes difficult for everyday programmers to grasp. How the features of such languages relate to industrial software development can be hard to see, especially since the arguments are couched in terms such as "referential transparency", "totality", "side-effect-free", "monads", "non-determinism", "strong static typing", "algebraic data types", "higher-order functions", "laziness/call-by-need", and so on.

Many of these features are attributed to "pure" languages, but purity is itself a nebulous concept. I will explain the importance of a number of these features and how they affect the everyday programmer's life.

The Benefits of Referential Transparency

Referential transparency (RT) is a simple principle with profound consequences. Essentially, RT dictates that a function may access only the parameters it is given, and that its only effect is to return a value. Put another way, any call to an RT function can be replaced by the value it returns without changing the program's behaviour.

This may not sound very significant, but permitting functions that are not RT is a nightmare for software developers. We rarely notice this because most mainstream languages are not referentially transparent, so the nightmare is simply the status quo.

Consider a fairly benign scenario: you're in a loop and you call a function, but the function suddenly changes your loop index variable even though you never passed it to the function! This generally doesn't happen, but it can happen in languages like C/C++, for instance through a stray pointer or a buffer overflow. It isn't more common only because functions are RT with respect to local variables allocated on the stack, and most languages other than C/C++ enforce this property.

I'm sure it's not difficult to imagine the horrors if any function you call could modify any of your local variables: you could never rely on a variable holding the correct value. How could you rely on anything? Mutable global variables are a good example of this mess. It's common wisdom that one should avoid globals. Why? Because functions that use them are not RT: a global's value could change at any time, so it is unreliable.
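To make the globals point concrete, here is a minimal sketch in Rust (chosen only for illustration; the function names are hypothetical). `scale` is RT, while `scale_global` also reads a mutable global, so an unrelated piece of code can change its answer for the same argument.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Pure: the result depends only on the arguments.
fn scale(x: u64, factor: u64) -> u64 {
    x * factor
}

// A mutable global that any code in the program may change.
static FACTOR: AtomicU64 = AtomicU64::new(2);

// Not RT: the result also depends on FACTOR, which is not a parameter.
fn scale_global(x: u64) -> u64 {
    x * FACTOR.load(Ordering::SeqCst)
}

// Stands in for some unrelated module that touches the same global.
fn unrelated_module() {
    FACTOR.store(3, Ordering::SeqCst);
}

fn main() {
    // Same arguments, same answer, every time.
    println!("{} {}", scale(10, 2), scale(10, 2));

    // Same argument, different answers, because of a change made elsewhere.
    let before = scale_global(10);
    unrelated_module();
    let after = scale_global(10);
    println!("{} {}", before, after); // prints "20 30"
}
```

Nothing in the signature of `scale_global` warns the caller that its result can change; that hidden dependency is exactly what RT rules out.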

Suppose a client calls you about a problem with some feature of your program, and module B provides that feature. If the program is written in an RT language, you know immediately that the problem lies in module B, perhaps even in the particular function B.F that performs the operation. In a non-RT language, by contrast, a completely different module C could be interfering with module B by changing its state behind the scenes. Debugging is thus much easier with RT.

Consider a more malicious scenario, where a function does some useful computation but also deletes all of your files behind your back. You download this library because it seems useful, only to lose your entire machine. Even worse, it may only delete files when deployed to a server. Or perhaps it installs some adware. This situation is only possible because the function is not referentially transparent. If it were, the only way it could have deleted your files is if you had given it a handle to a directory.

Those of you well-versed in the security field might recognize this constraint: it is the same authority propagation constraint underlying capability security.

However, the properties of full referential transparency are stronger than those of capability security: where the latter permits non-determinism as long as the facility providing it is accessed via a capability, full referential transparency requires "pure functions", which are fully deterministic.

The Benefits of Determinism

So why is determinism important? Because everything in your program should be reproducible, and thus testable. I'm sure no one will argue with that.

Consider a function that returns a completely different value every time it is called with the same parameters. How could you ever produce a useful program if every function were like that? How could you even test it?

Most developers have experienced the problems of non-determinism: a client calls in with a problem, but the problem cannot be reproduced even by retracing the exact same steps. If the program were deterministic, retracing those exact steps would always produce the error, without exception. I can't overstate how essential reproducibility is for testing and quality assurance.

However, non-deterministic functions do exist. We've all used such functions at one time or another. Consider a function that returns the value of the clock, or consider the random() function. If we want to use an RT, deterministic language, we need some way to use non-deterministic functions. How do we reconcile these two conflicting ends? There are a number of different ways, but all of them involve controlling and isolating non-determinism in some way.

For instance, most capability languages permit non-determinism, but the source of non-determinism can only be accessed via a capability which must be explicitly granted. Thus, you know that only the modules that were granted access to a source of non-determinism can behave non-deterministically and every other module in the program is completely deterministic. What a relief when debugging!

Essentially, this means that no source of non-determinism can be globally accessible, or ambient. So for you C/C++ programmers: in such a language, rand() and time() would not be global functions; they would be function pointers that must be passed explicitly to the modules that need them. Only main() has access to all sources of non-determinism, and main() passes the appropriate pointers on to the authorized modules.
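A minimal sketch of this discipline in Rust, assuming a hypothetical `Clock` trait as the capability. Rust itself does not forbid calling `SystemTime::now()` anywhere, so this shows the convention rather than a language-enforced guarantee. A module that needs the time must be handed a `Clock`, and only `main()` constructs the real one; a random number source could be passed the same way.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical capability: the only way modules are allowed to read the time.
trait Clock {
    fn now_secs(&self) -> u64;
}

// The real, non-deterministic clock. Only main() should construct it.
struct SystemClock;

impl Clock for SystemClock {
    fn now_secs(&self) -> u64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0)
    }
}

// A deterministic clock, e.g. for tests.
struct FixedClock(u64);

impl Clock for FixedClock {
    fn now_secs(&self) -> u64 {
        self.0
    }
}

// This module can observe time only through the capability it is given.
fn is_expired(issued_at: u64, ttl: u64, clock: &dyn Clock) -> bool {
    clock.now_secs() >= issued_at + ttl
}

fn main() {
    // main() is the only place that touches the real clock.
    let clock = SystemClock;
    println!("expired: {}", is_expired(0, 60, &clock));

    // The same module becomes fully reproducible with a fixed clock.
    println!("expired: {}", is_expired(100, 60, &FixedClock(130)));
}
```

Because `is_expired` receives its clock explicitly, any irreproducible behaviour can only come from whoever supplied the `SystemClock`.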

Purely functional languages like Haskell take a similar approach to capability languages. RT is absolutely essential to purely functional languages, and any source of non-determinism violates RT. Such languages have a construct which was mathematically proven to isolate and control non-determinism: monads. It's not important what a monad is; just consider it a design pattern for purely functional languages that preserves RT in the presence of non-determinism.
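Rust has no monads in the Haskell sense, so the following is not a monad. It is a rough sketch of the idea that a state monad packages up: make the hidden state an explicit input and output. Here a pseudo-random generator (a linear congruential generator with the Numerical Recipes constants; `Seed`, `next` and `draw` are hypothetical names) is a pure function of its seed, so the same seed always yields the same sequence.

```rust
// The generator's entire state, passed explicitly instead of hidden globally.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Seed(u32);

// One step of a linear congruential generator: pure, state in and state out.
fn next(seed: Seed) -> (u32, Seed) {
    let s = seed.0.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
    (s, Seed(s))
}

// Draw n values, threading the state through each call.
fn draw(n: usize, mut seed: Seed) -> (Vec<u32>, Seed) {
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        let (v, s) = next(seed);
        out.push(v);
        seed = s;
    }
    (out, seed)
}

fn main() {
    let (a, _) = draw(3, Seed(42));
    let (b, _) = draw(3, Seed(42));
    // Same input state, same output: the function is deterministic.
    println!("{:?}\n{:?}", a, b);
}
```

Threading the state by hand like this gets tedious quickly; hiding that plumbing behind a uniform interface is much of what monads do in Haskell.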

I would say that determinism is strictly more important than RT, but that RT is currently the best known method of achieving determinism. Now if your client calls you with an irreproducible problem, at least you've narrowed the field to only the modules that use sources of non-determinism.

The Benefits of Totality

What is a total function? A function is total if it is defined for all possible values of its inputs, without exception. Integer addition is a total function: regardless of the values of the two input integers, adding them produces a valid result (at least for mathematical integers; fixed-width machine integers can overflow).

By contrast, many functions are partial. Here, "partial" means that not all values passed to the function are valid. For instance, integer division is defined for every divisor except zero. Dividing by zero is an error, and its behaviour is undefined. Division is thus a partial function.

Using total functions always results in defined behaviour. Using partial functions sometimes results in undefined behaviour, if they are applied to invalid values. Thus, the more total functions you use, the more likely it is that your program will run without errors.

If a language forces all of your functions to be total, then your program will have no undefined behaviour at all. You are forced to consider and handle all possible error conditions and edge cases, in addition to the program's expected normal operating mode.

As you can imagine, totality inevitably produces more reliable software. In C you are free to ignore errors and segfault your program, but with total functions you can't ignore those errors.

Unfortunately, totality can be a serious burden on the developer, which is why partial programming is more prevalent. Exceptions were invented to help deal with invalid inputs to partial functions: don't segfault, throw an exception! They do permit more modular code, since errors can propagate and be handled at higher levels. However, unchecked exceptions, where the exceptions a function can throw are not declared in its signature, just bring us back to square one: an uncaught exception leaves the program's behaviour unspecified, and the language doesn't help by telling us which exceptions we need to catch.
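For contrast, a sketch of error propagation that keeps the errors in the signature. Rust has no exceptions of this kind; its `Result` type plus the `?` operator lets low-level code pass errors upward as exceptions do, while the compiler still knows every error a function can produce (closer in spirit to checked exceptions). `CalcError` and `parse_and_divide` are hypothetical names for illustration.

```rust
use std::num::ParseIntError;

// Every way this computation can fail is listed in its type.
#[derive(Debug, PartialEq)]
enum CalcError {
    Parse(ParseIntError),
    DivideByZero,
    Overflow,
}

// Lets `?` convert a ParseIntError into a CalcError automatically.
impl From<ParseIntError> for CalcError {
    fn from(e: ParseIntError) -> Self {
        CalcError::Parse(e)
    }
}

// Low-level code propagates errors with `?` rather than handling them here.
fn parse_and_divide(n: &str, d: &str) -> Result<i32, CalcError> {
    let n: i32 = n.trim().parse()?;
    let d: i32 = d.trim().parse()?;
    if d == 0 {
        return Err(CalcError::DivideByZero);
    }
    // checked_div still returns None for i32::MIN / -1, which overflows.
    n.checked_div(d).ok_or(CalcError::Overflow)
}

fn main() {
    // Errors are handled once, at a higher level, and the match must be exhaustive.
    for (n, d) in [("10", "2"), ("10", "0"), ("ten", "2")] {
        match parse_and_divide(n, d) {
            Ok(q) => println!("{n}/{d} = {q}"),
            Err(CalcError::DivideByZero) => println!("{n}/{d}: divide by zero"),
            Err(CalcError::Overflow) => println!("{n}/{d}: overflow"),
            Err(CalcError::Parse(e)) => println!("{n}/{d}: bad number ({e})"),
        }
    }
}
```

Errors still propagate to whatever level is best placed to handle them, but a caller cannot forget one without the compiler noticing.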

Correct software inevitably requires totality of some sort. How do we transform a partial function into a total function? Here's how we can make division total:

fun divide(n,d) : int * int -> int

This snippet is the signature for the division function. It takes two integers, n and d, and returns an integer. The signature makes no mention of the error condition when d=0. To make divide total, we transform it as follows:

data Result = Defined(int) | Undefined
fun divide(n,d) : int * int -> Result =
  if d == 0 then
    return Undefined
  else
    return Defined(n/d)

So now, any code that calls divide must deconstruct the return value of divide into either a Defined result x, or an Undefined result indicating an invalid input:

fun half(x) =
  match divide(x,2) with
    Undefined -> print "divide by zero!"
  | Defined(y) -> print "x/2=" + y

As you can see, a total divide function forces you to handle all possible cases, so a program using it can never have undefined behaviour when dividing by zero: it has whatever behaviour you specify on the Undefined branch. Little or no efficiency is lost, since any decent compiler will inline the above code, so the added cost is essentially an integer comparison against 0.
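Here is one way the pseudocode might look in Rust, as a sketch. The type is renamed `DivResult` to avoid clashing with Rust's built-in `Result`; in idiomatic Rust the standard `Option<i32>` plays the same role. The sketch uses the standard `i32::checked_div`, which returns `None` not only for a zero divisor but also for `i32::MIN / -1`, a second partial case of fixed-width division that plain `n / d` would turn into a panic.

```rust
// The pseudocode's Result type, renamed to avoid Rust's built-in Result.
#[derive(Debug, PartialEq)]
enum DivResult {
    Defined(i32),
    Undefined,
}

// Total division: every pair of inputs maps to some DivResult.
fn divide(n: i32, d: i32) -> DivResult {
    match n.checked_div(d) {
        Some(q) => DivResult::Defined(q),
        None => DivResult::Undefined, // d == 0, or i32::MIN / -1 overflow
    }
}

// The caller must handle both cases; leaving one out is a compile error.
fn half(x: i32) -> String {
    match divide(x, 2) {
        DivResult::Undefined => "divide by zero!".to_string(),
        DivResult::Defined(y) => format!("x/2={}", y),
    }
}

fn main() {
    println!("{}", half(7)); // prints "x/2=3"
    println!("{:?}", divide(1, 0)); // prints "Undefined"
}
```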

A similar strategy can be used for errors that come from outside the program as well. Consider the file open() function. It's a partial function:

fun open(name) : string -> FILE

It can be transformed into a total function as follows:

data FileOpenResult = File(FILE) | PermissionDenied | FileNotFound | ...
fun open(name) : string -> FileOpenResult
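A sketch of the same idea on top of Rust's standard library, where `std::fs::File::open` already returns an `io::Result<File>`. The `FileOpenResult` enum is a hypothetical wrapper mirroring the pseudocode, and its `Other` variant stands in for the elided "..." cases. Rust's `io::ErrorKind` is non-exhaustive, so a catch-all arm is required in any case.

```rust
use std::fs::File;
use std::io::{self, ErrorKind};

// Hypothetical total wrapper mirroring the pseudocode's FileOpenResult.
#[derive(Debug)]
enum FileOpenResult {
    File(File),
    PermissionDenied,
    FileNotFound,
    Other(io::Error), // stands in for the remaining "..." cases
}

fn open(name: &str) -> FileOpenResult {
    match File::open(name) {
        Ok(f) => FileOpenResult::File(f),
        Err(e) => match e.kind() {
            ErrorKind::NotFound => FileOpenResult::FileNotFound,
            ErrorKind::PermissionDenied => FileOpenResult::PermissionDenied,
            _ => FileOpenResult::Other(e),
        },
    }
}

fn main() {
    // Every outcome has to be handled before the program compiles.
    match open("/no/such/dir/example.txt") {
        FileOpenResult::File(_) => println!("opened"),
        FileOpenResult::FileNotFound => println!("not found"),
        FileOpenResult::PermissionDenied => println!("permission denied"),
        FileOpenResult::Other(e) => println!("other error: {}", e),
    }
}
```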

If all partial functions are made total using the above technique, then all of your software will have fully defined behaviour. No more surprises! To a certain extent, totality even mitigates some problems with non-determinism: after all, who cares if the output is not reproducible since you are forced to handle every possible case anyway.

At first glance, totality is not appealing to the lazy programmer. But if you've ever had to develop and maintain complex software, you'll soon appreciate your language forcing you to deal with all possible cases at development time, instead of dealing with irate customers after deployment.

For all the lazy programmers out there, consider this: totality almost eliminates the need for testing. As a lazy programmer myself, that's a selling point I can support. ;-)

The Benefits of Strong Static Typing

What is strong static typing? Static typing is an analysis a compiler performs to ensure that all the uses of data and functions in a program are consistent. In other words, you're not saying X in one place, and then saying Y which contradicts X in another place.

I take strong static typing to mean a static type analysis that the developer cannot circumvent. C/C++ are statically typed, but their static type systems are not strong because you can freely cast between types, even when doing so is undefined. C#, Java, and similar languages are statically typed, but they are borderline strong: there is no way to defeat the type system, but the static analysis is weak since you can still cast. A cast simply defers the check to runtime, where a bad cast becomes a runtime error.

There are many arguments for and against strong static typing, but in combination with the previously discussed features, static typing enables developers to write software that is "almost correct by construction". By this I mean that if your program compiles, it will run without errors, and it has a high probability of actually being correct.

The reason for this is some computer science black magic called the Curry-Howard isomorphism. What that is exactly isn't important. Just consider a static type system to be a built-in logic machine: when you declare and use types this logic machine is making sure that everything you're telling it is logically consistent. If you give it a contradiction, it will produce a type error. Thus, if your whole program type checks, it is logically consistent: it contains no contradictions or logical errors in the statements the logic machine understands, and the logic machine constructed a proof of this fact.

The power of the static type system dictates how intelligent the logic machine is: the more powerful the type system, the more of your program the machine can test for consistency, and the closer your program is to being correct. Your program essentially becomes a logical specification of the problem you are solving. You can still make errors in the specification, but those errors must not contradict anything else in the specification for it to type check.
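As a small illustration of handing more of the specification to the "logic machine", here is a Rust sketch that makes division total in the opposite direction from the Result-returning version: the precondition moves into the parameter type instead of the error moving into the return type. It uses the standard `std::num::NonZeroU32`; the `divide` function itself is hypothetical.

```rust
use std::num::NonZeroU32;

// The divisor's type carries the fact "d != 0", so every caller must have
// established it before the call type checks; the function needs no error case.
fn divide(n: u32, d: NonZeroU32) -> u32 {
    n / d.get() // cannot panic: d is non-zero and unsigned division cannot overflow
}

fn main() {
    // The only way to get a NonZeroU32 from a runtime value is to check it.
    match NonZeroU32::new(2) {
        Some(d) => println!("7/2 = {}", divide(7, d)), // prints "7/2 = 3"
        None => println!("divisor was zero"),
    }
    // divide(7, 0) is rejected at compile time: 0 is a u32, not a NonZeroU32.
}
```

The check against zero has not disappeared; the type system has simply made it impossible to skip.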

The downside of strong static typing is that the analysis is necessarily conservative, meaning some legitimate programs will produce a type error even though they would not generate an error at runtime.

Such is the price to pay for the additional safety gained from static typing. As a lazy programmer I'm willing to pay that price.

Conclusion

These are some of the less common idioms found in advanced programming languages. Other idioms are either better known and already adopted into other languages (algebraic data types, higher-order functions), or simply less relevant to constructing reliable software (laziness).

Absolute purity may ultimately prove to be unachievable, but its pursuit has given us a number of powerful tools which significantly aid in the development and maintenance of reliable, secure software.

[Edit: clarified some statements based on some feedback from LTU]


Anonymous said…

"At first glance, totality is not appealing to the lazy programmer."

for a good paper by D.A.Turner on the subject see :

http://www.cs.mdx.ac.uk/staffpages/dat/sblp1.pdf

October 26, 2007 at 9:51 AM

Sandro Magi said…

That paper was my first serious introduction to totality. See the LTU post on the paper for some interesting discussion.

October 26, 2007 at 10:15 AM

Unknown said…

"Consider some function that is RT, but, whenever you call this function with the same parameters, it returns a completely different value!"

By definition, a referentially transparent function is deterministic. I don't understand the examples you give where it may be non-deterministic.

Other than that hiccup, I enjoyed your entry.

November 9, 2007 at 12:03 AM

Haskell is referentially transparent, yet not deterministic. Interaction with the external world is necessarily non-deterministic, and any useful language must be capable of such interaction. The hard part is reconciling referential transparency with non-determinism. This involves separating the notion of values from computation. Monads are one way of accomplishing this. There are also "witness types", and other schemes.

The examples I presented were intended to highlight intrinsically non-deterministic functionality that we would like to use from within an RT language.

November 10, 2007 at 1:39 PM

I don't know what a referentially transparent programming language is. From my understanding, RT applies to an expression, function, or piece of code. It says that the expression can be replaced with its value (for a given set of inputs) without changing the program.

The output of a non-deterministic expression is not predictable. Take two threads printing out different characters: we can't predict how the threads are scheduled, therefore the output is ND.

Given those two definitions, you can't have a RT expression that is ND, because then it could not be replaced with a value without changing the program.

November 10, 2007 at 2:03 PM

Consider the monadic approach: all computations with non-deterministic functions take place within a monad; the monad encapsulates hidden state which corresponds to the external world. This hidden state is an *input* to the program. These non-deterministic functions are accessing this hidden state, which is why they are not predictable.

So from the theoretical perspective, the program exhibits non-deterministic behaviour only because the world passed in is different every time; if the same world were passed in, the behaviour would be the same.

This restores referential transparency in the presence of non-determinism.

November 10, 2007 at 2:58 PM

Higher Logics
