Monday, August 30, 2021

Optional Value Handling in Nim

I recently had a look at a functional language named Toccata which amongst other things does away with booleans. While this migth seem utterly insane it's not an entirely new concept and proponents of such ideas will warn you of the perils of boolean blindness. This refers to the fact that booleans themselves carry no information apart from their value. But as others have pointed out this is true of all values and the linked article instead refers to the problem as part-blindness and recommends the user to use a good type system. In this article I'll discuss the ideas behind boolean and part blindness and muse on the concept of computing and error checking with options.

The reason why boolean blindness is more common to talk about than part blindness is in my opinion partially because we tend to think of booleans only when we check things. Statements like if this pointer is nil, is this value greater than that value, or if a data structure contains an element are all common tasks in programming. When our code takes actions based on these checks we are required to implicitly hold that postulation in our head as we write the following action. If a pointer is nil we must remember not to use it, if a value is greater  than another we must remember to use the right one, and if our data structure doesn't contain some data we must not try to access it. This is what most examples of boolean blindness will show you, forgetting that you performed a check and doing something that can't possibly work. What part blindness tackles is another beast entirely, what values can we pass where, and can we guarantee for the compiler through our types that we will never get errors.

Nim is a language in which you have a very rich type system, similar to what you might find in Ada even. We can create simple distinct types that share the same machine interpretation, like both miles and kilometers can store their values as an int, but the compiler can know that they are different types. We also have things like integer ranges, so the compiler can make sure that we don't pass in values outside of what we expect. Along with this we also have the concept of object variants which allows us to enumerate different states for our types, with  completely distinct value representations. With these it is trivial to implement the concept of parts to avoid part blindness. We can even go further and use it for things like a dynamic type with completely different underlying values based on it's type enumeration. But there are some issues with doing this, and it's a good reason to look at boolean blindness in greater detail. As I mentioned earlier we mostly consider boolean blindness when considering checks, so I'll rather refer to the following concept as check-blindness to avoid confusion and repeating arguments for or against boolean blindness. The concern for check-blindness is not that booleans don't carry any information other than their value, it's rather that the context of the check that produced them is lost. In Nim we could of course have our equality operator return a distinct IntCompareBool, but even that misses a lot of the context. This is where a generic Option or Maybe type comes in. Instead of returning a boolean from a check we can return a simple structure that tells us whether we have a value or not along with the actual value. Since we now have a way to return a value and a has-ity we remove the need for explicit has checks for containers, along with the need to check for a special bad value in calls that return a value (such as the nil pointer). But with just the most simple option type implementation we don't actually gain much. Right after our check-less call we need to check if our option has a value or not. It strengthens the type system slightly by recognizing that an optional pointer is not actually the same as a pointer, but we can take the benefits further with some extra logic.

Basic options handling

The main idea of more complex option type implementations like the one found in Toccata is composability. But let's first have a look at what we would expect from a simple implementation:

type
  Option[T] = object
    has: bool
    value: T

proc some[T](value: T): Option[T] =
  Option[T](has: true, value: value)

proc none[T](kind: typedesc[T]): Option[T] =
  Option[T](has: false)

proc isSome[T](option: Option[T]): bool =
  option.has

proc isNone[T](option: Option[T]): bool =
  not option.has

template either[T](option: Option[T], otherwise: T): T =
  if option.has: option.value
  else: otherwise

We have a way to create options with or without values, and we have a way to check if they have values and a way to convert them back into a value. Note that either is a template and not a procedure, this mean that otherwise can actually be a procedure call to something that produces a side-effect. This procedure will then only be called when option does not have a value which is an important property we'll use more later. Apart from the either template, this is something you will find in pretty much all option/maybe implementations. But as previously mentioned it does not really reap much benefit from options as a concept. Imagine you have a call like a table lookup, it might return a none option if there is no value for the given key, or it might give us a some option with the value. In most imperative languages this would look something like the following snippet. It is here assumed that the get procedure over a table will return an option with the value if it exists, or a none option if the key is missing. And that get over an option will throw an exception if the value doesn't exist:

let greatestLanguage = table.get("greatestLanguage")
if greatestLanguage.isSome():
  echo "The worlds greatest language is ", greatestLanguage.get() # We know implicitly that this will never create an exception
else:
  echo "No-one knows what the greatest language is"

The problem here is that our check blindness kicks in, what happens if I switch the if statement for some reason, or forget which code path I'm on. I might end up with a get statement in the wrong place, which isn't technically illegal, and hence won't be caught by the compiler. This creates an issue, for what will happen is that we run our program and it crashes on runtime. This of course might already be in a contingency case which we haven't thoroughly tested, and we might not catch our bug until it's too late. And you might also notice that we haven't actually gained any safety from this. Using those get calls for an option is not any safer than using the tables own get procedure behind a check to see if it has the key.

Safer access patterns

So what can we do to mitigate this? The answer is to create a special form which will make introducing such errors impossible. I won't show the implementation for this as it involves a bit of trickery and adds some nice extras, but the result ends up looking something like this:

match table.get("greatestLanguage"):
  some greatestLanguage: echo "The worlds greatest language is ", greatestLanguage
  none: echo "No-one knows what the greatest language is"

Note that the only get statement we use here is the one for the table that's been made safe with options. The match special form (created in Nim with a macro of course) get's rewritten to an if check and a case statement that automatically unwraps the value into a symbol of our choice. This symbol is simply not visible in the none part of this form, so we can't access it by accident (note that we still could use our imaginary get procedure from the above snippet, which is why we shouldn't have one at all). If the only ways to access the value of an option is through these special form constructions or our earlier either statement (which in itself is a bit of a special form construct since it's not just a simple procedure) we are guarenteed to never have options introduce runtime errors. But as I mentioned this converts into a case statement, so any kind of matching supported by those are also available to us here. The current implementation is also able to return a value instead of just running code. So combining these different powers we can do something like this:

let languagesMessage = match table.get("languagesCount"):
  some x of 0..100: "Only " & $x & " languages? I would've expected there were more, sure you counted them all?"
  some x of 101..500: "Well I guess " & $x & " languages are quite a lot to choose from, know any good ones?"
  some x: "Wowzer " & $x & " is a lot of languages!"
  none: "I guess no one knows how many languages there are"
echo languagesMessage

Control flow and bash-like conditionals

Before going further let's go back to Toccata and see how it treats options, after all it doesn't have booleans so pretty much all control flow logic must be made with this concept.

First of let's look at some conditionals around has-ity. Now this could be done in Nim with just regular checks with isSome or isNone but adding some simple templates makes them composable which adds to their flexability. Namely we want to override the boolean operations for our Options. So instead of taking two bools and returning a bool, they will now take two options and return an option. Or returns the first option if it has a value, otherwise the second. And and returns the last one if both options are true. This is actually surprisingly similar to how bash handles the continuation control flow && and ||, so the concept should not be all that unfamiliar to Linux users. This is indeed a powerful concept, and how one may create ternary operations in bash. Imagine our previous table lookup examples but with a fallback key:

let greatestLanguage = table.get("greatestLanguage") or table.get("bestLanguage")

This assignes an option to greatestLanguage which is a some option if either the "greatestLanguage" or "bestLanguage" key is found (only evaluating the second get if the first one fails of course. The and or && statement might be much more familiar to bash users, but that's simply because programs in bash returns what in this case would be an option with an error code. So it essentially checks if there is an error, and if there isn't then it runs the next program and checks if that has an error message:

let bashError = bashCmd("mkdir testDirectory") and bashCmd("echo 'Hello world' > testDirectory/newfile")

We can only create a file in a directory if it exists, so this statement first runs the bash command to create a directory (with the imaginary bashCmd that returns an option with the error code), then writes "Hello world" to a new file in that directory. The returned error code would either be from the mkdir command failing to create the directory, or from the echo command failing to write to the file. We could then use a matcher to handle the error and success cases.

Comparators and fleshing out the concept

Let's now also take a look at comparators, like equality, smaller than, etc. These are not strictly necessary for a good options system, but makes them more ergonomic to use. This is also how Toccata handles these operations, so they're included here for completeness. To help make these examples look a bit less confusing I've opted to use the proposed Nim syntax for the operators with a postfixed question mark. These operators behave like their normal counter-parts, but return the first option if they succeed. So a statement like x ==? some(10) where x is an option would not return a boolean as we might expect, but rather a none option if x is not equal to 10, or the x option if it is equal to 10. This means that it works more like a filter than a regular check. If we for example chain these with something like x <=? 200 >=? 100 the returned option can only be in the 100-200 range, any other value would produce a none option. Keep in mind that these would all be implemented the same way as our previous operators like either, and, and or which means that the right hand side does not get evaluated if the left hand side is already a none option. With this we can filter our values directly, without caring if the previous parts actually return something valid or not. While these value based examples might not look like it, consider this piece table.get("height") <=? some(180) >=? some(160). This will filter the result iff the table is able to find it. Otherwise the entire statement would return a none option.

Final pieces and discussion

Since Toccata is a functional language it doesn't have the need for the matcher procedure, and handles the control flow simply with the conditionals and the comparators. But to put it in the words of Toccatas creator, showing how powerful of a concept this is:

With just these, I was pretty much able write the entire Toccata compiler in Toccata itself.

And while this might seem very disruptive and over the top for a single concept in an imperative language it offers a useful way of handling error cases. It's a tale as old as programming itself, how do we handle errors. There are exceptions, error codes, and options as the three main candidates. Nim already has a strong system around exceptions, and error codes work just like you would expect in C, so options are the odd man out here. As we've seen above they allow us to chain and use logic directly on the optionally returned values in a completely safe and side-effect proof manner. They help alleviate check-blindness, and help making it obvious to the compiler that things can't possibly happen instead of the implicit nature of a regular check. But there is one thing missing, and it's one thing that more and more languages are starting to introduce, and that is conditional execution. Now we already have this in all our special forms as they will only evaluate parts of the statement if the option has a value. But in Nim we like to, as in many other imperative languages, chain commands on our return values. In Nim regular dot-chaining is just a way to call a procedure with the parts before the dot as the first argument of the statement. But to avoid having to write wrappers around everything that accepts an option as the first argument we need one more special form. What makes the conditional chaining operator different is that the part before the operator is an option, or something that returns an option, and the part after the operator takes the type of this option. The operator then evaluates the first part, and only if it's given a some option will it run the second part with the unpacked value. The entire chaining then either returns an option with the return type of the second part, or if it doesn't return anything then it won't return anything at all. So image our table now contains more complex object, which have a procedure greet defined for them, we could then use conditional chaining like so:

table.get("bestGreeter").?greet()

In this case the procedure doesn't return anything, it just print out a greeting from the object, and only if the object does exist in the table. It's nothing we couldn't do with our matcher, but it's a common enough pattern that it deserves it's own special form. Since the conditional chaining operator also returns an option if the second part has a return type we can do things like this:

match table.get("greatestLanguage").?listConcepts():
  some concepts:
    echo "The greatest language has a nice total of: ", concepts.len, " concepts"
  none: echo "No-one knows what the greatest language is, but I'm sure it would have many nice concepts"

This code would extract a list of concepts for us, but only if the key is found in the table. Otherwise it would simply return a none option, and we wouldn't try to illegally run our procedure on it causing a runtime error. Of course all of this is composable, so we could do things like this if we really wanted to:

match (toDo ==? some("Talk about concepts")) and ((table.get("greatestLanguage") or table.get("someGoodLanguage")).?listConcepts()):
  some x: echo "So you want to talk about concepts? Some languages have as many as ", x, " concepts, imagine that!"
  none: discard

I hope by now I've managed to make a good case for options. They offer a flexible, composable, and safe way to work with procedures that may or may not return something, and all with just a few simple comparators, logic operators, and special forms. The names of all the special forms are still up for debate and all the information in this article is based on my ongoing pull-request to Nim introducing this to the options module. Implementation details are of course also up for debate. And things like introducing more comparators between an option and a value of the type of that option to avoid the obsessive use of some() would in my opinion be a good idea.



from Hacker News https://ift.tt/2ONI8qE

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.