LamE


The LAMbda Evaluater

On GitHub

Defining the syntax

The first thing to do is nail down the syntax of the language. It's important that the concrete syntax is: * Easy for the interpreter to parse * Easy for humans to read and write

Requiring lots of parantheses and keywords gives us the first, at a slight hit to the second. The syntax is more fully described here.

This naturally gives rise to the following data types of expressions and definitions.

Where Match t (x,e1) (s,t,e2) (y,p,e3) comes from match expressions:

match t as
  (Var x) (e1)
  (App s t) (e2)
  (Abs y p) (e3)

and If c l r comes from if expressions:

if c then l else r

and the meaning of the rest is clear. We may as well define the primitive functions too.

Parsing

We use Parsec to parse strings into expressions. This comes with a handy method to tokenize a string. Unfortunately, this doesn't work straight out of the box.

The builtin lexer consumes spaces after a token, but spaces are an important feature of our syntax when it comes to function application. So we must recreate the lexer so that it doesn't consume trailing whitespace.

This is not too hard, and from here it is quite straightforward to build parsers for constants, functions, etc.

We run into another snag when we try to parse applications, however. Consider the production rules for a fragment of our grammar:

<expr>  ::= <variable> | <expr> <expr> | (<expr>)

It's left-recursive! So the naive parser will sometimes loop. Thankfully, we can fix this by redefining the grammar.

<expr>  ::= <expr'> | <expr>' <expr>   
<expr'> ::= <variable> | (<expr>)

The obvious parser now works. We extend this to the full grammar.

Partial evaluation

Now that we can turn strings into expressions we need to decide what to do with them. Constants (variables, strings, integers, etc) can be turned into into lambda-terms straightforwardly (more on this in the section on Church encodings). But what about an expression like + 1 2?

We could translate each subexpression (+, 1, and 2) into a lambda term, and return the application. Or, given that we know what + means, we can notice that this is equal to the expression 3.

So this will be our strategy: translate an expression into an intermediate form by reducing it "as much as we can". Then we translate that into a lambda-term.

We need to do a few things all at once now, with the end goal of defining a function eval :: Expr -> Env -> Value for some suitable types Env and Value.

Consider the expression let val x = 1 in x. This should certainly evaluate to 1. To allow for this we'll need some type of environment, that is a mapping from names to values.

A list of pairs will do. This will have the added benefit of acting like a stack, so that newer definitions override older ones.

We add some helper functions for manipulating and creating environments.

Let's think about the definition of Value now. Constants should pass through eval, so we'll need a value constructor for each of those. Variables should be looked up in the environment. If there's a match we can return that. But what if a variable isn't defined in the environment?

We have a choice. Either throw an error, or just return the variable itself. For simplicity we'll do the latter. This gives us a good start on defining Value and eval.

We can handle If expressions by hoping that the condition evaluates to a boolean and then choosing the appropriate branch (we leave the case where the condition is not a boolean to later):

Match expressions are a little more complicated but it's a similar idea. Again, we hope that the thing to match on evaluates to a term. Then match in Haskell to pick a branch. The semantices of match are that if t is the term Var y then the expression:

match t as
  (Var x) (e1)
  (App s t) (e2)
  (Abs y p) (e3)

evaluates to e1 but with the variable x set to the value of y, and something similar for the other cases.

What about functions? We neatly side-step the issue by wrapping everything up into a closure.

We also put off applications and let expressions till later:

where the functions apply :: Value -> [Value] -> Value and elab :: Defn -> Env -> Env are going to do exactly what you'd think.

We will handle the primitive operations by adding a new constructor to Value ~~~ {.haskell} data Value = ... | Primitive Prim ~~~

Recall that the epxression + 1 2 parses to Apply (VarExp "+") [NumExp 1, NumExp 2]. That is, rather than parsing "+" directly in the primitive Plus we just parse it as a variable. This is similar to how Haskell does it, and it means we can redefine primitives (let val + = - in + 1 1 will evaluate to 0).

We do this by defining a default environment that contains the primitives.

Let's get back to apply and elab. To apply a Closure xs e env to the evaluated arguments vs we should evalute e in the environment where the paramaters xs are bound to the arguments vs. We'll have a helper function to apply primitives. Applying anything else will result in a syntactic Application value.

We use applyPrim to try and simplify things if we can, and just do a formal application if we can't.

Now we define elab, a function which adds a definition to an environment. We have two types of definitions, recursive and value (non-recursive). Value definitions are straightforward. Because the value on the right doesn't reference the name on the left, we can just evaluate it. We won't fully deal with recursive definitions yet, we'll just wrap them up in a new case of Value.

And we're mostly done here, just a few loose strings to be wrapped up later.

Church encodings (and decodings)

Let's take a step back. The goal is to turn programs into lambda terms. Such a transformation is called a Church encoding.

We'll start by encoding some base values, and using those to build up to encodings for entire programs. The first thing to do is to encode the constant values and primitive functions.

Lets define a typeclass for things that have Church encodings.

Church encodings of the boolean values True = \ab.a and False = \ab.b and the boolean operators and, not, and or are well known. As are encodings for the natural numbers and addition, subtraction, multiplication, comparisons, and test for zero. Division and modulo is a little more involved, but not unbearable.

Another standard construction is the ordered pair, with terms pair = \xyz.zxy, fst = \p.p(\ab.a) and snd = \p.p(\ab.b) such that fst (pair x y) = x and snd (pair x y) =yq.

From here we can define integers as pairs of natural numbers, where the pair (n,m) represents the integer n-m. It is not hard to extend arithmetic and comparisons from naturals to integers. The only complication is again division and modulo, in which case we have to normalize an integer (put it in the form (n,0) or (0,n)) and then extract the absolute value and sign.

How could we encode a list? Recall the Haskell definition of a list, List a = Nil | Cons a (List a). When we write a function that takes a list as argument, we usually pattern match on the structure of the list, like so:

So we can think of a function on a list as consisting of two functions. The first is a constant, and it's returned in the case that list is Nil. The second takes two arguments (the head and tail) and is called with the arguments x and xs if the list is Cons x xs.

Hence we may encode a list l as the term [l], where [Nil] = \ab.a and [Cons x y] = \ab.bxy. Then a function with a list argument may be written like:

and we can translate it to f = \l.l(e1)(e2).

So we have the terms cons = \xyab.bxy and nil = ab.a. Then if we can encode values of type a we can also encode values of type [a]:

In fact this generalizes to arbitrary abstract data types.

So we can encode lambda terms too.

This has the consequence that if t is an encoded term, we can represent a match expression:

match t as
  (Var x) (e1)
  (App s t) (e2)
  (Abs y p) (e3)

as the application t(e1)(e2)(e3), which reduces to (e1)x if t encodes the term Var x, and so on. We have something similar for if expressions: if c is an encoded boolean then the expression if c then a else b can be encoded as cab. So we can finish our definiton of eval:

We can encode characters directly as natural numbers, using Haskell's Enum typeclass. This, combined with the previous, gives us an encoding of strings.

We can also go back the other way, from encoded terms to Haskell values. We must be careful about alpha conversion though. In all cases we assume that the term to be decoded is in normal form.

A natural number looks like \fx.f(f(...f(fx)...)). So we can decode a term in that form by just counting how many fs there are. The unfoldr function lets us do this in a nice way.

And from here it is straightforward to decode every other type we have introduced.

Lambda terms with holes

To turn Values into Terms, we'll find it helpful to introduce another intermediate form. It is quite common to write lambda-terms like w = \x.x, W = ww. Now in the definition of W, w is not a variable, but a reference to the previously defined term. We will want to do something similar, represent things with a term-like structure but possibly containing other values as well, so that the value Application (VarVal "+") [VarVal "x", NumVal "3"] will turn into something like +x3, and then we can translate the + and 3 into lambda-terms.

We can hence consider a type of "almost terms", things with a term-like structure but possibly containing other values.

What types of things will we want to put in? For now, the constant values and primitive functions.

It is clear how to make Const an instance of Church. And in fact, we can do something more. If we have an instance of Church a, then we can define an instance of Church (Partial a) in the obvious way!

We are very close to being done, we just need to actually turn Values into Partial Consts. The constant values translate in the obvious way.

Applications are straightforward too. For the closure Closure [x1...xn] e env, which we recall represents a function taking the arguments x1, ... xn to the value of e in the environment env, we partial-ize the body and bind the arguments.

We can think of a recursive definition rec x = e as wanting x to satisfy the equation x = e, where x may be a free variable in e. The standard trick here is to take x as the term Y (\x.e), where Y is Church's fixed point combinator. So we define such a term y and add Y as a Const.

So now we have a full chain, parse :: String -> Expr, eval :: Expr -> Env -> Value, partial :: Value -> Partial Const, and church :: Partial Const -> Term!

A simple compiler could be defined as:

It is not too hard to add some optional flags for allowing reading from input files, writing to output files, etc. But how might we write a REPL?

Reduction

Before we do that however, let's take a small diversion. The compiler above will, given the input ~~~ let rec f = func (x) (x) in f true ~~~ produce the term ~~~ ((\f.((\x.(f (x x))) (\x.(f (x x))))) (\f x.x)) \a b.a ~~~

This is correct, but we'd like it to produce the fully reduced version instead, simply 'a b.a. We have [already seen]({{<relref "bnfterm.markdown">}}) how to find beta-normal forms of terms.

Let's try to implement head reduction. We can head-reduce a term by stripping off the abstractions, and then unpeeling the applications until we get to the head. If it's an abstraction then we can beta-reduce, otherwise we can't do anything.

Monad of environments

Consider a simple GHCi-like REPL. There is a persistent environment that can be modified by entering definitions. Entering an expression will evaluate it in the current environment. You can load definitions from files, and refresh them.

We can model this behaviour by defining a monad of environments. Or rather, a monad transformer.

First though, we should refine our definition of environments. We will want to seperate the environment into several base environments that have been loaded from files, and the current user environment. A reload command should wipe the user-added definitions, and reload the files.

A module will have a name, a bunch of definitions that get turned into an environment, and a way of reloading itself. An Environments object will contain a bunch of modules and a current environment.

If m is a monad then EnvT v m wraps m in an environment with values of type v. This is a specialized version of the StateT transformer. We now add some instances.

Now we define some helper functions. Some things we'll want to with an Environments a are: * Define a default one * Add a new module * Reload all modules * Lookup a name * Add a definition to the current env * Return a list of module names * Replace the current env with a different one So let's get to it!

The default will be empty.

Adding a new module is not hard.

To reload the loaded modules we try to call each of their reload methods.

For simplicity, we do not enforce unique definitions. So to look up a name we just iterate through the loaded modules in order.

We have to unwrap a few layers to add definitions to the current environment, but it's all rather standard.

Finally we have these two functions that return a list of the loaded modules, and replace the current environment with a provided one.

We will also redefine the type synonym Env to:

So that our functions eval and elab remain monad-free, with the same type signatures.

Wrapping up

Now let's put this all together into a simple REPL. The structure is to: * Read instructions from the console * Parse them into commands * Execute them * Repeat

Suppose we had a function rep :: EnvT Action IO () that, in an environment, does the first three steps above. We could turn this into a straight IO () by calling runEnvT rep prim. But we want to keep looping, and importantly, keep passing the environment around. So we'll have to make the loop inside the EnvT, like so:

Now we'll try to define rep appropriately.

We'll start by thinking about what kind of commands we should accept. We'll want to be able to add definitions to the environment, evaluate expresions and optionally reduce/decode them, load and reload modules, and quit. So we can define a type for commands as:

It is not hard to define a parser for these, so let's define a function parseCommand :: String -> Command to parse them.

How do we execute a command?

Adding a definition to the environment is pretty simple now. To evaluate an expression, and then process it afterwards is straightforward too:

Quitting and reloading are simple as well.

To load a module from a file path we will find it helpful to define a function loadFile :: MonadIO , => String -> EnvT Action m (), that, in an environment, adds a new module from the given file path. From there it is straightforward to define the appropriate command:

Now we'd like to display a prompt on each line. We have to make sure to flush the buffer so that the prompt actually appears first, but it's not too difficult.

Remember how we defined a function listM :: Monad m => Envt a m [String] that returned a list of module names from the current environment? We can now use that to define a GHCi-like prompt, that prints the names of the currently loaded modules.

Now we can simply define rep as:

And we are done!

References

The interpreter was inspired by Mike Spivey's Principles of Programming Languages lecture notes.

Here are some resources I found useful:

  • http://matt.might.net/articles/compiling-up-to-lambda-calculus/
  • http://okmij.org/ftp/Computation/lambda-calc.html#neg
  • http://jwodder.freeshell.org/lambda.html
  • https://blog.jez.io/variables-and-binding/
  • https://en.wikibooks.org/wiki/WriteYourselfaSchemein48Hours