LamE
The LAMbda Evaluater
On GitHub
Defining the syntax
The first thing to do is nail down the syntax of the language. It's important that the concrete syntax is: * Easy for the interpreter to parse * Easy for humans to read and write
Requiring lots of parantheses and keywords gives us the first, at a slight hit to the second. The syntax is more fully described here.
This naturally gives rise to the following data types of expressions and definitions.
data Expr = VarExp Name
| CharExp Char
| ListExp [Expr]
| StringExp String
| BoolExp Bool
| NumExp Int
| Match Name (Name, Expr) (Name, Name, Expr) (Name, Name, Expr)
| If Expr Expr Expr
| Func [Name] Expr
| Apply Expr [Expr]
| Let Defn Expr
deriving (Eq)
data Defn = Val Name Expr
| Rec Name Expr
deriving (Eq)
Where Match t (x,e1) (s,t,e2) (y,p,e3)
comes from match
expressions:
match t as
(Var x) (e1)
(App s t) (e2)
(Abs y p) (e3)
and If c l r
comes from if
expressions:
if c then l else r
and the meaning of the rest is clear. We may as well define the primitive functions too.
Parsing
We use Parsec to parse strings into expressions. This comes with a handy method to tokenize a string. Unfortunately, this doesn't work straight out of the box.
The builtin lexer consumes spaces after a token, but spaces are an important feature of our syntax when it comes to function application. So we must recreate the lexer so that it doesn't consume trailing whitespace.
This is not too hard, and from here it is quite straightforward to build parsers for constants, functions, etc.
We run into another snag when we try to parse applications, however. Consider the production rules for a fragment of our grammar:
<expr> ::= <variable> | <expr> <expr> | (<expr>)
It's left-recursive! So the naive parser will sometimes loop. Thankfully, we can fix this by redefining the grammar.
<expr> ::= <expr'> | <expr>' <expr>
<expr'> ::= <variable> | (<expr>)
The obvious parser now works. We extend this to the full grammar.
Partial evaluation
Now that we can turn strings into expressions we need to decide what to do with them. Constants (variables, strings, integers, etc) can be turned into into lambda-terms straightforwardly (more on this in the section on Church encodings). But what about an expression like + 1 2
?
We could translate each subexpression (+
, 1
, and 2
) into a lambda term, and return the application. Or, given that we know what +
means, we can notice that this is equal to the expression 3
.
So this will be our strategy: translate an expression into an intermediate form by reducing it "as much as we can". Then we translate that into a lambda-term.
We need to do a few things all at once now, with the end goal of defining a function eval :: Expr -> Env -> Value
for some suitable types Env
and Value
.
Consider the expression let val x = 1 in x
. This should certainly evaluate to 1
. To allow for this we'll need some type of environment, that is a mapping from names to values.
A list of pairs will do. This will have the added benefit of acting like a stack, so that newer definitions override older ones.
find :: Environment a -> Name -> Maybe a
find env x = case (filter (\(a,b) -> a==x) env) of
[] -> Nothing
((a,b):xs) -> Just b
define :: Name -> a -> Environment a -> Environment a
define x v = define' (x,v)
define' :: (Name, a) -> Environment a -> Environment a
define' = (:)
new_env :: Environment a
new_env = []
We add some helper functions for manipulating and creating environments.
Let's think about the definition of Value
now. Constants should pass through eval, so we'll need a value constructor for each of those. Variables should be looked up in the environment. If there's a match we can return that. But what if a variable isn't defined in the environment?
We have a choice. Either throw an error, or just return the variable itself. For simplicity we'll do the latter. This gives us a good start on defining Value
and eval
.
data Value = VarVal Name
| CharVal Char
| ListVal [Action]
| StringVal String
| BoolVal Bool
| NumVal Int
| TermVal Term
eval :: Expr -> Env -> Value
eval (VarExp n) env = case find env n of
Just v -> v
Nothing -> VarVal n
eval (BoolExp b) env = BoolVal b
eval (NumExp n) env = NumVal n
eval (CharExp c) env = CharVal c
eval (StringExp s) env = StringVal s
eval (ListExp xs) env = ListVal (map (flip eval $ env) xs)
We can handle If
expressions by hoping that the condition evaluates to a boolean and then choosing the appropriate branch (we leave the case where the condition is not a boolean to later):
eval (If c e1 e2) env = case (eval c env) of
BoolVal b -> if b then (eval e1 env) else (eval e2 env)
Match
expressions are a little more complicated but it's a similar idea. Again, we hope that the thing to match on evaluates to a term. Then match in Haskell to pick a branch. The semantices of match
are that if t
is the term Var y
then the expression:
match t as
(Var x) (e1)
(App s t) (e2)
(Abs y p) (e3)
evaluates to e1
but with the variable x
set to the value of y
, and something similar for the other cases.
act m@(Match x (y,e1) (s,t,e2) (z,u,e3)) env = case act x env of
(TermAct (Var (Name y'))) -> act e1 (define env y (StringAct y') )
(TermAct (App s' t')) -> act e2 (define (define env s (TermAct s')) t (TermAct t') )
(TermAct (Abs (Name z') u')) -> act e3 (define (define env u (TermAct u')) z (StringAct z'))
What about functions? We neatly side-step the issue by wrapping everything up into a closure.
We also put off applications and let expressions till later:
eval (Apply e es) env = apply (eval e env) (map (flip eval $ env) es)
eval (Let d e) env = eval e (elab d env)
where the functions apply :: Value -> [Value] -> Value
and elab :: Defn -> Env -> Env
are going to do exactly what you'd think.
We will handle the primitive operations by adding a new constructor to Value
~~~ {.haskell} data Value = ... | Primitive Prim ~~~
Recall that the epxression + 1 2
parses to Apply (VarExp "+") [NumExp 1, NumExp 2]
. That is, rather than parsing "+" directly in the primitive Plus
we just parse it as a variable. This is similar to how Haskell does it, and it means we can redefine primitives (let val + = - in + 1 1
will evaluate to 0
).
We do this by defining a default environment that contains the primitives.
Let's get back to apply
and elab
. To apply a Closure xs e env
to the evaluated arguments vs
we should evalute e
in the environment where the paramaters xs
are bound to the arguments vs
. We'll have a helper function to apply primitives. Applying anything else will result in a syntactic Application
value.
data Value = ...
| Application Value [Values]
apply :: Value -> [Values] -> Value
apply (Closure xs e env) vs = eval v (defargs xs vs env)
where defargs = foldr (define') (id) . zip
apply (Primitive p) vs = applyPrim p vs
apply v vs = Application v vs
We use applyPrim
to try and simplify things if we can, and just do a formal application if we can't.
applyPrim :: Prim -> [Value] -> Value
applyPrim (Plus) [NumVal n, NumVal m] = NumVal (n + m)
applyPrim (Plus) [NumVal n, NumVal m] = NumVal (n - m)
...
applyPrim p vs = Application (Primitve p) vs
Now we define elab
, a function which adds a definition to an environment. We have two types of definitions, recursive and value (non-recursive). Value definitions are straightforward. Because the value on the right doesn't reference the name on the left, we can just evaluate it. We won't fully deal with recursive definitions yet, we'll just wrap them up in a new case of Value
.
data Value = ...
| DefRec Name Expr Env
elab :: Defn -> Env -> Env
elab (Val x e) env = define x (eval e env) env
elab (Rec x e) env = DefRec x e env
And we're mostly done here, just a few loose strings to be wrapped up later.
Church encodings (and decodings)
Let's take a step back. The goal is to turn programs into lambda terms. Such a transformation is called a Church encoding.
We'll start by encoding some base values, and using those to build up to encodings for entire programs. The first thing to do is to encode the constant values and primitive functions.
Lets define a typeclass for things that have Church encodings.
Church encodings of the boolean values True = \ab.a
and False = \ab.b
and the boolean operators and
, not
, and or
are well known. As are encodings for the natural numbers and addition, subtraction, multiplication, comparisons, and test for zero. Division and modulo is a little more involved, but not unbearable.
Another standard construction is the ordered pair, with terms pair = \xyz.zxy
, fst = \p.p(\ab.a)
and snd = \p.p(\ab.b)
such that fst (pair x y) = x
and snd (pair x y) =yq
.
From here we can define integers as pairs of natural numbers, where the pair (n,m)
represents the integer n-m
. It is not hard to extend arithmetic and comparisons from naturals to integers. The only complication is again division and modulo, in which case we have to normalize an integer (put it in the form (n,0)
or (0,n)
) and then extract the absolute value and sign.
How could we encode a list? Recall the Haskell definition of a list, List a = Nil | Cons a (List a)
. When we write a function that takes a list as argument, we usually pattern match on the structure of the list, like so:
map :: (a-> b) -> List a -> List b
map f l = case l of
Nil -> Nil
(Cons x xs) -> Cons (f x) (map f xs)
So we can think of a function on a list as consisting of two functions. The first is a constant, and it's returned in the case that list is Nil
. The second takes two arguments (the head and tail) and is called with the arguments x
and xs
if the list is Cons x xs
.
Hence we may encode a list l
as the term [l]
, where [Nil] = \ab.a
and [Cons x y] = \ab.bxy
. Then a function with a list argument may be written like:
and we can translate it to f = \l.l(e1)(e2)
.
So we have the terms cons = \xyab.bxy
and nil = ab.a
. Then if we can encode values of type a
we can also encode values of type [a]
:
In fact this generalizes to arbitrary abstract data types.
So we can encode lambda terms too.
instance Church Term where
church (Var (Name n)) = abss "abc" $ App (v"a") (church n)
church (App s t) = abss "abc" $ app2 (v"b") (church s) (church t)
church (Abs (Name n) s) = abss "abc" $ app2 (v "c") (church n) (church s)
This has the consequence that if t
is an encoded term, we can represent a match expression:
match t as
(Var x) (e1)
(App s t) (e2)
(Abs y p) (e3)
as the application t(e1)(e2)(e3)
, which reduces to (e1)x if t
encodes the term Var x
, and so on. We have something similar for if
expressions: if c
is an encoded boolean then the expression if c then a else b
can be encoded as cab
. So we can finish our definiton of eval
:
act m@(Match x (y,e1) (s,t,e2) (z,u,e3)) env = case act x env of
...
v -> Application v [Closure [y] e1 env, Closure [s,t] e2 env, Closure [z,u] e3 env]
act (If c e1 e2) env = case (act c env) of
...
v -> Application v [act e1 env, act e2 env]
We can encode characters directly as natural numbers, using Haskell's Enum
typeclass. This, combined with the previous, gives us an encoding of strings.
We can also go back the other way, from encoded terms to Haskell values. We must be careful about alpha conversion though. In all cases we assume that the term to be decoded is in normal form.
A natural number looks like \fx.f(f(...f(fx)...))
. So we can decode a term in that form by just counting how many f
s there are. The unfoldr
function lets us do this in a nice way.
unApp' :: Name -> Term -> Maybe (Name, Term)
unApp' x (App (Var y) s) = if x==y then Just (x, s) else Nothing
unApp' _ _ = Nothing
unNat :: Term -> Int
unNat (Abs f (Abs x t)) = length . unfoldr (unApp' f) $ t
And from here it is straightforward to decode every other type we have introduced.
Lambda terms with holes
To turn Value
s into Term
s, we'll find it helpful to introduce another intermediate form. It is quite common to write lambda-terms like w = \x.x, W = ww
. Now in the definition of W
, w
is not a variable, but a reference to the previously defined term. We will want to do something similar, represent things with a term-like structure but possibly containing other values as well, so that the value Application (VarVal "+") [VarVal "x", NumVal "3"]
will turn into something like +x3
, and then we can translate the +
and 3
into lambda-terms.
We can hence consider a type of "almost terms", things with a term-like structure but possibly containing other values.
What types of things will we want to put in? For now, the constant values and primitive functions.
data Const = CInt In
| CBool Bool
| CChar Char
| CList [Partial Const]
| CString String
| CPrim Prim
It is clear how to make Const
an instance of Church
. And in fact, we can do something more. If we have an instance of Church a
, then we can define an instance of Church (Partial a)
in the obvious way!
instance Church a => Church (Partial a) where
church p = case p of
(PVar x) -> Var x
(PApp s t) -> App (church s) (church t)
(PAbs x s) -> Abs x (church s)
(Hole x) -> church x
We are very close to being done, we just need to actually turn Value
s into Partial Const
s. The constant values translate in the obvious way.
partial :: Value -> Partial Const
partial (VarVal x) = PVar x
partial (NumVal n) = Hole (CInt n)
partial (BoolVal a) = Hole (CBool a)
partial (CharVal c) = Hole (CChar c)
partial (StringVal s) = Hole (CString s)
partial (ListVal xs) = Hole (CList (map partial xs))
partial (Primitive p) = Hole (CPrim p)
Applications are straightforward too. For the closure Closure [x1...xn] e env
, which we recall represents a function taking the arguments x1, ... xn
to the value of e
in the environment env
, we partial-ize the body and bind the arguments.
partial (Application f es) = foldl (PApp) (partial f) . map partial $ ex
partial (Closure xs e env) = foldr (PAbs) (partial $ eval e env) xs
We can think of a recursive definition rec x = e
as wanting x
to satisfy the equation x = e
, where x
may be a free variable in e
. The standard trick here is to take x as the term Y (\x.e)
, where Y is Church's fixed point combinator. So we define such a term y
and add Y
as a Const
.
So now we have a full chain, parse :: String -> Expr
, eval :: Expr -> Env -> Value
, partial :: Value -> Partial Const
, and church :: Partial Const -> Term
!
A simple compiler could be defined as:
It is not too hard to add some optional flags for allowing reading from input files, writing to output files, etc. But how might we write a REPL?
Reduction
Before we do that however, let's take a small diversion. The compiler above will, given the input ~~~ let rec f = func (x) (x) in f true ~~~ produce the term ~~~ ((\f.((\x.(f (x x))) (\x.(f (x x))))) (\f x.x)) \a b.a ~~~
This is correct, but we'd like it to produce the fully reduced version instead, simply 'a b.a
. We have [already seen]({{<relref "bnfterm.markdown">}}) how to find beta-normal forms of terms.
Let's try to implement head reduction. We can head-reduce a term by stripping off the abstractions, and then unpeeling the applications until we get to the head. If it's an abstraction then we can beta-reduce, otherwise we can't do anything.
unabs :: Term -> [Name] -> Maybe (Term, [Name])
unabs t@(App p q) ns = Just (t, ns)
unabs (Abs n s) ns = unabs s (n:ns)
unabs _ ns = Nothing
unapp :: Term -> [Term] -> Maybe (Name, Term, Term, [Term])
unapp (App (Abs x t) u) ts = Just (x, t, u, ts)
unapp (App p q) ts = unapp p (q:ts)
unapp _ ts = Nothing
hred :: Term -> Maybe Term
hred p = do
(app, is) <- unabs p []
(x,t,u,ts) <- unapp app []
let h = sub t u x
h' = foldl App h ts
h'' = foldr Abs h' is
return h''
hnf :: Term -> Term
hnf t = case hred t of
Just t' -> hnf t'
Nothing -> t
Monad of environments
Consider a simple GHCi-like REPL. There is a persistent environment that can be modified by entering definitions. Entering an expression will evaluate it in the current environment. You can load definitions from files, and refresh them.
We can model this behaviour by defining a monad of environments. Or rather, a monad transformer.
First though, we should refine our definition of environments. We will want to seperate the environment into several base environments that have been loaded from files, and the current user environment. A reload command should wipe the user-added definitions, and reload the files.
data Module a = Module
{ module_name :: String
, module_env :: Environment a
, module_reload :: IO (Maybe (Environment a))
}
data Environments a = Environments
{ current_env :: Environment a
, loaded_modules :: Map.Map String (Module a)
}
deriving Eq
A module will have a name, a bunch of definitions that get turned into an environment, and a way of reloading itself. An Environments
object will contain a bunch of modules and a current environment.
If m
is a monad then EnvT v m
wraps m
in an environment with values of type v
. This is a specialized version of the StateT
transformer. We now add some instances.
instance Functor m => Functor (EnvT v m) where
fmap f xm = EnvT ( \e -> fmap (\(a,e) -> (f a, e)) $ runEnvT xm e )
instance Monad m => Applicative (EnvT v m) where
pure = return
(<*>) = ap
instance Monad m => Monad (EnvT v m) where
return x = EnvT (\e -> return (x,e))
xm >>= f = EnvT (\e -> runEnvT xm e >>= (\(a,e') -> runEnvT (f a) e' ) )
instance MonadTrans (EnvT v) where
lift xm = EnvT (\e -> xm >>= (\a -> return (a,e)))
instance MonadIO m => MonadIO (EnvT v m) where
liftIO = lift . liftIO
Now we define some helper functions. Some things we'll want to with an Environments a
are: * Define a default one * Add a new module * Reload all modules * Lookup a name * Add a definition to the current env * Return a list of module names * Replace the current env with a different one So let's get to it!
The default will be empty.
new_env_ :: Environment a
new_env_ = []
new_env :: Environments a
new_env = Environments
{ current_env = new_env_
, loaded_modules = Map.empty
}
Adding a new module is not hard.
addEnv :: Environments a -> Module a -> Environments a
addEnv envs m= envs{loaded_modules = Map.insert (module_name m) m (loaded_modules envs)}
addM :: Monad m => Module a -> EnvT a m ()
addM m = EnvT (\e -> return ((), addEnv e m))
To reload the loaded modules we try to call each of their reload methods.
reload :: Module a -> IO (Maybe (Module a))
reload m = module_reload m >>= \case
Just env -> return . Just $ m{module_env=env}
Nothing -> return Nothing
reloadM :: MonadIO m => Environments a -> m (Environments a)
reloadM envs = do
ms <- liftIO . sequence $ Map.map reload (loaded_modules envs)
return $ new_env {loaded_modules = Map.foldr f (Map.empty) ms, current_env = new_env_}
where
f x ys = case x of
Nothing -> ys
Just y -> Map.insert (module_name y) y ys
For simplicity, we do not enforce unique definitions. So to look up a name we just iterate through the loaded modules in order.
find1 :: Environment a -> Name -> Maybe a
find1 env x = case (filter (\(a,b) -> a==x) env) of
[] -> Nothing
((a,b):xs) -> Just b
find :: Environments a -> Name -> Maybe a
find envs x = let
y = find1 (current_env envs) x
ys = map (\m -> find1 (module_env m) x) . Map.elems . loaded_modules $ envs in
foldr (<|>) (Nothing) (y:ys)
findM :: Monad m => Name -> EnvT v m (Maybe v)
findM n = EnvT (\e -> return (find e n, e))
define1 :: Environment a -> Name -> a -> Environment a
define1 env x v = (x,v):env
define :: Environments a -> Name -> a -> Environments a
define envs x v = envs{current_env = define1 (current_env envs) x v}
defineM :: Monad m => Name -> v -> EnvT v m ()
defineM x v = EnvT (\e -> return((), define e x v ))
defs1 :: Environment a -> [(Name, a)] -> Environment a
defs1 = flip (++)
defs :: Environments a -> [(Name, a)] -> Environments a
defs envs ds = envs{current_env = defs1 (current_env envs) ds}
defsM :: Monad m => [(Name, v)] -> EnvT v m ()
defsM xs = EnvT (\e -> return ((), defs e xs))
We have to unwrap a few layers to add definitions to the current environment, but it's all rather standard.
listM :: Monad m => EnvT v m [String]
listM = EnvT (\e ->return (Map.keys . loaded_modules $ e ,e) )
inM :: Monad m => Environments v -> EnvT v m ()
inM env = EnvT (\e -> return ((), env))
Finally we have these two functions that return a list of the loaded modules, and replace the current environment with a provided one.
We will also redefine the type synonym Env
to:
So that our functions eval
and elab
remain monad-free, with the same type signatures.
Wrapping up
Now let's put this all together into a simple REPL. The structure is to: * Read instructions from the console * Parse them into commands * Execute them * Repeat
Suppose we had a function rep :: EnvT Action IO ()
that, in an environment, does the first three steps above. We could turn this into a straight IO ()
by calling runEnvT rep prim
. But we want to keep looping, and importantly, keep passing the environment around. So we'll have to make the loop inside the EnvT
, like so:
Now we'll try to define rep
appropriately.
We'll start by thinking about what kind of commands we should accept. We'll want to be able to add definitions to the environment, evaluate expresions and optionally reduce/decode them, load and reload modules, and quit. So we can define a type for commands as:
data Process = Bnf | Hnf | ToInt | ToBool | ToChar | ToString | None
data Command = Define Defn
| Evaluate Expr Process
| Reload
| Load String
| Quit
It is not hard to define a parser for these, so let's define a function parseCommand :: String -> Command
to parse them.
How do we execute a command?
Adding a definition to the environment is pretty simple now. To evaluate an expression, and then process it afterwards is straightforward too:
process :: Process -> Term -> String
process = \case
Bnf -> show . bnf
Hnf -> show . hnf
ToInt -> show . (unchurch :: Term -> Int)
ToBool -> show . (unchurch :: Term -> Bool)
ToChar -> show . (unchurch :: Term -> Char)
ToString -> show . (unchurch :: Term -> String)
None -> show
execute :: Command -> EnvT Action IO ()
execute = \case
...
Evaluate e p -> eval e >>= (lift . putStrLn . (process p) church . partial)
Quitting and reloading are simple as well.
execute :: Command -> EnvT Action IO ()
execute = \case
...
Quit -> lift $ putStrLn "Quiting LamER" >> exitSuccess
Reload -> envM >>= lift . reloadM >>= inM
To load a module from a file path we will find it helpful to define a function loadFile :: MonadIO , => String -> EnvT Action m ()
, that, in an environment, adds a new module from the given file path. From there it is straightforward to define the appropriate command:
Now we'd like to display a prompt on each line. We have to make sure to flush the buffer so that the prompt actually appears first, but it's not too difficult.
Remember how we defined a function listM :: Monad m => Envt a m [String]
that returned a list of module names from the current environment? We can now use that to define a GHCi-like prompt, that prints the names of the currently loaded modules.
Now we can simply define rep
as:
And we are done!
References
The interpreter was inspired by Mike Spivey's Principles of Programming Languages lecture notes.
Here are some resources I found useful:
- http://matt.might.net/articles/compiling-up-to-lambda-calculus/
- http://okmij.org/ftp/Computation/lambda-calc.html#neg
- http://jwodder.freeshell.org/lambda.html
- https://blog.jez.io/variables-and-binding/
- https://en.wikibooks.org/wiki/WriteYourselfaSchemein48Hours