Behavioral Reflection

Behavioral Reflection

The ultimate goal of behavioral reflection (aka procedural reflection and no doubt other things) is to make a language where programs within the language are able to completely redefine the language as it executes. This is arguably the pinnacle of expressive power. This also means, though, local reasoning about code is utterly impossible, essentially by design.

Smalltalk is probably the language closest to this that is widely used. The Common Lisp MOP (Meta-Object Protocol) is also inspired by research in this vein. The ability to mutate classes and handle calls to missing methods as present in, e.g. Ruby, are also examples of very limited forms of behavioral reflection. (Though, for the latter, often the term “structural reflection” is used instead, reserving “behavioral reflection” for things beyond mere structural reflection.)

Very Brief History

The seminal reference for behavioral reflection is Brian Smith’s 1982 thesis on 3-LISP. This was followed by the languages Brown, Blond, Refci, and Black in chronological order. This is not a comprehensive list, and the last time I was in to this was about a decade ago. (Sure enough, there is some new work on Black and some very new citations of Black.)

Core Idea

The core idea is simply to expose the state of the (potentially conceptual) interpreter and allow it to be manipulated.

From this perspective Scheme’s call/cc is a basic, limited example of behavioral reflection. It exposes one part of the state of the interpreter and allows it to be replaced. Delimited continuations (i.e. shift and reset) are a bit more powerful. They expose the same part of the state as call/cc, but they expose it in a more structured manner that allows more manipulations beyond merely replacing it. We could imagine representing the continuation with a less opaque object than a function which would lead to Smalltalk’s MethodContext.

Early Lisps’ fexpr was another example of exposing a different part of the interpreter state, namely the expression under evaluation. The Kernel language explores this in more depth (among other things which can likely also be classified as forms of behavior reflection.)

For a language with mutation the heap would also be exposed. For a language without mutation, mutation can be added using the techniques from Representing Monads since something at least as powerful as delimited continuations will almost certainly be available. At that point, the heap would be first-class.

As an aside, I like to use the word “introspection” for purely read-only access to interpreter state, and reserve the word “reflection” for when manipulations are possible. Terminologically, “reflection” is also often broken down into “structural reflection” for the widely available ability to add and remove methods to classes and similar such operations; and “behavioral” or “procedural” reflection, the ability to manipulate the language itself. To control introspection and, at least, structural reflection, mirrors can be used.

Example

Making a simple behaviorally reflective language is actually pretty straightforward. If you have an abstract machine for your language, all you need to do is provide one additional primitive operation which gets passed the interpreter state and returns a new interpreter state. It may be clearer and cleaner, perhaps, to split this into two operations: one, often called reify, that provides the interpreter state as a value, and another, often called reflect, that sets the interpreter state to the state represented by the given value. Note that both reify and reflect are (very) impure operations. The more tedious part is marshalling the state of the interpreter into a language level object and vice versa. In the special case of a meta-circular interpreter, this part turns out to be trivial. The following interpreter is NOT meta-circular though.

The approach taken in this example, while simple, is very unstructured. For example, it is possible to write a procedure that when evaluated transforms the language from call-by-value (the CEK machine is an implementation of the CbV lambda calculus), to call-by name. However, to do this requires walking all the code in the environment and continuation and rewriting applications to force their first argument and delay their second argument. Primitives also need to be dealt with. It would be far nicer and cleaner to simply be able to say, “when you do an application, do this instead.” The newer behaviorally reflective languages work more like this.

Note, despite the fact that we do a global transformation, this is not an example of lack of expressiveness. We can define this transformation locally and execute it at run-time without coordination with the rest of the code. In this sense, everything is macro expressible (a la Felleisen) because arbitrary global transformations are macro expressible.

The Code

You can get a copy of the full version of the code from here.

I arbitrarily decided to start from the CEK machine: an abstract machine for the untyped call-by-value lambda calculus. (CEK stands for Control-Environment-Kontinuation, these being the three pieces of interpreter state with control being the expression driving evaluation.) While a call-by-value language is probably the way to go because both reify and reflect are very impure operations (particularly reflect), the main motivation for choosing this machine was that the state components correspond in a direct way to concepts that are natural to the programmer. Compare this to the SECD machine which stands for Stack-Environment-Control-Dump.

The AST uses deBruijn indices.

data Expr 
    = EVar !Int 
    | ELam Expr 
    | Expr :@: Expr

The only types of values we have are closures. The two additional pieces of interpreter state are the environment and the continuation (the call stack).

data Value = Closure Expr Env

type Env = [Value]

data Kont = Done | Arg Expr Env Kont | Fun Value Kont

Finally the evaluator is fairly straightforward, and is a straightforward transcription of the operational semantics. Evaluation starts off with the final contination and an empty environment. With a bit of inspection you can see that evaluation is call-by-value and proceeds left-to-right. Derive Show for Expr and Value and these 15 lines of code are a complete interpreter.

evaluate :: Expr -> Value
evaluate e = cek e [] Done

cek :: Expr -> Env -> Kont -> Value
cek (EVar i)  env                         k = cek e env' k where Closure e env' = env !! i
cek (f :@: x) env                         k = cek f env (Arg x env k)
cek (ELam b)  env                      Done = Closure b env
cek (ELam b)  env            (Arg x env' k) = cek x env' (Fun (Closure b env) k)
cek (ELam b)  env (Fun (Closure b' env') k) = cek b' (Closure b env:env') k

The first stab at adding reflective features looks like this. We add the primitive operation to reify and reflect. We’ll require them to be wrapped in lambdas so the interpreter doesn’t have to deal with unevaluated arguments when interpreting them. Note that reify doesn’t pass the Expr part to its argument. This is because the Expr part would just be EReify. The arguments of this particular application are stored, unevaluated, in the continuation as arguments needing to be evaluated. So, if we want to define quote which simply returns the expression representing it’s argument, we’ll have to dig into the continuation to get that argument.

data Expr
    = ...
    | EReify
    | EReflect

-- reify f = f e k
reify = ELam EReify

-- reflect c e k
reflect = ELam (ELam (ELam EReflect))

And here’s what we’d like to write in the interpreter (and is very close to what we ultimately will write.)

cek :: Expr -> Env -> Kont -> Value
...
cek EReify   (Closure b env':env) k = cek b (k:env:env') k
cek EReflect (k:e:c:_)            _ = cek c            e k

There are two problems with this code: one minor and one major. The minor problem is that the argument to reify takes two arguments but we can’t just pass them directly to it. We need to follow our calling convention which expects to evaluate arguments one at a time. This problem is easily fixed by pushing an Arg call stack frame onto the continuation to (trivially) evaluate the continuation.

The major problem is that this doesn’t type check. c, e, env, and k can’t simultaneously be Values and Exprs, Envs, and Konts. We need a way to embed and project Expr, Env, and Kont value into and out of Value. The embedding is easy; you just fold over the data structure and build up a lambda term representing the Church encoding. The projection from Values is… non-obvious, to say the least.

Instead of figuring that out, we can simply add Expr, Env, and Kont to our language as primitive types. This is also, almost certainly, dramatically more efficient.

We extend our AST and Value.

data Expr
    = ...
    | EInject Value

-- Int so we can manipulate the EVar case.
data Value = Closure Expr Env | Int !Int | Expr Expr | Env Env | Kont Kont

The changes to the interpreter are minimal. We need to change the EVar case to handle the new kinds of values that can be returned and add a trivial EInject case. Some cases can be omitted because they would only come up in programs that would “get stuck” anyway. (In our case, “getting stuck” means pattern match failure or index out of bounds.)

inject :: Value -> (Expr, Env)
inject (Closure b env) = (ELam b, env)
inject v = (EInject v, [])
 
cek :: Expr -> Env -> Kont -> Value
cek (EVar i) env k = cek e env' k where (e, env') = inject (env !! i)
...
cek EReify (Closure b env':env) k = cek b (Env env:env') (Arg (EInject (Kont k)) [] k)
cek EReflect (Kont k:Env e:Expr c:_) _ = cek c e k
cek (EInject v) _ Done = v
cek (EInject v) _ (Fun (Closure b' env') k) = cek b' (v:env') k

While this interpreter achieves the goal, it is somewhat limited. We don’t have any means to manipulate the values of these new primitive types, so our manipulation of the interpreter state is limited to replacing a component, e.g. the environment, with some version of it that we got before via reify. Though it may be limited, it is not trivial. You can implement something close to call/cc if not call/cc itself.

Still, the scenario above of turning the language into a call-by-name language doesn’t seem possible. Modifying the interpreter to support primitive operations defined in Haskell is a simple matter of programming: you add a constructor for primitive operations to the AST, you make a very slight variant of the EInject case in cek, and then you tediously make primitives corresponding to each constructor for each type and a fold for each type. See the linked source file for the details.

The file additionally defines a pretty printer and a layer using parametric higher-order abstract syntax because humans are terrible with deBruijn indices. The end result is code that looks like this:

one = _Suc :@ _Zero

identity = lam (\x -> x)

loop = lam (\x -> x :@ x) :@ lam (\x -> x :@ x)

tailEnv = lam (\e -> paraEnv :@ e :@ _Nil 
                                  :@ lam (\_ -> lam (\x -> lam (\_ -> x))))

tailKont = lam (\k -> paraKont :@ k :@ _Done
                                    :@ lam (\_ -> lam (\_ -> lam (\x -> lam (\_ -> x))))
                                    :@ lam (\_ -> lam (\x -> lam (\_ -> x))))

eval = lam (\c -> reify :@ lam (\e -> lam (\k -> reflect :@ c :@ (tailEnv :@ e) :@ k)))

quote = reify :@ lam (\e -> lam (\k -> reflect :@ (_Inject :@ c k) :@ e :@ (tailKont :@ k)))
    where c k = paraKont :@ k :@ garbage 
                              :@ lam (\x -> (lam (\_ -> lam (\_ -> lam (\_ -> x))))) 
                              :@ lam (\_ -> lam (\_ -> lam (\_ -> garbage)))
          garbage = _Inject :@ _Zero

callCC = lam (\f -> reify :@ lam (\e -> lam (\k -> 
            f :@ lam (\a -> reflect :@ (_Inject :@ a) :@ (tailEnv :@ e) :@ k))))

example1 = evaluate $ quote :@ loop -- output looks like: Expr ((\a -> a a) (\a -> a a))
example2 = evaluate $ eval :@ (quote :@ loop) -- loops forever
example3 = evaluate $ callCC :@ lam (\k -> k :@ one :@ loop) -- escape before evaluating the loop
example4 = evaluate $ callCC :@ lam (\k -> lam (\_ -> loop) :@ (k :@ one)) -- also escape before the loop