Tuesday, July 25, 2023

The 'eval' pattern for interpreting simple languages

Every so often, we as programmers create simple domain specific languages that need to perform some form of computation (ie, they aren't simply a readable serialization format for a static configuration). In a sufficiently simple DSL you can simply interpret things on the fly as you handle each token (usually a logical line at a time). For example, you might have a simple DSL to specify what machines to operate on which allows a syntax like 'class1 class2 -class3' or 'class1 & class2' (to mean machines that are only in class1 and class2). However, sooner or later immediate token at a time evaluation runs into irritating limitations on how sophisticated a language it can feasibly support. Generally the easiest answer is to parse the DSL into some form of syntax tree and then work from it.

A simple and basically universal approach for getting an answer from such a tree is what I call the 'Eval' pattern. In this pattern, you give each node in the tree an 'Eval()' method (often it takes some sort of context object as its argument), and then to get the answer you call Eval() on the top level node. Terminal nodes return their value in their Eval() (these may be strings and numbers or more complex things); higher level nodes call Eval() on their children (as necessary) and know how to combine the answers into their own answer, adding and subtracting and anding and oring and so on as required.

(Sometimes a terminal node encapsulates a complex operation like 'look up the hostname for a request's IP address and try to match it against this pattern'. The important thing is that it doesn't need to combine any sub-nodes.)

Doing this requires or at least wants some form of polymorphic methods or functions, since you want to be able to call a node's Eval() without having to know anything about the node. However, you don't need inheritance, although inheritance may help enable some code sharing. Python's duck typing definitely helps make your life simpler, because otherwise you need to be careful about designing what all the Eval()s return.

This Eval approach is easiest if you can limit the syntax tree to expressions, mostly or entirely deferring control structures to other code. A lot of DSLs (although not all of them) can be implemented this way, with things like 'try to match these rules in order and use the first one that provides a result'. You put the rules in a big list and then 'evaluate' the list without trying to use your Eval() approach. If you want things like conditional rules, these may also be simpler to hard code outside of the expression syntax tree.

(If you have to deal with looping and so on inside your syntax tree, well, you can use an Eval() approach but it gets increasingly complex for reasons that are outside of the scope of this entry.)

I've done versions of this pattern in both Python (cf) and Go (cf). The Go version uses Go interfaces, while the Python uses objects and duck typing (I believe with a bit of inheritance sprinkled in for my coding convenience).

(This entry has been sitting in my 'to be written someday' pile since I wrote a comment about this pattern elsewhere years ago.)



from Hacker News https://ift.tt/YuPgTjC

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.