Showing posts with label language recognition. Show all posts
Showing posts with label language recognition. Show all posts

Tuesday, February 08, 2011

Recognizing Variables


Nearly every programming language that supports variables, uses them to name a thing in one place and refer to it someplace else. The notion of variables in computing comes from this need to refer to things repeatedly. The Content Type Rule Language (Ctrl) is no different from other programming languages in this respect.

The Ctrl focuses on a problem domain called event processing, in contrast to general purpose programming languages whose scope is broader. The Ctrl's power in its problem domain is increased by variables: a handful of variable types and some basic ways to act upon them play an important role in solving complex event processing problems.

Language constructs in programming languages usually contain implied instructions for an interpreter. Even the smallest of language constructs imply some instructions for an interpreter to carry out. When the Ctrl parser recognizes a variable declaration, for example, it builds little trees to encode these instructions. In essence, the parser's job is to break down the language into condensed trees that encode the instructions to be carried out by an interpreter or a translator.

An integer variable declaration in the Ctrl should look familiar:

int myInt = 10;

A declaration like this one contains three implied instructions:
  1. name something ( myInt ) 
  2. define a type for the thing named ( int )
  3. assign a value to it ( 10 )

Ideally, a compact tree representation of a variable declaration encodes instructions in a way that tells a language interpreter (or translator) exactly what to do. For variable types supported by the Ctrl, the sub-trees built by the parser have the following structures:

String Variable

var myString = 'str';



The VARDEF root node in this tiny sub-tree instructs the interpreter to define a variable using the two immediate child nodes. The left child bears the variable's name, myString, and the right child instructs the interpreter to assign it a LITERAL string value of 'str'.

Integer Variable

int myInt = 10;



A single root node and two-children is sufficient to tell the interpreter to create an integer and assign a specific value to it.

Time Roll Variable

var myDate = this.time - 20s;



We looked briefly at this type of expression in a previous post, and here we show that the result of evaluating a time roll expression may also be assigned to a variable to hold onto a date/time value calculated from the date/time of an incoming event.

Event Array Variable

var myEvents = lookup.events.type("A");



Since lookups resolve to an array of event, declaring a variable from the result of evaluating a lookup expression is a good fit for the Ctrl's problem space because it gives Ctrl scripts a reusable handle for event lookup results. It allows scripts to process, examine, scan, manipulate, act on, and react to an event lookup result. The tree representation of such a variable declaration includes a root VARDEF instruction, a single left child node that bears the variable's name, and the lookup criteria is encoded succinctly in a child sub-tree to the right.

Event Array Variable (time constrained)

var myEvents = (this.time - 20s <= lookup.events.type("A"));



The tree above combines the two previous trees into a single, larger one. This tree instructs the interpreter to perform two individual but related sub-tasks. First, calculate a date/time value from the TIMEROLL sub-tree, and second, lookup events using the result of the date/time value calculation. The intent of the tree is to encode these individual tasks, or instructions, combine them, and to assign the lookup result, an array of event, to a variable called myEvents.

Wednesday, January 05, 2011

For Loop Redux

Added several new elements to the ANTLR grammar for the Ctrl today. With these additions, the ANTLR-generated parser recognizes a for loop from the following basic syntax:
for (int i = 0; i < 10; i++) {
  ...
}
We need a for loop in our language to allow blocks of code to be executed repeatedly for some iteration within the context of a rule. This basic loop declares a counter to control the number of iterations and to expose the counter's value to the code inside the loop. As with for loops in other programming languages, the Ctrl loop has a loop-test declaration, i < 10, which may be any type of conditional expression that evaluates to a boolean value to stop the loop when the expression evaluates to false.

A few simple tests against the new grammar rules that recognize this basic loop are good so far.

Out of curiosity, I looked up the Wikipedia entry for for loop. The article includes a flowchart diagram of a for loop's behavior at run time:



During my syntax tests today, and to my delight, I discovered that the parser recognizes another for loop or an entirely different statement where D appears in the diagram above, that is, it recognizes a full statement nested inside the body of the loop. For example, the parser also reads statements like this without any fuss:
for (int i = 0; i < 10; i++) {
  if (color == 'blue') {
    for (int p = 0; p < 20; p++) { 
       ... // and so on
    }
  }
}
This means that if statements inside for loops and for loops inside if statements are syntactically valid constructs in Ctrl which was entirely unintentional side effect of the grammar changes.

Even though nested statements are not yet in the immediate requirements that I'm addressing with these changes, I think they provide the Ctrl with great expressive power so I'm definitely going to leave this feature in the grammar.

The other noteworthy observation about the changes in the grammar is that the abstract syntax trees built by the ANTLR-generated parser are composed from succinct little subtrees that come out of applying the new grammar rules to a statement. The example above is translated by the parser into a tree with a two-dimensional structure shown below. The tree was drawn for me courtesy of ANTLRWorks and shows how the statement breaks down syntactically:



Tomorrow's task will be to add rules to the ANTLR TreeWalker to traverse and to translate a tree like this into an instance of a RuleStatement in the Content Type Rule Language Java model, which is what the Ctrl compiles to.

Monday, January 03, 2011

For Loop

Last year I spent a considerable amount of time thinking about, designing, and building an interpreter for a small programming language called the Content Type Rule Language (Ctrl). The language itself is tiny, consisting of little more than if-then-else statements. The Ctrl turned into a special purpose programming language or Domain Specific Language that includes expressions specific to our rule engine's problem space.

In the next iteration, the Ctrl will be extended to include several new features including variable declaration, variable assignment, and a for-loop. A for-loop is needed in order to execute blocks of code repeatedly, and to support rules that are triggered by a counter. Because a counter in the Ctrl will be declared as a variable, the language needs variables too.

My goal in the next few blog posts is to explain how I intend to add these new features to the language.