Open Mind: February 2011

Thursday, February 24, 2011

Document Management

A while back, the small team that started this project for IBM began to document the software in user guides and install guides, capturing what they thought was essential in just a handful of Word documents. These documents still exist, and are neatly split into small chapters.

The software grew. Over time, these individual documents have expanded, and new features have caused the total document count to increase. To cover the basic software features, the end user documentation is now dense subject matter spanning a sizeable document library that contains hundreds of pages of text and graphics. The software is, after all, a full-blown video analytics suite, and documenting its features in detail is no small endeavour.

With each software release, the amount of content increases proportionally with new features. In fact, the last time I did a page count across the entire library, altogether it came to about half the number of pages in War and Peace.

Even a well-planned process to manage that many pages of software documentation in Word will not scale for the team that writes it or for the customers who consume it.

We have reached a point where we need to change the way we manage our end user documentation and I've been asked to look into Daisy for this purpose.

At the moment, I'm cooking up a plan to consolidate all end user documentation into a Daisy repository and to host a Daisy wiki as a front end for the subject matter experts on our team. We want to cross-reference, index, and manage translations of everything. We also want to support several publishing formats, including XHTML and PDF, and to generate document aggregates. Daisy seems to fit these objectives nicely.

Tuesday, February 08, 2011

Recognizing Variables

Nearly every programming language that supports variables uses them to name a thing in one place and refer to it someplace else. The notion of variables in computing comes from this need to refer to things repeatedly. The Content Type Rule Language (Ctrl) is no different from other programming languages in this respect.

The Ctrl focuses on a problem domain called event processing, in contrast to general purpose programming languages whose scope is broader. The Ctrl's power in its problem domain is increased by variables: a handful of variable types and some basic ways to act upon them play an important role in solving complex event processing problems.

Language constructs in programming languages usually contain implied instructions for an interpreter. Even the smallest language construct implies some instructions for an interpreter to carry out. When the Ctrl parser recognizes a variable declaration, for example, it builds little trees to encode these instructions. In essence, the parser's job is to break down the language into condensed trees that encode the instructions to be carried out by an interpreter or a translator.

An integer variable declaration in the Ctrl should look familiar:

int myInt = 10;

A declaration like this one contains three implied instructions:

name something ( myInt )
define a type for the thing named ( int )
assign a value to it ( 10 )
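As a rough illustration of those three implied instructions, here is a minimal Python sketch that pulls them out of an integer declaration with a regular expression. It is not how the Ctrl parser works (that is driven by a grammar); the function name `implied_instructions` and the pattern are assumptions made up for this example.

```python
import re

# Hypothetical pattern for a Ctrl-style integer declaration such as "int myInt = 10;".
DECL = re.compile(
    r"^\s*(?P<type>int)\s+(?P<name>[A-Za-z_]\w*)\s*=\s*(?P<value>-?\d+)\s*;\s*$"
)


def implied_instructions(source):
    """Return the three implied instructions as (instruction, operand) pairs."""
    match = DECL.match(source)
    if match is None:
        raise ValueError(f"not an integer declaration: {source!r}")
    return [
        ("name", match.group("name")),          # name something
        ("type", match.group("type")),          # define a type for the thing named
        ("assign", int(match.group("value"))),  # assign a value to it
    ]


print(implied_instructions("int myInt = 10;"))
# [('name', 'myInt'), ('type', 'int'), ('assign', 10)]
```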

Ideally, a compact tree representation of a variable declaration encodes instructions in a way that tells a language interpreter (or translator) exactly what to do. For variable types supported by the Ctrl, the sub-trees built by the parser have the following structures:

String Variable

var myString = 'str';

The VARDEF root node in this tiny sub-tree instructs the interpreter to define a variable using the two immediate child nodes. The left child bears the variable's name, myString, and the right child instructs the interpreter to assign it a LITERAL string value of 'str'.
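To make the shape of this sub-tree concrete, here is a small Python sketch that builds it and prints it in a parenthesized form. The `VARDEF` and `LITERAL` labels come from the description above; the `Node` class, the helper names, and the exact layout under `LITERAL` are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A generic tree node: a label plus an ordered list of children."""
    label: str
    children: list = field(default_factory=list)


def vardef_string(name, value):
    # VARDEF root: the left child bears the name, the right child is a
    # LITERAL node holding the string value (layout assumed).
    return Node("VARDEF", [Node(name), Node("LITERAL", [Node(repr(value))])])


def to_sexpr(node):
    """Render a tree as a parenthesized expression for easy reading."""
    if not node.children:
        return node.label
    parts = [node.label] + [to_sexpr(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


tree = vardef_string("myString", "str")
print(to_sexpr(tree))  # (VARDEF myString (LITERAL 'str'))
```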

Integer Variable

int myInt = 10;

A single root node and two children are sufficient to tell the interpreter to create an integer and assign a specific value to it.

Time Roll Variable

var myDate = this.time - 20s;

We looked briefly at this type of expression in a previous post, and here we show that the result of evaluating a time roll expression may also be assigned to a variable to hold onto a date/time value calculated from the date/time of an incoming event.
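What evaluating a time roll might amount to can be sketched in a few lines of Python. This is only an approximation of the idea: the function name `roll_time`, the unit table, and support for `+` are assumptions, since the declaration above only shows subtraction of seconds.

```python
from datetime import datetime, timedelta

# Assumed unit suffixes; only "s" (seconds) appears in the declaration above.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def roll_time(event_time, op, amount, unit):
    """Shift an event's date/time backward or forward by a fixed amount."""
    if op not in ("+", "-"):
        raise ValueError(f"unsupported operator: {op!r}")
    delta = timedelta(**{UNITS[unit]: amount})
    return event_time - delta if op == "-" else event_time + delta


# Stand-in for this.time, the date/time of the incoming event.
this_time = datetime(2011, 2, 8, 12, 0, 30)
my_date = roll_time(this_time, "-", 20, "s")
print(my_date)  # 2011-02-08 12:00:10
```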

Event Array Variable

var myEvents = lookup.events.type("A");

Since lookups resolve to an array of events, declaring a variable from the result of evaluating a lookup expression is a good fit for the Ctrl's problem space because it gives Ctrl scripts a reusable handle for event lookup results. It allows scripts to process, examine, scan, manipulate, act on, and react to an event lookup result. The tree representation of such a variable declaration includes a root VARDEF instruction and a single left child node that bears the variable's name; the lookup criteria are encoded succinctly in a child sub-tree to the right.

Event Array Variable (time constrained)

var myEvents = (this.time - 20s <= lookup.events.type("A"));

The tree for this declaration combines the two previous trees into a single, larger one. This tree instructs the interpreter to perform two individual but related sub-tasks: first, calculate a date/time value from the TIMEROLL sub-tree, and second, look up events using the result of that calculation. The intent of the tree is to encode these individual tasks, or instructions, combine them, and assign the lookup result, an array of events, to a variable called myEvents.
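The two sub-tasks can be sketched as a tiny tree-walking evaluator in Python. Only `VARDEF` and `TIMEROLL` are node names taken from this post; the `LOOKUP`, `TYPE`, and `SINCE` labels, the tuple layout, the in-memory `EVENTS` list, and the reading of `<=` as "events at or after the rolled time" are all assumptions made for this example.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory event store standing in for the real event source.
EVENTS = [
    {"type": "A", "time": datetime(2011, 2, 8, 12, 0, 5)},
    {"type": "A", "time": datetime(2011, 2, 8, 12, 0, 25)},
    {"type": "B", "time": datetime(2011, 2, 8, 12, 0, 28)},
]

# Assumed tree for: var myEvents = (this.time - 20s <= lookup.events.type("A"));
TREE = ("VARDEF", "myEvents",
        ("LOOKUP", ("TYPE", "A"), ("SINCE", ("TIMEROLL", "-", 20))))


def evaluate(node, this_time):
    """Evaluate an expression sub-tree against the incoming event's time."""
    kind = node[0]
    if kind == "TIMEROLL":
        _, op, seconds = node
        delta = timedelta(seconds=seconds)
        return this_time - delta if op == "-" else this_time + delta
    if kind == "LOOKUP":
        _, (_, event_type), (_, bound) = node
        earliest = evaluate(bound, this_time)  # sub-task 1: roll the time
        return [e for e in EVENTS              # sub-task 2: look up events
                if e["type"] == event_type and earliest <= e["time"]]
    raise ValueError(f"unknown node: {kind!r}")


def run(tree, this_time):
    """Carry out a VARDEF: bind the evaluated right child to the name."""
    _, name, expr = tree
    return {name: evaluate(expr, this_time)}


env = run(TREE, datetime(2011, 2, 8, 12, 0, 30))
print(len(env["myEvents"]))  # 1
```

With an incoming event at 12:00:30, the rolled time is 12:00:10, so only the type "A" event at 12:00:25 ends up in `myEvents`.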

Tuesday, February 01, 2011

Language-oriented programming

From Wikipedia:

Language oriented programming (LOP) is a style of computer programming in which, rather than solving problems in general-purpose programming languages, the programmer creates one or more domain-specific languages for the problem first, and solves the problem in those languages.

LOP is the programming paradigm I took on last year for the Content Type Rule Language (Ctrl). The Ctrl has several LOP characteristics:

It is formally specified, that is, it has a grammar defined in EBNF.
It is domain-oriented, focused on real-time event stream analytics.
It is high-level: provides abstractions, compiles dynamically, and is interpreted.

While the initial effort in developing a DSL is quite large (specifying a grammar and writing an interpreter for the language took a considerable amount of time and wasn't easy), we now have a language in which to solve a wide range of event analytics problems.

The immediate payoff is a real change in the way I work. Now I think of new requirements almost entirely in terms of the DSL I designed. If the language needs a new construct or feature to solve a problem in a general way, I can extend it as needed. Language enhancements are easy. The Ctrl is so compact that a large amount of work can be accomplished in just a few lines of code. This keeps the Ctrl code base small even as the complexity of the tasks accomplished by code written in the language increases. LOP is powerful stuff.
