Tuesday, January 18, 2011

Sequencing Algorithm, Least Recent

One of the problems mentioned in a previous post, remains unresolved. In my last post, I describe a better approach to find the most recent events in an event stream while matching a pattern. This post describes an approach to find the earliest instances of each of the elements of a pattern in a stream of events, or the least recent events.

The algorithm to find the most recent events performs essentially the same steps as the algorithm that finds the least recent ones. The key difference between the two is that the latter reads the event stream in ascending order by timestamp, and processes the elements of the pattern in order.

Like this:
String[] sequence = {"A", "B", "C"};
List<Event> result = new ArrayList<Event>();
for (String type: sequence) {
   for (Event e: events) {
      if (e.getType().equals(type)) {
         if (result.isEmpty()) {
            result.add(e);
            break;
         } else {
            Event x = result.get(result.size() - 1);
            if (e.compareTo(x) > 0) {
               result.add(e);
               break;
            }
         }
      }
   }
}
The first step processes the first element in the pattern, A, and loops over the event list until the first instance of A is matched. This event is always the first one to make it into the result list. No need to continue scanning at this point. Since the algorithm scans the event list in ascending order, all other instances of A are more recent, and should be skipped.

The next step is to find the least recent B that occurs after the A in the result list. If the least recent B has a timestamp greater than the timestamp of the A that is in the result list, then it is "in sequence"  chronologically, and may safely be added to the result:
Event x = result.get(result.size() - 1);
if (e.compareTo(x) > 0) {
   result.add(e);
   break;
}
The next element in the pattern is processed in the same manner: the least recent C that occurs after the B that is in the result list will be added. These steps are repeated until each element of the pattern is processed in order.

If the first 20 seconds of the minute long event stream we've been using as a test case looks like this:

B 10:00:00 <--
A 10:00:02
B 10:00:05
B 10:00:15
C 10:00:20
...

This version of the algorithm generates the expected output:

A 100002
B 100005
C 100020

No comments:

Post a Comment