The key point of GrIDS is to detect large-scale automated attacks on networked systems. The mechanism that we propose is to build activity graphs. The nodes of an activity graph correspond to hosts in a system, while edges in the graph correspond to network activity between those hosts.

GrIDS's core component is a graph-based language for analyzing network connection activity in a LAN-MAN sized system. GrIDS contains a language for specifying how network connection activity is represented in graphical form, incorporating information from host-based IDSs, network sniffers and connection analysis components.

The same language allows users to specify graphical structures which require notification of system administrators. This language is suitable for implementing network access control policies as well as identifying attack signatures.

A first, simple example

Let us take the tracking of a worm as an example of building an activity graph. The worm begins on system A, then it initiates connections to hosts B and C which causes them to become infected. When the aggregation engine receives a report of a connection between host A and B, it starts a new graph with two nodes and one edge between them (represented here as "A->B"). The engine sets an activity time on this graph. If nothing is received within the activity time, the graph will be dropped. Next, the engine receives a report of the B->C connection. Provided that this connection occurs soon enough after the A->B connection, the engine notices that it already has a graph containing B, and so it adds C and the edge B->C to the graph which previously only contained A->B.

Next the worm spreads to D, E, and so on. The aggregation engine incorporates them too. The graph soon looks like:


          A--> B--> D 
          |    | 
          |    `-> E 
          | 
          `-> C--> F
              | 
              `-> G

At this point, the aggregation engine notices the tree like structure of the graph, and it reports a suspected worm.

More refined graph building

In the simple example, we put edges into a graph purely based on time co-incidence. In a more complicated algorithm, we could also use other information to decide if edges were correlated - such as whether they used the same destination ports, whether they went to hosts of similar operating system, etc.

In addition to forming graphs from network connections, we allow the graphs to be "decorated". That is, both nodes and edges in an activity graph may have attributes which supply additional information about the connection/host, or document other significant events which occurred in the same time frame. For example, if an NSM reported that the B->D connection appeared to contain a password file, an annotation to that effect could be added to that link. Similarly, if a tripwire module on C reported a change in a critical system file in the right activity time-frame, that would become an an attribute of the C node in this graph.

In the worm example, we detected the worm based soley on the tree like structure of the graph. In a more complicated algorithm, we could also use the node attributes ("decorations") in detecting a worm. A tree structure may not, in some other algorithm, be suspicious until some node in the tree is decorated by some worm-related attribute.

A difficulty in the simple worm example is that only fast spreading worms will create large graphs (because graphs built at each stage of a slower worm will time-out before the next stage of infection.) Slower worms will create only small trees, each tree (possibly only a single connection) timing-out before another tree is created. Without the large trees which are used to detect worms, a slow worm won't be detected.

One solution is to allow graphs longer latency periods between connections. This, however, creates a serious problem: legitimate user activity clumps together into one huge graph, obscuring possible worm activity.

A better solution is to refine what kinds of network activity should be associated with a particular network abuse. If a worm is known to spread using rlogin and telnet connections, then a graph can be built that only contains rlogin and telnet connections. Since legitimate rlogin and telnet activity is less profuse than the sum of all legitimate activity, the latency time of this new graph may be increased without creating a huge (obscured) graph of legitimate activity.

Of course, for numerous types of network abuse we may search, so we need to be able to build numerous kinds of graphs. The graph engine is capable of maintaining multiple graph spaces, where each graph space contains only graphs of one type. The "type" of graph is specified by a set of rules which specify how graphs in the graph space are built (including rules specifying which kinds of connections to represent).

Graph Spaces

A different graph could be maintained for each kind of network activity we wish to detect. An example would be a graph that contained only telnet and rlogin activity (in order to catch our predescribed worm). However, multiple worms could invade a network, making the development of multiple worm graphs more desireable than a single unconnected graph containing all worms. GrIDS graph spaces contain rules specifying the type of graphs desired. When activity that is interesting to a given graph space occurs, the graph space adds this information to zero or more existing graphs, or creates a new graph from the information. Which occurs is determined by the rules. When new node or link information is received by the graph engine, the information is given to each graph space. The rules governing each graph space determine whether and how the information is incorporated into graphs in the graph space. Each space operates independently, the actions of each space having no effect on the other spaces.