Genoa

Genoa is a framework for generating software analysis tools. To analyze source code, many tools typically parse the input file, build an abstract syntax graph (ASG), and then perform some series of operations on the ASG. The only major difference between these tools is what those operations are. When one wishes to build a new tool using Genoa, a specification is written in the Genoa language to specify what operations should be done on the ASG; building the ASG is done automatically. This language is a functional language with constructs for iterating over the ASG, and allows the writing of fairly compact Genoa specifications. It also turns out that an algorithm is specifiable in Genoa if and only if its running time is polynomial in the size of the ASG. Genoa tools can only query ASGs; they cannot perform transformations on them.

Here is an example of a Genoa program that finds all goto statements and prints out the label to which the goto jumps to.


ROOTPROC find_gotos



PROC find_gotos

ROOT CFile;

{

 [

  /* look for gotos */

  (?Goto

   <gotoname

   (PRINT stdout "%s: goto to label \"%s\": " $location $token)>)

 ]

}

Genoa programs can have many procedures; this program only contains one, called find_gotos. The ROOTPROC declaration just says to invoke the "find_gotos" procedure when the program begins. The ROOT declaration says that the use of a global iteration construct will apply to the whole file being processed. The datatype of ASG nodes in Genoa is GNODE, and GNODE variables may also be delcared and assigned to; none happen to be used in this particular program. The square brackets specify interation of all ASG nodes in pre-order traversal, so the statement inside of the square brackets will be executed at each ASG node. This statement first asks, "is this node a Goto node?" If so, the argument to the statement will be executed, with the current node as an implicit argument. In this case, the stament is within "<>" brackets, meaning that we should make the new current ASG node the one that is reached by following the "gotoname" slot of the current node. If the "gotoname" node is reached, the PRINT statement is then executed. It uses the printf conversion specifiers from the C language, and in this case, prints the contents of two variables, $location and $token. The contents of $location is a string stating the line number of the current implicit node, and the contents of $token is a string containing the token from the original source file corresponding to the implicit current node.

If the functions and GNODE data type are not sufficient for one's purpose, Genoa allows you to write procedures in C, and call them from within a Genoa program. This may be neccessary for more complex source code analysis.

It's desirable to implement Genoa for different languages. Rather than write a new implementation of Genoa for every language, Genoa can be re-targeted to different front-ends via a tool called Genii. A specification in Genii describes the ASG produced by the front-end. For example, Gen++ is an implementation of Genoa that is based on the Cfront compiler for C++, and a Genii specification was written that Gen++ uses to query the ASG produced by Cfront. One of the advantages of re-targeting Genoa to a compiler for the target language is that the command line parameters for the tool produced by Genoa will be the same as those of the compiler. This makes it easy to incorporate a Genoa-produced tool into complex build procedures.

More information on Genoa and Gen++ can be found at seclab.cs.ucdavis.edu/~devanbu/genp.