JastAdd
This is our old website.
Broken links can occur.
A new website is under construction.

Concept overview

Index

Just add modules in JastAdd

The name JastAdd is meant to indicate that it is easy to just add various modules written in a language based on java and the ast (the Abstract Syntax Tree). You can easily extend your language implementation, both syntactically and computationally. There are three main ideas that contribute to this modularity and extensibility: object-orientation, static aspects, and declarative computations. In addition, you can make use of imperative computations (ordinary Java code), but those parts of your system can then not be extended in the same way. This will be discussed below.

Abstract syntax is a class hierarchy

The abstract syntax is modelled by an object-oriented class hierarchy in Java, giving the normal benefits of object-orientation. These classes are called AST classes since they model nodes in the abstract syntax tree. For example, if your language has assignments and while loops, you will typically have an abstract superclass Stmt with concrete subclasses Assignment and WhileLoop. General and default behavior for statements can be implemented by adding behavior to the Stmt class, and behavior special for Assignment and WhileLoop can be implemented in the subclasses.

Static aspects modularize crosscutting behavior

Our notion of static aspects is essentially the same as introduction in AspectJ or open classes in MultiJava. The aspects allow you to add features to AST classes without having to syntactically edit those classes. For example, when you implement type checking, you need to add some specific behavior to most of the AST classes. This behavior can be grouped together into an aspect. If you want to extend the system with code generation, simply add a new aspect module for that.

Declarative features allow decoupling of behavior

JastAdd supports declarative features for implementing behavior in the form of attributes and rewrites. Attributes are values attached to the AST nodes. Rewrites allow the AST to be changed. The declarative implementation of this behavior means that you don't need to specify in which order attributes are given values or when rewrites are carried out. The order is instead figured out automatically by the JastAdd evaluation engine, and follows the dependencies between the different attributes and rewrites. This order will depend on which aspect modules are included and also on the actual program that you are compiling or analyzing.

The important advantage of using these declarative features over simply programming using ordinary Java code, is that they allow a high degree of decoupling between the different pieces of behavior. For example, you don't need to specify that name resolution is computed before type checking. If the type checker needs the information computed by the name resolution module, the system will automatically compute the name resolution behavior first. More importantly, it may be the case that you actually need some of the name resolution in order to compute some of the type checking, and vice versa. The declarative features allow you to write the type checker assuming that all the names are already resolved, and the name resolution assuming that all the types are already computed. The system will figure out in which order to compute the individual names and types. (For circular dependencies, see the discussion on circular attributes below.)

Since you do not have to specify the order in which different pieces of information are computed, you can modularize chunks of behavior according to how you want to reuse the behavior, rather than according to in which order the behavior is computed.

An additional advantage of declarative implementation is that it is more concise than the imperative implementation. This is mainly because the order of computation is implicit and because optimizations are generated and need not be hand coded.

Attributes are defined by equations

JastAdd supports attributes in the sense of attribute grammars: attributes are declared in AST classes, and their values are defined by equations. As in attribute grammars, an attribute is either synthesized or inherited depending on if it is used for propagating information upwards or downwards in the AST. In addition, JastAdd has features not found in plain attribute grammars: there is support for reference-valued attributes, for parameterized attributes, for circularly defined attributes, and for nonterminal attributes. All of these are explained briefly below.

There are several possible analogies between attributes and OO concepts. One is that an attribute corresponds to a field in an AST node, and an equation corresponds to a method that computes the value of the field. Similar to OO methods, an equation in a subclass can override an equation in a superclass. The JastAdd system ensures that the "equation methods" are called in such an order so that whenever you try to access an attribute, its value will already be computed. If you traverse an AST and look at all the attributes, they will all have values according to their defining equations. I.e., the equations hold.

In JastAdd, some attributes are indeed represented as fields, but this is done under the hood. The interface to an attribute is a method with the same name. E.g., to access the value of an attribute "a", you write "a()".

An equation is different from an ordinary method in that it must correspond to a function. I.e., it must compute the same value each time it is executed, and it must not result in any visible side effects. The syntax for equations is also slightly different from the syntax for ordinary methods.

Synthesized attributes are almost like virtual methods

Synthesized attributes are used for computing information in an AST node, usually in order to propagate that information upwards in the AST. A typical example is to compute the type of an expression. The type is declared as a synthesized attribute of a general AST class for expressions. Equations then define the value of the attribute for the different kinds (subclasses) of expressions. For example, an addition node can have an equation defining the type as an Integer or Float, depending on the types of its operands.

A synthesized attribute is analogous to an ordinary virtual method (where the method is a side-effect free function): The attribute declaration corresponds to the method declaration, and the equations to the method implementations. General classes can have default equations that are overridden in subclasses, analogous to method overriding.

So why use synthesized attributes instead of ordinary virtual methods? The main reason is that the system can automatically cache the value of the synthesized attribute (in a private field), so that the method body (equation) does not have to be executed each time the attribute is accessed. For many attributes, such caching is essential in order to get reasonable speed. Another reason is that the syntax for equations is more concise than for methods, making your aspects easier to read.

Inherited attributes pass information to children

Inherited attributes are used for passing information downwards in an AST, giving child nodes information about their context. A typical example is that information about visible declarations, usually called the environment is passed down to each statement, which in turn passes down this information to the variables inside it. A variable can use this environment information to find its declaration and type.

Note that the term "inherited" attribute has different meanings in attribute grammars and object-orientation. In the attribute grammar sense, an inherited attribute is an attribute whose value is defined in the parent AST node. I.e., if an AST class C declares an inherited a, then each AST class that has a child of type C must have an equation defining the a attribute of that C child. This is checked by the JastAdd system.

The use of inherited attributes decouples an AST node from its parent: the AST node does not need to know which parent it has. All the information it needs is in the inherited attributes whose values are defined by the parent. This allows AST classes with all their behavior to be reused in many different contexts. For example, expressions may occur inside many different kinds of statements and declarations. The behavior of the expression depends only on its inherited attributes, not on any specific surrounding node.

Reference attributes link AST nodes together

In traditional plain attribute grammars, all attributes are functional values without identity. In contrast, JastAdd allows attributes to have reference values. This means that an attribute can be a reference to another AST node. This is a very powerful notion. It allows different places in the AST to be directly connected. For example, a use of a variable can be connected directly to its declaration. Attributes can be accessed directly via such references. For example, the use node can access its type directly from the declaration. Similarly, a class can be connected directly to its superclass; a method call directly to the method declaration; and so on.

Reference attributes are powerful because they allow graph structure to be superimposed on top of a tree (the AST), and information to be propagated along those edges. In contrast, plain AGs only support information propagation along the AST structure, in effect forcing any desired graph structure to be encoded in large complex attributes. It is symptomatic of plain AGs to encode symbol table information into large complex environment attributes, whereas in JastAdd, the environment can simply be a reference attribute to another AST node. All the symbol table information is already in the AST and does not need to be encoded.

Attributes can have parameters - just like methods

Another difference between plain AGs and JastAdd AGs is that JastAdd attributes can have parameters. Previously, we made the analogy between an attribute and a field. But if the attribute has parameters, the analogy of a method is closer. Parameterized attributes are powerful in combination with reference attributes. They allow information in a distant node to be accessed using a parameterized method call rather than just accessing a value. For example, to do type checking of an OO language, a class node can have a parameterized attribute subclassOf(c) which performs a type comparison with another class node c.

As mentioned earlier, attribute values can be cached for efficiency. The same goes for parameterized attributes. If a parameterized attribute is called with a particular set of arguments, the resulting value can be cached and returned directly when the attribute is accessed the next time with the same arguments (memoization).

Rewrite ASTs to improve them

The AST produced by a parser is seldom ideal for all analysis and compilation. Rewrite rules allow conditional changes to the AST to be declared in order to obtain an improved AST, better suited for other computations. For example, a Java parser cannot distinguish between static method calls and virtual method calls. Suppose the parser constructs general MethodCall nodes. These nodes can be replaced by more specific StaticMethodCall or VirtualMethodCall nodes by using a rewrite rule that accesses attributes to find out if the call is a static or virtual method call. By introducing these more specialized nodes, other computations, e.g., code generation, can be simplified and modularized.

The rewrites are carried out automatically and implicitly as soon as the AST is traversed: code traversing the AST will only see the final rewritten AST. This implicit rewriting frees the user of scheduling the order of the rewrites, and makes it possible to combine rewrites implemented in different modules.

Another important use of rewrites is for normalizing the AST, e.g., to replace shorthands used in the language with a normal form. This makes subsequent compilation simpler since the special cases with shorthands do not need to be dealt with. An example from Java is replacing literal strings with character arrays.

Rewriting also allows for making up for deficiencies in the parser or odd corners in the language. Many languages do not have a clean context-free grammar that is easy to parse. An infamous example is the language C. Instead of using ad-hoc techniques in the semantic actions of the parser, one can define a clean (but undesired) context-free grammar for parsing, build the (undesired) AST in a straight-forward manner in the parser's semantic actions, and then use rewrites to change the AST into the desired form.

It should be noted that rewrites can use and depend on attribute information, and that performing one rewrite may lead to that another rewrite becomes applicable. This allows complex rewrites to be broken down into a collection of small simple rewrites. Again, the order in which these smaller rewrites are applied does not need to be specified explicitly.

Circular attributes allow declarative iteration

Many static analyses require iterative computations: starting out with a set of initial values, and iterating until a solution is found. Examples include generation of SSA form, data-flow analysis, reachability analysis, etc. The normal way of implementing such computations is to do it imperatively, using an iterative work-list algorithm. By using circular attributes, it is possible to express these computations in a declarative manner instead of imperatively.

A circular attribute is given an initial value and an equation defining the attribute, possibly in terms of itself (usually indirectly via other circular attributes). The JastAdd system will automatically perform the iteration as needed until a solution is found, i.e., the equations for the circular attributes hold. Such iteration will terminate if the values that the attributes can take on can be organized into a lattice of finite height, and all equations are monotonic, i.e., they can only raise the value to a higher point in the lattice. Typical examples are booleans (that go from false to true) and sets (that go from the empty set to a finite set, e.g., the set of all declared variables in a program).

By using circular attributes to express iteration declaratively, there is no need for coding when the iteration should take place. This allows modules containing these computations to be freely combined with other modules that use the circular attributes. JastAdd automatically makes sure that a circular attribute is computed if its value is needed in some other equation or rewrite.

Expand the AST with nonterminal attributes

It is sometimes useful to expand an AST with additional nodes that are defined by equations, rather than constructed by the parser or by a rewrite rule. These nodes are called nonterminal attributes (NTAs) since they are both similar to nodes (nonterminals) and to attributes. An NTA is like a node in that it can itself have attributes and it can be rewritten. It is also like an attribute in that it is defined by an equation.

There are similarities between NTAs and rewrites. In both cases, you can declaratively define changes to the AST. The main difference is that you should use rewrites when you are interested in replacing some nodes with others, and NTA's when you want to keep existing nodes and introduce some additional ones. A typical use of NTA's is for adding AST subtrees corresponding to predefined types and methods. This allows predefined and user-defined entities to be treated in the same way. Another use of NTAs is in doing instantiation-like computations in the compiler. For example, for macro-expansion or computations on generic types.

Combine imperative and declarative aspects

JastAdd allows you to use both imperative and declarative aspect modules. The declarative modules result in an attributed AST where the attributes and the traversal methods form an API that can be freely used by imperative modules. For example, you can express name and type analysis declaratively, and implement the printing of error messages imperatively. It is in principle possible to also let the declarative aspects use things computed imperatively. But then you need to understand how the declarative evaluation works inside JastAdd.

Typically, you will write your imperative code as aspect modules, adding ordinary Java methods to the AST classes. You could also use the Visitor design pattern if you like, but aspects are usually simpler to write and extend.

Use any parser generator

JastAdd should work with any Java-based parser generator that supports user-defined semantic actions. In our projects we have used JavaCC, CUP, and Beaver. You need to write semantic actions in the parser specification in order to build JastAdd ASTs. If you use the JavaCC system you can use its add-on tool JJTree for building the JastAdd ASTs. Look at the provided JastAdd examples for implementation examples.

Under the hood

You don't have to understand the underlying evaluation machinery in order to be able to use JastAdd. However, you might be curious as to how it works. The main idea is that all evaluation is done on demand. I.e., an attribute is not evaluated until its value is asked for. And a rewrite of an AST node is not carried out until there is some code that accesses that node through the AST traversal API. All accesses to attributes and nodes go via Java method calls, and the implementation machinery performs the evaluation transparently. The algorithms are in our published papers, and you can also look at the generated Java code if you are interested in specific details of the evaluation. The generated code of a JastAdd project is usually in a subdirectory called AST.

The demand evaluation scheme allows traversing code to regard the AST as a fully evaluated tree: any attribute will have a value according to its definition, and there will be no remaining rewrite rules that apply to the nodes. Initially, the AST is in fact completely unevaluated, coming raw from the parser. But any access to a node or an attribute will cause a sufficient amount of evaluation to take place so that the final value of that node or attribute is returned. An important consequence of the demand evaluation is that it does not cost anything in execution time to define an additional attribute. There is only a cost if the attribute is actually accessed.

As a concrete example of demand evaluation, lets say you have implemented a type checker for a language. All the code for type checking is implemented in declarative aspects where each expression has a type() attribute and each identifier has a reference attribute decl() which refers to the appropriate declaration. The decl() attributes are defined in terms of lookup(String) attributes in the different blocks that return matching local declarations. In addition to the declarative aspects, there will be an imperative aspect that simply traverses the AST, looks at the values of the type() attributes and prints possible error messages. The main program calls the parser to create the AST and then calls the type-checking traversal code. It is this traversal code that ultimately drives the order of evaluation. Let's say the traversal has arrived at an expression "a" and to type-check it, it accesses its type() attribute. This attribute is defined in terms of the decl() attribute, so this triggers an evaluation of the decl() attribute for "a". The decl() attribute is defined in terms of an inherited lookup(String) attribute which is then evaluated, possibly accessing and evaluating other lookup(String) attributes in other parts of the AST in order to search through the different blocks for matching declarations. Later on during the type-checking traversal, perhaps another expression "a" is encountered. A similar sequence of evaluation takes place, but this time the access to lookup attributes can be short-circuited returning the same (cached) value that was accessed for the previous use of "a".

The rewrites are also carried out transparently. Let's say there are rewrite rules that rewrite the expression "a" to a VariableAccess node if the declaration of "a" is a variable. In this case, accessing the "a" node will trigger rewriting of it so that the resulting VariableAccess node will be returned. In this process, the decl() attribute of "a" will be evaluated and cached, since it is needed to determine if the declaration of "a" is indeed a variable. When the traversal code then accesses the type() attribute, the computation can be short-circuited already at the decl() attribute which is already computed.

Future work includes optimizations of global rewrites

There is ongoing research and development for improving JastAdd. In particular, we are looking at how high-level phases can be used in order to improve performance. Currently, attributes are not cached if they are evaluated during a rewrite. This is not a problem for many applications since most rewrites are very local and affect only a small subtree of the AST at a time. After the rewrite has finished, the attributes can be cached for subsequent speedups. However, if the application makes use of many global rewrites (affecting large subtrees of the AST), efficiency may be a problem. We expect high-level phases to solve this problem, allowing attributes to be cached earlier.