Friday, 18 April 2014

Sharing boilerplate across multiple parsers in JavaCC (6.1.0 or above)

JavaCC 6.1.0 adds a new flag that attempts to make it easier to share boilerplate classes between multiple parsers in the same codebase.

Imagine the scenario: you have a project with two or more parsers, all generated using JavaCC. You wish to deploy the application via GWT, so it is important for each parser to be as small as possible. Let's use an example.

A super trivial language grammar (as per the javacc/gwt blog), the reference parser:



Generated classes for the reference parser (10 generated .java classes):




Size of jar file (including .class and .java resources in the jar file):



38 kilobytes, source included, is actually a pretty good size for a generated parser and all the associated overhead classes (although I'm sure level 9 could do better). But how much of that applies to the grammar definition, and how much is boilerplate?

Boilerplate classes for the reference parser (marked in grey):




If we disregard our .jj grammar definition file, that leaves only 3 classes that are unique to our particular parser; the remaining classes are boilerplate.

Now let's define a second super trivial grammar (almost identical to the first), this time leveraging the new GENERATE_BOILERPLATE = false option:

A second grammar:



The two active configuration items are as follows:

1)    
   // Flags not to create boilerplate classes
   options {
       GENERATE_BOILERPLATE = false;
   }

2)
   // References boilerplate classes in another package.
   import org.consoli.javaccgwt.example1.parse.*;

Now generate code for the second grammar definition with JavaCC (6.1.0 or above), either through the JavaCC plugin for Eclipse or by running javacc manually at the command line.

Let us now review the generated classes (there are only 3 generated classes, down from 10 if boilerplate classes were included):



Now let's look at the size of the generated jar file (including .class and .java resources in the jar file):



That is roughly a 24K reduction in jar-file size. Admittedly, that isn't very much, and the figure includes the source code bundled in the jar file, so let's assume that in the real world the saving is about 12K per additional parser. It isn't much - but it is at least something, and fewer bytes for the same functionality can only be a good thing when it comes to reducing the initial-load latency of a web app.
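
To see how the saving adds up as more parsers are added, here is a rough back-of-envelope sketch in Java. It uses the 38K and 24K figures quoted above and assumes every additional parser would otherwise be about the same size as the reference parser. The class and method names (JarSizeEstimate, duplicatedKb, sharedKb) are made up for illustration and are not part of JavaCC.

```java
public class JarSizeEstimate {

    /** Total size in KB when every parser carries its own copy of the boilerplate. */
    public static int duplicatedKb(int parsers, int fullParserKb) {
        return parsers * fullParserKb;
    }

    /** Total size in KB when only the reference parser carries the boilerplate. */
    public static int sharedKb(int parsers, int fullParserKb, int savingPerExtraParserKb) {
        // The reference parser is counted in full; each extra parser shrinks by the saving.
        return fullParserKb + (parsers - 1) * (fullParserKb - savingPerExtraParserKb);
    }

    public static void main(String[] args) {
        int fullParserKb = 38; // reference jar size from the post (source included)
        int savingKb = 24;     // approximate per-parser saving from the post (source included)
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " parser(s): duplicated=" + duplicatedKb(n, fullParserKb)
                    + "K, shared=" + sharedKb(n, fullParserKb, savingKb) + "K");
        }
    }
}
```

Passing a smaller saving, such as the assumed 12K, together with a correspondingly smaller full-parser size gives an estimate for jars shipped without source.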

Key Points
  1. This guide assumes the JAVA_TEMPLATE_TYPE = "modern"; style of code generation.
  2. At least one set of the boilerplate classes is essential to the operation of a JavaCC-generated parser. It is not possible to eliminate the boilerplate classes altogether; it is only possible to eliminate duplicates.
  3. One parser must generate boilerplate classes (we call this the reference parser).
  4. Other parsers may specify the "GENERATE_BOILERPLATE = false;" option, in which case JavaCC will not generate boilerplate classes in their package. The only benefit of this is saving bytes, and of course there is no harm in duplicating boilerplate if low code size is not a design requirement.
  5. The non-reference parsers MUST either import the reference parser's package as a whole or import each boilerplate class explicitly in order to be able to function.
  6. The non-reference parsers MUST be able to access the reference parser's boilerplate classes on the classpath at runtime in order to be able to function.
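
Point 6 can be checked on a plain JVM (for example in a unit test) before deploying. The sketch below is only an illustration: BoilerplateCheck and missingClasses are hypothetical names, and the three class names in main are an assumption about which boilerplate classes the reference parser generates, so adjust them to match your own generated output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BoilerplateCheck {

    /** Returns the names from classNames that cannot be loaded from the current classpath. */
    public static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = BoilerplateCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // 'false' = locate the class without running its static initialisers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Package taken from the import in the second grammar; class names are assumed.
        String pkg = "org.consoli.javaccgwt.example1.parse.";
        List<String> expected = Arrays.asList(
                pkg + "Token", pkg + "ParseException", pkg + "SimpleCharStream");
        List<String> missing = missingClasses(expected);
        if (missing.isEmpty()) {
            System.out.println("All listed boilerplate classes are on the classpath.");
        } else {
            System.out.println("Missing boilerplate classes: " + missing);
        }
    }
}
```

An empty result means every listed class was found; anything reported as missing indicates the reference parser's jar is not visible to the non-reference parser.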

GitHub repo corresponding to this example: see the Example1 (reference parser) and Example2 projects.