Automatic Code Generation Techniques
Creating programs that write, or help write, other programs.
Copyright (C) 1995, J Consultants
Call me, I can help. 408-705-2284
Wherever large sections of code have the potential to be
similar in some aspects, it often pays to look at ways of generating this
code via automation techniques.
This point of view is not always appreciated before the fact, as most
American programmers tend to believe in producing actual code rather than
analyzing specifications and descriptions for similarities, and in working
at coding, rather than thinking about structure. This is particularly
true when deadlines are unlikely to be met by the very conventional
techniques they favor.
Does it make more sense to pursue techniques which cannot succeed?
Or to seek out more favorable alternatives? I think a small investment
in the latter is often prudent.
The code generators are by necessity "one-off" in nature, and are
quite often discarded once they have produced the final code. Often, it is
believed that the newly generated code can best be maintained via more
conventional techniques, and larger extensions are not contemplated. Other
times, this is due to the belief that the success was some special case, a
fluke, and that "if this worked more often, we'd all be using it."
Nevertheless, during the implementation phase, the ability to
regenerate some large portion of code in a matter of hours or days in
response to each specification change often becomes a motivating factor
for organizing the specification to a new level of clarity.
While this is very often worthwhile in itself, it often brings
the specification into a form facilitating the use of tables. These tables
can often be generated from the rule base core of the code generator by
changing a portion of the code generator.
The process usually involves several phases:
- Familiarization with the nature of the desired product.
- Developing one or more quick prototypes to
- explore possibilities
- prove the concept
- convince doubters
- Expanding the prototype to encompass more of the project
- Keeping up with specification changes now that changes are so cheap!
Although the prototype may take only a day or two, the overall process tends
to take longer. It is not that the prototype or the final generator takes very
long to make; but when the people creating the specification see that changes
can often be implemented in a day or two, they begin to feel it is reasonable
to make more changes to the specification in response to other needs they have
been holding back on. Also, the ability to produce code often makes it apparent
much earlier in a project that what was initially desired may not adequately
reflect the needs of the end users, or the needs or capabilities of other
groups involved in the project.
The kinds of languages usually chosen are output oriented. Word
processor mail merge packages or text runoff packages are more useful
where there is more variation in the theme, with more code being
generated. In contrast, while database packages can handle more variables,
they generally require more work for each code fragment. The two can often
be combined to some extent.
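The mail merge idea can be sketched in a few lines of a modern scripting
language. This is only an illustration, not the tooling used in the projects
described here: Python's string.Template stands in for the merge package, and
the C accessor "form letter" and the record and field names are hypothetical.

```python
from string import Template

# Hypothetical "form letter": one C accessor function with blanks to fill in.
ACCESSOR = Template(
    "int get_${field}(const struct ${record} *r)\n"
    "{\n"
    "    return r->${field};\n"
    "}\n"
)

# Hypothetical "mailing list": one row per function to be generated.
ROWS = [
    {"record": "device", "field": "status"},
    {"record": "device", "field": "channel"},
    {"record": "request", "field": "length"},
]

def merge(template, rows):
    """Fill the template once per row and join the results."""
    return "\n".join(template.substitute(row) for row in rows)

generated = merge(ACCESSOR, ROWS)
print(generated)
```

Adding a row to the list adds a whole function to the output, which is where
the expansion comes from. A database-style generator would hold more columns
per row, at the cost of more work per fragment.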
Code Generation vs Process Tables:
- Table:
- Passive
- Orthogonal ("One Of", row-column oriented)
- Difficult to nest
- Difficult to handle exceptions
- Often easier to expand
- Generator:
- Active - mixed code segments
- Variations on a theme, more than one, similarities
- Changeable variable names
- Nesting
- Special cases
- Easier to adapt to larger spec changes
- May be able to generate tables when spec "firms up"
- Metastable, easier to break and fix
Each approach has its merits. But remember, it is often possible to use
the rule portion of a code generator to generate a table if the system
evolves towards a more orthogonal design. The advantage is that the rule
set has already been (more or less) proven.
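A minimal sketch of that idea, assuming a rule set simple enough to hold in a
list: the same rules feed two interchangeable back ends, one emitting a routine
per rule and one emitting a single table. The command names, opcodes, and the
C helper send() are hypothetical.

```python
# Hypothetical rule set: (command name, opcode, number of arguments).
RULES = [
    ("reset", 0x01, 0),
    ("read", 0x02, 2),
    ("write", 0x03, 3),
]

def emit_functions(rules):
    """Back end 1: one C stub per rule (variations on a theme)."""
    out = []
    for name, opcode, nargs in rules:
        params = ", ".join(f"int a{i}" for i in range(nargs)) or "void"
        args = "".join(f", a{i}" for i in range(nargs))
        out.append(
            f"int cmd_{name}({params})\n"
            "{\n"
            f"    return send(0x{opcode:02X}, {nargs}{args});\n"
            "}\n"
        )
    return "\n".join(out)

def emit_table(rules):
    """Back end 2: the same rules as one C table for a generic routine."""
    rows = [
        f'    {{ "{name}", 0x{opcode:02X}, {nargs} }},'
        for name, opcode, nargs in rules
    ]
    return "static const struct cmd cmd_table[] = {\n" + "\n".join(rows) + "\n};\n"

print(emit_functions(RULES))
print(emit_table(RULES))
```

Only the back end changes between the two outputs; the rule list, which is the
part that has been exercised against the specification, is reused untouched.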
A Few Sample Cases:
Adaptec
- Product:
- C Code, Application Program Interface
- Objectives:
- Rapid code generation
- Flexibility for specification changes
- Scale:
- 140,000 lines of code in four months
- Type:
- Disposable, product specific
- Variations on general themes
- Select the most applicable prototype code fragment
- Generate required prologues
- Match appropriate variable and routine names
- Generate required epilogues
- Generator:
- Mail merge package (WordStar)
- Expansion Ratio:
- About 35 to 1
- Time to prototype:
- Two days
- Time to completion:
- Four months till specification gelled.
- Final Outcome:
- Generator was repeatedly used to produce up to 140,000 lines of code
per run which, in effect, tested the specification. Each code delivery
served to illuminate factors not considered in the original specification.
Due to the ability to deliver another set of code in a day or two, no
"quick fix" compromises were needed when major problems were discovered in
the specification.
When the specification gelled, it was much more orthogonal. The
specification and the rule set having been proven, the back end of the
code generator was modified to produce two tables and four major routines;
under 4,000 lines total. Having served its purpose, the code generator was
discarded.
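The four steps listed under "Type" above (select a fragment, generate a
prologue, match names, generate an epilogue) might look roughly like the
following. This is a sketch under stated assumptions, not the original
generator, which was built with a mail merge package: the spec fields, the
fragment texts, and the C helpers hw_read() and hw_write() are all
hypothetical.

```python
# Hypothetical prototype fragments, keyed by the kind of call in the spec.
FRAGMENTS = {
    "query": "    rc = hw_read({dev}_REG_{reg}, {var});\n",
    "update": "    rc = hw_write({dev}_REG_{reg}, *{var});\n",
}

def prologue(routine, var):
    """Opening lines shared by every generated routine."""
    return f"int {routine}(int *{var})\n{{\n    int rc;\n"

def epilogue():
    """Closing lines shared by every generated routine."""
    return "    return rc;\n}\n"

def generate(spec):
    """Build one C routine from a spec entry (kind, dev, reg)."""
    # Derive matching routine and variable names from the spec entry.
    routine = f"{spec['dev'].lower()}_{spec['kind']}_{spec['reg'].lower()}"
    var = spec["reg"].lower() + "_val"
    # Select the applicable fragment and fill in the names.
    body = FRAGMENTS[spec["kind"]].format(dev=spec["dev"], reg=spec["reg"], var=var)
    return prologue(routine, var) + body + epilogue()

# Hypothetical specification entries.
SPEC = [
    {"kind": "query", "dev": "DISK", "reg": "STATUS"},
    {"kind": "update", "dev": "DISK", "reg": "MODE"},
]

code = "\n".join(generate(entry) for entry in SPEC)
print(code)
```

When the specification changes, only SPEC or a fragment is edited and the whole
body of code is regenerated, which is what makes each delivery a test of the
specification itself.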
IBM - Script GML
- Product:
- Test cases, code in seven programming languages
- Objective:
- Functionally identical code in seven programming languages
- Identical message tags to allow computer processing
- Ensure use of redundant self-testing constructs
- Reduction in number of statements written
- Reduction in debugging time
- Increase the systematic nature of the information attainable
- Type:
- Disposable, product specific
- Duplication of related sets from originals
- Ensure pre- and post-test validation is done each time
- Duplicate test statements with different values
- Duplicate test cases in different languages
- Only the language-specific translator file differs between languages, to
ensure identical functionality across languages (where applicable)
- Scale:
- Hundreds of test cases, thousands of programs
- Generator:
- Script GML, a text formatter
- Expansion Ratio:
- Per Case:
- Better than 7 to 1.
One test case yields a program in each of seven
languages
- Per Statement:
- Better than 10 to 1.
1 line of case code yields 5 to 20
lines of test and verification code
in each selected language
- Total expansion:
- Better than 70 to 1
- Time to prototype:
- Several proof of concept prototypes in one month
- Project was part of a larger, much longer term effort
- Final Outcome:
- Techniques were used in testing. Methodology was written up and
presented at an IBM-only international SQA conference. The ability to
compare results between languages pointed out which bugs were in common
libraries and which were in language-specific code, considerably easing
debugging.
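The "one case, many languages" arrangement can be sketched as a single test
case description plus a small per-language translator table. This is an
illustration only: the original used Script GML, the three target languages
here are arbitrary stand-ins for the seven, the BEGIN and END markers stand in
for the pre- and post-test validation, and the output is statement fragments
rather than complete programs.

```python
# Hypothetical per-language "translator" entries: the only part that differs.
TRANSLATORS = {
    "c": {
        "say": 'printf("{tag} {text}\\n");',
        "check": 'if (({expr}) != {want}) printf("{tag} FAIL\\n");',
    },
    "pascal": {
        "say": "writeln('{tag} {text}');",
        "check": "if ({expr}) <> {want} then writeln('{tag} FAIL');",
    },
    "basic": {
        "say": 'PRINT "{tag} {text}"',
        "check": 'IF ({expr}) <> {want} THEN PRINT "{tag} FAIL"',
    },
}

# Hypothetical test case: one message tag, a list of (expression, expected).
CASE = {"tag": "T001", "checks": [("2 + 3", 5), ("7 * 6", 42)]}

def generate(case, lang):
    """Expand one test case into statements for one target language."""
    t = TRANSLATORS[lang]
    tag = case["tag"]
    lines = [t["say"].format(tag=tag, text="BEGIN")]  # stand-in pre-test marker
    for expr, want in case["checks"]:
        lines.append(t["check"].format(tag=tag, expr=expr, want=want))
    lines.append(t["say"].format(tag=tag, text="END"))  # stand-in post-test marker
    return "\n".join(lines)

programs = {lang: generate(CASE, lang) for lang in TRANSLATORS}
for lang, text in programs.items():
    print(f"--- {lang} ---\n{text}")
```

Because every language emits the same tag, the results can be compared
mechanically. The total expansion quoted above is simply the product of the two
ratios: 7 programs per case times 10 lines per statement gives 70 to 1.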
Santa Clara
- Product:
- Legal documents, English Text
- Objective:
- Correct selection of specific clauses in Legal documents
- Generator:
- A word processing package
- Type:
- Generator as product
- Selection of specific relevant parts
- Generation of specific required prologues and epilogues
- Expansion ratio:
- Two man weeks condensed to half a man day.
- Program selected appropriate clauses and clause
variations based on user responses.
- Time to prototype:
- Two days
- Time to product:
- Three weeks for system "A"
- Two weeks for product "B"
- Outcome:
- Systems were used for several years before major policy and
departmental changes rendered them obsolete. They were easily
maintainable by staff writers.
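Clause selection driven by user responses can be sketched as a list of
(condition, clause text) pairs wrapped in a fixed prologue and epilogue. The
clause wording, the answer fields, and the document type below are invented for
illustration and are not taken from the original system, which was built in a
word processing package.

```python
# Hypothetical fixed opening and closing text.
PROLOGUE = "NOTICE TO {name}"
EPILOGUE = "Issued by {office}."

# Hypothetical clause library: (condition on the answers, clause text).
CLAUSES = [
    (lambda a: True, "This notice concerns case {case_no}."),
    (lambda a: a["hearing_required"], "A hearing is scheduled for {hearing_date}."),
    (lambda a: not a["hearing_required"], "No hearing is required."),
    (lambda a: a["fee"] > 0, "A fee of {fee} dollars is due."),
]

def build_document(answers):
    """Select the clauses whose conditions hold and fill in the blanks."""
    parts = [PROLOGUE.format(**answers)]
    parts += [text.format(**answers) for applies, text in CLAUSES if applies(answers)]
    parts.append(EPILOGUE.format(**answers))
    return "\n".join(parts)

# Hypothetical user responses.
ANSWERS = {
    "name": "J. Doe",
    "office": "Records Office",
    "case_no": "A-17",
    "hearing_required": False,
    "hearing_date": "",
    "fee": 25,
}

document = build_document(ANSWERS)
print(document)
```

Keeping the clause text in a plain list is what would let non-programmers
maintain such a system: changing a clause means editing one line of text, not
the selection logic.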
Mall-Net
- Product:
- Database code: form generation and processing code
- Objective:
- Maximize flexibility for change as needs clarify
- Rapid prototyping
- Reduce manual code generation
- Type:
- Generator is part of the internal product
- Experimental
- Generator:
- Database package
- Expansion Ratio:
- Better than 8 to 1
- Time to prototype:
- Several days
- Time to product:
- One month, but evolution continues
- Outcome:
- Product is in use. Having served as a robust prototype, a "variation"
to UNIX is in progress.
A portion of the HTML code used in the main index structure of Mall-Net,
the HOT LISTS, is handled by database code produced by a code generator;
a double expansion of sorts.
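The double expansion can be sketched as two stages: a generator writes the
source of a routine, and that generated routine in turn writes HTML. In this
illustration Python stands in for the database package language, and the list
name, field names, and sample record are hypothetical. HTML escaping is left
out to keep the sketch short.

```python
def generate_renderer(list_name, fields):
    """Stage 1: emit Python source for a function that renders one list."""
    item = " + ' - ' + ".join(f"str(row[{f!r}])" for f in fields)
    return (
        f"def render_{list_name}(rows):\n"
        f"    out = ['<ul>']\n"
        f"    for row in rows:\n"
        f"        out.append('<li>' + {item} + '</li>')\n"
        f"    out.append('</ul>')\n"
        f"    return '\\n'.join(out)\n"
    )

# First expansion: a short description becomes source code.
source = generate_renderer("hot_list", ["name", "url"])
print(source)

# Load the generated code so it can be called.
namespace = {}
exec(source, namespace)

# Second expansion: the generated code turns records into HTML.
html = namespace["render_hot_list"](
    [{"name": "Example Shop", "url": "http://example.com/"}]
)
print(html)
```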
Copyright (C) 1996, J Consultants