Bootstrapping

Bootstrapping is a technique used by language designers (including compiler writers).

The idea is to “test” a language design by implementing the language in itself, so that the implementation compiles its own source.

This is a form of regression testing combined with agile-style testing. Bootstrapping goes beyond testing code: it also tests the basic concepts and UX of the language.

Who is the customer? In the case of a language, the customer is usually other programmers. Bootstrapping tests the language design against a target customer (but only one such customer: the language's own implementer).

What code tests can be performed? The largest code base for a newly minted language is usually the language implementation itself.

Bootstrapping can occur in two stages:

1. Test the generated code against the manual implementation.
2. Use the generated code to generate code again, and diff the generated code against the generated-generated code.
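
The two stages can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the toy language (Python with "fn" in place of "def"), and the names COMPILER_SOURCE, manual_compile, load and bootstrap are all hypothetical and have nothing to do with the glue language or Ohm-JS. The "compiler" here is just a text-to-text function whose own source is written in the toy language.

```python
import difflib

# Hypothetical toy language: Python, except functions are introduced with "fn".
# This is the compiler for the toy language, written in the toy language itself.
# The keyword strings are split ("f" + "n ") so that compiling this text does
# not rewrite the compiler's own string literals.
COMPILER_SOURCE = '''fn compile(src):
    return src.replace("f" + "n ", "de" + "f ")
'''


def manual_compile(src: str) -> str:
    """The hand-written (manual) implementation of the toy compiler."""
    return src.replace("fn ", "def ")


def load(generated_code: str):
    """Turn generated Python text into a callable compiler."""
    namespace = {}
    exec(generated_code, namespace)
    return namespace["compile"]


def diff(a: str, b: str) -> list:
    """Unified diff as a list of lines; an empty list means no difference."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))


def bootstrap(samples):
    # Generate the compiler from its own source using the manual implementation.
    generated_code = manual_compile(COMPILER_SOURCE)
    generated_compiler = load(generated_code)

    # Stage 1: the generated compiler must agree with the manual one.
    stage1 = [s for s in samples if generated_compiler(s) != manual_compile(s)]

    # Stage 2: regenerate the compiler with the generated compiler and diff
    # the generated code against the generated-generated code.
    regenerated_code = generated_compiler(COMPILER_SOURCE)
    stage2 = diff(generated_code, regenerated_code)
    return stage1, stage2


mismatches, delta = bootstrap(["fn add(a, b):\n    return a + b\n"])
print("stage 1 mismatches:", mismatches)
print("stage 2 diff:", delta)
```

In this toy both results come back empty, which is the fixed point that bootstrapping aims for. A non-empty stage 2 diff would mean the generated compiler does not reproduce itself.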

Language Evolution

Bootstrapping can show weak points in the language.

The language design is tested and can evolve during bootstrapping.

Implementation Evolution

Bootstrapping can show implementation gotchas and cause the implementation to evolve.

Example: as I bootstrap the glue language, I find places where I optimized by hand and where a compiler would have generated more-normalized code. In particular, I used Ohm-JS’s capitalization rules and whitespace elimination. Explicit whitespace would have sufficed.

The goal of producing an identity transform pointed this out. Compiler writers never produce identity transforms — they optimize whitespace away early (during the lexing phase).
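
As a rough illustration of why whitespace handling decides whether an identity transform is possible, here is a small Python sketch. It does not use Ohm-JS and is not the glue language's implementation; the TOKEN pattern and the function names are assumptions made up for this example. One tokenizer keeps whitespace as explicit tokens, the other drops it early the way a conventional lexer would.

```python
import re

# Every character of the input lands in exactly one token:
# a run of whitespace, a run of word characters, or a single other character.
TOKEN = re.compile(r"\s+|\w+|[^\w\s]")


def tokenize_keeping_whitespace(text):
    """Whitespace is explicit: it shows up as ordinary tokens."""
    return TOKEN.findall(text)


def tokenize_dropping_whitespace(text):
    """Whitespace is optimized away early, as in a typical lexing phase."""
    return [t for t in TOKEN.findall(text) if not t.isspace()]


def emit(tokens):
    """Re-emit the tokens as text without adding anything."""
    return "".join(tokens)


source = "rule  Main = a\n  b\n"

# With explicit whitespace, the round trip is an identity transform.
assert emit(tokenize_keeping_whitespace(source)) == source

# Once whitespace is dropped, the original text cannot be reproduced.
assert emit(tokenize_dropping_whitespace(source)) != source
```

The point of the sketch is only that the identity transform is a cheap test: if the round trip fails, some information was thrown away before the transform had a chance to see it.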

This simple change — striving for an identity transform — has made it possible to think “out of the box”.

Seen in this light, the distinction between capitalized and uncapitalized rule names in Ohm-JS is a band-aid optimization that does not help automation. This feature “sucks” users into relying on it and keeps them from seeing possibilities for automation (much like the existence of fast registers sucked assembler programmers into writing assembler code by hand instead of automating the process by using stack variables instead of register variables).