Glue Manual

Introduction

Glue converts one language into another.

Glue goes one step further than regular expressions (REGEXs), and I believe that it is just as easy to use.

Glue is a tool for building transpilers.

As an example, in another essay,^[1] I use glue to convert SVG diagrams into programs.

Technically, glue is a PEG ^[2] to help in writing PEGs.

Basics

The little language that I call glue is used to generate JavaScript code for transpilers constructed with Ohm-JS.

Glue specifications consist of one "statement" for every matching Ohm-JS grammar rule.

Each specification statement gives the expected Ohm-JS grammar plus variables that hold the sub-matches.

Each specification statement defines a mapping from sub-matches to output format.

Each specification statement can include setup code and access to a dynamically scoped set of variables. Glue emits code to create variable frames for each grammar rule at runtime.

I discuss the efficiency of this approach in the Efficiency section below.

[This little language could use a better name. At the time of creation "glue" had meaning (to me). In retrospect, this little language has more to do with generating semantic code for Ohm-JS and might need to be renamed to something more appropriate. OTOH, I would think that, if this little language were to be used frequently, it would be rolled into a single tool workflow and combined with Ohm-JS and that the rolled-together workflow would be given its own, unique, name.]

Working Example

To ease discussion, we will consider an actual use-case ^[3] of glue:

htmlsvg [@ws docH htmlH bodyH @elements bodyE htmlE] = ${elements}

htmlHeader [_ @ws] = ${_}${ws}

htmlEnd [_ @ws] = ${_}${ws}

bodyHeader [_ @ws] = ${_}${ws}

bodyEnd [_ @ws] = ${_}${ws}

docTypeHeader [_1 @stuff _2 @ws] = ${_1}${stuff}${_2}${ws}

element [e] = ${e}

svgElement [_1 _2 @ws @attributes _5 @_6 @elements @text _9 _10 _11 @_12] =

{{ var name = "svg"; scopeAdd ("path", name); scopeAdd ("counter", 0); }}

[[ svgbox(${name},"").

${attributes}

${elements}]]

rectElement [_1 _2 @ws @attributes _5 @_6 @elements @text _9 _10 _11 @_12] =

{{ var name = scopeGet ("path") + "_rect_" + gen (); scopeAdd ("path", name); }}

[[

rect(${name},"").

${attributes}

${elements} ]]

textElement [_1 _2 @ws @attributes _5 @_6 @elements @text _9 _10 _11 @_12] =

{{ var name = scopeGet ("path") + "_text_" + gen (); scopeAdd ("path", name); }}

[[

text(${name},"").

${attributes}

${elements}

string(${name}, "${text}"). ]]

basicElement [_1 _2 @ws @attributes _5 @_6 @elements @text _9 _10 _11 @_12] = ${_1}${_2}${ws}${attributes}${_5}${_6}${elements}${text}${_9}${_10}${_11}${_12}

attribute [a] = ${a}

widthAttribute [_ _eq str @_ws] = [[width_str(${scopeGet ("path")},${str}).\n]]

heightAttribute [_ _eq str @_ws] = [[height_str(${scopeGet ("path")},${str}).\n]]

xAttribute [_ _eq str @_ws] = [[x_str(${scopeGet ("path")},${str}).\n]]

yAttribute [_ _eq str @_ws] = [[y_str(${scopeGet ("path")},${str}).\n]]

fillAttribute [_ _eq str @_ws] = [[fill(${scopeGet ("path")},${str}).\n]]

genericAttribute [_ _eq str @_ws] = \n

text [x] = ${x}

name [c @cs] = ${c}${cs}

name1st [c] = ${c}

nameFollow [c] = ${c}

stuff [c] = ${c}

string [_1 @cs _2] = "${cs}"

notQ [c] = ${c}

ws [c] = ${c}

Formatted Strings

Glue produces one formatted string for each grammar rule.

LHS

Each statement corresponds to a rule in the accompanying Ohm-JS grammar.

The first line of our example is

htmlsvg [@ws docH htmlH bodyH @elements bodyE htmlE] = ${elements}

which we will elide in phases:

htmlsvg … = …

The first line corresponds to an Ohm-JS grammar rule named "htmlsvg".

The grammar parses 7 sub-matches, given in our specification inside brackets:

… [@ws docH htmlH bodyH @elements bodyE htmlE] = …

Here, we have defined 7 JavaScript variables (parameters, actually) — ws, docH, htmlH, bodyH, elements, bodyE and htmlE.

All variables prefixed with "@" are multiple-match items generated by the Ohm-JS grammar. In this case, ws and elements are the multiple-match items. Multiple-match items correspond to the use of "?", "*" and "+" operations in the grammar.

The LHS creates, roughly, a JavaScript function, e.g.

function htmlsvg (ws, docH, htmlH, bodyH, elements, bodyE, htmlE) {

…

}

The parameters are evaluated in the body of the function, as required by Ohm-JS.

Multiple-match items are evaluated one step further in the body of the function (multiple-matches return an array of CSTs ^[4], and evaluation consists of flattening the arrays into single strings using the .join('') operator of JavaScript).
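The evaluation and flattening steps can be pictured with a minimal sketch. This is not the code that glue actually emits. The helpers node() and iter() and the method name _glue are hypothetical stand-ins for Ohm-JS parse nodes, invented here so that the sketch runs on its own.

```javascript
// Hypothetical stand-ins for Ohm-JS parse nodes (assumptions, not the Ohm-JS API).
// A single-match node evaluates to one string.
function node(text) {
  return { _glue: () => text };
}
// A multiple-match node ("?", "*", "+") evaluates to an array of strings.
function iter(texts) {
  return { _glue: () => texts };
}

// Roughly what the LHS  htmlsvg [@ws docH htmlH bodyH @elements bodyE htmlE]  describes.
function htmlsvg(_ws, _docH, _htmlH, _bodyH, _elements, _bodyE, _htmlE) {
  // Multiple-match items: evaluate, then flatten the array into a single string.
  const ws = _ws._glue().join('');
  const elements = _elements._glue().join('');
  // Single-match items: evaluate once.
  const docH = _docH._glue();
  const htmlH = _htmlH._glue();
  const bodyH = _bodyH._glue();
  const bodyE = _bodyE._glue();
  const htmlE = _htmlE._glue();
  // Returned as an object only so that the evaluated values can be inspected.
  return { ws, docH, htmlH, bodyH, elements, bodyE, htmlE };
}

const evaluated = htmlsvg(
  iter([' ', '\n']),
  node('<!DOCTYPE html>'),
  node('<html>'),
  node('<body>'),
  iter(['<svg/>', '<rect/>']),
  node('</body>'),
  node('</html>')
);
console.log(evaluated.elements);
```

After evaluation, every parameter is an ordinary string, which is what allows the RHS to be a plain string format.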

RHS

The right-hand-side is a block of characters that are combined into JavaScript back-tick strings.

… = ${elements}

This RHS results in JavaScript code

`${elements}`

If we look at the whole first statement:

htmlsvg [@ws docH htmlH bodyH @elements bodyE htmlE] = ${elements}

we see that the specification says to flatten and format the elements parameter and discard the other 6 parameters.

The second line of the example,

htmlHeader [_ @ws] = ${_}${ws}

says that when an htmlHeader grammar rule is matched, it should be mapped to a string containing the first ^[5] ("_") and second ("ws") parameters.
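As a rough illustration, and assuming that both parameters have already been evaluated to strings, the RHS of htmlHeader amounts to a single back-tick string. The sample argument values are made up.

```javascript
// Sketch only: the RHS  ${_}${ws}  becomes a back-tick (template) string.
// "_" is an ordinary one-character parameter name in JavaScript.
function htmlHeader(_, ws) {
  return `${_}${ws}`;
}

// Made-up inputs: the matched tag text and the whitespace that followed it.
const headerText = htmlHeader("<html>", "\n");
console.log(JSON.stringify(headerText));
```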

Most of the glue specifications have this form

match, format.

The exceptions are specifications 8, 9 and 10.

Specification 8

svgElement [_1 _2 @ws @attributes _5 @_6 @elements @text _9 _10 _11 @_12] =

{{ var name = "svg"; scopeAdd ("path", name); scopeAdd ("counter", 0); }}

[[ svgbox(${name},"").

${attributes}

${elements}]]

has the form

ruleName [ parameters ] = {{ JavaScript code }} [[ formatting ]]

In this case, the format specification is wrapped by double brackets [[ … ]] (whereas in most other lines, it does not need to be wrapped with double brackets).

There is extra code wrapped in double-braces {{ … }} on the RHS. This code is copied verbatim to the transpiled output. It is meant to hold local variables and scoped variables (see the Scoped Variables section).

Specification 8 says

svgElement [_1 _2 @ws @attributes _5 @_6 @elements @text _9 _10 _11 @_12] =

{{ var name = "svg"; scopeAdd ("path", name); scopeAdd ("counter", 0); }}

[[ svgbox(${name},"").

${attributes}

${elements}]]

to create a JavaScript variable "name" and to add two scoped variables to the scope stack ("path" and "counter").

Then, the usual formatting occurs. Everything inside the double-brackets is wrapped in a JavaScript back-tick string, e.g. `svgbox(${name},"")…`. JavaScript variables and scoped variables can be included in the back-tick string.
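A sketch of how the two parts of specification 8 might fit together in JavaScript follows. It makes several assumptions: scopeAdd() is replaced by a stand-in that only records its calls, the parameters arrive as plain strings, and the line breaks in the format are illustrative, since the exact whitespace that glue emits is not shown here. In a real transpiler the sub-matches would presumably have to be evaluated after the setup code runs, so that their rules can see "path".

```javascript
// Stand-in for glue's scopeAdd (an assumption: here it only records its calls).
const scopeCalls = [];
function scopeAdd(name, value) {
  scopeCalls.push([name, value]);
}

// Parameters are taken to be already-evaluated strings to keep the sketch short.
function svgElement(_1, _2, ws, attributes, _5, _6, elements, text, _9, _10, _11, _12) {
  // Copied verbatim from the {{ ... }} block of specification 8.
  var name = "svg"; scopeAdd ("path", name); scopeAdd ("counter", 0);
  // The [[ ... ]] block, wrapped in a back-tick string.
  return `svgbox(${name},"").\n${attributes}\n${elements}`;
}

// Made-up argument values, one per sub-match.
const svgText = svgElement(
  "<", "svg", " ", 'width_str(svg,"100").', ">", "",
  'rect(svg_rect_1,"").', "", "<", "/", "svg", ">"
);
console.log(svgText);
```

Only name, attributes and elements appear in the format, so the other nine parameters are discarded, just as in the first specification.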

Scoped Variables

Currently, the glue transpiler inserts code at all rule entry and exit points

_ruleEnter ("rule name");

_ruleExit ("rule name");

These enter/exit functions push and pop stack frames for dynamically scoped variables. The rule names are passed in as strings so that they are available if debugging code is added to the enter/exit functions.

As it stands, the mainline function must call

_ruleInit ();

to initialize the dynamic variable scope stack.

Dynamic variables act somewhat like inheritance in graphical systems. When a variable is pushed onto the stack (scopeAdd()), it shadows all other variables with that same name. When a variable is dereferenced (scopeGet()), the top-most value is returned.

The API for dynamic variables is:

scopeAdd ("name", value)
- pushes value onto the stack under the given name
scopeGet ("name")
- returns the top-most value on the stack with the given name
scopeModify ("name", value)
- outlier for special cases - modifies the top-most variable of the given name
- in the example code, this is used to create a "global" counter used for creating unique formatted variable names
_ruleInit ()
- must be called once before parsing
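
The API above could be implemented in many ways. The following is one minimal sketch, not glue's actual implementation. The frame layout (one Map per rule), the helper findFrame(), and the error thrown for an unknown name are all assumptions. The body of gen() is likewise only a guess at how a counter-based name generator could use scopeModify().

```javascript
// One frame per active grammar rule; the last element is the top of the stack.
let scopeStack = [];

function _ruleInit() {
  scopeStack = [new Map()]; // outermost frame
}
function _ruleEnter(ruleName) {
  scopeStack.push(new Map()); // ruleName is unused here; handy for debug output
}
function _ruleExit(ruleName) {
  scopeStack.pop();
}

// Hypothetical helper: find the top-most frame that holds the given name.
function findFrame(name) {
  for (let i = scopeStack.length - 1; i >= 0; i--) {
    if (scopeStack[i].has(name)) return scopeStack[i];
  }
  throw new Error(`scoped variable "${name}" not found`);
}

function scopeAdd(name, value) {
  scopeStack[scopeStack.length - 1].set(name, value); // shadows outer frames
}
function scopeGet(name) {
  return findFrame(name).get(name);
}
function scopeModify(name, value) {
  findFrame(name).set(name, value); // changes the top-most existing variable
}

// Guess at a foreign function: bump the shared "counter" and return it.
function gen() {
  const next = scopeGet("counter") + 1;
  scopeModify("counter", next);
  return next;
}

// Usage, mimicking svgElement containing one rectElement.
_ruleInit();
_ruleEnter("svgElement");
scopeAdd("path", "svg");
scopeAdd("counter", 0);
_ruleEnter("rectElement");
scopeAdd("path", scopeGet("path") + "_rect_" + gen());
const innerPath = scopeGet("path");
_ruleExit("rectElement");
const outerPath = scopeGet("path");
const counterAfter = scopeGet("counter");
_ruleExit("svgElement");
console.log(innerPath, outerPath, counterAfter);
```

Inside rectElement the new "path" shadows the outer one, and popping the frame restores it. The counter survives the pop because scopeModify() changes the variable in the frame where it was added instead of creating a new one.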

Foreign Functions

In this example, I've placed all "outside" functions into a file called foreign.js.

This is a form of fractal eliding. The code is needed to satisfy the JavaScript compiler, but its implementation is uninteresting at the DI layer.

It is the Architect's responsibility to make the design clear and understandable to readers.

In this example, there is but one foreign function — gen().

I chose to put the gen() function into a separate file and to call it from the glue specification. Other Architects might choose to do this differently.

IMO, the gen() function contains implementation details, and such details must be pushed out of the specification and elided.

Efficiency

The glue tool is used only by the grammar programmer.

As such, glue is used infrequently in the programmer's workflow.

As such, efficiency of glue is not a major issue. The only "rule" is "is it fast enough for use?".

I suspect, but haven't even bothered to measure, that the implementation of scoped dynamic variables would be considered "inefficient". I stick to the YAGNI principle — the glue tool works "fast enough" on my laptop (there is no perceived time taken by the glue tool — it appears to operate instantly with no delay — that is "good enough" and no time has been spent making it faster).

Error Handling

Glue does not check for any errors and leaves all error handling to the support language — JavaScript and Ohm-JS, in this case.

Ohm-JS does sufficient error checking (more than JavaScript).

One needs to "know" how Ohm-JS works to use the glue tool, in its present form.

Postscript - Architectural Reuse

The goal was to build something "as quickly as" building an editor macro instead of building a full-blown DSL or building a full-blown PL.

This tool would not have been built if its construction was estimated to take longer than a few hours.

SCLs based on PEG are meant to be one-use-only DSLs, like REGEXs are in other existing languages.

The semantics of glue are quite uncomplicated. There was just enough work put in "to get the job done". I can throw glue away and build something bigger and more complicated if the situation arises.

I would retain the experience from building this tool and use this experience when building the next one. A case of Architectural Reuse instead of plain code reuse.

Using fractal-design principles, I would keep chopping a problem down into sub-components until I could implement it in whatever way I choose (using glue or ignoring glue).

[1] See "SVG to Code (1)".

[2] Don't worry if you don't understand this statement or you don't know what a PEG is.

[3] See my essay "SVG to Code".

[4] CST means Concrete Syntax Tree — basically the AST (Abstract Syntax Tree) instantiated with actual matches.

[5] In JavaScript, underscore "_" is an ordinary character, hence, _ is a variable name that is one character long.