Goal of DaSWB - Diagrams as Syntax WorkBench

The DaS Workbench (“Diagrams as Syntax Workbench”) transpiles a raw .drawio file into code.

In this example, we will use the drawing final.drawio and transpile it into PROLOG facts.

Decoding

Drawio (aka diagrams.net) saves a drawing as an XML mxfile element that contains one diagram element per page (tab); the body of each diagram element is compressed XML.

The first step in transpiling the .drawio file into code is to decompress the data.

The decompression is done automatically by a bit of JavaScript code. The support code can be found in support.js.
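The decompression step can be sketched roughly as follows. This is not the code from support.js, which is not reproduced here; it is a minimal Node.js sketch that assumes the usual drawio encoding (URL-encode, then raw deflate, then base64) and uses Node's built-in zlib module rather than whatever the browser code uses. The function names decodeDiagramPayload and encodeDiagramPayload are hypothetical.

```javascript
// Sketch only: the real logic lives in support.js, which is not shown here.
const zlib = require('zlib');

// Assumed encoding of a diagram body: URL-encode -> raw deflate -> base64.
// Decoding therefore runs the three steps in reverse.
function decodeDiagramPayload(encoded) {
  const compressed = Buffer.from(encoded, 'base64');
  const inflated = zlib.inflateRawSync(compressed).toString('utf8');
  return decodeURIComponent(inflated);
}

// Inverse operation, used here only to build a sample payload for a round trip.
function encodeDiagramPayload(xml) {
  const urlEncoded = Buffer.from(encodeURIComponent(xml), 'utf8');
  return zlib.deflateRawSync(urlEncoded).toString('base64');
}

const sample = '<mxGraphModel><root><mxCell id="0"/></root></mxGraphModel>';
const roundTripped = decodeDiagramPayload(encodeDiagramPayload(sample));
console.log(roundTripped === sample);
```

The round trip is only a self-check; in practice the input would be the text found between a diagram start tag and its end tag.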

The decompressed code is deposited into the textarea with id="decodertranspiled", shown below:

2021-07-30 Decode.png

The textarea is currently only 1 line high, but all of the code can be copied (e.g. using ⌘-a, ⌘-c).

Paste the code into an editor and pretty-print it as HTML (convert “<” into newline-“<”, then format).

<diagram id="kCBzqsQgc0aW30EmMs_m" name="Page-1">

  <mxGraphModel dx="673" dy="353" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="1100" pageHeight="850" math="0" shadow="0">
    <root>
      <mxCell id="0"/>
      <mxCell id="1" parent="0"/>
      <mxCell id="Nl1LcCOVLVZGkuQ6EcLl-2" value="m" style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1" vertex="1">
	<mxGeometry x="120" y="80" width="440" height="210" as="geometry"/>
      </mxCell>
      <mxCell id="Nl1LcCOVLVZGkuQ6EcLl-3" value="n" style="rounded=1;whiteSpace=wrap;html=1;verticalAlign=top;" parent="1" vertex="1">
	<mxGeometry x="180" y="120" width="320" height="140" as="geometry"/>
      </mxCell>
      <mxCell id="Nl1LcCOVLVZGkuQ6EcLl-4" value="b" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#fff2cc;align=center;strokeColor=#d6b656;textOpacity=50;" parent="1" vertex="1">
	<mxGeometry x="540" y="175" width="30" height="30" as="geometry"/>
      </mxCell>
      <mxCell id="JY7Yr9pDnzS2nqslWDs0-1" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;" edge="1" parent="1" source="Nl1LcCOVLVZGkuQ6EcLl-6" target="Nl1LcCOVLVZGkuQ6EcLl-7">
	<mxGeometry relative="1" as="geometry"/>
      </mxCell>
      <mxCell id="Nl1LcCOVLVZGkuQ6EcLl-6" value="a" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#d5e8d4;align=center;strokeColor=#82b366;textOpacity=50;" parent="1" vertex="1">
	<mxGeometry x="110" y="175" width="30" height="30" as="geometry"/>
      </mxCell>
      <mxCell id="Nl1LcCOVLVZGkuQ6EcLl-7" value="c" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#d5e8d4;align=center;strokeColor=#82b366;textOpacity=50;" parent="1" vertex="1">
	<mxGeometry x="174" y="175" width="30" height="30" as="geometry"/>
      </mxCell>
      <mxCell id="JY7Yr9pDnzS2nqslWDs0-2" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" edge="1" parent="1" source="Nl1LcCOVLVZGkuQ6EcLl-9" target="Nl1LcCOVLVZGkuQ6EcLl-4">
	<mxGeometry relative="1" as="geometry"/>
      </mxCell>
      <mxCell id="Nl1LcCOVLVZGkuQ6EcLl-9" value="d" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;fillColor=#fff2cc;align=center;strokeColor=#d6b656;textOpacity=50;" parent="1" vertex="1">
	<mxGeometry x="474" y="175" width="30" height="30" as="geometry"/>
      </mxCell>
    </root>
  </mxGraphModel>

</diagram>

We see that drawio saves each diagram element as an mxCell element with a unique (albeit unmemorable) id.
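To make the shape of this data concrete, here is a small sketch that lists the mxCell elements of a decoded diagram together with their attributes. It uses regular expressions purely for illustration (the workbench itself uses Ohm-JS grammars), and the function name listCells and the sample ids are hypothetical.

```javascript
// Sketch: collect the attributes of every <mxCell ...> start tag.
// A regular expression is enough for illustration; it is not a real XML parser.
function listCells(xml) {
  const cells = [];
  const tagPattern = /<mxCell\b([^>]*?)\/?>/g;
  for (const tag of xml.matchAll(tagPattern)) {
    const attrs = {};
    for (const [, name, value] of tag[1].matchAll(/(\w+)="([^"]*)"/g)) {
      attrs[name] = value;
    }
    cells.push(attrs);
  }
  return cells;
}

// Shortened, made-up ids; real drawio ids look like "Nl1LcCOVLVZGkuQ6EcLl-6".
const sampleXml =
  '<root><mxCell id="0"/><mxCell id="1" parent="0"/>' +
  '<mxCell id="v1" value="a" parent="1" vertex="1"></mxCell>' +
  '<mxCell id="e1" edge="1" parent="1" source="v1" target="v2"></mxCell></root>';

const cells = listCells(sampleXml);
for (const cell of cells) {
  // Cells that are neither vertices nor edges are structural (root and layer).
  const kind = cell.vertex === '1' ? 'vertex' : cell.edge === '1' ? 'edge' : 'structural';
  console.log(cell.id, kind);
}
```

Vertices carry a value (the label) and a geometry, while edges carry source and target attributes that refer to the ids of other cells.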

Grok and Emit

Transpiling the .drawio file consists of repeatedly applying a 2-step process:

Grok - understand the input.
Emit - rearrange the input into some other format.

Grok is pattern-matching. Formally, this is known as parsing. At the time of writing, PEG parsers seem to be the easiest parsers to build and use. We happen to use a PEG library called Ohm-JS.

The Emit phase just outputs the matched code, possibly rearranged and embellished. We happen to use JS “template” (backtick) strings, which form the basis of the glue tool.

Ohm-JS performs the grok phase using a REGEXP-like syntax¹.

Ohm-JS leaves the emit phase up to the programmer. In this case, we’ve written a simplistic tool, called glue, that generates JS template strings from a specification.

Grok

For the decompressor, the grok code is quite simple:

it pattern-matches an mxfile header,
then matches one or more compressed diagrams,
then matches a trailer.
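The three steps above can be sketched in plain JavaScript. This is only an illustration of the header / diagrams / trailer structure using regular expressions; it is not how Ohm-JS works internally, and the function name grokMxfile and the sample input are hypothetical.

```javascript
// Sketch: recognize "header, one or more diagrams, trailer" by hand.
function grokMxfile(text) {
  // Header: "<mxfile" followed by everything up to the next "<".
  const header = /^\s*<mxfile[^<]+/.exec(text);
  if (!header) return null;
  let rest = text.slice(header[0].length);

  // Diagram: "<diagram", zero or more attributes, ">", encoded body, "</diagram>".
  const diagramPattern =
    /^\s*<diagram((?:\s*[A-Za-z0-9]+\s*=\s*(?:\d+|"[^"]*"))*)\s*>([^<]+)<\/diagram>/;
  const diagrams = [];
  let m;
  while ((m = diagramPattern.exec(rest)) !== null) {
    diagrams.push({ attributes: m[1].trim(), payload: m[2] });
    rest = rest.slice(m[0].length);
  }
  if (diagrams.length === 0) return null; // at least one diagram is required

  // Trailer: the closing tag, with nothing but whitespace after it.
  if (!/^\s*<\/mxfile>\s*$/.test(rest)) return null;
  return diagrams;
}

const input =
  '<mxfile host="app.diagrams.net"><diagram id="abc" name="Page-1">ENCODED</diagram></mxfile>';
console.log(grokMxfile(input));
console.log(grokMxfile('<mxfile></mxfile>'));
```

The second call returns null because the input contains no diagram, which mirrors a failed match on a one-or-more pattern.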

The actual code for grok is an Ohm-JS grammar.

You can see this code by copying the code from the textarea labelled “grok (decoder)”. For discussion, the code appears below:

2021-07-30 grok decoder.png

AppDiagramsEncodedNet{
  TabbedDiagrams = Header Diagram+ Trailer
  Header = "<" "mxfile" encodedChar+
  Trailer = "</mxfile>"
  Diagram = "<diagram" Attribute* ">" encodedChar+ "</diagram>"
  Attribute = alnum+ "=" attributeValue
  string= "\"" notDQ* "\""
  notDQ = ~"\"" any
  encodedChar = ~"<" any		   
  attributeValue = number | string
  number = digit+
}

Ohm-JS grammars are based on PEG grammars. The syntax is very similar to REGEXP, except that matches are arranged in rules and rules can call one another.

The above says:

The pattern-matcher (grok grammar) is named “AppDiagramsEncodedNet”
The block of pattern-matching rules is enclosed in brace brackets {}
Example: the rule named “TabbedDiagrams” consists of calls to 3 other rules:
- Header
- Diagram
- Trailer
The Header and Trailer rules must match exactly once.
The rule Diagram is to be matched one or more times (specified by the + operator).
An equals sign = separates the rule name from the rule body.
The other rules are similar
- "..." matches a literal string
- * matches zero or more times
- | specifies alternation, e.g. A | B means to match an A or a B
Note that rule names are case-sensitive
Note that, in Ohm-JS, rules whose names begin with a capital letter (syntactic rules) behave differently from rules whose names begin with a lower-case letter (lexical rules)
- Capitalized rules skip whitespace
- Non-capitalized rules do not skip whitespace (the default for most other PEG libraries).
any is a built-in Ohm-JS rule that matches any single character.
~ is the Ohm-JS negative-lookahead operator: it succeeds only if the pattern immediately following it fails, and it consumes no input, e.g. ~"\"" any means to match one character that isn’t a double-quote
Backslashes are used to escape certain characters
Further details about Ohm-JS syntax are found in the Ohm-JS documentation.
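For readers more familiar with regular expressions, the negative-lookahead idiom used by notDQ has a direct JavaScript analogue. The sketch below is a comparison only, not part of the workbench; the names stringRule and matchString are hypothetical.

```javascript
// Ohm-JS:  notDQ  = ~"\"" any          (one character that is not a double-quote)
//          string = "\"" notDQ* "\""
// RegExp:  (?!") is a negative lookahead and [\s\S] plays the role of "any".
const stringRule = /^"(?:(?!")[\s\S])*"/;

function matchString(input) {
  const m = stringRule.exec(input);
  return m ? m[0] : null;
}

console.log(matchString('"Page-1" name')); // prints "Page-1" (quotes included)
console.log(matchString('Page-1'));        // prints null
```

As in the grammar, the lookahead itself consumes nothing; it only decides whether the following single-character match is allowed to proceed.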

Emit

Likewise, we can examine the emit code by copying the “emit (decoder):” textarea and viewing it.

2021-07-30 emit decoder.png

TabbedDiagrams [h @d t] = [[${d}]]
Header [k k2 @ec] = [[${k}${k2}${ec}]]
Trailer [k] = [[${k}]]
Diagram [k @a k2 @ec k3] = [[${k}${a}${k2}\n${decodeMxDiagram(ec)}\n${k3}\n]]
Attribute [@an k s] = [[\ ${an}${k}${s}]]
string [q1 @cs q2] = [[${q1}${cs}${q2}]]
notDQ [c] = [[${c}]]
encodedChar [c] = [[${c}]]
attributeValue[x] = [[${x}]]
number [n] = [[${n}]]

The above code is arranged as rules, with exactly the same names as the pattern matching rules.

After pattern matching has finished (and succeeded), the above rules are called to emit code for each match.

The left-hand side of each rule contains

a rule name
a list of parameters to the rule - essentially variables corresponding to each match; the @ symbol means that the grammar contained one of the operators ?, * or + for a given match².

The right-hand side of each rule contains a rewriting command in the form of a JS template string surrounded by double-brackets [[]].

For example, the first rule TabbedDiagrams expects 3 matches and assigns them to the variables h, d and t. The second parameter matches 1-or-more times (using the + operator in the grok grammar). The first rule returns the value of the d parameter and ignores the h and t parameters. The d parameter is created recursively from the subsequent rules and matches[^@].

[^@]: The “@” parameter operator signifies recursive deconstruction of the parameter d.

The aim of this first rule is to return the diagram portion of the source text and to drop the header and trailer from the source.

The other emit rules work in a similar fashion.
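As a rough mental model, each emit rule can be read as a JavaScript function that takes the matches as arguments and returns a template string. The sketch below is an assumption about the general shape, not the code that glue actually generates: it assumes that @ parameters arrive as arrays of already-emitted strings, and it substitutes a hypothetical stand-in for decodeMxDiagram.

```javascript
// Hypothetical stand-in for decodeMxDiagram from support.js (not shown here).
const decodeMxDiagram = (encoded) => `<!-- decoded ${encoded.length} chars -->`;

// Assumption: parameters marked with @ are arrays of strings and are joined.
const emit = {
  // TabbedDiagrams [h @d t] = [[${d}]]
  TabbedDiagrams: (h, d, t) => `${d.join('')}`,
  // Diagram [k @a k2 @ec k3] = [[${k}${a}${k2}\n${decodeMxDiagram(ec)}\n${k3}\n]]
  Diagram: (k, a, k2, ec, k3) =>
    `${k}${a.join('')}${k2}\n${decodeMxDiagram(ec.join(''))}\n${k3}\n`,
  // Attribute [@an k s] = [[\ ${an}${k}${s}]]
  Attribute: (an, k, s) => ` ${an.join('')}${k}${s}`,
};

// Emit bottom-up, the way matches nest in the grammar.
const attrs = [emit.Attribute(['i', 'd'], '=', '"abc"')];
const diagram = emit.Diagram('<diagram', attrs, '>', ['E', 'N', 'C'], '</diagram>');
const out = emit.TabbedDiagrams('<mxfile>', [diagram], '</mxfile>');
console.log(out);
```

Note how TabbedDiagrams receives the header and trailer but mentions only d in its template, which is exactly how the header and trailer get dropped from the output.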

The goal of this grok/emit phase is only to recognize incoming .drawio files and to break them up into headers and diagrams. More interesting manipulations of the incoming .drawio files remain the domain of subsequent phases.

The syntax for these rules is further described in the glue manual.

After decompressing the .drawio file, we’ll perform a few cleanups, then convert the code into PROLOG.

It is possible to convert the code into just about any general-purpose language (GPL), not just PROLOG. PROLOG was chosen because it has a fairly clean syntax for representing triples. Triples can also be represented as Lisp sexps, as user-defined data structures, etc.
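To give a feel for what triples look like as PROLOG facts, here is a small sketch that prints a few of them. The relation names (kind, value, source, target) are hypothetical and chosen only for illustration; the fact format actually used by the workbench is defined in later phases. The ids are taken from the decoded diagram shown earlier.

```javascript
// Quote a string as a Prolog atom: single quotes, with \ and ' escaped.
function atom(text) {
  return `'${String(text).replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
}

// A triple is [relation, subject, object]; it becomes relation(subject, object).
function toFact([relation, subject, object]) {
  return `${relation}(${atom(subject)}, ${atom(object)}).`;
}

// Hypothetical relation names; ids come from the mxCell elements above.
const triples = [
  ['kind', 'Nl1LcCOVLVZGkuQ6EcLl-6', 'vertex'],
  ['value', 'Nl1LcCOVLVZGkuQ6EcLl-6', 'a'],
  ['source', 'JY7Yr9pDnzS2nqslWDs0-1', 'Nl1LcCOVLVZGkuQ6EcLl-6'],
  ['target', 'JY7Yr9pDnzS2nqslWDs0-1', 'Nl1LcCOVLVZGkuQ6EcLl-7'],
];

console.log(triples.map(toFact).join('\n'));
```

The ids are quoted because they contain upper-case letters and hyphens, which would otherwise not be read as plain atoms.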

We will not go into as much syntactic detail in subsequent articles.

Parsing Diagrams - DaS Workbench 1 Decoding Phase
