Introduction

In this essay, I show how to create a small language rapidly (in about one day).

Normal programming skills are required.  One doesn't need to know compiler theory to build little languages in this way.

I believe that this method can be used to enable a new class of small languages that I call SCLs — Solution Centric Languages.



How - Overview

I use a PEG[1] parser — Ohm-JS.

I create an example program using the small language.

I transpile the example program into JavaScript.  I expect that the transpiler could be tweaked to emit code in just about any current PL (Programming Language).

I defer many details, in the hopes of clarifying the underlying methods.

I defer most checking (type checking, etc.) to the underlying language (in this case JavaScript), although checking could be performed by adding small amounts of effort and code.


Github

The code for this example can be found at https://github.com/guitarvydas/sm/tree/master.


Other References

I created other documents and videos as I built this particular transpiler.


For a summary of the method, see:


https://guitarvydas.github.io/2021/02/26/State-Machines-to-JavaScript.html

https://youtu.be/uk90Sc6XZrw


For a screencast record of the actual development process (including mistakes and backtracking), see

https://youtu.be/zaj-xO3rhRA

https://youtu.be/7u0SxpS1Alc

https://youtu.be/4_R6cuvkHos

https://youtu.be/4Ecft5yrMvE


Syntax

IMO there is no reason to restrict oneself to using only textual representations for creating languages and little languages (SCLs).


Diagrams express networks better than text.

Diagrams express composition better than text.

Text expresses mathematical functions better than diagrams.

Text expresses mathematical operations better than diagrams.

Diagrams express certain kinds of control flow better than text.

Text expresses one-in-one-out functions well.

Diagrams express one-in-many-out functions better than text.

Text does not express M-in-zero-out or zero-in-N-out functionality well.

Future

I expect to show:


Need to Know - Defer, Defer, Defer

I argue that details are the bane of PLs.[3]


Most PLs (Programming Languages) require too many details and elide very little.


I do not advocate removing details, but I do advocate eliding details.  


Which details need to be elided depends on the problem-at-hand and cannot be generalized.


I advocate building software in layers, like the acetates used to make cartoons at Disney.


Each layer of software does only one thing.


This attitude can be seen in the design of this little language — SCL.  This little language deals with transpiling state machines into control flow, e.g. switch statements — and leaves all other details, like type-checking to other software layers.


Most current PLs provide only a single layer — and drown the user / reader in a sea of details.  


If the language you use provides, say, a "+" operator, then it requires too much detail.


I advocate making the DI (Design Intent) clear to the reader.  My thoughts  on DI can be found in my essays at https://guitarvydas.github.io/.


To achieve my recommended level of layering, one should try to defer as much as possible.  Each layer should be built on the need to know principle, i.e. if the information is not needed at a certain layer, then leave that information aside and use the minimum little language that allows you to express the DI at that layer.

Steps for Creating a Little Language

Realization

Mock up the SCL

[I show how to create the SCL in https://guitarvydas.github.io/2021/02/26/State-Machines-to-JavaScript.html]


The mockup emulates Fig. 1.


Image

Fig. 1 Simple 2-state State Machine


The corresponding text version of this diagram is:


name:

  Toggle

inputs:

  _in

outputs:

  _no

  _yes


machine Toggle:

  state _off:

    entry: "fire (_no, true);"

    on _in : next _on

  state _on:

    entry: "fire (_yes, true);"

    on _in : next _off

  default: _off

end machine


Of note is the fact that the SCL does not attempt to express everything.  


StateChart diagrams are good for expressing control-flow, whereas textual code is good for expressing details such as function calls.


Specifically, the entry code represents deferred code as a string of characters.  These characters are not processed by the SCL transpiler and are simply output to the final result.


The SCL creates only ten (10) keywords, namely:



Everything else is left up to the base language.  In this case, I will be using Javascript as the base language.  It should be possible transpile to most PLs (Programming Languages).  


In fact, if one were to design a separate SCL layer for expressing entry, exit and transition code, then the transpiler could be made to be language-independent.


In the above, the underscore "_" character is in no way special — it is simply a valid character for naming identifiers.  I have used "_" as a way of distinguishing significant identifiers, such as state names, inputs and outputs.


Write a Grammar

I used the Ohm Editor to help write and test the grammar.


The Ohm Editor is found at https://ohmlang.github.io/editor/.


The grammar is


SM {


  Main = StateMachine

  StateMachine = NameSection InputSection OutputSection MachineSection

  NameSection = "name" ":" Name

  InputSection = "inputs" ":" InputPinNames

  OutputSection = "outputs" ":" OutputPinNames

  

  MachineSection = Header State+ Default Trailer

  Header = "machine" MachineName ":"

  Trailer = "end" "machine"

  Default = "default" ":" Name


  State = "state" StateName ":" EntrySection Transition*

  EntrySection = "entry" ":" string

  Transition = "on" Name ":" "next" Name




  keyword = "machine" | "name" | "inputs" | "outputs" | "end" | "state" | "entry" | "on" | "next" | "default"

  InputPinNames = nameList

  OutputPinNames = nameList

  MachineName = Name

  StateName = Name

  InputPinReference = Name

  StateReference = Name

  Name = ~keyword id

  nameList = (~keyword id delim)+




  id = firstId followId*

  firstId = "A".."Z" | "a".."z" | "_"

  followId = firstId


  string = "\"" stringChar* "\""

  stringChar =

        escapedChar                                

     |  anyStringChar

  escapedChar = "\\" any

  anyStringChar = ~"\"" any

 

  delim = (" " | "\t" | "\n")+

}




Ohm Editor Usage

The Ohm Editor is found at https://ohmlang.github.io/editor/.


It consists of three (3) windows (divs):



Enter the grammar in the grammar window.


Enter test cases in the examples window.  Each test snippet appears as a single line, at first.  Double-click on a test case to open a pop-up window that shows all of the details of the test case.


The result of parsing is shown in the parse window.  If the parse is successful, the results are shown in gray color.  Unsuccessful partial parses are shown in red.  Hovering over a parse entity highlights the grammar rule and highlights the part of the test text that is constitutes the parsed entity.

Test the Grammar

Enter partial tests in the top-right window by clicking "Add".


Note that the parser defaults to using the first grammar rule as the top-level target.  Parsing will fail if the top rule does not match the test case — in small test cases it might be necessary to insert sub-rule names as the first line of the grammar.


Run the Grammar / Parser

Ohm-JS creates a parser in Javascript.


To use the created parser, one needs to have access to the ohm library (e.g. via npm).


The Ohm-JS documentation explains how to use the parser in a web page or at the command line.


I choose to use the command line, via node.js, but Ohm-JS is certainly not restricted to being used in this manner.


My code for a full parser is:


// npm install ohm-js

const grammar = `

SM {


  Main = StateMachine

  StateMachine = NameSection InputSection OutputSection MachineSection

  NameSection = "name" ":" Name

  InputSection = "inputs" ":" InputPinNames

  OutputSection = "outputs" ":" OutputPinNames

  

  MachineSection = Header State+ Default Trailer

  Header = "machine" MachineName ":"

  Trailer = "end" "machine"

  Default = "default" ":" Name


  State = "state" StateName ":" EntrySection Transition*

  EntrySection = "entry" ":" string

  Transition = "on" Name ":" "next" Name




  keyword = "machine" | "name" | "inputs" | "outputs" | "end" | "state" | "entry" | "on" | "next" | "default"

  InputPinNames = nameList

  OutputPinNames = nameList

  MachineName = Name

  StateName = Name

  InputPinReference = Name

  StateReference = Name

  Name = ~keyword id

  nameList = (~keyword id delim)+




  id = firstId followId*

  firstId = "A".."Z" | "a".."z" | "_"

  followId = firstId


  string = "\\"" stringChar* "\\""

  stringChar =

        escapedChar                                

     |  anyStringChar

  escapedChar = "\\\\" any

  anyStringChar = ~"\\"" any

 

  delim = (" " | "\\t" | "\\n")+

  

}

`;



function parse (text) {

    var ohm = require ('ohm-js');

    var parser = ohm.grammar (grammar);

    var cst = parser.match (text);

    if (cst.succeeded ()) {

return {parser: parser, tree: cst};

    } else {

console.log (parser.trace (text).toString ());

throw "Ohm matching failed";

    }

}



var fs = require ('fs');


function getNamedFile (fname) {

    if (fname === undefined || fname === null || fname === "-") {

return fs.readFileSync (0, 'utf-8');

    } else {

return fs.readFileSync (fname, 'utf-8');

    }

}



var text = getNamedFile("-");

var {parser, tree} = parse (text);



Running the parser consists of invoking it and supplying it with code to be parsed.  I saved my test code in toggle.scl.  I saved the the above parser code in sm.js.  I invoked the parser using node at the command line:


> node sm <toggle.scl

Wirte the Code Emitter

When run, the Ohm parser parses the source code and returns a data structure — a CST (concrete syntax tree — a tree representing the actual source code that was parsed[4]).


One writes a set of rules — in Javascript, if using Ohm-JS — to transpile the CST into final output.  The Ohm-JS documentation calls this "the semantics".


One creates a "semantics" object, names it, then adds transpilation rules to it.


You can add more than one set of rules to a semantics object.  For example, one set of rules might check the types and declarations of all of the incoming source code, whereas another set of rules might be used transpile the (checked) CST into final code.


For purposes of this example, I show only the transpiler rules and leave out checking rules.  I avoid such details to make this example simpler to comprehend.


The set of transpilation rules are included in the code base.  I chose to call the transpilation rules "js()".  The final code for the parser and transpiler is:



Ohm-JS creates a parser in Javascript.


To use the created parser, one needs to have access to the ohm library (e.g. via npm).


The Ohm-JS documentation explains how to use the parser in a web page or at the command line.


I choose to use the command line, via node.js, but Ohm-JS is certainly not restricted to being used in this manner.


My code for a full parser is:


// npm install ohm-js

const grammar = `

SM {


  Main = StateMachine

  StateMachine = NameSection InputSection OutputSection MachineSection

  NameSection = "name" ":" Name

  InputSection = "inputs" ":" InputPinNames

  OutputSection = "outputs" ":" OutputPinNames

  

  MachineSection = Header State+ Default Trailer

  Header = "machine" MachineName ":"

  Trailer = "end" "machine"

  Default = "default" ":" Name


  State = "state" StateName ":" EntrySection Transition*

  EntrySection = "entry" ":" string

  Transition = "on" Name ":" "next" Name




  keyword = "machine" | "name" | "inputs" | "outputs" | "end" | "state" | "entry" | "on" | "next" | "default"

  InputPinNames = nameList

  OutputPinNames = nameList

  MachineName = Name

  StateName = Name

  InputPinReference = Name

  StateReference = Name

  Name = ~keyword id

  nameList = (~keyword id delim)+




  id = firstId followId*

  firstId = "A".."Z" | "a".."z" | "_"

  followId = firstId


  string = "\\"" stringChar* "\\""

  stringChar =

        escapedChar                                

     |  anyStringChar

  escapedChar = "\\\\" any

  anyStringChar = ~"\\"" any

 

  delim = (" " | "\\t" | "\\n")+

  

}

`;



function parse (text) {

    var ohm = require ('ohm-js');

    var parser = ohm.grammar (grammar);

    var cst = parser.match (text);

    if (cst.succeeded ()) {

return {parser: parser, tree: cst};

    } else {

console.log (parser.trace (text).toString ());

throw "Ohm matching failed";

    }

}



var fs = require ('fs');


function getNamedFile (fname) {

    if (fname === undefined || fname === null || fname === "-") {

return fs.readFileSync (0, 'utf-8');

    } else {

return fs.readFileSync (fname, 'utf-8');

    }

}


//////////// transpiler ////////


var nameCounter;


function createTranspiler (parser) {

    var semantics = parser.createSemantics ();

    nameCounter = 0;

    semantics.addOperation (

"js",

{

    Main : function (_1) { return _1.js (); }, // StateMachine

   

    StateMachine : function (_1, _2, _3, _4) { // NameSection InputSection OutputSection MachineSection

var nameSection = _1.js ();

var inputSection = _2.js ();

var outputSection = _3.js ();

var machineSection = _4.js ();

return `

${inputSection}

${outputSection}

${machineSection}

`;

    },

   

    NameSection : function (_1, _2, _3) { return _3.js ().name; }, // "name" ":" Name

    InputSection : function (_1, _2, _3) { return _3.js (); }, // "inputs" ":" InputPinNames

    OutputSection : function (_1, _2, _3) { return _3.js (); }, // "outputs" ":" OutputPinNames

   

    MachineSection : function (_1, _2s, _3, _4) { // Header State+ Default Trailer

var machineName = _1.js ();

var snippets = _2s.js ();

var defaultState = _3.js ();

var preamble = snippets.map (snippet => { return snippet.preamble; }).join ('\n');

var stateCode = snippets.map (snippet => { return snippet.step; }).join ('\n');

var entryCode = snippets.map (snippet => { return snippet.entry; }).join ('\n');

var smCode = `

${preamble}

function ${machineName} () {

  this.state = ${defaultState};

  this.enter = function (next_state) {

    switch (next_state) {

      ${entryCode}

    }

  }

  this.step = function (event) {

    switch (this.state) {

      ${stateCode}

    };

 }

}

`;

return smCode;

    },

   

    Header : function (_1, _2, _3) { return _2.js (); }, // "machine" MachineName ":"

    Trailer : function (_1, _2) {return "";}, // "end" "machine"

   

    State : function (_1, _2, _3, _4, _5s) {  // "state" StateName ":" EntrySection Transition*

var pair = _2.js ();

var name = pair.name;

var preamble = pair.preamble;

var entry = _4.js ();

var transitions = _5s.js ();

var stepcode = `

      case ${name}:

switch (event.tag) {

${transitions}

};

      break;

`;

var entrycode = `

case ${name}:

${entry}

this.state = ${name};

break;`;

return { preamble: preamble, step: stepcode, entry: entrycode, defaultState: "" };

    },

    EntrySection : function (_1, _2, _3) {return _3.js ()}, // "entry" ":" string

    Transition : function (_1, _2, _3, _4, _5) { // "on" Name ":" "next" Name

                var tagName = _2.js ().name;

var nextStateName = _5.js ().name;

var transitionCode = `

      case ${tagName}: 

this.enter (${nextStateName});

break;

`;

return transitionCode;

    },

   

    Default : function (_1 ,_2, _3) { // "default" ":" Name

var name = _3.js ().name;

return name;

    },

   

    keyword : function (_1) {return _1.js ()}, // "machine" | "name" | "inputs" | "outputs" | "end" | "state" | "entry" | "on" | "next" | "default"

    InputPinNames : function (_1) {return _1.js ()}, // nameList

    OutputPinNames : function (_1) {return _1.js ()}, // nameList

    MachineName : function (_1) {return _1.js ().name}, // Name

    StateName : function (_1) {return _1.js ()}, // Name

    InputPinReference : function (_1) {return _1.js ().name}, // Name

    StateReference : function (_1) {return _1.js ().name}, // Name

    Name : function (_1) { // ~keyword id

var name = _1.js ();

nameCounter += 1;

var constant = `const ${name} = ${nameCounter};`;

return { preamble: constant, name: name };

    },

    nameList : function (_1s, _2s) { // (~keyword id delim)+

var consts = _1s.js ().map (name => {

    nameCounter += 1;

    return `const ${name} = ${nameCounter};`;

});

return consts.join ('\n');

    }, 

   

   

   

    id : function (_1, _2s) {  // firstId followId*

var name = `${_1.js ()}${_2s.js ().join ('')}` ;

return name;

    },

    firstId : function (_1) {return _1.js ()}, // "A".."Z" | "a".."z" | "_"

    followId : function (_1) {return _1.js ()}, // firstId

   

    string : function (_1, _2s, _3) { return `${_2s.js ().join ('')}`; }, // "\\"" stringChar* "\\""

    escapedChar : function (_1, _2) { return _2.js (); }, // "\\\\" any

    anyStringChar : function (_1) {return _1.js ();}, // ~"\\"" any

   

    delim : function (_1s) {return _1s.js ().join (''); }, // (" " | "\\t" | "\\n")+

   

    _terminal: function () { return this.primitiveValue; }

});

    return semantics;

}

////////////


var text = getNamedFile("-");

var {parser, tree} = parse (text);

var transpiler = createTranspiler (parser);

console.log (transpiler (tree).js ());


// boilerplate

console.log (`

 function fire (output, value) {

  console.log ("Fire called: " + this.toString () + " output:" + output.toString () + " value:" + value.toString ());

 }

 function send (component, tag, value) {

  component.step ( {tag, value} );

 }

 function inject (component, event) {

  component.step (event);

 }

 var top = new Toggle ();

 inject (top, {tag: _in, value: true});

 inject (top, {tag: _in, value: true});

 inject (top, {tag: _in, value: true});

`);


Note that I added some boilerplate code at the end.  This code is needed to make this simple example run.


Run the Transpiler

The transpiler is built at the command line using:


> node sm <toggle.scl >out.js


To run the transpiler, one needs to invoke the node interpreter:


> node out


 As mentioned earlier, Ohm-JS is not restricted to being run from the command line and can run inside a web page.  


Refer to the Ohm-JS documentation https://github.com/harc/ohm or to my essay https://computingsimplicity.neocities.org/blogs/OhmInSmallSteps.pdf.


Iterate

Once the SCL has been built for the test sample code, embellish the transpiler with more language constructs, recompile and re-run.


[1] PEG means Parsing Expression Grammars.  PEGs are IMO, better than REGEXPs, and are just as accessible to programmers without any need for compiler theory.  PEG libraries are available for many languages.  The original PEG thesis can be found at https://bford.info/pub/lang/peg/

[2] Stack-based languages, are currently called "concatenative" languages.  An early stack-based language is Forth.  I first saw compiler technology using a stack-based language in S/SL and PT Pascal (see my list of References).

[3] PL means Programming Language.

[4] An AST — Abstract Syntax Tree — is a tree of the possible parses of all sources.  The CST is a subset of all possible parses.  We use the CST.