YAGNI

One of the other "tricks" to using Divide and Conquer is the YAGNI principle: You Ain't Gonna Need It.


Don't do more than is required.


You can either:


  1. Build a full-blown DSL (not YAGNI), or
  2. Build just as much of a DSL as is required to solve a specific problem.  This is YAGNI.  This is building an SCL (Solution Centric Language).

Reasons to Hate DSLs

Management hated DSLs, because they were 


REGEXPs used to be that way.  


REGEXPs are DSLs.


The canonical reference for building REGEXP engines is the Dragon Book (Aho, Sethi and Ullman's Compilers: Principles, Techniques, and Tools).



Theory -- REGEXPs

REGEXP theory is hard to use and hard to understand.  Building a REGEXP compiler/interpreter takes a long time.


Yet, REGEXPs are found even in lowly JavaScript.


If one ignores the theory of REGEXPs and just uses them, they can be quite simple.
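Used rather than theorized about, a REGEXP can be a one-liner.  A minimal sketch using Python's standard re module (the date pattern and the sample text are made up for illustration):

```python
import re

# Match an ISO-style date: four digits, dash, two digits, dash, two digits.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

text = "Posted 2021-03-02, updated 2021-03-15."
print(DATE.findall(text))  # ['2021-03-02', '2021-03-15']
```

No automata theory is needed to write or read the pattern; the library hides the REGEXP compiler/interpreter.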



Theory -- PEGs

PEGs (Parsing Expression Grammars) make it simple to build parsers using familiar, REGEXP-like syntax.


It is possible to just use PEGs to build little languages — little pattern matchers that are intended for a single use.


I have written essays about creating DSLs in just one day.


That is the breakthrough that PEG brings: it makes parsing as easy to use as REGEXPing.  PEG can also parse things that REGEXPs can't, such as arbitrarily nested parentheses.
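To make the nesting point concrete: a classical REGEXP cannot recognize arbitrarily nested parentheses, but a one-rule PEG can.  The sketch below hand-codes the (hypothetical) rule  Parens <- ( "(" Parens ")" )*  as a recursive-descent function that follows PEG semantics: the repetition is greedy, and a failed attempt backtracks to where it started.  It is not the output of any particular PEG tool.

```python
def parens(s: str, i: int = 0) -> int:
    """Parens <- ( "(" Parens ")" )*   Returns the index just past the match."""
    while i < len(s) and s[i] == "(":
        j = parens(s, i + 1)            # try the nested Parens
        if j < len(s) and s[j] == ")":
            i = j + 1                   # whole sequence matched: consume it
        else:
            break                       # sequence failed: backtrack to i, stop repeating
    return i

def balanced(s: str) -> bool:
    # The input is accepted only if Parens consumes all of it.
    return parens(s) == len(s)

for text in ["(()())", "(()", ")("]:
    print(text, balanced(text))
```

Because a PEG rule may refer to itself, nesting depth is unbounded; that recursion is exactly what plain REGEXPs lack.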


Punt

Building full-blown type-checking is hard.


Punt.


Build little languages that transpile code into other base languages.  Let the base languages carry out the type checking.


This kind of punting was originally explored in the C preprocessor.


This kind of punting is most helpful if one can insert pragmas into the transpiled code.  Pragmas, such as C's "#line" directive (which takes a line number and, optionally, a file name), allow the base language to report errors that reference the original little-language code.
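A minimal sketch of punting with pragmas, assuming a hypothetical two-statement little language (let NAME = EXPR, and print EXPR) that transpiles to C.  Before each emitted C statement, the transpiler writes a #line directive naming the original line and file, so the C compiler, not the transpiler, does the type checking and reports errors against the little-language source.  The function name transpile and the file name prog.scl are illustrative only.

```python
def transpile(source: str, filename: str) -> str:
    """Translate the toy language into a C program, tagging each statement with #line."""
    out = ["#include <stdio.h>", "int main(void) {"]
    for lineno, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue  # blank lines produce no C code
        # Tell the C compiler that the next line came from filename:lineno.
        out.append(f'#line {lineno} "{filename}"')
        rest = line.split(None, 1)[1] if len(words) > 1 else ""
        if words[0] == "let":
            name, expr = rest.split("=", 1)
            out.append(f"  double {name.strip()} = {expr.strip()};")
        elif words[0] == "print":
            out.append(f'  printf("%g\\n", (double)({rest}));')
        else:
            raise SyntaxError(f"{filename}:{lineno}: unknown statement {words[0]!r}")
    out += ["  return 0;", "}"]
    return "\n".join(out)

print(transpile("let x = 1 + 2\nprint x * 10\n", "prog.scl"))
```

The transpiler does no type checking at all.  A line 1 of let s = "hi" would become double s = "hi";, which a C compiler rejects as a type error and, thanks to #line, attributes to prog.scl line 1 rather than to the generated file.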



Incremental Change

YAGNI implies layers and incremental change laid over existing languages.


Choke Down on Details

Numbers

In a small language, all numbers are just numbers.


A small language does not ask the programmer to differentiate between integers, floats, double-floats, etc.


Leave that kind of differentiation to Optimization Engineers.


Languages like BASIC tried to do this,[1] but got it wrong.  BASIC allowed conversion from strings to numbers, depending on context.  BASIC tried to hide this kind of detail from programmers.  Many programmers loved the freedom.  Many programmers got into trouble, later.


Lisp also tried this, by introducing bignums.  The result overcame some of the pitfalls of BASIC, but did not give programmers fine-enough control.  It was essentially impossible to know what kind of code the Lisp compiler would emit.  Lisp tried to remedy this problem by adding band-aids, like type pragmas (DECLARE).


What the above approaches lacked is layering — the ability to defer decisions about details, while still keeping the details (albeit at lower layers).
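One way to picture that layering, sketched in Python under the assumption that a little language's upper layer treats every literal as an exact rational number (fractions.Fraction), while a separate, lower layer decides on a machine representation afterwards.  The function names read_number and lower_number are hypothetical.

```python
from fractions import Fraction

def read_number(text: str) -> Fraction:
    # Upper layer: every literal is just "a number" (here, an exact rational).
    return Fraction(text)

def lower_number(n: Fraction):
    # Lower layer: the representation decision, deferred but not lost.
    # Whole numbers become ints; everything else becomes a (lossy) float.
    if n.denominator == 1:
        return int(n)
    return float(n)

total = read_number("0.1") + read_number("0.2")
print(total)                # 3/10
print(lower_number(total))  # 0.3
```

Because the arithmetic happens above the lowering layer, 0.1 + 0.2 comes out as exactly 3/10, whereas adding the two floats directly gives 0.30000000000000004.  The detail still exists; it has just been pushed down a layer.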



Collections

For programming everything but details, it is enough to have items and collections of such items.


In most cases, one doesn't even need to know the details of how items are structured.


S/SL (Syntax / Semantic Language, see below) is a dataless language.  A programmer can declare the existence of items but cannot show their implementation (i.e. S/SL does not have any data-oriented operators, such as + and cons()).  Programmers need to implement items in some other language: a toolbox language.


[I write more about S/SL in https://guitarvydas.github.io/2021/03/02/Dataless-Programming-Language.html]
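A rough Python analogue of the dataless idea, not S/SL itself: the rule layer below only calls declared operations (enter, is_declared) on a symbol table and never touches the data directly; a separate toolbox layer supplies one possible implementation.  SymbolTable, check_uses and SetSymbolTable are hypothetical names.

```python
from typing import Iterable, Iterator, Protocol

class SymbolTable(Protocol):
    # Declared operations only: the rule layer never sees the data layout.
    def enter(self, name: str) -> None: ...
    def is_declared(self, name: str) -> bool: ...

def check_uses(decls: Iterable[str], uses: Iterable[str], table: SymbolTable) -> Iterator[str]:
    # "Dataless" rule layer: sequencing and decisions only, no data operators.
    for name in decls:
        table.enter(name)
    for name in uses:
        if not table.is_declared(name):
            yield name  # emit a report, much as S/SL emits output tokens

class SetSymbolTable:
    # Toolbox layer: one possible implementation, backed by a Python set.
    def __init__(self) -> None:
        self._names = set()

    def enter(self, name: str) -> None:
        self._names.add(name)

    def is_declared(self, name: str) -> bool:
        return name in self._names

print(list(check_uses(["x", "y"], ["x", "z", "y"], SetSymbolTable())))  # ['z']
```

Swapping SetSymbolTable for, say, a hash table with scope levels would not change a line of check_uses.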



Syntax

Syntax is sugar.


Languages are skins.[2]


Languages are layers on top of toolbox languages.


Syntax can be automatically checked.  


Simple up-front checking guards against a certain class of errors (e.g. typos, naming inconsistencies, nesting inconsistencies).
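As an illustration of how little machinery such a check needs, here is a sketch of a nesting checker for Pascal-style begin/end keywords.  It assumes keywords are whole words (matched case-insensitively) and ignores strings and comments, which a real checker would have to skip.

```python
import re
from typing import List

def check_nesting(lines: List[str]) -> List[str]:
    """Report unbalanced begin/end keywords; return [] if nesting is consistent."""
    errors = []
    open_lines = []  # line numbers of "begin"s not yet closed
    for lineno, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z_]\w*", line.lower()):
            if word == "begin":
                open_lines.append(lineno)
            elif word == "end":
                if open_lines:
                    open_lines.pop()
                else:
                    errors.append(f"line {lineno}: 'end' without matching 'begin'")
    for lineno in open_lines:
        errors.append(f"line {lineno}: 'begin' is never closed")
    return errors

print(check_nesting(["begin", "  x := 1;", "  begin", "  end;"]))
# ["line 1: 'begin' is never closed"]
```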


At present, PLs (Programming Languages) either contain syntax that allows for syntax checking, or avoid such syntax, and syntax checking, altogether.  The programmer is not offered a choice.


The principle of YAGNI implies that programs should be built in layers.  Very simple layers.  For example, the top layer could check for syntax mistakes.  Once that check has been completed, the remaining layers do not impose the same kinds of syntactic constraints and check only for bigger-picture errors (e.g. type checking).


In my opinion, Pascal-derived languages favour syntax checking, whereas Lisp-derived languages skip over the syntax-checking preliminaries and deal with other kinds of issues.  Pascal-derived languages use "end" constructs that clearly constrain the syntactic boundaries of code, whereas Lisp uses the same terminator, ")", to mark the end of all "syntactic" constructs.


YAGNI implies that a language has more than one layer of syntax.  Each layer is simple on its own.  For example, a top layer can check for typos and then "get out of the way".


[Smart editors could switch between language syntaxes,[3] eliding constructs that pass the syntax checker but clutter the DI (Design Intent) of a program.  A compiler might consist of a syntax-checker pass (YAGNI), followed by a de-sugarer, followed by a type-checker, etc.  Smart editors could let programmers view code as sub-constructs at each of these layers (eliding not at the line level, but at a structural level)].[4]
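The layered compiler described above can be sketched as a list of passes, each a function from tokens to tokens, run in order.  The syntax checker, the "unless" sugar and the token format are all hypothetical; a type-checker pass would simply be one more entry in the list.

```python
from typing import Callable, List

Tokens = List[str]

def syntax_check(tokens: Tokens) -> Tokens:
    # Layer 1: catch simple nesting mistakes, then get out of the way.
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise SyntaxError("unexpected ')'")
    if depth != 0:
        raise SyntaxError("unclosed '('")
    return tokens  # unchanged: this layer only checks

def desugar(tokens: Tokens) -> Tokens:
    # Layer 2: rewrite the (hypothetical) sugar "unless" as "if not".
    out: Tokens = []
    for tok in tokens:
        out.extend(["if", "not"] if tok == "unless" else [tok])
    return out

def run_passes(tokens: Tokens, passes: List[Callable[[Tokens], Tokens]]) -> Tokens:
    for compiler_pass in passes:
        tokens = compiler_pass(tokens)
    return tokens

print(run_passes("unless ( ready ) wait".split(), [syntax_check, desugar]))
# ['if', 'not', '(', 'ready', ')', 'wait']
```

Each pass can be written, tested and forgotten on its own; later passes never need to re-check what syntax_check has already guaranteed.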


We, the programming community, know how to check syntax.  Syntax checking should be included in every language, albeit in a form that can be elided.

Macros

Macros constitute an attempt to add layers to languages.


Lisp Macros

Lisp macros provide ways to restructure the syntax of the language.


Lisp represents programs as lists.  Lisp is a language for list-processing.   


The easiest way to manipulate lisp programs is to use lisp list-manipulation to edit lists which make up programs.


Lisp macros allow programmers to use all of lisp at compile time to edit and restructure programs.


Lisp macros work on lists and atoms — not characters.
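To show what "working on lists and atoms" means, here is a Python sketch that represents a Lisp program as nested lists of strings and expands a when macro into if and progn, outermost form first, roughly the order in which a Lisp macroexpander works.  It is a toy: it ignores quote, and it hard-codes one macro instead of consulting a macro table.

```python
def expand(form):
    """Expand (when TEST BODY...) into (if TEST (progn BODY...)), outside-in."""
    if not isinstance(form, list) or not form:
        return form  # atoms and the empty list are left alone
    if form[0] == "when":
        form = ["if", form[1], ["progn", *form[2:]]]
    # Then expand every sub-form of the (possibly rewritten) list.
    return [expand(sub) for sub in form]

program = ["when", ["ready", "x"], ["print", "x"], ["when", "y", ["go"]]]
print(expand(program))
# ['if', ['ready', 'x'], ['progn', ['print', 'x'], ['if', 'y', ['progn', ['go']]]]]
```

No characters are scanned or parsed; the "macro" is ordinary list manipulation.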

Character-Oriented Macros

Most programming languages are written as characters.  Giving such languages the flexibility of Lisp macros would require building Scanners and Parsers into their compilers.


For examples, see PEG, REBOL, S/SL, TXL, etc.

M4

M4 is a full-featured macro processor, but is a language unto itself.  M4 can be used with most textual languages (for example, I've used M4 to build JavaScript projects).


Hygienic Macros

Scheme defines "hygienic macros".  This complication would not have been needed if Scheme had been used to form YAGNI layers, instead of attempting to let macros and the runtime co-exist.  (Likewise, Lisp macros would not be needed if YAGNI layers had been used.  Scheme attempts to fix an accidental complexity, Lisp macro variable capture, instead of addressing the elephant in the room: the flattening of layered compilation/interpretation, and YAGNI.)
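The variable-capture problem can be reproduced with a list representation of code.  A naive swap macro (hypothetical) introduces its own temporary named tmp, which silently captures a user variable that happens to share the name; the classic Lisp workaround is gensym, a freshly generated name, and Scheme's hygienic macros perform that kind of renaming automatically.  The gensym below is a stand-in that assumes user code never contains names with a '#' in them.

```python
import itertools

_counter = itertools.count(1)

def gensym(prefix: str) -> str:
    # Fresh symbol; assumes user code never writes names containing '#'.
    return f"{prefix}#{next(_counter)}"

def swap_naive(a, b):
    # (swap a b) => (let ((tmp a)) (setq a b) (setq b tmp))
    return ["let", [["tmp", a]], ["setq", a, b], ["setq", b, "tmp"]]

def swap_gensym(a, b):
    # Same expansion, but the temporary cannot collide with user names.
    t = gensym("tmp")
    return ["let", [[t, a]], ["setq", a, b], ["setq", b, t]]

# With a user variable named tmp, both setq forms in the naive expansion
# refer to the macro's own tmp, so the swap silently has no effect.
print(swap_naive("tmp", "x"))
print(swap_gensym("tmp", "x"))
```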

C Macros

C macros, on the other hand, fall far, far short of lisp macros.  The C macro processor is a small language unto itself (it was YAGNI when designed) and does not give the full power of C to programmers at compile time.

Toolbox Language

A base language that supports building SCLs easily would impose few restrictions on the transpiler.


Anything that is performed solely at compile-time[5] is usually a restriction: anything with the word "static" in front of it.


Additionally, syntactic sugar and syntax rules make transpilation more difficult.


The toolbox — the base language — doesn't need to be a "good" language to program in, it simply needs to be a good language to transpile into.


Automation can handle all of the "static" stuff.


Solution-specific syntax sugar can be added back in by automation (SCLs, little languages).


I discuss Toolbox languages in my essay[6] "Toolbox Languages".


Issues that relate to toolbox languages include:


Lisp pioneered[7] many of these ideas, but ultimately failed because it tried to apply the ideas in a flat manner[8] instead of in a layered manner.


Successful Models of YAGNI

S/SL

S/SL references:

https://archive.org/details/technicalreportc118univ

https://research.cs.queensu.ca/home/cordy/pub/downloads/ssl/


S/SL is a dataless language.  


As such, it is one of the best examples of YAGNI and SCL-design.

PT Pascal

PT Pascal is a full-featured Pascal compiler built in S/SL. 


PT Pascal is an example of how much can be expressed in a dataless language.


[PT is also an example of the use of concatenative languages.]


https://research.cs.queensu.ca/home/cordy/pub/downloads/ssl/

REBOL Parse

REBOL is a small language that has devoted admirers.[9]


Instead of installing capabilities into the language, REBOL provides a parse function that allows the definition and use of many small languages - tailored to specific purposes.


TXL

http://www.txl.ca/


TXL is a functional, backtracking parsing language that was originally meant for experimentation with new language syntaxes.


As such, TXL makes it easier to build incremental SCLs on top of existing languages.


YAGNI vs. Denotational Semantics

At its earliest inception, Denotational Semantics was a way to define new languages.


It defines semantics of languages in a purely functional manner.
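In miniature, "purely functional" means that each piece of syntax is mapped to a mathematical object (here, a Python function from environments to values), and the meaning of a compound phrase is built only from the meanings of its parts.  The four-construct expression language below is a hypothetical toy; real denotational definitions use semantic domains and fixed points, which this sketch omits.

```python
from typing import Callable, Dict

Env = Dict[str, int]
Meaning = Callable[[Env], int]

def denote(expr) -> Meaning:
    """Map a piece of syntax (a nested tuple) to its meaning."""
    kind = expr[0]
    if kind == "num":                      # ("num", n)
        n = expr[1]
        return lambda env: n
    if kind == "var":                      # ("var", name)
        name = expr[1]
        return lambda env: env[name]
    if kind == "add":                      # ("add", e1, e2)
        left, right = denote(expr[1]), denote(expr[2])
        return lambda env: left(env) + right(env)
    if kind == "let":                      # ("let", name, bound, body)
        name, bound, body = expr[1], denote(expr[2]), denote(expr[3])
        return lambda env: body({**env, name: bound(env)})
    raise ValueError(f"unknown construct {kind!r}")

program = ("let", "x", ("num", 2), ("add", ("var", "x"), ("num", 3)))
print(denote(program)({}))  # 5
```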


Denotational Semantics tended to create huge language compilers that were mostly impractical for production work.


Advances in FP mechanics and Peter Lee's work make Denotational Semantics worth another look for SCL building.

Peter Lee

https://www.amazon.ca/Realistic-Compiler-Generation-Peter-Lee/dp/0262121417


Peter Lee tamed the concept of Denotational Semantics by adding layers (passes).


Denotational Semantics attempts to define the Universe of Possibilities for language design.  Practical work, like Peter Lee's, cut a swath within the Universe of Possibilities and created practical implementations of languages for everyday programming.

UNIX® Pipelines

The UNIX® pipeline mentality is YAGNI at its core — every component does only one thing.


UNIX® pipelines enable Components and YAGNI.  Components are completely isolated[10] from one another.


Isolated Components can be "built and forgotten".  Isolated Components do not change their behavior when new components are added to a system.  [Note that libraries do not do this - they impose hidden dependencies on the systems that use them].


UNIX® piped systems can be built in layers.[11]
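The shape of a pipeline such as grep error | sort | uniq can be sketched with Python generators, each doing one thing and knowing nothing but the stream of lines it receives.  This is only an in-process analogy: the generators share an address space, so they do not get the process isolation that real UNIX® pipes provide.  The function names mirror the UNIX® commands but are simplified stand-ins (grep here is a plain substring match).

```python
from typing import Iterable, Iterator

def grep(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    # One job: pass through lines that contain the pattern.
    for line in lines:
        if pattern in line:
            yield line

def sort(lines: Iterable[str]) -> Iterator[str]:
    # One job: order the stream (must read all input first, like sort(1)).
    yield from sorted(lines)

def uniq(lines: Iterable[str]) -> Iterator[str]:
    # One job: drop adjacent duplicate lines.
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line

log = ["error: disk", "ok", "error: net", "error: disk"]
print(list(uniq(sort(grep("error", log)))))  # ['error: disk', 'error: net']
```

Adding another stage (say, a counter) means wrapping one more call around the chain; none of the existing stages change.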

Code Emitters

Code emitters were designed as back ends for compilers.


Code emission technology can be used to create little languages, not just full-blown compilers.


OCG

The OCG — Orthogonal Code Generator — showed how to build code emitters in a declarative (and small) manner:


https://books.google.ca/books?id=X0OaMQEACAAJ&dq=bibliogroup:%22University+of+Toronto+Computer+Systems+Research+Institute+Technical+Report+CSRI%22&hl=en&sa=X&ved=2ahUKEwig1Legm8bqAhWvlHIEHYzzBYEQ6AEwBHoECAEQAQs
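The general flavour of a declarative emitter can be sketched as a table of templates, one per intermediate-language operation, plus a tiny driver that fills them in.  This is not the OCG's actual design; the operations, the operand names and the two-address assembly syntax are hypothetical.

```python
# Declarative part: one template per IR operation (hypothetical target syntax).
TEMPLATES = {
    "load":  "mov {dst}, {src}",
    "add":   "add {dst}, {src}",
    "store": "mov {dst}, {src}",
}

def emit(ir, templates=TEMPLATES):
    # Procedural part: look up each operation and fill in its operands.
    return [templates[op].format(**operands) for op, operands in ir]

ir = [
    ("load",  {"dst": "r1", "src": "[x]"}),
    ("add",   {"dst": "r1", "src": "[y]"}),
    ("store", {"dst": "[z]", "src": "r1"}),
]
print("\n".join(emit(ir)))
```

In this sketch, retargeting means supplying a different table; the driver stays the same.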


RTL

https://www.researchgate.net/publication/220404697_The_Design_and_Application_of_a_Retargetable_Peephole_Optimizer


Fraser and Davidson created the register transfer language (RTL) as a way of adding layers to the concepts of code emission.


GNU's GCC uses RTL at its core.
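A simplified sketch in the spirit of a register-transfer peephole optimizer: each instruction is a single transfer, written as a (destination, source) pair, and a two-instruction window removes transfers that cannot change any state.  The tuple format and the two rules are assumptions made for illustration, not Fraser and Davidson's RTL syntax or rule set, and the second rule assumes memory has no side effects (no volatile locations).

```python
from typing import List, Tuple

Transfer = Tuple[str, str]  # (destination, source), i.e. dst := src

def peephole(rtl: List[Transfer]) -> List[Transfer]:
    out: List[Transfer] = []
    for dst, src in rtl:
        if dst == src:
            continue  # r := r has no effect
        if out and out[-1] == (src, dst):
            continue  # previous transfer already made dst equal to src
        out.append((dst, src))
    return out

code = [("r1", "r1"), ("M[x]", "r2"), ("r2", "M[x]"), ("r3", "r2")]
print(peephole(code))  # [('M[x]', 'r2'), ('r3', 'r2')]
```

Because the rules only talk about transfers, not about any particular machine's opcodes, the same optimizer could sit below several code emitters.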


Data Descriptors

https://dl.acm.org/doi/abs/10.1145/24039.24051


Data descriptors are a way to generalize the location of all compiled variables.  


One description fits all variants of data.


The data descriptor concept enables YAGNI by eliding details (data allocation) - allowing upper layers to talk about data without actually supplying the final implementation (location) of the data.


Data Descriptors enable portability.


Data Descriptors enabled technologies, such as the OCG.
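A simplified sketch of the idea: one record type describes where any datum lives, as a combination of an optional symbolic label, an optional base register and a constant displacement, plus a flag saying whether the datum is in memory at that address or is the value itself.  Upper layers pass descriptors around; only the lowest layer renders them as operands.  The field names and the bracketed operand syntax are assumptions, not the published data descriptor notation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DataDescriptor:
    label: Optional[str] = None  # symbolic (link-time) address, if any
    base: Optional[str] = None   # register holding a base address, if any
    disp: int = 0                # constant displacement
    in_memory: bool = True       # False: the descriptor is the value itself (e.g. a constant)

def render(d: DataDescriptor) -> str:
    """Lowest layer: turn a descriptor into an operand string."""
    text = "+".join(t for t in (d.label, d.base) if t)
    if d.disp or not text:
        text = f"{text}{d.disp:+d}" if text else str(d.disp)
    return f"[{text}]" if d.in_memory else text

# One description fits a global, a stack local, an array element and a constant.
for d in (DataDescriptor(label="count"),
          DataDescriptor(base="fp", disp=-8),
          DataDescriptor(label="table", base="r1", disp=4),
          DataDescriptor(disp=42, in_memory=False)):
    print(render(d))
```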

Structured Architecture

It is the Architect's responsibility to make a design readable and understandable to others.


As such, Architecture embodies the virtues of YAGNI.  A "good" architecture shows only the important aspects of a system and elides all other details.


At present, we lack popular languages aimed at Architecture and Engineering.  


[Our current languages are targeted at Implementation and Research/Theorem-proving.  There are few languages that target YAGNI, Architecture and Engineering][12]

DI - Design Intent

S/SL (see Successful Models of YAGNI, above) is a dataless language: a programmer can declare the existence of items but cannot show their implementation, which must be written in a toolbox language.


OO tries to separate definition from implementation, but most OO languages allow too much detail — detail tarpits that programmers tend to fall into.


Most languages, including assembly language, allow programmers to defer details, but most programs tend towards being walls of detail that have little to do with the actual Architecture.


Programmers need languages that impose DI (Design Intent).  We discovered, and re-discovered, this fact in switching from assembler to Structured Programming, from imperative programming to OO and to FP, etc.

Portability

At present, most portable code is created by hacking existing code and inserting conditional compilation directives.


Portability is a chimera.  


Portability applies incremental fixes to a problem space and ignores the elephant in the room.


What is needed is a way to tune applications for specific purposes while creating a maintainable result.


YAGNI.

Frames

https://www.amazon.ca/Framing-Software-Reuse-Lessons-World/dp/013327859X


Paul Bassett's Frame technology is a completely different approach to portability and OO.


[I imagine that M4 could be used to implement frame technology].

Anti-YAGNI

Portability is generalization.


Generalization is the antithesis of YAGNI.


[1] Unify all numbers under one umbrella.

[2] https://guitarvydas.github.io/2020/12/09/Programming-Languages-Are-Skins.html

[3] https://guitarvydas.github.io/2020/12/09/Two-Syntaxes-For-Every-Language.html

[4] This sounds like what projectional editors can be used for.

[5] https://guitarvydas.github.io/2020/12/27/Compile-Time-and-Runtime.html

[6] https://guitarvydas.github.io/

[7] Exercise: what are the most-atomic features of Lisp that make for a good toolbox language?  Was this set of features documented in https://mitpress.mit.edu/books/lisp-15-programmers-manual?

[8] all-in-one

[9] Who seem to hold up version 2.7 as the standard.

[10] https://guitarvydas.github.io/2020/12/09/Isolation.html

[11] Sh can call sh components and can pipeline components together.  Bash and zsh are descendants of sh.

[12] Engineering is not coding.  See https://guitarvydas.github.io/2020/12/10/Software-Development-Roles.html for a discussion of the software development roles, as I see them.