I’m struggling with this. I think in terms of tokenizing, but Ohm-JS fights me. I was very unamused to discover that ESRAP only worked with characters (instead of tokens and CL forms).
You can’t skip whitespace unless the whitespace has been tokenized. Ohm-JS handles languages, like JS, that use commas (,). Ohm-JS creates bugs (accidental complexity) when parsing comma-less languages.
My experiments with ASON parsing [1] using Ohm-JS might show how to tokenize with it.
I think in terms of isolated software components.
I want to build-and-forget any component.
Ideally, there are no dependencies between components. Adding new components does not affect the way that old ones work.
[This is possible, but unlikely with current programming languages.]
Parsing and tokenizing are like that. The first pass should break the input into two kinds of tokens: (1) whitespace and (2) everything else.
Then, if we want to delete whitespace, we simply drop tokens of type (1). The rest of the tokens remain the same.
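The idea above can be sketched in plain JavaScript. This is a minimal sketch, not Ohm-JS code. It assumes the second kind of token is simply "anything that is not whitespace", and the names `tokenize` and `dropWhitespace` and the `{ kind, text }` token shape are invented for illustration.

```javascript
// Pass 1: split the input into whitespace tokens and everything-else tokens.
// The token shape { kind, text } is an assumption made for this sketch.
function tokenize(input) {
  const tokens = [];
  // A run of whitespace, or a run of non-whitespace.
  const pattern = /\s+|\S+/g;
  for (const match of input.matchAll(pattern)) {
    const text = match[0];
    const kind = /^\s/.test(text) ? "whitespace" : "other";
    tokens.push({ kind, text });
  }
  return tokens;
}

// Pass 2: drop whitespace tokens; every other token passes through unchanged.
function dropWhitespace(tokens) {
  return tokens.filter((token) => token.kind !== "whitespace");
}

const tokens = tokenize("a  b\n(c d)");
const stripped = dropWhitespace(tokens);
console.log(stripped.map((token) => token.text));
```

The second pass knows nothing about how the first pass found its tokens. It only looks at `kind`, so either pass can be replaced without touching the other.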
[For the record, I can’t bring myself to do something this simple using current languages. When I address (1), I immediately worry about counting newlines. Counting lines should, ideally, be done in another pass.]
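Line counting as its own pass might look like the following. Again, this is only a sketch under assumptions: the `{ kind, text }` token shape and the name `addLineNumbers` are hypothetical, and the token list is written by hand to stand in for the output of an earlier tokenizing pass.

```javascript
// A separate pass: annotate each token with the line it starts on.
// It only reads token.text, so it does not care how the tokens were produced.
function addLineNumbers(tokens) {
  let line = 1;
  return tokens.map((token) => {
    const annotated = { ...token, line };
    // Count newlines inside this token to find where the next token starts.
    line += (token.text.match(/\n/g) || []).length;
    return annotated;
  });
}

// Hand-written token list standing in for the output of a tokenizing pass.
const sampleTokens = [
  { kind: "other", text: "a" },
  { kind: "whitespace", text: "\n\n" },
  { kind: "other", text: "b" },
];
const numbered = addLineNumbers(sampleTokens);
console.log(numbered.map((token) => token.line));
```

One ordering constraint remains: this pass has to run before whitespace tokens are dropped, because the newlines live inside them.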
[1] https://guitarvydas.github.io/2021/04/10/ASON–Notation–Pipeline.html