In this essay, I show two very simple factbases and their implementation in JavaScript, PROLOG and Lisp.
Incidentally, the discussion shows how to begin compiling diagrams.
id1 is a rectangle.
id1 is at (x,y) position (20,10).
id1 has width 120.
id1 has height 80.
A triple is
relation (subject, object)
no noise format:
rectangle id1 null
x id1 20
y id1 10
width id1 120
height id1 80
var id1 = {kind: "rect", x: 20, y: 10, width: 120, height: 80};
or
var fb = [{id: "id1", kind: "rect", x: 20, y: 10, width: 120, height: 80}];
or
var fb = [
{relation: "rect", subject: "id1", object: null},
{relation: "x", subject: "id1", object: 20},
{relation: "y", subject: "id1", object: 10},
{relation: "width", subject: "id1", object: 120},
{relation: "height", subject: "id1", object: 80}
];
rectangle(id1,nil).
x(id1,10).
y(id1,10).
width(id1,15).
height(id1,15).
(rectangle id1 nil)
(x id1 10)
(y id1 10)
(width id1 15)
(height id1 15)
id1 is a rectangle.
id1 is at (x,y) position (10,10).
id1 has width 15.
id1 has height 15.
id2 is a rectangle.
id2 is at (x,y) position (9,15).
id2 has width 2.
id2 has height 2.
rectangle id1 null
x id1 10
y id1 10
width id1 15
height id1 15
rectangle id2 null
x id2 9
y id2 15
width id2 2
height id2 2
var id1 = {kind: "rect", x: 10, y: 10, width: 15, height: 15};
var id2 = {kind: "rect", x: 9, y: 15, width: 2, height: 2};
or
var fb = [
{id: "id1", kind: "rect", x: 10, y: 10, width: 15, height: 15},
{id: "id2", kind: "rect", x: 9, y: 15, width: 2, height: 2},
];
or
var fb = [
{relation: "rect", subject: "id1", object: null},
{relation: "x", subject: "id1", object: 10},
{relation: "y", subject: "id1", object: 10},
{relation: "width", subject: "id1", object: 15},
{relation: "height", subject: "id1", object: 15}
{relation: "rect", subject: "id2", object: null},
{relation: "x", subject: "id2", object: 9},
{relation: "y", subject: "id2", object: 15},
{relation: "width", subject: "id2", object: 2},
{relation: "height", subject: "id2", object: 2}
];
rectangle(id1,nil).
rectangle(id2,nil).
x(id1,20).
x(id2,10).
y(id1,10).
y(id2,40).
width(id1,120).
width(id2,20).
height(id1,80).
height(id2,20).
In PROLOG, "or" relationships are most often written as multiple versions of a rule (PROLOG also has semi-colon syntax, but I won't discuss it here).
PROLOG wants all rules with the same name, bunched together.
UNIX® sort can be used to group rules together.
The above factbase is sorted, e.g. all rectangle rules appear sequentially, all width rules appear sequentially, etc.
There is no requirement about the order in which rules must appear (e.g width rules could appear before rectangle rules, and so on). In the above example, sort has put rules in strict alphabetic order — this is not necessary, but is convenient (sort ensures that all rules are grouped together (required) and sort ensures that groups appear in alphabetic order (not required, but there is no harm in accepting what sort does)).
(rectangle id1 nil)
(x id1 10)
(y id1 10)
(width id1 15)
(height id1 15)
(rectangle id2 nil)
(x id2 9)
(y id2 12)
(width id2 2)
(height id2 2)
All forms encode the same information.
JavaScript form 1 is human readable.
The factbase form is less human readable (aka ugly). E.G. JavaScript form 1 is more human-readable than JavaScript form 3.
Q: which forms are easier to automate? (aka to script)
sub-Q: which forms are more machine-readable?
[Keep these questions in mind as we look at some queries]
[We will deal only with the 2-rectangle factbases.]
Layer 0 queries:
Answers:
Answering query (d) and (e) is easier if we create a bounding box for all rectangles first.
A bounding box is
var id1 = {kind: "rect", x: 10, y: 10, width: 15, height: 15};
var id2 = {kind: "rect", x: 9, y: 15, width: 2, height: 2};
answerA = [];
if (id1.kind === "rect") {
answerA.push (id1);
}
if (id2.kind === "rect") {
answerA.push (id2);
}
//console.log (answerA);
/////
var id1 = {id: "id1", kind: "rect", x: 10, y: 10, width: 15, height: 15};
var id2 = {id: "id2", kind: "rect", x: 9, y: 15, width: 2, height: 2};
answerAprimed = [];
if (id1.kind === "rect") {
answerAprimed.push (id1);
}
if (id2.kind === "rect") {
answerAprimed.push (id2);
}
console.log (answerAprimed);
for (r of answerAprimed) {
console.log (r.id);
}
///////
var fb = [
{id: "id1", kind: "rect", x: 10, y: 10, width: 15, height: 15},
{id: "id2", kind: "rect", x: 9, y: 15, width: 2, height: 2},
];
for (obj of fb) {
if (obj.kind === "rect") {
console.log (obj.id);
}
}
/// query b ///
/// is rectangle id1 bigger than rectangle id2? ///
function area (obj) {
var width = obj.width;
var height = obj.height;
return width * height;
}
function bigger (obj1, obj2) {
var area1 = area (obj1);
var area2 = area (obj2);
return (area1 > area2);
}
function fetch (id) {
return fb.find (obj => id === obj.id);
}
console.log ();
console.log (`(b) id1 is bigger than id1: ${bigger(fetch ("id1"), fetch ("id2"))}`);
/// query (c)
console.log (`(c) id2 is bigger than id1: ${bigger(fetch ("id2"), fetch ("id1"))}`);
//////// sub-query - bounding boxes //////////
for (obj of fb) {
if (obj.kind === "rect") {
obj.bounding_box_left = obj.x;
obj.bounding_box_top = obj.y;
obj.bounding_box_right = obj.x + obj.width;
obj.bounding_box_bottom = obj.y + obj.height;
}
//console.log (obj);
}
function intersects (subject, object) {
// left side
if (subject.bounding_box_left <= object.bounding_box_left) {
if (subject.bounding_box_right >= object.bounding_box_left) {
return true;
}
};
// right side
if (subject.bounding_box_left <= object.bounding_box_right) {
if (subject.bounding_box_right >= object.bounding_box_right) {
return true;
}
};
// top
if (subject.bounding_box_top <= object.bounding_box_top) {
if (subject.bounding_box_bottom >= object.bounding_box_top) {
return true;
}
};
// bottom
if (subject.bounding_box_top <= object.bounding_box_bottom) {
if (subject.bounding_box_bottom >= object.bounding_box_bottom) {
return true;
}
};
return false;
}
console.log (`(d) id1 intersects id2: ${intersects (fetch ("id1"), fetch ("id2"))}`);
console.log (`(e) id1 intersects id2: ${intersects (fetch ("id2"), fetch ("id1"))}`);
rectangle(id1,nil).
rectangle(id2,nil).
x(id1,20).
x(id2,10).
y(id1,10).
y(id2,40).
width(id1,120).
width(id2,20).
height(id1,80).
height(id2,20).
main(Intersections) :-
setof([IDa,IDb],intersects(IDa,IDb),Intersections).
area(Subject,A) :-
width(Subject,W),
height(Subject,H),
A is H * W.
bigger(Subject, Object) :-
area(Subject,As),
area(Object,Os),
As > Os.
left(R,X1) :- x(R,X1).
top(R,Y1) :- y(R,Y1).
right(R,X2) :- x(R,X1),width(R,W),X2 is X1 + W.
bottom(R,Y2) :- y(R,Y1),height(R,H),Y2 is Y1 + H.
% left side
intersects(Subject,Object) :-
dif(Subject,Object),
left(Subject,SubjX1),
left(Object,ObjX1),
SubjX1 =< ObjX1,
right(Subject,SubjX2),
SubjX2 >= ObjX1.
% right side
intersects(Subject,Object) :-
dif(Subject,Object),
left(Subject,SubjX1),
right(Object,ObjX2),
SubjX1 =< ObjX2,
right(Subject,SubjX2),
SubjX2 >= ObjX2.
% top
intersects(Subject,Object) :-
dif(Subject,Object),
top(Subject,SubjY1),
top(Object,ObjY1),
SubjY1 =< ObjY1,
bottom(Subject,SubjY2),
SubjY2 =< ObjY1.
% bottom
intersects(Subject,Object) :-
dif(Subject,Object),
top(Subject,SubjY1),
bottom(Object,ObjY2),
SubjY1 =< ObjY2,
bottom(Subject,SubjY2),
SubjY2 =< ObjY2.
PROLOG uses capitalization to differentiate between Logic Variables and everything else. Logic Variables begin with capital letters, and everything else begins with lower-case letters.
A Logic Variable is much like a variable in most languages, but, Logic Variables take on possibly many different values during exhaustive search.
Logic Variables capture snapshots during matching. A single logic variable can hold only one value at a time, but that value might change during each wave of pattern matching (see below).
PROLOG matches patterns in waves.
Each wave matches one specific instance of the pattern.
Each wave sets Logic Variables to single specific values, but the values might be different in each wave.
For example, let us consider the query
x(ID,Value).
The first pattern-matching wave returns X=id1 and Value = 20.
The second pattern-matching wave returns X=id2 and Value = 10.
[At the REPL, we can ask to see successive waves using a semi-colon.]
PROLOG programs consist of rules of the form
head(…) :- body(…).
Rules are almost like functions in other languages, except that low-level operations can appear in the left-hand side as well as in the right-hand side.
Rules specify assertions about the problem space and allow the PROLOG engine to decide how to implement the details. Assertions are declarative.
For example, the definition of the member rule in PROLOG might be
member(X, [X|_]).
member(X, [_|Rest]) :- member(X, Rest).
Here, we assert that there are two OR'ed-together rules that define list membership. The first rule says that X is a member of a list if X is the head of a list. Otherwise, recur using the second rule. The second rule says that X is a member of a list, if X is the member of the smaller list "Rest" (the head is peeled off leaving a smaller list).
In PROLOG, the AND of two rules is signified by a comma (,).
In PROLOG, the OR of two rules is signified by repetition of the rule with slightly different body code.
In PROLOG, all rules that have the same name must be clustered together.
In PROLOG, rules are denoted by the number of arguments they take. The number of arguments is call arity. Internally, rules are denoted as name/N, where N is the arity.
For example, in the above member rules, the rules are internally called member/2.
Because rules are name-mangled to include their arity, it is possible to have rules that look the same in the source code, but have different arities.
Because PROLOG rules[1] cannot specify implementation, PROLOG rules can run in two directions.
This seems unusual, at first, to programmers who are accustomed to using popular languages like Python, Javascript, /bin/sh, etc.
An extreme form of the use of relational two-way evaluation can be seen in the Barliman project[2].
A less-extreme version is the ability to create various queries at the PROLOG REPL. For example, we could write
rectangle(ID,_).
and get two results (ID=id1 and ID=id2) at the REPL[3].
The first 10 lines define the facts. Each fact is of the form relation(Subject,Object).
rectangle(id1,nil).
rectangle(id2,nil).
x(id1,20).
x(id2,10).
y(id1,10).
y(id2,40).
width(id1,120).
width(id2,20).
height(id1,80).
height(id2,20).
The rectangle facts declare the existence of two rectangles, id1 and id2. The object is nil in these cases — no further data is supplied in the declarations.
The x(), y(), width() and height() facts associate an integer with each of the dimensions of the rectangles.
In imperative languages, we might have written[4]:
var id1 : rectangle;
id1.x = 20;
id1.y = 10;
id1.width = 120;
id1.height = 80;
but, that would have been too structured for my liking. (PROLOG lets one express these relationships more succinctly, but I desire a machine-readable normal form, and like with assembler, human-readability is secondary).
Note that the ID (id1 and id2) in a factbase acts like an object-pointer in other languages. Using object+attributes makes sense from a human-readability perspective, but normalizing everything makes more sense from a machine-readability perspective. In this case, the pointer is the Subject of most facts, and, everything is expressed in relation(Subject,Object) form. There is no concept of pointers vs. attributes in the machine-readable version[5].
The rules left, top, right and bottom define the rules for bounding boxes for rectangles (we might need to embellish these rules if we were to add ellipses and other kinds of shapes).
The rule area defines a way to calculate the area of a rectangle.
The rule bigger shows one way[6] to determine if one rectangle is bigger than another.
The intersects rule is defined in 4 variants.
The first variant fires and succeeds if:
The other variants are similar.
[Again, the rules are, intentionally, over-simplifications. See my further essay(s) on this subject for a more detailed description of the calculation and design rules for intersection.]
To run the PROLOG factbase queries:
> swipl
?- consult(fb).
?- main(X).
Exhaustive search can provide some surprises to mortal programmers.
For example, if we change main to:
main(Intersections) :-
bagof([IDa,IDb],intersects(IDa,IDb),Intersections).
we get
Intersections = [[id2, id1], [id1, id2], [id2, id1]].
noting that [id2, id1] appears twice in the result. [We changed setof to bagof, that's all.]
Why does [id2,id1] appear twice? Id2 intersects id1 with id2's top and id2's bottom. Exhaustive search gives us all of the possible answers.
The setof call sorts and unique-ifies the answer set. We could also have used PROLOG cut (see PROLOG documentation) to stop matching after the first successful result.
BTW, we could have just written a query at the REPL:
intersects(IDa,IDb).
(and ask for multiple results by hitting the semi-colon key, until there are no more results).
[FYI - the Haskell language and friends, riff on this idea of pattern matching and exhaustive search. Compiler technology — parsing in particular — riffed on the ideas of pattern matching decades ago. PEG is pattern matching with backtracking.]
Engineers must consider details that are glossed-over by Architects.
(This is not a bad thing. Architects need to elide details to be able to think about the designs. Engineers dot the I's and cross the T's. This same kind of elision happens in other fields, for example mathematics — math equations are usually kept terse (using single-letter names) to allow thinking about the bigger picture.).
For example, engineers must consider…
What if either rectangle is a point?
What if any line is a point?
What if lines are parallel?
[Side note: Currently, programmers do all of the tasks, Architecture, Engineering, Implementation, Testing. Some people are better at Architecture. Some are better at Engineering. Some are better at Implementation. Some are better (more evil) at testing. Currently, we pay programmers at a high rate to consider all of the tasks "at once". This could be cost-reduced by introducing stratification and layering to the organizations. An example would be the stratification that has evolved in building construction.]
What about efficiency?
Rearrange lines to give better performance?
Use cut? (Cut is considered to be impure).
The answers to the above queries are "obvious" — we just eye-ball the factbase — but, how can we write code such that the computer can derive these answers for us?
When we get into production level factbases, we might have lots of rectangles and we might miss eye-balling some relationships and queries.[7]
Humans are notoriously bad at repetitive tasks such as this. Computers are great at doing repetitive tasks.
How can we write programs that will answer these repetitive queries, so we don't have to do the work ourselves?
Answer: automate, write scripts, write pipelines.
Use automation stacked upon other automation stacked upon other automation and so on. Layers and layers of automation.
So, the meta-meta-question becomes — what makes it easy to write automation scripts?
We are lazy — what is the easiest way to stack scripts?
[Side note: whatever we are doing now doesn't stack very well, and we are hitting a wall. APIs are just too complicated to stack effectively.[8] We've tried structuring data to make it understandable, but that has its limits, too. The more that the data is structured, the less machine-readable it becomes, hence, the harder it becomes to write scripts. Edge-cases kill opportunities for automation.]
Compiler-writers have been figuring this stuff out, but they are interested in job security and want to make it look too complicated for use by mere mortals.[9]
Someone released REGEX from its compiler-writers' cage and it is useful.
What's next?
Let us adopt the convention that only the biggest rectangles represent software components.
Queries:
Answers
How did we arrive at this answer?
Sub-queries:
A rectangle is big if it has no[10] other rectangles bigger than it.
forall rectangles R
forall rectangles X, different from R
if isBigger(X,R) then
R is definitely not a Component, and,
X might be a Component
then, forall rectangles Y
if mightBeAComponent(Y) then
Y is a Component
[FYI - exhaustive search of the first set of foralls guarantees that all left-overs must be Components]
meta-question: can you do the above queries without using flags?
answer: no, you have to use flags, but, maybe you can elide the flags (different wording: maybe the computer can set the flags for you, because humans are bad at this sort of thing)
[spoiler: PROLOG does this kind of exhaustive search and sets bookmarks, but you don't have to worry about the bookmarking, PROLOG does it for you. MiniKanren (core.logic) does bookmarking, too. [PROLOG and miniKanren create bookmarks in different ways, but you don't need to worry about this kind of detail. [Lisp (…/cc, "On Lisp", etc.) gives you the raw materials for creating bookmarks, but you still have to write the code. Assembler gives you the raw materials, too, but you have to write even more code.]]]
Let us adopt the convention that only smaller rectangles are ports.
(N.B. port-ness is still inconclusive — we will want to know whether a port is input or output, but I regress).
Queries:
Answers
How did we arrive at this answer?
In PROLOG, we can write
port(Port,Owner) :-
rectangle(Port,_),
rectangle(Owner,_),
bigger(Owner,Port).
and use the query
port(P,O).
to find all ports and their owners.
miniKanren: http://minikanren.org
Scheme - miniKanren
Scheme - a simple version of PROLOG is available at https://www.t3x.org/bits/prolog6.html
JavaScript - https://github.com/guitarvydas/js-match
Clojure - core.logic
PROLOG - https://www.swi-prolog.org/
Getting started with PROLOG pattern matching - https://www.youtube.com/watch?v=QOYAHoLiyg0&t=195s
https://github.com/guitarvydas/factbases-101
Compiling diagrams consists of 2 steps:
In the above, I have shown how to represent rectangles in a factbase.
Let us assume that we are using a notation that needs only:
FYI — I have found this simple combination enough to express concurrency…
[N.B. Software Components are not implemented using CALL / RETURN only. See my other essay https://guitarvydas.github.io/2020/12/09/CALL-RETURN-Spaghetti.html]
Questions:
We can compile diagrams in successive layers:
Q: How much information must be in the FB before we can emit code using information in the FB?
Q: Is this information language-specific? E.G. if we know that we are going to emit code in Lisp, do we need more/less information than if we are going to emit code in, say, Python?
Q: Do we grow the FB knowledge-base over time or design it in one-fell-swoop at the beginning of the project(s)?
[1] Rules specify only the result (the assertions).
[2] https://www.youtube.com/watch?v=er_lLvkklsk
[3] In SWIPL, one needs to enter the semi-colon to receive successive match waves and to see the various results of each wave.
[4] pseudo-code
[5] If we wanted to look at objects + attributes, we could just write an SCN that gave us that view (aka skin) on the data.
[6] Again, I am simplifying in an attempt to maintain readability. A production version of the rules in this essay would need to handle more edge-cases and shapes.
[7] Furthermore, we might choose to include other kinds of objects, e.g. ellipses.
[8] meta-meta-observation: we solve complicated problems by adding complication.
[9] meta-meta-question: Why do we do DRY by hand? Why can't git detect code clones for us? Beginners like RY - they want to copy/paste cookbook code.
[10] Notice that our convention is that diagrams of components cannot contain other diagrams in-place. We need to switch to other diagrams to show deep-ness. In this notation, every diagram has 2 layers: (1) the top level and (2) an inner level. There is never an inner-inner level (such a thing can exist but cannot be shown on the same diagram - the editor should allow us to dive into and out-of diagram layers - much like the BACK button on browsers).
[11] Dynamically changing data flow is a bad thing IMO. Dynamic <anything> leads to trouble (esp. maintenance and debugging). https://guitarvydas.github.io/2021/03/06/Dynamic-Anything-is-Bad.html. IMO, data flows must be represented explicitly, for example, as arrows.