Factbases 101

Introduction

In this essay, I show two very simple factbases and their implementation in JavaScript, PROLOG and Lisp.

Incidentally, the discussion shows how to begin compiling diagrams.

Factbase

id1 is a rectangle.

id1 is at (x,y) position (20,10).

id1 has width 120.

id1 has height 80.

Diagram

Triples

A triple is

relation (subject, object)

no noise format:

rectangle id1 null

x id1 20

y id1 10

width id1 120

height id1 80

JavaScript

var id1 = {kind: "rect", x: 20, y: 10, width: 120, height: 80};

var fb = [{id: "id1", kind: "rect", x: 20, y: 10, width: 120, height: 80}];

var fb = [

{relation: "rect", subject: "id1", object: null},

{relation: "x", subject: "id1", object: 20},

{relation: "y", subject: "id1", object: 10},

{relation: "width", subject: "id1", object: 120},

{relation: "height", subject: "id1", object: 80}

];

PROLOG

rectangle(id1,nil).

x(id1,10).

y(id1,10).

width(id1,15).

height(id1,15).

Lisp

(rectangle id1 nil)

(x id1 10)

(y id1 10)

(width id1 15)

(height id1 15)

Factbase With 2 Rectangles

id1 is a rectangle.

id1 is at (x,y) position (10,10).

id1 has width 15.

id1 has height 15.

id2 is a rectangle.

id2 is at (x,y) position (9,15).

id2 has width 2.

id2 has height 2.

Diagram

Triples With 2 Rectangles

rectangle id1 null

x id1 10

y id1 10

width id1 15

height id1 15

rectangle id2 null

x id2 9

y id2 15

width id2 2

height id2 2

JavaScript With 2 Rectangles

var id1 = {kind: "rect", x: 10, y: 10, width: 15, height: 15};

var id2 = {kind: "rect", x: 9, y: 15, width: 2, height: 2};

var fb = [

{id: "id1", kind: "rect", x: 10, y: 10, width: 15, height: 15},

{id: "id2", kind: "rect", x: 9, y: 15, width: 2, height: 2},

];

var fb = [

{relation: "rect", subject: "id1", object: null},

{relation: "x", subject: "id1", object: 10},

{relation: "y", subject: "id1", object: 10},

{relation: "width", subject: "id1", object: 15},

{relation: "height", subject: "id1", object: 15}

{relation: "rect", subject: "id2", object: null},

{relation: "x", subject: "id2", object: 9},

{relation: "y", subject: "id2", object: 15},

{relation: "width", subject: "id2", object: 2},

{relation: "height", subject: "id2", object: 2}

];

PROLOG With 2 Rectangles

rectangle(id1,nil).

rectangle(id2,nil).

x(id1,20).

x(id2,10).

y(id1,10).

y(id2,40).

width(id1,120).

width(id2,20).

height(id1,80).

height(id2,20).

In PROLOG, "or" relationships are most often written as multiple versions of a rule (PROLOG also has semi-colon syntax, but I won't discuss it here).

PROLOG wants all rules with the same name, bunched together.

UNIX® sort can be used to group rules together.

The above factbase is sorted, e.g. all rectangle rules appear sequentially, all width rules appear sequentially, etc.

There is no requirement about the order in which rules must appear (e.g width rules could appear before rectangle rules, and so on). In the above example, sort has put rules in strict alphabetic order — this is not necessary, but is convenient (sort ensures that all rules are grouped together (required) and sort ensures that groups appear in alphabetic order (not required, but there is no harm in accepting what sort does)).

Lisp With 2 Rectangles

(rectangle id1 nil)

(x id1 10)

(y id1 10)

(width id1 15)

(height id1 15)

(rectangle id2 nil)

(x id2 9)

(y id2 12)

(width id2 2)

(height id2 2)

Obervations

All forms encode the same information.

JavaScript form 1 is human readable.

The factbase form is less human readable (aka ugly). E.G. JavaScript form 1 is more human-readable than JavaScript form 3.

Q: which forms are easier to automate? (aka to script)

sub-Q: which forms are more machine-readable?

[Keep these questions in mind as we look at some queries]

Queries

[We will deal only with the 2-rectangle factbases.]

Layer 0 queries:

What id's are rectangles?
Is rectangle id1 bigger than rectangle id2?
Is rectangle id2 bigger than rectangle id1?
Does id1 intersect id2?
Does id2 intersect id1?

Answers:

id1, id2
yes
no
yes
yes

Sub-Query - Bounding Boxes

Answering query (d) and (e) is easier if we create a bounding box for all rectangles first.

A bounding box is

left x
top y
right x
bottom y.

JavaScript Queries

var id1 = {kind: "rect", x: 10, y: 10, width: 15, height: 15};

var id2 = {kind: "rect", x: 9, y: 15, width: 2, height: 2};

answerA = [];

if (id1.kind === "rect") {

answerA.push (id1);

}

if (id2.kind === "rect") {

answerA.push (id2);

}

//console.log (answerA);

/////

var id1 = {id: "id1", kind: "rect", x: 10, y: 10, width: 15, height: 15};

var id2 = {id: "id2", kind: "rect", x: 9, y: 15, width: 2, height: 2};

answerAprimed = [];

if (id1.kind === "rect") {

answerAprimed.push (id1);

}

if (id2.kind === "rect") {

answerAprimed.push (id2);

}

console.log (answerAprimed);

for (r of answerAprimed) {

console.log (r.id);

}

///////

var fb = [

{id: "id1", kind: "rect", x: 10, y: 10, width: 15, height: 15},

{id: "id2", kind: "rect", x: 9, y: 15, width: 2, height: 2},

];

for (obj of fb) {

if (obj.kind === "rect") {

console.log (obj.id);

}

/// query b ///

/// is rectangle id1 bigger than rectangle id2? ///

function area (obj) {

var width = obj.width;

var height = obj.height;

return width * height;

}

function bigger (obj1, obj2) {

var area1 = area (obj1);

var area2 = area (obj2);

return (area1 > area2);

}

function fetch (id) {

return fb.find (obj => id === obj.id);

}

console.log ();

console.log (`(b) id1 is bigger than id1: ${bigger(fetch ("id1"), fetch ("id2"))}`);

/// query (c)

console.log (`(c) id2 is bigger than id1: ${bigger(fetch ("id2"), fetch ("id1"))}`);

//////// sub-query - bounding boxes //////////

for (obj of fb) {

if (obj.kind === "rect") {

obj.bounding_box_left = obj.x;

obj.bounding_box_top = obj.y;

obj.bounding_box_right = obj.x + obj.width;

obj.bounding_box_bottom = obj.y + obj.height;

}

//console.log (obj);

}

function intersects (subject, object) {

// left side

if (subject.bounding_box_left <= object.bounding_box_left) {

if (subject.bounding_box_right >= object.bounding_box_left) {

return true;

}

};

// right side

if (subject.bounding_box_left <= object.bounding_box_right) {

if (subject.bounding_box_right >= object.bounding_box_right) {

return true;

}

};

// top

if (subject.bounding_box_top <= object.bounding_box_top) {

if (subject.bounding_box_bottom >= object.bounding_box_top) {

return true;

}

};

// bottom

if (subject.bounding_box_top <= object.bounding_box_bottom) {

if (subject.bounding_box_bottom >= object.bounding_box_bottom) {

return true;

}

};

return false;

}

console.log (`(d) id1 intersects id2: ${intersects (fetch ("id1"), fetch ("id2"))}`);

console.log (`(e) id1 intersects id2: ${intersects (fetch ("id2"), fetch ("id1"))}`);

PROLOG Queries

rectangle(id1,nil).

rectangle(id2,nil).

x(id1,20).

x(id2,10).

y(id1,10).

y(id2,40).

width(id1,120).

width(id2,20).

height(id1,80).

height(id2,20).

main(Intersections) :-

setof([IDa,IDb],intersects(IDa,IDb),Intersections).

area(Subject,A) :-

width(Subject,W),

height(Subject,H),

A is H * W.

bigger(Subject, Object) :-

area(Subject,As),

area(Object,Os),

As > Os.

left(R,X1) :- x(R,X1).

top(R,Y1) :- y(R,Y1).

right(R,X2) :- x(R,X1),width(R,W),X2 is X1 + W.

bottom(R,Y2) :- y(R,Y1),height(R,H),Y2 is Y1 + H.

% left side

intersects(Subject,Object) :-

dif(Subject,Object),

left(Subject,SubjX1),

left(Object,ObjX1),

SubjX1 =< ObjX1,

right(Subject,SubjX2),

SubjX2 >= ObjX1.

% right side

intersects(Subject,Object) :-

dif(Subject,Object),

left(Subject,SubjX1),

right(Object,ObjX2),

SubjX1 =< ObjX2,

right(Subject,SubjX2),

SubjX2 >= ObjX2.

% top

intersects(Subject,Object) :-

dif(Subject,Object),

top(Subject,SubjY1),

top(Object,ObjY1),

SubjY1 =< ObjY1,

bottom(Subject,SubjY2),

SubjY2 =< ObjY1.

% bottom

intersects(Subject,Object) :-

dif(Subject,Object),

top(Subject,SubjY1),

bottom(Object,ObjY2),

SubjY1 =< ObjY2,

bottom(Subject,SubjY2),

SubjY2 =< ObjY2.

Basics

PROLOG Basics - Capitalization

PROLOG uses capitalization to differentiate between Logic Variables and everything else. Logic Variables begin with capital letters, and everything else begins with lower-case letters.

A Logic Variable is much like a variable in most languages, but, Logic Variables take on possibly many different values during exhaustive search.

Logic Variables capture snapshots during matching. A single logic variable can hold only one value at a time, but that value might change during each wave of pattern matching (see below).

PROLOG Basics - Exhaustive Search

PROLOG matches patterns in waves.

Each wave matches one specific instance of the pattern.

Each wave sets Logic Variables to single specific values, but the values might be different in each wave.

For example, let us consider the query

x(ID,Value).

The first pattern-matching wave returns X=id1 and Value = 20.

The second pattern-matching wave returns X=id2 and Value = 10.

[At the REPL, we can ask to see successive waves using a semi-colon.]

PROLOG Basics - Rules

PROLOG programs consist of rules of the form

head(…) :- body(…).

Rules are almost like functions in other languages, except that low-level operations can appear in the left-hand side as well as in the right-hand side.

Rules specify assertions about the problem space and allow the PROLOG engine to decide how to implement the details. Assertions are declarative.

For example, the definition of the member rule in PROLOG might be

member(X, [X|_]).

member(X, [_|Rest]) :- member(X, Rest).

Here, we assert that there are two OR'ed-together rules that define list membership. The first rule says that X is a member of a list if X is the head of a list. Otherwise, recur using the second rule. The second rule says that X is a member of a list, if X is the member of the smaller list "Rest" (the head is peeled off leaving a smaller list).

PROLOG Basics - AND

In PROLOG, the AND of two rules is signified by a comma (,).

PROLOG Basics - OR

In PROLOG, the OR of two rules is signified by repetition of the rule with slightly different body code.

PROLOG Basics - Rule Clustering

In PROLOG, all rules that have the same name must be clustered together.

PROLOG Basics - Arity

In PROLOG, rules are denoted by the number of arguments they take. The number of arguments is call arity. Internally, rules are denoted as name/N, where N is the arity.

For example, in the above member rules, the rules are internally called member/2.

Because rules are name-mangled to include their arity, it is possible to have rules that look the same in the source code, but have different arities.

PROLOG Basics - Two-Way Evaluation

Because PROLOG rules ^[1] cannot specify implementation, PROLOG rules can run in two directions.

This seems unusual, at first, to programmers who are accustomed to using popular languages like Python, Javascript, /bin/sh, etc.

An extreme form of the use of relational two-way evaluation can be seen in the Barliman project ^[2].

A less-extreme version is the ability to create various queries at the PROLOG REPL. For example, we could write

rectangle(ID,_).

and get two results (ID=id1 and ID=id2) at the REPL ^[3].

Reading

Declarations

The first 10 lines define the facts. Each fact is of the form relation(Subject,Object).

rectangle(id1,nil).

rectangle(id2,nil).

x(id1,20).

x(id2,10).

y(id1,10).

y(id2,40).

width(id1,120).

width(id2,20).

height(id1,80).

height(id2,20).

The rectangle facts declare the existence of two rectangles, id1 and id2. The object is nil in these cases — no further data is supplied in the declarations.

The x(), y(), width() and height() facts associate an integer with each of the dimensions of the rectangles.

In imperative languages, we might have written ^[4]:

var id1 : rectangle;

id1.x = 20;

id1.y = 10;

id1.width = 120;

id1.height = 80;

but, that would have been too structured for my liking. (PROLOG lets one express these relationships more succinctly, but I desire a machine-readable normal form, and like with assembler, human-readability is secondary).

Note that the ID (id1 and id2) in a factbase acts like an object-pointer in other languages. Using object+attributes makes sense from a human-readability perspective, but normalizing everything makes more sense from a machine-readability perspective. In this case, the pointer is the Subject of most facts, and, everything is expressed in relation(Subject,Object) form. There is no concept of pointers vs. attributes in the machine-readable version ^[5].

Bounding Boxes

The rules left, top, right and bottom define the rules for bounding boxes for rectangles (we might need to embellish these rules if we were to add ellipses and other kinds of shapes).

Area

The rule area defines a way to calculate the area of a rectangle.

Bigger

The rule bigger shows one way ^[6] to determine if one rectangle is bigger than another.

Intersects

The intersects rule is defined in 4 variants.

The first variant fires and succeeds if:

Subject and Object are not the same rectangle (dif())
subject.x1 <= object.x1, and,
subject.x2 >= object.x1.

The other variants are similar.

[Again, the rules are, intentionally, over-simplifications. See my further essay(s) on this subject for a more detailed description of the calculation and design rules for intersection.]

Running the PROLOG Query

To run the PROLOG factbase queries:

> swipl

?- consult(fb).

?- main(X).

Exhaustive Search is a Strange Beast

Exhaustive search can provide some surprises to mortal programmers.

For example, if we change main to:

main(Intersections) :-

bagof([IDa,IDb],intersects(IDa,IDb),Intersections).

we get

Intersections = [[id2, id1], [id1, id2], [id2, id1]].

noting that [id2, id1] appears twice in the result. [We changed setof to bagof, that's all.]

Why does [id2,id1] appear twice? Id2 intersects id1 with id2's top and id2's bottom. Exhaustive search gives us all of the possible answers.

The setof call sorts and unique-ifies the answer set. We could also have used PROLOG cut (see PROLOG documentation) to stop matching after the first successful result.

BTW, we could have just written a query at the REPL:

intersects(IDa,IDb).

(and ask for multiple results by hitting the semi-colon key, until there are no more results).

[FYI - the Haskell language and friends, riff on this idea of pattern matching and exhaustive search. Compiler technology — parsing in particular — riffed on the ideas of pattern matching decades ago. PEG is pattern matching with backtracking.]

Engineering

Engineers must consider details that are glossed-over by Architects.

(This is not a bad thing. Architects need to elide details to be able to think about the designs. Engineers dot the I's and cross the T's. This same kind of elision happens in other fields, for example mathematics — math equations are usually kept terse (using single-letter names) to allow thinking about the bigger picture.).

For example, engineers must consider…

What if either rectangle is a point?

What if any line is a point?

What if lines are parallel?

Top and Bottom should be parallel.
Left and Right should be parallel.
What if Top is parallel to Left?
Etc.

[Side note: Currently, programmers do all of the tasks, Architecture, Engineering, Implementation, Testing. Some people are better at Architecture. Some are better at Engineering. Some are better at Implementation. Some are better (more evil) at testing. Currently, we pay programmers at a high rate to consider all of the tasks "at once". This could be cost-reduced by introducing stratification and layering to the organizations. An example would be the stratification that has evolved in building construction.]

Optimization Engineering

What about efficiency?

Rearrange lines to give better performance?

Use cut? (Cut is considered to be impure).

Meta-Question

The answers to the above queries are "obvious" — we just eye-ball the factbase — but, how can we write code such that the computer can derive these answers for us?

When we get into production level factbases, we might have lots of rectangles and we might miss eye-balling some relationships and queries.^[7]

Humans are notoriously bad at repetitive tasks such as this. Computers are great at doing repetitive tasks.

How can we write programs that will answer these repetitive queries, so we don't have to do the work ourselves?

Answer: automate, write scripts, write pipelines.

Use automation stacked upon other automation stacked upon other automation and so on. Layers and layers of automation.

So, the meta-meta-question becomes — what makes it easy to write automation scripts?

We are lazy — what is the easiest way to stack scripts?

[Side note: whatever we are doing now doesn't stack very well, and we are hitting a wall. APIs are just too complicated to stack effectively.^[8] We've tried structuring data to make it understandable, but that has its limits, too. The more that the data is structured, the less machine-readable it becomes, hence, the harder it becomes to write scripts. Edge-cases kill opportunities for automation.]

Compiler-writers have been figuring this stuff out, but they are interested in job security and want to make it look too complicated for use by mere mortals.^[9]

Someone released REGEX from its compiler-writers' cage and it is useful.

What's next?

Queries - Components

Gedanken experiment:

Let us adopt the convention that only the biggest rectangles represent software components.

Queries:

What are the ids of all of the software components?

Answers

id1.

How did we arrive at this answer?

Sub-queries:

A rectangle is big if it has no ^[10] other rectangles bigger than it.

forall rectangles R

forall rectangles X, different from R

if isBigger(X,R) then

R is definitely not a Component, and,

X might be a Component

then, forall rectangles Y

if mightBeAComponent(Y) then

Y is a Component

[FYI - exhaustive search of the first set of foralls guarantees that all left-overs must be Components]

meta-question: can you do the above queries without using flags?

answer: no, you have to use flags, but, maybe you can elide the flags (different wording: maybe the computer can set the flags for you, because humans are bad at this sort of thing)

[spoiler: PROLOG does this kind of exhaustive search and sets bookmarks, but you don't have to worry about the bookmarking, PROLOG does it for you. MiniKanren (core.logic) does bookmarking, too. [PROLOG and miniKanren create bookmarks in different ways, but you don't need to worry about this kind of detail. [Lisp (…/cc, "On Lisp", etc.) gives you the raw materials for creating bookmarks, but you still have to write the code. Assembler gives you the raw materials, too, but you have to write even more code.]]]

Queries - Ports

Gedanken experiment (continued):

Let us adopt the convention that only smaller rectangles are ports.

(N.B. port-ness is still inconclusive — we will want to know whether a port is input or output, but I regress).

Queries:

What are the ids of all of the ports?
Who "owns" which ports?

Answers

id2.
id2 is a port of id1. In other words: id1 owns port id2.

How did we arrive at this answer?

In PROLOG, we can write

port(Port,Owner) :-

rectangle(Port,_),

rectangle(Owner,_),

bigger(Owner,Port).

and use the query

port(P,O).

to find all ports and their owners.

Appendix - Searching

miniKanren: http://minikanren.org

Scheme - miniKanren

Scheme - a simple version of PROLOG is available at https://www.t3x.org/bits/prolog6.html

JavaScript - https://github.com/guitarvydas/js-match

Clojure - core.logic

PROLOG - https://www.swi-prolog.org/

Getting started with PROLOG pattern matching - https://www.youtube.com/watch?v=QOYAHoLiyg0&t=195s

Appendix - Github Code

https://github.com/guitarvydas/factbases-101

Appendix - Compiling Diagrams

Compiling diagrams consists of 2 steps:

normalize the diagram to a factbase
query the factbase, recursively, to create more-and-more-interesting facts.

In the above, I have shown how to represent rectangles in a factbase.

Let us assume that we are using a notation that needs only:

rectangles
lines
text.

FYI — I have found this simple combination enough to express concurrency…

rectangles represent concurrent Software Components
lines represent messages ^[11] (data flow) between Software Components
text represents code and comments.

[N.B. Software Components are not implemented using CALL / RETURN only. See my other essay https://guitarvydas.github.io/2020/12/09/CALL-RETURN-Spaghetti.html]

Questions:

How to internally represent rectangles (see bounding boxes above)?
How to internally represent lines? (We only care about the start points of lines and the end points of lines ; do we create bounding boxes for lines or use special edge-case code for lines?]
How to internally represent code (code is not automatically sequential, blocks of code are found at various (x,y) points, what does that mean?).

We can compile diagrams in successive layers:

layer 0 is the actual drawing converted to FB (factbase) form
layer 1 infers bounding boxes and augments the FB with this information
layer 2 infers which rectangles are Software Components and augments the FB with this information
layer 3 infers which rectangles are Ports and augments the FB
layer 4 infers the Owner for each Port
layer 5 visits every Port and infers whether the port is input or output. Q: What is the convention for this? The shape of the port (e.g. circles for inputs, squares for outputs) or, maybe the colour (green for inputs, yellow for outputs) and so on.
…

Q: How much information must be in the FB before we can emit code using information in the FB?

Q: Is this information language-specific? E.G. if we know that we are going to emit code in Lisp, do we need more/less information than if we are going to emit code in, say, Python?

Q: Do we grow the FB knowledge-base over time or design it in one-fell-swoop at the beginning of the project(s)?

[1] Rules specify only the result (the assertions).

[2] https://www.youtube.com/watch?v=er_lLvkklsk

[3] In SWIPL, one needs to enter the semi-colon to receive successive match waves and to see the various results of each wave.

[4] pseudo-code

[5] If we wanted to look at objects + attributes, we could just write an SCN that gave us that view (aka skin) on the data.

[6] Again, I am simplifying in an attempt to maintain readability. A production version of the rules in this essay would need to handle more edge-cases and shapes.

[7] Furthermore, we might choose to include other kinds of objects, e.g. ellipses.

[8] meta-meta-observation: we solve complicated problems by adding complication.

[9] meta-meta-question: Why do we do DRY by hand? Why can't git detect code clones for us? Beginners like RY - they want to copy/paste cookbook code.

[10] Notice that our convention is that diagrams of components cannot contain other diagrams in-place. We need to switch to other diagrams to show deep-ness. In this notation, every diagram has 2 layers: (1) the top level and (2) an inner level. There is never an inner-inner level (such a thing can exist but cannot be shown on the same diagram - the editor should allow us to dive into and out-of diagram layers - much like the BACK button on browsers).

[11] Dynamically changing data flow is a bad thing IMO. Dynamic <anything> leads to trouble (esp. maintenance and debugging). https://guitarvydas.github.io/2021/03/06/Dynamic-Anything-is-Bad.html. IMO, data flows must be represented explicitly, for example, as arrows.