Most languages tightly couple the callee with the caller, e.g.


fn (…args…);


hard-codes "fn" into the caller's code, thereby coupling the callee ("fn") to the caller.


I favour decoupling the callee from the caller.

This allows architectural flexibility and rapid refactoring of architectures.

This is done by adding a level of indirection to the call.

Every callee contains a parent field, and the parent object connects its children's outputs to its children's inputs, e.g.


fn (…args…);

becomes akin to:

self.parent.routeCall ("fn", …args…);

(parent looks up "fn", then calls it if found, and returns the result to the caller).
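A minimal sketch of that lookup-and-forward, assuming a hypothetical `Parent` class with a `routes` table and a `routeCall` method (all names here are illustrative, not a prescribed API):

```python
# Sketch of parent-based call routing (names are illustrative).
class Parent:
    def __init__(self):
        self.routes = {}                 # name -> callable routing table

    def routeCall(self, name, *args):
        target = self.routes.get(name)   # parent looks up "fn"
        if target is None:
            raise LookupError(f"no route for {name!r}")
        return target(*args)             # calls it, returns the result

class Child:
    def __init__(self, parent):
        self.parent = parent             # every callee holds a parent field

    def work(self, x):
        # instead of hard-coding fn(x), route through the parent
        return self.parent.routeCall("fn", x)

parent = Parent()
parent.routes["fn"] = lambda x: x + 1    # the parent decides what "fn" is
child = Child(parent)
print(child.work(41))                    # → 42
```

The point of the indirection is visible in the last lines: the child never names its callee; the parent's table does.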


DLLs do this, but only half-heartedly.  

All library calls are indirect through a lookup table.

In DLLs, the parent is the operating system.

There is only one layer of indirection in a DLL - a CALL is either direct or indirect through an O/S-supplied table.

This scheme of indirection allows the O/S to share DLLs between apps by mapping the same DLL code into the address space of every app.

I suggest that this indirection be formalized, made two-way, and affect all CALLs, i.e. that every CALL be indirect and that every RETURN be indirect.

[Note that DLLs perform the half-hearted version of indirection — indirection for the CALL, but the RETURN sends results back directly.]



Decoupling — calling through a parent object — is a prerequisite for treating software as software components.

Architectural Flexibility

Decoupling, using indirection, makes it easy to change architecture during the implementation of a project.


For example, when we worked on a smart meter project, the team had invested some 10 person-years (5 developers, 2 years) of effort to create a solution.

At the 2-year point (elapsed time), management found a potential new customer, a very important one.

The customer's requirements, though, were different from the original requirements.  

The project had been built to solve the original requirements.

The project was built using software components and parent-routing indirection.

It took only one week of elapsed time (one technical manager and one junior programmer) to refactor the architecture to meet the new requirements.  

Had the code been implemented in the usual manner, meeting the new requirements would have required a near-total rewrite.


Treating software modules as stand-alone components leads to easier testability.

Testability comes from the fact that software modules are not dependent on each other.  Components' parents determine routing; routing information (function calls) is not baked into the code.

Note that libraries (and, therefore, GitHub, etc.) do not provide this level of testability.
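One hedged illustration of that testability claim, reusing the hypothetical parent/`routeCall` idea from above: in a test, the parent's routing table points a route at a stub, and the component under test is untouched (`Billing` and `readMeter` are invented names for the sketch):

```python
# Sketch: testing a component in isolation by rewiring its parent's routes.
class Parent:
    def __init__(self, routes):
        self.routes = routes             # name -> callable routing table

    def routeCall(self, name, *args):
        return self.routes[name](*args)

class Billing:
    def __init__(self, parent):
        self.parent = parent

    def invoice(self, meter_id):
        # Billing never names its collaborator; the parent routes the call.
        reading = self.parent.routeCall("readMeter", meter_id)
        return reading * 2               # pretend tariff

# A production parent would route "readMeter" to real hardware.
# A test parent routes it to a stub -- the Billing code is unchanged.
test_parent = Parent({"readMeter": lambda meter_id: 10})
print(Billing(test_parent).invoice("m-1"))   # → 20
```

The swap happens entirely in the parent's table, which is the sense in which routing is "not baked into the code".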


Scalability is only possible if software modules are decoupled.

Parent-based routing supports decoupling and hence provides better scalability.



Using a Dispatcher makes indirection easier.

A Dispatcher is a "normal" function that invokes other routines using a table (or tables).

Instead of making a CALL, a routine requests that the Dispatcher make the CALL on behalf of the caller.  The caller supplies its parent to the Dispatcher, and the Dispatcher uses the routing table within the parent to make the call.  The Dispatcher returns the result of the call to the caller.
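The steps above can be sketched as a plain function (a minimal sketch; `dispatch`, `routes`, and `Parent` are illustrative names, not a fixed API):

```python
# Sketch of a Dispatcher: a "normal" function that makes the CALL on
# behalf of the caller, via the caller's parent's routing table.
class Parent:
    def __init__(self, routes):
        self.routes = routes            # the parent's routing table

def dispatch(parent, name, *args):
    target = parent.routes[name]        # lookup in the parent's table
    result = target(*args)              # the Dispatcher makes the CALL
    return result                       # ...and hands the result back

parent = Parent({"double": lambda n: 2 * n})
print(dispatch(parent, "double", 21))   # → 42
```

Note that the caller supplies its parent; the Dispatcher itself owns no routing information.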

Indirect Calling is Not Good Enough

It is not good enough to call a routine indirectly through a table (the parent's routing table); the RETURN must also be indirect (through the Dispatcher).


Other Benefits of Using a Dispatcher

Software Components

Distributed Computing (Concurrency)

Operating System Dispatcher


Closures Are Threads

Closures are threads.  

Closures contain all of a function's "local variables".  (In non-closure-based software, threads perform this role.)

Operating system threads[1] are just inefficiently-large closures.[2]

Operating system threads could have been implemented as closures, but weren't.  Early operating systems were developed[3] in the C programming language (and in assembler), neither of which directly supports closures.  Closures[4] were available, though, as early as 1958 — in the Lisp programming language.  Operating system developers shunned Lisp because it was thought to be interpreted.  Ironically, C is also interpreted (by the underlying hardware).
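A small illustration of the claim that a closure carries a function's "local variables" — the same private state a thread's stack would otherwise hold:

```python
# Each closure captures its own copy of the "local variables" -- the
# state that, in non-closure-based software, a thread's stack holds.
def make_counter():
    count = 0                 # local state, kept alive by the closure
    def step():
        nonlocal count
        count += 1
        return count
    return step

a = make_counter()
b = make_counter()            # independent state, like two threads
print(a(), a(), b())          # → 1 2 1
```

Two counters advance independently, just as two threads keep independent locals; the closure is simply a much smaller container for that state.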

Components - 3 Views

A Component can be viewed in at least the following dimensions:


Some optimizations are possible.

The most straight-forward optimization is that which affects 1:1 message routing (one sending component, one receiving component).

Many:one, one:many, etc. connections, though, must be handled with care.  The asynchronous semantics of a component-based system can be compromised by eager optimization, especially on stack-based hardware architectures.

In particular, note that calling a function is not the same as sending a message.
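To make that distinction concrete, here is a hedged sketch: a call transfers control immediately and blocks for the answer, while a send merely queues data for later delivery (the queue and names are invented for illustration):

```python
from collections import deque

# A direct call: control transfers now; the caller blocks for the result.
def double(n):
    return 2 * n

result = double(21)        # synchronous: answer available immediately

# A message send: the datum is queued; delivery happens whenever the
# receiver is next run -- there is no implicit blocking or RETURN.
inbox = deque()

def send(queue, msg):
    queue.append(msg)      # just enqueue; do not transfer control

send(inbox, ("double", 21))
# ...later, some scheduler step drains the queue:
tag, n = inbox.popleft()
print(result, tag, n)      # → 42 double 21
```

Collapsing a send into a call is exactly the "eager optimization" that can break asynchronous semantics.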

I have found it unnecessary to optimize a system in most cases.  Most urges to optimize tend to be based on non-existent data and an over-anxious concern for low-level optimization (e.g. using C, Rust, etc.).

Most of our "knowledge" about optimization comes from research in the late 1900s.  The ground rules have changed — CPU power was expensive and memory was scarce in the late 1900s, whereas computing power and memory sizes are abundant in today's world.

I feel that it is better to get rid of the operating system (Linux, Windows, Mac) before attempting more detailed optimization strategies.  Context-switching, memory management, caching, etc. are first-order efficiency problems.  Measuring the efficiency of a program is less important than measuring response-time to human users.

See also


JIT[6] compilation was pioneered in the Lisp language.

Early Lisp compilers performed linker-like fixups at runtime.  I believe this mechanism was called fast calls.  Running Lisp programs would rewrite branch addresses after the first lookup-and-call of a subroutine.

Smalltalk used a cache of most-recently used methods to alleviate efficiency concerns with duck-typing.

Both of the above techniques — address rewriting and caching — are similar to what linking-loaders do with DLLs in operating systems.


Like Lisp fast-calling.[7]

Composite Components

Composite components are — from the outside — just components with {name, inputs, outputs} fields.

Composite components act as parents to child components.

Composite components contain:

Composite components route requests between their children.

Composite components route responses back to the appropriate children.

A child component can invoke — send a request to — its peers only by asking its parent to route the request/response.

Child Components

Child components are — from the outside — just components with a signature consisting of {name, inputs, outputs}.

Children can be implemented in a number of ways:


The Send() api function sends a request from a child to one of its peers (or to its parent's outputs).

The dispatcher/parent performs the appropriate lookup and routes the data delivery.

Send() can be optimized to be roughly equivalent to an indirect call method, i.e. it can be used to replace all CALLs in components.


There is no RETURN function.

RETURN is replaced by Send ().
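A minimal sketch of Send() replacing both CALL and RETURN, assuming a hypothetical `Composite` parent with a delivery queue (all names are invented for the sketch):

```python
from collections import deque

# Sketch: Send() is the only transfer primitive; a RETURN is just
# another Send routed back through the parent.
class Composite:
    def __init__(self):
        self.children = {}             # name -> child handler
        self.queue = deque()           # pending (to, msg) deliveries

    def send(self, to, msg):
        self.queue.append((to, msg))   # route request OR response

    def run(self):
        while self.queue:
            to, msg = self.queue.popleft()
            self.children[to](self, msg)

def doubler(parent, msg):
    reply_to, n = msg
    parent.send(reply_to, 2 * n)       # no RETURN: the reply is a Send()

results = []
def collector(parent, msg):
    results.append(msg)

top = Composite()
top.children = {"doubler": doubler, "collector": collector}
top.send("doubler", ("collector", 21))
top.run()
print(results)                         # → [42]
```

The doubler never returns a value to its caller; it Sends the response, and the parent routes it to the appropriate child.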


There is no Exception function (or syntax).

Exception is replaced by Send ().

Stack-Based CALLing and RETURNing

In current PLs[8], CALLs are made directly, but the CALLing mechanism also modifies a list to leave a return-breadcrumb.

This list is called the stack.[9]

[The stack is a global variable, even in FP.  There is one global stack inside every thread.]

The stack records return-breadcrumbs in a dynamic manner, even in compiled languages.

Originally, there was no stack.  The IBM 360 CPU did not support an automagic stack.  Programmers needed to use the BALR instruction and link return-breadcrumbs (and "local" data) together manually.
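A hedged sketch of that pre-stack style: return-breadcrumbs kept in an explicitly linked chain rather than on a hardware stack (a caricature of BALR-era linkage in Python, not real System/360 code):

```python
# Caricature of manual linkage: each activation record links to its
# caller's record, instead of relying on an automagic hardware stack.
class Activation:
    def __init__(self, return_to, caller):
        self.return_to = return_to   # where to resume (a breadcrumb)
        self.caller = caller         # explicit link to the caller's record

# Build a call chain main -> f -> g by hand, as a 360 programmer would.
main_act = Activation(return_to=None, caller=None)
f_act = Activation(return_to="after-call-to-f", caller=main_act)
g_act = Activation(return_to="after-call-to-g", caller=f_act)

# "Returning" walks the links back -- no stack hardware involved.
chain = []
act = g_act
while act.caller is not None:
    chain.append(act.return_to)
    act = act.caller
print(chain)   # → ['after-call-to-g', 'after-call-to-f']
```

Nothing here requires contiguous storage or last-in-first-out discipline; those are optimizations that came later.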

The global stack was added to hardware and has never since been removed.

Programmers rely on operating systems to sandbox their code and give them unique stacks (in threads).

PLs have improved, but the old-fashioned concept of a global stack has not changed.

Adding operating systems to the mix does not simplify software design; it only masks one of the elephants in the room by hiding a global variable.  Note that thread-based software has caused numerous forms of accidental complexity[10] — so much so that most programmers believe that multitasking is "hard".


In current PLs, CALL/RETURN uses the Stack to form a dynamic call-chain.


In Component-based software, there is no need for a stack.

Implementation on Stack-Based Hardware

When Component-based software is implemented on top of a stack-based hardware architecture, the Stack is used to CALL from the Dispatcher to a Component and to RETURN back to the Dispatcher — i.e. the Stack need only be 1-level deep.
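A minimal sketch of that 1-level discipline: the Dispatcher loop CALLs one component, the component RETURNs immediately to the loop, and any follow-on work is queued rather than nested (names and the ping/pong pair are invented for illustration):

```python
from collections import deque

# The only CALL/RETURN pair is dispatcher -> component -> dispatcher,
# so the hardware stack never grows past one frame of component code.
work = deque()

def ping(n):
    if n > 0:
        work.append(("pong", n - 1))   # queue follow-on work, never recurse

def pong(n):
    if n > 0:
        work.append(("ping", n - 1))

components = {"ping": ping, "pong": pong}

work.append(("ping", 3))
steps = 0
while work:
    name, arg = work.popleft()
    components[name](arg)              # 1-level CALL, immediate RETURN
    steps += 1
print(steps)                           # → 4
```

What would be a 4-deep call chain in the usual style becomes 4 separate 1-level dispatches.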

Implementation on Stack-Based Operating Systems

When Component-based software is implemented on top of a stack-based operating system (itself based on stack-based hardware), the Stack is used to CALL from the Dispatcher to a Component and to RETURN back to the Dispatcher — i.e. the Stack need only be 1-level deep.

[1] thread aka process

[2] We tend to draw an artificial dividing line between closures for data (called closures) and closures for return-breadcrumbs (called continuations).

[3] Greenspun's Tenth Rule.

[4] aka Lambdas

[5] Components can be instantiated recursively, starting with the top-most component and working downwards.  Each instance gets its own queues and records its own parent relationship.  Further, a single Component can be used in many different apps (what Bennett calls multiple-use, vs. reuse) - the parent/child relationships depend on the application and cannot be baked into the static version of the Component.  The chain of ancestry is created on a per-app basis and might differ from that of any other app.

[6] JIT means Just In Time.

[7] Greenspun's Tenth Rule, again.

[8] PL means Programming Language

[9] The stack is usually optimized to use contiguous storage locations.  Further optimizations are based on knowledge of the last-in-first-out nature of the list.  See also

[10] I can think of only one real "race condition" - two events arriving so quickly that the hardware/software cannot tell which one came first.  All other "race conditions" are just accidental complexities caused by the use of a global stack - e.g. thread safety, full preemption, etc.