WHY WE ARE DOOMED TO STAY PROSPECTORS OR MINERS AND WHY THAT SHOULD NOT BE INEVITABLE - When we create program code, we occasionally dig up algorithms and structures of real elegance and value, just as a prospector or a miner occasionally digs up gold nuggets, gemstones or even diamonds. But once we programmers discover those coded gold nuggets, gemstones or diamonds, we either throw them away as soon as a project finishes or a yet better programming language is born, or we hide them in heaps of moderately structured text files, so that no other program code recycler will ever find them again. This is despite the fact that, in contrast to prospectors or miners, we could consume our digital gold nuggets, gemstones or diamonds as often as we’d like without using up those coded resources. If we keep throwing them away, we remain standing on the spot, doomed to stay prospectors and miners forever, instead of multiplying the worthwhile (re)use of our resources (and therewith our wealth).
REAL stands for Relational Expressed Assembly Language. To understand the impact of REAL on programming, it is helpful to step back a few decades and take a look at how programming evolved1:
A brief history of programming
How did programming evolve over the last few decades? (For the sake of simplicity we only look at digital computers and skip their mechanical and analog counterparts as well as theoretical computers, and we may simplify circumstances.)
The early days
At the beginning there were switches on a computer’s console, many switches. Each switch represented one specific bit in a portion of a computer’s addressable memory (memory can be seen as a sequence of values which are directly addressed by an index, similar to a number series with n entries where the index addressing an entry can range from 0 to n-1).
We are talking about computers from around the 1950s, equipped with vacuum tubes or (even earlier) with relays. Those machines occupied whole rooms; their input was a bank of switches and their output was light bulbs or the like. Programs were written by turning those switches to “on” (representing a binary “1”) or “off” (representing a binary “0”) at a given (preselected) memory index, thereby setting the state of the addressed memory to the according ones and zeros. The resulting bit patterns held in memory represented both the machine code of your program (machine code being the instructions evaluated by a computer’s processing units) and its data.
You programmed such a computer by virtually hard wiring its memory with ones and zeros 4.
Simply speaking, you started by wiring your program’s first instruction into memory at address 0 by setting the switches for that index accordingly, proceeded with memory address 1 and so forth, until your program was complete at some memory address n-1 (given that your program code occupied n elements) 5. When turned on, such a computer started at the beginning of its memory at address 0, looked at the bit pattern found there, expected it to represent a valid machine code instruction, processed this instruction and continued with the next one.
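The fetch-execute cycle just described can be sketched in a few lines. The toy instruction set below (opcodes for LOAD, ADD and HALT) is invented purely for illustration and does not correspond to any real machine:

```python
# Memory holds instructions and data as one flat sequence of values,
# just like the switch-programmed machines described above.
memory = [
    0x01, 5,    # LOAD 5 into the accumulator (hypothetical opcode 0x01)
    0x02, 3,    # ADD 3 to the accumulator    (hypothetical opcode 0x02)
    0x00,       # HALT                        (hypothetical opcode 0x00)
]

accumulator = 0
pc = 0  # program counter: execution starts at memory address 0

while True:
    opcode = memory[pc]          # fetch the bit pattern at the current address
    if opcode == 0x00:           # HALT: stop execution
        break
    elif opcode == 0x01:         # LOAD immediate value into the accumulator
        accumulator = memory[pc + 1]
        pc += 2
    elif opcode == 0x02:         # ADD immediate value to the accumulator
        accumulator += memory[pc + 1]
        pc += 2

print(accumulator)  # prints 8
```

Note how the machine knows nothing about the program’s structure: it simply walks the memory from address 0 onwards, treating whatever it finds as an instruction.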
When writing down such a program on paper, the program would start right at the top of the page and proceed further down the page according to its programmed logic. There was no easy way of pointing to once written program code for reuse or modification; one would have to go from memory address to memory address to find the program code in question.
Punched tapes and punch cards
That principle persisted when punch cards were introduced to store the instructions alongside the data of your programs: A program started at the top of such a punch card and proceeded further down that punch card and the succeeding punch cards. The absence or presence of holes on those punch cards represented the bit patterns of the program code, similar to the switches mentioned above for hard wiring your program. A pile of such punch cards made up the entire program. The first punch card on the pile contained the first bulk of instructions and data, the second punch card contained the second bulk, and so on until the last punch card, which contained the last bulk of instructions and data. The bit patterns represented by the punch cards were mapped into the computer’s memory in the order of the punch cards on the pile: Once loaded into memory, the computer again started executing your program at memory address 0.
When looking at a program stored on a pile of punch cards, the program starts with the topmost (first) punch card and proceeds further down the pile of punch cards according to its programmed logic. Once again you can read the program by starting at the top. There was no easy way of locating once written program code for reuse or modification; one would have to go from punch card to punch card to find the program code in question.
Mass storage and text processing
Punch cards were replaced by magnetic tapes, drum storage, diskettes and hard drives. Compilers and interpreters for high level programming languages came, eliminating the need to write programs in low level machine code. Still, the principle of how we organized our program code stayed the same: We replaced the pile of punch cards by text files representing the semantically same content. Instead of punching holes into cards or flipping switches on an operator’s console, we were now able to write down our program code with comfortable and less error prone text processors.
Still the program code was read from top to bottom according to the program’s logic, as if it were still a pile of punch cards or a memory dump. There was no easy way of locating once written program code for reuse or modification; one would have to go from file to file to find the program code in question. Although the program code had become automatically searchable, searching still required implementation specific knowledge of what to search for. Having found the location in question, the program code most probably was not in any shape for reuse, making manual refactoring necessary.
This was fine until the 1960s, when the cost of the programs (software) was a fraction of the cost of the computers (hardware) on which the software was operated. The complexity of the program code was manageable back then, considering the restricted capabilities and availability of computers at that time. But computing power increased continuously (Moore’s law), and in the 1970s and 1980s the software crisis became obvious as computer hardware became affordable for a wide range of use cases, companies and end users. The software to be operated on that affordable hardware became more and more complex and therewith more and more costly.
To counteract the software crisis6 there came software engineering alongside software architecture, various development process idioms, modularization and separation of concerns, program libraries, model driven development and literate programming, as well as object oriented programming, aspect oriented programming and meta programming, with functional programming being rediscovered, just to mention a few. Some concepts came to stay, others vanished again - only to be refurbished years (or even decades) later.
Brave new world
All those concepts did not succeed in overcoming the software crisis; moreover, the software crisis is more present than ever: The complexity of software systems still increases rapidly as the internet, cloud computing and distributed systems as well as machine learning permeate all aspects of life. No disruptive game changer regarding our program code is in sight, so we fall back on reinventing well known concepts. Functional and reactive programming (now “serverless” or “event driven”) or modularization (now “micro services”, “containerization” or “virtualization”) experience a renaissance in ever new appearances, continuously increasing the complexity of software systems. All this is accompanied by ever new programming languages bearing lots of syntactic sugar for text processing our program code, though not offering any revolutionary new concepts.
Nowadays a program may have multiple entry points and be embedded into frameworks; its code usually is spread over dozens or hundreds of text files. We still write down our program code as if we were punching cards or flipping switches on a console in the old days: We start at the top of a text file as if we were feeding memory at address 0 and proceed downwards, as if we were feeding the memory at the succeeding addresses. Writing down our program code from top to bottom does not by any means reflect the actual execution order any more.
Your program consists of very many, very individually organized text files. Program code declared in there may arbitrarily reference program code declared somewhere else, or it may itself be referenced by yet other program code. It may be enriched by aspects, being code declared anywhere, according to rules declared once again somewhere completely different. The behavior of the program code may be altered by annotations attached to the program code and interpreted outside the written down program flow. New entry points may be spread throughout your program code via accordingly interpreted annotations. All of this is glued together by text files that are not inherently conclusive with respect to each other (as all kinds of mechanisms operate on them, declared in scripts, configurations or source code files).
Our sophisticated programming efforts are still based on text files, organizing a program’s code in more or less randomly organized chunks of text (and functionality). We manage the text files’ versions in version control systems (VCS) to enable concurrent modifications. Harnessing the effortfully created and continuously advancing program code stored in such a way relies on good old text processing. There are no means of organizing our program code other than text files, the hierarchical structure of our file systems and the common sense of a programmer. Text files written and read from top to bottom nowadays are neither executed from top to bottom nor from text file to text file. A VCS records any changes applied to our program code, independently of the functionality declared in there (it just processes plain text).
Even more text processing
When applying large scale modifications to our program code we rely on text processing (full text search and replace), a VCS and intermediate data structures such as abstract syntax trees (AST). Being a short lived representation of a program’s code before it is translated into machine code (or interpreted), an AST is also a good representation for applying refactorings to our program code: Program code is converted into an AST for applying syntactically correct changes, only to convert that modified AST back into program code and therewith into humble text files again, while throwing away that very AST.
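In Python, for example, this round trip can be reproduced with the standard library’s ast module. The function name and the renaming rule below are made up for illustration:

```python
import ast

source = "def area(w, h):\n    return w * h\n"

# Parse the text into an AST: the short-lived intermediate representation.
tree = ast.parse(source)

# A tiny refactoring applied on the tree rather than via text search:
# rename the (hypothetical) function "area" to "rectangle_area".
class Rename(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        if node.name == "area":
            node.name = "rectangle_area"
        return node

tree = Rename().visit(tree)

# Convert the modified AST back into humble text files again;
# the tree itself is discarded once the text is written out.
print(ast.unparse(tree))
```

The change is guaranteed to be syntactically valid, but once unparsed back to text, all the structural knowledge the tree held is thrown away.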
Although operating on a program code’s AST is a step forward when maintaining our program code, it still lacks efficient means to identify and categorize once written program code for later reuse. Trying to locate specific program code fragments requires correct patterns to be matched against the AST. Even so, utilizing the AST already is an improvement compared to processing text files. Still, operating on cross cutting concerns using an AST is error prone, as it represents a tree like graph structure with no inherent relations whatsoever to similar or redundant program code.
Moreover, neither text files nor an AST are capable of providing satisfying means to declare a program code’s semantics. An AST provides advantages when working with program code, but its representation is not a good fit for human beings. No programmer thinks in tree like structures when writing down program code; an AST is meant to be operated on by compilers (or interpreters). So we are still stuck on text files.
Harnessing code organized in text files is by no means exact; it relies on an individual’s knowledge of the program code’s organization, on comments sticking to the program code and on naming conventions, providing only a slight chance of hitting the right term when trying to identify a required code fragment in the bulk of program files and folders.
As we have organized our program code from top to bottom since the very beginning, we treat this practice as if it were carved in stone. Given the history of programming and the complexity we face nowadays, organizing our program code in text files seems anachronistic. But what is the alternative?
REAL represents an approach to effectively harness our once written, debugged and proven program code by establishing an inherently conclusive representation of this very program code, virtually forcing us to discover new gold nuggets, new gemstones or even new diamonds. The motivation for REAL is the assumption that programming languages are just different notations of something much more fundamental, which REAL attempts to harness and which the way we program today cannot picture.
As this writing suggests, the evolution of programming may have gotten stuck somewhere along the way at text processing. Furthermore, this writing suggests that we need an inherently conclusive representation of our program code at a programmer’s level, which currently means text files. Hence, we need an alternate representation beyond the level of unrelated text files (and file systems), as text files are too humble to allow us to effectively harness and play around with our once effortfully discovered program code.
“A promising approach to go forward when being stuck in a road’s dead end is to go back a few steps only to take a different junction when proceeding again.”
Let us go back a few steps, forget that text files are the only means of managing program code, and take another junction for harnessing program code. So, what is the idea behind REAL?
Input, processing, output
Let’s face it: everything we produce when programming, in whatever programming language, can be broken down into the IPO (input, processing, output) model.
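A minimal sketch of the IPO model; the function and its mean computation below are chosen arbitrarily for illustration:

```python
# Every program, however large, reduces to the same three stages.
def program(data):                    # input: the data handed to the program
    result = sum(data) / len(data)    # processing: here, compute the mean
    return result                     # output: the computed value

print(program([2, 4, 6]))  # prints 4.0
```

Whether the input is a console switch, a punch card or a network request, and whether the output is a light bulb or a web page, the decomposition stays the same.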