(c) Dmitry Ponyatov <dponyatov@gmail.com> , GNU LesserGPL, 2017
github: https://github.com/ponyatov/DLR
PDF for mobile: https://github.com/ponyatov/DLR/releases/latest
online manual (preview): http://ponyatov.github.io/DLR/
I have some trouble with the latex2html converter, so the HTML version is for preview only; download the full .pdf with working links, adapted for reading on mobile devices and slide projectors.
Rich manual and support library for writing dynamic script languages
    import sys

    # command set
    def nop(): pass          # do nothing
    def bye(): sys.exit(0)   # stop system

    # we use the Python parser and a Python list as program memory
    program = [ nop, bye ]

    # interpreter
    def interpreter():
        Ip = 0
        while True:
            assert Ip < len(program)
            command = program[Ip]
            print '%.4X' % Ip, command   # trace
            Ip += 1
            command()

    if __name__ == '__main__':
        interpreter()

0000 <function nop at 0x7ff7bb790b18>
0001 <function bye at 0x7ff7bb790c08>
    import sys

    class VM:
        # command set
        def nop(self): pass              # do nothing
        def bye(self): self._bye = True  # stop single VM only
        # sequence interpreter
        def interpreter(self):
            while not self._bye:
                assert self.Ip < len(self.program)
                command = self.program[self.Ip]
                print '%.4X' % self.Ip, command   # trace
                self.Ip += 1
                command(self)
        # virtual machine constructor
        def __init__(self, P=[]):
            self.program = P        # load program
            self.Ip = 0             # set instruction pointer
            self._bye = False       # stop interpreter flag
            self.interpreter()      # run interpreter

    if __name__ == '__main__':
        VM([ VM.nop, VM.bye ])      # every command needs the VM. prefix

0000 <unbound method VM.nop>
0001 <unbound method VM.bye>
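The class-based interpreter above can be condensed into a runnable sketch. This is not the book's exact code, just a minimal Python 3 rendition; the `trace` attribute is my own addition (the trace is collected in a list instead of printed):

```python
# Minimal Python 3 sketch of the command-list VM (hypothetical names).
class VM:
    def nop(self):                # do nothing
        pass
    def bye(self):                # stop this VM only
        self._bye = True
    def __init__(self, P=()):
        self.program = list(P)    # load program
        self.Ip = 0               # instruction pointer
        self._bye = False         # stop flag
        self.trace = []           # collected trace instead of print
        self.interpreter()
    def interpreter(self):
        while not self._bye:
            assert self.Ip < len(self.program)
            command = self.program[self.Ip]              # FETCH
            self.trace.append((self.Ip, command.__name__))
            self.Ip += 1                                 # to next command
            command(self)                                # EXECUTE

vm = VM([VM.nop, VM.bye])
print(vm.trace)   # [(0, 'nop'), (1, 'bye')]
```

In Python 3, `VM.nop` is a plain function, so the "unbound method" style `command(self)` call still works unchanged.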
    class VM:
        # register file shared between VM instances:
        # a VM can run multiple programs,
        # but it has only one register set
        Rg = [0,1,2,3,4,5,6,7]

Commands operating on registers need a more complex encoding in \term{program memory}: the opcode, plus one or more register numbers / data items:
    class VM:
        def ld(self):
            ' load register '
            assert self.Ip+2 < len(self.program)   # check Ip
            # get register number
            index = self.program[self.Ip]
            # skip _first_ command parameter
            self.Ip += 1
            # get data to be loaded
            data = self.program[self.Ip]
            # skip _second_ command parameter
            self.Ip += 1
            # load register
            self.Rg[index] = data
        def interpreter(self):
            ...
            print '%.4X' % self.Ip, command, self.Rg   # trace with registers

    if __name__ == '__main__':
        VM([                       # every command needs the VM. prefix
            VM.ld, 1, 'Rg[1]',     # Rg[1] <- 'string'
            VM.nop, VM.bye ])

0000 <unbound method VM.ld>  [0, 1, 2, 3, 4, 5, 6, 7]
0003 <unbound method VM.nop> [0, 'Rg[1]', 2, 3, 4, 5, 6, 7]
0004 <unbound method VM.bye> [0, 'Rg[1]', 2, 3, 4, 5, 6, 7]
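The variable-length encoding (opcode followed by inline operands) can be exercised end to end. A self-contained Python 3 sketch, hypothetical names, of the same `ld` decoding:

```python
# Sketch of a variable-length command: `ld` consumes two inline
# parameters (register number, value) from program memory.
class VM:
    def __init__(self, P):
        self.program = list(P)
        self.Rg = [0, 1, 2, 3, 4, 5, 6, 7]   # register file
        self.Ip = 0
        self._bye = False
        self.interpreter()
    def nop(self): pass
    def bye(self): self._bye = True
    def ld(self):
        # operands live right after the opcode in program memory
        assert self.Ip + 2 <= len(self.program)
        index = self.program[self.Ip]; self.Ip += 1   # register number
        data  = self.program[self.Ip]; self.Ip += 1   # value to load
        self.Rg[index] = data
    def interpreter(self):
        while not self._bye:
            command = self.program[self.Ip]
            self.Ip += 1
            command(self)

vm = VM([VM.ld, 1, 'Rg[1]', VM.nop, VM.bye])
print(vm.Rg)   # [0, 'Rg[1]', 2, 3, 4, 5, 6, 7]
```

Note how `ld` itself advances `Ip` past its parameters; that is what makes the instruction variable-length.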
\term{Registers} are as fast and native for a hardware CPU as they are \emph{extra slow and ineffective for software interpretation}: every data operation requires extra fetching and decoding of register numbers and data from program memory. So we can use registers in an interpreter only if we are going to play with compilation or profiling for some real hardware or simulated CPU machine code, but never in a normal interpreter.
    <command> <addr1> <addr2> <addr3>

denotes that the computer must do something with addr1 and addr2 and put the result into addr3 in memory.
    class VM:
        D = []   # shared data stack
        R = []   # CALL/RET return stack
    VM([ VM.ld, 1, 'Rg[1]',
         VM.nop, VM.bye ])

but you must use VM. prefixes, and the most annoying thing is manual address computation for JMP ((un)conditional jump to address, used in all loop and if/else structures) commands.

We will use David Beazley's \href{http://www.dabeaz.com/ply/}{PLY parser generator library} for writing a tiny assembler-like language able to process VM commands, labels, and simple control structures. PLY is an acronym for Python Lex-Yacc, as it is an implementation of the lex and yacc parsing tools for Python.

PLY install:

    $ sudo pip install ply

or on Debian Linux:

    $ sudo apt install python-ply

A typical syntax parser consists of two parts: the lexer and the parser itself.
    VM('''
        R1 = 'Rg[1]'
        nop
        bye
    ''')

First we reorder the code and add a compiler(source) method:

    class VM:
        def __init__(self, P=''):
            self.compiler(P)        # run parser/compiler
            self.interpreter()      # run interpreter
        def interpreter(self):
            self._bye = False       # stop flag
            while not self._bye:
                ...
                command = self.program[self.Ip]   # FETCH command
                ...
                self.Ip += 1                      # to next command
                command()                         # DECODE/EXECUTE
        def compiler(self,src):
            # set instruction pointer
            # (we will change it when moving the entry point)
            self.Ip = 0
            # (we don't have a parser yet)
            self.program = [ self.nop, self.bye ]

0000 <bound method VM.nop of <__main__.VM instance at 0x02280378>> [0, 1, 2, 3, 4, 5, 6, 7]
0001 <bound method VM.bye of <__main__.VM instance at 0x02280378>> [0, 1, 2, 3, 4, 5, 6, 7]
    import ply.lex  as lex
    import ply.yacc as yacc

For the first time we implement the lexer only, to view what \term{lexeme}s we will get at the lexing stage. Try to build a lexer using the ply.lex class:

    def compiler(self,src):
        ...
        lexer = lex.lex()

ERROR: No token list is defined
ERROR: No rules of the form t_rulename are defined
ERROR: No rules defined for state 'INITIAL'
Traceback (most recent call last):
  File "C:\Python\lib\site-packages\ply\lex.py", line 910, in lex
    raise SyntaxError("Can't build lexer")

For the lexer we need a tokens[] list containing the token types, t_xxx() regexp/action rules for every token type, and a t_error() lexer error callback function:

    def compiler(self,src):
        ...
        self.lexer(src)
    def lexer(self,src):
        lexer = lex.lex()

    def lexer(self,src):
        # token types
        tokens = ['COMMAND']
        # regexp/action rules
        def t_COMMAND(t):
            r'[a-z]+'
            return t
        # required lexer error callback
        def t_error(t): raise SyntaxError('lexer: %s' % t)
        # create ply.lex object
        lexer = lex.lex()
        # feed source code
        lexer.input(src)
        # get next token
        while True: print lexer.token()
SyntaxError: lexer: LexToken(error,"\n  R1 = 'Rg[1]'\n  nop\n  bye\n\t",1,0)

In the error report you can see the problematic symbol at the first position, so we need to drop space symbols:

    def lexer(self,src):
        # regexp/action rules
        t_ignore = ' \t\r\n'   # drop spaces

SyntaxError: lexer: LexToken(error,"R1 = 'Rg[1]'\n  nop\n  bye\n\t",1,9)

Look at the last two numbers: this is line number = 1 and lexer position = 9. Add extra empty lines at the start of the source string: something strange, the line number does not change. This is because PLY does not track the end-of-line character; you must do it yourself:

    def lexer(self,src):
        # regexp/action rules
        t_ignore = ' \t\r'       # drop spaces (no EOL)
        def t_newline(t):        # special rule for EOL
            r'\n'
            t.lexer.lineno += 1  # increment line counter
            # do not return a token,
            # it will be ignored by the parser
SyntaxError: lexer: LexToken(error,"R1 = 'Rg[1]'\n  nop\n  bye\n\t",4,11)

Line numbering now works OK; the lexer position counts how many characters were processed by the lexer in total.

Add register parsing, and return a modified token with the matched string replaced by the register number:

    def lexer(self,src):
        # token types
        tokens = ['COMMAND','REGISTER']
        # regexp/action rules
        def t_REGISTER(t):
            r'R[0-9]+'
            t.value = int(t.value[1:])
            return t

LexToken(REGISTER,1,4,11)   ### format: LexToken(type,value,lineno,lexpos)
SyntaxError: lexer: LexToken(error,"= ...

Add comment lexing: a comment starts with #:
    # ===== lexer code section =====
    t_ignore = ' \t\r'           # drop spaces (no EOL)
    t_ignore_COMMENT = r'\#.+'   # line comment

Lexer rules can be defined in two forms: a t_xxx(t) function with the regexp defined as its __doc__ docstring value, or a t_yyy = r'...' string.

Using regexp t\_strings you have no control over the order of lexer rule matching, and this is a big disadvantage in cases where operators like + ++ = == exist in the language syntax. We will use one t\_string as a sample, but it is good practice to use functions only.

    def lexer(self,src):
        # token types
        tokens = ['COMMAND','REGISTER','EQ']
        # regexp/action rules
        t_EQ = r'='

LexToken(REGISTER,1,4,11)
LexToken(EQ,'=',4,14)

\subsection{Lexing strings (lexer states)}

String lexing is a very special case. Using \term{string literal}s we want to be able to use some standard \term{escape sequences} like \r \t \n \xFF \u1234. For example we change the program source; note the r''' prefix (a Python ``raw string'') and the \t inserted as an escape sequence:

    VM(r'''
        R1 = 'R\t[1]'
        nop
        bye
    ''')

Strings can be parsed using the lexer itself, switching between multiple \term{lexer state}s: each \emph{lexer state defines its own set of active tokens and rules}. The main state has the INITIAL name. First define the extra states:

    def lexer(self,src):
        # extra lexer states
        states = (('string','exclusive'),)   # don't forget the comma
ERROR: No rules defined for state 'string'

We need at least one rule; the first candidate is the EOL rule: line numbers must be counted through the whole source in ANY state:

    # regexp/action rules (ANY)
    def t_ANY_newline(t):   # special rule for EOL
        r'\n'
        t.lexer.lineno += 1

WARNING: No error rule is defined for exclusive state 'string'
WARNING: No ignore rule is defined for exclusive state 'string'

    # regexp/action rules (ANY)
    # required lexer error callback
    def t_ANY_error(t): raise SyntaxError('lexer: %s' % t)
    # regexp/action rules (STRING)
    t_string_ignore = ''   # don't ignore anything

For moving between states we need mode switching sequences:

    # regexp/action rules (INITIAL)
    def t_begin_string(t):
        r'\''
        t.lexer.push_state('string')
    # regexp/action rules (STRING)
    def t_string_end(t):
        r'\''
        t.lexer.pop_state()   # return to INITIAL

Any char in string state must be stored somewhere, forming the resulting string. We can do it in the lexer object as a custom attribute:

    # token types
    tokens = ['COMMAND','REGISTER','EQ','STRING']
    # regexp/action rules (INITIAL)
    def t_begin_string(t):
        r'\''
        t.lexer.push_state('string')
        t.lexer.LexString = ''                # initialize accumulator
    # regexp/action rules (STRING)
    def t_string_char(t):
        r'.'
        t.lexer.LexString += t.value          # accumulate
    def t_string_end(t):
        r'\''
        t.lexer.pop_state()                   # return to INITIAL
        t.type  = 'STRING'                    # override token type
        t.value = t.lexer.LexString           # accumulator to value
        return t                              # return resulting string token

And finally add the special \term{escape sequences}:
    # regexp/action rules (STRING)
    def t_string_tab(t):
        r'\\t'
        t.lexer.LexString += '\t'
    def t_string_cr(t):
        r'\\r'
        t.lexer.LexString += '\r'
    def t_string_lf(t):
        r'\\n'
        t.lexer.LexString += '\n'
    def t_string_char(t):   # must be the last rule
        ...

LexToken(REGISTER,1,2,9)
LexToken(EQ,'=',2,12)
LexToken(STRING,'R\t[1]',2,21)
LexToken(COMMAND,'nop',3,31)
LexToken(COMMAND,'bye',4,43)
None
None
...

\subsection{End of file lexing}

The end of the source can be processed in two variants: a t_eof() rule, or None returned by the next lex.token() call. Just fix the lexer print loop:

    def lexer(self,src):
        # get next token
        while True:
            next_token = lexer.token()
            if not next_token: break   # on None
            print next_token

LexToken(REGISTER,1,2,9)
LexToken(EQ,'=',2,12)
LexToken(STRING,'R\t[1]',2,21)
LexToken(COMMAND,'nop',3,31)
LexToken(COMMAND,'bye',4,43)

\begin{center}{\Huge Lexer done!}\end{center}
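For comparison, the finished lexer's behaviour (spaces dropped, manual line counting, register numbers, and a quote-triggered string state with \t \r \n escapes) can be imitated with nothing but the stdlib re module. A hypothetical sketch, not PLY:

```python
import re

# Two "states": normal regexp matching, plus a string mode entered on
# a quote, where characters are accumulated and escapes translated.
ESCAPES = {'t': '\t', 'r': '\r', 'n': '\n'}
TOKEN_RE = re.compile(r"(R[0-9]+)|(=)|([a-z]+)|(\n)|([ \t\r]+)|(')|(.)")

def tokenize(src):
    tokens, lineno, pos = [], 1, 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        pos = m.end()
        reg, eq, cmd, nl, sp, quote, err = m.groups()
        if nl: lineno += 1                     # track EOL ourselves
        elif sp: pass                          # drop spaces
        elif reg: tokens.append(('REGISTER', int(reg[1:]), lineno))
        elif eq: tokens.append(('EQ', '=', lineno))
        elif cmd: tokens.append(('COMMAND', cmd, lineno))
        elif quote:                            # enter "string" state
            acc = ''
            while src[pos] != "'":
                if src[pos] == '\\':           # escape sequence
                    acc += ESCAPES.get(src[pos+1], src[pos+1]); pos += 2
                else:
                    acc += src[pos]; pos += 1
            pos += 1                           # leave "string" state
            tokens.append(('STRING', acc, lineno))
        else:
            raise SyntaxError('lexer: %r line %d' % (err, lineno))
    return tokens

print(tokenize("R1 = 'R\\t[1]'\nnop\nbye"))
```

The string loop is exactly the push_state/pop_state idea, just written by hand: while inside the quotes, a different set of rules applies.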
    def compiler(self,src):
        # ===== init code section =====
        # set instruction pointer entry point
        self.Ip = 0
        # compile entry code
        self.program = [ self.nop ]
        # ===== lexer code section =====
        ...
        # ===== parser/compiler code section =====
        ...
        # ===== compile final code =====
        self.program += [ self.bye ]

    # ===== lexer code section =====
    # extra lexer states
    states = (('string','exclusive'),)
    # token types
    tokens = ['COMMAND','REGISTER','EQ','STRING']
    ...
    # ===== parser/compiler code section =====
    # create ply.yacc object, without extra files
    parser = yacc.yacc(debug=False,write_tables=None)
    # feed & parse source code using lexer
    parser.parse(src,lexer)

Now we see the term \term{compile} for the first time, used in a couple with \term{parse}. This is because we use a special technique called \term{syntax-directed translation}: while the parser traverses the language's syntactic structures, \emph{every syntax rule executes compiler code on rule match}.

And the \term{compile} term in this case means nothing more than adding machine commands (bytecodes, or tiny executable elements in our demo case) to the \term{compiler buffer}, i.e. self.program[] memory.

This method is very suitable for simple \term{imperative languages} (languages that say what to do step by step) like assemblers, which can be implemented using only global data structures, like symbol tables and lists of defined functions, and don't require transferring or computing data between the nodes of a tree-represented program (\term{attribute grammar} methods).
ERROR: no rules of the form p_rulename are defined

As for the lexer, we need a set of p_rules.

\subsection{Backus--Naur form}
For the lexer we used \term{regular expressions}: very understandable and easy-to-use text templates, until we try to match such simple elements as numbers and identifiers. But to specify a \term{programming language grammar}, \emph{regexps can't match recursive nested elements}, like matching expressions with groups of brackets [dragon]. And now a metalanguage (a language that describes another language) comes into play, specially designed to describe artificial language grammars: \term{BNF}, or \term{Backus-Naur form}.

Our assembly language can be described as:

    program : <empty>
    program : program command   { /* action */ memory += $2 }

or, in a form where rules with the same left side are grouped with the or-element, the yacc BNF variant:

    program :
            | program command

Note the recursion: program refers to program itself as a subelement. In this production we describe that \term{program}$_0$ can be empty, \term{or} (\verb$|$) can be concatenated from (sub)\term{program}$_1$ followed by \term{command}$_2$. The parser code will \term{recursively} try to match program$_1$ using the same rule, until the recursion ends up at the empty program : part.
Every time a parser (sub)rule matches, the code in {action} will be executed. This code (\cpp, \py, or any other language your \term{parser code generator} supports) can do anything you want. It can use indexes to access rule elements: the \$0 index returns the result (\$0 corresponds to the left side of the rule, i.e. the nonterminal value), \$1 stands for the program$_1$ value and \$2 for the command value. For example, with the tiny ``nop bye'' program and this grammar:

    program : <empty>           { $0 = "what to do:\n" }
    program : program command   { $0 = $1 + $2 }
    command : NOP               { $0 = "do nothing\n" }
    command : BYE               { $0 = "stop system\n" }

the parser will start from the topmost \term{start production} (all rules with equal left nonterminals will be grouped), trying to match every rule top-down in a \term{greedy} way: match the longest right part with depth-first search. As a result the parser will return you the string:

    what to do:
    do nothing
    stop system
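Since the left-recursive rule `program : | program command` consumes exactly one command per step, a hand-written equivalent is just a loop. A hypothetical Python sketch of the same actions:

```python
# Actions from the grammar above: each matched `command` contributes
# one string, the empty `program` rule contributes the header.
ACTIONS = {'NOP': 'do nothing\n', 'BYE': 'stop system\n'}

def parse_program(tokens):
    result = 'what to do:\n'          # program : <empty> action
    for tok in tokens:                # one `program command` step each
        if tok not in ACTIONS:
            raise SyntaxError('parser: %r' % (tok,))
        result += ACTIONS[tok]        # $0 = $1 + $2
    return result

print(parse_program(['NOP', 'BYE']))
```

This iterative form is exactly what an LALR parser like yacc does with left recursion: it reduces one `command` at a time, keeping the stack flat.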
At this point I tried to write out the parsing process step by step, but it is too cryptic, so we skip this trace with its parser stack pushing and element shifting. But we should note that every time the parser needs a terminal, the \emph{parser will automatically call the lexer} to get the next token to match against.

Returning to our sheep: we are not so lucky with BNF syntax usage in \py. The PLY parsing library uses a not-so-short grammar syntax: we must define a special function for every production, giving the BNF in its docstring:
    # ===== parser/compiler code section =====
    # grammar rules
    def p_program_epsilon(p):
        ' program : '
        p[0] = 'what to do:\n'       # $0 = ...
    def p_program_recursive(p):
        ' program : program command '
        p[0] = p[1] + p[2]           # $0 = $1 + $2
    # required parser error callback
    def p_error(p): raise SyntaxError('parser: %s' % p)
    # create ply.yacc object, without extra files
    parser = yacc.yacc(debug=False,write_tables=None)
    # feed & parse source code using lexer
    parser.parse(src,lexer)

    VM(' nop bye ')

ERROR: Symbol 'command' used, but not defined
WARNING: Token 'EQ' defined, but not used
WARNING: Token 'REGISTER' defined, but not used
WARNING: Token 'COMMAND' defined, but not used
WARNING: Token 'STRING' defined, but not used
WARNING: There are 4 unused tokens

We need the p_error(p) error callback function, which will be called on syntax errors.

Here we have some problem: our lexer returns all commands as one universal COMMAND token, so we need to analyze its value, or just fix the lexer:

    # ===== lexer code section =====
    # token types
    tokens = ['NOP','BYE','REGISTER','EQ','STRING']
    # replace t_COMMAND by:
    def t_NOP(t):
        r'nop'
        return t
    def t_BYE(t):
        r'bye'
        return t

As you see, this PLY code is not compact and easy to read, and one of the things we are going to do much, much later is to make a special language for writing parsers with a lighter and easier-to-read syntax. Mark this as a TODO for DLR.
    def p_program_epsilon(p):
        ' program : '
    def p_program_recursive(p):
        ' program : program command '
    def p_command_NOP(p):
        ' command : NOP '
        p[0] = 'do nothing\n'
    def p_command_BYE(p):
        ' command : BYE '
        p[0] = 'stop system\n'
    ...
    # feed & parse source code using lexer
    print parser.parse(src,lexer)

Here we added a print command to see that the parser can return values.

WARNING: Token 'STRING' defined, but not used
WARNING: Token 'EQ' defined, but not used
WARNING: Token 'REGISTER' defined, but not used
WARNING: There are 3 unused tokens
what to do:
do nothing
stop system
0000 <bound method VM.nop of <__main__.VM instance at 0x023E6918>> [0, 1, 2, 3, 4, 5, 6, 7]
0001 <bound method VM.bye of <__main__.VM instance at 0x023E6918>> [0, 1, 2, 3, 4, 5, 6, 7]
Now make compiler() emit the real code:

    def compiler(self,src):
        # ===== init code section =====
        # set instruction pointer entry point
        self.Ip = 0
        # clean up program memory
        self.program = []
        # ===== parser/compiler code section =====
        def p_program_epsilon(p):
            ' program : '
        def p_program_recursive(p):
            ' program : program command '
        def p_command_NOP(p):
            ' command : NOP '
            self.program.append(self.nop)
        def p_command_BYE(p):
            ' command : BYE '
            self.program.append(self.bye)
        # feed & parse source code using lexer
        parser.parse(src,lexer)

Now the compiler does not add any entry code, and the traced code is our program.

WARNING: Token 'STRING' defined, but not used
WARNING: Token 'EQ' defined, but not used
WARNING: Token 'REGISTER' defined, but not used
0000 <bound method VM.nop of <__main__.VM>> [0, .., 7]
0001 <bound method VM.bye of <__main__.VM>> [0, .., 7]

We have some warnings about terminals not used in our grammar; they are linked with the register load command we omitted. Let's add this command's grammar. First recover the full sample program:
    if __name__ == '__main__':
        VM(r'''    # use r''': we have escapes in the string constant
            R1 = 'R\t[1]'
            nop
            bye
        ''')

Remember the REGISTER, EQ and STRING tokens we have already defined in the lexer:
    def p_command_R_load(p):
        ' command : REGISTER EQ constant '
        # compile ld command opcode
        self.program.append(self.ld)
        # compile register number using the value
        # from terminal REGISTER at p[1] in the production
        self.program.append(p[1])
        # compile constant
        self.program.append(p[3])
    def p_constant_STRING(p):
        ' constant : STRING '
        p[0] = p[1]

We defined a constant nonterminal for later use: a constant can be not only a string, but also a number, or a pointer to any object.

0000 <unbound method VM.ld>  [0, 1, 2, 3, 4, 5, 6, 7]
0003 <unbound method VM.nop> [0, 'R\t[1]', 2, 3, 4, 5, 6, 7]
0004 <unbound method VM.bye> [0, 'R\t[1]', 2, 3, 4, 5, 6, 7]
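Syntax-directed translation in miniature: a hypothetical stdlib-only sketch where each matched "rule" immediately appends opcodes to the compile buffer (a crude line-splitting parser stands in for yacc; all names are made up for the example):

```python
# Opcode placeholders for the sketch.
def ld(vm): pass
def nop(vm): pass
def bye(vm): pass

CMDS = {'nop': nop, 'bye': bye}

def compile_line(line, program):
    words = line.split()
    if len(words) == 3 and words[1] == '=':      # command : REGISTER EQ constant
        # opcode, register number, constant -- three program cells
        program += [ld, int(words[0][1:]), words[2].strip("'")]
    elif len(words) == 1 and words[0] in CMDS:   # command : NOP | BYE
        program.append(CMDS[words[0]])
    elif words:
        raise SyntaxError(line)

program = []
for line in ["R1 = 'hello'", "nop", "bye"]:
    compile_line(line, program)
print([getattr(op, '__name__', op) for op in program])
# ['ld', 1, 'hello', 'nop', 'bye']
```

The key point is the same as in the PLY version: nothing is built up front; "compiling" is just appending to `program` as each rule fires.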
Returning to \F syntax: it is very simple.

We are not going to write yet another tutorial on \F here. It is better if you take a break now and look into the first chapters of a really cool book: Leo Brodie's Starting \F [starting]. You will be amazed by \F's miraculous simplicity, and scared by its dragon's den [dragon] of zen syntax, data stack bit-banging, and functional limitations (no float numbers, no files; do you want data structures? make arrays yourself!).
Also, we will sidestep the ANS Forth standard in some places; see the notes.

Let's start from the main \F word: REPL (Read-Eval-Print-Loop), implementing a command shell (still without compilation and error checking):

    : INTERPRET                 \ REPL interpreter loop
      begin
                                \ get next word name from input stream
        word    ( -- str:wordname )
                                \ find word entry point
        find    ( str:wordname -- addr:xt )
                                \ execute found xt (execution token)
        execute ( xt -- )
      again ;

Don't forget how we should pass this source for compilation and run: enclose it in VM(''' ... code ... ''') in the \py source.
\subsection{Expanding VM using inheritance}

The current parser will not process this source, as the syntax has notably changed. We will not break the VM demo parser; instead we use \emph{VM class inheritance to redefine the parser} for the special \F dialect, and add some extra commands to the original VM. When you are going to implement your own language on top of the VM, you should go this way too.

    # it should look like this:
    class FORTH(VM):
        t_ignore_COMMENT = r'\#.*|\\.*|\(.*?\)'   # comments

And here we have a problem, Houston! All elements of the parser are encapsulated in the compiler() method using closures, and oops, we must rework the VM class itself.

How we can do it, you can see in this short sample: a lexer-only variant of the simplest Hello World parser using inheritance:

    import sys ; sys.path.insert(0,'./ply')
    import ply.lex  as lex
    import ply.yacc as yacc
    # issue 142 fix: enable patches
    # required for parser class inheritance
    ply_class_inherit = True

    class VM:
        tokens = ['WORD']
        t_ignore = ' \t\r\n'
        def t_WORD(self,t):
            r'[a-z]+'
            print t
        def t_error(self,t): raise SyntaxError(t)
        def __init__(self,src=''):
            self.lexer = lex.lex(object=self)
            self.lexer.input(src)
            while self.lexer.token(): pass

    class FORTH(VM):
        t_ignore = ' \t\r'
        def t_newline(self,t):
            r'\n'
            t.lexer.lineno += 1

    FORTH(''' hello world ''')

I faced a problem in PLY: its lexer generator has trouble with t_error uniqueness validation if there is more than one t_error function (or method) defined in one module. So I made a tiny fix in ply/lex.py, see issue 142, but my pull request is still not merged, and you must clone the patched PLY right here:

    ~/DLR$ git clone -o ponyatov https://github.com/ponyatov/ply.git

To import the patched PLY we need to push it into the search path before /usr/lib:

    import sys ; sys.path.insert(0,'./ply')
    import ply.lex as lex ; ply_class_inherit = True   # use fix
Going to rewrite the VM, we need the t_rules as class-level static strings and methods, moved out of compiler():

    class VM:
        # ===== lexer code section =====
        t_ignore = ' \t\r'            # drop spaces (no EOL)
        t_ignore_COMMENT = r'\#.*'    # line comment
        # regexp/action rules (ANY state)
        def t_ANY_newline(self,t):    # special rule for EOL
            r'\n'
            t.lexer.lineno += 1       # increment line counter
            # do not return a token, it will be ignored by the parser
        # token types
        tokens = ['NOP','BYE','REGISTER','EQ','STRING']
        # required lexer error callback, must be a method
        def t_ANY_error(self,t): raise SyntaxError('lexer: %s' % t)

Expand the comment regexp and the token set in FORTH:
    class FORTH(VM):
        t_ignore_COMMENT = r'\#.*|\\.*|\(.*?\)'   # comments
        tokens = ['NOP','BYE','REGISTER','EQ','STRING',
                  'COLON','SEMICOLON','BEGIN','AGAIN']

Change the lexer creation:

    class VM:
        def compiler(self,src):
            ...
            # create lexer object
            # note: we point at the object (instance!) with the rules
            self.lexer = lex.lex(object=self)
            # feed source code
            self.lexer.input(src)

and the parser the same way:

    class VM:
        def compiler(self,src):
            # create ply.yacc object, without extra files
            parser = yacc.yacc(debug=False,write_tables=None,
                               module=self)   # here we must point at the instance
            # parse source code using the lexer
            parser.parse(src,self.lexer)

In \F syntax we have no ability to distinguish between VM commands defined in the vocabulary and yet undefined words using only regexps. So we need to group these tokens into one universal ID token, and do some lookup to detect which token type we have at every point.
    class FORTH(VM):
        tokens = ['ID','COLON','SEMICOLON','BEGIN','AGAIN']

We removed some token types, and PLY will give us a report of what's wrong. (PLY was developed by David M. Beazley specially for student learning purposes, so good verbosity is a key feature of this parser generator. As we are going to build a power metaprogramming system, which can be thought of as an \textit{interactive compiler \& algorithm development environment}, we should go this way too: in the parser generators and other tools we make later, visualization and verbosity are must-haves.)

ERROR: /home/ponyatov/DLR/VM.py:62 : Symbol 'NOP' used, but not defined as a token or a rule
ERROR: /home/ponyatov/DLR/VM.py:65 : Symbol 'BYE' used, but not defined as a token or a rule
ERROR: /home/ponyatov/DLR/VM.py:68 : Symbol 'REGISTER' used, but not defined as a token or a rule
ERROR: /home/ponyatov/DLR/VM.py:68 : Symbol 'EQ' used, but not defined as a token or a rule
ERROR: /home/ponyatov/DLR/VM.py:73 : Symbol 'STRING' used, but not defined as a token or a rule
WARNING: Token 'AGAIN' defined, but not used
WARNING: Token 'BEGIN' defined, but not used
WARNING: Token 'SEMICOLON' defined, but not used
WARNING: Token 'COLON' defined, but not used
WARNING: Token 'ID' defined, but not used
WARNING: There are 5 unused tokens
ERROR: Infinite recursion detected for symbol 'constant'
ERROR: Infinite recursion detected for symbol 'command'
ply.yacc.YaccError: Unable to build parser

Here we see source line numbers pointing directly at the problematic grammar rules, and a list of unused tokens. So we need to drop the corresponding rules from the inherited class. The second tiny patch I've done for PLY lets us mark parser rules as deleted using = None assignment:

    class FORTH(VM):
        t_NOP = p_NOP = None
        t_BYE = p_BYE = None
        t_REGISTER = p_Rload = None
        p_constant_STRING = None

ERROR: /home/ponyatov/DLR/VM.py:60: Symbol 'command' used, but not defined as a token or a rule

We must define which commands we are going to see in our programs:
    # grammar overrides
    def p_ID(self,p):        ' command : ID '
    def p_COLON(self,p):     ' command : COLON '
    def p_SEMICOLON(self,p): ' command : SEMICOLON '
    def p_BEGIN(self,p):     ' command : BEGIN '
    def p_AGAIN(self,p):     ' command : AGAIN '

    # lexeme regexp overrides
    t_COLON = ':' ; t_SEMICOLON = ';'
    def t_BEGIN(self,t):
        r'begin'
        return t
    def t_AGAIN(self,t):
        r'again'
        return t
    def t_ID(self,t):   # this rule must be the last rule
        r'[a-zA-Z0-9_]+'
        return t
    assert self.Ip < len(self.program)

\subsection{Token overwrite for commands and \F words}

Let's add this fix at the beginning of the source and run a dummy program as usual:

    FORTH(r'''   # use r''': we have escapes in string constants
        nop
        bye
    ...

The base VM compiler detects these commands directly in the parser syntax rules, and compiles them. In the new inherited FORTH machine we disabled these rules, and moved all word-looking char sequences into one ID token. In the sample source you can see that such an ID can be a new word name, a VM command, or a control structure statement (in \F all control structures must be implemented using generic \F-words; we'll see how to make your own controls later, but for now we will use a parser-based method to show how to implement them in your assembly or compiler). In the ID lex rule we can detect what this ID really is, and use \term{terminal overwrite}.
We have to add some hint letting us check what a given word name is:

    class FORTH(VM):
        # command lookup table: string -> method
        cmd = { 'nop':VM.nop, 'bye':VM.bye }

and a vocabulary:

    class FORTH(VM):
        # vocabulary of all defined words
        voc = {}

and use both in the t_ID lexer rule:

    # add extra types for tokens
    tokens = ['ID','CMD','VOC',
              'COLON','SEMICOLON','BEGIN','AGAIN']
    def t_ID(self,t):   # this rule must be the last rule
        r'[a-zA-Z0-9_]+'
        # first look up in the vocabulary
        if t.value in self.voc: t.type = 'VOC'
        # then check if it is a command name
        if t.value in self.cmd: t.type = 'CMD'
        return t

Note the checking order: this variant lets us define new words with names equal to VM commands, and later in the \term{\F-assembler} /forthasm/ it will let you compile or execute a single VM command in an interactive \F system. And define new special rules in the grammar:
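As an aside, the ID-classification trick can be tried standalone. A hypothetical sketch with made-up table contents:

```python
# Terminal overwrite: every word-looking lexeme starts as ID,
# then dictionary lookups upgrade its token type.
cmd = {'nop': 'VM.nop', 'bye': 'VM.bye'}   # command lookup table
voc = {'INTERPRET': 2}                      # vocabulary of defined words

def classify(word):
    t = 'ID'                     # default: unknown new word name
    if word in voc: t = 'VOC'    # first look in the vocabulary
    if word in cmd: t = 'CMD'    # then command names (win on clash)
    return t

print([classify(w) for w in ('nop', 'INTERPRET', 'NEWWORD')])
# ['CMD', 'VOC', 'ID']
```

Because the command check runs last, a word defined in both tables is reported as CMD, which matches the checking-order note above.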
    def p_CMD(self,p):
        ' command : CMD '
        # compile the command using the cmd{} lookup table
        self.program.append(self.cmd[p[1]])
    def p_VOC(self,p):
        ' command : VOC '

The dummy program compiles and executes (check: VM.interpreter() uses command(self)), but we need to run the last defined word as the program entry point. To do this we must implement all the parser/compiler stuff we made as stubs.

\subsection{Implement : COLON ; definition}
    class FORTH(VM):
        # vocabulary of all defined words
        voc = {}
        def p_COLON(self,p):
            ' command : COLON ID '
            # store current compilation pointer into voc
            self.voc[p[2]] = len(self.program)
            print self.voc
        def p_ID(self,p):
            ' command : ID '

{'INTERPRET': 2}
0000 <unbound method VM.nop> [0, 1, 2, 3, 4, 5, 6, 7]
0001 <unbound method VM.bye> [0, 1, 2, 3, 4, 5, 6, 7]

What did we do? On : INTERPRET we \emph{marked the current compiler position} in the vocabulary. The number 2 is the address (index, program[2]) of the command that will be compiled next.

    class FORTH(VM):
        # command lookup table
        cmd = { 'nop':VM.nop, 'bye':VM.bye, 'ret':VM.ret }
        def p_SEMICOLON(self,p):
            ' command : SEMICOLON '
            self.program.append(self.cmd['ret'])

    class VM:
        def dump(self):
            print
            for i in range(len(self.program)):
                print '%.4X:' % i, self.program[i]
            print
        def __init__(self, P=''):
            self.compiler(P)     # run parser/compiler
            self.dump()          # dump compiled program
            self.interpreter()   # run interpreter

We added ret command compilation, run on ; in the source code. If you look at the program dump, you'll see that the INTERPRET item in the vocabulary points at the address of the word's first command, which here was compiled by ; (our parser still ignores all code between : and ;):

{'INTERPRET': 2}
0000: <unbound method VM.nop>
0001: <unbound method VM.bye>
0002: <unbound method VM.ret>

\subsection{CALL and RET commands}

As you can see from [starting] and especially from [threaded], a \F program is a tangle of spaghetti \term{threaded code} (distinguish this term from the \term{thread} term in \term{multitasking}), where every word uses other words via nested \term{call}s. Every time a call occurs, the current execution (interpreter) pointer is pushed onto a special \term{return stack}, and execution jumps to the required word's code. When the called word finishes its work, the ret command pops the \term{return address} from the stack and returns execution to the command after the call that was used (there is a variant of pure threaded code you can see in [thbell]: only addresses, without a call opcode, and a special address for ret).

    class VM:
        D = []   # data stack
        R = []   # CALL/RET return stack
        def call(self):
            # push return address (Ip points to the call parameter)
            self.R.append(self.Ip+1)
            self.Ip = self.program[self.Ip]   # jmp
        def ret(self):
            assert self.R                     # check non-empty
            self.Ip = self.R.pop()            # return to marked address

    class FORTH(VM):
        # command lookup table
        cmd = { 'nop':VM.nop, 'bye':VM.bye,
                'call':VM.call, 'ret':VM.ret }
        def p_VOC(self,p):
            ' command : VOC '
            self.program.append(VM.call)          # opcode
            self.program.append(self.voc[p[1]])   # cfa

Now we can implement and check calls for colon-defined words:
    : NOOP ;
    : INTERPRET     \ REPL interpreter loop
      NOOP
      ...

{'NOOP': 2, 'INTERPRET': 3}
0000: <unbound method VM.nop>
0001: <unbound method VM.bye>
NOOP:
0002: <unbound method VM.ret>
INTERPRET:
0003: <unbound method VM.call> 0002 : NOOP
0005: <unbound method VM.ret>

To execute this we must add some code; let us use the last defined word as the program entry point. To do this, we replace the first command with a call _entry:

    FORTH(r'''   # use r''': we have escapes in string constants
    \ test FORTH comment syntax for the inherited parser
    : NOOP ;
    : INTERPRET     \ REPL interpreter loop
      NOOP
    ''')

    class VM:
        def init_code(self):
            # set instruction pointer entry point
            self.Ip = 0
            # clean up program memory
            self.program = []
        def compiler(self,src):
            # ===== init code section =====
            self.init_code()

    class FORTH(VM):
        def init_code(self):
            VM.init_code(self)
            self.program.append(VM.call)   # call ...
            self.program.append(0)         # _entry = 1
            self.program.append(VM.bye)    # bye
        def p_COLON(self,p):
            ' command : COLON ID '
            # store current compilation pointer into voc
            # reset _entry to current cfa
            self.program[1] = self.voc[p[2]] = len(self.program)
            print self.voc

\subsection{Change tracing log output}

I have a strange error in the ret command in NOOP, so we need to change the tracing log generation: it must include both stacks, and have no info on registers, as \F doesn't use them.
.Making program dump we need fast conversion from current address into record i nvocabulary. So we must use reversed vocabulary to do reverse lookup fro maddr to associated label (word name, variable,\ldots) .Also we need to change logging in interpreter: we don't need registers, bu tneed stacks to be printed on every command (if trace enabled) .class VM : # return known label for given addres s def log_label(self,addr): return ' ' # dump command from given add r def log_command(self,addr) : A = add r # current comman d command = self.program[A ] # check if we have known label s L = self.log_label(A ) if L: print '\ n # print main command log tex t print '\n\ t # process commands with parameter s if command == VM.call : # target addres s T = self.program[A+1 ] # print target addr with known labe l print T,self.log_label(T) , A += 1 return A+1 # return next command addres s # log extra state (in interpreter trace ) def log_state(self) : print self.Rg,Note that
log_command
returns the address of the next command --- this is required as we have commands of variable length. The interpreter doesn't use the returned addr, as stepping is done inside the commands themselves, but in the
dump
method every next dumped addr must be computed by
log_command
.
    def interpreter(self):
        self._bye = False                   # stop flag
        while not self._bye :
            assert self.Ip < len(self.program)
            command = self.program[self.Ip] # FETCH command
            self.log_command(self.Ip)       # log command
            self.log_state()                # log state
            self.Ip += 1                    # to next command
            command(self)                   # DECODE/EXECUTE
    def dump(self):
        addr = 0
        # loop over program
        while addr < len(self.program):
            # addr: command <extra>
            addr = self.log_command(addr)
        print ; print '-'*55
class FORTH(VM) :
    # vocabulary of all defined words
    voc = {}
    # reversed vocabulary {addr:name} for fast label lookup
    revoc = {}
    def log_state(self):
        print 'R:%s' % self.R,
    def p_COLON(self,p):
        ' command : COLON ID '
        # store current compilation pointer into voc
        # reset _entry to current cfa
        self.program[1] = self.voc[p[2]] = len(self.program)
        # add reversed pair {addr:label}
        self.revoc[len(self.program)] = p[2]
        print self.voc
{'NOOP': 3, 'INTERPRET': 4}
0000 <unbound method VM.call> 4 INTERPRET
0002 <unbound method VM.bye>
NOOP:
0003 <unbound method VM.ret>
INTERPRET:
0004 <unbound method VM.call> 3 NOOP
0006 <unbound method VM.ret>
-------------------------------------------------------
0000 <unbound method VM.call> 4 INTERPRET R:[]
INTERPRET:
0004 <unbound method VM.call> 3 NOOP R:[2]
NOOP:
0003 <unbound method VM.ret> R:[2, 6]
0006 <unbound method VM.ret> R:[2]
0002 <unbound method VM.bye> R:[]
This trace log looks much better.
Also we can add an extra check for the call command: check that the parameter after the call opcode is an integer address
:
    def call(self):
        # push return address (Ip points to the call parameter)
        self.R.append(self.Ip+1)
        self.Ip = self.program[self.Ip]  # jmp
        assert type(self.Ip) == int      # addr must be integer
\subsection{Throw out registers}
\F is a pure stack language, so we throw the registers out\note{until we want to play with a self-made JIT}.
class FORTH(VM) :
    Rg = ld = None  # throw out registers
\subsection{Vocabulary structure}
You can see a strange
cfa
label in the listings above. In \F there is a set of fields every word in the vocabulary contains; only the
IMMED
flag is strictly required (it marks words executed in compilation mode).
The
JMP
command is equal to
goto
in a high-level language.
class VM :
    def jmp(self):
        # get addr from jmp parameter
        self.Ip = self.program[self.Ip]
        # check type: addr must be integer
        assert type(self.Ip) == int
        # check range
        assert self.Ip < len(self.program)
    def log_command(self,addr):
        ...
        if command in [ VM.jmp, VM.call ]:
            print T,self.log_label(T),
class FORTH(VM) :
    cmd = { 'jmp':VM.jmp, 'call':VM.call, 'ret':VM.ret,..
}
    def p_BEGIN(self,p):
        ' command : BEGIN '
        # mark Ip, pushing it on the return stack
        self.R.append(len(self.program))
    def p_AGAIN(self,p):
        ' command : AGAIN '
        # jmp opcode
        self.program.append(self.cmd['jmp'])
        # jmp parameter: pop the marked Ip
        self.program.append(self.R.pop())
INTERPRET:
0006 <unbound method VM.call> 5 NOOP
0008 <function word at 0x018A51B0>
0009 <function find at 0x018A51F0>
000A <unbound method VM.jmp> 8
000C <unbound method VM.ret>
(*)
word
and
find
will be defined later.
As you see in the listing, there is one
jmp
command, compiled by
again
at address 0x000A; it jumps to 0x0008, marked by
begin
between NOOP and word in the source code
.
\subsection{Parsing input stream and search in vocabulary}
class FORTH(VM) :
    PAD = list('hello world ')
    def word(self):
        # result string
        S = ''
        # spaces
        BL = ' \t\r\n'
        # skip leading spaces
        while True :
            # pop first char from input stream
            C = self.PAD.pop(0)
            # break loop on non-space char
            if C not in BL: break
        # save first found non-BL char
        S += C
        # collect until BL char
        while True :
            C = self.PAD.pop(0)
            if C in BL: break
            S += C
        # push collected string
        self.D.append(S)
    def find(self):
        # pop word name from data stack
        S = self.D.pop()
        try :
            # push cfa of found word
            self.D.append(self.voc[S])
            # push FOUND flag
            self.D.append(True)
        except KeyError :
            # on error push word name back
            self.D.append(S)
            # push NOT FOUND flag
            self.D.append(False)
    cmd = { 'word':word, 'find':find, ...}
You can fix the parser to raise an error on undefined commands/words:
    def p_ID(self,p):
        ' command : ID '
#        raise BaseException(p[1])
If we change the source
:
: hello ;
: NOOP ;
: INTERPRET \ REPL interpreter loop
  NOOP
  begin word find again
;
we'll get a trace with found
hello
and undefined
world
:
0007 <function word at 0x018F51B0> D:[] R:[2]
0008 <function find at 0x018F51F0> D:['hello'] R:[2]
0009 <unbound method VM.jmp> 7 D:[3, True] R:[2]
0007 <function word at 0x018F51B0> D:[3, True] R:[2]
0008 <function find at 0x018F51F0> D:[3, True, 'world'] R:[2]
0009 <unbound method VM.jmp> 7 D:[3, True, 'world', False] R:[2]
0007 <function word at 0x018F51B0> D:[3, True, 'world', False] R:[2]
\subsection{IF/ELSE/ENDIF and EXECUTE/ABORT
}
: INTERPRET \ REPL interpreter loop
  begin
    \ get next word name from input stream
    word  ( -- str:wordname )
    \ find word entry point
    find  ( addr:cfa true | str:wordname false )
    if    ( addr:cfa )
      \ call to addr from stack
      execute ( addr:cfa -- )
    else  ( str:wordname )
      \ dump state, stacks and restart
      abort
    endif
  again ;
: hello ;
For a defined word, the
execute
command works like
call
but takes the address from the data stack:
class VM :
    def execute(self):
        # push ret addr
        self.R.append(self.Ip)
        # load jmp addr from stack
        assert self.D ; self.Ip = self.D.pop()
        # addr must be integer
        assert type(self.Ip) == int
        # check range
        assert self.Ip < len(self.program)
class FORTH(VM) :
    cmd = { 'execute':VM.execute,..}
abort
must restart the REPL loop, but for now we stop the whole system with a state dump:
class VM :
    def abort(self):
        print '\n\nD:%s\nR:%s' % (self.D,self.R)
        self.dump()
        raise BaseException('ABORT')
class FORTH(VM) :
    cmd = { 'abort':VM.abort,..}
D:[3, 'world']
R:[2]
{'INTERPRET': 3}
0000 <unbound method VM.call> 3 INTERPRET
0002 <unbound method VM.bye>
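The execute-from-stack mechanism above can be shown in a small runnable sketch (Python 3; a toy VM with illustrative names, not the book's exact class): a `lit` pushes the word's address, `execute` calls it, `ret` returns, `bye` stops.

```python
# Sketch: 'execute' works like 'call' but takes its target address
# from the data stack instead of from program memory.
def lit(vm):                    # push inline constant from program memory
    vm.D.append(vm.program[vm.Ip]); vm.Ip += 1

def execute(vm):
    vm.R.append(vm.Ip)          # push return address
    vm.Ip = vm.D.pop()          # jump to the address popped from data stack

def ret(vm): vm.Ip = vm.R.pop()
def bye(vm): vm.stop = True

class VM:
    def __init__(self, program):
        self.program, self.Ip = program, 0
        self.D, self.R, self.stop = [], [], False
        while not self.stop:
            cmd = self.program[self.Ip]; self.Ip += 1
            cmd(self)

# image: lit 4 execute bye | word body at addr 4: ret
vm = VM([lit, 4, execute, bye, ret])
print(vm.D, vm.R)               # [] []  -- both stacks clean after return
```

Because `execute` saves the return address exactly like `call`, a `ret` inside the executed word lands back on the `bye` that follows.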
if
/
else
is complex to understand: it uses forward jumps and \term{backpatching} with one-pass compiling. The
?jmp
command pops a flag from the data stack and jumps on false.
if
compiles
?jmp
with a fake parameter, and \term{marks} its parameter addr on the control\note{don't forget we use the return stack here, but it can be a separate control stack} stack. Later this addr will be popped by
else
/
endif
and \term{backpatched} with the real address to be jumped to when false is on the data stack.
else
compiles a
jmp
to
endif
, replacing the if addr on the control stack by the jmp addr.
class VM :
    def qjmp(self):
        assert self.D
        if not self.D.pop():  # ( bool:FALSE! -- )
            self.jmp()
        else :
            self.Ip += 1      # skip ?jmp parameter
    def log_command(self,addr):
        ...
        if command in [ VM.jmp, VM.qjmp, VM.call ]:
            ...
class FORTH(VM) :
    cmd = { 'jmp':VM.jmp, '?jmp':VM.qjmp,..
}
class FORTH(VM) :
    # lexer
    tokens = [..,'IF','ELSE','ENDIF']
    def t_IF(self,t):
        r'if'
        return t
    def t_ELSE(self,t):
        r'else'
        return t
    def t_ENDIF(self,t):
        r'endif'
        return t
    # parser
    def p_IF(self,p):
        ' command : IF '
        self.program.append(self.cmd['?jmp']) # opcode
        self.R.append(len(self.program))      # mark
        self.program.append(-1)               # fake addr
    def p_ELSE(self,p):
        ' command : ELSE '
        assert self.R ; A = self.R.pop()      # pop if ?jmp
        self.program.append(self.cmd['jmp'])  # jmp endif
        self.R.append(len(self.program))      # mark
        self.program.append(-1)               # fake addr
        self.program[A] = len(self.program)   # backpatch if
    def p_ENDIF(self,p):
        ' command : ENDIF '
        assert self.R ; A = self.R.pop()      # resolve
        self.program[A] = len(self.program)   # backpatch
#       self.program.append(self.cmd['nop']) # show endif pos
INTERPRET:
0004 <function word at 0x7f31eece71b8>
0005 <function find at 0x7f31eece7230>
0006 <unbound method VM.qjmp> 000B    # if
0008 <unbound method VM.execute>
0009 <unbound method VM.jmp> 000C     # else
000B <unbound method VM.abort>
000C <unbound method VM.nop>          # endif
000D <unbound method VM.jmp> 0004 INTERPRET
000F <unbound method VM.ret>
A special
nop
compilation was added to
p_ENDIF
to show the position where
endif
takes an action. In real code this position will be used by the first command compiled after endif.
: hello if bye endif ;
hello:
0003 <unbound method VM.qjmp> 0006   if
0005 <unbound method VM.bye>
0006 <unbound method VM.ret>         endif
This sample shows the if/endif construct without else. And a more complex nested sample
: hello if(1) nop else(1) if(2) bye endif(2) endif(1) ;
0003 <unbound method VM.qjmp> 0008   if(1)
0005 <unbound method VM.nop>
0006 <unbound method VM.jmp> 000B    else(1)
0008 <unbound method VM.qjmp> 000B   if(2)
000A <unbound method VM.bye>         endif(2)
000B <unbound method VM.ret>         endif(1)
\subsection{Implementing in-system \F compiler}
The key feature of a \F system is self-extensibility: you can expand the system at runtime by compiling new word definitions. The internal compiler is as simple as the \F syntax itself.
A \F system can work in two states: every found word
(
STATE=0
) will be immediately executed\note{default state when you start the system}, or a
call
to it will be compiled to the end of the vocabulary. The mode is selected by the
STATE
\term{variable}: you just change STATE, and the system changes mode.
false var STATE \ interpret =0 / compile
: INTERPRET \ REPL interpreter/compiler loop
  begin word find
    if ( addr:cfa )            \ word found
      STATE @ ( bool )         \ compile =true / interpret =false
      if   ( addr:cfa )        \ compile call to word
        compile
      else ( addr:cfa )        \ execute call to addr from stack
        execute
      endif
    else abort endif           \ word not found
  again ;
\subsection{Bool constants: TRUE/FALSE
}
class FORTH(VM) :
    tokens = [..,'TRUE','FALSE']
    # lexer: will be recognized as syntax elements
    def t_TRUE(self,t):
        r'true'
        return t
    def t_FALSE(self,t):
        r'false'
        return t
    # parser: allowed in variable declarations (see later)
    def p_VAR(self,p):
        ' command : init VAR ID '
    def p_init_bool(self,p):
        ' init : bool '
        p[0] = p[1]
    def p_bool_true(self,p):
        ' bool : TRUE '
        p[0] = True
    def p_bool_false(self,p):
        ' bool : FALSE '
        p[0] = False
\subsection{Variables, memory r/w, LIT and literals}
false var STATE \ interpret =0 / compile
: INTERPRET \ REPL interpreter/compiler loop
  ...
  STATE @ ( bool ) \ compile =true / interpret =false
  ...
Variables in classic \F reside in main memory. In C and UNIX, programs\note{and DOS programs with \term{small model} and larger} traditionally use separate memory segments for code
.text, initialized .data, .stack, and the uninitialized heap .bss. We can go the same separation way to increase stability, but using the same commands to read and write both variables and program memory can be very interesting. So let's use a \term{flat memory model} for data and code, just as a sample.
class VM :
    def fetch(self):                 # @ : ( addr -- value )
        assert self.D ; addr = self.D.pop()
        assert addr < len(self.program)
        self.D.append(self.program[addr])
    def store(self):                 # ! : ( value addr -- )
        assert self.D ; addr = self.D.pop()
        assert addr < len(self.program)
        self.program[addr] = self.D.pop()
The
LIT
command is required to push constants as-is onto the data stack:
    def lit(self):
        self.D.append(self.program[self.Ip])
        self.Ip += 1
    def log_command(self,addr):
        ...
        if command in [ VM.jmp, VM.qjmp, VM.call, VM.lit]:
            ...
class FORTH(VM) :
    cmd = { .., '@':VM.fetch, '!':VM.store, 'lit':VM.lit}
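The flat-memory triple lit/@/! can be exercised with a minimal runnable sketch (Python 3; a reduced toy class, not the book's full VM). A variable is just a cell in the same `program` list the code lives in.

```python
# Sketch: @ and ! read/write cells of the same 'program' memory the code
# occupies (flat memory model); lit pushes an inline constant.
class VM:
    def __init__(self):
        self.program, self.Ip, self.D = [], 0, []
    def lit(self):
        self.D.append(self.program[self.Ip]); self.Ip += 1
    def fetch(self):                 # @ : ( addr -- value )
        addr = self.D.pop()
        self.D.append(self.program[addr])
    def store(self):                 # ! : ( value addr -- )
        addr = self.D.pop()
        self.program[addr] = self.D.pop()

vm = VM()
# pretend-image: [lit-op, 4, ret-op, unused, PFA]; cell 4 holds the value 100
vm.program = ['lit-op', 4, 'ret-op', 'unused', 100]
vm.Ip = 1
vm.lit()               # pushes 4, the address of the variable cell
vm.fetch()             # @ reads program[4]
print(vm.D)            # [100]
vm.D = [42, 4]
vm.store()             # 4 ! writes program[4]
print(vm.program[4])   # 42
```

Reading and writing code memory with the same words is exactly what makes self-modifying \F-style compilation cheap.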
class FORTH(VM) :
    tokens = [..,'VAR','FETCH','STORE']
    def t_VAR(self,t):
        r'var'
        return t
    def t_FETCH(self,t):
        r'@'
        return t
    def t_STORE(self,t):
        r'!'
        return t
We can implement more complex number literals, including hex, binary and int/float numbers:
    tokens = [..,'NUM',..]
    def t_NUM(self,t):
        r'0x[0-9A-Fa-f]+ | 0b[01]+ | [\+\-]?[0-9]+(\.[0-9]*)?([eE][\+\-]?[0-9]+)?'
        if t.value[:2] == '0x' :
            t.value = int(t.value[2:],0x10)   # hex
        elif t.value[:2] == '0b' :
            t.value = int(t.value[2:],0x02)   # bin
        else :
            t.value = float(t.value)
        return t
    def t_ID(self,t): # this rule must be the last rule
        ...
class FORTH(VM) :
    def p_VAR(self,p):
        ' command : init VAR ID '
        addr = len(self.program)
        self.voc[p[3]] = addr ; self.revoc[addr] = p[3]
        # lit PFA
        self.program.append(self.cmd['lit'])      # lit ...
        self.program.append(len(self.program)+2)  # PFA
        # ret
        self.program.append(self.cmd['ret'])
        # compile PFA with init value
        self.program.append(p[1])
    def p_FETCH(self,p):
        ' command : FETCH '
        self.program.append(self.cmd['@'])
    def p_STORE(self,p):
        ' command : STORE '
        self.program.append(self.cmd['!'])
    # var/const init value can be checked by the syntax parser
    def p_init_NUM(self,p):
        ' init : NUM '
        p[0] = p[1]
    def p_init_bool(self,p):
        ' init : bool '
        p[0] = p[1]
    def p_bool_true(self,p):
        ' bool : TRUE '
        p[0] = True
    def p_bool_false(self,p):
        ' bool : FALSE '
        p[0] = False
\subsection{Mobile-targeted GUI}
You can make \F work in text console mode, but we must think mobile, so we jump-start from a GUI interface tuned for smartphone look\&feel. We must be ready to face a one-thumb interface with a vertical screen half covered by the Android keyboard, starting from the very scratch of Skynet development. You must be able to do something cool with one hand while doing Vrschikasana in a crowded bus.
For a fast start we will use the wxWidgets [zetwx] toolkit, but later we'll move to a native GUI to make the system much lighter and closer to the native host platform
.
import wx                     # import wxWidgets
wxapp  = wx.App()             # create wx GUI application
wxmain = wx.Frame(None)       # start simplest GUI widget
wxmain.Show()                 # set visible
wxapp.MainLoop()              # start wx GUI main loop
\fig{gui00.png}{height=0.65\textheight}
wxmain = wx.Frame(None,       # start simplest GUI widget
    title='GNU Dynamic (FORTH)')                 # window title
wxmain.SetIcon(wx.ArtProvider.GetIcon(wx.ART_INFORMATION))
Here we have a problem:
.MainLoop()
works in the foreground and locks everything. Later we must process GUI events independently from system work, in parallel. So we must use \py threading capabilities:
import threading              # GUI must be a separate thread
import wx                     # import wxWidgets
wxapp = wx.App()              # create wx GUI application
class MainWindow(wx.Frame):   # inherit GUI widget
    def __init__(self):
        # initialize superclass
        wx.Frame.__init__(self,None,
            title='GNU Dynamic (FORTH)')
        # set window icon
        self.SetIcon(wx.ArtProvider.GetIcon(
            wx.ART_INFORMATION))
        # show window
        self.Show()
def GUI():
    global wxmain ; wxmain = MainWindow()
    wxapp.MainLoop()
# create & start GUI thread
thread_GUI = threading.Thread(None,GUI)
thread_GUI.start()
import sys
thread_GUI.join()             # wait until GUI stops
sys.exit()
class Object :
    tag = 'obj'
    def __repr__(self): return '%s #%x' % (self.tag, id(self))
print Object()
obj #7f4dea6a5878
\fig{../tmp/sym_02.pdf}{width=0.8\textwidth}
The
__repr__()
method lets us get a dump of any object in human-readable form
pip install pytest
to do simple function-based unit testing. Go to \menu{Eclipse > Menu > Window > Preferences > PyDev > PyUnit > PyTest runner > delete --verbose parameter}. Configure \menu{Run as > Python unit-test} and press \keys{Ctrl-F11}.
import re                  # for tests
def test_hello(): pass     # use pytest
class Primitive(Object) :
    tag = 'prim'
def test_Primitive():
    assert re.match(r'prim #[0-9a-f]+',str(Primitive()))
class Symbol(Primitive) :
    tag = 'sym'
    def __init__(self, V): self.val = V
    def __repr__(self): return '%s:%s #%x' % (self.tag, self.val, id(self))
def test_Symbol():
    assert re.match(r'sym:xxx #[0-9a-f]+',str(Symbol('xxx')))
\subsection{Number}
\fig{../tmp/sym_05.pdf}{height=0.65\textheight
}
class Number(Primitive) :
    tag = 'num'
    def __init__(self, V): self.val = float(V)
    def __repr__(self): return '%s:%s #%x' % (self.tag, self.val, id(self))
def test_Number():
    assert re.match(r'num:1.2 #[0-9a-f]+',str(Number('1.2')))
As you see, the
__repr__
method repeats for every class inherited from Primitive, so we move it into the base class:
class Primitive(Object) :
    tag = 'prim'
    def __init__(self): self.val = ''
    def __repr__(self): return '%s:%s #%x' % (self.tag, self.val, id(self))
def test_Primitive():
    assert re.match(r'prim: #[0-9a-f]+',str(Primitive()))
import threading
wxapp = wx.App()                  # create wx GUI application
def startGUI():                   # wrap thread in function
    global wxmain                 # required for ScreenShot()
    wxmain = wx.Frame(None,-1,sys.argv[0])
    wxmain.Show()                 # set visible
    wxapp.MainLoop()              # start wx GUI main loop
thGUI = threading.Thread(None,startGUI)
thGUI.start()                     # start GUI thread
def ScreenShot(PNG):
    dc = wx.ScreenDC()
    X,Y,W,H = wxmain.GetRect()
    bmp = wx.EmptyBitmap(W,H)
    mdc = wx.MemoryDC(bmp)
    mdc.Blit(0,0,W,H,dc,X,Y)
    bmp.SaveFile(PNG,wx.BITMAP_TYPE_PNG)
# wait until GUI starts, and do screenshot
time.sleep(1) ; ScreenShot('sshot.png')
# do all non-GUI work here, as VM(program) run
thGUI.join()                      # wait until GUI stops
Tune the size to make the GUI look like a messenger in the bottom-right corner of the screen:
    # tune size and position to look like messenger
    X,Y,W,H = wx.ClientDisplayRect()
    # create main frame
    wxmain = wx.Frame(None,-1,sys.argv[0],
        size=(W/4,H/2),pos=(W-W/4,H-H/2))
# taskbar manager
class TaskBar(wx.TaskBarIcon):
    def __init__(self,frame):
        # init superclass
        wx.TaskBarIcon.__init__(self)
        # save frame we manage
        self.frame = frame
        # set taskbar icon
        self.icon = frame.icon
        self.SetIcon(self.icon,sys.argv[0])
        # bind open/focus on left click
        self.Bind(wx.EVT_TASKBAR_LEFT_DOWN,
            self.OnTaskBarLeftClick)
    def OnTaskBarActivate(self,event): pass
    def OnTaskBarClose(self,event):
        self.frame.Close()
    def OnTaskBarLeftClick(self,event):
        self.frame.Show()         # show frame
        self.frame.Restore()      # un(min|max)imize
        self.frame.Raise()        # make top level window
        self.frame.SetFocus()     # and set focus
and to be consistent, wrap the main frame into a class too:
class MainWindow(wx.Frame):       # main GUI window in class
    def __init__(self):
        # tune size and position to look like messenger
        X,Y,W,H = wx.ClientDisplayRect()
        # create main frame
        wx.Frame.__init__(self,None,-1,sys.argv[0],
            size=(W/4,H/2),pos=(W-W/4,H-H/2))
        # style tune
        self.icon = wx.ArtProvider.GetIcon(
            wx.ART_ADD_BOOKMARK)
        self.SetIcon(self.icon)
        self.SetBackgroundColour(wx.BLACK)
        # set visible
        self.Show()
        # make GUI taskbared
        self.tbIcon = TaskBar(self)
    def onClose(self,event):
        # required to remove taskbar icon
        self.tbIcon.RemoveIcon()
        self.tbIcon.Destroy()
        # destroy main window itself
        self.Destroy()
We still need a function to wrap the GUI in a separate thread, but it becomes very simple:
def startGUI():                   # wrap thread in function
    global wxmain                 # required for ScreenShot()
    wxmain = MainWindow()         # construct main window
    wxapp.MainLoop()              # start wx GUI main loop
thGUI = threading.Thread(None,startGUI)
thGUI.start()                     # start GUI thread
As an option, you can do screenshots:
class MainWindow(wx.Frame):       # main GUI window in class
    def shot(self,PNG='sshot.png'):  # do screen shot
        dc = wx.ScreenDC()
        X,Y,W,H = self.GetRect()
        bmp = wx.EmptyBitmap(W,H)
        mdc = wx.MemoryDC(bmp)
        mdc.Blit(0,0,W,H,dc,X,Y)
        bmp.SaveFile(PNG,wx.BITMAP_TYPE_PNG)
time.sleep(1) ; wxmain.shot()
argv[]
into one string and
interpret it as a script.
You can download the full source code from a separate github repo http://github.com/ponyatov/calc, and a \href{http://github.com/ponyatov/calc/releases/latest}{prebuilt windows binary} for a first try.
Nowadays you use a huge IDE for software development, but I prefer a more light, portable and easy way: I use the (g)vim /vim/ text editor and a Makefile\note{and Eclipse for more complex cases}. Vim has strange key bindings, and can be somewhat cryptic for a newbie, but it is very light on resources and has useful syntax-coloring customization, described in detail in vim syntax coloring /vimcolor/
.
\begin{tabular}{l l l}
src.src      & script  & sample source code \\
log.log      &         & execution log \\
ypp.ypp      & yacc    & syntax parser \\
lpp.lpp      & lex     & lexer using regexps \\
hpp.hpp      & \cpp    & headers \\
cpp.cpp      & \cpp    & runtime system we are going to implement \\
Makefile     & make    & project build script \\
rc.rc        & linux   & (g)vim start helper \\
bat.bat      & windows & (g)vim start helper \\
.gitignore   & git     & ignored file masks \\
ftdetect.vim & vim     & file type detection \\
syntax.vim   & vim     & syntax coloring /vimcolor/ for custom script \\
\end{tabular}
#!/bin/sh
gvim -p src.src log.log ypp.ypp lpp.lpp hpp.hpp cpp.cpp Makefile
\subsection{Makefile: build script}
For a project build, you need to track file interdependencies and do some actions only on changed files. So we describe our dependency/action rules in a tiny Makefile, and run the make tool by hotkey in the editor every time we need to compile or run the project. Consult Addendum: GNU Make /make/ for details; here we see only a tiny Makefile snippet
log.log: src.src ./exe.exe
	./exe.exe < $< > $@ && tail $(TAIL) $@

C = cpp.cpp ypp.tab.cpp lex.yy.c
H = hpp.hpp ypp.tab.hpp lex.yy.h
L = -lreadline

./exe.exe: $(C) $(H)
	$(CXX) $(CXXFLAGS) -o $@ $(C) $(L)

ypp.tab.cpp ypp.tab.hpp: ypp.ypp
	bison $<
lex.yy.c lex.yy.h: lpp.lpp
	flex $<
#include
:
#include "hpp.hpp"
The lexer file must end with an empty line; don't forget to place an EOL after the last string.
#include "hpp.hpp"
Another required section is rules, but for the first time it can be empty:
Now you can run flex, and get the resulting generated lexer source in the lex.yy.c file:
$ flex lpp.lpp && ls -la lex*
-rw-r--r-- 1 ponyatov ponyatov 43935 nov 20 19:43 lex.yy.c
And we have a problem: there is no
lex.yy.h
header file containing the
yylex()
and
yy_scan_string()
function declarations we require to parse every string entered in the interactive command line\note{using the readline library}. To fix it, we must add the option that will create the header file for us
CLANG
o
dragonegg
r import llvm ; llvm.test()}Tiny executable module does nothing, done usin this manualg
import llvm.core # core library import llvm.ee # execution engine (JIT/interpreter) module = llvm.core.Module.new('null_module') ; print module # JIT compiler / interpreter engine = llvm.ee.ExecutionEngine.new(module) ; print engine
.
def person():
    yield "Chelsea"
    yield "Hillary"
    yield "Bill"
for p in person(): print p
Here we define the
person()
\term{generator function}, which yields a few persons in a loop, returning an \term{iterator object} multiple times. Instead of returning values from the iterator, we will use Prolog variables, able to be bound/unbound to values in the process of \term{inference}. The return value of yield does not mean anything, so we just yield.
class var: pass  # empty container class
def person(V):
    V.value = 'Chelsea' ; yield
    V.value = 'Hillary' ; yield
    V.value = 'Bill'    ; yield
var = var()
for p in person(var): print var.value
\subsection{Variable unification}
In computer science, \term{unification} is the process of solving logic equations in the form of symbolic expressions. For example, $cons(X,cons(X,nil)) = cons(2,Y)$, where $cons(X,Y)$ is the Lisp constructor creating the pair $(X.Y)$, is a \term{first-order unification} problem that has the single solution $\{X\rightarrow 2, Y\rightarrow cons(2,nil)\}$, which can be \term{substituted} into the equation giving true. If variables are allowed to be substituted with functions/terms like
cons
, the process is called \term{high-order unification}\note{in high-order logic}.
In fact, in Prolog, DataLog and other \term{inference systems} there are \emph{no functions}; instead, \textbf{\term{terms} define \term{relations} between data elements}.
For example, some RDBMS table scheme can be described in Prolog in the form of a \term{predicate}: table( user(index(uid),login,password,name,homedir) ).
To implement unification we need control over \term{variable binding}: tracking value assignment (bound variable) and variable unbinding with value unassignment.
class var :
    def __init__(self):
        self.bound = False           # unbound variable
    def __lshift__(self,val):        # assign operator <<
        self.bound = True
        self.value = val
def person(V):
    V << 'Chelsea' ; yield
    V << 'Hillary' ; yield
    V << 'Bill'    ; yield
V = var()
for p in person(V): print V.bound,V.value
class var :
    def __repr__(self):              # dump in string form
        if self.bound : return '<bound:%s>' % self.value
        else          : return '<unbound>'
V = var() ; print V
for p in person(V): print V
<unbound>
<bound:Chelsea>
<bound:Hillary>
<bound:Bill>
We need a fix to do full unification: a variable can be unified \emph{with any value} if it is not bound, but only with the same value if it is bound
:
class var:                           # Prolog unifying variable
    def __init__(self):
        self.bound = False           # unbound variable
    def __lshift__(self,val):        # unify operator <<
        if not self.bound:           # 1) unassigned
            self.value = val         # assign var
            self.bound = True
            yield
            self.bound = False       # drop binding
        elif self.value == val:      # 2) assigned == val
            yield
    def __repr__(self):              # dump in string form
        if self.bound : return '<bound:%s>' % self.value
        else          : return '<unbound>'
def person(V):
    for i in V << 'Chelsea' : yield
    for i in V << 'Hillary' : yield
    for i in V << 'Bill'    : yield
The equivalent Prolog program is:
person(P) :- P = 'Chelsea' ; P = 'Hillary' ; P = 'Bill' .
?- person(X), write(X), nl.
For each value in
person()
we run unification twice in a loop:
for i in var << 'Hillary' :
    for j in person(var) :
        print var
# <bound:Hillary>
for i in var << 'Buddy' :
    for j in person(var) :
        print var
# (nothing was unified)
First we bind the variable to the predefined
'Hillary'
value, and then call person()
unification. As the variable is already bound, unification will succeed only for the bound value. With 'Buddy'
we never yield
.\subsection{General unification
}
In the previous section we had to bind the variable to a predefined value before
person()
unification. We had to do this because person()
takes an argument of type
var
. To call person()
with any type, we can use these general functions
:
def getval(V):                       # return value or unbound var
    if isinstance(V,var) :
        if not V.bound: return V
        else: return V.value
    else: return V
def unify(arg1,arg2):                # general unification
    arg1value = getval(arg1)         # get values
    arg2value = getval(arg2)         #
    if isinstance(arg1value,var):    # arg1 is variable
        for i in arg1value << arg2value: yield
    if isinstance(arg2value,var):    # arg2 is variable
        for j in arg2value << arg1value: yield
    else:                            # args are non-vars
        if arg1value == arg2value: yield
def person(V):
    for i in unify(V,'Chelsea'): yield
    for i in unify(V,'Hillary'): yield
    for i in unify(V,'Bill')   : yield
So we can unify not only
var
instances, but any types, and this makes our code simpler:
for i in person('Hillary'): print 'Hillary'
for j in person('Buddy')  : print 'Buddy'
person()
takes an object of any type as an argument, and can do unification using the general
unify()
function, without use of any extra variables.
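The generator-based unification scheme can be condensed into a runnable Python 3 sketch (illustrative: the book's `<<` operator is replaced here by a named `unify` method, and names are not the book's exact ones):

```python
# Sketch of backtracking unification via generators: binding a free
# variable yields once, then undoes the binding when the caller resumes.
class var:
    def __init__(self): self.bound = False
    def unify(self, val):
        if not self.bound:               # free: bind, yield, then unbind
            self.value, self.bound = val, True
            yield
            self.bound = False           # undo binding on backtracking
        elif self.value == val:          # bound: succeed only on same value
            yield

def getval(V):
    if isinstance(V, var) and V.bound: return V.value
    return V

def unify(a, b):                         # general unification
    av, bv = getval(a), getval(b)
    if isinstance(av, var):
        yield from av.unify(bv)
    elif isinstance(bv, var):
        yield from bv.unify(av)
    elif av == bv:
        yield

def person(V):
    for _ in unify(V, 'Chelsea'): yield
    for _ in unify(V, 'Hillary'): yield
    for _ in unify(V, 'Bill'):    yield

hits_bound   = sum(1 for _ in person('Hillary'))  # matching: 1 solution
hits_unknown = sum(1 for _ in person('Buddy'))    # no match: 0 solutions
X = var()
hits_free    = sum(1 for _ in person(X))          # enumeration: 3 solutions
print(hits_bound, hits_unknown, hits_free)        # 1 0 3
```

The same `person` relation thus acts both as a generator of all persons (free argument) and as a membership test (bound or plain value).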
\subsection{Unify as relation
}
As unify can both bind and check values, it implements a \term{logical relation}. Unification lets us bring logic-language capabilities into imperative languages like \py or any other language implemented on top of the GNU Dynamic environment. The iterator loop defined by
person(Who)
can bind
Who
to all known persons, or in other terms, to all entities described by the 1-ary
person/1
\term{relation}. If
Who
was bound at the start of unification, the loop will do \term{relation matching} with the selected value. Later we'll see more complex cases where unification lets us do \term{structural matching} for recursive data structures, including parsing of languages with complex ambiguous grammars, and \term{graph querying}.
As a sample, let's see a new relation
brother/2
between some persons:
def brother(Person,Brother):
    for i in unify(Person,'Hillary'):
        for j in unify(Brother,'Tony'): yield
        for j in unify(Brother,'Hugh'): yield
    for i in unify(Person,'Bill'):
        for j in unify(Brother,'Roger'): yield
V = var()
for i in brother('Hillary',V): print 'Hillary has brother',V.value
Hillary has brother Tony
Hillary has brother Hugh
And we can find all entities with the
brother/2
relation defined:
A,B = var(),var()
for j in brother(A,B): print B,'is brother of',A
<bound:Tony> is brother of <bound:Hillary>
<bound:Hugh> is brother of <bound:Hillary>
<bound:Roger> is brother of <bound:Bill>
\subsection{Joining relations}
We have the
brother
relation. If we add a
parent
relation for some persons, we can \term{infer} which persons have an uncle: a
person
has a parent
, and that parent
has a brother
.
def parent(Person,Parent):
    for i in unify(Person,'Chelsea'):
        for j in unify(Parent,'Hillary'): yield
    for i in unify(Person,'Chelsea'):
        for j in unify(Parent,'Bill'): yield
def uncle(Person,Uncle):
    Parent = var()
    for i in parent(Person,Parent):
        for j in brother(Parent,Uncle): yield
Person,Uncle = var(),var()
for i in uncle(Person,Uncle):
    print Person.value,'has uncle',Uncle.value
Chelsea has uncle Tony
Chelsea has uncle Hugh
Chelsea has uncle Roger
This is classical inference in Prolog:
brother(Person, Brother) :-
    Person = 'Hillary', (Brother = 'Tony' ; Brother = 'Hugh') ;
    Person = 'Bill',    Brother = 'Roger' .
parent(Person, Parent) :-
    Person = 'Chelsea', Parent = 'Hillary' ;
    Person = 'Chelsea', Parent = 'Bill' .
uncle(Person, Uncle) :-
    parent(Person, Parent), brother(Parent, Uncle) .
:- uncle(Person, Uncle),
   write(Person), write(' has uncle '), write(Uncle), nl.
\subsection{Variable chains}
This sample works fine
:
def square(Width,Height):            # square rectangle
    for i in unify(Width,Height): yield
for i in square(11,11):              # check 11 x 11
    print '11 x 11 is square'
W,H = var(),var()
for j in W << 22 :
    for k in square(W,H) :
        print W,H                    # yields 22 x 22
But how can we implement this case:
\begin{enumerate}[nosep]
\item square first (giving bound Width=Height)
\item unify(Width,33)
\end{enumerate}
W,H = var(),var()
for a in square(W,H):                # bind first
    for b in unify(W,33):            # unify with value
        print W, H
# <bound:<bound:33>> <bound:33>
# <bound:33> <bound:<bound:33>>
To do this, we must allow unify to bind two or more unbound variables into a \term{variable chain}, and then bind all of them to the same value
.
class var:                           # Prolog unifying variable
    def __init__(self,name):
        self.bound = False           # unbound variable
        self.name  = name
    def __repr__(self):              # dump in string form
        if self.bound : return '<%s:%s>' % (self.name,self.value)
        else          : return '<%s:unbound>' % self.name
# names for variables must be assigned
W,H = var('W'),var('H')
class var :
    def getval(self):
        if not self.bound:           # return unbound as is
            return self
        V = self.value               # else do loop until
        while isinstance(V, var):    # non-variable found
            if not V.bound: return V # or unbound var
            V = V.value              # scan var chain
        return V
def getval(V):                       # return value or unbound var
    if isinstance(V,var) :
        return V.getval()
    else: return V
class var :
    def __lshift__(self,val):        # unify operator <<
        if not self.bound:           # 1) unassigned
            self.value = getval(val) # get var
            if self.value == self: yield # can be itself
            else :
                self.bound = True    # bound
                yield
                self.bound = False   # drop binding
        else :
            for i in unify(self,val): yield
def unify(arg1,arg2):                # general unification
    arg1value = getval(arg1)         # get values
    arg2value = getval(arg2)         #
    A = B = False
    if isinstance(arg1value,var):    # arg1 is variable
        A = True
        for i in arg1value << arg2value: yield
    if isinstance(arg2value,var):    # arg2 is variable
        B = True
        for j in arg2value << arg1value: yield
    if not (A and B):                # args are non-vars
        if arg1value == arg2value: yield
for a in square(W,H): print W, H
# returns two variants of unknown variables: W->H and H->W
# W=H H
# W H=W
for a in square(W,H):                # bind first
    for b in unify(W,33):            # unify with value
        print W, H
# SWI Prolog returns only one: H = W, W = 33, but we get
# two possible variants as 33->W=H chains to 33->H, H->W
# W=H=33 H=33
# W=33 H=W=33
\subsection{Cutting inference}
If you need only one solution, you can cut the inference and save computational time
.
def anyBrother(Person,Brother):
    for i in brother(Person,Brother):
        yield ; break                # will find only 0 or 1 solution
B = var('brother')
for j in anyBrother('Hillary',B): print B
# brother=Tony
Prolog syntax:
anyBrother(Person, Brother) :- brother(Person, Brother), !.
We have a problem with cut and loop breaking: as you remember, any variable must be unbound after unification. If we just break the loop, we fail here. To fix this problem, we can use
try/finally
, which will unroll the binding:
class var:                           # Prolog unifying variable
    def __lshift__(self,val):        # unify operator <<
        if not self.bound:           # 1) unassigned
            self.value = getval(val) # get var
            if self.value == self: yield # can be itself
            else :
                self.bound = True    # bound
                try: yield           # fix for cut unbound
                finally :
                    self.bound = False # drop binding
        else :
            for i in unify(self,val): yield
\subsection{Negation via cut}
We can use cut to do negation
:
def noBrother(Person):
    Brother = var('B')
    # if any brother, cut inference
    for i in brother(Person,Brother): return
    yield                            # no brother found
for j in noBrother('Hillary'): print 'Hillary'
for j in noBrother('Chelsea'): print 'Chelsea'
# Chelsea
\subsection{Making lists}
In the
In the FORTH/ subdirectory you can see the sources of a bytecode compiler and virtual machine (bytecode interpreter), with an assembler written in flex/bison.

The sizes of these structures are defined by constants, but you can modify the code and use an expandable storage type like vector\note{it will be much slower}:

#define Msz 0x1000 /* bytes */
#define Rsz 0x100
#define Dsz 0x10

\F has a special CELL constant, which corresponds to the \textit{machine word size in bytes}:

#define CELL sizeof(int32_t)

\subsection{Memory}

extern uint8_t  M[Msz];   // memory
extern uint32_t Ip;       // instruction pointer
extern uint32_t Cp;       // compilation pointer (free heap)

Main memory contains everything; the region from Cp till the end of M[] is the free \F heap /Fheap/:

extern void     set(uint32_t addr, int32_t value);
extern uint32_t get(uint32_t addr);

If you set a cell value < 0x100 but read a byte at the same address, you must get the same value. On \term{little-endian} machines (x86) we could read/write cells using a (uint32_t*)&M[addr] pointer, but for portability we use these byte-shifting functions:
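As a cross-check of the little-endian layout these functions must implement, here is a hypothetical Python model; `set_cell`/`get_cell` are illustrative names for this sketch, not part of the C sources.

```python
# Little-endian byte-shifting cell access, modelled on the C set()/get().
# set_cell/get_cell are hypothetical Python names for this sketch.
Msz = 0x1000
M = bytearray(Msz)                 # byte-addressed main memory

def set_cell(addr, value):
    assert addr + 3 < Msz          # check memory bound
    for i in range(4):             # low byte first = little-endian
        M[addr + i] = (value >> (8 * i)) & 0xFF

def get_cell(addr):
    assert addr + 3 < Msz
    return sum(M[addr + i] << (8 * i) for i in range(4))

set_cell(0, 0x12345678)
print(hex(get_cell(0)))            # cells round-trip: 0x12345678
set_cell(4, 0x42)                  # a cell value < 0x100 ...
print(M[4])                        # ... is visible as the byte 0x42
```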
void set(uint32_t addr, int32_t value) {
    assert(addr+3 < Msz);           // check memory bound
    M[addr+0] = (value >> 0x00) & 0xFF;
    M[addr+1] = (value >> 0x08) & 0xFF;
    M[addr+2] = (value >> 0x10) & 0xFF;
    M[addr+3] = (value >> 0x18) & 0xFF;
}

uint32_t get(uint32_t addr) {
    assert(addr+3 < Msz);
    return M[addr+0] << 0x00
         | M[addr+1] << 0x08
         | M[addr+2] << 0x10
         | M[addr+3] << 0x18;
}

\subsection{Compilation (in terms of \F)}

extern uint32_t Cp;                 // compilation pointer (free heap)

In \F terms, \term{compilation} means \textit{adding bytes to the end of the vocabulary} --- in fact into the beginning of the heap, moving the heap bottom towards higher addresses. The \F standard has only the HERE word, which returns the address of the heap beginning (it really should have been named HEAP). For addressing we'll use the special Cp register.
extern void Cbyte  ( uint8_t);      // compile byte
extern void Ccell  (uint32_t);      // compile cell
extern void Cstring(char*);         // compile ASCIIZ string

These functions will be used in the assembler.
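The compile-pointer discipline behind these declarations can be sketched in Python first. The names below mirror the declarations, but this is an illustrative stand-in, not the book's code:

```python
# Python model of the Forth-style compilation pointer Cp:
# every Cxxx() appends bytes at Cp and advances it.
Msz  = 0x1000
M    = bytearray(Msz)       # main memory
CELL = 4                    # cell size in bytes (int32)
Cp   = 0                    # compilation pointer (free heap)

def Cbyte(b):               # compile one byte
    global Cp
    M[Cp] = b & 0xFF; Cp += 1
    assert Cp < Msz

def Ccell(c):               # compile one little-endian cell
    for i in range(CELL):
        Cbyte(c >> (8 * i))

def Cstring(s):             # compile ASCIIZ string (bytes + final 0)
    for ch in s.encode():
        Cbyte(ch)
    Cbyte(0)

Cbyte(0x01)                 # e.g. a one-byte opcode
Ccell(0)                    # address placeholder cell
Cstring('NOP')
print(Cp)                   # 1 + 4 + len('NOP') + 1 = 9
```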
void Cbyte(uint8_t b)  { M[Cp++] = b; assert(Cp < Msz); }

void Ccell(uint32_t c) { set(Cp,c); Cp += CELL; assert(Cp < Msz); }

void Cstring(char* s) {
    uint32_t L = strlen(s);
    assert(Cp+L+1 < Msz);           // compile length+1 bytes
    memcpy(&M[Cp],s,L+1); Cp += L+1;
}

\subsection{Vocabulary structure}

In \F terms, a \term{word} means some active data element, analogous to a function or procedure in mainstream languages. Variables and constants in \F are also words. It corresponds to a \term{word} in human languages --- a sequence of letters which means something. When you enter some \F code at the command line, the interpreter /INTERPRET/ searches for each word\note{delimited by space symbols} in the \term{vocabulary}, and executes it /EXECUTE/ if the search was successful.

A \term{vocabulary} is a container data structure, implemented using the Cxxx() functions /Cxxx/:

\begin{tabular}{l l l l}
LFA & cell          & Link Field Area       & link to previous word or 0 \\
AFA & byte          & Attribute Field Area  & flags, IMMED /IMMEDIATE/ \\
NFA & asciiz string & Name Field Area       & word name \\
CFA & bytecode      & Code Field Area       & executable bytecode \\
PFA & optional      & Parameters Field Area & in variables and constants \\
\end{tabular}

The last defined word must be marked somewhere:

// program entry point (addr of jmp parameter)
#define _entry  1
// last defined word LFA address
#define _latest (_entry+CELL)

int main(int argc, char *argv[]) {
    // ============ compile vocabulary header
    // jmp _entry : jump to last defined word
    Cbyte(opJMP); Ccell(0);
    // _latest : LFA of last defined word
    Ccell(0);

To compile a vocabulary header use:

map<string,uint32_t> SymTable;      // symbol table

void LFA() { uint32_t L = get(_latest); set(_latest,Cp); Ccell(L); }
void AFA(uint8_t b)   { Cbyte(b); }
void NFA(char* s)     { Cstring(s); }
void CFA(string name) { SymTable[name] = Cp; set(_entry,Cp); }

void Cheader(char* name) {          // compile header
    LFA(); AFA(0); NFA(name); CFA(name);
}

\subsection{Bytecode interpreter}

The bytecode interpreter runs after the assembler has finished its work:
int main() {
    ...                             // compile vocabulary header
    yyparse();                      // run compiler
    dump();                         // dump memory into .bin file
    VM();                           // run VM
}

Like any other computer, the interpreter implements a fetch/decode/execute loop over the commands in M[] memory, with the current command pointed to by the Ip instruction pointer register.

void VM() {
    for (;;) {                      // infinite loop
        uint8_t op = M[Ip++]; assert(Ip <= Cp);     // FETCH
        switch (op) {               // DECODE/EXECUTE
            case opNOP : nop();  break;
            case opBYE : bye();  break;
            case opJMP : jmp();  break;
            case opCALL: call(); break;
            case opRET : ret();  break;
            case opLIT : lit();  break;
            default    : printf("bad opcode\n\n"); abort();
        }
        printf("\n");
    }
}
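The same fetch/decode/execute skeleton fits in a few lines of Python. This is a sketch, not the book's interpreter: only nop and bye are modelled, with opcode values assumed to match the defines in the next subsection.

```python
# Minimal fetch/decode/execute loop over a byte-addressed program memory.
opNOP, opBYE = 0x00, 0xFF          # opcodes assumed from the C defines

M  = bytearray([opNOP, opNOP, opBYE])   # tiny compiled program
Ip = 0                             # instruction pointer
trace = []                         # executed command log

def VM():
    global Ip
    while True:
        op = M[Ip]; Ip += 1                     # FETCH
        if   op == opNOP: trace.append('nop')   # DECODE/EXECUTE
        elif op == opBYE: trace.append('bye'); return
        else: raise RuntimeError('bad opcode %#x' % op)

VM()
print(trace)                       # ['nop', 'nop', 'bye']
```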
\subsection{Control flow}

#define opNOP  0x00   // nop
#define opBYE  0xFF   // bye
#define opJMP  0x01   // jmp <addr>
#define opQJMP 0x02   // ?jmp <addr>
#define opCALL 0x03   // call <addr>
#define opRET  0x04   // ret
#define opLIT  0x05   // lit <value>

#define opNOP 0x00    // nop
void nop() { printf("nop"); }
The make scripts code for building the \term{emLinux} system is in /DLR/cross.

\fig{../tmp/cross_01.pdf}{height=\textheight}

$ cd cross ; make dirs

to create the directory structure:
\begin{tabular}{l l}
cross/Makefile & main Makefile \\
cross/gz/      & software packages source tarballs \\
cross/src/     & packages source code (temp/ramdisk) \\
cross/tmp/     & out of tree build (temp/ramdisk) \\
cross/hw/      & supported hardware platforms (KVM is the main virtual server system) \\
cross/arch/    & target architecture \\
cross/cpu/     & target CPU \\
\end{tabular}

\fig{../tmp/cross_02.pdf}{width=\textwidth}

# current directory (full path)
CWD = $(CURDIR)
# software packages source tarballs
GZ = $(CWD)/gz
# cross compiler tools
CROSS = $(CWD)/$(TARGET).cross
# target filesystem
ROOT = $(CWD)/$(TARGET)
BOOT = $(ROOT)/boot
# source code
SRC = $(CWD)/src
# off-tree build
TMP = $(CWD)/tmp

DIRS = $(GZ) $(SRC) $(TMP) $(CROSS) $(ROOT) $(BOOT)
.PHONY: dirs
dirs:
        mkdir -p $(DIRS)

To build you should use a build system with an SSD and a large amount of RAM. It is much better to use a ramdisk for source code and build/temp files.
tmpfs /home/user/DLR/cross/tmp tmpfs auto,uid=user,gid=users 0 0
tmpfs /home/user/DLR/cross/src tmpfs auto,uid=user,gid=users 0 0

and remount the ramdisks:

$ sudo mount -a ; make dirs
\begin{tabular}{l l l}
APP    &       & application name (= dynamic server) \\
HW     & hw/   & hardware platform short name \\
ARCH   & arch/ & architecture \\
CPU    & cpu/  & CPU \\
\hline
       &       & \term{Triplets}: the system where you... \\
BUILD  &       & ...build (x86\_64-pc-linux-gnu) \\
HOST   &       & ...run the tools (mingw32) \\
TARGET &       & ...run the resulting software (arm-linux-uclibceabihf) \\
\end{tabular}

\fig{../tmp/cross_03.pdf}{width=\textwidth}

\begin{tabular}{l l l l}
HW    & ARCH      & CPU     & description \\
\hline
KVM   & =BUILD    & x86\_64 & Linux workstation as virtualization server (or leased VDS) \\
Rpi   & armeabihf & BCM2835 & Raspberry Pi (any compatible model) \\
PC104 & i386      & i486sx  & PC104 board or any old PC computer \\
PC    & i386      & i686    & more or less modern PC \\
\end{tabular}

{\footnotesize At this point we don't look at modern PCs with i3..i7 processors and huge amounts of RAM for a dedicated Dynamic server farm. If you have enough money to invest into Dynamic-based hosting, it is your turn to write custom scripts to build it for i3+ computers. For now it is better to play with a heap of reused hardware you can buy for a few dollars, or Amazon EC2.}

APP  = dynamic
HW  ?= PC104

include hw/$(HW).mk
include cpu/$(CPU).mk
include arch/$(ARCH).mk

\subsection{Virtualization platforms}
\subsubsection{hw/KVM}

We'll use KVM as the default (virtualization) platform; see the Debian GNU/Linux manual on its configuration\note{https://debian-handbook.info/browse/stable/sect.virtualization.html}.

\subsection{x86 variants}
\subsubsection{hw/PC104}

CPU  = i486sx
ARCH = i386

# override CPU (will be used for TARGET)
CPU = i486
TARGET = $(CPU)-linux-uclibc

You can have some fun for a few tens of bucks with an old notebook: take one on eBay and play with it. Or you can buy an industrial PC104 embedded computer with some Vortex86 SoC like the Advantech PCM-4434. Don't buy x86 hardware with a CPU lower than i486sx: i386 CPU support was totally dropped from the Linux kernel many years ago, and we have a chance to lose it in the GCC toolchain as well. If you want some practical benefit from such an old notebook, you can do some light computations or electronics and mechanical design; then it is better to buy one with an i486dx processor, as it has an FPU (hardware floating point).

\subsubsection{hw/i686}

There is a huge amount of old computers you can reuse, and lots of new boards starting from Intel Atom and the compatible AMD Geode.

\subsubsection{hw/amd64}

Cheap and powerful SOHO servers, up to topmost i7 and low-end Xeon CPUs.

\subsection{Raspberry Pi, china clones}

Cheap, ultra compact and power effective, able to run from a battery power source.

\subsubsection{Raspberry Pi}

Large community, available off the shelf in local stores.

\subsubsection{Orange/Banana Pi}

\subsection{MIPS routers (MR3020): cheap and too small}

They have a small amount of RAM (starting from 32M), but embedded network interfaces, WiFi, and a USB host for a GSM modem and external storage. They can be used for network gateways, slow data storage on a USB flash drive, or hosting tiny web sites.
include mk/gz.mk

WGET = wget -P $(GZ) -c

.PHONY: gz gz_cross gz_core
gz: gz_cross gz_core
gz_cross:
        $(WGET) ftp://ftp.gmplib.org/pub/gmp/$(GMP_GZ)
        $(WGET) http://www.mpfr.org/mpfr-$(MPFR_VER)/$(MPFR).tar.bz2
        $(WGET) http://www.multiprecision.org/mpc/download/$(MPC_GZ)
        $(WGET) http://ftp.gnu.org/gnu/binutils/$(BINUTILS_GZ)
        $(WGET) http://gcc.skazkaforyou.com/releases/$(GCC)/$(GCC_GZ)
gz_core:
        $(WGET) http://www.uclibc.org/downloads/$(ULIBC).tar.xz
        $(WGET) http://busybox.net/downloads/$(BUSYBOX_GZ)
        $(WGET) http://www.kernel.org/pub/linux/kernel/v4.x/$(KERNEL_GZ)

\fig{../tmp/cross_04.pdf}{width=\textwidth}

include mk/version.mk
include mk/package.mk
include mk/gz.mk
include mk/unpack.mk

# automate source code unpack
$(SRC)/ : cd $(SRC) && zcat $< | tar x && touch $@
$(SRC)/ : cd $(SRC) && bzcat $< | tar x && touch $@

\fig{../tmp/cross_05.pdf}{height=\textheight}
\fig{../tmp/cross_06.pdf}{height=\textheight}

/cross/mk/version.mk:
\fig{../tmp/cross_08.pdf}{width=\textwidth}

\subsubsection{binutils: assembler and linker for \$TARGET platform}

CROSS_CFG = --target=$(TARGET) $(WITH_CCLIBS) \
    --with-sysroot=$(ROOT) --with-native-system-header-dir=/include

.PHONY: binutils
binutils: $(CROSS)/bin/$(TARGET)-as
# use one of the tool executables as the build-done flag
$(CROSS)/bin/$(TARGET)-as: $(SRC)/$(BINUTILS)/README
        rm -rf $(TMP)/$(BINUTILS) ; mkdir $(TMP)/$(BINUTILS) ; \
        cd $(TMP)/$(BINUTILS) ; $(SRC)/$(BINUTILS)/$(CCFG) $(CROSS_CFG) && \
        $(MAKE) && make install-strip

\fig{../tmp/cross_09.pdf}{width=\textwidth}

CCFG = configure --disable-nls --prefix=$(CROSS) \
    --docdir=$(TMP)/doc --mandir=$(TMP)/man --infodir=$(TMP)/info

\begin{tabular}{l l}
prefix & cross compiler toolchain install directory \$CROSS \\
target & cpu-[hw]-os-abi \\
\end{tabular}

WITH_CCLIBS = --with-gmp=$(CROSS) --with-mpfr=$(CROSS) --with-mpc=$(CROSS)
CCLIBS_CFG  = --disable-shared $(WITH_CCLIBS)

.PHONY: gcclibs
gcclibs: gmp mpfr mpc

.PHONY: gmp
gmp: $(CROSS)/lib/libgmp.a
$(CROSS)/lib/libgmp.a: $(SRC)/$(GMP)/README
        rm -rf $(BLD)/$(GMP) ; mkdir $(BLD)/$(GMP) ; \
        cd $(BLD)/$(GMP) ; $(SRC)/$(GMP)/$(CFG) $(CCLIBS_CFG) && \
        $(MAKE) && make install-strip
...

\subsubsection{gcc0: GNU compiler toolchain (only for kernel build)}

If you have problems with memory size, run make with \emph{limited make threads}:
$ make BUILD_CPU_NUMBER=2 gcc0

GCC_VER = 7.2.0
GCC     = gcc-$(GCC_VER)
GCC_GZ  = $(GCC).tar.xz

gz_cross:
        $(WGET) http://gcc.skazkaforyou.com/releases/$(GCC)/$(GCC_GZ)

GCC0_CFG = --without-headers --with-newlib --enable-languages="c"
.PHONY: gcc0
gcc0: $(SRC)/$(GCC)/README
        rm -rf $(BLD)/$(GCC) ; mkdir $(BLD)/$(GCC) ; \
        cd $(BLD)/$(GCC) ; $(SRC)/$(GCC)/$(CFG) $(CROSS_CFG) $(GCC0_CFG) && \
        $(MAKE) all-gcc && make install-gcc

\subsection{kernel: Linux kernel}

Before the kernel build, clean up the TMP directory if you use a ramdisk build:

$ make ramclean

.PHONY: ramclean
ramclean:
        rm -rf $(SRC)/* $(BLD)/*

KERNEL_VER = 4.14.9
KERNEL     = linux-$(KERNEL_VER)
KERNEL_GZ  = $(KERNEL).tar.xz

gz: gz_cross gz_core
gz_core:
        $(WGET) http://www.kernel.org/pub/linux/kernel/v4.x/$(KERNEL_GZ)

.PHONY: core kernel
core: kernel
kernel: $(SRC)/$(KERNEL)/README

Check the kernel is bootable:

$ make emu

\subsection{ulibc: base system library}
\subsection{busybox: core UNIX}
https://www.youtube.com/playlist?