EuOS Bytecode Format version 1.0
by Jeffrey Fielding (JJProg@cyberbury.net)
--------------------------------------------------------------------------------
THE ADVANTAGES OF COMPILING EUPHORIA PROGRAMS
	In order for EuOS to run quickly, I believe that the programs need to be
compiled. Since compiled programs won't reference variables and routines by
name, EuOS will be able to find the variable or routine much faster, leading
to much faster execution of the program. It would even be possible to write
part of the interpreter in assembly since it would be relitavely simple.
	When the code is compiled, two files would be generated: a compiled
version of the program and a translation file to translate indexes used
in the bytecode version to reference parts of the program, allowing
developers to distribute the compiled version of their program and still
being able to debug it, without needing to distribute the source code.
	Another advantage to compiled versions is that they will be smaller.
This could be useful in implementing an official EuOS programs archive on
the internet, where users could quickly and easily download EuOS programs
that were officially approved for stability and security.
	Finally, compiled programs can contain extra information that would
not be so easily added to a source file, such as the version, name of the
program, stability and security levels, a description, and the type of
program (program, driver, library, shell, or daemon).

THE BASIC STRUCTURE OF A COMPILED EUPHORIA PROGRAM
	Although Euphoria allows variables, routines, and code anywhere
in the source file, the compiled version will have a strict format
with data stored in blocks. This will allow programs to be loaded faster.
All errors that can be determined when the program is compiled will be,
so that EuOS will be able to continue the Euphoria idea of not allowing
routines to be called before they are defined without using routine_id.
	The file will be divided into seven sections: the header,
libraries, variables, exported variables, routines, exported routines, and code.
	The header contains data such as the name, version, and stability and
security levels of the program. Libraries contains a list of libraries which
should be loaded and accessible from the program.
	Variables includes all the variables in the program, including
variables in routines. This allows routines to be called immediately without
needing to setup variables. To avoid creating large numbers of variables,
any variables that can be used more than once should be. For example:
procedure a()
	integer i
	i = 5
	? i
end procedure
procedure b()
	integer j
	j = 6
	? j
end procedure
a()
b()
	In the above case, i and j would both be referenced to one variable,
so the variables section would only define one variable which would be used
both times since i and j are never used together. Variables also contains
temporary variables which are not defined by the programmer, but the compiler
to do calculations in complex formulas and pass arguments that are never
assigned to a variable.
	The routines section contains all the routines. There is no distinction
between procedures, functions, or types.
	The code section is code that is automatically run when a new instance
of the program is started.
	Finally, the two exported sections define what variables and routines
are available to other programs.

THE INSTRUCTION SET ARCHITECTURE
	Before I describe all the details of the compiled program format,
I want do describe the instruction set. It will help to know how the
program works before discussing the format of the program.
	You have probably noticed that the format is very basic - very
much like an assembly program. That's because it is like an assembly program.
It is designed to be very simple to allow the interpreter to run the program
as fast as possible. This is probably very scary to anyone used to Euphoria's
simplicity, but only compiler writers will have to deal with this stuff -
EuOS Euphoria will be just like Euphoria, only it will be compiled (and
better). I am also sure that compiler writers are rather scared by this,
especially myself since I will probably be writing the first compiler, since
I designed this in the first place. But hey - C compilers can do it, and
they have to deal with a much lower-level instruction set. You'll see that
the instruction set isn't all that low-level anyway.
	CALLING ROUTINES
	One very important thing to note is how routines are called, and
how, if functions are not distinguished from procedures, does the interpreter
check for errors with a function being called as a procedure or the other
way around. Obviously, the compiler will find most of these errors before
the program is compiled, but when using routine_ids this could generate
some problems. Luckily, I've got it all figured out.
	Like on a processor, the interpreter has a stack. As with a processor,
there are instructions for pushing data onto the stack and popping it off.
In this instruction set, there are also instructions to check the size of the
stack. This is what allows programs to check for the error described above.
Simply check the stack before calling a routine, then check the stack after.
If the stack is bigger by one, the routine returned something. If the stack is
the same size, the routine was a procedure. Otherwise, there is something wrong
with the program (and it's the compiler's fault unless Euphoria compilers
start allowing inline "assembly"). If the compiler feels like it, it can
decide not to use the stack for calling routines and instead use variables,
but unless the compiler is really good, this could lead to really messy
code that is difficult to debug. I suggest making this optional if it is
supported at all.
	THE ADDRESS SPACE
	The address space is the index of variables and routines. Variables
and routines have seperate address spaces, so routine 1 is seperate from
variable 1. Since Euphoria sequences start at 1, so does the address space.
The first routine defined in the routines section is 1 and so on. The first
routine in the first library is n+1 where n is the number of routines in
the program. The same goes for variables.
	LIBRARIES
	Another thing that can get messy is libraries. Libraries can be
very messy, as anyone who has had problems with DLLs knows. I hope EuOS
will change that, but unfortunately the fastest way to access libraries
is rather messy, but don't worry. It's not that hard.
	There are two types of libraries, just as there are two types of
every other program: single instance and multiple instance. Multiple instance
libraries are the easiest. Your program includes a library and a new instance
is started, and the library's routines and variables are added to your
routines and variables. Since everything is indexed by numbers, it is now
very simple. Say you have one routine, which would be indexed as routine 1.
Now you include a libraray, and its first routine is indexed as routine 2 etc.
If your program calls a routine that is not exported by the library, an
error is generated. The compiler should get the indexes into a library
from definition files generated by the compiler that compiled the library.
	Single instance libraries are a little more complicated for the
interpreter. They appear the same to the program using them, but access is
not as fast. The interpreter maps the library into the program's addess space,
so everything is a reference, not an actual variable.
	One problem that may arise is that a library has some parts that would
be best in a multiple instance library, and some parts that must be single
instance. This can be solved by dividing the library into two parts, with
the single instance library being included by the multiple instance library,
which is included by the program.

EUOS ASSEMBLY LANGUAGE
	I could just refer to everything as bytecodes, but I think everyone
reading this, including myself, will be able to understand this better if
I create a language that people can actually program in and understand, which
can then be translated into bytecodes. Perhaps experts will be able to
optomize what the compiler produces before assembling it. So here it goes...
NOTE: Comments are like in Euphoria
	As you might expect, the assembly language has much the same format
as the binary version, though it is easier to read. Again, the basic format
is:
.header
.include
.data
.routines
.code
	Notice that the exported lists are missing - this allows for much
more readable code. Just precede a variable/routine definition by global
to put it in the appropriate exported list.
	THE HEADER
	The header simply defines a bunch of constants that are read
by EuOS to find out information about the program. At the moment, the following
definitions are required in every program, though others may be added.
name="Program name" -- The name of the program
version=1.0	    -- The version of the program (must be major.minor)
stability=MEDIUM    -- Possible values: UNSTABLE (Alpha-test), LOW (Beta-test),
		    -- MEDIUM (unproven), HIGH (proven), ULTRA (crash-proof)
type=PROGRAM	    -- PROGRAM, DAEMON, DRIVER, LIBRARY, or SHELL
description="A program" -- Breif description
id=3932583955	    -- ID assigned by EuOS team to ensure security or -1
		    -- if no ID has been assigned

	THE .INCLUDE STATEMENT
	Format: .include file
	Includes a library. File is the definition file for the library.
	
	DATA
	Data simply contains a list of names of variables. Example:
.data
a
b
c
d
e
f
	ROUTINES
	A section filled with routines like so:
.routines
ROUTINE A:
	-- Code here
ROUTINE B:
	-- Code here

	CODE
	A section filled with inital code like so:
.code
-- Code here

INSTRUCTION FORMAT
	Each instruction follows the following format:
0 arguments: instruction
1 argument: instruction arg1
2 arguments: instruction arg1, arg2
3 arguments: instruction arg1, arg2, arg3

	There are several different types of arguments. Internally, they
are represented by only two types - either a VAL or a VAR. A VAL is an
object in the code (or one that is put in by the assembler). A VAR is
the actual value of a variable referenced by the argument.
TYPE					EXAMPLE	INTERNAL REPRESENTATION
literal					5	VAL (5)
variable				var	VAR (index of var)
routine					foo	VAL (index of foo)
label					loop	VAL (index of loop)
variable index				[var]	VAL (index of var)
routine index				[foo]	VAL (index of foo)
label instruction			[loop]	VAL (index of loop)
literal as a variable index		[1]	VAR (1)
literal as a routine index		[1]	VAL (1)
literal as a label (instruction)	[1]	VAL (1)

	Labels can be defined like so:
label:
	Labels are only valid inside the current routine. You can jump to
a label using jmp or cjmp. The internal value of a label is the instruction
index of the next instruction. You could call jmp [1] to jump to the first
instruction in the current routine. Note that instruction indexes are measured
from the first instruction in the current routine.

THE INSTRUCTIONS
	THE RESULT VARIABLE
	You can set which variable to store the result of some operations
	in by using setrv.
	setrv variable
		Result register = variable
	DATA MANIPULATION
	mov dest, source
		Puts data from source into dest
	movdi dest, source, index
		dest[index] = source
	movsi dest, source, index
		dest = source[index]
	add dest, value
		Adds value to dest
	sub dest, value
		Subtracts value from dest
	mul dest, value
		Multiplies dest by value
	div dest, value
		Divides dest by by value
	and dest, source
		dest = dest and source
	or dest, source
		dest = dest or source
	xor dest, source
		dest = dest xor source
	movlen dest, source
		dest = length(source)
	repeat dest, source
		dest = repeat(dest,source)
	append dest, source
		dest = append(dest,source)
	prepend dest, source
		dest = prepend(dest,source)
	concen dest, source
		dest &= source
	BUILT-IN TYPES
	isinteger val
		result variable = integer(val)
	isatom val
		result variable = atom(val)
	issequence val
		result variable = sequence(val)
	BITWISE LOGICAL INSTRUCTIONS
	andbits dest, source
		dest = and_bits(dest,source)
	orbits dest, source
		dest = or_bits(dest,source)
	xorbits dest, source
		dest = xor_bits(dest,source)
	notbits dest
		dest = not_bits(dest)
	MATH
	floor dest
		dest=floor(dest)
	sqrt desr
		dest=sqrt(dest)
	remainder dest, mod
		dest = remainder(dest,mod)
	rand dest, max
		dest = rand(max)
	sin dest
		dest = sin(dest)
	arcsin dest
		dest = arcsin(dest)
	cos dest
		dest = cos(dest)
	arccos dest
		dest = arccos(dest)
	tan dest
		dest = tan(dest)
	arctan dest
		dest = arctan(dest)
	log dest
		dest = log(dest) (base 10)
	power dest, power
		dest = power(dest,power)
	COMPARISON OF ATOMS (=,!=,<,>,<=,>=)
	ce a, b
		Result variable will be (a = b)
	cne a, b
		Result variable will be (a != b)
	cl a, b
		Result variable will be (a < b)
	cg a, b
		Result variable will be (a > b)
	cle a, b
		Result variable will be (a <= b)
	cge a, b
		Result variable will be (a >= b)
	cmp a, b
		Result variable will be compare(a,b)
	equ a, b
		Result variable will be equal(a,b)
	ROUTINES AND BRANCHES
	ret
		Returns from a routine
	call routine
		Calls a routine
	jmp label
		Jumps to label
	cjmp label, ifvar
		Jumps to label if ifvar != 0
	GENERAL
	end
		Ends the program
	err errorid
		Ends the program with error id errorid. The position
		in the program, stack, and variables are saved and written
		to a file along with the error id, and sent to the programmer.
	THE STACK
	push val
		Pushes val onto the stack.
	pop var
		Pops the last value on the stack into var.
	ldss var
		Loads the stack size into var.
	OPERATING SYSTEM INTERFACE
	oscall call
		Calls operating system call # call. Arguments are poped
		off the stack, so you need to push them onto the stack
		from left to right. The result is pushed onto the stack.