No Name Virtual Machine: A simple VM.
- Getting started
- Design
- Implementation
- Assembly
- N2 Language
- Command line options
- API
- Extras
- TODO list
- Licensing
- A C99+ compiler (gcc, clang, etc.).
- A standard C library (glibc, uClibc, musl, etc.)
- A GNU-compatible Makefile processor.
- re2c >= 1.1.1
- Python >= 3.7.x
You will need to clone the repo, as follows:
git clone https://github.com/Alvarito050506/n2vm.git
To build the VM and the compiler, do:
make
The assembler does not need to be compiled, since it's written in Python.
To install the VM, the compiler and the assembler in your system, run:
make install
This will probably require root/elevated permissions.
The n2vm architecture and design are heavily inspired by ARMv7+.
n2vm is a big-endian machine, meaning that the MSB (Most Significant Byte) of a word is stored first. The word length is 32 bits, and the instruction length is also fixed at 32 bits.
All n2vm instructions are 32 bits wide.
No operands (NP):
0x00 ----------
| opcode |
0x08 ----------
| cflags |
0x0c ----------
| ignore |
0x20 ----------
Single-operand (SR):
0x00 -------------
| opcode |
0x08 -------------
| cflags |
0x0c -------------
| reg/uint8 |
0x0f -------------
| ignore |
0x20 -------------
Immediate to register (RI/RA):
0x00 ----------
| opcode |
0x08 ----------
| cflags |
0x0c ----------
| reg |
0x0f ----------
| uint16 |
0x20 ----------
Register to register (RR):
0x00 ----------
| opcode |
0x08 ----------
| cflags |
0x0c ----------
| dst |
0x0f ----------
| sign |
0x10 ----------
| offset |
0x1c ----------
| src |
0x20 ----------
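As a rough illustration of the templates above, the following Python sketch extracts the two fields that sit at the same position in every template: opcode (offsets 0x00 to 0x08) and cflags (offsets 0x08 to 0x0c). It assumes that offset 0x00 is the most significant bit of the 32-bit word, which the diagrams do not state. `decode_header` is a hypothetical helper, not part of n2vm.

```python
def decode_header(word):
    """Split a 32-bit instruction word into (opcode, cflags, rest).

    Assumption: bit offset 0x00 in the diagrams is the most significant bit.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction words are 32 bits wide")
    opcode = (word >> 24) & 0xFF   # offsets 0x00..0x08
    cflags = (word >> 20) & 0x0F   # offsets 0x08..0x0c
    rest = word & 0xFFFFF          # template-specific fields, 0x0c..0x20
    return opcode, cflags, rest


# hlt (opcode 0x1d) with the `eq` condition (0b1000) and unused operand bits.
word = (0x1D << 24) | (0b1000 << 20)
print(decode_header(word))
```

The remaining 20 bits are returned untouched because their meaning depends on the template (NP, SR, RI/RA or RR).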
In the table below, PC is the program counter, gpr the registers, and mem the virtual memory, as defined in the API section.
Mnemonic | Opcode | Template | Description |
---|---|---|---|
nop | 0x00 | NP | No operation. |
inc | 0x01 | SR | gpr[reg]++ |
dec | 0x02 | SR | gpr[reg]-- |
add | 0x03 | RR | gpr[dst] += gpr[src] |
sub | 0x04 | RR | gpr[dst] -= gpr[src] |
mul | 0x05 | RR | gpr[dst] *= gpr[src] |
div | 0x06 | RR | gpr[dst] /= gpr[src] |
lri | 0x07 | RI | gpr[reg] = uint16 |
lrt | 0x08 | RI | Same as lri, but loads the uint16 at top. |
lrr | 0x09 | RR | gpr[dst] = gpr[src] |
lrc | 0x0a | RR | gpr[dst] = mem[gpr[src]] |
lrh | 0x0b | RR | Same as lrc, but loads a 16-bit short. |
lrm | 0x0c | RR | Same as lrc, but loads a 32-bit int. |
lmc | 0x0d | RR | mem[gpr[dst]] = gpr[src] |
lmh | 0x0e | RR | Same as lmc, but stores a 16-bit short. |
lmr | 0x0f | RR | Same as lmc, but stores a 32-bit int. |
jmp | 0x10 | SR | PC = gpr[reg] |
cmp | 0x11 | RR | Compares src and dst and sets the flags. |
shl | 0x12 | RR | gpr[dst] <<= gpr[src] |
shr | 0x13 | RR | gpr[dst] >>= gpr[src] |
ior | 0x14 | RR | gpr[dst] \|= gpr[src] |
xor | 0x15 | RR | gpr[dst] ^= gpr[src] |
and | 0x16 | RR | gpr[dst] &= gpr[src] |
not | 0x17 | SR | gpr[reg] = ~gpr[reg] |
out | 0x18 | RI/RR | Calls the I/O function ios[src]. |
inp | 0x19 | RI/RR | Reads input from an I/O port; see the Assembly section. |
cll | 0x1a | SR | Same as jmp, but stores PC in the stack. |
sys | 0x1b | SR | Makes a system call to sys_tab[uint8]. |
ret | 0x1c | NP | PC = stk[stc--] |
hlt | 0x1d | NP | Halts the VM, stops the execution. |
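To make the register-to-register rows of the table concrete, here is a minimal Python model of the arithmetic and logical operations. The opcode numbers come from the table; everything else is an assumption for illustration: results wrap modulo 2**32 (registers are 32-bit unsigned values), division is unsigned integer division, and `exec_rr` is a hypothetical helper rather than the VM's own code.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit unsigned values

# Opcode numbers copied from the table; the lambdas are an illustrative model.
RR_ALU = {
    0x03: lambda d, s: d + s,   # add
    0x04: lambda d, s: d - s,   # sub
    0x05: lambda d, s: d * s,   # mul
    0x06: lambda d, s: d // s,  # div (unsigned integer division assumed)
    0x12: lambda d, s: d << s,  # shl
    0x13: lambda d, s: d >> s,  # shr
    0x14: lambda d, s: d | s,   # ior
    0x15: lambda d, s: d ^ s,   # xor
    0x16: lambda d, s: d & s,   # and
}


def exec_rr(gpr, opcode, dst, src):
    """Apply one register-to-register ALU operation in place."""
    gpr[dst] = RR_ALU[opcode](gpr[dst], gpr[src]) & MASK32


gpr = [0] * 16
gpr[0], gpr[1] = 6, 7
exec_rr(gpr, 0x05, 0, 1)   # mul: gpr[0] = 6 * 7
exec_rr(gpr, 0x04, 1, 0)   # sub: gpr[1] = 7 - 42, wraps around modulo 2**32
print(gpr[0], hex(gpr[1]))  # prints "42 0xffffffdd"
```

How the real VM handles division by zero or shift counts of 32 and above is not covered by this sketch.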
All operations can be conditionally executed, but only the cmp operation can set the flags. The following flags are available:
- al: 0b0000. Execute always.
- eq: 0b1000. Execute if equal.
- ne: 0b0100. Execute if not equal.
- gt: 0b0010. Execute if greater than.
- lt: 0b0001. Execute if less than.
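The flag values above can be read as a 4-bit mask. The following Python sketch shows one way a comparison and the conditional check could work. It is a guess at the semantics, not the VM's code: it assumes that cmp sets eq on equality and otherwise ne together with gt or lt, that "greater than" means dst greater than src, and that an instruction runs when its condition is al or shares a bit with the stored flags. `compare` and `check_flags` are hypothetical names.

```python
# Flag values copied from the list above.
AL, EQ, NE, GT, LT = 0b0000, 0b1000, 0b0100, 0b0010, 0b0001


def compare(dst_val, src_val):
    """Return the flag bits a `cmp` might set (assumed: dst relative to src)."""
    if dst_val == src_val:
        return EQ
    return NE | (GT if dst_val > src_val else LT)


def check_flags(vm_flags, cflags):
    """Decide whether an instruction with condition `cflags` should run."""
    return cflags == AL or (vm_flags & cflags) != 0


flags = compare(5, 3)
print(check_flags(flags, GT), check_flags(flags, EQ), check_flags(flags, AL))
# prints "True False True"
```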
Some SR/RR instructions do not ignore the sign and offset fields; instead, they "add" these bits to the operation. The jump instructions jmp and cll add or subtract offset to or from the jump location depending on sign. Something similar happens with the arithmetic and logical operations, for example:
add 0x00, 0x01 +0x02 ; Adds the value of the register 0x01 plus 0x02 to the register 0x00.
sub 0x03, 0x04 +0x05 ; Subtracts the value of the register 0x04 plus 0x05 from the register 0x03.
This could be used to implement position independent code, for example:
jmp 0x0f +0x08 ; Jumps to `other_place`, using a PC-relative jump.
nop
.other_place:
; Etc...
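The effect of the sign and offset fields in the examples above can be sketched in Python as follows. The 12-bit range follows from the RR diagram (offsets 0x10 to 0x1c); the rest is assumed for illustration: the sign is passed as a boolean because its bit encoding is not documented here, register 0x0f is taken to hold the address of the jmp instruction itself, and `effective` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def effective(gpr, reg, negative, offset):
    """Value of gpr[reg] adjusted by the 12-bit offset field."""
    if not 0 <= offset <= 0xFFF:
        raise ValueError("offset does not fit in the offset field")
    delta = -offset if negative else offset
    return (gpr[reg] + delta) & MASK32


gpr = [0] * 16
gpr[0x00], gpr[0x01] = 10, 20
# add 0x00, 0x01 +0x02
gpr[0x00] = (gpr[0x00] + effective(gpr, 0x01, False, 0x02)) & MASK32

# jmp 0x0f +0x08, assuming register 0x0f holds the address of the jmp itself.
gpr[0x0F] = 0x100
target = effective(gpr, 0x0F, False, 0x08)
print(gpr[0x00], hex(target))  # prints "32 0x108"
```

Under that assumption the target lies two 4-byte instructions past the jmp, which skips the jmp itself and the nop.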
n2vm is like any other simple VM: it loads the bytecode and executes it until it reaches a hlt instruction or an error. The implementation uses an array of function pointers (op_t) to store the implementation of each opcode. Then it enters a loop that checks the flags and executes the next instruction:
# Python pseudocode.
while running:
    if not is_valid_op(op):
        return -1  # unknown opcode
    if not check_flags(flags):
        continue  # condition not met, skip this instruction
    if ops[op](vm, reg, val) != 0:
        return -1  # the handler reported an error
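For readers who want something they can run, the loop above can be fleshed out into a toy interpreter. This is only a sketch under simplifying assumptions: instructions are pre-decoded tuples instead of 32-bit words, the program counter is a list index, only nop, inc and hlt are implemented, and `ToyVM`, `OPS` and `run` are hypothetical names that do not exist in n2vm.

```python
class ToyVM:
    def __init__(self, program):
        self.program = program   # list of (op, cflags, reg, val) tuples
        self.pc = 0              # index into `program`, not a byte address
        self.gpr = [0] * 16
        self.flags = 0
        self.running = True


def op_nop(vm, reg, val):
    return 0


def op_inc(vm, reg, val):
    vm.gpr[reg] = (vm.gpr[reg] + 1) & 0xFFFFFFFF
    return 0


def op_hlt(vm, reg, val):
    vm.running = False
    return 0


# Handlers keyed by opcode, mirroring the array of `op_t` pointers.
OPS = {0x00: op_nop, 0x01: op_inc, 0x1D: op_hlt}


def run(vm):
    while vm.running:
        if vm.pc >= len(vm.program):
            return -1                      # ran off the end without `hlt`
        op, cflags, reg, val = vm.program[vm.pc]
        vm.pc += 1
        if op not in OPS:
            return -1                      # invalid opcode
        if cflags != 0 and not (vm.flags & cflags):
            continue                       # condition not met: skip
        if OPS[op](vm, reg, val) != 0:
            return -1                      # handler reported an error
    return 0


# inc 0x03; inc 0x03 !eq (skipped, no flags set); hlt
vm = ToyVM([(0x01, 0, 3, 0), (0x01, 0b1000, 3, 0), (0x1D, 0, 0, 0)])
print(run(vm), vm.gpr[3])  # prints "0 1"
```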
The n2vm assembly is based on a simplified version of the GNU assembler syntax, where the "destination" comes before the "source":
; NP example
; Does nothing
nop
; SR example
; Increments the register 0x00
inc 0x00
; RI example
; Loads 0xcafe into the register 0x00
lri 0x00, 0xcafe
; RR example
; Loads the value of the register 0x01 into the register 0x01
lrr 0x01, 0x01
The execution condition can be added at the end of the instruction:
; Conditional execution example
; Jumps to the address loaded in the register 0x02 if the values
; of the registers 0x01 and 0x00 are equal.
cmp 0x00, 0x01
jmp 0x02 !eq
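The line syntax shown so far (mnemonic, comma-separated operands, an optional `!condition` suffix and `;` comments) is simple enough to split with a few string operations. The Python sketch below illustrates that; it is not how n2as actually parses its input, `parse_line` is a hypothetical name, and labels, `.data` and string operands are left out on purpose.

```python
def parse_line(line):
    """Parse one assembly line into (mnemonic, operands, condition).

    Returns None for blank lines and comments. Labels, `.data` and string
    operands are deliberately not handled in this sketch.
    """
    code = line.split(";", 1)[0].strip()   # drop the trailing comment
    if not code:
        return None
    condition = "al"                       # default: execute always
    if "!" in code:
        code, condition = (part.strip() for part in code.split("!", 1))
    mnemonic, _, rest = code.partition(" ")
    operands = [tok.strip() for tok in rest.split(",") if tok.strip()]
    return mnemonic, operands, condition


print(parse_line("inc 0x03 !gt"))
print(parse_line("lri 0x00, 0xcafe ; load a constant"))
print(parse_line("; only a comment"))
```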
To store arbitrary data in memory, you can use the .data pseudo-instruction:
; Arbitrary data example
; String
.data "Hello!\x00"
; Integer
.data 0xbebe
The syntax for labels is a bit particular:
; Labels example
; Main function, entry point.
; Loads the address of `abc` in the register 0x00, and then halts the VM.
.main:
lri 0x00, @abc
hlt
; Arbitrary data.
.abc:
.data 0xabc01
The inp instruction behaves more or less like out, except that it's used for input. Handling input is a special case, where the source comes before the destination:
; `inp` instruction example
; Gets a character from the user (I/O port 0x01), and stores it into
; the register 0x04.
inp 0x01, 0x04
You can check the test folder for more examples.
The N2 Language (a.k.a. N2C) is a toy high-level language created for n2vm. It only supports a few basic operations; more complex ones can be implemented in assembly. It is based on a mix of C and assembly, following the "everything is a pointer" philosophy.
- Keywords: func, var, asm, call, return, cmp, goto.
- Data types: pointers.
- Comments: inline (//), multiline (/*...*/).
To declare variables (translated to labels), you can use the var keyword. Variables must be initialized in the declaration.
/* Variables example. */
var my_int = 0x00; // Integer.
var my_str = "This is a string!\x00"; // Bytes/string.
You can also reassign existing variables, as follows:
my_int = 0x40000; // Reassigns `my_int` to a 32-bit integer.
To define functions (translated to labels), you can use the func keyword. To return a specific value from a function, you can use the return keyword. There must be a main function in each program.
/* Functions example. */
func main
{
return 0x00; // Returns 0x00.
}
To call a function, you can use the call keyword.
/* Function call example. */
func main
{
call dummy; // Calls the `dummy` function.
return 0x00;
}
func dummy
{
return 0x01; // Returns 0x01.
}
Inline assembly is written with the asm keyword:
asm "lri 0x00, 0x01"; // Injects `lri 0x00, 0x01` in the code.
You can also directly manipulate registers ($r0...$r15).
$r1 = 0x05; // Same as `lri 0x01, 0x05`
You can directly execute the cmp and jmp instructions using the cmp and goto keywords:
cmp $r1 $r2;
goto some_label;
That is translated to:
; cmp $r1 $r2
cmp 0x01, 0x02
; goto some_label
lri 0x00, @some_label
jmp 0x00
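The three translations documented in this section (register assignment to lri, cmp to cmp, goto to lri plus jmp) can be mimicked with a few regular expressions. The Python sketch below does only that, as an illustration of the mapping; it is not n2cc's implementation, `lower` and `reg_hex` are hypothetical names, and immediates that do not fit in 16 bits are not handled.

```python
import re

REG = r"\$r(\d+)"


def reg_hex(number):
    """Format register number N as the two-digit hex used in the assembly."""
    n = int(number)
    if not 0 <= n <= 15:
        raise ValueError("registers range from $r0 to $r15")
    return f"0x{n:02x}"


def lower(statement):
    """Translate one N2 statement into a list of assembly lines."""
    stmt = statement.strip().rstrip(";").strip()
    m = re.fullmatch(rf"cmp\s+{REG}\s+{REG}", stmt)
    if m:
        return [f"cmp {reg_hex(m.group(1))}, {reg_hex(m.group(2))}"]
    m = re.fullmatch(r"goto\s+(\w+)", stmt)
    if m:
        # Register 0x00 is used as scratch, as in the translation shown above.
        return [f"lri 0x00, @{m.group(1)}", "jmp 0x00"]
    m = re.fullmatch(rf"{REG}\s*=\s*(0x[0-9a-fA-F]+)", stmt)
    if m:
        return [f"lri {reg_hex(m.group(1))}, {m.group(2)}"]
    raise ValueError(f"unsupported statement: {statement!r}")


for line in ("cmp $r1 $r2;", "goto some_label;", "$r1 = 0x05;"):
    print(lower(line))
```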
You can check the test folder for more examples.
Usage: n2vm [options] file
Options:
--help Display this help and exit.
-v Displays the information about this VM.
Usage: n2as [options] file
Options:
--help Display this help and exit.
--output=FILE Place the output into <FILE>.
-o FILE Same as --output.
-h Same as --help.
Usage: n2cc [options] file
Options:
--help Display this help and exit.
--output=FILE Place the output into <FILE>.
--no-preproc Do not preprocess.
-o FILE Same as --output.
-h Same as --help.
-n Same as --no-preproc.
-S Compile only; do not assemble.
There's a low-level API for embedding n2vm into other software, exposed via the libn2vm.so library.
Represents a VM and its current state.
typedef struct n2vm_t {
unsigned char* mem; /* Virtual memory */
unsigned int gpr[16]; /* Registers */
unsigned int* stack; /* Stack pointer */
unsigned int* sys_tab; /* Syscall table pointer */
unsigned int flags; /* Conditional flags */
int running; /* 0 = true, 1 = false */
int stc; /* Stack counter */
int mem_sz; /* Memory size */
int stk_sz; /* Stack size */
int sys_sz; /* Syscall table size */
op_t ios[16]; /* I/O functions */
int ioc; /* Count of assigned I/O ports */
} n2vm_t;
A function pointer to opcode implementations and I/O functions.
typedef int (*op_t)(n2vm_t* vm, unsigned char reg, unsigned short val);
Initializes all the VMs (assigns the opcodes to their implementations). Returns 0 on success.
Returns a pointer to a new VM. All the arguments are required.
- mem_min: Required memory size (in bytes).
- mem_max: Wanted memory size (in bytes).
- stack_max: Required stack size.
- sys_max: Required syscall table size.
Returns NULL and sets errno to ENOMEM if it fails to allocate at least sizeof(n2vm_t) + mem_min bytes of memory.
Tries to bind (assign) the handler function to an I/O port (dereferenced from index) of vm.
- vm: A pointer to the VM.
- handler: A pointer to a function matching (or compatible with) the op_t prototype.
- index: A pointer to the wanted I/O port number. Use -1 for vm->ioc++.
Returns the I/O port number assigned to handler on success. It may differ from *index if that port is already bound, so check the return value if your code depends on a specific port.
Returns -1 if vm, handler or index is NULL, or if there are no I/O ports available.
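The binding rules described above can be modelled in a few lines of Python. This is a behavioural sketch, not the library's code, and it makes several assumptions: the port is passed as a plain integer instead of a pointer, an occupied or automatic (-1) request falls back to the lowest free port, an out-of-range request fails, and `PortTable` is a hypothetical stand-in for the `ios` and `ioc` fields of n2vm_t.

```python
NUM_PORTS = 16  # size of the `ios` array in n2vm_t


class PortTable:
    def __init__(self):
        self.ios = [None] * NUM_PORTS
        self.ioc = 0  # count of assigned I/O ports

    def bind(self, handler, index):
        """Bind `handler` to port `index`, or to another free port if taken.

        Returns the port actually assigned, or -1 on failure.
        """
        if handler is None or index is None:
            return -1
        if index != -1 and not 0 <= index < NUM_PORTS:
            return -1  # assumed: out-of-range requests are rejected
        wanted = index
        if wanted == -1 or self.ios[wanted] is not None:
            # Assumed fallback: take the lowest free port.
            free = [p for p in range(NUM_PORTS) if self.ios[p] is None]
            if not free:
                return -1
            wanted = free[0]
        self.ios[wanted] = handler
        self.ioc += 1
        return wanted


def echo(vm, reg, val):
    """Handler with the same shape as `op_t`: returns 0 on success."""
    return 0


table = PortTable()
print(table.bind(echo, 1), table.bind(echo, 1), table.bind(echo, -1))
# prints "1 0 2": the second request finds port 1 taken and gets port 0
```

The second call is the case the text warns about: code that depends on a specific port should compare the returned number with the one it asked for.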
Executes the code in vm. Returns 0 on success, and -1 if there is an error in the code or the runtime, or if vm, vm->mem, vm->stack or vm->sys_tab is NULL.
Frees vm and its memory. Returns 0 on success, and -1 if vm or vm->mem is NULL.
A good example of how to use these functions and types can be found in the main.c file of the VM itself.
You can find syntax files for nano and Geany in the cfg directory.
- Allow n2cc to generate position-independent code, and enable it by default.
- Build a C-like preprocessor for n2cc.
- Check how endianness-independent all the programs (n2vm, n2as, n2cc) actually are.
All the code of this project is licensed under the GNU General Public License version 2.0 (GPL-2.0).
All the documentation of this project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.