diff --git a/threads-intro/README.md b/threads-intro/README.md
new file mode 100644
index 00000000..61511098
--- /dev/null
+++ b/threads-intro/README.md
@@ -0,0 +1,351 @@
+
+# Overview
+
+Welcome to this simulator! The idea is to gain familiarity with threads by
+seeing how they interleave; the simulator, `x86.py`, will help you in
+gaining this understanding.
+
+The simulator mimicks the execution of short assembly sequences by multiple
+threads. Note that the OS code that would run (for example, to perform a
+context switch) is *not* shown; thus, all you see is the interleaving of the
+user code.
+
+The assembly code that is run is based on x86, but somewhat simplified. 
+In this instruction set, there are four general-purpose registers 
+(%ax, %bx, %cx, %dx), a program counter (PC), and a small set of instructions
+which will be enough for our purposes.
+
+Here is an example code snippet that we will be able to run:
+
+```sh
+.main
+mov 2000, %ax   # get the value at the address
+add $1, %ax     # increment it
+mov %ax, 2000   # store it back
+halt
+```
+
+The code is easy to understand. The first instruction, an x86 "mov", simply
+loads a value from the address specified by 2000 into the register %ax.
+Addresses, in this subset of x86, can take some of the following forms:
+
+- `2000` : the number (2000) is the address
+- `(%cx)` : contents of register (in parentheses) forms the address
+- `1000(%dx)` : the number + contents of the register form the address
+- `10(%ax,%bx)` : the number + reg1 + reg2 forms the address
+
+To store a value, the same `mov` instruction is used, but this time with the
+arguments reversed, e.g.:
+
+```sh
+mov %ax, 2000
+```
+
+The `add` instruction, from the sequence above, should be clear: it adds an
+immediate value (specified by `$1`) to the register specified in the second
+argument (i.e., `%ax = %ax + 1`).
+
+Thus, we now can understand the code sequence above: it loads the value at
+address 2000, adds 1 to it, and then stores the value back into address 2000.
+
+The fake-ish `halt` instruction just stops running this thread.
+
+Let's run the simulator and see how this all works! Assume the above code
+sequence is in the file `simple-race.s`.
+
+```sh
+prompt> ./x86.py -p simple-race.s -t 1 
+
+       Thread 0
+1000 mov 2000, %ax
+1001 add $1, %ax
+1002 mov %ax, 2000
+1003 halt
+
+prompt> 
+```
+
+The arguments used here specify the program (`-p`), the number of
+threads (`-t 1`), and the interrupt interval, which is how often a
+scheduler will be woken and run to switch to a different task. Because
+there is only one thread in this example, this interval does not
+matter.
+
+The output is easy to read: the simulator prints the program counter (here
+shown from 1000 to 1003) and the instruction that gets executed. Note that we
+assume (unrealistically) that all instructions just take up a single byte in
+memory; in x86, instructions are variable-sized and would take up from one to
+a small number of bytes. 
+
+We can use more detailed tracing to get a better sense of how machine state
+changes during the execution:
+
+```sh
+prompt> ./x86.py -p simple-race.s -t 1 -M 2000 -R ax,bx
+
+ 2000      ax    bx          Thread 0
+    ?       ?     ?
+    ?       ?     ?   1000 mov 2000, %ax
+    ?       ?     ?   1001 add $1, %ax
+    ?       ?     ?   1002 mov %ax, 2000
+    ?       ?     ?   1003 halt
+
+Oops! Forgot the -c flag (which actually computes the answers for you).
+
+prompt> ./x86.py -p simple-race.s -t 1 -M 2000 -R ax,bx -c
+
+ 2000      ax    bx          Thread 0
+    0       0     0
+    0       0     0   1000 mov 2000, %ax
+    0       1     0   1001 add $1, %ax
+    1       1     0   1002 mov %ax, 2000
+    1       1     0   1003 halt
+```
+
+By using the `-M` flag, we can trace memory locations (a
+comma-separated list lets you trace more than one, e.g., 2000,3000);
+by using the `-R` flag we can track the values inside specific
+registers.
+
+The values on the left show the memory/register contents AFTER the instruction
+on the right has executed. For example, after the `add` instruction, you can
+see that %ax has been incremented to the value 1; after the second `mov`
+instruction (at PC=1002), you can see that the memory contents at 2000 are
+now also incremented.
+
+There are a few more instructions you'll need to know, so let's get to them
+now. Here is a code snippet of a loop:
+
+```sh
+.main
+.top
+sub  $1,%dx
+test $0,%dx     
+jgte .top         
+halt
+```
+
+A few things have been introduced here. First is the `test` instruction.
+This instruction takes two arguments and compares them; it then sets implicit
+"condition codes" (kind of like 1-bit registers) which subsequent instructions
+can act upon.
+
+In this case, the other new instruction is the `jump` instruction (in this
+case, `jgte` which stands for "jump if greater than or equal to"). This
+instruction jumps if the second value is greater than or equal to the first
+in the test.
+
+One last point: to really make this code work, `dx` must be initialized to 1 or
+greater. 
+
+Thus, we run the program like this:
+
+```sh
+prompt> ./x86.py -p loop.s -t 1 -a dx=3 -R dx -C -c
+
+   dx   >= >  <= <  != ==        Thread 0
+    3   0  0  0  0  0  0
+    2   0  0  0  0  0  0  1000 sub  $1,%dx
+    2   1  1  0  0  1  0  1001 test $0,%dx
+    2   1  1  0  0  1  0  1002 jgte .top
+    1   1  1  0  0  1  0  1000 sub  $1,%dx
+    1   1  1  0  0  1  0  1001 test $0,%dx
+    1   1  1  0  0  1  0  1002 jgte .top
+    0   1  1  0  0  1  0  1000 sub  $1,%dx
+    0   1  0  1  0  0  1  1001 test $0,%dx
+    0   1  0  1  0  0  1  1002 jgte .top
+   -1   1  0  1  0  0  1  1000 sub  $1,%dx
+   -1   0  0  1  1  1  0  1001 test $0,%dx
+   -1   0  0  1  1  1  0  1002 jgte .top
+   -1   0  0  1  1  1  0  1003 halt
+```
+
+The `-R dx` flag traces the value of %dx; the `-C` flag traces the values of
+the condition codes that get set by a test instruction. Finally, the `-a dx=3`
+flag sets the `%dx` register to the value 3 to start with. 
+
+As you can see from the trace, the `sub` instruction slowly lowers the value
+of %dx. The first few times `test` is called, only the ">=", ">", and "!="
+conditions get set. However, the last `test` in the trace finds %dx and 0 to
+be equal, and thus the subsequent jump does NOT take place, and the program
+finally halts.
+
+Now, finally, we get to a more interesting case, i.e., a race condition with
+multiple threads. Let's look at the code first:
+
+```sh
+.main
+.top
+# critical section
+mov 2000, %ax       # get the value at the address
+add $1, %ax         # increment it
+mov %ax, 2000       # store it back
+
+# see if we're still looping
+sub  $1, %bx
+test $0, %bx
+jgt .top
+
+halt
+```
+
+The code has a critical section which loads the value of a variable 
+(at address 2000), then adds 1 to the value, then stores it back. 
+
+The code after just decrements a loop counter (in %bx), tests if it
+is greater than or equal to zero, and if so, jumps back to the top
+to the critical section again.
+
+```sh
+prompt> ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -c
+
+ 2000      bx          Thread 0                Thread 1
+    0       1
+    0       1   1000 mov 2000, %ax
+    0       1   1001 add $1, %ax
+    1       1   1002 mov %ax, 2000
+    1       0   1003 sub  $1, %bx
+    1       0   1004 test $0, %bx
+    1       0   1005 jgt .top
+    1       0   1006 halt
+    1       1   ----- Halt;Switch -----  ----- Halt;Switch -----
+    1       1                            1000 mov 2000, %ax
+    1       1                            1001 add $1, %ax
+    2       1                            1002 mov %ax, 2000
+    2       0                            1003 sub  $1, %bx
+    2       0                            1004 test $0, %bx
+    2       0                            1005 jgt .top
+    2       0                            1006 halt
+```
+
+Here you can see each thread ran once, and each updated the shared
+variable at address 2000 once, thus resulting in a count of two there.
+
+The `Halt;Switch` line is inserted whenever a thread halts and another
+thread must be run.
+
+One last example: run the same thing above, but with a smaller interrupt
+frequency. Here is what that will look like:
+
+```sh
+prompt> ./x86.py -p looping-race-nolock.s -t 2 -a bx=1 -M 2000 -i 2
+
+ 2000          Thread 0                Thread 1
+    ?
+    ?   1000 mov 2000, %ax
+    ?   1001 add $1, %ax
+    ?   ------ Interrupt ------  ------ Interrupt ------
+    ?                            1000 mov 2000, %ax
+    ?                            1001 add $1, %ax
+    ?   ------ Interrupt ------  ------ Interrupt ------
+    ?   1002 mov %ax, 2000
+    ?   1003 sub  $1, %bx
+    ?   ------ Interrupt ------  ------ Interrupt ------
+    ?                            1002 mov %ax, 2000
+    ?                            1003 sub  $1, %bx
+    ?   ------ Interrupt ------  ------ Interrupt ------
+    ?   1004 test $0, %bx
+    ?   1005 jgt .top
+    ?   ------ Interrupt ------  ------ Interrupt ------
+    ?                            1004 test $0, %bx
+    ?                            1005 jgt .top
+    ?   ------ Interrupt ------  ------ Interrupt ------
+    ?   1006 halt
+    ?   ----- Halt;Switch -----  ----- Halt;Switch -----
+    ?                            1006 halt
+```
+
+As you can see, each thread is interrupt every 2 instructions, as we specify
+via the `-i 2` flag. What is the value of memory[2000] throughout this run?
+What should it have been?
+
+Now let's give a little more information on what can be simulated
+with this program. The full set of registers: %ax, %bx, %cx, %dx, and the PC. 
+In this version, there is no support for a "stack", nor are there call
+and return instructions.
+
+The full set of instructions simulated are:
+
+```sh
+mov immediate, register     # moves immediate value to register
+mov memory, register        # loads from memory into register
+mov register, register      # moves value from one register to other
+mov register, memory        # stores register contents in memory
+mov immediate, memory       # stores immediate value in memory
+
+add immediate, register     # register  = register  + immediate
+add register1, register2    # register2 = register2 + register1
+sub immediate, register     # register  = register  - immediate
+sub register1, register2    # register2 = register2 - register1
+
+test immediate, register    # compare immediate and register (set condition codes)
+test register, immediate    # same but register and immediate
+test register, register     # same but register and register
+
+jne                         # jump if test'd values are not equal
+je                          #                       ... equal
+jlt                         #     ... second is less than first
+jlte                        #               ... less than or equal
+jgt                         #            ... is greater than
+jgte                        #               ... greater than or equal
+
+xchg register, memory       # atomic exchange: 
+                            #   put value of register into memory
+                            #   return old contents of memory into reg
+                            # do both things atomically
+
+nop                         # no op
+```
+
+Notes: 
+- 'immediate' is something of the form $number
+- 'memory' is of the form 'number' or '(reg)' or 'number(reg)' or 'number(reg,reg)' (as described above)
+- 'register' is one of %ax, %bx, %cx, %dx
+
+Finally, here are the full set of options to the simulator are available with the `-h` flag: 
+
+```sh
+Usage: x86.py [options]
+
+Options:
+  -h, --help            show this help message and exit
+  -s SEED, --seed=SEED  the random seed
+  -t NUMTHREADS, --threads=NUMTHREADS
+                        number of threads
+  -p PROGFILE, --program=PROGFILE
+                        source program (in .s)
+  -i INTFREQ, --interrupt=INTFREQ
+                        interrupt frequency
+  -r, --randints        if interrupts are random
+  -a ARGV, --argv=ARGV  comma-separated per-thread args (e.g., ax=1,ax=2 sets
+                        thread 0 ax reg to 1 and thread 1 ax reg to 2);
+                        specify multiple regs per thread via colon-separated
+                        list (e.g., ax=1:bx=2,cx=3 sets thread 0 ax and bx and
+                        just cx for thread 1)
+  -L LOADADDR, --loadaddr=LOADADDR
+                        address where to load code
+  -m MEMSIZE, --memsize=MEMSIZE
+                        size of address space (KB)
+  -M MEMTRACE, --memtrace=MEMTRACE
+                        comma-separated list of addrs to trace (e.g.,
+                        20000,20001)
+  -R REGTRACE, --regtrace=REGTRACE
+                        comma-separated list of regs to trace (e.g.,
+                        ax,bx,cx,dx)
+  -C, --cctrace         should we trace condition codes
+  -S, --printstats      print some extra stats
+  -c, --compute         compute answers for me
+```
+
+Most are obvious. Usage of `-r` turns on a random interrupter (from 1
+to intfreq as specified by `-i`), which can make for more fun during
+homework problems.
+
+- `-L` specifies where in the address space to load the code.
+- `-m` specified the size of the address space (in KB).
+- `-S` prints some extra stats
+- `-c` is not really used (unlike most simulators in the book); use the tracing or condition codes.
+
+Now you have the basics in place; read the questions at the end of the chapter
+to study this race condition and related issues in more depth.
+
diff --git a/threads-intro/loop.s b/threads-intro/loop.s
new file mode 100644
index 00000000..6ff0dc80
--- /dev/null
+++ b/threads-intro/loop.s
@@ -0,0 +1,6 @@
+.main
+.top
+sub  $1,%dx
+test $0,%dx     
+jgte .top         
+halt
diff --git a/threads-intro/looping-race-nolock.s b/threads-intro/looping-race-nolock.s
new file mode 100644
index 00000000..396704f0
--- /dev/null
+++ b/threads-intro/looping-race-nolock.s
@@ -0,0 +1,15 @@
+# assumes %bx has loop count in it
+
+.main
+.top	
+# critical section
+mov 2000, %ax  # get 'value' at address 2000
+add $1, %ax    # increment it
+mov %ax, 2000  # store it back
+
+# see if we're still looping
+sub  $1, %bx
+test $0, %bx
+jgt .top	
+
+halt
diff --git a/threads-intro/simple-race.s b/threads-intro/simple-race.s
new file mode 100644
index 00000000..f3b329eb
--- /dev/null
+++ b/threads-intro/simple-race.s
@@ -0,0 +1,6 @@
+.main
+# this is a critical section
+mov 2000(%bx), %ax   # get the value at the address
+add $1, %ax          # increment it
+mov %ax, 2000(%bx)   # store it back
+halt
diff --git a/threads-intro/wait-for-me.s b/threads-intro/wait-for-me.s
new file mode 100644
index 00000000..993606df
--- /dev/null
+++ b/threads-intro/wait-for-me.s
@@ -0,0 +1,13 @@
+.main
+test $1, %ax     # ax should be 1 (signaller) or 0 (waiter)
+je .signaller
+
+.waiter	
+mov  2000, %cx
+test $1, %cx
+jne .waiter
+halt
+
+.signaller
+mov  $1, 2000
+halt
diff --git a/threads-intro/x86.py b/threads-intro/x86.py
new file mode 100755
index 00000000..c76db1bd
--- /dev/null
+++ b/threads-intro/x86.py
@@ -0,0 +1,999 @@
+#! /usr/bin/env python
+
+from __future__ import print_function
+import sys
+import time
+import random
+from optparse import OptionParser
+
+# to make Python2 and Python3 act the same -- how dumb
+def random_seed(seed):
+    try:
+        random.seed(seed, version=1)
+    except:
+        random.seed(seed)
+    return
+
+#
+# HELPER
+#
+def dospace(howmuch):
+    for i in range(howmuch):
+        print('%24s' % ' ', end=' ')
+
+# useful instead of assert
+def zassert(cond, str):
+    if cond == False:
+        print('ABORT::', str)
+        exit(1)
+    return
+
+class cpu:
+    #
+    # INIT: how much memory?
+    #
+    def __init__(self, memory, memtrace, regtrace, cctrace, compute, verbose):
+        #
+        # CONSTANTS
+        #
+        
+        # conditions
+        self.COND_GT        = 0
+        self.COND_GTE       = 1
+        self.COND_LT        = 2
+        self.COND_LTE       = 3
+        self.COND_EQ        = 4
+        self.COND_NEQ       = 5
+
+        # registers in system
+        self.REG_ZERO       = 0
+        self.REG_AX         = 1
+        self.REG_BX         = 2
+        self.REG_CX         = 3
+        self.REG_DX         = 4
+        self.REG_SP         = 5
+        self.REG_BP         = 6
+
+        # system memory: in KB
+        self.max_memory     = memory * 1024
+
+        # which memory addrs and registers to trace?
+        self.memtrace       = memtrace
+        self.regtrace       = regtrace
+        self.cctrace        = cctrace
+        self.compute        = compute
+        self.verbose        = verbose
+
+        self.PC             = 0
+        self.registers      = {}
+        self.conditions     = {}
+        self.labels         = {}
+        self.vars           = {}
+        self.memory         = {}
+        self.pmemory        = {}  # for printable version of what's in memory (instructions)
+
+        self.condlist       = [self.COND_GTE, self.COND_GT, self.COND_LTE, self.COND_LT, self.COND_NEQ, self.COND_EQ]
+        self.regnums        = [self.REG_ZERO, self.REG_AX,  self.REG_BX,   self.REG_CX,  self.REG_DX,   self.REG_SP,  self.REG_BP]
+
+        self.regnames         = {}
+        self.regnames['zero'] = self.REG_ZERO # hidden zero-valued register
+        self.regnames['ax']   = self.REG_AX
+        self.regnames['bx']   = self.REG_BX
+        self.regnames['cx']   = self.REG_CX
+        self.regnames['dx']   = self.REG_DX
+        self.regnames['sp']   = self.REG_SP
+        self.regnames['bp']   = self.REG_BP
+
+        tmplist = []
+        for r in self.regtrace:
+            zassert(r in self.regnames, 'Register %s cannot be traced because it does not exist' % r)
+            tmplist.append(self.regnames[r])
+        self.regtrace = tmplist
+
+        self.init_memory()
+        self.init_registers()
+        self.init_condition_codes()
+
+    #
+    # BEFORE MACHINE RUNS
+    #
+    def init_condition_codes(self):
+        for c in self.condlist:
+            self.conditions[c] = False
+
+    def init_memory(self):
+        for i in range(self.max_memory):
+            self.memory[i] = 0
+
+    def init_registers(self):
+        for i in self.regnums:
+            self.registers[i] = 0
+
+    def dump_memory(self):
+        print('MEMORY DUMP')
+        for i in range(self.max_memory):
+            if i not in self.pmemory and i in self.memory and self.memory[i] != 0:
+                print('  m[%d]' % i, self.memory[i])
+
+    #
+    # INFORMING ABOUT THE HARDWARE
+    #
+    def get_regnum(self, name):
+        assert(name in self.regnames)
+        return self.regnames[name]
+
+    def get_regname(self, num):
+        assert(num in self.regnums)
+        for rname in self.regnames:
+            if self.regnames[rname] == num:
+                return rname
+        return ''
+    
+    def get_regnums(self):
+        return self.regnums
+
+    def get_condlist(self):
+        return self.condlist
+
+    def get_reg(self, reg):
+        assert(reg in self.regnums)
+        return self.registers[reg]
+
+    def get_cond(self, cond):
+        assert(cond in self.condlist)
+        return self.conditions[cond]
+
+    def get_pc(self):
+        return self.PC
+        
+    def set_reg(self, reg, value):
+        assert(reg in self.regnums)
+        self.registers[reg] = value
+
+    def set_cond(self, cond, value):
+        assert(cond in self.condlist)
+        self.conditions[cond] = value
+
+    def set_pc(self, pc):
+        self.PC = pc
+        
+    #
+    # INSTRUCTIONS
+    #
+    def halt(self):
+        return -1
+
+    def iyield(self):
+        return -2
+
+    def nop(self):
+        return 0
+
+    def rdump(self):
+        print('REGISTERS::', end=' ')
+        print('ax:', self.registers[self.REG_AX], end=' ')
+        print('bx:', self.registers[self.REG_BX], end=' ')
+        print('cx:', self.registers[self.REG_CX], end=' ')
+        print('dx:', self.registers[self.REG_DX], end=' ')
+
+    def mdump(self, index):
+        print('m[%d] ' % index, self.memory[index])
+
+    def move_i_to_r(self, src, dst):
+        self.registers[dst] = src
+        return 0
+
+    # memory: value, register, register
+    def move_i_to_m(self, src, value, reg1, reg2):
+        tmp = value + self.registers[reg1] + self.registers[reg2]
+        self.memory[tmp] = src
+        return 0
+
+    def move_m_to_r(self, value, reg1, reg2, dst):
+        tmp = value + self.registers[reg1] + self.registers[reg2]
+        # print 'doing mov', 'val:', value, 'r1:', self.get_regname(reg1), self.registers[reg1], 'r2:', self.get_regname(reg2), self.registers[reg2], 'dst', self.get_regname(dst), 'tmp', tmp, 'reg[dst]', self.registers[dst], 'mem', self.memory[tmp]
+        self.registers[dst] = self.memory[tmp] 
+
+    def move_r_to_m(self, src, value, reg1, reg2):
+        tmp = value + self.registers[reg1] + self.registers[reg2]
+        self.memory[tmp] = self.registers[src]
+        return 0
+
+    def move_r_to_r(self, src, dst):
+        self.registers[dst] = self.registers[src]
+        return 0
+
+    def add_i_to_r(self, src, dst):
+        self.registers[dst] += src
+        return 0
+
+    def add_r_to_r(self, src, dst):
+        self.registers[dst] += self.registers[src]
+        return 0
+
+    def sub_i_to_r(self, src, dst):
+        self.registers[dst] -= src
+        return 0
+
+    def sub_r_to_r(self, src, dst):
+        self.registers[dst] -= self.registers[src]
+        return 0
+
+
+    #
+    # SUPPORT FOR LOCKS
+    #
+    def atomic_exchange(self, src, value, reg1, reg2):
+        tmp                 = value + self.registers[reg1] + self.registers[reg2]
+        old                 = self.memory[tmp]
+        self.memory[tmp]    = self.registers[src]
+        self.registers[src] = old
+        return 0
+
+    def fetchadd(self, src, value, reg1, reg2):
+        tmp                 = value + self.registers[reg1] + self.registers[reg2]
+        old                 = self.memory[tmp]
+        self.memory[tmp]    = self.memory[tmp] + self.registers[src] 
+        self.registers[src] = old
+
+    #
+    # TEST for conditions
+    #
+    def test_all(self, src, dst):
+        self.init_condition_codes()
+        if dst > src:
+            self.conditions[self.COND_GT]  = True
+        if dst >= src:
+            self.conditions[self.COND_GTE] = True
+        if dst < src:
+            self.conditions[self.COND_LT]  = True
+        if dst <= src:
+            self.conditions[self.COND_LTE] = True
+        if dst == src:
+            self.conditions[self.COND_EQ]  = True
+        if dst != src:
+            self.conditions[self.COND_NEQ] = True
+        return 0
+
+    def test_i_r(self, src, dst):
+        self.init_condition_codes()
+        return self.test_all(src, self.registers[dst])
+
+    def test_r_i(self, src, dst):
+        self.init_condition_codes()
+        return self.test_all(self.registers[src], dst)
+
+    def test_r_r(self, src, dst):
+        self.init_condition_codes()
+        return self.test_all(self.registers[src], self.registers[dst])
+
+    #
+    # JUMPS
+    #
+    def jump(self, targ):
+        self.PC = targ  
+        return 0
+    
+    def jump_notequal(self, targ):
+        if self.conditions[self.COND_NEQ] == True:
+            self.PC = targ
+        return 0
+
+    def jump_equal(self, targ):
+        if self.conditions[self.COND_EQ] == True:
+            self.PC = targ
+        return 0
+
+    def jump_lessthan(self, targ):
+        if self.conditions[self.COND_LT] == True:
+            self.PC = targ
+        return 0
+
+    def jump_lessthanorequal(self, targ):
+        if self.conditions[self.COND_LTE] == True:
+            self.PC = targ
+        return 0
+
+    def jump_greaterthan(self, targ):
+        if self.conditions[self.COND_GT] == True:
+            self.PC = targ
+        return 0
+
+    def jump_greaterthanorequal(self, targ):
+        if self.conditions[self.COND_GTE] == True:
+            self.PC = targ
+        return 0
+
+    #
+    # CALL and RETURN
+    #
+    def call(self, targ):
+        self.registers[self.REG_SP] -= 4
+        self.memory[self.registers[self.REG_SP]] = self.PC 
+        self.PC = targ
+
+    def ret(self):
+        self.PC = self.memory[self.registers[self.REG_SP]]
+        self.registers[self.REG_SP] += 4
+
+    #
+    # STACK and related
+    #
+    def push_r(self, reg):
+        self.registers[self.REG_SP] -= 4
+        self.memory[self.registers[self.REG_SP]] = self.registers[reg]
+        return 0
+
+    def push_m(self, value, reg1, reg2):
+        # print 'push_m', value, reg1, reg2
+        self.registers[self.REG_SP] -= 4
+        tmp = value + self.registers[reg1] + self.registers[reg2]
+        # push address onto stack, not memory value itself
+        self.memory[self.registers[self.REG_SP]] = tmp
+        return 0
+
+    def pop(self):
+        self.registers[self.REG_SP] += 4
+
+    def pop_r(self, dst):
+        self.registers[dst] = self.registers[self.REG_SP]
+        self.registers[self.REG_SP] += 4
+
+    #
+    # HELPER func for getarg
+    #
+    def register_translate(self, r):
+        if r in self.regnames:
+            return self.regnames[r]
+        zassert(False, 'Register %s is not a valid register' % r)
+        return
+
+    #
+    # HELPER in parsing mov (quite primitive) and other ops
+    # returns: (value, type)
+    # where type is (TYPE_REGISTER, TYPE_IMMEDIATE, TYPE_MEMORY)
+    # 
+    # FORMATS
+    #    %ax           - register
+    #    $10           - immediate
+    #    10            - direct memory
+    #    10(%ax)       - memory + reg indirect
+    #    10(%ax,%bx)   - memory + 2 reg indirect
+    #    10(%ax,%bx,4) - XXX (not handled)
+    #
+    def getarg(self, arg):
+        tmp1 = arg.replace(',', '')
+        tmp  = tmp1.replace(' \t', '')
+
+        if tmp[0] == '$':
+            zassert(len(tmp) == 2, 'correct form is $number (not %s)' % tmp)
+            value = tmp.split('$')[1]
+            zassert(value.isdigit(), 'value [%s] must be a digit' % value)
+            return int(value), 'TYPE_IMMEDIATE'
+        elif tmp[0] == '%':
+            register = tmp.split('%')[1]
+            return self.register_translate(register), 'TYPE_REGISTER'
+        elif tmp[0] == '(':
+            register = tmp.split('(')[1].split(')')[0].split('%')[1]
+            return '%d,%d,%d' % (0, self.register_translate(register), self.register_translate('zero')), 'TYPE_MEMORY'
+        elif tmp[0] == '.':
+            targ = tmp
+            return targ, 'TYPE_LABEL'
+        elif tmp[0].isalpha() and not tmp[0].isdigit():
+            zassert(tmp in self.vars, 'Variable %s is not declared' % tmp)
+            # print '%d,%d,%d' % (self.vars[tmp], self.register_translate('zero'), self.register_translate('zero')), 'TYPE_MEMORY'
+            return '%d,%d,%d' % (self.vars[tmp], self.register_translate('zero'), self.register_translate('zero')), 'TYPE_MEMORY'
+        elif tmp[0].isdigit() or tmp[0] == '-':
+            # MOST GENERAL CASE: number(reg,reg) or number(reg)
+            # we ignore the common x86 number(reg,reg,constant) for now
+            neg = 1
+            if tmp[0] == '-':
+                tmp = tmp[1:]
+                neg = -1
+            s = tmp.split('(')
+            if len(s) == 1:
+                value = neg * int(tmp)
+                # print '%d,%d,%d' % (int(value), self.register_translate('zero'), self.register_translate('zero')), 'TYPE_MEMORY'
+                return '%d,%d,%d' % (int(value), self.register_translate('zero'), self.register_translate('zero')), 'TYPE_MEMORY'
+            elif len(s) == 2:
+                value = neg * int(s[0])
+                t = s[1].split(')')[0].split(',')
+                if len(t) == 1:
+                    register = t[0].split('%')[1]
+                    # print '%d,%d,%d' % (int(value), self.register_translate(register), self.register_translate('zero')), 'TYPE_MEMORY'
+                    return '%d,%d,%d' % (int(value), self.register_translate(register), self.register_translate('zero')), 'TYPE_MEMORY'
+                elif len(t) == 2:
+                    register1 = t[0].split('%')[1]
+                    register2 = t[1].split('%')[1]
+                    # print '%d,%d,%d' % (int(value), self.register_translate(register1), self.register_translate(register2)), 'TYPE_MEMORY'
+                    return '%d,%d,%d' % (int(value), self.register_translate(register1), self.register_translate(register2)), 'TYPE_MEMORY'
+            else:
+                print('mov: bad argument [%s]' % tmp)
+                exit(1)
+                return
+        zassert(True, 'mov: bad argument [%s]' % arg)
+        return
+
+    #
+    # LOAD a program into memory
+    # make it ready to execute
+    #
+    def load(self, infile, loadaddr):
+        pc   = int(loadaddr)
+        fd   = open(infile)
+
+        bpc  = loadaddr
+        data = 100
+
+        for line in fd:
+            cline = line.rstrip()
+            # print 'PASS 1', cline
+
+            # remove everything after the comment marker
+            ctmp = cline.split('#')
+            assert(len(ctmp) == 1 or len(ctmp) == 2)
+            if len(ctmp) == 2:
+                cline = ctmp[0]
+
+            # remove empty lines, and split line by spaces
+            tmp = cline.split()
+            if len(tmp) == 0:
+                continue
+
+            # only pay attention to labels and variables
+            if tmp[0] == '.var':
+                assert(len(tmp) == 2)
+                assert(tmp[0] not in self.vars)
+                self.vars[tmp[1]] = data
+                data += 4
+                zassert(data < bpc, 'Load address overrun by static data')
+                if self.verbose: print('ASSIGN VAR', tmp[0], "-->", tmp[1], self.vars[tmp[1]])
+            elif tmp[0][0] == '.':
+                assert(len(tmp) == 1)
+                self.labels[tmp[0]] = int(pc)
+                if self.verbose: print('ASSIGN LABEL', tmp[0], "-->", pc)
+            else:
+                pc += 1
+        fd.close()
+
+        if self.verbose: print('')
+
+        # second pass: do everything else
+        pc = int(loadaddr)
+        fd = open(infile)
+        for line in fd:
+            cline = line.rstrip()
+            # print 'PASS 2', cline
+
+            # remove everything after the comment marker
+            ctmp = cline.split('#')
+            assert(len(ctmp) == 1 or len(ctmp) == 2)
+            if len(ctmp) == 2:
+                cline = ctmp[0]
+
+            # remove empty lines, and split line by spaces
+            tmp = cline.split()
+            if len(tmp) == 0:
+                continue
+
+            # skip labels: all else must be instructions
+            if cline[0] != '.':
+                tmp              = cline.split(None, 1)
+                opcode           = tmp[0]
+                self.pmemory[pc] = cline.strip()
+
+                # MAIN OPCODE LOOP
+                if opcode == 'mov':
+                    rtmp = tmp[1].split(',', 1)
+                    zassert(len(tmp) == 2 and len(rtmp) == 2, 'mov: needs two args, separated by commas [%s]' % cline)
+                    arg1 = rtmp[0].strip()
+                    arg2 = rtmp[1].strip()
+                    (src, stype) = self.getarg(arg1)
+                    (dst, dtype) = self.getarg(arg2)
+                    # print 'MOV', src, stype, dst, dtype
+                    if stype == 'TYPE_MEMORY' and dtype == 'TYPE_MEMORY':
+                        print('bad mov: two memory arguments')
+                        exit(1)
+                    elif stype == 'TYPE_IMMEDIATE' and dtype == 'TYPE_IMMEDIATE':
+                        print('bad mov: two immediate arguments')
+                        exit(1)
+                    elif stype == 'TYPE_IMMEDIATE' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc]  = 'self.move_i_to_r(%d, %d)' % (int(src), dst)
+                    elif stype == 'TYPE_IMMEDIATE' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc]  = 'self.move_i_to_r(%d, %d)' % (int(src), dst)
+                    elif stype == 'TYPE_MEMORY'    and dtype == 'TYPE_REGISTER':
+                        tmp = src.split(',')
+                        assert(len(tmp) == 3)
+                        self.memory[pc] = 'self.move_m_to_r(%d, %d, %d, %d)' % (int(tmp[0]), int(tmp[1]), int(tmp[2]), dst)
+                    elif stype == 'TYPE_REGISTER'  and dtype == 'TYPE_MEMORY':
+                        tmp = dst.split(',')
+                        assert(len(tmp) == 3)
+                        self.memory[pc] = 'self.move_r_to_m(%d, %d, %d, %d)' % (src, int(tmp[0]), int(tmp[1]), int(tmp[2]))
+                    elif stype == 'TYPE_REGISTER'  and dtype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.move_r_to_r(%d, %d)' % (src, dst)
+                    elif stype == 'TYPE_IMMEDIATE'  and dtype == 'TYPE_MEMORY':
+                        tmp = dst.split(',')
+                        assert(len(tmp) == 3)
+                        self.memory[pc] = 'self.move_i_to_m(%d, %d, %d, %d)' % (src, int(tmp[0]), int(tmp[1]), int(tmp[2]))
+                    else:
+                        zassert(False, 'malformed mov instruction')
+                elif opcode == 'pop':
+                    if len(tmp) == 1:
+                        self.memory[pc] = 'self.pop()'
+                    elif len(tmp) == 2:
+                        arg = tmp[1].strip()
+                        (dst, dtype) = self.getarg(arg)
+                        zassert(dtype == 'TYPE_REGISTER', 'Can only pop into a register')
+                        self.memory[pc] = 'self.pop_r(%d)' % dst
+                    else:
+                        zassert(False, 'pop instruction must take zero/one args')
+                elif opcode == 'push':
+                    (src, stype) = self.getarg(tmp[1].strip())
+                    if stype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.push_r(%d)' % (int(src))
+                    elif stype == 'TYPE_MEMORY':
+                        tmp = src.split(',')
+                        assert(len(tmp) == 3)
+                        self.memory[pc] = 'self.push_m(%d,%d,%d)' % (int(tmp[0]), int(tmp[1]), int(tmp[2]))
+                    else:
+                        zassert(False, 'Cannot push anything but registers')
+                elif opcode == 'call':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    if ttype == 'TYPE_LABEL':
+                        self.memory[pc] = 'self.call(%d)' % (int(self.labels[targ]))
+                    else:
+                        zassert(False, 'Cannot call anything but a label')
+                elif opcode == 'ret':
+                    assert(len(tmp) == 1)
+                    self.memory[pc] = 'self.ret()'
+                elif opcode == 'add':
+                    rtmp = tmp[1].split(',', 1)
+                    zassert(len(tmp) == 2 and len(rtmp) == 2, 'add: needs two args, separated by commas [%s]' % cline)
+                    arg1 = rtmp[0].strip()
+                    arg2 = rtmp[1].strip()
+                    (src, stype) = self.getarg(arg1)
+                    (dst, dtype) = self.getarg(arg2)
+                    if stype == 'TYPE_IMMEDIATE' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.add_i_to_r(%d, %d)' % (int(src), dst)
+                    elif stype == 'TYPE_REGISTER' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.add_r_to_r(%d, %d)' % (int(src), dst)
+                    else:
+                        zassert(False, 'malformed usage of add instruction')
+                elif opcode == 'sub':
+                    rtmp = tmp[1].split(',', 1)
+                    zassert(len(tmp) == 2 and len(rtmp) == 2, 'sub: needs two args, separated by commas [%s]' % cline)
+                    arg1 = rtmp[0].strip()
+                    arg2 = rtmp[1].strip()
+                    (src, stype) = self.getarg(arg1)
+                    (dst, dtype) = self.getarg(arg2)
+                    if stype == 'TYPE_IMMEDIATE' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.sub_i_to_r(%d, %d)' % (int(src), dst)
+                    elif stype == 'TYPE_REGISTER' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.sub_r_to_r(%d, %d)' % (int(src), dst)
+                    else:
+                        zassert(False, 'malformed usage of sub instruction')
+                elif opcode == 'fetchadd':
+                    rtmp = tmp[1].split(',', 1)
+                    zassert(len(tmp) == 2 and len(rtmp) == 2, 'fetchadd: needs two args, separated by commas [%s]' % cline)
+                    arg1 = rtmp[0].strip()
+                    arg2 = rtmp[1].strip()
+                    (src, stype) = self.getarg(arg1)
+                    (dst, dtype) = self.getarg(arg2)
+                    tmp = dst.split(',')
+                    assert(len(tmp) == 3)
+                    if stype == 'TYPE_REGISTER' and dtype == 'TYPE_MEMORY':
+                        self.memory[pc] = 'self.fetchadd(%d, %d, %d, %d)' % (src, int(tmp[0]), int(tmp[1]), int(tmp[2]))
+                    else:
+                        zassert(False, 'poorly specified fetch and add')
+                elif opcode == 'xchg':
+                    rtmp = tmp[1].split(',', 1)
+                    zassert(len(tmp) == 2 and len(rtmp) == 2, 'xchg: needs two args, separated by commas [%s]' % cline)
+                    arg1 = rtmp[0].strip()
+                    arg2 = rtmp[1].strip()
+                    (src, stype) = self.getarg(arg1)
+                    (dst, dtype) = self.getarg(arg2)
+                    tmp = dst.split(',')
+                    assert(len(tmp) == 3)
+                    if stype == 'TYPE_REGISTER' and dtype == 'TYPE_MEMORY':
+                        self.memory[pc] = 'self.atomic_exchange(%d, %d, %d, %d)' % (src, int(tmp[0]), int(tmp[1]), int(tmp[2]))
+                    else:
+                        zassert(False, 'poorly specified atomic exchange')
+                elif opcode == 'test':
+                    rtmp = tmp[1].split(',', 1)
+                    zassert(len(tmp) == 2 and len(rtmp) == 2, 'test: needs two args, separated by commas [%s]' % cline)
+                    arg1 = rtmp[0].strip()
+                    arg2 = rtmp[1].strip()
+                    (src, stype) = self.getarg(arg1)
+                    (dst, dtype) = self.getarg(arg2)
+                    if stype == 'TYPE_IMMEDIATE' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.test_i_r(%d, %d)' % (int(src), dst)
+                    elif stype == 'TYPE_REGISTER' and dtype == 'TYPE_REGISTER':
+                        self.memory[pc] = 'self.test_r_r(%d, %d)' % (int(src), dst)
+                    elif stype == 'TYPE_REGISTER' and dtype == 'TYPE_IMMEDIATE':
+                        self.memory[pc] = 'self.test_r_i(%d, %d)' % (int(src), dst)
+                    else:
+                        zassert(False, 'malformed usage of test instruction')
+                elif opcode == 'j':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    zassert(ttype == 'TYPE_LABEL', 'bad jump target [%s]' % tmp[1].strip())
+                    self.memory[pc] = 'self.jump(%d)' % int(self.labels[targ])
+                elif opcode == 'jne':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    zassert(ttype == 'TYPE_LABEL', 'bad jump target [%s]' % tmp[1].strip())
+                    self.memory[pc] = 'self.jump_notequal(%d)' % int(self.labels[targ])
+                elif opcode == 'je':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    zassert(ttype == 'TYPE_LABEL', 'bad jump target [%s]' % tmp[1].strip())
+                    self.memory[pc] = 'self.jump_equal(%d)' % self.labels[targ]
+                elif opcode == 'jlt':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    zassert(ttype == 'TYPE_LABEL', 'bad jump target [%s]' % tmp[1].strip())
+                    self.memory[pc] = 'self.jump_lessthan(%d)' % int(self.labels[targ])
+                elif opcode == 'jlte':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    zassert(ttype == 'TYPE_LABEL', 'bad jump target [%s]' % tmp[1].strip())
+                    self.memory[pc] = 'self.jump_lessthanorequal(%s)' % self.labels[targ]
+                elif opcode == 'jgt':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    zassert(ttype == 'TYPE_LABEL', 'bad jump target [%s]' % tmp[1].strip())
+                    self.memory[pc] = 'self.jump_greaterthan(%d)' % int(self.labels[targ])
+                elif opcode == 'jgte':
+                    (targ, ttype) = self.getarg(tmp[1].strip())
+                    zassert(ttype == 'TYPE_LABEL', 'bad jump target [%s]' % tmp[1].strip())
+                    self.memory[pc] = 'self.jump_greaterthanorequal(%s)' % self.labels[targ]
+                elif opcode == 'nop':
+                    self.memory[pc] = 'self.nop()'
+                elif opcode == 'halt':
+                    self.memory[pc] = 'self.halt()'
+                elif opcode == 'yield':
+                    self.memory[pc] = 'self.iyield()'
+                elif opcode == 'rdump':
+                    self.memory[pc] = 'self.rdump()'
+                elif opcode == 'mdump':
+                    self.memory[pc] = 'self.mdump(%s)' % tmp[1]
+                else:
+                    print('illegal opcode: ', opcode)
+                    exit(1)
+
+                if self.verbose: print('pc:%d LOADING %20s --> %s' % (pc, self.pmemory[pc], self.memory[pc]))
+                
+                # INCREMENT PC for loader
+                pc += 1
+        # END: loop over file
+        fd.close()
+        if self.verbose: print('')
+        return
+    # END: load
+
+    def print_headers(self, procs):
+        # print some headers
+        if len(self.memtrace) > 0:
+            for m in self.memtrace:
+                if m[0].isdigit():
+                    print('%5d' % int(m), end=' ')
+                else:
+                    zassert(m in self.vars, 'Traced variable %s not declared' % m)
+                    print('%5s' % m, end=' ')
+            print(' ', end=' ')
+        if len(self.regtrace) > 0:
+            for r in self.regtrace:
+                print('%5s' % self.get_regname(r), end=' ')
+            print(' ', end=' ')
+        if cctrace == True:
+            print('>= >  <= <  != ==', end=' ')
+
+        # and per thread
+        for i in range(procs.getnum()):
+            print('       Thread %d        ' % i, end=' ')
+        print('')
+        return
+
+    def print_trace(self, newline):
+        if len(self.memtrace) > 0:
+            for m in self.memtrace:
+                if self.compute:
+                    if m[0].isdigit():
+                        print('%5d' % self.memory[int(m)], end=' ')
+                    else:
+                        zassert(m in self.vars, 'Traced variable %s not declared' % m)
+                        print('%5d' % self.memory[self.vars[m]], end=' ')
+                else:
+                    print('%5s' % '?', end=' ')
+            print(' ', end=' ')
+        if len(self.regtrace) > 0:
+            for r in self.regtrace:
+                if self.compute:
+                    print('%5d' % self.registers[r], end=' ')
+                else:
+                    print('%5s' % '?', end=' ')
+            print(' ', end=' ')
+        if cctrace == True:
+            for c in self.condlist:
+                if self.compute:
+                    if self.conditions[c]:
+                        print('1 ', end=' ')
+                    else:
+                        print('0 ', end=' ')
+                else:
+                    print('? ', end=' ')
+        if (len(self.memtrace) > 0 or len(self.regtrace) > 0 or cctrace == True) and newline == True:
+            print('')
+        return
+
+    def setint(self, intfreq, intrand):
+        if intrand == False:
+            return intfreq
+        return int(random.random() * intfreq) + 1
+
+    def run(self, procs, intfreq, intrand):
+        # hw init: cc's, interrupt frequency, etc.
+        interrupt = self.setint(intfreq, intrand)
+        icount    = 0
+
+        self.print_headers(procs)
+        self.print_trace(True)
+        
+        while True:
+            # need thread ID of current process
+            tid = procs.getcurr().gettid()
+
+            # FETCH
+            prevPC       = self.PC
+            instruction  = self.memory[self.PC]
+            self.PC     += 1
+
+            # DECODE and EXECUTE
+            # key: self.PC may be changed during eval; thus MUST be incremented BEFORE eval
+            rc = eval(instruction)
+
+            # tracing details: ALWAYS AFTER EXECUTION OF INSTRUCTION
+            self.print_trace(False)
+
+            # output: thread-proportional spacing followed by PC and instruction
+            dospace(tid)
+            print(prevPC, self.pmemory[prevPC])
+            icount += 1
+
+            # halt instruction issued
+            if rc == -1:
+                procs.done()
+                if procs.numdone() == procs.getnum():
+                    return icount
+                procs.next()
+                procs.restore()
+
+                self.print_trace(False)
+                for i in range(procs.getnum()):
+                    print('----- Halt;Switch ----- ', end=' ')
+                print('')
+
+            # do interrupt processing
+            interrupt -= 1
+            if interrupt == 0 or rc == -2:
+                interrupt = self.setint(intfreq, intrand)
+                procs.save()
+                procs.next()
+                procs.restore()
+
+                self.print_trace(False)
+                for i in range(procs.getnum()):
+                    print('------ Interrupt ------ ', end=' ')
+                print('')
+        # END: while
+        return
+
+# 
+# END: class cpu
+# 
+
+
+#
+# PROCESS LIST class
+#
+class proclist:
+    def __init__(self):
+        self.plist  = []
+        self.curr   = 0
+        self.active = 0
+
+    def done(self):
+        self.plist[self.curr].setdone()
+        self.active -= 1
+
+    def numdone(self):
+        return len(self.plist) - self.active
+
+    def getnum(self):
+        return len(self.plist)
+
+    def add(self, p):
+        self.active += 1
+        self.plist.append(p)
+
+    def getcurr(self):
+        return self.plist[self.curr]
+
+    def save(self):
+        self.plist[self.curr].save()
+
+    def restore(self):
+        self.plist[self.curr].restore()
+
+    def next(self):
+        for i in range(self.curr+1, len(self.plist)):
+            if self.plist[i].isdone() == False:
+                self.curr = i
+                return
+        for i in range(0, self.curr+1):
+            if self.plist[i].isdone() == False:
+                self.curr = i
+                return
+            
+#
+# PROCESS class
+#
+class process:
+    def __init__(self, cpu, tid, pc, stackbottom, reginit):
+        self.cpu   = cpu  # object reference
+        self.tid   = tid
+        self.pc    = pc
+        self.regs  = {}
+        self.cc    = {}
+        self.done  = False
+        self.stack = stackbottom
+
+        # init regs: all 0 or specially set to something
+        for r in self.cpu.get_regnums():
+            self.regs[r] = 0
+        if reginit != '':
+            # form: ax=1,bx=2 (for some subset of registers)
+            for r in reginit.split(':'):
+                tmp = r.split('=')
+                assert(len(tmp) == 2)
+                self.regs[self.cpu.get_regnum(tmp[0])] = int(tmp[1])
+
+        # init CCs
+        for c in self.cpu.get_condlist():
+            self.cc[c] = False
+
+        # stack
+        self.regs[self.cpu.get_regnum('sp')] = stackbottom
+        # print 'REG', self.cpu.get_regnum('sp'), self.regs[self.cpu.get_regnum('sp')]
+
+        return
+
+    def gettid(self):
+        return self.tid
+
+    def save(self):
+        self.pc = self.cpu.get_pc()
+        for c in self.cpu.get_condlist():
+            self.cc[c] = self.cpu.get_cond(c)
+        for r in self.cpu.get_regnums():
+            self.regs[r] = self.cpu.get_reg(r)
+
+    def restore(self):
+        self.cpu.set_pc(self.pc)
+        for c in self.cpu.get_condlist():
+            self.cpu.set_cond(c, self.cc[c])
+        for r in self.cpu.get_regnums():
+            self.cpu.set_reg(r, self.regs[r])
+
+    def setdone(self):
+        self.done = True
+
+    def isdone(self):
+        return self.done == True
+
+#
+# main program
+#
+parser = OptionParser()
+parser.add_option('-s', '--seed',      default=0,          help='the random seed',                  action='store',      type='int',    dest='seed')
+parser.add_option('-t', '--threads',   default=2,          help='number of threads',                action='store',      type='int',    dest='numthreads')
+parser.add_option('-p', '--program',   default='',         help='source program (in .s)',           action='store',      type='string', dest='progfile')
+parser.add_option('-i', '--interrupt', default=50,         help='interrupt frequency',              action='store',      type='int',    dest='intfreq')
+parser.add_option('-r', '--randints',  default=False,      help='if interrupts are random',         action='store_true',                dest='intrand')
+parser.add_option('-a', '--argv',      default='',
+                  help='comma-separated per-thread args (e.g., ax=1,ax=2 sets thread 0 ax reg to 1 and thread 1 ax reg to 2); specify multiple regs per thread via colon-separated list (e.g., ax=1:bx=2,cx=3 sets thread 0 ax and bx and just cx for thread 1)',
+                  action='store',      type='string', dest='argv')
+parser.add_option('-L', '--loadaddr',  default=1000,       help='address where to load code',       action='store',      type='int',    dest='loadaddr')
+parser.add_option('-m', '--memsize',   default=128,        help='size of address space (KB)',       action='store',      type='int',    dest='memsize')
+parser.add_option('-M', '--memtrace',  default='',         help='comma-separated list of addrs to trace (e.g., 20000,20001)', action='store',
+                  type='string', dest='memtrace')
+parser.add_option('-R', '--regtrace',  default='',         help='comma-separated list of regs to trace (e.g., ax,bx,cx,dx)',  action='store',
+                  type='string', dest='regtrace')
+parser.add_option('-C', '--cctrace',   default=False,      help='should we trace condition codes',  action='store_true', dest='cctrace')
+parser.add_option('-S', '--printstats',default=False,      help='print some extra stats',           action='store_true', dest='printstats')
+parser.add_option('-v', '--verbose',   default=False,      help='print some extra info',            action='store_true', dest='verbose')
+parser.add_option('-c', '--compute',   default=False,      help='compute answers for me',           action='store_true', dest='solve')
+(options, args) = parser.parse_args()
+
+print('ARG seed',                options.seed)
+print('ARG numthreads',          options.numthreads)
+print('ARG program',             options.progfile)
+print('ARG interrupt frequency', options.intfreq)
+print('ARG interrupt randomness',options.intrand)
+print('ARG argv',                options.argv)
+print('ARG load address',        options.loadaddr)
+print('ARG memsize',             options.memsize)
+print('ARG memtrace',            options.memtrace)
+print('ARG regtrace',            options.regtrace)
+print('ARG cctrace',             options.cctrace)
+print('ARG printstats',          options.printstats)
+print('ARG verbose',             options.verbose)
+print('')
+
+seed       = int(options.seed)
+numthreads = int(options.numthreads)
+intfreq    = int(options.intfreq)
+zassert(intfreq > 0, 'Interrupt frequency must be greater than 0')
+intrand    = int(options.intrand)
+progfile   = options.progfile
+zassert(progfile != '', 'Program file must be specified')
+argv       = options.argv.split(',')
+zassert(len(argv) == numthreads or len(argv) == 1, 'argv: must be one per-thread or just one set of values for all threads') 
+
+loadaddr   = options.loadaddr
+memsize    = options.memsize
+random_seed(seed)
+
+memtrace   = []
+if options.memtrace != '':
+    for m in options.memtrace.split(','):
+        memtrace.append(m)
+
+regtrace   = []
+if options.regtrace != '':
+    for r in options.regtrace.split(','):
+        regtrace.append(r)
+
+cctrace    = options.cctrace
+
+printstats = options.printstats
+verbose    = options.verbose
+        
+#
+# MAIN program
+#
+debug = False
+debug = False
+
+cpu = cpu(memsize, memtrace, regtrace, cctrace, options.solve, verbose)
+
+# load a program
+cpu.load(progfile, loadaddr)
+
+# process list
+procs = proclist()
+pid   = 0
+stack = memsize * 1000
+for t in range(numthreads):
+    if len(argv) > 1:
+        arg = argv[pid]
+    else:
+        arg = argv[0]
+    procs.add(process(cpu, pid, loadaddr, stack, arg))
+    stack -= 1000
+    pid += 1
+
+# get first one ready!
+procs.restore()
+
+# run it
+t1 = time.clock()
+ic = cpu.run(procs, intfreq, intrand)
+t2 = time.clock()
+
+if printstats:
+    print('')
+    print('STATS:: Instructions    %d' % ic)
+    print('STATS:: Emulation Rate  %.2f kinst/sec' % (float(ic) / float(t2 - t1) / 1000.0))
+
+# use this for profiling
+# import cProfile
+# cProfile.run('run()')
+
+
+
+