Skip to content

Commit

Permalink
Use an abstract Word type (#27)
Browse files Browse the repository at this point in the history
* Rename XLEN() to XLEN

* Use an abstract Word sort instead of Int

* Add documentation for the semantics

* Correct LUI to be XLEN invariant

* Refactor halting mechanism to avoid contextual functions

* Update to actions/checkout@v4

* Set Version: 0.1.18
---------

Co-authored-by: devops <[email protected]>
  • Loading branch information
Scott-Guest and devops authored Jul 26, 2024
1 parent f6c0156 commit f1719ee
Show file tree
Hide file tree
Showing 14 changed files with 337 additions and 131 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ jobs:
runs-on: [self-hosted, linux, flyweight]
steps:
- name: 'Check out code'
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: 'Run code quality checks'
run: make check
- name: 'Run pyupgrade'
Expand All @@ -52,7 +52,7 @@ jobs:
CONTAINER: riscv-integration-${{ github.sha }}
steps:
- name: 'Check out code'
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 0
submodules: recursive
Expand Down
32 changes: 25 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,41 @@
# kriscv

# KRISC-V: Semantics of RISC-V in K
This repository presents an executable formal semantics of the [RISC-V ISA](https://riscv.org/) using the [K framework](https://kframework.org/). It is currently under construction.

Presently, we support the [unprivileged RV32E base ISA](https://github.com/riscv/riscv-isa-manual/releases/tag/20240411) under the following assumptions:
- Memory is little-endian.
- There is a single hart with access to all of physical memory.
- All memory is readable, writeable, and executable.
- Instruction fetch is the only implicit memory access.
- Instruction memory is always coherent with main memory.

## Repository Structure
The following files constitute the KRISC-V semantics:
- [word.md](src/kriscv/kdist/riscv-semantics/word.md) provides a `Word` datatype representing `XLEN`-bit values, along with associated numeric operations.
- [riscv-instructions.md](src/kriscv/kdist/riscv-semantics/riscv-instructions.md) defines the syntax of disassembled instructions.
- [riscv-disassemble.md](src/kriscv/kdist/riscv-semantics/riscv-disassemble.md) implements the disassembler.
- [riscv.md](src/kriscv/kdist/riscv-semantics/riscv.md) is the main KRISC-V semantics, defining the configuration and transition rules to fetch and execute instructions.

## Installation

Prerequsites: `python >= 3.10`, `pip >= 20.0.2`, `poetry >= 1.3.2`.

```bash
make build
pip install dist/*.whl
poetry install
make kdist-build
```

## Usage
Execute a compiled RISC-V ELF file with the following command:
```bash
poetry -C kriscv run test.elf
```
The output shows the final K configuration, including the state of memory, all registers, and any encountered errors. Execution can also be halted at a particular global symbol by providing the `--end-symbol` flag.

## For Developers

Use `make` to run common tasks (see the [Makefile](Makefile) for a complete list of available targets).

* `make build`: Build wheel
* `make check`: Check code style
* `make format`: Format code
* `make test-unit`: Run unit tests
* `make test-integration`: Run integration tests

For interactive use, spawn a shell with `poetry shell` (after `poetry install`), then run an interpreter.
2 changes: 1 addition & 1 deletion package/version
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.1.17
0.1.18
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "kriscv"
version = "0.1.17"
version = "0.1.18"
description = "K tooling for the RISC-V architecture"
authors = [
"Runtime Verification, Inc. <[email protected]>",
Expand Down
6 changes: 4 additions & 2 deletions src/kriscv/elf_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
from pyk.prelude.collections import map_of
from pyk.prelude.kint import intToken

from kriscv.term_builder import word

if TYPE_CHECKING:
from elftools.elf.elffile import ELFFile # type: ignore
from pyk.kast.inner import KInner
Expand All @@ -26,12 +28,12 @@ def memory_map(elf: ELFFile) -> KInner:
for addr_range, data in _memory_segments(elf).items():
start, end = addr_range
for addr in range(start, end):
mem_map[intToken(addr)] = intToken(data[addr - start])
mem_map[word(addr)] = intToken(data[addr - start])
return map_of(mem_map)


def entry_point(elf: ELFFile) -> KInner:
return intToken(elf.header['e_entry'])
return word(elf.header['e_entry'])


def read_symbol(elf: ELFFile, symbol: str) -> list[int]:
Expand Down
106 changes: 62 additions & 44 deletions src/kriscv/kdist/riscv-semantics/riscv-disassemble.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,37 @@
# Disassembler
In this file, we implement the instruction disassembler, converting raw instruction bits to the syntax defined in [riscv-intructions.md](./riscv-instructions.md).
```k
requires "riscv-instructions.md"
requires "word.md"
module RISCV-DISASSEMBLE
imports RISCV-INSTRUCTIONS
imports INT
imports STRING
imports WORD
```
The input is given as an `Int` with the instruction stored in the 32 least-significant bits.
```k
syntax Instruction ::= disassemble(Int) [symbol(disassemble), function, total, memo]
rule disassemble(I:Int) => disassemble(decode(I))
```
Disassembly is then done in two phases:
- Separate out the component fields of the instruction based on its format (R, I, S, B, U, or J), returning an `InstructionFormat` value.
- Inspect the fields of the resulting `InstructionFormat` to produce the disassembled instruction.

The various `InstructionFormat`s are defined below.
```k
syntax InstructionFormat ::=
RType(opcode: RTypeOpCode, funct3: Int, funct7: Int, rd: Register, rs1: Register, rs2: Register)
| IType(opcode: ITypeOpCode, funct3: Int, rd: Register, rs1: Register, imm: Int)
| SType(opcode: STypeOpCode, funct3: Int, rs1: Register, rs2: Register, imm: Int)
| BType(opcode: BTypeOpCode, funct3: Int, rs1: Register, rs2: Register, imm: Int)
| UType(opcode: UTypeOpCode, rd: Register, imm: Int)
| JType(opcode: JTypeOpCode, rd: Register, imm: Int)
| UnrecognizedInstructionFormat(Int)
```
We determine the correct format by decoding the op code from the 7 least-signficant bits,
```k
syntax OpCode ::=
RTypeOpCode
| ITypeOpCode
Expand Down Expand Up @@ -48,21 +74,15 @@ module RISCV-DISASSEMBLE
rule decodeOpCode(15 ) => MISC-MEM
rule decodeOpCode(115) => SYSTEM
rule decodeOpCode(_ ) => UNRECOGNIZED [owise]
syntax EncodingType ::=
RType(opcode: RTypeOpCode, funct3: Int, funct7: Int, rd: Register, rs1: Register, rs2: Register)
| IType(opcode: ITypeOpCode, funct3: Int, rd: Register, rs1: Register, imm: Int)
| SType(opcode: STypeOpCode, funct3: Int, rs1: Register, rs2: Register, imm: Int)
| BType(opcode: BTypeOpCode, funct3: Int, rs1: Register, rs2: Register, imm: Int)
| UType(opcode: UTypeOpCode, rd: Register, imm: Int)
| JType(opcode: JTypeOpCode, rd: Register, imm: Int)
| UnrecognizedEncodingType(Int)
syntax EncodingType ::=
decode(Int) [function, total]
| decodeWithOp(OpCode, Int) [function]
```
matching on the type of the resulting opcode,
```k
syntax InstructionFormat ::= decode(Int) [function, total]
rule decode(I) => decodeWithOp(decodeOpCode(I &Int 127), I >>Int 7)
```
then finally bit-fiddling to mask out the appropriate bits for each field.
```k
syntax InstructionFormat ::= decodeWithOp(OpCode, Int) [function]
rule decodeWithOp(OPCODE:RTypeOpCode, I) =>
RType(OPCODE, (I >>Int 5) &Int 7, (I >>Int 18) &Int 127, I &Int 31, (I >>Int 8) &Int 31, (I >>Int 13) &Int 31)
Expand All @@ -77,12 +97,11 @@ module RISCV-DISASSEMBLE
rule decodeWithOp(OPCODE:JTypeOpCode, I) =>
JType(OPCODE, I &Int 31, (((I >>Int 24) &Int 1) <<Int 19) |Int (((I >>Int 5) &Int 255) <<Int 11) |Int (((I >>Int 13) &Int 1) <<Int 10) |Int ((I >>Int 14) &Int 1023))
rule decodeWithOp(_:UnrecognizedOpCode, I) =>
UnrecognizedEncodingType(I)
syntax Instruction ::= disassemble(Int) [symbol(disassemble), function, total, memo]
| disassemble(EncodingType) [function, total]
rule disassemble(I:Int) => disassemble(decode(I))
UnrecognizedInstructionFormat(I)
```
Finally, we can disassemble the instruction by inspecting the fields for each format. Note that, where appropriate, we infinitely sign extend immediates to represent them as K `Int`s.
```k
syntax Instruction ::= disassemble(InstructionFormat) [function, total]
rule disassemble(RType(OP, 0, 0, RD, RS1, RS2)) => ADD RD , RS1 , RS2
rule disassemble(RType(OP, 0, 32, RD, RS1, RS2)) => SUB RD , RS1 , RS2
Expand All @@ -95,47 +114,46 @@ module RISCV-DISASSEMBLE
rule disassemble(RType(OP, 6, 0, RD, RS1, RS2)) => OR RD , RS1 , RS2
rule disassemble(RType(OP, 7, 0, RD, RS1, RS2)) => AND RD , RS1 , RS2
rule disassemble(IType(OP-IMM, 0, RD, RS1, IMM)) => ADDI RD , RS1 , chopAndExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 0, RD, RS1, IMM)) => ADDI RD , RS1 , infSignExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 1, RD, RS1, IMM)) => SLLI RD , RS1 , IMM &Int 31 requires (IMM >>Int 5) &Int 127 ==Int 0
rule disassemble(IType(OP-IMM, 2, RD, RS1, IMM)) => SLTI RD , RS1 , chopAndExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 3, RD, RS1, IMM)) => SLTIU RD , RS1 , chopAndExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 4, RD, RS1, IMM)) => XORI RD , RS1 , chopAndExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 2, RD, RS1, IMM)) => SLTI RD , RS1 , infSignExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 3, RD, RS1, IMM)) => SLTIU RD , RS1 , infSignExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 4, RD, RS1, IMM)) => XORI RD , RS1 , infSignExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 5, RD, RS1, IMM)) => SRLI RD , RS1 , IMM &Int 31 requires (IMM >>Int 5) &Int 127 ==Int 0
rule disassemble(IType(OP-IMM, 5, RD, RS1, IMM)) => SRAI RD , RS1 , IMM &Int 31 requires (IMM >>Int 5) &Int 127 ==Int 32
rule disassemble(IType(OP-IMM, 6, RD, RS1, IMM)) => ORI RD , RS1 , chopAndExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 7, RD, RS1, IMM)) => ANDI RD , RS1 , chopAndExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 6, RD, RS1, IMM)) => ORI RD , RS1 , infSignExtend(IMM, 12)
rule disassemble(IType(OP-IMM, 7, RD, RS1, IMM)) => ANDI RD , RS1 , infSignExtend(IMM, 12)
rule disassemble(IType(JALR, 0, RD, RS1, IMM)) => JALR RD , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(IType(JALR, 0, RD, RS1, IMM)) => JALR RD , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 0, RD, RS1, IMM)) => LB RD , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 1, RD, RS1, IMM)) => LH RD , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 2, RD, RS1, IMM)) => LW RD , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 4, RD, RS1, IMM)) => LBU RD , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 5, RD, RS1, IMM)) => LHU RD , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 0, RD, RS1, IMM)) => LB RD , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 1, RD, RS1, IMM)) => LH RD , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 2, RD, RS1, IMM)) => LW RD , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 4, RD, RS1, IMM)) => LBU RD , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(IType(LOAD, 5, RD, RS1, IMM)) => LHU RD , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(IType(MISC-MEM, 0, 0, 0, 2099)) => FENCE.TSO
rule disassemble(IType(MISC-MEM, 0, 0, 0, IMM)) => FENCE (IMM >>Int 4) &Int 15 , IMM &Int 15 requires (IMM >>Int 8) &Int 15 ==Int 0
rule disassemble(IType(SYSTEM, 0, 0, 0, 0)) => ECALL
rule disassemble(IType(SYSTEM, 0, 0, 0, 1)) => EBREAK
rule disassemble(SType(STORE, 0, RS1, RS2, IMM)) => SB RS2 , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(SType(STORE, 1, RS1, RS2, IMM)) => SH RS2 , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(SType(STORE, 2, RS1, RS2, IMM)) => SW RS2 , chopAndExtend(IMM, 12) ( RS1 )
rule disassemble(SType(STORE, 0, RS1, RS2, IMM)) => SB RS2 , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(SType(STORE, 1, RS1, RS2, IMM)) => SH RS2 , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(SType(STORE, 2, RS1, RS2, IMM)) => SW RS2 , infSignExtend(IMM, 12) ( RS1 )
rule disassemble(BType(BRANCH, 0, RS1, RS2, IMM)) => BEQ RS1 , RS2 , chopAndExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 1, RS1, RS2, IMM)) => BNE RS1 , RS2 , chopAndExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 4, RS1, RS2, IMM)) => BLT RS1 , RS2 , chopAndExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 5, RS1, RS2, IMM)) => BGE RS1 , RS2 , chopAndExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 6, RS1, RS2, IMM)) => BLTU RS1 , RS2 , chopAndExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 7, RS1, RS2, IMM)) => BGEU RS1 , RS2 , chopAndExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 0, RS1, RS2, IMM)) => BEQ RS1 , RS2 , infSignExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 1, RS1, RS2, IMM)) => BNE RS1 , RS2 , infSignExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 4, RS1, RS2, IMM)) => BLT RS1 , RS2 , infSignExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 5, RS1, RS2, IMM)) => BGE RS1 , RS2 , infSignExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 6, RS1, RS2, IMM)) => BLTU RS1 , RS2 , infSignExtend(IMM, 12) *Int 2
rule disassemble(BType(BRANCH, 7, RS1, RS2, IMM)) => BGEU RS1 , RS2 , infSignExtend(IMM, 12) *Int 2
rule disassemble(UType(LUI, RD, IMM)) => LUI RD , IMM
rule disassemble(UType(AUIPC, RD, IMM)) => AUIPC RD , IMM
rule disassemble(JType(JAL, RD, IMM)) => JAL RD , chopAndExtend(IMM, 20) *Int 2
rule disassemble(JType(JAL, RD, IMM)) => JAL RD , infSignExtend(IMM, 20) *Int 2
rule disassemble(_:EncodingType) => INVALID_INSTR [owise]
rule disassemble(_:InstructionFormat) => INVALID_INSTR [owise]
endmodule
```
16 changes: 11 additions & 5 deletions src/kriscv/kdist/riscv-semantics/riscv-instructions.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,14 @@
# Instruction Syntax
In this file, we define the basic syntax for disassembled instructions.

We closely mirror the ASM syntax as output by the RISC-V GNU Toolchain's `objdump`. In particular,
- I-Type and S-Type instructions with 12-bit signed immediates (e.g., `addi` but not `slli`) take an immediate argument in the range `[-2048, 2047]`.
- Shift instructions take a shift amount argument in the range `[0, XLEN - 1]`.
- U-Type instruction take an immediate argument in the range `[0x0, 0xfffff]`, i.e., not representing the zeroed-out 12 least-significant bits.
- B-Type instructions take an immediate argument as an even integer in the range `[-4096, 4094]`, i.e., explicitly representing the zeroed-out least-significant bit.
- J-Type instructions take an immediate argument as an even integer in the range `[-1048576, 1048574]`, i.e., explicitly representing the zeroed-out least-significant bit.

A register `xi` is simply represented by the `Int` value `i`.
```k
module RISCV-INSTRUCTIONS
imports INT
Expand Down Expand Up @@ -71,10 +82,5 @@ module RISCV-INSTRUCTIONS
| "EBREAK" [symbol(EBREAK)]
syntax InvalidInstr ::= "INVALID_INSTR" [symbol(INVALID_INSTR)]
// Truncate to length bits, then sign extend
syntax Int ::= chopAndExtend(bits: Int, length: Int) [function, total]
rule chopAndExtend(B, L) => (-1 <<Int L) |Int B requires (B >>Int (L -Int 1)) &Int 1 ==Int 1
rule chopAndExtend(B, L) => B &Int (2 ^Int L -Int 1) [owise]
endmodule
```
Loading

0 comments on commit f1719ee

Please sign in to comment.