We will build: a minimal, baremetal program that can print "Hello world" via Rpi3's UART.
Students will experience:
-
The C project structure
-
The use of cross-compilation toolchain
-
arm64 assembly (lightly)
-
Basic knowledge on Rpi3 and its UART hardware
Source code location: p1-kernel/src/lesson01
Create a Makefile project. Add minimum code to boot the platform. Initialize the UART hardware. Send characters to the UART registers.
-
Strictly speaking, this baremetal program is not a "kernel". We nevertheless call it so for ease of explanation.
-
"Raspberry Pi" means the actual Rpi3 hardware. "QEMU" means the Rpi3 platform as emulated by QEMU. We will explain details where the real hardware behaves differently from QEMU.
Makefile
: We will use the GNU Makefile to build the kernel.src
: This folder contains all of the source code.include
: All of the header files are placed here.
Note: Of all the subsequent experiments in p1, the source code has the same structure.
If you are not familiar with Makefiles, read this article.
The complete Makefile:
ARMGNU ?= aarch64-linux-gnu
COPS = -Wall -nostdlib -nostartfiles -ffreestanding -Iinclude -mgeneral-regs-only -O0 -g
ASMOPS = -Iinclude -g
BUILD_DIR = build
SRC_DIR = src
all : kernel8.img
clean :
rm -rf $(BUILD_DIR) *.img
$(BUILD_DIR)/%_c.o: $(SRC_DIR)/%.c
mkdir -p $(@D)
$(ARMGNU)-gcc $(COPS) -MMD -c $< -o $@
$(BUILD_DIR)/%_s.o: $(SRC_DIR)/%.S
$(ARMGNU)-gcc $(ASMOPS) -MMD -c $< -o $@
C_FILES = $(wildcard $(SRC_DIR)/*.c)
ASM_FILES = $(wildcard $(SRC_DIR)/*.S)
OBJ_FILES = $(C_FILES:$(SRC_DIR)/%.c=$(BUILD_DIR)/%_c.o)
OBJ_FILES += $(ASM_FILES:$(SRC_DIR)/%.S=$(BUILD_DIR)/%_s.o)
DEP_FILES = $(OBJ_FILES:%.o=%.d)
-include $(DEP_FILES)
kernel8.img: $(SRC_DIR)/linker.ld $(OBJ_FILES)
$(ARMGNU)-ld -T $(SRC_DIR)/linker.ld -o $(BUILD_DIR)/kernel8.elf $(OBJ_FILES)
$(ARMGNU)-objcopy $(BUILD_DIR)/kernel8.elf -O binary kernel8.img
Let's inspect this file in detail:
ARMGNU ?= aarch64-linux-gnu
The Makefile starts with a variable definition. ARMGNU
is a cross-compiler prefix. We need to use a cross-compiler because we are compiling the source code for the arm64
architecture on an x86
machine. So instead of gcc
, we will use aarch64-linux-gnu-gcc
.
COPS = -Wall -nostdlib -nostartfiles -ffreestanding -Iinclude -mgeneral-regs-only
ASMOPS = -Iinclude
COPS
and ASMOPS
are options that we pass to the compiler when compiling C and assembler code, respectively. These options require a short explanation:
- -Wall Show all warnings. A good practice.
- -nostdlib Don't use the C standard library. Most of the calls in the C standard library eventually interact with the operating system. We are writing a bare-metal program, and we don't have any underlying operating system, so the C standard library is not going to work for us anyway.
- -nostartfiles Don't use standard startup files. Startup files are responsible for setting an initial stack pointer, initializing static data, and jumping to the main entry point. We are going to do all of this by ourselves.
- -ffreestanding A freestanding environment is an environment in which the standard library may not exist, and program startup may not necessarily be at main. The option
-ffreestanding
directs the compiler to not assume that standard functions have their usual definition. - -Iinclude Search for header files in the
include
folder. - -mgeneral-regs-only. Use only general-purpose registers. ARM processors also have NEON registers. We don't want the compiler to use them because they add additional complexity (since, for example, we will need to store the registers during a context switch).
- -g Include debugging info in the resultant ELF binary.
- -O0 Turn off any compiler optimization. For ease of debugging.
BUILD_DIR = build
SRC_DIR = src
SRC_DIR
and BUILD_DIR
are directories that contain source code and compiled object files, respectively.
all : kernel8.img
clean :
rm -rf $(BUILD_DIR) *.img
The first two targets are pretty simple: the all
target is the default one, and it is executed whenever you type make
without any arguments (make
always uses the first target as the default). This target just redirects all work to a different target, kernel8.img
.
The name "kernel8.img" is mandated by the Rpi3 firmware. The trailing
8
denotes ARMv8 which is a 64-bit architecture. This filename tells the firmware to boot the processor into 64-bit mode.
The clean
target is responsible for deleting all compilation artifacts and the compiled kernel image.
$(BUILD_DIR)/%_c.o: $(SRC_DIR)/%.c
mkdir -p $(@D)
$(ARMGNU)-gcc $(COPS) -MMD -c $< -o $@
$(BUILD_DIR)/%_s.o: $(SRC_DIR)/%.S
$(ARMGNU)-gcc $(ASMOPS) -MMD -c $< -o $@
The next two targets are responsible for compiling C and assembler files. If, for example, in the src
directory we have foo.c
and foo.S
files, they will be compiled into build/foo_c.o
and build/foo_s.o
, respectively. $<
and $@
are substituted at runtime with the input and output filenames (foo.c
and foo_c.o
). Before compiling C files, we also create a build
directory in case it doesn't exist yet.
C_FILES = $(wildcard $(SRC_DIR)/*.c)
ASM_FILES = $(wildcard $(SRC_DIR)/*.S)
OBJ_FILES = $(C_FILES:$(SRC_DIR)/%.c=$(BUILD_DIR)/%_c.o)
OBJ_FILES += $(ASM_FILES:$(SRC_DIR)/%.S=$(BUILD_DIR)/%_s.o)
Here we are building an array of all object files (OBJ_FILES
) created from the concatenation of both C and assembler source files.
DEP_FILES = $(OBJ_FILES:%.o=%.d)
-include $(DEP_FILES)
The next two lines are a little bit tricky. If you take a look at how we defined our compilation targets for both C and assembler source files, you will notice that we used the -MMD
parameter. This parameter instructs the gcc
compiler to create a dependency file for each generated object file. A dependency file defines all of the dependencies for a particular source file. These dependencies usually contain a list of all included headers. We need to include all of the generated dependency files so that make knows what exactly to recompile in case a header changes.
$(ARMGNU)-ld -T $(SRC_DIR)/linker.ld -o kernel8.elf $(OBJ_FILES)
We use the OBJ_FILES
array to build the kernel8.elf
file. We use the linker script src/linker.ld
to define the basic layout of the resulting executable image (we will discuss the linker script in the next section).
$(ARMGNU)-objcopy kernel8.elf -O binary kernel8.img
kernel8.elf & kernel8.img
- build/kernel8.elf ("kernel binary"): Our build outcome as an ELF file. It contains all code, data, and debugging info. Often, to execute an ELF program in user space, there should be a loader to parse ELF, load code & data to designated memory locations, etc. For our kernel experiment, we do NOT have such a loader for the kernel itself.
- kernel8.img ("kernel image"): The raw instructions & data as extracted from kernel8.elf. The raw image is to be loaded to memory. Since it's a memory dump (see below), the load is as simple as byte-by-byte copy.
The kernel image is produced by
objcopy
. Its manual says:"
objcopy
can be used to generate a raw binary file by using an output target of ‘binary’ (e.g., use -O binary). Whenobjcopy
generates a raw binary file, it will essentially produce a memory dump of the contents of the input object file. All symbols and relocation information will be discarded. The memory dump will start at the load address of the lowest section copied into the output file."Q: can you use
readelf
to examine kernel8.elf, and explain your observation?
A linker script describes how the sections in the input object files (_c.o
and _s.o
) should be mapped into the output file (.elf
); it also controls the addresses of all program symbols (e.g. functions and variables). More information can be found here. Now let's take a look at the linker script:
SECTIONS
{
.text.boot : { *(.text.boot) }
.text : { *(.text) }
.rodata : { *(.rodata) }
.data : { *(.data) }
. = ALIGN(0x8);
bss_begin = .;
.bss : { *(.bss*) }
bss_end = .;
}
After startup, the Rpi3 GPU loads kernel8.img
into memory 0x0 and starts execution from the beginning of the file. That's why the .text.boot
section must come first; we are going to put the kernel startup code inside this section. QEMU behaves differently: it loads the kernel image at 0x80000.
Q: How to tweak the linker script to update the start address?
The .text
, .rodata
, and .data
sections contain kernel instructions, read-only data, and global data with init values. The .bss
section contains data that should be initialized to 0. By putting such data in a separate section, the compiler can save some space in the ELF binary––only the section size is stored in the ELF header, but the section content is omitted.
After booting up, our kernel initializes the .bss
section to 0; that's why we need to record the start and end of the section (hence the bss_begin
and bss_end
symbols) and align the section so that it starts at an address that is a multiple of 8. This eases kernel programming because the str
instruction can be used only with 8-byte-aligned addresses.
boot.S contains the kernel startup code (VSCode: Ctrl-p then type boot.S):
#include "mm.h"
.section ".text.boot"
.globl _start
_start:
mrs x0, mpidr_el1
and x0, x0,#0xFF // Check processor id
cbz x0, master // Hang for all non-primary CPU
b proc_hang
proc_hang:
b proc_hang
master:
adr x0, bss_begin
adr x1, bss_end
sub x1, x1, x0
bl memzero
mov sp, #LOW_MEMORY
bl kernel_main
Let's review this file in detail:
.section ".text.boot"
First, we specify that everything defined in boot.S
should go in the .text.boot
section. Previously, we saw that this section is placed at the beginning of the kernel image by the linker script. So when the kernel is started, execution begins at the start
function:
.globl _start
_start:
mrs x0, mpidr_el1
and x0, x0,#0xFF // Check processor id
cbz x0, master // Hang for all non-primary CPU
b proc_hang
Rpi3 has 4 cores, and after the device is powered on, each core begins to execute the same code. Our kernel only works with the first one and put all of the other cores in an endless loop. This is exactly what the _start
function is responsible for. It gets the processor ID from the mpidr_el1 system register.
Q: It may make more sense to put core 1-3 in deep sleep using
wfi
. How?
If the current processor ID is 0, then execution branches to the master
function:
master:
adr x0, bss_begin
adr x1, bss_end
sub x1, x1, x0
bl memzero
Here, we clean the .bss
section by calling memzero
. We will define this function later. In ARMv8 architecture, by convention, the first seven arguments are passed to the called function via registers x0–x6 (cf: our cheat sheet). The memzero
function accepts only two arguments: the start address (bss_begin
) and the size of the section needed to be cleaned (bss_end - bss_begin
).
mov sp, #LOW_MEMORY
bl kernel_main
After cleaning the .bss
section, the kernel initializes the stack pointer and passes execution to the kernel_main
function. The Rpi3 loads the kernel at address 0 (QEMU loads at 0x80000); that's why the initial stack pointer can be set to any location high enough so that stack will not override the kernel image when it grows sufficiently large. LOW_MEMORY
is defined in mm.h and is equal to 4MB. As our kernel's stack won't grow very large and the image itself is tiny, 4MB is more than enough for us.
Aside: Some ARM64 instructions used
For those of you who are not familiar with ARM assembler syntax, let me quickly summarize the instructions that we have used:
- mrs Load value from a system register to one of the general purpose registers (x0–x30)
- and Perform the logical AND operation. We use this command to strip the last byte from the value we obtain from the
mpidr_el1
register. - cbz Compare the result of the previously executed operation to 0 and jump (or
branch
in ARM terminology) to the provided label if the comparison yields true. - b Perform an unconditional branch to some label.
- adr Load a label's relative address into the target register. In this case, we want pointers to the start and end of the
.bss
region. - sub Subtract values from two registers.
- bl "Branch with a link": perform an unconditional branch and store the return address in x30 (the link register). When the subroutine is finished, use the
ret
instruction to jump back to the return address. - mov Move a value between registers or from a constant to a register.
Our cheat sheet summarizes common ARM64 instructions.
For official documentation, here is the ARMv8-A developer's guide. It's a good resource if the ARM ISA is unfamiliar to you. This page specifically outlines the register usage convention in the ABI.
We have seen that the boot code eventually passes control to the kernel_main
function. (VSCode: Ctrl-t then type "kernel_main")
#include "mini_uart.h"
void kernel_main(void)
{
uart_init();
uart_send_string("Hello, world!\r\n");
while (1) {
uart_send(uart_recv());
}
}
This function is one of the simplest in the kernel. It works with the Mini UART
device to print to screen and read user input. The kernel just prints Hello, world!
and then enters an infinite loop that reads characters from the user and sends them back to the screen.
The Rpi3 board is based on the BCM2837 SoC by Broadcom. The SoC manual is here. The SoC is not friendly for OS hackers: Broadcom poorly documents it and the hardware has many quirks.
Despite so, the community figured out most of the SoC details over years because Rpi3's popularity. It's not our goal to dive in the SoC. Rather, our philosophy is to deal BCM2837-specific details as few as possible -- just enough to get our kernel working. We will spend more efforts on explaining generic hardware such as ARM64 cores, generic timers, irq controllers, etc.
Rpi4 seems more friendly to kernel hackers.
On ARM-based SoCs, access to all devices is performed via memory-mapped registers. The Rpi3 SoC reserves physical memory address 0x3F000000
for IO devices. To configure a particular device, software reads/writes device registers. A device register is just a 32-bit region of memory. The meaning of each bit in each IO register is described in the SoC manual.
The term "device" is heavily overloaded in many tech docs. Sometimes it means a board, e.g. "an Rpi3 device"; sometimes it means an IO peripheral, e.g. "UART device". We will be explicit.
UART is a simple character device allowing software to send out text characters to a different machine. If you do not care about performance, UART requires very minimum software code. Therefore, it is often the first few IO devices to bring up when we build system software for a new machine. Only with UART meaning debugging is possible. (JTAG is another option which however requires more complex setup).
In the simplest form, software writes ascii values to UART registers. The UART device converts written values to a sequence of high and low voltages on wire. This sequence is transmitted to your via the TTL-to-serial cable
and is interpreted by your terminal emulator (e.g. PuTTY on Windows).
Rpi3 has the two UART devices. Oddly enough, they are different.
Name | Type | Comments |
---|---|---|
UART0 | PL011 | Secondary, intended as Bluetooth connector |
UART1 | mini UART | Primary, intended as debug console |
UART1/Mini UART: easier to program; limited performance/functionalities. That's fine for our goal. For specification of the Mini UART registers: see page 8 of the SoC manual.
UART0/PL011: richer functions; higher speed. Yet one needs to configure the board clock by talking to the GPU firmware. We won't do that. see Example code if you are interested.
Both UARTs can be mapped to the same physical pins by setting the GPFSEL1 register. See the GPIO function diagram below.
That's enough to know about Rpi UARTs. More: official web page.
Another IO device is GPIO General-purpose input/output. GPIO provides a bunch of registers. Each bit in such a register corresponds to a pin on the Rpi3 board. By writing 1 or 0 to register bits, software can control the output voltage on the pins, e.g. for turning on/off LEDs connected to such pins. Reading is done in a similar fashion. The picture below shows GPIO pin headers populated on Rpi3. (Note: the picture shows Rpi2, which has the same pinout as Rpi3)
An SoC often has limited number of pins. Software can control the use of these pins, e.g. for GPIO or for UART. Software does so by writing to specific memory-mapped registers.
The GPIO can be used to configure the behavior of different GPIO pins. For example, to be able to use the Mini UART, we need to activate pins 14 and 15 and set them up to use this device. The image below illustrates how numbers are assigned to the GPIO pins:
The following init code configures pin 14 & 15 as UART in/out, sets up UART clock and its modes, etc.
Much of the UART init code is irrelevant to QEMU. Since QEMU "emulates" the UARTs, it can dump whatever our kernel writes to the emulated UART registers to stdio. Example:
qemu-system-aarch64 -M raspi3 -kernel ./kernel8.img -serial null -serial stdio
The first -serial means UART0 which we do not touch; the second -serial means we direct UART1 to stdio.
void uart_init ( void )
{
unsigned int selector;
selector = get32(GPFSEL1);
selector &= ~(7<<12); // clean gpio14
selector |= 2<<12; // set alt5 for gpio14
selector &= ~(7<<15); // clean gpio15
selector |= 2<<15; // set alt5 for gpio 15
put32(GPFSEL1,selector);
put32(GPPUD,0);
delay(150);
put32(GPPUDCLK0,(1<<14)|(1<<15));
delay(150);
put32(GPPUDCLK0,0);
put32(AUX_ENABLES,1); //Enable mini uart (this also enables access to it registers)
put32(AUX_MU_CNTL_REG,0); //Disable auto flow control and disable receiver and transmitter (for now)
put32(AUX_MU_IER_REG,0); //Disable receive and transmit interrupts
put32(AUX_MU_LCR_REG,3); //Enable 8 bit mode
put32(AUX_MU_MCR_REG,0); //Set RTS line to be always high
put32(AUX_MU_BAUD_REG,270); //Set baud rate to 115200
put32(AUX_MU_CNTL_REG,3); //Finally, enable transmitter and receiver
}
Here, we use the two functions put32
and get32
. Those functions are very simple -- read and write some data to and from a 32-bit register. You can take a look at how they are implemented in utils.S. uart_init
is one of the most complex and important functions in this lesson, and we will continue to examine it in the next three sections.
First, we need to activate the GPIO pins. Most of the pins can be used with different IO devices. So before using a particular pin, we need to select the pin's alternative function, a number from 0 to 5 that can be set for each pin and configures which IO device is virtually "connected" to the pin.
See the list of all available GPIO alternative functions in the image below (taken from page 102 of the SoC manual)
Here you can see that pins 14 and 15 have the TXD1 and RXD1 alternative functions available. This means that if we select alternative function number 5 for pins 14 and 15, they will be used as a Mini UART Transmit Data pin and Mini UART Receive Data pin, respectively. The GPFSEL1
register is used to control alternative functions for pins 10-19. The meaning of all the bits in those registers is shown in the following table (page 92 of the SoC manual):
So now you know everything you need to understand the following lines of code that are used to configure GPIO pins 14 and 15 to work with the Mini UART device:
unsigned int selector;
selector = get32(GPFSEL1);
selector &= ~(7<<12); // clean gpio14
selector |= 2<<12; // set alt5 for gpio14
selector &= ~(7<<15); // clean gpio15
selector |= 2<<15; // set alt5 for gpio 15
put32(GPFSEL1,selector);
Init: GPIO pull-up/down & how we disable it
When working with GPIO pins, you will often encounter terms such as pull-up/pull-down. These concepts are explained in great detail in this article. For those who are too lazy to read the whole article, I will briefly explain the pull-up/pull-down concept.
If you use a particular pin as input and don't connect anything to this pin, you will not be able to identify whether the value of the pin is 1 or 0. In fact, the device will report random values. The pull-up/pull-down mechanism allows you to overcome this issue. If you set the pin to the pull-up state and nothing is connected to it, it will report 1
all the time (for the pull-down state, the value will always be 0). In our case, we need neither the pull-up nor the pull-down state, because both the 14 and 15 pins are going to be connected all the time.
The pin state is preserved even after a reboot, so before using any pin, we always have to initialize its state. There are three available states: pull-up, pull-down, and neither (to remove the current pull-up or pull-down state), and we need the third one.
Switching between pin states is not a very simple procedure because it requires physically toggling a switch on the electric circuit. This process involves the GPPUD
and GPPUDCLK
registers and is described on page 101 of the SoC manual:
The GPIO Pull-up/down Clock Registers control the actuation of internal pull-downs on
the respective GPIO pins. These registers must be used in conjunction with the GPPUD
register to effect GPIO Pull-up/down changes. The following sequence of events is
required:
1. Write to GPPUD to set the required control signal (i.e. Pull-up or Pull-Down or neither
to remove the current Pull-up/down)
2. Wait 150 cycles – this provides the required set-up time for the control signal
3. Write to GPPUDCLK0/1 to clock the control signal into the GPIO pads you wish to
modify – NOTE only the pads which receive a clock will be modified, all others will
retain their previous state.
4. Wait 150 cycles – this provides the required hold time for the control signal
5. Write to GPPUD to remove the control signal
6. Write to GPPUDCLK0/1 to remove the clock
This procedure describes how we can remove both the pull-up and pull-down states from a pin, which is what we are doing for pins 14 and 15 in the following code:
put32(GPPUD,0);
delay(150);
put32(GPPUDCLK0,(1<<14)|(1<<15));
delay(150);
put32(GPPUDCLK0,0);
Now our Mini UART is connected to the GPIO pins, and the pins are configured. The rest of the uart_init
function is dedicated to Mini UART initialization.
put32(AUX_ENABLES,1); //Enable mini uart (this also enables access to its registers)
put32(AUX_MU_CNTL_REG,0); //Disable auto flow control and disable receiver and transmitter (for now)
put32(AUX_MU_IER_REG,0); //Disable receive and transmit interrupts
put32(AUX_MU_LCR_REG,3); //Enable 8 bit mode
put32(AUX_MU_MCR_REG,0); //Set RTS line to be always high
put32(AUX_MU_BAUD_REG,270); //Set baud rate to 115200
put32(AUX_MU_CNTL_REG,3); //Finally, enable transmitter and receiver
Let's examine this code snippet line by line.
put32(AUX_ENABLES,1); //Enable mini uart (this also enables access to its registers)
This line enables the Mini UART. We must do this in the beginning, because this also enables access to all the other Mini UART registers.
put32(AUX_MU_CNTL_REG,0); //Disable auto flow control and disable receiver and transmitter (for now)
Here we disable the receiver and transmitter before the configuration is finished. We also permanently disable auto-flow control because it requires us to use additional GPIO pins, and the TTL-to-serial cable doesn't support it. For more information about auto-flow control, you can refer to this article.
put32(AUX_MU_IER_REG,0); //Disable receive and transmit interrupts
It is possible to configure the Mini UART to generate a processor interrupt each time new data is available. We want to be as simple as possible. So for now, we will just disable this feature.
put32(AUX_MU_LCR_REG,3); //Enable 8 bit mode
Mini UART can support either 7- or 8-bit operations. This is because an ASCII character is 7 bits for the standard set and 8 bits for the extended. We are going to use 8-bit mode.
put32(AUX_MU_MCR_REG,0); //Set RTS line to be always high
The RTS line is used in the flow control and we don't need it. Set it to be high all the time.
put32(AUX_MU_BAUD_REG,270); //Set baud rate to 115200
The baud rate is the rate at which information is transferred in a communication channel. “115200 baud” means that the serial port is capable of transferring a maximum of 115200 bits per second. The baud rate of your Raspberry Pi mini UART device should be the same as the baud rate in your terminal emulator.
The Mini UART calculates baud rate according to the following equation:
baudrate = system_clock_freq / (8 * ( baudrate_reg + 1 ))
The system_clock_freq
is 250 MHz, so we can easily calculate the value of baudrate_reg
as 270.
put32(AUX_MU_CNTL_REG,3); //Finally, enable transmitter and receiver
After this line is executed, the Mini UART is ready for work!
After the Mini UART is ready, we can try to use it to send and receive some data. To do this, we can use the following two functions:
void uart_send ( char c )
{
while(1) {
if(get32(AUX_MU_LSR_REG)&0x20)
break;
}
put32(AUX_MU_IO_REG,c);
}
char uart_recv ( void )
{
while(1) {
if(get32(AUX_MU_LSR_REG)&0x01)
break;
}
return(get32(AUX_MU_IO_REG)&0xFF);
}
Both of the functions start with an infinite loop, the purpose of which is to verify whether the device is ready to transmit or receive data. We are using the AUX_MU_LSR_REG
register to do this. Bit zero, if set to 1, indicates that the data is ready; this means that we can read from the UART. Bit five, if set to 1, tells us that the transmitter is empty, meaning that we can write to the UART.
Next, we use AUX_MU_IO_REG
to either store the value of the transmitted character or read the value of the received character.
We also have a very simple function that is capable of sending strings instead of characters:
void uart_send_string(char* str)
{
for (int i = 0; str[i] != '\0'; i ++) {
uart_send((char)str[i]);
}
}
This function just iterates over all characters in a string and sends them one by one.
Low efficiency? Apparently Tx/Rx with busy wait burn lots of CPU cycles for no good. It's fine for our baremetal program -- simple & less error-prone. Production software often do interrupt-driven Rx/Tx.
Run make
to build the kernel.
The Raspberry Pi startup sequence is the following (simplified):
- The device is powered on.
- The GPU starts up and reads the
config.txt
file from the boot partition. This file contains some configuration parameters that the GPU uses to further adjust the startup sequence. kernel8.img
is loaded into memory and executed.
Setup
To be able to run our simple OS, the config.txt
file should be the following:
enable_uart=1
arm_64bit=1
kernel_old=1
disable_commandline_tags=1
kernel_old=1
specifies that the kernel image should be loaded at address 0.disable_commandline_tags
instructs the GPU to not pass any command line arguments to the booted image.
Run
-
Copy the generated
kernel8.img
file to theboot
partition of your Raspberry Pi flash card and deletekernel7.img
as well as any otherkernel*.img
files on your SD card. Make sure you left all other files in the boot partition untouched (see 43 and 158 issues for details).Update (2/10/21): students reported that UART1 stops to work with the newest Rpi3 firmware (either extracted from Raspbian OS or from the upstream github repo). The symptom: no "helloworld" is printed, while UART1 seems echoing fine. UART0 seems unaffected. My guess: it has something to do with the firmware's new device tree feature? Action: use older firmware provided by us: https://github.com/fxlin/p1-kernel/releases/tag/exp1-rpi3.
-
Modify the
config.txt
file as described above. -
Connect the USB-to-TTL serial cable as described in the Prerequisites.
-
Power on your Raspberry Pi.
-
Open your terminal emulator. You should be able to see the
Hello, world!
message there.
Aside (optional): prepare the SD card from scratch (w/o Raspbian)
The steps above assume that you have Raspbian installed on your SD card. It is also possible to run the RPi OS using an empty SD card.
-
Prepare your SD card:
- Use an MBR partition table
- Format the boot partition as FAT32
The card should be formatted exactly in the same way as it is required to install Raspbian. Check
HOW TO FORMAT AN SD CARD AS FAT
section in the official documenation for more information. -
Copy the following files to the card:
-
bootcode.bin This is the GPU bootloader, it contains the GPU code to start the GPU and load the GPU firmware.
-
start.elf This is the GPU firmware. It reads
config.txt
and enables the GPU to load and run ARM specific user code fromkernel8.img
Update (2/10/21): UART1 is broken by these upstream firmware for unknown reason. Use older firmware provided by us: https://github.com/fxlin/p1-kernel/releases/tag/exp1-rpi3.
-
-
Copy
kernel8.img
andconfig.txt
files. -
Connect the USB-to-TTL serial cable.
-
Power on your Raspberry Pi.
-
Use your terminal emulator to connect to the RPi OS.
Unfortunately, all Raspberry Pi firmware files are closed-sourced and undocumented. For more information about the Raspberry Pi startup sequence, you can refer to some unofficial sources, like this StackExchange question or this Github repository.
Setup
Follow the instructions in Prerequisites.
Type make -f Makefile.qemu
.
Run
$ qemu-system-aarch64 -M raspi3 -kernel ./kernel8.img -serial null -serial stdio
VNC server running on 127.0.0.1:5900
Hello, world!
<Ctrl-C>