Added gen-view.py and changed config.yaml

cs-pub-ro · Dec 12, 2024 · d4dde54
1 parent 3dfe9fd
commit d4dde54

Showing 91 changed files with 6,413 additions and 234 deletions.
diff --git a/.view/guides/addressing-arrays.md b/.view/guides/addressing-arrays.md
@@ -0,0 +1,8 @@
+# Addressing Arrays
+
+To follow this guide, you'll need to use the `addressing_arrays.asm` file located in the `guides/addressing-arrays/support` directory.
+
+The program increments the values of an array of 10 integers by 1 and iterates through the array before and after to show the changes.
+
+> **Note:** `ecx` is used as the loop counter.
+Since the array contains `dwords` (4 bytes each), the loop counter is multiplied by 4 to get the byte offset of the current element in the array.
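+
+The scaling described in the note can be sketched in C.
+This is only an illustration, not the code from `addressing_arrays.asm`: the function names `element_offset` and `increment_all` are hypothetical, and the sketch assumes 4-byte integers.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Byte offset of element `index` in an array of 4-byte dwords (hypothetical helper). */
+size_t element_offset(size_t index)
+{
+    return index * sizeof(int32_t);
+}
+
+/* Increment every element, addressing each one as base + index * 4. */
+void increment_all(int32_t *array, size_t count)
+{
+    unsigned char *base = (unsigned char *)array;
+
+    for (size_t i = 0; i < count; i++) {
+        int32_t *element = (int32_t *)(base + element_offset(i));
+        (*element)++;
+    }
+}
+```
+
+In everyday C, `array[i]++` does the same thing, because the compiler applies the element-size scaling for us.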
diff --git a/.view/guides/array_vs_pointer.md b/.view/guides/array_vs_pointer.md
@@ -0,0 +1,21 @@
+# Array vs. Pointer
+
+To follow this guide, you'll need to use the `array_vs_pointer.c` file located in the `guides/array_vs_pointer/support` directory.
+
+Compile and run the source from the skeleton.
+
+The program declares an array of chars and a char pointer; we'll try to understand the difference between the two.
+
+Even though both of them refer to the same sequence of characters, the `sizeof` operator returns different values: for the array, it returns the number of bytes needed to store it (13), while for the pointer, it returns the size of the pointer type itself (4 or 8 bytes on most systems).
+
+```bash
+sizeof(v): 13
+sizeof(p): 8
+```
+
+We've previously learned that an array usually behaves like a pointer to its first element, so why would it be in any way different?
+This behaviour comes from the fact that the address represented by the array is **constant** and cannot be changed.
+This means that the size of the array can be determined at compile time, since the array cannot be made to refer to a different memory location.
+For a regular pointer like the one declared in the example, the address it points to can be changed at runtime, so it will not always point to an array of the same size, and we cannot even determine whether it points to an array at all (it could point to a single variable, for example).
+
+The second difference appears when attempting to change the value of one of the characters in the sequence: we can't do it using the pointer, but we can do it using the array.
+This is because the pointer points to read-only memory (the string literal, which we'll later learn is stored in a memory area called `.rodata`), while the array has its own storage, initialized with a copy of the string, which is writable.
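+
+The two differences can be reproduced with a short sketch.
+It is not the content of `array_vs_pointer.c`: the string `"Hello, world"` is an assumption, chosen only because its 12 characters plus the terminating zero byte give the size of 13 mentioned above, and the function names are hypothetical.
+
+```c
+#include <stddef.h>
+
+/* Size of the array object: 12 characters plus the terminating '\0'. */
+size_t array_size(void)
+{
+    char v[] = "Hello, world";  /* assumed string, not taken from the guide */
+    return sizeof(v);
+}
+
+/* Size of the pointer type itself, regardless of what it points to. */
+size_t pointer_size(void)
+{
+    const char *p = "Hello, world";
+    return sizeof(p);
+}
+
+/* Writing through the array is fine: it holds its own writable copy. */
+char modify_array(void)
+{
+    char v[] = "Hello, world";
+    v[0] = 'J';
+    return v[0];
+}
+```
+
+The sketch deliberately does not write through `p`: modifying a string literal is undefined behaviour in C and typically crashes the program.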
diff --git a/.view/guides/compiler-explorer-tutorial.md b/.view/guides/compiler-explorer-tutorial.md
@@ -0,0 +1,39 @@
+# Online C Compiling
+
+An interesting tool to observe how C code translates into assembly language is Compiler Explorer.
+
+1. Go to [Compiler Explorer](https://gcc.godbolt.org/).
+1. Load the "sum over array" program from the examples (accessible using the load button, shaped like a floppy disk).
+1. Make sure `x86-64 gcc 4.8.2` is selected under `Compiler:`.
+1. Use the option `-m32` (in `Compiler options`) to display code in 32-bit assembly language (as opposed to 64-bit by default).
+1. If you see the message `<Compilation failed>`, add the option `-std=c99`.
+1. Initially, the code might be quite cumbersome.
+To make it more human-readable, add the option `-O2` to the compilation options (`Compiler options`).
+1. You may notice the presence of symbols like `.L3:` and `.L4:`.
+These represent fixed points in the program, labels, quite similar to what is found in C.
+1. Go through the compilers corresponding to the following architectures one by one: ARM, ARM64, AVR, PowerPC.
+`Note`: for ARM, ARM64, and AVR, you will need to remove the previously set `-m32` flag.
+You can observe how the generated code differs from one architecture to another.
+1. Also, try the following compilers: `clang` and `icc`.
+As you can see, even though it's the same C code and the same architecture, the generated code differs.
+This happens because each compiler can have a different optimization and code generation strategy.
+
+>**NOTE**:
+>[clang](https://clang.llvm.org/) is an open-source C/C++ compiler.
+>It is often used in IDEs due to its very descriptive compilation error messages.
+>
+>**NOTE**: `icc` is the C/C++ compiler from Intel.
+
+Write the following code sequence in the Code editor area:
+
+```C
+int simple_fn(void)
+{
+    int a = 1;
+    a++;
+    return a;
+}
+```
+
+Observe the assembly code when the compilation options (`Compiler options`) are `-m32`, and when the compilation options are `-m32 -O2`.
+Notice the effect of optimization options on the generated assembly code.
diff --git a/.view/guides/declarations.md b/.view/guides/declarations.md
@@ -0,0 +1,13 @@
+# Declarations
+
+To follow this guide, you'll need to use the `declarations.asm` file located in the `guides/declarations/support` directory.
+
+The program declares multiple variables of different sizes in the `.bss` and `.data` sections.
+
+>**Note**: When defining strings, make sure to add a zero byte at the end, in order to mark the end of the string.
+>
+>```Assembly
+>decimal_point   db ".",0
+>```
+
+For the complete set of pseudo-instructions, check out the `nasm` [documentation](https://nasm.us/doc/nasmdoc3.html).
diff --git a/.view/guides/discovering-assembly.md b/.view/guides/discovering-assembly.md
@@ -0,0 +1,30 @@
+# Discovering Assembly
+
+To follow this guide, you will need to navigate to the `guides/discovering-assembly/support` directory.
+
+1. Open the `ex1.asm` file and read the comments.
+Assemble it by using the `make` utility and run it.
+Using gdb, go through the program line by line (the `start` command followed by `next`) and observe the changes in register values after executing the `mov` and `add` instructions.
+Ignore the sequence of `PRINTF32` instructions.
+
+1. Open the `ex2.asm` file and read the comments.
+Assemble it by using the `make` utility and run it.
+Using gdb, observe the change in the `eip` register when executing the `jmp` instruction.
+To skip the `PRINTF32` instructions, add a breakpoint at the `jump_incoming` label (the `break` command followed by `run`).
+
+1. Open the `ex3.asm` file and read the comments.
+Assemble it by using the `make` utility and run it.
+Using gdb, navigate through the program using breakpoints.
+Follow the program flow.
+Why is `15` displayed first and then `3`?
+Because of the jump at line 9.
+Where does the jump at line 25 point to?
+To the `zone1` label.
+
+1. Open the `ex4.asm` file and read the comments.
+Assemble it by using the `make` utility and run it.
+Using gdb, go through the program.
+Why isn't the jump at line 12 taken?
+Because the `je` instruction jumps if the `ZF` bit in the `FLAGS` register is set.
+This bit is set by the `cmp` instruction, which calculates the difference between the values of the `eax` and `ebx` registers without storing the result.
+However, the `add` instruction at line 11 clears this flag because the result of the operation is different from 0.
diff --git a/.view/guides/floating-point-exception.md b/.view/guides/floating-point-exception.md
@@ -0,0 +1,9 @@
+# Floating Point Exception
+
+To follow this guide, you'll need to use the `floating_point_exception.asm` file located in the `guides/floating-point-exception/support` directory.
+
+The program tries to perform a division using an 8-bit operand, `bl`.
+In this case, the dividend is taken from `ax` and the quotient must fit in 8 bits, that is, in the range [0, 255].
+Given that `ax` is `22891` and `bl` is `2`, the quotient would be `11445`, which is outside this range.
+Thus we will see a `Floating point exception` when the division is executed.
+
+>**Note**: For a detailed description of the `div` instruction, check out the [documentation](https://www.felixcloutier.com/x86/div).
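+
+The range check that the processor performs can be sketched in C.
+This is a hedged illustration of the rule for an 8-bit `div` (16-bit dividend, 8-bit quotient), not code from the guide; the function name `quotient_fits_in_8_bits` is hypothetical.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/*
+ * For an 8-bit `div`, the dividend is the 16-bit `ax` and the quotient
+ * must fit in the 8-bit `al`. Hypothetical helper that reports whether
+ * the division would succeed.
+ */
+bool quotient_fits_in_8_bits(uint16_t dividend, uint8_t divisor)
+{
+    if (divisor == 0)
+        return false;  /* division by zero raises the same exception */
+
+    return (dividend / divisor) <= UINT8_MAX;
+}
+```
+
+With the values from the guide, `quotient_fits_in_8_bits(22891, 2)` is false, since the quotient `11445` is larger than 255.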
diff --git a/.view/guides/ghidra-tutorial.md b/.view/guides/ghidra-tutorial.md
@@ -0,0 +1,58 @@
+# Ghidra Tutorial: Decompiling
+
+In this tutorial, we aim to show how to analyze the functionality of a simple binary that prompts for the input of a correct password to obtain a secret value.
+
+>**WARNING**: In order to run Ghidra, access a terminal window and use the `ghidra` command.
+
+Initially, when we run Ghidra, a window will appear showing our current projects.
+
+![ghidra-initial.png](media/ghidra-initial.png)
+
+We can create a new project and give it a suitable name.
+To do this, we will use `File → New Project` (or the keyboard shortcut CTRL + N).
+
+![ghidra-added-project.png](media/ghidra-added-project.png)
+
+After creating the project, to add the executable file, we can use `File → Import file`, or drag the file into the directory we created.
+Ghidra will suggest the detected format and the compiler used.
+In special cases, we may need to change these settings, but for the purpose of this tutorial, Ghidra's suggestions are sufficient.
+
+![ghidra-added-file.png](media/ghidra-added-file.png)
+
+The next step is to analyze the imported binary.
+We can double-click on it.
+Ghidra will ask us if we want to analyze it.
+To do this, we will click `Yes` and then `Analyze`.
+
+![ghidra-analyzed.png](media/ghidra-analyzed.png)
+
+After the executable has been analyzed, Ghidra displays an interpretation of the binary information, which includes the disassembled code of the program.
+Next, for example, we can try to decompile a function.
+In the left part of the window, we have the `Symbol Tree` section;
+if we open `Functions`, we can see that Ghidra has detected certain functions, including the `main` function in the case of this binary.
+Therefore, if we double-click on `main`, the decompiled `main` function appears on the right, and in the central window, we see the corresponding assembly code.
+
+![ghidra-main.png](media/ghidra-main.png)
+
+We will notice that the decompiled code is not an exact representation of the source code from the file `crackme.c`, but it gives us a fairly good idea of how it works and looks.
+Looking at the decompiled code, we notice that the `main` function has two long-type parameters named `param_1` and `param_2`, instead of the normal prototype `main(int argc, char *argv[])`.
+The second parameter of `main` is of type "vector of pointers to character data" (which is generically interpreted as "array of strings").
+Below is a generic perspective on how the vector is represented for a 64-bit system.
+In the representation on the second line, `argp` should be understood as `char *argp = (char *)argv` in order for the calculation `argp + N` to make sense.
+
+| argv[0]  |      argv[1]  |  argv[2]  |
+|----------|:-------------:|----------:|
+|   argp   |    argp + 8   | argp + 16 |
+
+The difference in parameter types of the `main` function is due to interpretation: the binary is compiled for the amd64 architecture (which is an extension of the x86 architecture for 64-bit systems), and the size of a
+[processor word](https://en.wikipedia.org/wiki/Word_(computer_architecture))
+is 8 bytes (or 64 bits).
+The size of a processor word is reflected in the size of a pointer and also in the size of a single parameter (if the parameter is smaller than a word, it is automatically extended to the size of a word).
+Additionally, by coincidence, the size of a variable of type `long` is also 64 bits (the sizes of
+[data types](https://en.wikipedia.org/wiki/C_data_types)
+in C are not fixed by the standard; only minimum sizes are defined).
+This causes the interpretation of both parameters as `long`, as all parameters, regardless of type (int or pointer), are manipulated identically.
+The calculation `param_2 + 8` is used to calculate the address of the second pointer in the `argv` vector (that is, `argv[1]`).
+For a program compiled for the 32-bit x86 architecture, the address of `argv[1]` would have been `param_2 + 4`.
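+
+The byte-level calculation can be sketched in C as follows.
+This is only an illustration of the `argp + N` arithmetic from the table above; the helper name `arg_at` is hypothetical and does not appear in `crackme.c`.
+
+```c
+#include <stddef.h>
+
+/*
+ * Return the string pointer stored `index` pointer-sized slots after the
+ * start of argv, using byte arithmetic like the decompiled `param_2 + 8`.
+ * Hypothetical helper, for illustration only.
+ */
+char *arg_at(char **argv, size_t index)
+{
+    char *argp = (char *)argv;  /* treat argv as a plain byte address */
+
+    return *(char **)(argp + index * sizeof(char *));
+}
+```
+
+On a 64-bit system `sizeof(char *)` is 8, so `arg_at(argv, 1)` reads the pointer at `argp + 8`; on 32-bit x86 the same call reads it at `argp + 4`.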
+
+Using the information from the decompiled code, we can infer that the program expects a password as an argument, and it must be 8 characters long, with the character at position 3 being 'E' (the first character is at position zero).
diff --git a/.view/guides/goto-warm-up.md b/.view/guides/goto-warm-up.md
@@ -0,0 +1,18 @@
+# C: Warm-up GOTOs
+
+1. Modify the source code in the `support/bogosort/bogosort.c` file, by replacing the `break` statement with a `goto` statement ([Bogosort](https://en.wikipedia.org/wiki/Bogosort)).
+1. Similarly, replace the `continue` statement in `support/ignore_the_comments/ignore_the_comments.c` with a `goto` statement without changing the functionality of the code.
+
+>**WARNING**: When writing code with labels, please adhere to the following indentation recommendations:
+
+- Do not indent labels.
+Keep them aligned with the left margin of the editing screen.
+- Each label should be on its own line.
+There is no code on the same line as the label.
+- Do not take labels into consideration when indenting the code.
+The code should be indented in the same way whether there are labels or not.
+- Leave a blank line before the line containing a label.
+
+>**NOTE**: [Situation](https://stackoverflow.com/questions/3517726/what-is-wrong-with-using-goto/3517765#3517765) where `goto` may be useful.
+
+If you're having difficulties solving this exercise, go through [this](../../reading/README.md#syntax) reading material.
diff --git a/.view/guides/instructions.md b/.view/guides/instructions.md
@@ -0,0 +1,50 @@
+# First look at Assembly instructions
+
+To follow this guide, you will need to use the `instructions.asm` file located in the `guides/instructions/support` directory.
+
+Diving right into the demo, we can see one of the most important instructions for working with the stack: `push`.
+We discussed what the `push` instruction does in the [reading section](../../reading/README.md#data-movement-instructions).
+From the way it is used here, we can see that it takes the value `0` (as a `DWORD`, a number stored on `4` bytes) and places it on the "top" of the stack.
+
+That `push` is followed by a new instruction:
+
+```assembly
+popf
+```
+
+> **IMPORTANT**: The `popf` instruction pops a value from the stack into the flags register.
+> Popping 4 bytes (as in our case) sets the entire `EFLAGS` register, while popping 2 bytes sets only the lower 2 bytes of the register.
+> You can read more about the `popf` instruction [here](https://www.felixcloutier.com/x86/popf:popfd:popfq) and [here](https://en.wikipedia.org/wiki/FLAGS_register).
+
+![EFLAGS Representation](../../media/eflags-representation.svg)
+
+Keeping in mind what the `popf` instruction does, try to guess what the program would do if you added the following line of code at line 15 and the `mystery_label` label at line 53 (both line numbers refer to the current file, before adding the instruction).
+
+```assembly
+jnc mystery_label
+```
+
+Moving on, we can see that the `eax` register is set to `0` using the `mov` instruction.
+Can you give two other ways of setting the value in `eax` to `0` without using `mov`?
+> **HINT**: Think about the [logical operators](../../reading/README.md#arithmetic-and-logic-instructions).
+
+Next, by using the `test` instruction we can set the `flags` based on the output of the `logical and` between `eax` and itself.
+
+After resetting the flags, we store `0xffffffff` in the `ebx` register (the largest unsigned value it can hold; adding 1 to it would wrap around and set the carry flag) and then use the `test` instruction yet again.
+Similarly, what do you think adding the following line of code after the `test` instruction would produce?
+
+```assembly
+jnz mystery_label
+```
+
+We reset the flags once again and now we take a look at working with the smaller portions of the `eax` register.
+Can you guess the output of the following command, put right under the `add al, bl` instruction?
+What about the flags?
+Which flag has been set?
+
+```assembly
+PRINTF32 `%d\n\x0`, eax
+```
+
+Similarly, try to answer the same questions for the next portions of the code.
+
+After thoroughly inspecting this example, you should have a basic idea of how setting the flags works.
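+
+To check your answers, the rules for two of the flags can be sketched in C.
+This is a simplified model, not part of `instructions.asm`: the function names are hypothetical, and only the zero flag after `test` and the carry flag after an 8-bit `add` are modelled.
+
+```c
+#include <stdbool.h>
+#include <stdint.h>
+
+/* ZF after `test a, b`: set when the bitwise AND of the operands is 0. */
+bool zero_flag_after_test(uint32_t a, uint32_t b)
+{
+    return (a & b) == 0;
+}
+
+/* CF after an 8-bit `add al, bl`: set when the sum does not fit in 8 bits. */
+bool carry_flag_after_add8(uint8_t al, uint8_t bl)
+{
+    return (uint16_t)al + bl > UINT8_MAX;
+}
+```
+
+For example, `zero_flag_after_test(0, 0)` is true, while `zero_flag_after_test(0xffffffff, 0xffffffff)` is false, which is what decides whether a `jnz` placed after the `test` is taken.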
diff --git a/.view/guides/loop.md b/.view/guides/loop.md
@@ -0,0 +1,10 @@
+# Loop
+
+To follow this guide, you'll need to use the `loop.asm` file located in the `guides/loop/support` directory.
+
+This program illustrates how to use the `loop` instruction, as well as how to index an array of `dwords`.
+
+>**Note**: The `loop` instruction decrements the `count` register and jumps to the given label if the result is not equal to 0.
+>In the case of `x86` the `count` register is `ecx`.
+>
+>**Note**: For a detailed description of the `loop` instruction check out the [documentation](https://www.felixcloutier.com/x86/loop:loopcc).
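+
+The behaviour of `loop` can be modelled in C with a `do ... while` loop.
+This is a sketch under assumptions, not a translation of `loop.asm`: the function name `sum_with_loop` is hypothetical, and summing the elements is just an example of a loop body.
+
+```c
+#include <stdint.h>
+
+/*
+ * Emulate `loop`: run the body, decrement ecx, jump back while ecx != 0.
+ * Hypothetical example that sums an array of dwords from last to first.
+ */
+uint32_t sum_with_loop(const uint32_t *array, uint32_t count)
+{
+    uint32_t ecx = count;
+    uint32_t sum = 0;
+
+    /* A real `loop` entered with ecx == 0 would wrap around, so guard here. */
+    if (ecx == 0)
+        return 0;
+
+    do {
+        sum += array[ecx - 1];  /* element at byte offset (ecx - 1) * 4 */
+        ecx--;
+    } while (ecx != 0);
+
+    return sum;
+}
+```
+
+Note that the body always runs at least once, which is why the counter is checked before entering the loop.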
diff --git a/.view/guides/max.md b/.view/guides/max.md
@@ -0,0 +1,9 @@
+# Max
+
+To follow this guide, you'll need to use the `max.asm` file located in the `guides/max/support` directory.
+
+The program finds the maximum value in an array of 16-bit integers (`array`).
+It iterates through the array, updating the maximum value (`dx`) when it finds a larger value.
+Finally, it prints the maximum value using the `printf()` function.
+
+>**Note**: For more details on working with arrays, check out the following page: [Assembly Arrays Tutorial](https://www.tutorialspoint.com/assembly_programming/assembly_arrays.htm)
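+
+The same algorithm can be sketched in C.
+This is an illustration rather than a translation of `max.asm`: the function name `array_max` is hypothetical, and the sketch assumes the values are unsigned.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Return the largest value in an array of unsigned 16-bit integers. */
+uint16_t array_max(const uint16_t *array, size_t count)
+{
+    uint16_t max = 0;  /* plays the role of `dx` in the guide */
+
+    for (size_t i = 0; i < count; i++) {
+        if (array[i] > max)
+            max = array[i];
+    }
+
+    return max;
+}
+```
+
+Starting from 0 is only safe because of the unsigned assumption; for signed values, the maximum should start from the first element instead.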
diff --git a/.view/guides/multiply-divide.md b/.view/guides/multiply-divide.md
@@ -0,0 +1,7 @@
+# Multiply and Divide
+
+To follow this guide, you'll need to use the `multiply-divide.asm` file located in the `guides/multiply-divide/support` directory.
+
+The program performs the `mul` and `div` instructions and prints out the results.
+
+>**Note**: For a detailed description of the instructions, check out the following pages: [div](https://www.felixcloutier.com/x86/div) and [mul](https://www.felixcloutier.com/x86/mul)
diff --git a/.view/guides/segfault.md b/.view/guides/segfault.md
@@ -0,0 +1,44 @@
+# GDB Tutorial: Debugging a Segfault
+
+To follow this guide, you'll need to use the `segfault.c` file located in the `guides/segfault/support` directory.
+
+Compile and run the source code from the skeleton (if you are not using the Makefile, make sure to compile with the `-g` flag).
+In short, the program takes a number `n`, allocates a vector of size `n`, and initializes it with the first `n` numbers from the Fibonacci sequence.
+However, after running the program, you see: `Segmentation fault (core dumped)`.
+
+Start GDB with the executable:
+
+```bash
+gdb ./segfault
+```
+
+Once you have started GDB, all interaction happens through the GDB prompt.
+Run the program using the `run` command.
+What do you notice?
+GDB appears to hang because the program is waiting to read its input.
+
+Set a breakpoint at `main` using the `break main` command.
+You will see the message in the prompt:
+
+```c
+Breakpoint 1 at 0x7d3: file seg.c, line 21 /* The memory address may differ on your system */
+```
+
+Next, we will step through the instructions one by one.
+To do this, use the `next` or `n` command (watch the GDB cursor to see the current instruction and repeat the process).
+You will notice that GDB hangs at `scanf`, so input a value for `n` and continue stepping through.
+If you have entered a large value for `n` and want to skip the iteration, use the `continue` command.
+Eventually, you will reach the line `v[423433] = 3;`, and GDB will display:
+
+```bash
+Program received signal SIGSEGV, Segmentation fault
+```
+
+Inspect the memory at `v[423433]` using `x &v[423433]` and you will receive the message:
+
+```c
+Cannot access memory at address 0x5555558f3e94 /* The memory address may differ on your system */
+```
+
+What happened?
+We accessed a memory area that the program is not allowed to access.
diff --git a/.view/guides/students.md b/.view/guides/students.md
@@ -0,0 +1,5 @@
+# Students
+
+To follow this guide, you'll need to use the `students.asm` file located in the `guides/students/support` directory.
+
+This program iterates through the array of structures representing `students` and prints the name of each student.
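+
+Indexing an array of structures follows the same base-plus-offset idea as indexing an array of integers, only with a larger element size.
+The sketch below assumes a hypothetical `struct student` layout; the actual fields used in `students.asm` may differ.
+
+```c
+#include <stddef.h>
+
+/* Hypothetical layout, for illustration only. */
+struct student {
+    char name[16];
+    unsigned char grade;
+};
+
+/* Return the name of the student at `index`, computed as base + index * size. */
+const char *student_name(const struct student *students, size_t index)
+{
+    const unsigned char *base = (const unsigned char *)students;
+    const struct student *s =
+        (const struct student *)(base + index * sizeof(struct student));
+
+    return s->name;
+}
+```
+
+Printing every name then amounts to calling `student_name` for each index from 0 to the number of students minus 1.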