created cyberchallenge demo version
robalb committed Jul 24, 2024
1 parent e2543f4 commit 08f80ac
Showing 4 changed files with 476 additions and 2 deletions.
5 changes: 3 additions & 2 deletions astro-website/src/components/SharedLayout.astro
@@ -13,8 +13,9 @@ export interface Props {
lang?: string;
isBlog?: boolean;
isLarge?: boolean;
+showHeader?: boolean;
}
-const { title, description, permalink, activePage, lang="en", isBlog=false, isLarge=false } = Astro.props;
+const { title, description, permalink, activePage, lang="en", isBlog=false, isLarge=false, showHeader=true} = Astro.props;
//theme logic is defined in BlogHeader, with the switch js handler
---
@@ -34,7 +35,7 @@ const { title, description, permalink, activePage, lang="en", isBlog=false, isLa
else
document.body.classList.remove("light")
</script>
-<BlogHeader {activePage} />
+{showHeader && <BlogHeader {activePage} />}
<main class={isLarge && 'large'}>
<slot/>
</main>
18 changes: 18 additions & 0 deletions astro-website/src/layouts/BlogPostNoHeader.astro
@@ -0,0 +1,18 @@
---
import SharedLayout from '../components/SharedLayout.astro';
import BlogPost from '../components/BlogPost.astro';
const { content } = Astro.props;
const {
title, description, permalink,
subtitle, lang="en", publishDate, author, tags, heroImage, alt
} = content;
const isBlog = true
const showHeader = false
---
<SharedLayout {title} {description} {permalink} {lang} {isBlog} {showHeader}>
<BlogPost {title} {subtitle} {author} {tags} {heroImage} {publishDate} {alt}>
<slot />
</BlogPost>
</SharedLayout>

217 changes: 217 additions & 0 deletions astro-website/src/pages/cyberchallenge/x64-introduction.mdx
@@ -0,0 +1,217 @@
---
layout: '../../layouts/BlogPostNoHeader.astro'
title: An interactive guide to x86-64 assembly - introduction
publishDate: 2024-02-18
description:
tags: ['x86-64', 'pwn']
permalink: https://halb.it/cyberchallenge/x64-introduction/
---
import Spoiler from '../../components/Spoiler.astro'
import SliderHexdump from '../../components/SliderHexdump.svelte'
import RegistersTable from '../../components/RegistersTable.svelte'
import PtrSyntaxEmbed from '../../components/PtrSyntaxEmbed.svelte'
import StackEmbed from '../../components/StackEmbed.svelte'
import EndiannessEmbed from '../../components/EndiannessEmbed.svelte'

It's often said that assembly language is complex. Most people are scared of it, everyone avoids it.
After all, there's a reason why high-level languages and compilers were invented, right?<br/>
But while it's true that you would have a hard time writing a large project in assembly,
the language itself is surprisingly simple.
That's because assembly is the native language of the processor,
and at its essence, all the processor does is move data.

This guide is not about writing assembly; it's about understanding the way data moves
behind the scenes when you execute a program. We'll use concrete examples for the
x86-64 architecture, but this information applies everywhere and is fundamental knowledge
for reverse engineering, binary exploitation, or just writing better code.

This is the first part of a series of interactive articles:

- [introduction](/cyberchallenge/x64-introduction/) (you are here)
- [moving data](/cyberchallenge/x64-moving-data/)
- stack frames

### what is data?

Data is just bits, representing information.
A sequence of bits can encode any kind of information, but this article focuses only
on text and integers.

Before we talk about any kind of encoding, we have to introduce a new notation.
The issue is that while circuits handle sequences of bits very well, humans don't.
For example, can you tell the difference between
`1101010101111110` and `1101010101111110` ?

<Spoiler text="Show answer">
Ok, the two sequences are identical, but I bet you couldn't immediately see that.
</Spoiler>

To visualize binary data in a more human-friendly way, we use
hexadecimal numbers, which map each group of 4 bits to a single digit:
a number from 0 to 9 or a letter from A to F.<br/>
A long sequence of bits can be represented in this way:

```
0010 0101 0111 1101 1111
   2    5    7    d    f
```
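
The grouping above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `bits_to_hex` is made up for this example, and it assumes the number of bits is a multiple of 4.

```python
HEX_DIGITS = "0123456789abcdef"

def bits_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to hex, one digit per group of 4 bits."""
    bits = bits.replace(" ", "")  # allow spaces between groups
    if len(bits) % 4 != 0:
        raise ValueError("expected a multiple of 4 bits")
    digits = []
    for i in range(0, len(bits), 4):
        nibble = bits[i:i + 4]             # e.g. "1101"
        value = int(nibble, 2)             # e.g. 13
        digits.append(HEX_DIGITS[value])   # e.g. "d"
    return "".join(digits)

print(bits_to_hex("0010 0101 0111 1101 1111"))  # 257df
```

Each group is converted on its own, which is why hexadecimal is so convenient: you never need to look at more than 4 bits at a time.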

Note that in order to avoid confusion with decimal numbers, it's common to prefix
hexadecimal numbers with `0x`.
For example, `0x1234` is not the same
thing as the decimal number `1234`.<br/>
I'm not going to explain how conversions between decimal, binary, and hexadecimal numbers work;
the only assumption I'm making in this article is that you already know how to do them.<br/>
If you have a Python terminal, you can perform these conversions very easily:

```python
al@thinkpad:~/$ python
>>>
>>> 0b0010 #print the binary number 0010 in decimal
2
>>> 0x1234 #print the hex number 1234 in decimal
4660
>>> hex(0b00100101011111011111) #print a binary number in hex
'0x257df'
>>> hex(4660) #print a decimal number in hex
'0x1234'
```

One more thing: we call a group of 8 bits a `byte`, but that's not the only
group of bits with a name. The following table
contains all the names that you will encounter while working with the x86-64 architecture:

| N. of bits | example hex value | name |
| ------------ | ----------------- | ---------------------- |
| 4 | f | nibble |
| 8 | ff | byte |
| 16 | ffff | word |
| 32           | ffffffff          | dword (double word)    |
| 64           | ffffffffffffffff  | qword (quadruple word) |
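
As a quick sanity check on the sizes in the table, here is a small Python sketch that computes the largest unsigned value for each size. The names `SIZES` and `max_value` are made up for this example.

```python
# Names used on x86-64 for groups of bits, with their size in bits.
SIZES = {"nibble": 4, "byte": 8, "word": 16, "dword": 32, "qword": 64}

def max_value(bits: int) -> int:
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1

for name, bits in SIZES.items():
    # Each hex digit covers 4 bits, so a value needs bits // 4 hex digits.
    print(f"{name:6} {bits:2} bits  {bits // 4:2} hex digits  max = {max_value(bits):#x}")
```

Since one hex digit covers 4 bits, a byte always takes 2 hex digits, a dword 8, and a qword 16.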



### text

There are a lot of different ways to encode text, and I recommend that you
read the [absolute minimum fundamentals](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/);
it's a very interesting topic in itself.
In this article, however, we'll only focus on ASCII encoding, which is extremely simple:

All you need to know is that text is stored as a sequence of bytes, and every byte represents a character.
ASCII only uses the values from 0 to 127, so there are `128` possible characters,
including digits, English letters, punctuation, and a few control characters.
You can find a table of all the ASCII characters in the
[Linux man pages](https://man.archlinux.org/man/core/man-pages/ascii.7.en).

For example, the letter 'c' is stored as the byte `0x63`,
the letter 'o' is `0x6f`, and
the text `ciao` is stored as the sequence of bytes `63 69 61 6f`.
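
You can check this yourself in Python. The snippet below is a minimal sketch that uses only built-in functions; the variable names are arbitrary.

```python
text = "ciao"

# Encode the text as ASCII: one byte per character.
data = text.encode("ascii")

print(data.hex(" "))     # 63 69 61 6f
print(hex(ord("c")))     # 0x63

# Going the other way: from a sequence of bytes back to text.
print(bytes.fromhex("63 69 61 6f").decode("ascii"))  # ciao
```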


### where is data?

Now that we know how to represent text and numbers, we need some place
to store them.
Like all kinds of data, they can be stored in only two places:

- in memory, which means in your RAM
- in registers, which are special containers inside your CPU

### memory

Memory is just a very long list of
contiguous cells, each containing 8 bits of information, and reachable by a numeric address.

Since printing a long list of bytes would take a lot of space,
when visualizing memory we usually group bytes in rows of 8 or 16.
It's also common to include a column on the side that shows the ASCII character associated with each byte.

The memory dump below was taken from a program that was running on my computer.
Use the slider to adjust the number of bytes you want to show in a row.

<SliderHexdump
client:load
showAscii={true}
data={[
0x65, 0x78, 0x61, 0x6d, 0x70, 0x6c, 0x65, 0x20,
0x61, 0x73, 0x63, 0x69, 0x69, 0x20, 0x74, 0x65,
0x78, 0x74,
0, 0, 0, 0, 0, 0, 233, 81, 85, 85, 85, 85, 0, 0, 64, 220, 255, 255, 1, 0, 0, 0, 88, 220, 255, 255, 255, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 232, 4, 190, 18, 120, 233, 111, 224, 88, 220, 255, 255, 255, 127, 0, 0, 233, 81, 85, 85, 85, 85, 0, 0, 152, 125, 85, 85, 85, 85, 0, 0, 64, 208, 255, 247, 255, 127, 0, 0, 232, 4, 28, 164, 135, 22, 144, 31,
232, 4, 52, 40, 253, 6, 144, 31, 0, 0, 0, 0, 255, 127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 66, 158, 135, 93, 202, 47, 126, 0, 0, 0, 0, 0, 0, 0, 0, 64, 158, 194, 247, 255, 127, 0, 0, 104, 220, 255, 255, 255, 127, 0, 0, 152, 125, 85, 85, 85, 85, 0, 0, 224, 226, 255, 247, 255, 127, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81, 85, 85, 85, 85, 0, 0
]}
sliderStart={0} /><br/>
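
If you want to produce a similar view yourself, the following Python sketch formats a sequence of bytes as rows with an ASCII column. It is not the code behind the widget above; `hexdump` is a hypothetical helper written for this example, and the sample bytes are just the first few from the dump.

```python
def hexdump(data: bytes, width: int = 8) -> str:
    """Format bytes as rows of `width` hex values plus an ASCII column."""
    pad = width * 3 - 1  # width of a full hex column, spaces included
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is 0x20..0x7e; show everything else as a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        rows.append(f"{offset:04x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(rows)

sample = b"example ascii text" + bytes(6)  # 18 text bytes followed by 6 zero bytes
print(hexdump(sample, width=8))
```

Changing `width` has the same effect as moving the slider: the bytes stay the same, only the number of bytes per row changes.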

### registers

Registers are containers for data, located inside your CPU.
The x86-64 architecture has
[a lot of registers](https://en.wikipedia.org/wiki/X86#/media/File:Table_of_x86_Registers_svg.svg),
each with an associated name.
Some of them have a specific purpose, others are general-purpose containers we can use in our program.
We mostly interact with these:

<RegistersTable />

<br/>

In order to understand these tables, we'll look at the register `rax`, displayed in the first row.
`rax` is a general-purpose register that holds 8 bytes of data: from byte 0 to byte 7,
as indicated by the byte numbers at the top of the table.

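The register `eax` gives you access to the
lower 4 bytes of `rax`: reading `eax` is the same as reading
the bytes from 0 to 3 of `rax`, and writing into `eax` overwrites those same bytes.
There is one quirk to remember: on x86-64, writing into `eax` also sets the upper 4 bytes of `rax` to zero.<br/>
Similarly, `ax` gives you access to the lower 2 bytes, and `al` to the lowest byte;
writing into these two leaves the other bytes of `rax` unchanged.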
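
One way to build an intuition for these sub-registers is to model `rax` as a 64-bit integer in Python and use bit masks. This is only a simulation with made-up helper names (`write_al`, `write_eax`), not real assembly. It also models one detail of x86-64 worth knowing: a write to `eax` zeroes the upper 4 bytes of `rax`, while a write to `al` leaves the other bytes untouched.

```python
MASK64 = (1 << 64) - 1

rax = 0x1122334455667788  # example value: byte 7 is 0x11, byte 0 is 0x88

# Reading a smaller register is a mask over the lowest bytes of rax.
eax = rax & 0xFFFFFFFF   # bytes 0..3
ax = rax & 0xFFFF        # bytes 0..1
al = rax & 0xFF          # byte 0

print(hex(eax), hex(ax), hex(al))  # 0x55667788 0x7788 0x88

def write_al(rax: int, value: int) -> int:
    """Writing al replaces byte 0 and leaves the other bytes untouched."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

def write_eax(rax: int, value: int) -> int:
    """Writing eax replaces bytes 0..3; on x86-64 it also zeroes bytes 4..7,
    so the old value of rax does not affect the result."""
    return value & 0xFFFFFFFF

print(hex(write_al(rax, 0xAA)))         # 0x11223344556677aa
print(hex(write_eax(rax, 0xDEADBEEF)))  # 0xdeadbeef
```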

### Finally, some code

We are assuming that you are familiar with some programming language; it doesn't matter which one.
Assembly code is structured like the code you already know:
a sequence of instructions, usually one per line, executed in order.

The x86-64 assembly syntax has two different dialects: AT&T and Intel.
All the code snippets in this series of articles use the Intel syntax.
The following snippet shows what the syntax looks like; don't worry about what it does for now.

```yaml
# this is a comment
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov eax, DWORD PTR [rbp-4]
add eax, 0x42
pop rbp
ret
```

A good way to familiarize yourself with the syntax is to look at the assembly
generated from small snippets of code.
The
[compiler explorer website](https://godbolt.org/z/7qGb91oo8)
is designed exactly for this use case: You can type snippets of code
in any compiled language you know, and observe the generated assembly.
If you hover the mouse over an assembly instruction you can even see
a description of what it does.

In the [next article](/cyberchallenge/x64-moving-data/)
we are going to see in detail how each of the
instructions in the previous example works.

### Further Reading

This article is still under development, and it's improving over time.<br/>
If you reached this point, you might be interested in the next articles:

- [introduction](/cyberchallenge/x64-introduction/) (you are here)
- [moving data](/cyberchallenge/x64-moving-data/)
- stack frames

Additional resources:

- pwn.college's assembly module and lectures https://pwn.college/fundamentals/assembly-crash-course
- the compiler explorer website https://godbolt.org/z/c6brc1df9
- [the official x86_64 reference](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
- unofficial x86_64 instructions reference https://www.felixcloutier.com/x86/
- The best linux syscall table reference https://syscalls.mebeim.net/?table=x86/64/x64/latest