20 April, 2021

A Hacker’s Tour of the X86 CPU Architecture

Author: Travis Phillips

Overview

The Intel x86 CPU architecture is one of the most widely deployed CPU architectures for desktops, laptops, and servers. Other architectures exist and are taking market share, notably ARM in smartphones and in Apple's M1-based MacBooks and Mac Mini, but x86 still stands as the default CPU architecture for modern computer systems, barring embedded and mobile devices. The architecture supports 64-bit, 32-bit, and 16-bit operating modes.

This post exists for two reasons. First, documenting this material here means that future blog posts that require it as a prerequisite can simply link to this page. Second, this is just an interesting topic. The x86 architecture is an older architecture with a lot of interesting history, and a lot of backward compatibility remains in it today.

This guide is not a comprehensive guide to the features of the x86 architecture, and some of this might be oversimplified, but the idea is to make this a primer to the x86 CPU architecture for future lessons that will expect you to know these basics.

The X86 CPU Architecture - Instruction Lengths

The x86 instruction set uses variable-length instructions. This is interesting because it means instructions do not sit at fixed offsets, unlike ARM, where instructions fall on 2-byte or 4-byte boundaries depending on whether you are in Thumb mode. It is also interesting because if you have an instruction that is 5 bytes long and you land on its second byte, the remaining bytes may very well decode as something entirely different!

As an example, the screenshot below shows we can make an instruction that will move a dword value into EAX, which creates a 5-byte long instruction. However, if we jump 1 byte into the instruction, it instead translates into 4 separate instructions.

Offset changing the meaning of an x86 instruction

This is extremely helpful when looking for ROP gadgets for memory corruption bugs on X86 targets and means that the instructions leading up to a return may produce more ROP gadgets.
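The offset effect described above can be modelled with a toy decoder. This is a minimal sketch, not a real disassembler: it only understands a few one-byte-opcode 32-bit instructions (MOV r32, imm32; PUSH r32; POP r32; RET), and the byte sequence is a hypothetical example chosen for illustration, not the one from the screenshot.

```python
# Toy decoder for a handful of 32-bit x86 instructions with one-byte opcodes.
# Illustrative only: real x86 decoding also involves prefixes, ModR/M bytes,
# and many more opcodes than are handled here.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code, offset=0):
    """Decode `code` starting at `offset`; return a list of instruction strings."""
    out = []
    i = offset
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF and i + 5 <= len(code):
            # MOV r32, imm32: opcode byte plus a 4-byte little-endian immediate
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov {REGS[op - 0xB8]}, 0x{imm:08x}")
            i += 5
        elif 0x50 <= op <= 0x57:
            out.append(f"push {REGS[op - 0x50]}")
            i += 1
        elif 0x58 <= op <= 0x5F:
            out.append(f"pop {REGS[op - 0x58]}")
            i += 1
        elif op == 0xC3:
            out.append("ret")
            i += 1
        else:
            out.append(f"(byte 0x{op:02x} not handled by this toy decoder)")
            i += 1
    return out


# Hypothetical 5-byte sequence: one MOV when decoded from the start.
code = bytes([0xB8, 0x58, 0x5B, 0x59, 0xC3])

print(decode(code, 0))  # ['mov eax, 0xc3595b58']
print(decode(code, 1))  # ['pop eax', 'pop ebx', 'pop ecx', 'ret']
```

Decoded from offset 0 the bytes form a single 5-byte MOV. Decoded from offset 1, the immediate's bytes become four one-byte instructions ending in RET, which is exactly the shape of a ROP gadget.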

The X86 CPU Architecture - Registers

Registers are small amounts of data storage on the CPU that can be used to store data that is currently being processed. These registers on 32-bit x86 are in fact 32 bits or 4 bytes in size. That's what makes it a 32-bit architecture. The x86 architecture is capable of 64-bit as well, which would add 64-bit registers, with backwards-compatible support for the 32-bit and below registers.

When writing assembly, you will generally see a preference for using registers for data as it is much faster than using RAM since it never leaves the CPU. But you need to be mindful that these registers do serve purposes built into the instruction set itself.

For example, the one-operand instruction MUL multiplies its operand by EAX and stores the result as a 64-bit value across EDX (high 32 bits) and EAX (low 32 bits). That means EAX is read and used, and both EDX and EAX are overwritten by executing the MUL instruction. Keep that in mind when using these registers for storage. If you think things through, though, you can use this behavior however you see fit, including for quick neat tricks. For example, we can zero out the EAX, EBX, and EDX registers with 2 instructions rather than 3 separate instructions using the following assembly code.

XOR  EBX, EBX  ; XOR EBX by itself, resulting in the register being zero
MUL  EBX       ; Multiply EAX by EBX (zero).  The values are stored in
               ; EAX and EDX.  Anything multiplied by zero is zero.
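To see why the trick works, the behavior of one-operand MUL can be modelled in a few lines of Python. This is only a sketch of the arithmetic, with a hypothetical helper name (`mul32`); it does not model the flags that MUL also updates.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, src):
    """Model unsigned one-operand MUL: return (edx, eax) holding the 64-bit product."""
    product = (eax & MASK32) * (src & MASK32)
    return (product >> 32) & MASK32, product & MASK32


# XOR EBX, EBX: any value XORed with itself is zero.
ebx = 0x12345678
ebx ^= ebx

# MUL EBX: EDX:EAX = EAX * EBX. EAX holds arbitrary data beforehand.
edx, eax = mul32(0xCAFEBABE, ebx)
print(hex(eax), hex(ebx), hex(edx))  # 0x0 0x0 0x0

# With a non-zero operand the high half of the product lands in EDX.
edx, eax = mul32(0xFFFFFFFF, 2)
print(hex(edx), hex(eax))  # 0x1 0xfffffffe
```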

Depending on the OS running on this architecture, these registers may take on other meanings and uses as well. For example, on 32-bit Linux, the syscall number goes in EAX and the arguments go in registers such as EBX, ECX, and EDX before invoking the int 0x80 software interrupt.

Registers - General Purpose

The table below lists the 32-bit general-purpose registers on an x86 CPU, along with the instruction pointer (EIP), which cannot be used for general storage:

Register	Purpose
EAX	Accumulator Register: used in arithmetic operations; where return values are generally placed
EBX	Base Register: used as a pointer to data
ECX	Counter Register: used in shift or rotate instructions and as a loop counter
EDX	Data Register: used in arithmetic operations and in I/O operations
ESI	Source Index Register: pointer to the source in stream operations
EDI	Destination Index Register: pointer to the destination in stream operations
ESP	Stack Pointer: pointer to the top of the stack
EBP	Base Pointer: pointer to the bottom of the current stack frame
EIP	Instruction Pointer: pointer to the next instruction to execute

The purposes listed in the table are what the registers were intended for, but they can be used to store arbitrary data as well. The x86 architecture is older and supports some interesting dynamics on registers. The 64-bit registers contain the 32-bit registers, which contain the 16-bit registers, which in turn contain 8-bit registers! This is actually useful, as it means that in 32-bit mode we can still make use of the 16-bit or 8-bit registers. For example, the accumulator register can be represented in the following forms:

Accumulator register breakdown showing RAX EAX AX AH AL

Note the AH and AL registers. These let you access the high byte and low byte of the 16-bit register AX. Not all registers have a high-byte form, but it's worth mentioning because it is featured in the table below, which outlines the register naming conventions. Also note that the low-byte forms SIL, DIL, SPL, and BPL are only available in 64-bit mode.

Register	64-Bit	32-Bit	16-Bit	8-Bit (High)	8-Bit (Low)
Accumulator	RAX	EAX	AX	AH	AL
Base	RBX	EBX	BX	BH	BL
Counter	RCX	ECX	CX	CH	CL
Data	RDX	EDX	DX	DH	DL
Source	RSI	ESI	SI		SIL
Destination	RDI	EDI	DI		DIL
Stack	RSP	ESP	SP		SPL
Stack Base	RBP	EBP	BP		BPL
Program Counter	RIP	EIP	IP
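The naming convention in the table is really just a set of overlapping views onto the same storage. A minimal Python sketch of that idea, using bit masks and hypothetical helper names (`views`, `set_al`), might look like this:

```python
def views(rax):
    """Return the sub-register views of a 64-bit accumulator value."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # low 32 bits
        "AX": rax & 0xFFFF,               # low 16 bits
        "AH": (rax >> 8) & 0xFF,          # high byte of AX
        "AL": rax & 0xFF,                 # low byte of AX
    }


def set_al(rax, value):
    """Model a write to AL: only the lowest byte changes."""
    return (rax & ~0xFF) | (value & 0xFF)


rax = 0x1122334455667788
for name, value in views(rax).items():
    print(f"{name:>3} = 0x{value:x}")

print(hex(set_al(rax, 0xFF)))  # 0x11223344556677ff
```

Because the views overlap, changing AL also changes AX, EAX, and RAX, which is worth remembering when a small register is used as scratch space.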

Registers - Segment Registers

Segment registers were intended to be base pointers to various segments of a program (code, data, etc.). For the most part, modern OSes no longer use them for this purpose and instead repurpose them. For example, on Linux, GS (32-bit) or FS (64-bit) typically points at thread-local storage, which is where GCC keeps the stack canary value used to protect against buffer overflows. The example below shows the GS segment register being used to set up the stack cookie at the start of a function and check it at the end to determine whether a stack buffer overflow has occurred.

Stack cookie implementation by GCC using the GS segment register

The segment registers are listed below for what they were originally intended for:

Segment Register	Purpose
CS	Code
DS	Data
SS	Stack
ES	Extra Data #1
FS	Extra Data #2
GS	Extra Data #3

Registers - EFLAGS Register

The EFLAGS register is a 32-bit register that holds various bitwise "flags", each of which acts as a boolean: it is either set or not set. These flags are generally 1 bit in size, except for the IOPL flag, which we are not going to dive into in this post. Most of the flags in the register aren't super important. However, the ones that apply the most to reverse engineering, shellcoding, or ASM programming efforts are as follows:

Bit Field	Symbol	Name	Description
0	CF	Carry Flag	Set if the last arithmetic operation carried or borrowed a bit over the size of the register.
6	ZF	Zero Flag	Set if the result of an operation is zero
7	SF	Sign Flag	Set if the result of an operation is negative
8	TF	Trap Flag	When set, the CPU raises a debug exception after each instruction, which is how debuggers single-step.
10	DF	Direction Flag	Controls the stream direction. If set, stream operations decrement the pointer rather than incrementing it, which lets you control the direction in which a stream operation reads.
11	OF	Overflow Flag	Set if a signed arithmetic operation produces a result that does not fit in the register as a signed value.
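As a rough illustration of how the arithmetic flags in the table relate to a result, the sketch below models a 32-bit ADD in Python and packs CF, ZF, SF, and OF into their EFLAGS bit positions. The helper names (`add32`, `flag_names`) are hypothetical, and the other flags a real ADD updates are left out.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

# Bit positions taken from the table above.
CF, ZF, SF, OF = 1 << 0, 1 << 6, 1 << 7, 1 << 11


def add32(a, b):
    """Model a 32-bit ADD; return (result, flags) with CF, ZF, SF, OF only."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = 0
    if full > MASK32:          # carried out of bit 31
        flags |= CF
    if result == 0:
        flags |= ZF
    if result & SIGN32:        # top bit set means negative when read as signed
        flags |= SF
    # Signed overflow: both operands share a sign and the result's sign differs.
    if (~(a ^ b) & (a ^ result)) & SIGN32:
        flags |= OF
    return result, flags


def flag_names(flags):
    """List the names of the modelled flags that are set."""
    table = [("CF", CF), ("ZF", ZF), ("SF", SF), ("OF", OF)]
    return [name for name, bit in table if flags & bit]


print(flag_names(add32(0xFFFFFFFF, 1)[1]))  # ['CF', 'ZF']
print(flag_names(add32(0x7FFFFFFF, 1)[1]))  # ['SF', 'OF']
```

The two calls show the difference between CF and OF: the first wraps around as an unsigned value, while the second crosses from the largest positive signed value into the negative range.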

Registers - 128, 256, and 512 Bit Registers

Various extensions over the years have added larger registers to support things such as floating-point math, large numbers and vectors, and AES. These registers require the CPU to support the corresponding extension, since they can only be accessed through that extension's special instructions.

Registers Size	Registers	Extension
128-Bit	XMM0-XMM15	SSE (XMM0-XMM7) AMD64 (XMM8-XMM15)
256-Bit	YMM0-YMM15	AVX (Advanced Vector Extensions)
512-Bit	ZMM0-ZMM31	AVX-512

It is worth noting that support for these extensions varies widely between CPU models. If an extension is present, you can leverage its registers and instructions. If it is not, the CPU raises an invalid-opcode exception when it hits an instruction it does not understand, and the program will typically crash (on Linux, with SIGILL). For details and support matrices, the Wikipedia article at https://en.wikipedia.org/wiki/Advanced_Vector_Extensions is a good source, or you can check Intel's ARK at https://ark.intel.com/content/www/us/en/ark.html.
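On Linux, one quick way to see which of these extensions a machine reports is the `flags` line of `/proc/cpuinfo`. The sketch below parses text in that format; the function names are hypothetical, the sample text is made up for illustration, and the mapping assumes the usual Linux flag names `sse`, `avx`, and `avx512f`.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(flags):
    """Map the extensions from the table above to whether their flag is present."""
    wanted = {"SSE": "sse", "AVX": "avx", "AVX-512": "avx512f"}
    return {name: flag in flags for name, flag in wanted.items()}


# Made-up sample in the /proc/cpuinfo layout (assumption, not real output).
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2\n"

print(supported_extensions(cpu_flags(sample)))
```

On a real Linux x86 system, the same functions could be fed the contents of `/proc/cpuinfo` instead of the sample string.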

The X86 CPU Architecture - Memory Model

The x86 CPU architecture uses little-endian ordering for memory storage. This means that when a multi-byte value is stored in memory, its least significant byte comes first, at the lowest address. To keep this simple, it means that the bytes of a multi-byte value appear reversed in memory compared to how the value is normally written.

For example: the hex value 0xDEADBEEF would be stored in memory as 0xEF, 0xBE, 0xAD, 0xDE.

Little-Endian memory storage example showing 0xDEADBEEF byte order
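The same byte order can be reproduced with Python's standard `struct` module, where the `<` prefix requests little-endian packing. This is just a quick way to check the example above without touching a debugger.

```python
import struct

value = 0xDEADBEEF

# "<I" packs an unsigned 32-bit integer in little-endian byte order.
stored = struct.pack("<I", value)
print([hex(b) for b in stored])  # ['0xef', '0xbe', '0xad', '0xde']

# Reading the bytes back as little-endian recovers the original value.
(restored,) = struct.unpack("<I", stored)
print(hex(restored))  # 0xdeadbeef
```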

Conclusion

I hope you've enjoyed this blog post and learned something new today about the x86 architecture. Future posts will depend on this baseline knowledge and I hope this primer brings you up to speed comfortably. If you're interested in security fundamentals, we have a Professionally Evil Fundamentals (PEF) channel that covers a variety of technology topics. We also answer general basic questions in our Knowledge Center.

