A Hacker’s Tour of the X86 CPU Architecture


Overview

The Intel x86 CPU architecture is one of the most widely used CPU architectures for desktops, laptops, and servers.  Other architectures exist and are taking some market share, particularly on mobile devices such as smartphones, and Apple has begun shipping its ARM-based M1 chip in newer MacBooks and the Mac Mini.  Even so, x86 still stands as the default CPU architecture for modern computer systems, barring embedded and mobile devices.  The architecture supports 64-bit, 32-bit, and 16-bit modes of operation.

I wrote this post for two reasons.  First, to have this information documented here so that future blog posts that require it as a prerequisite can simply link to this page.  Second, this is just an interesting topic.  The x86 architecture is an older architecture with a lot of interesting history, and a lot of backward compatibility remains in it today.

This post is not a comprehensive guide to the features of the x86 architecture, and some of it may be oversimplified.  The idea is to provide a primer on the x86 CPU architecture for future lessons that will expect you to know these basics.

The X86 CPU Architecture – Instruction Lengths

The x86 instruction set uses variable-length instructions, ranging from 1 to 15 bytes.  This is interesting because it means instructions do not sit at fixed offsets, unlike ARM, where instructions are aligned to 2- or 4-byte boundaries depending on whether or not you are in Thumb mode.  It also means that if you have an instruction that is 5 bytes long and you land on its second byte, the remaining bytes may very well take on an entirely new meaning!

As an example, the screenshot below shows an instruction that moves a dword value into EAX, which is 5 bytes long.  However, if we jump 1 byte into the instruction, the remaining bytes instead decode as 4 separate instructions.

Offset changing the meaning of instruction.

This is extremely helpful when looking for ROP gadgets for memory corruption bugs on x86 targets.  Because the bytes leading up to a return can be decoded from different starting offsets, they may produce more ROP gadgets than the intended instruction stream suggests.
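To make the offset effect concrete, here is a minimal sketch in Python.  The `decode` function is a hypothetical toy decoder that only understands three opcodes (MOV EAX, imm32; NOP; RET), and the byte sequence is one picked for illustration, not necessarily the one in the screenshot.  It is a model of the idea, not a real disassembler.

```python
import struct

def decode(code, offset=0):
    """Toy decoder: handles only the three opcodes used in this example."""
    out = []
    i = offset
    while i < len(code):
        op = code[i]
        if op == 0xB8:
            # MOV EAX, imm32: 1 opcode byte + 4-byte little-endian immediate
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov eax, 0x{imm:08x}")
            i += 5
        elif op == 0x90:
            out.append("nop")  # 1-byte instruction
            i += 1
        elif op == 0xC3:
            out.append("ret")  # 1-byte instruction
            i += 1
        else:
            raise ValueError(f"opcode 0x{op:02x} is not handled by this toy decoder")
    return out

# Illustrative byte sequence (an assumption chosen for this sketch).
code = bytes([0xB8, 0x90, 0x90, 0x90, 0xC3])

print(decode(code, 0))  # ['mov eax, 0xc3909090']
print(decode(code, 1))  # ['nop', 'nop', 'nop', 'ret']
```

Starting at offset 0, the five bytes form a single MOV.  Starting one byte later, the same bytes read as three NOPs followed by a RET, which is exactly the kind of unintended return a gadget search looks for.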

The X86 CPU Architecture – Registers

Registers are small units of storage on the CPU that hold the data currently being processed.  On 32-bit x86, these registers are 32 bits (4 bytes) in size, which is what makes it a 32-bit architecture.  The x86 architecture is capable of 64-bit operation as well, which adds 64-bit registers with backward-compatible access to the 32-bit and smaller registers.

When writing assembly, you will generally see a preference for keeping data in registers, as this is much faster than using RAM since the data never leaves the CPU.  But you need to be mindful that some of these registers serve purposes built into the instruction set itself.

For example, the MUL instruction takes only one operand (a register or a memory location), multiplies it by EAX, and stores the result as a 64-bit value across EDX (high 32 bits) and EAX (low 32 bits).  That means executing MUL reads EAX and overwrites both EDX and EAX.  Keep that in mind when using these registers for storage.  Once you understand these side effects, though, you can use them to your advantage for quick, neat tricks.  For example, we can zero out the EAX, EBX, and EDX registers with 2 instructions rather than 3 using the following assembly code.

XOR  EBX, EBX  ; XOR EBX with itself, resulting in the register being zero
MUL  EBX       ; Multiply EAX by EBX (zero).  The 64-bit result is stored
               ; in EDX:EAX.  Anything multiplied by zero is zero, so
               ; both EAX and EDX are cleared.
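The one-operand MUL behavior can be modeled in a few lines of Python.  This is only a sketch of the arithmetic described above; the function name `mul32` is made up for illustration and the model ignores the flags that MUL also updates.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Model of 32-bit MUL: returns the new (EAX, EDX) pair.

    The full 64-bit product is split into a low half (EAX)
    and a high half (EDX).
    """
    product = (eax & MASK32) * (operand & MASK32)
    new_eax = product & MASK32          # low 32 bits
    new_edx = (product >> 32) & MASK32  # high 32 bits
    return new_eax, new_edx

# The zeroing trick: EBX is 0 after XOR EBX, EBX, so MUL EBX clears EAX and EDX.
print(mul32(0x12345678, 0))  # (0, 0)

# A product that does not fit in 32 bits spills into EDX.
eax, edx = mul32(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(eax), hex(edx))    # 0x1 0xfffffffe
```

Whatever EAX held before, multiplying by a zeroed EBX leaves both halves of the result at zero, which is why the two-instruction sequence clears three registers.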

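Depending on the OS running on this architecture, these registers may take on other meanings and uses as well.  For example, on 32-bit Linux, these registers are used to set up syscalls before invoking the interrupt: EAX holds the syscall number, and EBX, ECX, EDX, ESI, EDI, and EBP hold the arguments before int 0x80 is executed.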

Registers – General Purpose

The table below shows a list of the 32-bit general-purpose registers on an x86 CPU.  EIP is included for completeness, although it is not a general-purpose register and cannot be read or written directly like the others:

| Register | Name | Purpose |
|----------|------|---------|
| EAX | Accumulator Register | Used in arithmetic operations. Where values are generally returned to. |
| EBX | Base Register | Used as a pointer to data. |
| ECX | Counter Register | Used in shift or rotate instructions and as a loop counter. |
| EDX | Data Register | Used in arithmetic operations. Used in I/O operations. |
| ESI | Source Index Register | Pointer to the source in stream operations. |
| EDI | Destination Index Register | Pointer to the destination in stream operations. |
| ESP | Stack Pointer | Pointer to the top of the stack. |
| EBP | Base Pointer | Pointer to the base of the current stack frame. |
| EIP | Instruction Pointer | Pointer to the next instruction to execute. |

The purposes listed in the table are what the registers were intended for, but they can be used to store arbitrary data as well.  Because the x86 architecture is older, its registers have some interesting properties.  Each 64-bit register contains a 32-bit register, which contains a 16-bit register, which in some cases is split into 8-bit registers!  This is useful because it means that in 32-bit mode we can still make use of the 16-bit and 8-bit registers.  For example, the accumulator register can be represented in the following forms:

Register overlapping table view.

Note the AH and AL registers.  These let you access the high byte and low byte of the 16-bit register AX.  Not all registers have a high-byte form, but it is worth pointing out because it is featured in the table below, which outlines the register naming conventions.

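| Register | 64-Bit | 32-Bit | 16-Bit | 8-Bit (High) | 8-Bit (Low) |
|----------|--------|--------|--------|--------------|-------------|
| Accumulator | RAX | EAX | AX | AH | AL |
| Base | RBX | EBX | BX | BH | BL |
| Counter | RCX | ECX | CX | CH | CL |
| Data | RDX | EDX | DX | DH | DL |
| Source | RSI | ESI | SI | | SIL |
| Destination | RDI | EDI | DI | | DIL |
| Stack | RSP | ESP | SP | | SPL |
| Stack Base | RBP | EBP | BP | | BPL |
| Program Counter | RIP | EIP | IP | | |

Note that SIL, DIL, SPL, and BPL are only accessible in 64-bit mode.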
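The overlapping register names can be thought of as different masks over the same 64 bits.  The sketch below models that for the accumulator in Python; the helper names `views` and `set_al` are invented for illustration, and the model only covers reading the views and writing AL.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def views(rax):
    """Return the accumulator as seen through each of its names."""
    return {
        "RAX": rax & MASK64,       # all 64 bits
        "EAX": rax & 0xFFFFFFFF,   # low 32 bits
        "AX":  rax & 0xFFFF,       # low 16 bits
        "AH":  (rax >> 8) & 0xFF,  # high byte of AX
        "AL":  rax & 0xFF,         # low byte of AX
    }

def set_al(rax, value):
    """Writing AL changes only the lowest byte; the other bits are untouched."""
    return (rax & MASK64 & ~0xFF) | (value & 0xFF)

rax = 0x1122334455667788
for name, value in views(rax).items():
    print(f"{name:>3} = 0x{value:x}")

print(hex(set_al(rax, 0xFF)))  # 0x11223344556677ff
```

Reading AH here yields 0x77 and AL yields 0x88, both carved out of the same AX value 0x7788.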

Registers – Segment Registers

Segment registers were intended to be base pointers to the various segments of a program (code, data, etc.).  For the most part, modern OSes no longer use them for this purpose and instead repurpose them.  For example, on Linux, GS or FS generally points to thread-local data, which is where the stack canary value used to protect against buffer overflows is kept.  Below is an example of the GS segment register being used to set up the stack cookie, which is then checked at the end of the function to determine whether a stack buffer overflow has occurred.

Stack cookie implementation by GCC
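The idea behind the stack cookie can be sketched as a small simulation.  Everything here is a simplified assumption for illustration: the frame layout, the 4-byte canary, and the function name `call_with_canary` are made up, and real compilers emit this check in machine code (on 32-bit Linux with glibc the canary is typically read from gs:0x14).

```python
import secrets

def call_with_canary(user_input, buf_size=8, canary=None):
    """Simulate a function with a stack buffer guarded by a canary.

    Returns True if the canary is intact when the function "returns",
    False if an overflow has clobbered it.
    """
    if canary is None:
        canary = secrets.token_bytes(4)  # stands in for the value kept at gs:0x14

    # Simplified frame, low to high address: buffer | canary | saved return address
    frame = bytearray(buf_size) + bytearray(canary) + bytearray(b"RETN")

    # Unchecked copy (like strcpy): writes past the buffer if the input is too long
    frame[0:len(user_input)] = user_input

    # Epilogue check: compare the stored canary with the reference value
    return bytes(frame[buf_size:buf_size + 4]) == canary

cookie = bytes([0x00, 0x11, 0x22, 0x33])
print(call_with_canary(b"A" * 8, canary=cookie))   # True: input fits the buffer
print(call_with_canary(b"A" * 16, canary=cookie))  # False: canary was overwritten
```

Because the canary sits between the buffer and the saved return address, a linear overflow has to trample it before reaching the return address, and the mismatch is detected before the function returns.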

The segment registers are listed below for what they were originally intended for:

| Segment Register | Purpose |
|------------------|---------|
| CS | Code |
| DS | Data |
| SS | Stack |
| ES | Extra Data #1 |
| FS | Extra Data #2 |
| GS | Extra Data #3 |

Registers – EFLAGS Register

The EFLAGS register is a 32-bit register whose individual bits act as boolean “flags”: each one is either set or not set.  These flags are generally 1 bit in size, except for the 2-bit IOPL field, which we are not going to dive into in this post.  Most of the flags in the register aren’t important for our purposes.  The ones that apply the most to reverse engineering, shellcoding, or ASM programming efforts are as follows:

| Bit | Symbol | Name | Description |
|-----|--------|------|-------------|
| 0 | CF | Carry Flag | Set if the last arithmetic operation carried or borrowed a bit over the size of the register. |
| 6 | ZF | Zero Flag | Set if the result of an operation is zero. |
| 7 | SF | Sign Flag | Set if the result of an operation is negative. |
| 8 | TF | Trap Flag | Set if debugging step by step. |
| 10 | DF | Direction Flag | Controls the stream direction. If set, the stream operations will decrement the pointer rather than incrementing it. This basically allows you to control the direction that a stream operation reads. |
| 11 | OF | Overflow Flag | Set if signed arithmetic operations result in a value too large for the register to hold. |
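To see how the arithmetic flags in the table behave, here is a sketch that models a 32-bit ADD in Python and derives CF, ZF, SF, and OF from the result.  The function names are invented for this example, and the model deliberately ignores every other bit of EFLAGS.

```python
MASK32 = 0xFFFFFFFF

def add32_flags(a, b):
    """Model a 32-bit ADD and return (result, flags) for CF, ZF, SF, OF."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "CF": full > MASK32,         # unsigned carry out of bit 31
        "ZF": result == 0,           # result is zero
        "SF": bool(result >> 31),    # top bit set, negative when read as signed
        # Signed overflow: both operands share a sign that the result does not
        "OF": bool(((~(a ^ b) & (a ^ result)) >> 31) & 1),
    }
    return result, flags

def pack_eflags(flags):
    """Place the four modeled flags at their bit positions from the table."""
    return (flags["CF"] << 0) | (flags["ZF"] << 6) | (flags["SF"] << 7) | (flags["OF"] << 11)

result, flags = add32_flags(0xFFFFFFFF, 1)   # unsigned wrap-around
print(hex(result), flags, hex(pack_eflags(flags)))

result, flags = add32_flags(0x7FFFFFFF, 1)   # signed overflow
print(hex(result), flags, hex(pack_eflags(flags)))
```

The first addition wraps to zero and sets CF and ZF, while the second crosses from the largest positive signed value into negative territory and sets SF and OF instead.  The same addition can therefore be a carry, an overflow, both, or neither, depending on whether you read the operands as unsigned or signed.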

Registers – 128, 256, and 512 Bit Registers

Various extensions over the years have added larger registers to support things such as floating-point math, large numbers, vectors, and AES.  Using these registers requires a CPU that supports the corresponding extension, as they can only be accessed through that extension's special instructions.

| Size | Registers | Extension |
|------|-----------|-----------|
| 128-Bit | XMM0-XMM15 | SSE (XMM0-XMM7), AMD64 (XMM8-XMM15) |
| 256-Bit | YMM0-YMM15 | AVX (Advanced Vector Extensions) |
| 512-Bit | ZMM0-ZMM31 | AVX-512 |

It is worth noting that support for these extensions varies widely from CPU to CPU.  If an extension is present, you can leverage its registers and instructions; if not, expect the program to crash, since the CPU will raise an invalid-opcode exception when it does not understand the instruction.  For details and support matrices, the Wikipedia article at https://en.wikipedia.org/wiki/Advanced_Vector_Extensions is a good source, or you can check Intel’s ARK at https://ark.intel.com/content/www/us/en/ark.html.
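One way to check for these extensions before relying on them, at least on Linux, is to look at the `flags` line of /proc/cpuinfo.  The sketch below parses that format in Python; the helper name `cpu_flags` and the sample text are made up for illustration, and other operating systems expose this information differently.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

# Shortened, made-up sample in the /proc/cpuinfo layout.
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2\n"

flags = cpu_flags(SAMPLE)
print("avx" in flags)      # True
print("avx512f" in flags)  # False
```

On a real Linux machine you could pass in the contents of /proc/cpuinfo instead of the sample string, for example `cpu_flags(open("/proc/cpuinfo").read())`.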

The X86 CPU Architecture – Memory Model

The x86 CPU architecture uses little-endian ordering for memory storage.  This means that when a multi-byte value is stored in memory, the least significant byte comes first, at the lowest address.  To keep this simple, it means that the byte order of a value is effectively reversed when it is stored in memory.

For example, the 32-bit hex value 0xDEADBEEF would be stored in memory as the bytes 0xEF, 0xBE, 0xAD, 0xDE, from the lowest address to the highest.

Little-Endian memory storage example
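The same byte ordering can be reproduced with Python's standard `struct` module, which makes for a quick sanity check when building or reading raw values by hand.  This is a small illustrative sketch using the value from the example above.

```python
import struct

value = 0xDEADBEEF

little = struct.pack("<I", value)  # "<" = little-endian, "I" = unsigned 32-bit
big = struct.pack(">I", value)     # ">" = big-endian, for comparison

print(little.hex(" "))  # ef be ad de
print(big.hex(" "))     # de ad be ef

# Reading the little-endian bytes back recovers the original value.
print(hex(int.from_bytes(little, "little")))  # 0xdeadbeef
```

This is also why addresses and constants appear "backwards" in hex dumps and shellcode on x86.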

Conclusion

I hope you’ve enjoyed this blog post and learned something new today about the x86 architecture.  Future posts will depend on this baseline knowledge and I hope this primer brings you up to speed comfortably.
