01 June, 2021

Linux X86 Assembly - How to Make Our Hello World Usable as an Exploit Payload

Author: Travis Phillips

Overview

In the last two tutorials, we built a Hello World program in NASM and GAS for x86 assembly. While this can help us learn x86 assembly, it isn't viable as a payload for use in exploits in its current form. Today's blog will look into what those issues are, how they impact the code's use as a payload, and what we can do to address those issues. If you'd like to follow along, the code for this blog post can be found on the Secure Ideas Professionally Evil x86_asm GitHub repository.

Overview of Our Current GAS Hello World Binary

In our previous example, we built an ELF binary from our GAS code. We can see where we currently stand by running objdump with the -d switch to disassemble the binary's .text (code) section, as shown in the screenshot below.

Objdump output of our original Hello World GAS program

The Problems We Need to Solve

On the surface, it appears fine, but to use this as a payload in an exploit, there are a few issues that need to be addressed. The high level overview of our issues is outlined below:

Reviewing the issues with our objdump output

The Hello World string is at a fixed address in the binary. In an exploit, the address needs to be found dynamically, since we don't know where the payload will end up in memory during a memory corruption exploit.
The code contains several null bytes (0x00). In some exploits this might be all right, such as a buffer overflow through read(). But it will fail with string-based functions such as strcpy(), sprintf(), etc., because the null byte is a string terminator and the copy stops there.
Size. Size is sometimes a constraint. The payload is probably fine as is, but why not see if we can make it smaller while we are in here? Smaller is rarely a problem; larger is more likely to cause issues.
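To make the null byte issue concrete, the sketch below models how a string function such as strcpy() treats our code. The two byte strings are the standard x86 encodings of mov $4,%eax and mov $1,%ebx; the helper name strcpy_model is made up for illustration, and Python is used here only as a convenient byte calculator.

```python
# Standard 32-bit encodings: one opcode byte followed by a 4-byte
# little-endian immediate.
MOV_EAX_4 = bytes.fromhex("b804000000")   # mov $4,%eax
MOV_EBX_1 = bytes.fromhex("bb01000000")   # mov $1,%ebx
payload = MOV_EAX_4 + MOV_EBX_1

def strcpy_model(src: bytes) -> bytes:
    """Hypothetical model of strcpy(): keep the bytes before the first 0x00."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]

copied = strcpy_model(payload)
print(f"payload is {len(payload)} bytes, strcpy would deliver {len(copied)}: {copied.hex()}")
```

Only the opcode and the first immediate byte would survive the copy, which is why the null bytes have to go before the payload can travel through a string function.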

Problem 1 - Fixed Address to the Hello World String

The first issue that is a deal breaker for using this as an exploit payload is the fact that our code DEPENDS on the Hello World string being hard coded at address 0x804a000. This is simply not likely to be the case in a memory corruption exploit. Looking at the objdump output, our code is currently 31 bytes and doesn't contain our string at all. This issue needs to be addressed so we can include the Hello World string in our payload and have its address determined relative to its location in memory.

Solution 1 - Jumps & Calls for Address Retrieval

There are a few ways we can address this, but one of the easiest is the jmp/call/pop method. It uses those three instructions to discover the string's address at run time, wherever the payload happens to land in memory.

To keep this high level, a JMP moves the EIP/RIP instruction pointer to a new address. The target can be given as an absolute address or as an offset relative to the current location, and a relative jump uses a shorter encoding when the target is close. The CALL instruction, on the other hand, first pushes the address of the instruction that follows it, as an absolute address, onto the stack. This is known as the return address. It's also worth noting that when a relative CALL targets an address a short distance back, the offset is a small negative signed number whose upper bytes are 0xff rather than 0x00, so it introduces no null bytes.
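The point about negative offsets can be checked with a little arithmetic. The sketch below packs a forward and a backward 32-bit displacement the way a near CALL stores them (little-endian two's complement). The displacement values are arbitrary examples and the helper name rel32 is ours.

```python
import struct

def rel32(displacement: int) -> bytes:
    """Encode a signed 32-bit displacement as little-endian two's complement."""
    return struct.pack("<i", displacement)

forward = rel32(32)     # CALL to a target 32 bytes ahead
backward = rel32(-32)   # CALL to a target 32 bytes behind

print("forward :", forward.hex(), "null bytes:", forward.count(0))
print("backward:", backward.hex(), "null bytes:", backward.count(0))
```

The forward displacement is padded with three zero bytes, while the backward one is padded with 0xff. This only holds for short distances: a displacement whose low byte happens to be 0x00, such as -256, would still contain a null.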

CALL instructions are intended for function calls. The function you call should invoke a RET instruction, which takes the address off the top of the stack and sets the EIP/RIP instruction pointer register to that address. But we can use this to find the absolute address to the location after the CALL instruction. JMP on the other hand will just move the EIP/RIP instruction pointer register to the new location, without pushing an address to the stack. We can groom our payload to leverage the behavior of CALL pushing a return address to the stack, and just POP it into a register for our own use. The flow for this is demonstrated in the image below.

JMP CALL POP method for dynamic string address retrieval
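A toy model can show why this works no matter where the payload lands. It is only a sketch: the function jmp_call_pop, the load addresses, and the example offset of the CALL are invented for illustration, and the model tracks nothing except the one value that CALL pushes.

```python
def jmp_call_pop(base: int, call_offset: int, call_length: int = 5) -> int:
    """Return the value POP would read after the JMP -> CALL -> POP sequence.

    base        : address where the payload happens to be loaded (assumed)
    call_offset : offset of the CALL instruction inside the payload
    call_length : size of a near CALL (opcode 0xe8 + 4-byte displacement)
    """
    stack = []
    # CALL pushes the address of whatever follows it, which is our string.
    stack.append(base + call_offset + call_length)
    # The code at the CALL target then pops that address into a register.
    return stack.pop()

# Same payload layout, two hypothetical load addresses.
for base in (0x08049000, 0xBFFFF000):
    string_addr = jmp_call_pop(base, call_offset=29)
    print(f"loaded at {base:#010x} -> string at {string_addr:#010x}")
```

In both cases the popped address sits the same distance from the start of the payload, which is exactly the property a fixed address could not give us.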

Implementing Solution 1 in Our Code

To get this implemented, we create a new source code file called hello_world_gas_solution_1.s. The high level overview of the changes: move the msg string to the end of our payload in the .text section, remove the .data section, add a JMP at the start to jump over our payload to a CALL instruction that calls back to the payload label, and use POP to get the string address off the stack. The code that implements these changes is listed below.

.global  _start
.text

_start:
    jmp    my_string            # Jump to the my_string label.
payload:
    mov    $4,%eax              # 4 = Syscall number for Write()
    mov    $1,%ebx              # STDOUT is 1
    pop    %ecx                 # Pop the string address off the stack
    mov    $len,%edx            # The length of string to print
    int    $0x80                # Execute write() syscall

    mov    $1,%al               # 1 = Syscall for Exit()
    mov    $0,%ebx              # Exit status code 0
    int    $0x80                # Execute exit() syscall

my_string:
    call   payload              # Push pointer to msg onto stack

msg:
    .ascii "Hello, World!\n"
    len = . - msg

Once assembled and linked, we can dump it with objdump and see that we now have a program that doesn't make use of the fixed address for the Hello World string!

Objdump showing the string address is no longer fixed

The JMP instruction jumps 0x1b bytes (27 decimal) to the my_string label. The CALL instruction back to the payload label uses a relative offset of 0xffffffe0 (-32 as a signed integer). The POP ECX instruction gets the string address that was saved as a return address on the stack by the CALL instruction. A test run confirms it works:

Testing that the solution 1 program works
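The two offsets quoted above can be rederived from the instruction sizes. The sketch below assumes the standard encodings for each instruction in the Solution 1 listing (for example, 5 bytes for a MOV of a 32-bit immediate and 2 bytes for a short JMP). The sizes are worked out by hand rather than copied from the objdump output.

```python
# (instruction, size in bytes) for everything between the payload and
# my_string labels
payload_body = [
    ("mov $4,%eax", 5),
    ("mov $1,%ebx", 5),
    ("pop %ecx", 1),
    ("mov $len,%edx", 5),
    ("int $0x80", 2),
    ("mov $1,%al", 2),
    ("mov $0,%ebx", 5),
    ("int $0x80", 2),
]
JMP_SHORT_SIZE = 2   # eb + 8-bit displacement
CALL_NEAR_SIZE = 5   # e8 + 32-bit displacement

payload_off = JMP_SHORT_SIZE   # offset of the payload label
my_string_off = payload_off + sum(size for _, size in payload_body)

# Displacements are measured from the end of the jumping instruction.
jmp_disp = my_string_off - JMP_SHORT_SIZE
call_disp = payload_off - (my_string_off + CALL_NEAR_SIZE)

print(f"jmp  displacement: {jmp_disp:#x} ({jmp_disp})")
print(f"call displacement: {call_disp & 0xffffffff:#x} ({call_disp})")
```

The results line up with the 0x1b and 0xffffffe0 values described above.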

Problem 2 - Null Bytes in the Shellcode

We made the string location dynamic, but we still have null bytes in the middle of our payload code. This occurs because we are moving a small value into a 32-bit register: the assembler encodes the immediate as a full 32-bit value, so the unused upper bytes are zero.

Highlighting remaining null bytes in the payload

Solution 2 - XOR Registers and Use 8-bit Register Moves

As covered in our X86 CPU Architecture post, x86 lets us address the low 8 bits of a register on their own. Instead of moving a value into EAX, we move it into AL, its lowest byte. But we first need to zero out the registers, since an 8-bit move leaves any existing data in the upper bits untouched. Moving zero would create nulls, so we XOR each register with itself instead. The Boolean Math (XOR Logic) blog covers this operation in depth.
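A small model makes the need for the XOR step visible. The helpers write_al and xor_self are names we made up, the stale value is arbitrary, and the model covers only the low-byte behavior of an 8-bit move.

```python
def write_al(eax: int, value: int) -> int:
    """Model 'mov $value,%al': replace only the low 8 bits of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

def xor_self(reg: int) -> int:
    """Model 'xor %reg,%reg': any value XORed with itself is zero."""
    return reg ^ reg

stale = 0xDEADBEEF                        # leftover data, chosen arbitrarily
without_xor = write_al(stale, 4)          # upper 24 bits survive
with_xor = write_al(xor_self(stale), 4)   # clean syscall number

print(f"mov only     : {without_xor:#010x}")
print(f"xor then mov : {with_xor:#010x}")
```

Without the XOR, the kernel would see a nonsense syscall number rather than 4.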

The updated code XORs EAX, EBX, and EDX at the start, then uses 8-bit register moves (AL, BL, DL) and XOR instead of MOV for zeroing EBX in the exit syscall:

.global  _start
.text

_start:
    jmp    my_string            # Jump to the CALL at my_string
payload:
    xor    %eax,%eax            # Zero out eax
    xor    %ebx,%ebx            # Zero out ebx
    xor    %edx,%edx            # Zero out edx
    mov    $4,%al               # 4 = Syscall for Write()
    mov    $1,%bl               # STDOUT is 1
    pop    %ecx                 # Pop string address off stack
    mov    $len,%dl             # Length of string
    int    $0x80                # Execute write() syscall

    mov    $1,%al               # 1 = Syscall for Exit()
    xor    %ebx,%ebx            # Exit status code 0
    int    $0x80                # Execute exit() syscall

my_string:
    call   payload              # Push pointer to msg onto stack

msg:
    .ascii "Hello, World!\n"
    len = . - msg

The objdump output confirms the new binary is free of null bytes, and despite adding 3 new instructions, we still made the payload 6 bytes smaller!

Null-free payload in objdump output

Testing that the solution 2 program works
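Both claims, null-free and 6 bytes smaller, can be double-checked without the screenshots. The bytes below are hand-assembled from the Solution 2 listing using the standard encodings, so treat them as an assumption and compare them against your own objdump output. The Solution 1 size of 34 bytes is likewise derived from its listing (2 for the JMP, 27 for the body, 5 for the CALL).

```python
MSG = b"Hello, World!\n"

# (instruction, hand-assembled hex) for the Solution 2 listing
solution_2 = [
    ("jmp my_string", "eb15"),
    ("xor %eax,%eax", "31c0"),
    ("xor %ebx,%ebx", "31db"),
    ("xor %edx,%edx", "31d2"),
    ("mov $4,%al",    "b004"),
    ("mov $1,%bl",    "b301"),
    ("pop %ecx",      "59"),
    ("mov $len,%dl",  "b20e"),
    ("int $0x80",     "cd80"),
    ("mov $1,%al",    "b001"),
    ("xor %ebx,%ebx", "31db"),
    ("int $0x80",     "cd80"),
    ("call payload",  "e8e6ffffff"),
]
code = b"".join(bytes.fromhex(hexbytes) for _, hexbytes in solution_2)
shellcode = code + MSG

SOLUTION_1_CODE_SIZE = 34   # derived from the Solution 1 listing

print("code bytes       :", len(code))
print("saved vs. sol. 1 :", SOLUTION_1_CODE_SIZE - len(code))
print("null bytes       :", shellcode.count(0))
```

A scan like this is also a handy habit for later payloads, since a single stray 0x00 is easy to miss by eye.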

Problem 3 - Size Reduction

We already reduced the size by 6 bytes, but why stop there? We can optimize further by replacing each XOR/MOV pair (4 bytes) with a PUSH/POP pair (3 bytes) when setting registers, and by setting up the exit syscall with a single XCHG instruction (1 byte). Solution 2 used a MOV/XOR pair (4 bytes) for that setup, and even a tighter MOV/DEC pair would take 3 bytes.
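The byte counts are easy to confirm from the encodings. The sketch below lists the standard encodings for each way of preparing the registers; the variable names are ours.

```python
# Setting EAX to 4
xor_mov = bytes.fromhex("31c0" "b004")    # xor %eax,%eax ; mov $4,%al
push_pop = bytes.fromhex("6a04" "58")     # push $4 ; pop %eax

# Preparing EAX/EBX for exit
mov_xor = bytes.fromhex("b001" "31db")    # mov $1,%al ; xor %ebx,%ebx
xchg = bytes.fromhex("93")                # xchg %ebx,%eax

for name, code in (("xor/mov", xor_mov), ("push/pop", push_pop),
                   ("mov/xor", mov_xor), ("xchg", xchg)):
    print(f"{name:9} {len(code)} byte(s), null bytes: {code.count(0)}")
```

PUSH with an 8-bit immediate (opcode 6a) sign-extends the value to 32 bits, so the following POP fills the whole register and no separate zeroing step is needed. The trade-off is that this form only covers values from -128 to 127.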

Final Code

The final code that implements all three solutions is as follows:

.global  _start
.text

_start:
    jmp    my_string            # Jump to the CALL at my_string
payload:
    push   $4
    pop    %eax                 # 4 = Syscall for Write()

    push   $1
    pop    %ebx                 # STDOUT is 1

    pop    %ecx                 # Pop string address off stack

    push   $0x0e
    pop    %edx                 # Length of string (14)

    int    $0x80                # Execute write() syscall

    xchg   %ebx,%eax            # Swap eax and ebx (1 byte!)
    int    $0x80                # Execute exit(bytes_written)

my_string:
    call   payload              # Push pointer to msg onto stack

msg:
    .ascii "Hello, World!\n"

When assembled, linked, and run through objdump, we get a size-reduced, null-free, position-independent payload:

Size reduced null-free position-independent payload in objdump

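Testing confirms the new exit strategy works. After write() returns, EAX holds the number of bytes written (14) and EBX still holds 1, so XCHG leaves EAX = 1 (the exit syscall number) and EBX = 14, producing an exit status of 14. It also works when the registers are pre-populated with data like 0xdeadbeef, because each PUSH/POP pair overwrites the full 32-bit register: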

Testing solution 3 with exit status 14

Confirming PUSH POP method works with pre-populated registers
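The same result can be traced by hand. The sketch below is a toy model of the final listing: it tracks only the four registers and the stack, assumes write() succeeds and returns the byte count, and uses a made-up string address. The function name trace_final_payload is ours.

```python
def trace_final_payload(regs: dict, string_addr: int) -> dict:
    """Follow the register values through the final listing (no memory model)."""
    regs = dict(regs)                  # work on a copy
    stack = [string_addr]              # return address pushed by 'call payload'

    stack.append(4)                    # push $4
    regs["eax"] = stack.pop()          # pop %eax  -> write() syscall number
    stack.append(1)                    # push $1
    regs["ebx"] = stack.pop()          # pop %ebx  -> stdout
    regs["ecx"] = stack.pop()          # pop %ecx  -> string address
    stack.append(0x0E)                 # push $0x0e
    regs["edx"] = stack.pop()          # pop %edx  -> length

    # int $0x80: assume write() succeeds and returns the byte count in EAX.
    regs["eax"] = regs["edx"]

    regs["eax"], regs["ebx"] = regs["ebx"], regs["eax"]   # xchg %ebx,%eax
    return regs                        # values seen by the second int $0x80

junk = {"eax": 0xDEADBEEF, "ebx": 0xDEADBEEF, "ecx": 0xDEADBEEF, "edx": 0xDEADBEEF}
final = trace_final_payload(junk, string_addr=0x08049016)   # hypothetical address
print(f"syscall number: {final['eax']}  exit status: {final['ebx']}")
```

One side effect worth noticing: EAX always ends up as 1 after the swap, because EBX held 1 for stdout. The exit syscall is therefore reached whatever write() returned, and the exit status simply reports that return value.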

Conclusion

I hope you've enjoyed this blog post and learned something new today about making your code shellcode friendly. The code for this post and Makefile will be added to the Secure Ideas Professionally Evil x86_asm GitHub repository. In future posts, we will explain how to use various tools and scripts to extract our shellcode and how to build a C shellcode tester stub to test our shellcode standalone.

If you're interested in security fundamentals, we have a Professionally Evil Fundamentals (PEF) channel that covers a variety of technology topics. We also answer general basic questions in our Knowledge Center.
