Linux X86 Assembly - How to Build a Hello World Program in GAS

Overview

In the last tutorial, we covered how to build a 32-bit x86 Hello World program in NASM. Today, we will cover how to do the same thing, but this time using the GAS toolchain instead. This will allow us to review the differences in the source code syntax and structure, as well as the difference in the build process.

Prerequisite Knowledge

Before continuing, this article will assume you have some basic knowledge of the X86 architecture. A blog post that covers the basics of the X86 architecture can be found on our blogs. Since we will also be reproducing the exact same work we did in NASM, but with a different toolchain, it would be highly recommended to read that tutorial, which can be found in our blog pages. The reason you should read that one is this blog post will assume you understand the syscall and instruction concepts that were explained in that article to reduce the length of this tutorial. We will also be adding to the Makefile from that project to build the GAS version, along with the C and NASM versions from that tutorial. Since this is all bundled together, we have added these code samples to the Secure Ideas Professional Evil GitHub repository which can be found on github.

Source Code

Like last time with our NASM example, let’s first list our complete code here for the hello_world_gas.s file, then we will pick it apart to explain what is going on. Also notice i’ve used a .s extension rather than a .asm extension on the filename. This can help with syntax highlighting to be correct in editors like nano. GAS typically uses .s as the extension and NASM uses the .asm and most syntax highlighting editors adjust based on that extension, since these two have different syntax. The complete code is as follows:

##########################################################################
#
# Program: hello_world_gas
#
# Date: 04/22/2021
#
# Author: Travis Phillips
#
# Purpose: A simple hello world program in x86 assembly for
#          GAS
#
# Compile: as --march=i386 --32 ./hello_world_gas.s -o hello_world_gas.o
#    Link: ld -m elf_i386 hello_world_gas.o -o hello_world_gas
#
##########################################################################
.global  _start                 # we must export the entry point to the
                                # ELF linker or loader. Conventionally,
                                # they recognize _start as their entry
                                # point but this can be overridden with
                                # ld -e "label_name" when linking.

.data                           # .data section declaration

msg:
    .ascii    "Hello, World!\n" # Declare a label "msg" which has
                                # our string we want to print.

    len = . - msg               # "len" will calculate the current
                                # offset minus the "msg" offset.
                                # this should give us the size of
                                # "msg".

.text                           # .text section declaration

_start:
    ######################################
    # syscall - write(1, msg, len);
    ######################################
    mov    $4,%eax              # 4 = Syscall number for Write()

    mov    $1,%ebx              # File Descriptor to write to
                                # In this case: STDOUT is 1

    mov    $msg,%ecx            # String to write. A pointer to
                                # the variable 'msg'

    mov    $len,%edx            # The length of string to print
                                # which is 14 characters

    int    $0x80                # Poke the kernel and tell it to run the
                                # write() call we set up

    ######################################
    # syscall - exit(0);
    ######################################
    mov    $1,%al               # 1 = Syscall for Exit()
    mov    $0,%ebx              # The status code we want to provide.
    int    $0x80                # Poke kernel. This will end the program.

The code above might look massive, but much like the NASM example, it’s mostly comments to help explain things. If we remove the comments, the code part would simply be:

.global  _start
.data
msg:
    .ascii "Hello, World!\n"
    len = . - msg
.text
_start:
    mov    $4,%eax
    mov    $1,%ebx
    mov    $msg,%ecx
    mov    $len,%edx
    int    $0x80
    mov    $1,%al
    mov    $0,%ebx
    int    $0x80

The first chunk of our source code file is a comments section. This is mostly just project information for us.

##########################################################################
#
# Program: hello_world_gas
#
# Date: 04/22/2021
#
# Author: Travis Phillips
#
# Purpose: A simple hello world program in x86 assembly for
#          GAS
#
# Compile: as --march=i386 --32 ./hello_world_gas.s -o hello_world_gas.o
#    Link: ld -m elf_i386 hello_world_gas.o -o hello_world_gas
#
##########################################################################

Nothing major here. In GAS, anything following a pound sign(#) is considered a comment. This top part of our file is just some basic information for humans that are looking at the file. It is worth noting that GAS also supports multi-line comment blocks in the /* */ format as well. To my knowledge, I don’t believe NASM supports multi-line comment blocks. We could have used that for the top part if we wanted to, but six is half a dozen; same difference.

However, the next line that follows exports our _start label so the compiler will make it visible to the linker to use as an entry point. The lines responsible for that is as follows:

.global  _start    # we must export the entry point to the
                   # ELF linker or loader. Conventionally,
                   # they recognize _start as their entry
                   # point but this can be overridden with
                   # ld -e "label_name" when linking.

The main difference between GAS and NASM on this front is that we need to add a period in front of the global keyword. The rest of the line’s syntax works the same.

After this is the line that marks the start of our .data section of the ELF binary. The syntax on this code is even more straightforward than NASM. You can omit the keyword “section” and just put the section name. This .data section we will use to store our variables.

.data                           # .data section declaration

In the data section, we will create the same two variables as the NASM example. The first one is “msg”. The “msg:” part is actually creating this as a label. A label provides a means for us to access an offset with a friendly name, and the compiler will actually implement the real offsets at compile time. To create a label you place a string, which is the label’s name at the start of a line, followed by a colon (:). As for declaring the data, in NASM we used the db keyword. However, in GAS we have a keyword .ascii for declaring strings! This is actually a little nicer since we can actually use the \n escape sequence rather than having to drop the hex code 0xa for it.

msg:
   .ascii "Hello, World!\n" # Declare a label "msg" which has
                            # our string we want to print.

The second variable we declare in the .data section is “len”. Take notice that it FOLLOWS the msg variable. This is important because this line is performing an offset calculation to determine the size by taking the len offset and subtracting the msg label offset. This would effectively give to the size of the data between the two offsets. You may have also noticed that len this time is not a label! GAS allows us to create variables. The period (.) means this current offset, then “– msg” subtracts the msg offset

   len = . - msg            # "len" will calculate the current
                            # offset minus the "msg" offset.
                            # this should give us the size of
                            # "msg".

Now that those two variables are created, we can now end the .data section and start the .text section. This is straightforward as we just put the section name without the section keyword, and the _start label creation code is the same in both NASM and GAS.

.text                       # .text section declaration
_start:

Now we can start writing the assembly code we want to execute. First we will start by looking over the chunk of code that creates and invokes the write syscall to print our hello world message:

    ######################################
    # syscall - write(1, msg, len);
    ######################################
    mov    $4,%eax         # 4 = Syscall number for Write()

    mov    $1,%ebx         # File Descriptor to write to
                           # In this case: STDOUT is 1

    mov    $msg,%ecx       # String to write. A pointer to
                           # the variable 'msg'

    mov    $len,%edx       # The length of string to print
                           # which is 14 characters

    int    $0x80           # Poke the kernel and tell it to run the
                           # write() call we set up

The logic here and instructions are the same as they were in the NASM example. However you may have noticed that the operands after the mov instructions are flipped and formatted as Source=>Destination. NASM was the inverse of that ordering. The other difference here is that values have a symbol before them that signifies what we are referencing. For example, the dollar sign ($) before the numbers signals the compiler that the value that follows is an immediate value., while the percent sign (%) signals that the value is a register. Overall the code set up is the same as it was with NASM, move the syscall number into EAX, setup the required 3 parameters, and use INT $0x80, to invoke the syscall for write to get our Hello World message printed to the console on STDOUT.

We can move on to our next and final syscall of exit which is shown below:

    ######################################
    # syscall - exit(0);
    ######################################
    mov    $1,%al          # 1 = Syscall for Exit()
    mov    $0,%ebx         # The status code we want to provide.
    int    $0x80           # Poke kernel. This will end the program.

The code here should look familiar since it’s just MOV and INT instructions again, following the same changes we saw between NASM and GAS.

Compiling and Linking

The comment block at the top of the code explains how to compile and link this code. It’s pretty straightforward with the following two commands. However, this time for compiling we will use the as command (short for assembler). We are also providing a few more switches to this command because this can assemble for several architectures in several bitness formats. Therefore, we will want to use the –march switch and explicitly tell it we are building for the i386 architecture and the –32 switch to let it know that we are building this as a 32-bit binary. GAS can produce a 64-bit binary as well without issues if you wanted to experiment with that, but note that this code will not work as 64-bit, and we don’t want GAS to assume that’s what we want:

$ as --march=i386 --32 ./hello_world_gas.s -o hello_world_gas.o
$ ld -m elf_i386 hello_world_gas.o -o hello_world_gas

To make things even easier, we can add the following text to a Makefile and leverage the make command to build both the C and NASM projects for us.

all: hello_world_c hello_world_nasm hello_world_gas

hello_world_nasm:
    @echo "\n\033[33;1mBuilding NASM Hello World\033[0m"
    nasm -f elf hello_world_nasm.asm
    ld -m elf_i386 hello_world_nasm.o -o hello_world_nasm

hello_world_c:
    @echo "\n\033[33;1mBuilding C Hello World\033[0m"
    gcc hello_world.c -o hello_world_c

hello_world_gas:
    @echo "\n\033[33;1mBuilding GAS Hello World\033[0m"
    as --march=i386 --32 ./hello_world_gas.s -o hello_world_gas.o
    ld -m elf_i386  hello_world_gas.o -o hello_world_gas


clean:
    @echo "\n\033[33;1mRemoving executables and object files\033[0m"
    rm -f hello_world_c
    rm -f hello_world_nasm hello_world_nasm.o
    rm -f hello_world_gas hello_world_gas.o

With that Makefile in place, you can build all three; the C, NASM, and GAS versions by simply running the make command in the directory as shown below:

Using the Make file to build the C code, NASM, and GAS version of the Hello World program.

Advantages to Learning GAS

There are several reasons why GAS should be learned. GAS is more flexible than NASM, at a slightly steeper learning curve. However, GAS is multi-architecture where last I checked, NASM is strictly X86. GAS could do ARM, MIPS, PowerPC, SPARC and many others. Also since GAS and objdump are both GNU tools, the output of objdump much more closely resembles the code we wrote because the syntax is now the same between the two tools.

What we wrote	What it became once compiled
`mov $4,%eax` `mov $1,%ebx mov $msg,%ecx mov $len,%edx int $0x80 mov $1,%al mov $0,%ebx int $0x80`	`mov $0x4,%eax` `mov $0x1,%ebx mov $0x804a000,%ecx mov $0xe,%edx int $0x80 mov $0x1,%al mov $0x0,%ebx int $0x80`

GAS also provides suffixes to instructions that can be used to specify the size of the data being moved. For example: MOVL would move a long value. MOVW would move a word value. MOVB would move a single byte. This can have a direct impact on the exported byte code.

Conclusion

I hope you’ve enjoyed this blog post and learned something new today about the x86 assembly in GAS. While these Hello World examples have worked well as a program, they do not work as a shellcode, at least not yet! In future posts, we will:

Review our Hello World program
Determine what issues impede its use as a shellcode payload
What we can do to solve those problems
Further reduce its size
Write the asm file in a way that makes it easy to extract the bytecode for use as an exploit payload.
How to build a C shellcode tester stub and use it.

Ready for a challenge? We post Mystery Challenges on Facebook, Linkedin, and Twitter. If you’re interested in security fundamentals, we have a Professionally Evil Fundamentals (PEF) channel that covers a variety of technology topics. We also answer general basic questions in our Knowledge Center. Finally, if you’re looking for a penetration test, professional training for your organization, or just have general security questions please Contact Us.

Linux X86 Assembly Series Blog Post

Interested in more information about the X86 architecture and Linux shellcode/assembly? This blog is a part of a series and the full list of blogs in this series can be found below: