13 October, 2020

LD_PRELOAD - Introduction

Author: Travis Phillips

Today I want to start what I plan to be a small series of blog posts about LD_PRELOAD. LD_PRELOAD applies to Linux-based systems and revolves around the dynamic loader and how it resolves symbols from shared object libraries when it loads a dynamically linked ELF binary. A library named in LD_PRELOAD is loaded before any other shared object library. LD_PRELOAD is often referred to as a technique, but the name comes from the environment variable that the loader reads. There is also a global file, /etc/ld.so.preload, that can point to your shared object library system-wide, and it affects all processes, not just the ones that can see the LD_PRELOAD environment variable. Preloading causes some very interesting behavior and is useful for a wide range of use cases. This blog post gives a high-level overview of the use cases that will be covered in later blog posts.

Function Hijacking

This is one of the most common uses of LD_PRELOAD. When the running ELF binary calls a function that is imported from a shared object library such as libc, the loader searches the libraries in the order they were loaded to find that symbol. Because your shared object library is loaded before the other libraries, it is searched before them. If it contains a function whose name matches, your function will be invoked instead of the real one. This can be useful for several use cases.

Debugging and Reverse Engineering

This is one of the primary ways you see LD_PRELOAD get used. You can hijack function calls and do a wide range of things with them. You can print out the parameters that were passed to the function, or log them to a file instead. It's also possible to look up the address of the real function and pass the call on to it. This gives you a sort of debugging shim between the ELF binary and the real function. You can also inspect the stack when your function is called to determine the caller's address, so you know where the call came from. That is handy if you want to pass every call along untouched except the ones made from one particular address.
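The debugging shim described above can be sketched roughly as follows. This is a minimal illustration rather than a finished tool: it assumes glibc and GCC or Clang, picks puts() as an arbitrary target, and the file name shim_puts.c and the "[shim]" log prefix are made up for the example. It uses dlsym() with RTLD_NEXT to find the next definition of the symbol after the shim, and the compiler builtin __builtin_return_address() to report where the call came from.

```c
/* shim_puts.c: hypothetical logging shim for puts(). */
#define _GNU_SOURCE             /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <dlfcn.h>
#include <stdio.h>

int puts(const char *s)
{
    /* Resolve the next definition of puts() in load order, which is
     * normally the one in libc. The pointer is cached after the first call. */
    static int (*real_puts)(const char *) = NULL;
    if (real_puts == NULL) {
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (real_puts == NULL) {
            return EOF;         /* no real puts() found, report failure */
        }
    }

    /* Log the argument and the caller's return address to stderr, then
     * hand the call to the real function unchanged. */
    fprintf(stderr, "[shim] puts(\"%s\") called from %p\n",
            s, __builtin_return_address(0));
    return real_puts(s);
}
```

Assuming those placeholder names, this could be built with something like gcc -shared -fPIC -o shim_puts.so shim_puts.c -ldl and tried with LD_PRELOAD=./shim_puts.so ./target, where ./target stands in for any dynamically linked program that calls puts().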

Controlling Application Behavior

Function hijacking can also be used to control a process's behavior. Since the ELF binary will likely react to the return values of those function calls, it is possible to fake return values to get your desired behavior while being in control of what the function call actually does. For example, imagine a rand() call that always returns 42, even though applications call rand() or random() expecting an integer that is hard to guess. Or imagine a strcmp() that always returns 0 instead of comparing the strings and reporting whether they match, or a flock() call that doesn't actually lock the file, but tells you it did.
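Two of those examples are small enough to sketch in full. The file name fake_calls.c is made up for illustration, and the sketch assumes a Linux system with glibc headers. Neither function calls the real implementation; each one simply reports the result the target program is hoping for.

```c
/* fake_calls.c: hypothetical shim that fakes rand() and flock(). */
#include <stdlib.h>
#include <sys/file.h>

/* Always return the same "random" number, regardless of any seed. */
int rand(void)
{
    return 42;
}

/* Claim the lock operation succeeded without touching the file at all. */
int flock(int fd, int operation)
{
    (void)fd;           /* unused on purpose */
    (void)operation;    /* unused on purpose */
    return 0;
}
```

Built as a shared object (for example with gcc -shared -fPIC -o fake_calls.so fake_calls.c) and preloaded, a program that seeds the generator and calls rand() would get 42 every time, and a program that checks the return value of flock() would believe it holds the lock.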

User-Land Rootkits

All of this functionality also lends itself quite well to malware. It's possible to use it to create a user-land rootkit. On top of the environment variable, there is also a global file that makes the loader preload the shared object library into every process it starts. This includes services and daemons that run as root. Some of the malware out there does interesting stuff with this.

For example, hijacking the libc functions that return a directory listing means you can hide files. Hijacking read() can allow you to hide file content. Hijacking the PAM authentication functions can allow you to log passwords. Hijacking accept() can allow you to turn every network service on the machine into a bind shell backdoor under the right conditions. Hijacking functions in libpcap can hide traffic you want to keep hidden. Hijacking functions in OpenSSL or OpenSSH could allow you to generate known keys, or log any loaded keys.

However, it is worth noting that there are flaws in this design that make detection of these rootkits possible from user-land as well. The primary one is that it can only affect dynamically linked ELF binaries; a statically linked ELF binary isn't affected by it. We will explore this more in-depth in a later blog post.

A Few Security Use Cases

A while back, the local Linux User Group JaxLUG had a presentation that explored using that same rootkit idea defensively, as a system-wide application whitelisting technology. I have also seen LD_PRELOAD used to block or log certain network connections in applications, since you can control the DNS lookup and connect calls.
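A connection filter of that kind might look something like the sketch below. It is only an illustration under several assumptions: glibc on Linux, IPv4 only, and a made-up policy (the BLOCKED_PORT constant) that refuses connections to port 23. A real tool would need to cover IPv6, the DNS lookup functions, and other ways of sending traffic.

```c
/* shim_connect.c: hypothetical shim that logs and filters connect(). */
#define _GNU_SOURCE             /* exposes RTLD_NEXT in <dlfcn.h> on glibc */
#include <arpa/inet.h>
#include <dlfcn.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

#define BLOCKED_PORT 23         /* made-up policy for this example */

int connect(int fd, const struct sockaddr *addr, socklen_t len)
{
    /* Find the real connect() the first time we are called. */
    static int (*real_connect)(int, const struct sockaddr *, socklen_t) = NULL;
    if (real_connect == NULL) {
        real_connect = (int (*)(int, const struct sockaddr *, socklen_t))
            dlsym(RTLD_NEXT, "connect");
        if (real_connect == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }

    /* Only IPv4 destinations are inspected in this sketch. */
    if (addr != NULL && addr->sa_family == AF_INET &&
        len >= sizeof(struct sockaddr_in)) {
        const struct sockaddr_in *in = (const struct sockaddr_in *)addr;
        char ip[INET_ADDRSTRLEN] = "?";
        unsigned port = ntohs(in->sin_port);

        inet_ntop(AF_INET, &in->sin_addr, ip, sizeof(ip));
        fprintf(stderr, "[shim] connect() to %s:%u\n", ip, port);

        if (port == BLOCKED_PORT) {
            /* Pretend the remote side refused, without calling the real
             * connect() at all. */
            errno = ECONNREFUSED;
            return -1;
        }
    }

    /* Everything else is passed through to the real function. */
    return real_connect(fd, addr, len);
}
```

Returning ECONNREFUSED is a design choice for the sketch: most applications already handle a refused connection gracefully, so the block looks like an ordinary network failure to them.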

Limitations

LD_PRELOAD does have a few limitations. The first is that it only works on dynamically linked ELF binaries. A statically linked binary doesn't load external shared object libraries, so there is nothing to preload into. The next limitation to keep in mind involves binaries with the setuid bit set. When the loader starts a setuid binary, it runs in secure-execution mode. In that mode it ignores LD_PRELOAD entries that contain a slash, and it will only preload libraries from the standard search directories that also have the setuid bit set themselves. This is for the obvious security issue that would come up if a user could just fire up a setuid binary owned by root, such as sudo, and inject their own code that might just give them a root shell. You can still run the binary with LD_PRELOAD set, and it will still run as the file owner, but your library will not be loaded into it.
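The first limitation can be checked before you bother preloading anything. A dynamically linked executable carries a PT_INTERP program header that names the loader, and a statically linked one does not. The sketch below looks for that header. It is a simplified illustration: the function name has_interpreter is made up, and it assumes a 64-bit ELF file with the same byte order as the machine running the check.

```c
/* elfcheck.c: hypothetical helper that looks for a PT_INTERP header. */
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file requests a dynamic loader (has PT_INTERP),
 * 0 if it does not, and -1 on I/O errors or if it is not a 64-bit ELF. */
int has_interpreter(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }

    Elf64_Ehdr eh;
    int result = -1;

    /* Validate the ELF magic and class before trusting any other field. */
    if (fread(&eh, sizeof(eh), 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        result = 0;
        /* Walk the program header table looking for PT_INTERP. */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
            if (fseek(f, off, SEEK_SET) != 0 ||
                fread(&ph, sizeof(ph), 1, f) != 1) {
                result = -1;    /* truncated or unreadable header table */
                break;
            }
            if (ph.p_type == PT_INTERP) {
                result = 1;
                break;
            }
        }
    }

    fclose(f);
    return result;
}
```

A result of 0 for a target binary means the loader is never involved in starting it, so LD_PRELOAD will have no effect on it.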

Conclusion

This post was mostly intended to be a high-level overview. If some of this seems a little confusing, don't worry about it. This is just the introduction to the series. The blog posts to follow will provide more examples and will expand upon the use cases outlined above.

LD_PRELOAD Series Blog Posts

Interested in more information about LD_PRELOAD? This blog is a part of a series and the full list of blogs in this series can be found below: