Open Testware Reviews
Technology Bulletin: System Call Hijacking Tools
Copyright 2003 by Tejas Software Consulting - All rights reserved.
Reviewed: 2003-September-30
Testingfaqs.org category: Test Implementation Tools
My review of Holodeck in August 2003 sparked requests for information
about similar tools for Linux. I haven't yet found another tool like
this that merits a detailed review, but I thought I'd share what I've
learned about a few tools in this general category. While this bulletin
focuses on Linux, some points apply to other systems as well. Note that
this bulletin assumes that readers have programming experience.
I investigated what I call "system call hijacking tools." These are
potentially very powerful tools that you can use for robustness testing
and a wide variety of other tasks. In fact, most of these kinds of
tools are presented primarily as security tools, but system call
hijacking has many potential applications. Here's the basic idea: a
system call hijacking tool can take control of one or more running
programs and modify their behavior whenever they make a system
call. The tool might change the parameters that are passed to the
system call, it might prevent the call into the operating system, or
it might fake the results. Before delving into how such a tool can be
very useful to a tester, I'll explain a bit more about what a system
call is.
What is a system call?
A system call is a function call that executes code contained within
the operating system kernel. Most of the operating system's fundamental
services are accessed through system calls - opening files, asking for
more memory, initiating network connections, rebooting, and so on. Other
function calls go into system libraries or into third-party and
user-supplied libraries. These other function calls can also be
hijacked or stubbed for similar reasons, though the options for doing
so are more limited than they are for system calls.
System call tracing
To help draw a picture of what these tools do, let's first consider the
read-only equivalent - system call tracing. The name of these tools
varies from system to system - look for something like "strace,"
"ktrace," "trace," or "truss." The most common one on Unix-like systems
seems to be strace. I even found strace for Cygwin on Windows, plus
Windows has Holodeck and tracing tools that come with some commercial
development environments.
Here's the strace output for a simple program that calls malloc on
Linux.
$ strace ./foo
execve("./foo", ["./foo"], [/* 32 vars */]) = 0
fcntl64(0, F_GETFD) = 0
fcntl64(1, F_GETFD) = 0
fcntl64(2, F_GETFD) = 0
uname({sys="Linux", node="localhost.localdomain", ...}) = 0
geteuid32() = 500
getuid32() = 500
getegid32() = 500
getgid32() = 500
brk(0) = 0x80a39c0
brk(0x80a39e0) = 0x80a39e0
brk(0x80a4000) = 0x80a4000
fstat64(1, {st_mode=S_IFREG|0664, st_size=591, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
write(1, "malloc returned 134888544\n", 26malloc returned 134888544
) = 26
munmap(0x40000000, 4096) = 0
_exit(26) = ?
Here's the program, foo.c.
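#include <stdio.h>
#include <stdlib.h>
int main(void) {
    printf("malloc returned %lu\n", (unsigned long) malloc(1));
    return 0; /* the original had no return, hence _exit(26) in the traces */
}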
If you look closely at the output, you'll see that the program's standard
output is mixed with strace's own output, which goes to stderr. You
can direct these two streams to different files to avoid this
problem, except that any stderr output from the program will still be
jumbled up with the strace output.
Strace tries very hard to decode the system call parameters and show
them in a readable form, such as "PROT_READ|PROT_WRITE" above, which
would otherwise be a meaningless integer. Different trace tools vary
widely in their ability to make the parameters readable.
Now that we can see the details of the system calls, let's move on to
modifying their behavior.
How testers can use system call hijacking
If you could modify the system call parameters, the return value, and
possibly replace or supplement the system call with your own code, you
could easily simulate a huge variety of problems that applications need
to deal with. For example, you could fake an out-of-memory or disk-full
error, or introduce data corruption on the disk or network, all using
the same hijacking technique.
See the Holodeck review for additional background.
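To make the idea concrete without a real hijacking tool, here's a sketch in plain Python that wraps a write function and injects the two kinds of faults mentioned above: a disk-full error once a byte budget is used up, and one inverted byte to mimic silent data corruption. The function name and parameters are hypothetical. A system call hijacking tool would apply the same kind of logic to the real write system call from outside the program, without touching its code; this sketch only affects callers that go through the wrapper.

```python
import errno
import os


def make_faulty_write(real_write, capacity, corrupt_offset=None):
    """Wrap a write(fd, data) function with two injected faults (hypothetical).

    - After `capacity` bytes in total, writes fail with ENOSPC ("disk full").
      A write that crosses the limit is cut short, much as write(2) may
      return a short count on a nearly full disk.
    - If `corrupt_offset` is given, the byte at that position in the overall
      output stream is inverted before it is written.
    """
    written = 0

    def faulty_write(fd, data):
        nonlocal written
        if written >= capacity:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        chunk = bytearray(data[: capacity - written])
        if corrupt_offset is not None and written <= corrupt_offset < written + len(chunk):
            chunk[corrupt_offset - written] ^= 0xFF
        count = real_write(fd, bytes(chunk))
        written += count
        return count

    return faulty_write


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    fw = make_faulty_write(os.write, capacity=4, corrupt_offset=1)
    print(fw(write_end, b"ab"))        # 2
    print(fw(write_end, b"cdef"))      # 2 (short write at the limit)
    try:
        fw(write_end, b"x")
    except OSError as exc:
        print(errno.errorcode[exc.errno])  # ENOSPC
    print(os.read(read_end, 16))       # b'a\x9dcd'
```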
Catching system calls in the kernel
One way to hijack a system call is to hook a tool into the kernel
itself. That's what the syscalltrack tool does. Syscalltrack loads
a kernel module that injects probes into the kernel system call
table. It watches all current and future processes on the system by
default, and it has a very flexible filtering mechanism so you can
zero in on the areas you're interested in. The tool seems to focus on
logging. Here's the output from its strace clone, sctrace, on my "foo"
program.
syscall: 6641["sctrace"]: 6_close(4) (rule 0)
syscall: 6641["sctrace"]: 3_read(3, "g", 1) (rule 0)
syscall: 6641["sctrace"]: 11_execve("/home/test/foo", bffff898, bffff8a0) (rule 0)
syscall: 6641["foo"]: 221_fcntl64(0, 1, 0) (rule 0)
syscall: 6641["foo"]: 221_fcntl64(1, 1, 0) (rule 0)
syscall: 6641["foo"]: 221_fcntl64(2, 1, 0) (rule 0)
syscall: 6641["foo"]: 122_newuname(new_utsname{c5f1fdf8, c5f1fe39, c5f1fe7a, c5f1febb, c5f1fefc, c5f1ff3d}) (rule 0)
syscall: 6641["foo"]: 201_geteuid(void) (rule 0)
syscall: 6641["foo"]: 199_getuid(void) (rule 0)
syscall: 6641["foo"]: 202_getegid(void) (rule 0)
syscall: 6641["foo"]: 200_getgid(void) (rule 0)
syscall: 6641["foo"]: 45_brk(00000000) (rule 0)
syscall: 6641["foo"]: 45_brk(080a39e0) (rule 0)
syscall: 6641["foo"]: 45_brk(080a4000) (rule 0)
syscall: 6641["foo"]: 197_fstat64(1, stat64{6, c5f1ff22, 2, 8592, 1, 500, 5, 34816, c5f1ff42, 0, 1024, 0, 0, 1064964199, 0, 1064964199, 0, 1064942448, 0, 2}, -973996256) (rule 0)
syscall: 6641["foo"]: 90_old_mmap(mmap_arg_struct{0, 4096, 3, 34, 4294967295, 0}) (rule 0)
syscall: 6641["foo"]: 4_write(1, "malloc returned 134888544\10", 26) (rule 0)
syscall: 6641["foo"]: 91_munmap(1073741824, 4096) (rule 0)
syscall: 6641["foo"]: 1_exit(26) (rule 0)
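Compare sctrace's old_mmap line above with strace's version of the same call earlier: sctrace shows mmap_arg_struct{0, 4096, 3, 34, 4294967295, 0}, while strace shows NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0. The sketch below does that decoding by hand. The flag values are my assumption of the Linux x86 values from <sys/mman.h>; check your own headers, since other architectures and systems may use different numbers. The fd field prints as 4294967295 because -1 is being shown as an unsigned 32-bit number.

```python
# Assumed Linux x86 values from <sys/mman.h>; other platforms may differ.
PROT_FLAGS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_FLAGS = {0x01: "MAP_SHARED", 0x02: "MAP_PRIVATE",
             0x10: "MAP_FIXED", 0x20: "MAP_ANONYMOUS"}


def decode_flags(value, names):
    """Render a bit mask as NAME|NAME, keeping any unknown bits in hex."""
    parts = [name for bit, name in sorted(names.items()) if value & bit]
    known = 0
    for bit in names:
        known |= bit
    if value & ~known:
        parts.append(hex(value & ~known))
    return "|".join(parts) or "0"


def as_signed32(value):
    """Reinterpret an unsigned 32-bit number as a signed one."""
    return value - (1 << 32) if value & (1 << 31) else value


if __name__ == "__main__":
    # The six fields of mmap_arg_struct as sctrace printed them.
    addr, length, prot, flags, fd, offset = (0, 4096, 3, 34, 4294967295, 0)
    print(decode_flags(prot, PROT_FLAGS))  # PROT_READ|PROT_WRITE
    print(decode_flags(flags, MAP_FLAGS))  # MAP_PRIVATE|MAP_ANONYMOUS
    print(as_signed32(fd))                 # -1
```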
Syscalltrack is not able to modify the parameters sent to the system
call or to avoid calling the system call altogether. I'm not sure
whether it can modify the return code. It is able to generate some sort
of failure. Here's a syscalltrack rule I wrote to cause any program
named "foo" to fail every "brk" call:
rule
{
syscall_name = brk
rule_name = fail_brk
filter_expression {
COMM == "foo"
}
action {
type = FAIL
error_code = -12
}
}
I couldn't find any documentation on what the "error_code" means,
though it seems to be the negative of an errno code. I believe the -12
will give me an ENOMEM. When I enable this rule, the "foo" program
takes a segmentation fault when I run it, which is surprising. A
debugger shows that the fault comes from the chunk_alloc() function
before main() is entered, which implies that something in the C runtime
calls brk before my program has a chance to, and that it doesn't handle
the failure with a proper error message. I ran into the same problem
when using Holodeck to inject faults from the moment the program starts.
It is especially an issue with shared libraries, which require a number
of extra system calls to start up the program.
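If error_code is indeed a raw kernel return value, the convention behind it is this: on Linux, a failing system call returns the negated errno value, and the C library wrapper turns that into a -1 return with errno set. Values from -4095 to -1 are reserved for errors, so any other value is a real result. The helper below (its name is my own) applies that rule; errno 12 is ENOMEM on Linux, which matches my guess about -12.

```python
import errno

MAX_ERRNO = 4095  # Linux reserves -4095..-1 for error returns


def decode_raw_return(ret):
    """Interpret a raw Linux system call return value.

    Returns (result, None) on success, or (-1, errno_name) on failure,
    mirroring what the C library wrapper hands back to the caller.
    """
    if -MAX_ERRNO <= ret <= -1:
        code = -ret
        return -1, errno.errorcode.get(code, "errno %d" % code)
    return ret, None


if __name__ == "__main__":
    print(decode_raw_return(-12))         # (-1, 'ENOMEM') on Linux
    print(decode_raw_return(0x40000000))  # (1073741824, None)
```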
To work around the startup issue, I added a 10-second sleep call at the
top of foo.c. I turned off the syscalltrack rule and ran foo. After a
few seconds, I re-enabled the syscalltrack rule, hoping that my malloc
call would be the first to hit the fault. What actually happened is
that sometimes the program aborted with no output, and sometimes it
seemed to work just fine. I never got a NULL (0) return value from
malloc, which is what I wanted.
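A more repeatable alternative to flipping the rule on by hand is to count calls: let the first N matching calls through, then fail the next one. The sketch below is a hypothetical policy object, not a feature of any of the tools discussed here; a tool with a per-call hook would consult it once for each matching call. Choosing N still means looking at a trace of a normal run to see how many matching calls happen during startup.

```python
import errno


class SkipThenFail:
    """Hypothetical fault-injection policy for one kind of system call.

    The first `skip` matching calls pass through untouched; the next
    `count` calls are failed with `err`; later calls pass through again.
    """

    def __init__(self, skip, count=1, err=errno.ENOMEM):
        self.skip = skip
        self.count = count
        self.err = err
        self.seen = 0

    def decide(self):
        """Call once per matching system call.

        Returns None to let the call proceed, or -err to fake a failure.
        """
        self.seen += 1
        if self.skip < self.seen <= self.skip + self.count:
            return -self.err
        return None


if __name__ == "__main__":
    policy = SkipThenFail(skip=3)
    print([policy.decide() for _ in range(5)])  # [None, None, None, -12, None]
```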
After looking at the trace output, I realize that I let my past
experience on Unix systems blind me to the fact that my malloc call
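isn't actually causing a call to brk, but old_mmap instead. On i386,
old_mmap is the older entry point for the mmap system call; it takes
its six arguments packed into one structure, which is the
mmap_arg_struct in the sctrace output. So I change my rule to target
that instead: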
rule
{
syscall_name = old_mmap
rule_name = fail_mmap
filter_expression {
COMM == "foo"
}
action {
type = FAIL
error_code = -12
}
}
But I still don't get malloc to fail. I have to combine the two rules
and greatly increase the memory size that I pass to malloc in my test
application before the malloc call triggers a failure. (glibc's malloc
serves large requests with mmap rather than brk, which may explain why
the size matters.) Again, instead of a NULL return from malloc, I get a
segmentation fault before the statement following the malloc call runs.
Building syscalltrack was a bit of a challenge. I had to install the
kernel sources, retrieve the config file from the /boot directory, and
do the first few steps of building a kernel. It didn't work with the
instructions that came with the sources, though it did work when I
carefully followed the instructions on the web page, which were
somewhat different from the documentation in the sources. I was using
the same kernel version that the developers did (2.4.18-3, Red Hat
7.3). I have less confidence in how well installation would go on other
kernel versions.
Also note that more recent kernels (e.g., the one shipped with Red Hat 9) no
longer export the system call table to kernel modules, so you would
have to patch your kernel sources and replace your kernel before you
could use syscalltrack.
Catching system calls using ptrace
On Linux and other systems, debuggers and system call tracers are
built on the ptrace system call. It's possible to exercise great
control over a program using ptrace, including modifying the contents
of its registers and memory. Modifying program behavior does require
using some low-level architecture-specific knowledge of how registers
are used and how parameters are passed to system calls.
This is the approach that the Subterfugue tool uses. (Yes, the odd
spelling was intentional.) Subterfugue uses Python snippets called
"tricks" to define what it does. Here's one I developed, with no prior
Python programming experience, based on a similar trick in the
examples:
from Trick import Trick
import errno

class MemFail(Trick):
    def usage(self):
        return """
        Makes every brk call fail with ENOMEM.
        """

    def __init__(self, options):
        self.options = options

    # Called before each system call selected by callmask().
    def callbefore(self, pid, call, args):
        assert call == 'brk'
        # Abort the brk call and report ENOMEM to the program.
        return (None, -errno.ENOMEM, None, None)

    # Only brk calls are routed to this trick.
    def callmask(self):
        return { 'brk' : 1 }
Here I set up a mask that says we're only interested in the "brk"
system call. I define a callbefore method that specifies that the brk
call should be aborted with an ENOMEM error. I then set a TRICKPATH
environment variable to point to the directory containing my trick and
run "sf --trick=MemFail ./foo". I get no output, and a 0 exit code.
Again, it seems that I'm tripping up the C runtime startup code, and
I'm not sure how to delay the injected faults until after the program
has successfully started.
Subterfugue can also do system call tracing. For completeness, here's
the output from its "Trace" trick on my foo program:
[7985] fcntl64(0, 1, 0) =
[7985] fcntl64() = 0
[7985] fcntl64(1, 1, 0) =
[7985] fcntl64() = 0
[7985] fcntl64(2, 1, 0) =
[7985] fcntl64() = 0
[7985] uname(-1073744048) =
[7985] uname() = 0
[7985] geteuid() =
[7985] geteuid() = 500
[7985] getuid() =
[7985] getuid() = 500
[7985] getegid() =
[7985] getegid() = 500
[7985] getgid() =
[7985] getgid() = 500
[7985] brk(0) =
[7985] brk() = 134887872
[7985] brk(134887904) =
[7985] brk() = 134887904
[7985] brk(134889472) =
[7985] brk() = 134889472
[7985] fstat64(1, -1073745664, 134878112) =
[7985] fstat64() = 0
[7985] mmap(-1073745696) =
[7985] mmap() = 1073741824
[7985] write(1, 'malloc returned 134888544\012', 26) =
malloc returned 134888544
[7985] write() = 26
[7985] munmap(1073741824, 4096) =
[7985] munmap() = 0
[7985] _exit(26) =
[7985] exited (status = 26)
# all child processes have exited
Subterfugue does not require any messy kernel module installation.
However, I had trouble installing Subterfugue on Red Hat 9, perhaps an
integration problem with a new-ish version of Python. On Red Hat 7.3, I
found that I had to run at least one of the sample tricks as root
because of a strange permission problem accessing /proc/<pid>/mem.
Note that the Subterfugue web page warned a year and a half ago that
the tool hadn't been updated for more than a year. My mail to the
maintainer bounced, unable to penetrate an anti-spam system.
Having learned from the Subterfugue implementation a few details about
how to use ptrace, I tried to write my own hijacking tool using ptrace.
I was able to trace when system calls were entered and exited, but
modifying their behavior required much more knowledge of the CPU
architecture than I was able to glean from a casual reading of the
Subterfugue code.
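For a feel of what that architecture-specific knowledge involves, here's how the Linux system call convention maps onto registers on i386 (the platform used throughout this article) and on x86-64. A common way for a ptrace-based tool to fake a failure is to replace the call number with an invalid one at the syscall-entry stop, so the kernel never runs the real call, and then overwrite the return register with -errno at the matching exit stop. The sketch works on a plain dictionary standing in for the register set that PTRACE_GETREGS reads (field names follow the kernel's user_regs_struct); actually stopping the traced process and reading or writing its registers is left out, and the helper names are my own.

```python
import errno

# Linux system call ABI: the register holding the call number at a
# syscall-entry stop, the argument registers in order, and the register
# that receives the return value.
ABI = {
    "i386": {
        "nr": "orig_eax",
        "args": ("ebx", "ecx", "edx", "esi", "edi", "ebp"),
        "ret": "eax",
    },
    "x86_64": {
        "nr": "orig_rax",
        "args": ("rdi", "rsi", "rdx", "r10", "r8", "r9"),
        "ret": "rax",
    },
}


def decode_entry(arch, regs):
    """Return (syscall_number, args) from a register snapshot."""
    abi = ABI[arch]
    return regs[abi["nr"]], tuple(regs[name] for name in abi["args"])


def skip_call(arch, regs):
    """At the entry stop: use an invalid call number (-1, i.e. all bits
    set) so the kernel does not run the real system call."""
    regs[ABI[arch]["nr"]] = -1


def fake_error(arch, regs, err=errno.ENOMEM):
    """At the matching exit stop: make the call appear to return -err."""
    regs[ABI[arch]["ret"]] = -err


if __name__ == "__main__":
    # An i386 brk(0x80a4000) entry stop; 45 is brk's number, as in the
    # sctrace output above.
    regs = {"orig_eax": 45, "eax": 0, "ebx": 0x80a4000, "ecx": 0,
            "edx": 0, "esi": 0, "edi": 0, "ebp": 0}
    print(decode_entry("i386", regs))
    skip_call("i386", regs)
    fake_error("i386", regs)
    print(regs["orig_eax"], regs["eax"])  # -1 -12
```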
Other hijacking methods
There are a number of other possible approaches that you could use to
hijack system calls. It's possible that you could use that age-old test
tool, the debugger. Gdb, for example, has some decent scripting
capabilities, if I remember correctly. If your application under test
is dynamically linked, you could probably override the C library's
system call wrappers by putting a library with functions of the same
name first in the library search path (on Linux, the LD_PRELOAD
environment variable does this), or else you could relink the
application with the same sort of stub library. Yet another option, if you have the
source code, is to instrument the code and recompile, renaming the
system calls to the name of a function that you supply.
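The stub-library idea has a close cousin in scripting languages: replace a function by name for the duration of a test. The sketch below uses Python's unittest.mock.patch to make os.open fail with EMFILE (too many open files) while a small, hypothetical save function runs. It is an analogy rather than the same mechanism: mock.patch swaps a module attribute inside one Python process, while a preloaded C library is picked up by the dynamic linker when the program starts. Like a stub library, it only affects callers that go through the replaced name.

```python
import errno
import os
from unittest import mock


def save(path, data):
    """Hypothetical code under test: write data to path, reporting errors."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as exc:
        return "cannot open %s: %s" % (path, errno.errorcode[exc.errno])
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return "saved"


if __name__ == "__main__":
    too_many_files = OSError(errno.EMFILE, os.strerror(errno.EMFILE))
    # While patched, every call to os.open raises the error instead.
    with mock.patch("os.open", side_effect=too_many_files):
        print(save("example.txt", b"hello"))  # cannot open example.txt: EMFILE
```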
The Subterfugue web site includes links to a few other projects that
might provide similar system call hijacking capabilities.
The bottom line
System call hijacking tools on Linux are not for the faint of heart.
Installing and configuring them may require some advanced knowledge,
and you also need a thorough grasp of the available system calls so
you know what to target. In this article's example, we had to know
that malloc is merely a library routine that calls the brk and mmap
system calls. Fortunately, a system call
tracer gives us hints about where to target our tests. This example
illustrates that it can be just as important to hijack a library call
as it is to hijack system calls, and the tools I looked at can't do
that. It would have been much easier to directly force a NULL return
from malloc().