Command Injection
Forcing commands to run
Paul Krzyzanowski
March 15, 2024
We looked at buffer overflow and printf format string attacks that enable the modification of memory contents to change the flow of control in the program and, in the case of buffer overflows, inject executable binary code (machine instructions). Other injection attacks enable you to modify inputs used by command processors, such as interpreted languages or databases. We will now look at some of these attacks.
Numeric overflow
Before delving into command injection, let us briefly examine the problem of integer overflow. This doesn’t relate to command injection but can lead to buffer overflow as well as other problems.
In most languages and on all computer architectures, numbers occupy a fixed number of bytes. This limits their range of values.
- An 8-bit integer can hold values from 0 to 255 or, if we’re using signed integers, from -128 to +127.
- A 16-bit integer can hold values from 0 to 65,535 or -32768 to 32767.
- A 32-bit integer can hold values from 0 to a bit over 4 billion or a signed integer from a little under -2 billion to a little over 2 billion.
- 64-bit values, of course, hold much larger values: from 0 to about 18 quintillion for unsigned integers, or from about -9 quintillion to +9 quintillion for signed integers (a quintillion is 10^18).
Some languages offer arbitrary precision libraries, but there is a performance penalty for using them and they are not used for general-purpose arithmetic. Python is an exception: Python 2 automatically promoted integers that outgrew the fixed-size int to an arbitrary-precision long type, and Python 3 made arbitrary precision the native behavior of its single int type.
Sometimes, even if an integer doesn’t overflow, other problems can occur if an attacker can control its value to something the programmer didn’t anticipate. For example, you might try to allocate a buffer that’s terabytes in size.
Integer overflow
What happens if you have a 16-bit unsigned integer and add 1 to 65535? Most languages will not detect an error and simply perform a modulo operation.
For unsigned numbers, 65535+1 = 0
We have a possibly more unfortunate situation with signed numbers. What happens if we take a 16-bit integer and add 1 to 32767?
32767+1 = -32768
What should have become a bigger positive number has now become a big negative number.
And underflow
We can go in the opposite direction. If we take the most negative integer for our bit length and subtract 1, we get a large positive number.
-32768 - 1 = +32767
We used shorts, which are 16 bits long, as an example, but the same thing happens with any data size. A standard int in C is 32 bits, even on a 64-bit system. Adding 1 to the maximum value, which is a bit over 2 billion, gives us a number that’s a bit smaller than negative 2 billion.
Overflows can also occur due to casting from an unsigned to a signed type.
unsigned short n = 65535;
short i = n;
Converting an unsigned 65,535 to a signed integer gives us a value of -1, not 65,535.
A most significant bit of 1 indicates a negative number in two’s complement arithmetic.
What are the problems?
The big problem with underflows and overflows is that you are not likely to detect an overflow or underflow. The program does not die and the processor does not generate an exception.
If you’re computing the length of a buffer and have an integer overflow, this can lead to a buffer overflow since the right amount of space may not have been allocated to the buffer. If you’re computing money, you might end up with bad math: a negative account can become positive, for example, or vice versa.
Here’s an example of an integer overflow that led to a buffer overflow in version 3.3 of OpenSSH:
nresp = packet_get_int();
if (nresp > 0) {
    response = xmalloc(nresp*sizeof(char*));
    for (i = 0; i < nresp; i++)
        response[i] = packet_get_string(NULL);
}
This was on a 32-bit system where the size of a pointer, and hence sizeof(char*), is 4 bytes. If packet_get_int() returned a value of 1,073,741,824, then the product 1073741824*4 = 4294967296 does not fit in 4 bytes, and the value 0 is stored instead.
In binary, 4294967296 = 1 0000 0000 0000 0000 0000 0000 0000 0000
That’s a 1 followed by 32 zeros … but we can only store 32 bits, so the most significant bit has nowhere to go.
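One way to guard an allocation like the one above is to test the multiplication before performing it. The following is a minimal sketch and not the actual OpenSSH fix; the helper names wrapped_size32 and alloc_array are hypothetical. The first function reproduces the 32-bit wraparound described above, and the second refuses any element count whose size in bytes cannot be represented.

```c
#include <stdint.h>
#include <stdlib.h>

/* Reproduces the arithmetic of a 32-bit system: the product is
 * reduced modulo 2^32, so 1073741824 * 4 yields 0. */
uint32_t wrapped_size32(uint32_t count, uint32_t elem_size)
{
    return (uint32_t)((uint64_t)count * elem_size);
}

/* Hypothetical helper: allocate count elements of elem_size bytes.
 * Returns NULL instead of wrapping if the total size does not fit. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;               /* count * elem_size would overflow */
    return malloc(count * elem_size);
}
```

With a check like this, the oversized nresp value is rejected before the loop that fills the array ever runs.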
But we have 64-bit architectures
You’d think this wouldn’t be a problem with 64-bit architectures. 9 quintillion (9,223,372,036,854,775,808) is a huge number. However, remember the problems we can get into by making assumptions. If a user can set a field to some value, like in a network packet, overflows can still occur. Moreover, the default size of an int in C on Linux and macOS is still 32 bits.
Overflows are especially a problem when code specifically deals with smaller data types. Various Internet Protocol fields, for example, regularly use 8- and 16-bit fields.
The Global Positioning System (GPS) stores the week number in 10 bits, which rolls over every 19.7 years. Week 0 started on January 6, 1980. The count rolled over on August 21, 1999 and again on April 6, 2019. Most software was updated for this rollover, but we can imagine a device that has not been updated to know of a new reference date for the week count and will compute a date that’s 19.7 years in the past.
Finally, there are lots of legacy data structures or programmers who might have been concerned about wasting storage where these smaller integer sizes are still present.
Python, Java, Go, Rust
Python 3 implements integers (the int type) with arbitrary precision, meaning that they can grow to accommodate any number as long as your machine’s memory can handle it. Python 2 had a fixed-size int type, but it automatically promoted results that overflowed to an arbitrary-precision long. This design choice eliminates the traditional issues associated with integer overflow in fixed-size integer types found in other programming languages.
While this feature of Python makes it very robust for mathematical computations that involve large numbers, it also means that Python’s arbitrary-precision integers may consume more memory than the fixed-size integers of other languages, which can be a consideration for performance-sensitive applications.
In Java, integer overflow can be an issue, similar to other programming languages that use fixed-size integer representations. Java provides several primitive integer types (byte, short, int, long) with fixed sizes: 8, 16, 32, and 64 bits respectively. When an operation causes the value to exceed the range of these types, overflow occurs, and the number wraps around to the minimum value of the type and continues from there, potentially leading to unexpected or incorrect results if not properly handled.
For example, for a 32-bit int, the maximum value is 2,147,483,647 (Integer.MAX_VALUE). If you add 1 to this value, it will overflow and wrap around to -2,147,483,648 (Integer.MIN_VALUE), which is likely not the intended result.
Integer overflow is also an issue in Go since it also uses fixed-size integer types. Go provides several integer types (int8, int16, int32, int64 and their unsigned counterparts uint8, uint16, uint32, uint64, along with architecture-dependent types like int, uint, and uintptr) that have fixed sizes. When the value assigned to such a type exceeds its capacity, it wraps around to the beginning of its range, which can lead to unexpected behavior if not properly managed.
In Rust, integer overflow behavior differs based on the build profile: in debug mode, Rust checks for integer overflow and causes your program to panic (terminate execution with an error) if overflow occurs. In release mode, Rust does not check for overflow, and if overflow occurs, it wraps around to the minimum or maximum value of the type.
SMB Ghost: 2020
An integer overflow vulnerability led to a major exploit in 2020. It allowed an attacker access to a Windows system by connecting to it over the SMB protocol. SMB is the Server Message Block protocol, Microsoft’s remote file access protocol.
March 2020 was a particularly bad time for disclosing patches. Microsoft announced that they fixed 116 vulnerabilities that month, 25 of which were critical and could be used by an attacker to execute remote code and perform local privilege elevation.
This particular bug affected the data compression mechanism within the SMB message structure in Windows 10 implementations of SMB. Attackers could create a packet that would trigger an integer overflow or underflow that would allow them to write arbitrary data anywhere in the kernel.
The detailed steps of an attack are long, so we will just go over the basic weakness that was uncovered. Since attackers can create the message, they can control data within it. Two particular fields end up being useful.
- Original_Compressed_Segment_Size tells the system the size of the decompressed data.
- Offset defines the size of an optional extra chunk of data that is not compressed.
The system allocates a buffer that is the size of the original size plus the offset. A simple attack that caused the program to crash simply set the offset to 0xffffffff, which triggered an integer overflow.
In a more sophisticated attack, attackers used a huge value for the Original_Compressed_Segment_Size and a legitimate value for the offset. That also triggers an overflow, causing the system to allocate less memory than needed.
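The size computation can be illustrated with a small sketch. This is not Microsoft's code; the function names are hypothetical and the two parameters simply stand in for the two attacker-controlled fields. The unchecked version wraps modulo 2^32, while the checked version reports the overflow so that the request can be rejected before any buffer is allocated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unchecked 32-bit sum, as in the vulnerable pattern: wraps modulo 2^32. */
uint32_t unchecked_total(uint32_t original_size, uint32_t offset)
{
    return original_size + offset;
}

/* Checked sum: returns false if the true total does not fit in 32 bits. */
bool checked_total(uint32_t original_size, uint32_t offset, uint32_t *total)
{
    if (offset > UINT32_MAX - original_size)
        return false;              /* the addition would overflow */
    *total = original_size + offset;
    return true;
}
```

For example, an original size of 0x100 with an offset of 0xffffffff gives an unchecked total of only 0xff bytes, far less than either field claims.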
memcpy
Later in the code, a memcpy takes place. The attackers realized that all three parts were under their control:
- The target of the copy, Alloc->UserBuffer, comes from the allocation header, but the allocation header can be overwritten when the user buffer overflows.
- The source is the header data, which comes from the attacker.
- The length is the offset and is also controlled by the attacker.
Since an attacker can set the destination and the contents, they could write any data anywhere in kernel memory and were able to then use other attacks for local privilege escalation by connecting to a local machine. Other attackers were able to trigger remote code execution.
Microsoft Exchange Year 2022 Bug
Here’s another, stranger, example of overflow. It’s the Microsoft Exchange Year 2022 bug.
Starting January 1, 2022, on-premises Microsoft Exchange servers were not able to deliver email because of a bug in their anti-spam engine.
The bug occurred because Microsoft was using a signed 32-bit integer to store the value of a date. This gave it a maximum value of a little over 2 billion (2,147,483,647). Unlike systems like Linux that count seconds from an epoch (Jan 1, 1970 0:00 UTC), they represented the date as a year-month-date encoded as a decimal value. Dates in 2022 have a value (2,201,010,001 or larger) that’s larger than the maximum value that fits in 32 bits, causing it to overflow to a negative value and the scanning engine to fail.
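The arithmetic can be checked with a short sketch. It assumes that the decimal encoding uses two digits each for the year, month, day, hour, and minute, which is consistent with the value 2,201,010,001 quoted above but is otherwise an assumption; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: YYMMDDHHMM as one decimal number.
 * Computed in 64 bits so the true value is visible. */
int64_t encode_date(int yy, int mm, int dd, int hh, int min)
{
    return (int64_t)yy * 100000000 + (int64_t)mm * 1000000
         + (int64_t)dd * 10000 + (int64_t)hh * 100 + min;
}

/* True if the value can be stored in a signed 32-bit integer. */
bool fits_in_int32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

Under this assumption, the last minute of 2021 encodes to 2,112,312,359, which still fits, while the first minutes of 2022 do not.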
Type confusion
Vulnerabilities can arise if an object is created as one type but later used as a different type. Accessing an unsigned integer as a signed integer is a simple example, but type confusion can also involve assumptions about the sizes of arrays, the members of unions, or the data types of pointers. The bug is most common in C and C++ but can also be found in languages like PHP and Perl.
The bug may not appear to be exploitable, but sometimes is. For example, on May 24, 2024, Google rolled out a fix to address the fourth zero-day exploit for May of 2024 (and eighth of the year) in its Chrome browser. This fixed a type confusion vulnerability that was exploited in the wild and allowed a remote attacker to execute arbitrary code via a specially-crafted HTML page.
See CWE-843: Access of Resource Using Incompatible Type (‘Type Confusion’).
SQL Injection (SQLi)
It is common practice to take user input and make it part of a database query. This is particularly popular with web services, which are often front ends for databases. For example, we might ask the user for a login name and password and then create a string that contains an SQL query (SQL is the Structured Query Language, the dominant way of interacting with relational databases):
sprintf(buf,
    "SELECT * from logininfo WHERE username = '%s' AND password = '%s';",
    uname, passwd);
Suppose that the user entered this for a password:
' OR 1=1 ; --
We end up creating this query string [1]:
SELECT * from logininfo WHERE username = 'paul' AND password = '' OR 1=1 ; -- ';
The “--” after “1=1” is an SQL comment, telling it to ignore everything else on the line. In SQL, AND has higher precedence than OR, so the query is evaluated as (username = 'paul' AND password = '') OR 1=1. The first part checks for an empty password (which the user probably does not have), but the condition 1=1 is always true. In essence, the user’s “password” turned the query into one that ignores the user’s password and unconditionally validates the user.
Statements such as this can be even more destructive as the user can use semicolons to add multiple statements and perform operations such as dropping (deleting) tables or changing values in the database.
This attack can take place because the programmer blindly allowed user input to become part of the SQL command without validating that the user data does not change the quoting or tokenization of the query.
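To make the mechanics concrete, the sketch below builds the query string the same way as the earlier fragment (using snprintf rather than sprintf) so that the effect of the malicious password can be inspected. It is shown only to illustrate the vulnerable pattern; build_login_query and contains_quote are hypothetical names, and the quote check is a crude detector of input that would change the quoting, not a complete defense.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Vulnerable pattern: user input is pasted directly into the SQL text. */
int build_login_query(char *buf, size_t size,
                      const char *uname, const char *passwd)
{
    return snprintf(buf, size,
        "SELECT * from logininfo WHERE username = '%s' AND password = '%s';",
        uname, passwd);
}

/* Crude check: a single quote in the input ends the string literal early. */
bool contains_quote(const char *input)
{
    return strchr(input, '\'') != NULL;
}
```

Calling build_login_query with the user name paul and the password shown earlier produces exactly the query string listed above.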
A programmer can avoid the problem by sanitizing the input. Input sanitization means validating the input to ensure that there is nothing dangerous in it before it is used. This may involve:
- Disallowing certain characters or strings in the input. For example, reject any strings that contain quotes.
- Allowing only certain characters or strings. For instance, we may accept only alphanumeric characters and a limited set of symbols from the user.
- Escaping any characters that have special meaning. SQL, Linux shells, and many other programs often support the use of a backslash (\) that tells the interpreter not to treat the next character as a special character. Alternatively, spaces or special characters may be quoted.
Unfortunately, this can be difficult. SQL contains too many words and symbols that may be legitimate in other contexts (such as passwords) and escaping special characters, such as prepending backslashes or escaping single quotes with two quotes can be error prone as these escapes may differ for different database vendors.
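Of the three approaches, an allow list is usually the easiest to reason about. Here is a minimal sketch for validating a user name; the policy (1 to 32 characters drawn from ASCII letters, digits, underscore, period, and hyphen) and the function name are assumptions chosen for illustration, and a policy this strict would not be suitable for passwords.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy: 1 to 32 characters, each an ASCII letter,
 * a digit, '_', '.', or '-'. Everything else is rejected. */
bool is_valid_username(const char *s)
{
    size_t len = 0;

    if (s == NULL)
        return false;
    for (; s[len] != '\0'; len++) {
        unsigned char c = (unsigned char)s[len];
        if (len >= 32)
            return false;          /* too long */
        if (!((c < 128 && isalnum(c)) || c == '_' || c == '.' || c == '-'))
            return false;          /* character not on the allow list */
    }
    return len > 0;
}
```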
The safest defense in SQL is to use parameterized queries, where user input never becomes part of the query but is listed as parameters. For example, we can write the previous query as:
uname = getResourceString("username");
passwd = getResourceString("password");
query = "SELECT * FROM logininfo WHERE username = @0 AND password = @1";
db.Execute(query, uname, passwd);
A related safe alternative is to use stored procedures. They have the same property that the query statement is not generated from user input and parameters are clearly identified.
While SQL injection is the most common code injection attack, databases are not the only target. Creating executable statements built with user input is common in interpreted languages, such as Shell, Perl, PHP, and Python. Before making user input part of any invocable command, the programmer must be fully aware of parsing rules for that command interpreter.
An example of a recent vulnerability was announced on June 27, 2024 in Fortra FileCatalyst Workflow, a file transfer application. In this attack, a user-supplied jobID is used in creating the WHERE clause of an SQL query. An anonymous remote attacker can send URLs with a jobID parameter of their choice.
Shell attacks
The various POSIX [2] shells (sh, csh, ksh, bash, tcsh, zsh) are commonly used as scripting tools for software installation, start-up scripts, and tying together workflows that involve processing data through multiple commands. A few aspects of how many of the shells work and the underlying program execution environment can create attack vectors.
system() and popen() functions
Both system and popen functions are part of the Standard C Library and are common functions that C programmers use to execute shell commands. The system function runs a shell command while the popen function also runs the shell command but allows the programmer to capture its output and/or send it input via the returned FILE pointer.
Here, we again have the danger of turning improperly validated data into a command. For example, a program might use a function such as this to send an email alert:
char command[BUFSIZE];
snprintf(command, BUFSIZE, "/usr/bin/mail -s \"system alert\" %s", user);
FILE *fp = popen(command, "w");
In this example, the programmer uses snprintf to create the complete command with the desired user name into a buffer. This incurs the possibility of an injection attack if the user name is not carefully validated. If the attacker had the option to set the user name, she could enter a string such as:
nobody; rm -fr /home/*
which will result in popen running the following command:
sh -c "/usr/bin/mail -s \"system alert\" nobody; rm -fr /home/*"
which is a sequence of commands, the latter of which deletes all user directories.
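One common way to avoid this class of problem is to bypass the shell entirely and pass each argument to the program as a separate string, so that a semicolon in the user name is just another character in one argument. The sketch below uses fork, execv, and waitpid; the function name is hypothetical, and unlike popen it does not set up a pipe to the child, which a real mail alert would also need.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program argv[0] with the given arguments, without a shell.
 * Returns the child's exit status, or -1 on failure. */
int run_without_shell(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);      /* arguments are passed verbatim */
        _exit(127);                /* reached only if execv failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argument vector such as { "/usr/bin/mail", "-s", "system alert", user, NULL }, the string nobody; rm -fr /home/* would reach the mail program as a single (invalid) recipient rather than being interpreted as two commands. The user name should still be validated, since the invoked program may give its own meaning to unusual arguments.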
Command injection example
A particularly insidious attack is one that targets security tools: the very software that is there to try to detect and prevent attacks. An example of a command injection exploit on such a service is the 2023 maximum-severity vulnerability in Fortinet’s security information and event management (SIEM) solution, which was patched in February of 2024 (see the bleepingcomputer article).
In this attack, a Python program formats a call to os.system that contains a user-controlled mount_point value. An attacker can define the mount point to contain a semicolon, which serves as a command separator, followed by whatever command they want executed. The Fortinet client will execute the command with root privileges. Remarkably, this is a similar attack to a related command injection vulnerability that was discovered six months earlier.
Other environment variables
The shell PATH environment variable controls how the shell searches for commands. For instance, suppose
PATH=/home/paul/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games
and the user runs the ls command. The shell will search through the PATH sequentially to find an executable file named ls:
/home/paul/bin/ls
/usr/local/bin/ls
/usr/sbin/ls
/usr/bin/ls
/sbin/ls
/bin/ls
/usr/local/games/ls
If an attacker can either change a user’s PATH environment variable or if one of the paths is publicly writable and appears before the “safe” system directories, then he can add a booby-trapped command in one of those directories. For example, if the user runs the ls command, the shell may pick up a booby-trapped version in the /usr/local/bin directory. Even if a user has trusted locations, such as /bin and /usr/bin foremost in the PATH, an intruder may place a misspelled version of a common command into another directory in the path. The safest remedy is to make sure there are no untrusted directories in PATH.
Some shells allow a user to set an ENV or BASH_ENV variable that contains the name of a file that will be executed as a script whenever a non-interactive shell is started (when a shell script is run, for example). If an attacker can change this variable then arbitrary commands may be added to the start of every shell script.
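A program that is about to run a shell or a shell script on behalf of a user can defend itself by resetting these variables first. The following is a minimal sketch; the function name and the choice of /usr/bin:/bin as the trusted search path are assumptions that would need to be adapted to the system in question.

```c
#define _POSIX_C_SOURCE 200809L    /* for setenv and unsetenv */
#include <stdlib.h>

/* Replaces PATH with a fixed list of trusted directories and removes
 * the variables that name start-up scripts for non-interactive shells.
 * Returns 0 on success, -1 on failure. */
int scrub_environment(void)
{
    if (setenv("PATH", "/usr/bin:/bin", 1) != 0)
        return -1;
    if (unsetenv("ENV") != 0 || unsetenv("BASH_ENV") != 0)
        return -1;
    return 0;
}
```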
Shared library environment variables
In the distant past, programs used to be fully linked, meaning that all the code needed to run the program, aside from interactions with the operating system, was part of the executable program. Since so many programs use common libraries, such as the Standard C Library, they are not compiled into the code of an executable but instead are dynamically loaded when needed.
Similar to PATH, LD_LIBRARY_PATH is an environment variable used by the operating system’s program loader that contains a colon-separated list of directories where libraries should be searched. If an attacker can change a user’s LD_LIBRARY_PATH, common library functions can be overridden with custom versions. The LD_PRELOAD environment variable allows one to explicitly specify shared libraries that contain functions that override standard library functions.
LD_LIBRARY_PATH and LD_PRELOAD will not give an attacker root access but they can be used to change the behavior of a program or to log library interactions. For example, by overriding standard functions, one may change how a program generates encryption keys, uses random numbers, sets delays in games, reads input, and writes output.
As an example, let’s suppose we have a program that prints random numbers:
#include <time.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    int i;

    srand(time(NULL));
    for (i=0; i < 10; i++)
        printf("%d\n", rand()%100);
    return 0;
}
We can compile this via:
$ gcc -o random random.c
When run, we may get output containing 10 random numbers:
$ ./random
9
57
13
1
83
86
45
63
51
5
Let us write a replacement rand function that always returns the same value. We’ll put it in a file called rand.c:
int rand(void) {
    return 42;
}
We compile it into a shared library named newrandom.so:
$ gcc -shared -fPIC rand.c -o newrandom.so
Now we set the LD_PRELOAD environment variable to this library and run the program:
$ export LD_PRELOAD=$PWD/newrandom.so
$ ./random
42
42
42
42
42
42
42
42
42
42
Note that our program now behaves differently, and we did not have to recompile it or run it differently.
Input sanitization
The important lesson in writing code that uses any user input in forming commands is that of input sanitization. Input must be carefully validated to make sure it conforms to the requirements of the application that uses it and does not try to execute additional commands, escape to a shell, set malicious environment variables, or specify out-of-bounds directories or devices.
File descriptors
POSIX systems have a convention that programs expect to receive three open file descriptors when they start up:
- file descriptor 0: standard input
- file descriptor 1: standard output
- file descriptor 2: standard error
Functions such as printf, scanf, puts, getc and others expect these file descriptors to be available for input and output. When a program opens a new file, the operating system searches through the file descriptor table and allocates the first available unused file descriptor. Typically this will be file descriptor 3. However, if any of the three standard file descriptors are closed, the operating system will use one of those as an available, unused file descriptor.
The vulnerability lies in the fact that we may have a program running with elevated privileges (e.g., setuid root) that modifies a file that is not accessible to regular users. If that program also happens to write to the user via, say, printf, there is an opportunity to corrupt that file. The attacker simply needs to close the standard output (file descriptor 1) and run the program. When the program opens its secret file, it will be given file descriptor 1 and will be able to do its read and write operations on the file. However, whenever the program prints a message to the user, the output will not be seen by the user as it will be directed to what printf assumes is the standard output: file descriptor 1. Printf output will be written onto the secret file, thereby corrupting it.
The shell command (bash, sh, or ksh) for closing the standard output file is an obscure-looking >&-. For example:
./testfile >&-
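A privileged program can protect itself by making sure that descriptors 0, 1, and 2 are open before it opens any other file. A minimal sketch, with a hypothetical function name, is to keep opening /dev/null until the descriptor returned is greater than 2; any of the standard descriptors that was closed ends up attached to /dev/null instead of to a sensitive file.

```c
#include <fcntl.h>
#include <unistd.h>

/* Makes sure descriptors 0, 1, and 2 are open by attaching any closed
 * one to /dev/null. Returns 0 on success, -1 on failure. */
int ensure_standard_fds(void)
{
    for (;;) {
        int fd = open("/dev/null", O_RDWR);
        if (fd < 0)
            return -1;
        if (fd > 2) {              /* 0, 1, and 2 were already in use */
            close(fd);
            return 0;
        }
        /* fd is 0, 1, or 2: leave it open and check again */
    }
}
```

This would be called at the very start of main, before any other file is opened.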
Comprehension Errors
The overwhelming majority of security problems are caused by bugs or misconfigurations. Both often stem from comprehension errors. These are mistakes created when someone – usually the programmer or administrator – does not understand the details and every nuance of what they are doing. Some examples include:
- Not knowing all possible special characters that need escaping in SQL commands.
- Not realizing that the standard input, output, or error file descriptors may be closed.
- Not understanding how access control lists work or how to configure mandatory access control mechanisms such as type enforcement correctly.
If we consider the Windows CreateProcess function, we see it is defined as:
BOOL WINAPI CreateProcess(
_In_opt_ LPCTSTR lpApplicationName,
_Inout_opt_ LPTSTR lpCommandLine,
_In_opt_ LPSECURITY_ATTRIBUTES lpProcessAttributes,
_In_opt_ LPSECURITY_ATTRIBUTES lpThreadAttributes,
_In_ BOOL bInheritHandles,
_In_ DWORD dwCreationFlags,
_In_opt_ LPVOID lpEnvironment,
_In_opt_ LPCTSTR lpCurrentDirectory,
_In_ LPSTARTUPINFO lpStartupInfo,
_Out_ LPPROCESS_INFORMATION lpProcessInformation);
We have to wonder whether a programmer who does not use this frequently will take the time to understand the ramifications of correctly setting process and thread security attributes, the current directory, environment, inheritance handles, and so on. There’s a good chance that the programmer will just look up an example on places such as github.com or stackoverflow.com and copy something that seems to work, unaware that there may be obscure side effects that compromise security.
As we will see in the following sections, comprehension errors also apply to the proper understanding of things as basic as various ways to express characters.
Path traversal vulnerabilities
Some applications, notably web servers, accept hierarchical filenames from a user but need to ensure that they restrict access only to files within a specific point in the directory tree. For example, a web server may need to ensure that no page requests go outside of /home/httpd/html.
An attacker may try to gain access by using paths that include .. (dot-dot), which is a link to the parent directory. For example, an attacker may try to download a password file by requesting
http://poopybrain.com/../../../etc/passwd
The hope is that the programmer did not implement parsing correctly and might try simply suffixing the user-requested path to a base directory:
"/home/httpd/html/" + "../../../etc/passwd"
to form
/home/httpd/html/../../../etc/passwd
which will retrieve the password file, /etc/passwd.
A programmer may anticipate this and check for dot-dot but has to realize that dot-dot directories can be anywhere in the path. This is also a valid pathname but one that should be rejected for trying to escape to the parent:
http://poopybrain.com/419/notes/../../416/../../../../etc/passwd
Moreover, the programmer cannot just search for .. because that can be a valid part of a filename. All three of these should be accepted:
http://poopybrain.com/419/notes/some..other..stuff/
http://poopybrain.com/419/notes/whatever../
http://poopybrain.com/419/notes/..more.stuff/
Also, extra slashes are perfectly fine in a filename, so this is acceptable:
http://poopybrain.com/419////notes///////..more.stuff/
The programmer should also track where the request is in the hierarchy. If dot-dot doesn’t escape above the base directory, it should most likely be accepted:
http://poopybrain.com/419/notes/../exams/
These are not insurmountable problems but they illustrate that a quick-and-dirty attempt at filename processing may be riddled with bugs.
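The rules above can be captured by tracking the depth of the request relative to the base directory. The following sketch, with a hypothetical function name, takes the path portion of the request (for example, /419/notes/../exams/) and reports whether it ever climbs above its starting point. It is purely lexical: it assumes the path has already been fully decoded, and it does not account for symbolic links inside the tree.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the path never climbs above its starting directory. */
bool stays_within_root(const char *path)
{
    int depth = 0;
    const char *p = path;

    while (*p != '\0') {
        while (*p == '/')                 /* skip repeated slashes */
            p++;
        size_t len = strcspn(p, "/");     /* length of this component */
        if (len == 0)
            break;                        /* only trailing slashes remained */
        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (--depth < 0)
                return false;             /* escaped above the base */
        } else if (!(len == 1 && p[0] == '.')) {
            depth++;                      /* ordinary name: one level down */
        }
        p += len;
    }
    return true;
}
```

Because each component is compared in full, names such as some..other..stuff or ..more.stuff are treated as ordinary names rather than as references to the parent directory.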
Path traversal vulnerabilities have been used to obtain unauthorized content that resides outside the allowable directory. In some cases, attackers have been able to use them as a stepping stone to remote code execution. For example, a 2024 analysis of open source AI software discovered a path traversal vulnerability because it included a user-configurable user_name parameter as part of the path. In this case, not only could an attacker change the parameter to download sensitive information but could also upload files to the /etc/cron.d directory, which would later get executed by the system.
The October 2024 list of vulnerabilities from Protect AI includes several additional pathname-related vulnerabilities.
Unicode parsing
If we continue on the example of parsing pathnames in a web server, let us consider a bug in early releases of Microsoft’s IIS (Internet Information Services, their web server). IIS had proper pathname checking to ensure that attempts to get to a parent are blocked:
http://www.poopybrain.com/scripts/../../winnt/system32/cmd.exe
Once the pathname was validated, it was passed to a decode function that decoded any embedded Unicode characters and then processed the request.
The problem with this technique was that non-international characters (traditional ASCII) could also be written as Unicode characters. A “/” could also be written in a URL as its hexadecimal value, %2f (decimal 47). It could also be represented as the two-byte Unicode sequence %c0%af.
The reason for this stems from the way Unicode was designed to support compatibility with one-byte ASCII characters. This encoding is called UTF-8. If the first bit of a character is a 0, then we have a one-byte ASCII character (in the range 0..127). However, if the first bit is a 1, we have a multi-byte character. The number of leading 1s determines the number of bytes that the character takes up. If a character starts with 110, we have a two-byte Unicode character.
With a two-byte character, the UTF-8 standard defines a bit pattern of
110a bcde 10fg hijk
The values a-k above represent 11 bits that give us a value in the range 0..2047.
The “/” character, 0x2f, is 47 in decimal and 0010 1111 in binary. The value represents offset 47 into the character table (called a codepoint in Unicode parlance).
Hence we can represent the “/” as 0x2f or as the two byte Unicode sequence:
1100 0000 1010 1111
which is the hexadecimal sequence %c0%af. Technically, this is disallowed: the standard states that codepoints less than 128 must be represented as one byte. However, the two-byte sequence was accepted by many Unicode parsers. An equally disallowed three-byte encoding of the same character can be constructed in the same way.
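The bit manipulation can be sketched as a small decoder for two-byte sequences. The function name and the strict flag are hypothetical; with strict set to false the decoder behaves like the permissive parsers described above, and with strict set to true it rejects any two-byte sequence whose value would have fit in one byte.

```c
#include <stdbool.h>

/* Decodes a two-byte UTF-8 sequence of the form 110abcde 10fghijk.
 * Returns the code point, or -1 if the bytes do not match that pattern.
 * With strict set, overlong encodings (values below 128) are rejected. */
int decode_utf8_two_byte(unsigned char b0, unsigned char b1, bool strict)
{
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return -1;
    int code_point = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
    if (strict && code_point < 0x80)
        return -1;                 /* must be encoded as a single byte */
    return code_point;
}
```

The permissive decoder maps the bytes 0xc0 0xaf to 0x2f, the slash, while the strict decoder refuses them.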
Microsoft’s bug was that their pathname check did not treat %c0%af as being equivalent to a / because it should not have been used to represent the character. However, the Unicode parser was happy to translate it and attackers were able to use this to access any file on a server running IIS. This bug also gave attackers the ability to invoke cmd.exe, the command interpreter, and execute any commands on the server.
After Microsoft fixed the multi-byte Unicode bug, another problem came up. The parsing of escaped characters was recursive, so if the resultant string looked like a Unicode hexadecimal sequence, it would be re-parsed.
As an example of this, let’s consider the backslash (\), which Microsoft treats as equivalent to a slash (/) in URLs since their native pathname separator is a backslash [3].
The backslash can be written in a URL in hexadecimal format as %5c.
The “%” character can be expressed as %25.
The “5” character can be expressed as %35.
The “c” character can be expressed as %63.
Hence, if the URL parser sees the string %%35c, it would expand the %35 to the character “5”, which would result in %5c, which would then be converted to a \.
If the parser sees %25%35%63, it would expand each of the %nn components to get the string %5c, which would then be converted to a \.
As a final example, if the parser comes across %255c, it will expand %25 to % to get the string %5c, which would then be converted to a \.
It is not trivial to know what a name relates to but it is clear that all conversions have to be done before the validity of the pathname is checked. As for checking the validity of the pathname in an application, it is error-prone. The operating system itself parses a pathname a component at a time, traversing the directory tree and checking access rights as it goes along. The application is trying to recreate a similar action without actually traversing the file system but rather by just parsing the name and mapping it to a subtree of the file system namespace.
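The effect of repeated decoding can be seen with a decoder that performs exactly one pass over its input. This is an illustrative sketch with hypothetical function names, not the IIS parser. Running it once on any of the three strings above yields %5c; only a second pass turns that into a backslash, which is why a validity check placed between the two passes is ineffective.

```c
#include <stddef.h>

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Performs ONE pass of %nn decoding.
 * out must have room for strlen(in) + 1 bytes. */
void percent_decode_once(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '%' && hex_value(in[1]) >= 0 && hex_value(in[2]) >= 0) {
            *out++ = (char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else {
            *out++ = *in++;        /* ordinary character, or a stray % */
        }
    }
    *out = '\0';
}
```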
TOCTTOU attacks
TOCTTOU stands for Time of Check to Time of Use. If we have code of the form:
if I am allowed to do something
then do it
we may be exposing ourselves to a race condition. There is a window of time between the test and the action. If an attacker can change the condition after the check then the action may take place even if the check should have failed.
One example of this is the print spooling program, lpr. It runs as a setuid program with root privileges so that it can copy a file from a user’s directory into a privileged spool directory that serves as a queue of files for printing. Because it runs as root, it can open any file, regardless of permissions. To keep the user honest, it will check access permissions on the file that the user wants to print and then, only if the user has legitimate read access to the file, it will copy it over to the spool directory for printing. An attacker can create a link to a readable file and then run lpr in the background. At the same time, he can change the link to point to a file for which he does not have read access. If the timing is just perfect, the lpr program will check access rights before the file is re-linked but will then copy the file for which the user has no read access.
Another example of the TOCTTOU race condition is the set of temporary filename creation functions (tmpnam, tempnam, mktemp, GetTempFileName, etc.). These functions create a unique filename when they are called but there is no guarantee that an attacker doesn’t create a file with the same name before that filename is used. If the attacker creates and opens a file with the same name, she will have access to that file for as long as it is open, even if the user’s program changes access permissions for the file later on.
The best defense for the temporary file race condition is to use the mkstemp function, which creates a file based on a template name and opens it as well, avoiding the race condition between checking the uniqueness of the name and opening the file.
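A minimal sketch of this approach follows. The function name and the /tmp/example prefix are hypothetical; the six trailing X characters are required by mkstemp, which replaces them to form a unique name and creates and opens the file in a single step.

```c
#define _POSIX_C_SOURCE 200809L    /* for mkstemp */
#include <stdio.h>
#include <stdlib.h>

/* Creates and opens a unique temporary file in one step.
 * path receives the generated name; size is the capacity of path.
 * Returns an open file descriptor, or -1 on failure. */
int open_temp_file(char *path, size_t size)
{
    int n = snprintf(path, size, "/tmp/exampleXXXXXX");
    if (n < 0 || (size_t)n >= size)
        return -1;                 /* buffer too small for the template */
    return mkstemp(path);          /* replaces XXXXXX and opens the file */
}
```

The caller works with the returned descriptor rather than reopening the file by name, and removes the file when it is no longer needed.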
References
- Christopher Hacking, Recognizing and Preventing Time-of-Check to Time-of-Use Vulnerabilities, iSEC Partners whitepaper, March 2015.
- Jinpeng Wei and Calton Pu, TOCTTOU Vulnerabilities in UNIX-Style File Systems: An Anatomical Study, 4th USENIX Conference on File and Storage Technologies (FAST’05), San Francisco, CA, December 2005.
- Here’s a walkthrough of a real command injection attack in 2024 on a Palo Alto firewall.
- Some more info about the above
[1] Note that sprintf is vulnerable to buffer overflow. We should use snprintf, which allows one to specify the maximum size of the buffer.
[2] Unix, Linux, macOS, FreeBSD, NetBSD, OpenBSD, Android, etc.
[3] The official Unicode names for the slash and backslash characters are solidus and reverse solidus, respectively.