Command injection attacks exploit how programs interpret user input as executable commands rather than as data. They differ from memory corruption: the attacker alters what command runs instead of what code runs. These attacks affect databases, shells, file systems, and development environments, and remain among the most persistent classes of software vulnerabilities.
SQL Injection
SQL injection manipulates database queries by embedding SQL syntax in user input. It can expose, alter, or delete data and even execute administrative commands.
Primary Defenses
The core defense is to keep query structure fixed and pass data separately through:
- Parameterized queries: The database receives the command and the data separately and never interprets the data as SQL syntax
- Stored procedures: Predefined SQL functions with a fixed structure that accept only data parameters (safe only when the procedure does not itself assemble SQL from those parameters)
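As a minimal sketch of a parameterized query, the example below uses Python's built-in sqlite3 module. The users table and the find_user helper are hypothetical names chosen for illustration; other database drivers follow the same idea but may use a different placeholder syntax.

```python
import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user by name in a hypothetical `users` table."""
    # The "?" placeholder keeps the query structure fixed. The driver passes
    # `username` as data, so quotes or SQL keywords in it are never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

# Demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))        # [(1, 'alice')]
print(find_user(conn, "' OR '1'='1"))  # []: the payload is compared as a literal name
```

Had the query been built by string concatenation, the second input would have changed the WHERE clause and returned every row.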
Secondary Defense
Input validation and sanitization add a second layer but cannot be relied on alone. Use allowlists that specify what characters are permitted, not denylists that try to block dangerous patterns. Sanitization through escaping special characters (e.g., using database-specific escaping functions) can help but is error-prone and should never replace parameterized queries.
NoSQL Injection
NoSQL databases avoid SQL syntax but still parse user input that can include operators or code. Injection can happen when JSON or query operators are accepted unchecked.
Defense principles:
- Validate input types (expect strings, reject objects)
- Restrict operator usage (block $where, $regex, and other dangerous operators)
- Avoid execution of user-supplied code such as JavaScript in $where clauses
- Use schema enforcement and allowlists to reduce exposure
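The first two principles can be sketched without any database driver. The example below assumes a JSON request body with a hypothetical username field; require_string, contains_operator, and build_user_filter are illustrative helpers, not part of any library.

```python
import json

def require_string(value, field):
    """Return `value` unchanged if it is a plain string, else raise TypeError."""
    # An attacker can submit {"$ne": null} where a string is expected;
    # rejecting non-strings stops the object from reaching the query.
    if not isinstance(value, str):
        raise TypeError(f"{field} must be a string, got {type(value).__name__}")
    return value

def contains_operator(value):
    """Return True if any dictionary key nested in `value` starts with '$'."""
    if isinstance(value, dict):
        return any(
            str(key).startswith("$") or contains_operator(item)
            for key, item in value.items()
        )
    if isinstance(value, list):
        return any(contains_operator(item) for item in value)
    return False

def build_user_filter(raw_body):
    """Build a query filter from a JSON request body (hypothetical schema)."""
    body = json.loads(raw_body)
    if not isinstance(body, dict):
        raise TypeError("request body must be a JSON object")
    return {"username": require_string(body.get("username"), "username")}
```

With these checks, a body such as {"username": {"$ne": null}} is rejected with a TypeError instead of becoming a filter that matches every user.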
Shell Command Injection
Shell injection exploits programs that pass user input to command interpreters like sh, bash, or cmd.exe. Shell metacharacters (;, |, $(), backticks) let attackers chain additional commands or substitute the output of one command into another.
Safest defense: Avoid shells entirely and use system APIs that execute programs directly, passing arguments as separate parameters (e.g., execve() with argument array, Python's subprocess with shell=False).
When shell use is unavoidable, combine allowlist validation with proper sanitization (e.g., shlex.quote() in Python to escape shell metacharacters), and run the process with minimal privileges.
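A minimal sketch of both approaches in Python follows. The child process is the Python interpreter itself, used here only so the example runs anywhere; run_without_shell and build_shell_command are hypothetical helper names.

```python
import shlex
import subprocess
import sys

def run_without_shell(user_arg: str) -> str:
    """Pass `user_arg` to a child process as a single argv entry (no shell)."""
    # An argument list with the default shell=False means no interpreter ever
    # parses user_arg, so ";", "|", and "$()" are just ordinary characters.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")

def build_shell_command(user_arg: str) -> str:
    """Fallback for when a shell is unavoidable: quote the argument first."""
    # shlex.quote targets POSIX shells; it is not suitable for cmd.exe.
    return "echo " + shlex.quote(user_arg)
```

For the input "x; echo INJECTED", run_without_shell returns the string unchanged, and build_shell_command produces a command line in which the whole input is one single-quoted word.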
Environment Variable Attacks
Programs inherit environment variables that control their behavior. These can be exploited through two distinct attack vectors.
Command Resolution Attacks (PATH, ENV, BASH_ENV)
Attack mechanism: Control which executable runs when a program or script invokes a command by name.
PATH manipulation redirects command lookups by placing attacker-controlled directories early in the search path. When a script runs ls or wget, the shell searches PATH directories in order. An attacker who can modify PATH or write to an early PATH directory can substitute malicious executables.
ENV and BASH_ENV specify initialization scripts that run when shells start. If controlled by an attacker, these variables cause arbitrary commands to execute at the beginning of every shell script, affecting system scripts and cron jobs.
Defenses:
- Scripts with elevated privileges should explicitly set PATH to trusted directories
- Unset ENV and BASH_ENV in security-sensitive contexts
- Use absolute paths for critical commands (e.g., /usr/bin/ls instead of ls)
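These three defenses can be combined when a program launches a helper. The sketch below is one way to do it in Python; TRUSTED_PATH is an assumed value that would need to match the target system, and sanitized_env and run_trusted are hypothetical helper names.

```python
import os
import subprocess

# Assumption: these are the only directories trusted on the target system.
TRUSTED_PATH = "/usr/bin:/bin"
# Variables that make a shell run an initialization script at startup.
STARTUP_VARS = ("ENV", "BASH_ENV")

def sanitized_env(base):
    """Copy `base`, drop shell startup variables, and pin PATH."""
    env = {name: value for name, value in base.items() if name not in STARTUP_VARS}
    env["PATH"] = TRUSTED_PATH
    return env

def run_trusted(argv):
    """Run a command given by absolute path with a sanitized environment."""
    # Requiring an absolute path means PATH is never consulted for argv[0].
    if not os.path.isabs(argv[0]):
        raise ValueError("command must be given as an absolute path")
    return subprocess.run(
        argv,
        env=sanitized_env(os.environ),
        capture_output=True,
        text=True,
        check=True,
    )
```

A stricter variant would start from an empty environment and add back only the variables the child is known to need.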
Library Loading Attacks (LD_PRELOAD, LD_LIBRARY_PATH, DLL Sideloading)
Attack mechanism: Control which shared libraries are loaded into running programs, allowing function-level hijacking rather than executable replacement.
LD_PRELOAD (Linux/Unix) specifies libraries to load before all others, enabling attackers to override standard library functions like malloc(), read(), or rand(). Through function interposition, the attacker's replacement function can call the original after modifying parameters or logging data - making attacks stealthy since the program continues to work normally while being monitored or manipulated.
LD_LIBRARY_PATH (Linux/Unix) redirects library searches to attacker-controlled directories before system directories.
DLL sideloading (Windows) exploits the DLL search order. Windows searches the executable's directory before system directories, allowing attackers to place malicious DLLs that will be loaded instead of legitimate system libraries.
Why library loading attacks are distinct:
- More surgical: Override individual functions, not entire programs
- More powerful: Hijack cryptographic functions, logging, authentication checks
- Function interposition: Wrap original functions to add malicious behavior while maintaining normal operation (e.g., log all writes while still writing to files)
- Different protection: Operating systems block LD_PRELOAD/LD_LIBRARY_PATH in setuid programs, but user-level attacks remain effective
- Cross-platform parallel: LD_PRELOAD on Unix is the closest counterpart to DLL sideloading on Windows
Defenses:
- Specify full paths when loading libraries
- Verify library authenticity through digital signatures
- Use Secure DLL Search Mode on Windows
- Privileged applications should sanitize or ignore library-related environment variables
- Note: Even with OS protections for privileged programs, attacks on user-level programs remain practical
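The first defense can be illustrated with Python's ctypes module. On Linux, a library name without a slash is resolved through the dynamic linker's search path, which LD_LIBRARY_PATH can influence, while a name containing a slash is loaded from that exact location. The load_library wrapper below is a hypothetical helper that enforces the second form; it does not verify signatures.

```python
import ctypes
import os

def load_library(path: str) -> ctypes.CDLL:
    """Load a shared library only when it is named by absolute path."""
    # A bare name such as "libexample.so" would be searched for in several
    # directories, some of which an attacker may control.
    if not os.path.isabs(path):
        raise ValueError("library must be specified by absolute path")
    return ctypes.CDLL(path)
```

Note that this protects only explicit loads made by the program. Libraries injected through LD_PRELOAD are already mapped before any application code runs.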
Package and Dependency Attacks
Modern software depends heavily on third-party packages. Attackers exploit this through typosquatting (publishing packages with names similar to popular ones), dependency confusion (publishing a public package under the name of an internal one so that the package manager fetches the attacker's version), and malicious installation scripts.
These are supply chain attacks rather than direct code injection but have the same effect: untrusted code executes with developer privileges. They represent command injection at build time: they exploit the same trust failure but target development environments instead of running applications.
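To make "names similar to popular ones" concrete, the sketch below flags a requested package name that closely resembles, but does not equal, a known name. It is an illustration only: the POPULAR list and the 0.8 similarity cutoff are assumptions, and lookalike_of is a hypothetical helper, not a feature of any package manager.

```python
import difflib

# Assumption: a tiny illustrative list; a real check would use a much larger one.
POPULAR = ["requests", "numpy", "django"]

def lookalike_of(name, known=POPULAR, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None."""
    if name in known:
        return None  # exact match: this is the real package name
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(lookalike_of("reqeusts"))  # requests
print(lookalike_of("requests"))  # None
```

A check like this only catches near-miss spellings. It does nothing against dependency confusion, where the malicious package has exactly the expected name.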
Path Traversal
Path traversal occurs when user input controls file paths and uses relative path elements (..) to escape restricted directories. Attackers may exploit symbolic links, encoding tricks, or platform differences to bypass filters.
Path equivalence is a related vulnerability where multiple different path strings can reference the same file or directory. Operating systems and file systems may treat paths as equivalent even when they differ textually. Examples include: redundant slashes (///file vs /file), alternative separators (\ vs / on Windows), case variations on case-insensitive systems, or mixed use of . (current directory). Attackers exploit path equivalence to bypass validation that checks for exact string matches, allowing access to restricted resources through alternate representations.
Defenses:
- Resolve paths to absolute form before validation (canonicalization)
- Avoid direct concatenation of user input into file paths
- Restrict application permissions so that even successful traversal yields limited access
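A minimal sketch of canonicalize-then-validate in Python is shown below. resolve_within is a hypothetical helper; it resolves symbolic links and relative elements first and only then checks that the result is still inside the base directory.

```python
import os

def resolve_within(base_dir: str, user_path: str) -> str:
    """Return the canonical path for `user_path` if it stays inside `base_dir`."""
    base = os.path.realpath(base_dir)
    # realpath collapses "..", ".", and repeated slashes and follows symlinks,
    # so equivalent spellings of a path all reduce to one canonical form.
    candidate = os.path.realpath(os.path.join(base, user_path))
    # Compare whole path components, not string prefixes: a prefix test would
    # wrongly accept /srv/files-private as being inside /srv/files.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate
```

Inputs such as "../x", "a/../../x", and an absolute path like "/etc/passwd" all raise ValueError, while "a//./b.txt" resolves to the same file as "a/b.txt". The check and the later open are still separate steps, so this does not by itself remove race conditions.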
Path traversal and character encoding attacks often overlap. Both exploit how systems interpret or normalize input paths, and both are prevented by consistent canonicalization—resolving paths and encodings to a standard form before applying security checks.
Character Encoding Issues
Encoding attacks rely on multiple representations of the same character to bypass validation. Overlong UTF-8 encodings and nested URL encodings can slip through checks that decode input later.
General rule: Decode and normalize before validating. Applications should reject ambiguous encodings and rely on standard, well-tested parsing libraries rather than custom decoders.
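The rule can be sketched with Python's standard urllib.parse module. The policy below, which rejects any input that is still percent-encoded after one decoding pass, is an assumption chosen for illustration, and decode_once_strict and validate_relative_path are hypothetical helpers.

```python
from urllib.parse import unquote

def decode_once_strict(raw: str) -> str:
    """Percent-decode `raw` once, rejecting invalid or nested encodings."""
    # errors="strict" makes invalid UTF-8, including overlong sequences such
    # as %c0%af, raise UnicodeDecodeError (a subclass of ValueError).
    decoded = unquote(raw, errors="strict")
    # If a second pass would change the value, the input was encoded more
    # than once; treat that as ambiguous instead of guessing.
    if unquote(decoded) != decoded:
        raise ValueError("nested URL encoding")
    return decoded

def validate_relative_path(raw: str) -> str:
    """Decode first, then validate the decoded form."""
    decoded = decode_once_strict(raw)
    if ".." in decoded.split("/"):
        raise ValueError("parent directory reference")
    return decoded
```

Validating before decoding would let "%2e%2e/secret" through, because the raw string contains no ".." at all. The strict policy also rejects some legitimate inputs that decode to text resembling an escape sequence, which is the intended trade-off here.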
Race Conditions (TOCTTOU)
A time-of-check to time-of-use (TOCTTOU) vulnerability arises when a resource changes between validation and use. This can allow an attacker to substitute a protected file or link after a permissions check.
Fixes:
- Use atomic operations that check and act in one step
- Manipulate open file descriptors rather than filenames
- Avoid separate "check then act" logic whenever possible
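The first two fixes can be sketched with Python's os module on a POSIX system (O_NOFOLLOW is not available on Windows). create_exclusive and read_regular_file are hypothetical helper names.

```python
import os
import stat

def create_exclusive(path: str, data: bytes) -> None:
    """Create `path` and write `data`, failing if anything already exists there."""
    # O_CREAT | O_EXCL makes the existence check and the creation one atomic
    # step; it raises FileExistsError even if `path` is a dangling symlink.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)

def read_regular_file(path: str) -> bytes:
    """Open first, then check the open descriptor instead of the filename."""
    # O_NOFOLLOW refuses to open a symlink in the final path component.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as handle:
        # fstat inspects the object that is actually open, so replacing the
        # name on disk after this point cannot change what is read.
        if not stat.S_ISREG(os.fstat(handle.fileno()).st_mode):
            raise ValueError("not a regular file")
        return handle.read()
```

The vulnerable pattern these replace is a call such as os.path.exists or os.access followed by a separate open on the same name.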
File Descriptor Misuse
Programs often assume that standard input, output, and error descriptors (0, 1, 2) are valid. If an attacker closes these before running a privileged program, new files may reuse those descriptor numbers. Output intended for the terminal may overwrite sensitive files.
Defense: Secure programs verify and reopen descriptors 0–2 before performing any file operations.
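A minimal sketch of that check in Python follows. ensure_standard_fds is a hypothetical helper meant to run at program start, before any other file is opened; pointing missing descriptors at the null device is one common choice, not the only one.

```python
import os

def ensure_standard_fds():
    """Make sure descriptors 0, 1, and 2 are open; return those that were reopened."""
    reopened = []
    for fd in (0, 1, 2):
        try:
            os.fstat(fd)  # raises OSError if the descriptor is closed
        except OSError:
            # open() returns the lowest free descriptor number, which may
            # already be the one being repaired.
            null_fd = os.open(os.devnull, os.O_RDWR)
            if null_fd != fd:
                os.dup2(null_fd, fd)
                os.close(null_fd)
            reopened.append(fd)
    return reopened

ensure_standard_fds()
```

After this call, a file opened later can no longer receive descriptor 0, 1, or 2, so stray writes to standard output or standard error cannot land in it.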
Input Validation
Input validation underpins all injection defenses but is difficult to implement correctly.
Validation Approaches
- Allowlisting (safest): Specify what is allowed. Accept only characters, patterns, or values that are explicitly permitted. Unknown inputs are rejected by default.
- Denylisting (less safe): Specify what is forbidden. Reject input containing dangerous patterns. Attackers often find bypasses through creative encodings or edge cases.
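The difference can be sketched in a few lines of Python. The username policy (a lowercase letter followed by 2 to 15 lowercase letters, digits, or underscores) and the denylist contents are assumptions chosen for illustration.

```python
import re

# Allowlist: describes exactly what a valid username looks like (assumed policy).
USERNAME_RE = re.compile(r"[a-z][a-z0-9_]{2,15}")

def is_valid_username(value: str) -> bool:
    # fullmatch requires the whole string to match, so a trailing newline or
    # any other extra character causes rejection.
    return USERNAME_RE.fullmatch(value) is not None

# Denylist: tries to enumerate what is dangerous (shown for contrast).
FORBIDDEN = (";", "|", "&")

def passes_denylist(value: str) -> bool:
    return not any(token in value for token in FORBIDDEN)

print(is_valid_username("alice_01"))  # True
print(passes_denylist("$(id)"))       # True: the denylist missed command substitution
print(is_valid_username("$(id)"))     # False
```

The denylist fails because its author did not anticipate every dangerous construct; the allowlist rejects the same input without needing to know why it is dangerous.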
Sanitization Techniques
When potentially dangerous input must be processed, sanitization modifies it to make it safe:
- Escaping special characters: Add escape sequences to neutralize characters with special meaning in the target context (SQL, shell, etc.). Use established libraries like Python's shlex.quote() for shell commands rather than manual escaping.
- Removing or replacing characters: Strip out or substitute dangerous characters entirely. This is simpler than escaping but may be too restrictive for legitimate input.
Important: Sanitization should be context-specific and used as a secondary defense alongside proper APIs that separate commands from data.
Key Principles
- Validate after decoding and normalization
- Consider how the input will be used (context matters)
- Length limits alone do not prevent injection
- Use safe APIs that separate commands from data
- Minimize trust boundaries in the system
Comprehension and Design Errors
Most injection flaws result from misunderstandings: programmers don't fully grasp how interpreters parse input or how system calls behave.
Common misunderstandings:
- Not knowing all special characters that need escaping
- Not realizing standard file descriptors can be closed
- Assuming filenames don't contain special characters
- Not understanding URL decoding order
- Believing a simple string search prevents path traversal
- Thinking validation and escaping are equivalent
Reducing errors:
- Prefer simple, safe APIs that are hard to misuse
- Provide secure examples in documentation
- Make insecure options hard to reach (secure defaults)
- Focus education and code review on platform-specific quirks
Defense in Depth
No single control can prevent all injection vulnerabilities. Secure systems rely on multiple layers:
- Validate input at boundaries using allowlists where possible
- Use APIs that isolate data from code (parameterized queries, argument arrays)
- Run with least privilege and sandbox where possible
- Audit and test for injection behaviors through code review and penetration testing
- Monitor for suspicious activity through logging and anomaly detection
Command and input injection attacks persist because they exploit human assumptions about how software interprets input. Understanding those interpretations -- and designing systems that never blur data and commands -- is essential for secure programming.