Command Injection and Input Validation Attacks

Command injection attacks exploit programs that interpret user input as executable commands rather than as data. They differ from memory corruption attacks. Instead of subverting the machine code the program itself executes, the attacker changes the command that the program hands to an interpreter such as a database engine or a shell. These attacks affect databases, shells, file systems, and development environments, and they remain among the most persistent classes of software vulnerabilities.

SQL Injection

SQL injection manipulates database queries by embedding SQL syntax in user input. It can expose, alter, or delete data and even execute administrative commands.

Primary Defenses

The core defense is to keep query structure fixed and pass data separately through:

Parameterized queries: Database receives commands and data separately, never interpreting data as SQL syntax
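Stored procedures: Predefined SQL functions with fixed structure that accept only data parameters (this protects only if the procedure does not itself build dynamic SQL by concatenating those parameters)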
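
A minimal sketch shows the difference between splicing input into SQL text and passing it as a parameter. It uses Python's standard sqlite3 module. The users table and function names are hypothetical. Placeholder syntax varies by driver: sqlite3 accepts ? and :name, while some other Python database drivers use %s.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "admin"), ("bob", "user")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: the input becomes part of the SQL text, so quotes in it
    # can end the string literal and change the structure of the query
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the SQL text never changes; the driver binds name as a value
    return conn.execute("SELECT name, role FROM users WHERE name = ?",
                        (name,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # both rows: the OR clause was parsed as SQL
    print(find_user_safe(conn, payload))    # []: no user has that literal name
```

The payload closes the string literal and appends OR '1'='1', which is true for every row. In the parameterized version, the same characters are only ever compared as data.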

Secondary Defense

Input validation and sanitization add a second layer but cannot be relied on alone. Use allowlists that specify what characters are permitted, not denylists that try to block dangerous patterns. Sanitization through escaping special characters (e.g., using database-specific escaping functions) can help but is error-prone and should never replace parameterized queries.

NoSQL Injection

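NoSQL databases do not use SQL syntax, but they still interpret user input that can contain query operators or code. Injection can happen when an application accepts JSON objects or query operators from users without checking them. For example, a field that should hold a string might instead receive an object such as {"$ne": null}.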

Defense principles:

Validate input types (expect strings, reject objects)
Restrict operator usage (block $where, $regex, and other dangerous operators)
Avoid execution of user-supplied code such as JavaScript in $where clauses
Use schema enforcement and allowlists to reduce exposure
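
The first two principles can be sketched in plain Python without a database. The sketch assumes a MongoDB-style store, where a filter such as {"username": {"$gt": ""}} would match essentially every user. The endpoint, request shape, and function names are hypothetical.

```python
from typing import Any


def has_operator_keys(value: Any) -> bool:
    """Return True if value contains a dict key starting with '$', at any depth."""
    if isinstance(value, dict):
        return any(
            (isinstance(key, str) and key.startswith("$")) or has_operator_keys(val)
            for key, val in value.items()
        )
    if isinstance(value, list):
        return any(has_operator_keys(item) for item in value)
    return False


def build_user_filter(body: dict) -> dict:
    """Build a query filter from a parsed JSON request body (hypothetical endpoint)."""
    username = body.get("username")
    # Type check: a JSON object such as {"$gt": ""} arrives as a dict, not a str
    if not isinstance(username, str):
        raise TypeError("username must be a string")
    return {"username": username}


if __name__ == "__main__":
    print(build_user_filter({"username": "alice"}))
    print(has_operator_keys({"username": {"$gt": ""}}))
```

The strict type check is the stronger control. A recursive scan such as has_operator_keys is a fallback for endpoints that must accept nested documents.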

Shell Command Injection

Shell injection exploits programs that pass user input to command interpreters like sh, bash, or cmd.exe. Shell metacharacters (;, |, $(), backticks) let attackers chain additional commands. They also let attackers splice the output of commands of their choosing into the original command line.

Safest defense: Avoid shells entirely and use system APIs that execute programs directly, passing arguments as separate parameters (e.g., execve() with argument array, Python's subprocess with shell=False).
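
A minimal sketch with Python's subprocess module shows the argument-array approach. To stay self-contained, it runs the current Python interpreter (sys.executable) as a stand-in for an external tool such as ls or grep; that stand-in script is an assumption of the example. Because the arguments are passed as a list and shell=False (the default), no shell ever parses the input.

```python
import subprocess
import sys

# Stand-in "external tool": a tiny script that prints its first argument back
ECHO_FIRST_ARG = "import sys; print(sys.argv[1])"


def run_tool(user_input: str) -> str:
    """Run the stand-in tool with user_input as one separate argument.

    No shell is started, so ';', '|', '$()' and backticks in user_input are
    delivered to the program as ordinary characters of a single argument.
    """
    completed = subprocess.run(
        [sys.executable, "-c", ECHO_FIRST_ARG, user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.rstrip("\n")


if __name__ == "__main__":
    print(run_tool("report.txt; rm -rf ~"))  # the whole string arrives as one argument
```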

When shell use is unavoidable, combine allowlist validation with proper sanitization, and run the process with minimal privileges. In Python, for example, shlex.quote() quotes a string for safe use as a single token in a POSIX shell command line; it is not designed for cmd.exe.
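
A minimal sketch of the two layers for a POSIX shell assumes a hypothetical feature that runs wc -l on a user-named file. The allowlist pattern is an assumption for illustration. shlex.quote comes from Python's standard library and, per its documentation, targets Unix shells only. The returned string is meant for a shell (for example via subprocess.run(cmd, shell=True)); the code itself does not run it.

```python
import re
import shlex

# Hypothetical allowlist: names start with a letter or digit (so they cannot
# look like an option such as "-rf") and contain only letters, digits, '.', '_', '-'
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def build_wc_command(filename: str) -> str:
    """Return a shell command line that counts lines in a user-named file."""
    # Layer 1: allowlist validation rejects anything outside the expected shape
    if not SAFE_NAME.fullmatch(filename):
        raise ValueError(f"rejected file name: {filename!r}")
    # Layer 2: quote for a POSIX shell anyway; '--' ends option parsing for wc
    return "wc -l -- " + shlex.quote(filename)


if __name__ == "__main__":
    print(build_wc_command("report.txt"))      # wc -l -- report.txt
    print(shlex.quote("notes.txt; rm -rf ~"))  # 'notes.txt; rm -rf ~'
```

The quoting layer matters if the allowlist is ever loosened. shlex.quote wraps the value in single quotes, so the shell sees one word even when it contains ; or spaces.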

Environment Variable Attacks

Programs inherit environment variables that control their behavior. These can be exploited through two distinct attack vectors.

Command Resolution Attacks (PATH, ENV, BASH_ENV)

Attack mechanism: Control which executable runs when a program or script invokes a command by name.

PATH manipulation redirects command lookups by placing attacker-controlled directories early in the search path. When a script runs ls or wget, the shell searches PATH directories in order. An attacker who can modify PATH or write to an early PATH directory can substitute malicious executables.
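
On a POSIX system, shutil.which can demonstrate the first-match-wins lookup. It searches a PATH-style directory list in order and returns the first executable it finds. This approximates what the shell does, although shells may also consult builtins and a cache of earlier lookups first. The fake ls scripts and temporary directories are stand-ins created by the sketch.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory: str, name: str) -> str:
    """Create an executable shell script standing in for a real program."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\necho 'not the real tool'\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(cmd: str, search_dirs: list) -> str:
    # shutil.which returns the first executable match, scanning dirs in order
    return shutil.which(cmd, path=os.pathsep.join(search_dirs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as trusted, tempfile.TemporaryDirectory() as writable:
        real = make_fake_tool(trusted, "ls")      # plays the role of /usr/bin/ls
        planted = make_fake_tool(writable, "ls")  # attacker-planted copy
        print(resolve("ls", [trusted, writable]) == real)     # True
        print(resolve("ls", [writable, trusted]) == planted)  # True: earlier dir wins
```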

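ENV and BASH_ENV name startup files that shells execute before running anything else. Bash sources the file named by BASH_ENV whenever it starts non-interactively, which is how shell scripts run. POSIX sh reads ENV when it starts interactively. If an attacker controls these variables, arbitrary commands execute at the beginning of every affected shell. This includes privileged scripts and scheduled jobs that inherit an environment the attacker can influence.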

Defenses:

Scripts with elevated privileges should explicitly set PATH to trusted directories
Unset ENV and BASH_ENV in security-sensitive contexts
Use absolute paths for critical commands (e.g., /usr/bin/ls instead of ls)
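
These defenses also apply when one program launches another. This sketch assumes /usr/bin and /bin are the trusted directories, which varies between systems. The helper names are hypothetical.

```python
import shutil

# Assumed trusted directories; the right list depends on the system
TRUSTED_PATH = "/usr/bin:/bin"
# Startup-file variables that make shells run extra code before the script
STARTUP_VARS = ("ENV", "BASH_ENV")


def sanitized_env(parent_env: dict) -> dict:
    """Copy an environment, dropping ENV/BASH_ENV and pinning PATH."""
    env = {k: v for k, v in parent_env.items() if k not in STARTUP_VARS}
    env["PATH"] = TRUSTED_PATH
    return env


def absolute_command(name: str) -> str:
    """Resolve a command name to an absolute path using only TRUSTED_PATH."""
    path = shutil.which(name, path=TRUSTED_PATH)
    if path is None:
        raise FileNotFoundError(f"{name} not found in {TRUSTED_PATH}")
    return path
```

A privileged caller might then run subprocess.run([absolute_command("sh"), script_path], env=sanitized_env(os.environ)), where script_path is a hypothetical trusted script.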

Library Loading Attacks (LD_PRELOAD, LD_LIBRARY_PATH, DLL Sideloading)

Attack mechanism: Control which shared libraries are loaded into running programs, allowing function-level hijacking rather than executable replacement.

LD_PRELOAD (Linux/Unix) specifies libraries to load before all others, enabling attackers to override standard library functions like malloc(), read(), or rand(). Through function interposition, the attacker's replacement function can locate the original (typically with dlsym(RTLD_NEXT, ...)) and call it after modifying parameters or logging data. This makes attacks stealthy, since the program continues to work normally while being monitored or manipulated.

LD_LIBRARY_PATH (Linux/Unix) redirects library searches to attacker-controlled directories before system directories.

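DLL sideloading (Windows) exploits the DLL search order. Windows searches the executable's own directory before system directories. An attacker who can write to that directory, or who copies a legitimate executable into a directory they control, can therefore plant a malicious DLL with the name the program expects. Windows then loads it instead of the legitimate system library. DLLs listed under KnownDLLs are exempt from this search.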

Why library loading attacks are distinct:

More surgical: Override individual functions, not entire programs
More powerful: Hijack cryptographic functions, logging, and authentication checks
Function interposition: Wrap original functions to add malicious behavior while maintaining normal operation (e.g., log all writes while still writing to files)
Different protection: The dynamic loader ignores or restricts LD_PRELOAD and LD_LIBRARY_PATH for setuid and setgid programs (secure-execution mode), but attacks on ordinary user-level programs remain effective
Cross-platform parallel: LD_PRELOAD and LD_LIBRARY_PATH on Unix and DLL sideloading on Windows all abuse how the loader decides which library file to load

Defenses:

Specify full paths when loading libraries
Verify library authenticity through digital signatures
Use Secure DLL Search Mode on Windows
Privileged applications should sanitize or ignore library-related environment variables
Note: Even with OS protections for privileged programs, attacks on user-level programs remain practical
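
The last defense applies to a program that launches helpers on behalf of less-trusted users: it can strip loader variables from the environment it passes on. The sketch assumes prefix matching is acceptable. It removes every variable starting with LD_ (the Linux/ELF loader family) or DYLD_ (macOS dyld) instead of listing them individually. An allowlist of the few variables the helper actually needs would be stricter and more in line with the allowlisting advice given for input validation.

```python
# Loader variable prefixes: LD_ (Linux and other ELF-based loaders), DYLD_ (macOS dyld)
LOADER_PREFIXES = ("LD_", "DYLD_")


def strip_loader_vars(parent_env: dict) -> tuple:
    """Return (clean_env, removed_names) with all loader variables removed."""
    clean = {}
    removed = []
    for name, value in parent_env.items():
        if name.startswith(LOADER_PREFIXES):
            removed.append(name)  # e.g. LD_PRELOAD, LD_LIBRARY_PATH
        else:
            clean[name] = value
    return clean, sorted(removed)
```

Returning the removed names lets the caller log the attempt, which supports the monitoring layer described under defense in depth.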

Package and Dependency Attacks

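Modern software depends heavily on third-party packages. Attackers exploit this in three ways. Typosquatting publishes packages with names similar to popular ones. Dependency confusion publishes a public package with the same name as an organization's internal package, often with a higher version number, so that build tools fetch the public copy instead. Malicious installation scripts run attacker code during package installation.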

These are supply chain attacks rather than direct code injection, but they have the same effect: untrusted code executes with the developer's privileges. They can be viewed as command injection at build time. They exploit the same trust failure but target development environments instead of running applications.

Path Traversal

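Path traversal occurs when user input controls file paths and uses relative path elements (..) to escape restricted directories. A typical example is a request for ../../etc/passwd sent to a handler that serves files from an uploads directory. Attackers may exploit symbolic links, encoding tricks, or platform differences to bypass filters.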

Path equivalence is a related vulnerability where multiple different path strings can reference the same file or directory. Operating systems and file systems may treat paths as equivalent even when they differ textually. Examples include: redundant slashes (///file vs /file), alternative separators (\ vs / on Windows), case variations on case-insensitive systems, or mixed use of . (current directory). Attackers exploit path equivalence to bypass validation that checks for exact string matches, allowing access to restricted resources through alternate representations.
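
A short sketch shows how an exact-match check fails against equivalent spellings. It uses posixpath.normpath and ntpath.normcase from Python's standard library. The blocked-path set and helper names are hypothetical. normpath is purely lexical: it does not consult the file system, so symbolic links are not resolved. A real check should canonicalize with os.path.realpath, as the defenses below describe.

```python
import ntpath
import posixpath

BLOCKED = {"/etc/shadow"}  # hypothetical exact-match denylist


def naive_is_blocked(path: str) -> bool:
    # Exact string comparison: defeated by any equivalent spelling
    return path in BLOCKED


def normalized_is_blocked(path: str) -> bool:
    # Lexical normalization collapses '//', '.', and 'dir/..' before comparing
    return posixpath.normpath(path) in BLOCKED


def windows_key(path: str) -> str:
    # Windows-style comparison key: '/' and '\\' treated alike, case ignored
    return ntpath.normcase(ntpath.normpath(path))


if __name__ == "__main__":
    for p in ["/etc//shadow", "/etc/./shadow", "/tmp/../etc/shadow", "///etc/shadow"]:
        print(p, naive_is_blocked(p), normalized_is_blocked(p))  # False True
    print(windows_key("C:/Windows/System32") == windows_key("c:\\WINDOWS\\system32\\"))  # True
```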

Defenses:

Canonicalize paths before validation: resolve them to absolute form, collapsing . and .. and following symbolic links
Avoid direct concatenation of user input into file paths
Restrict application permissions so that even successful traversal yields limited access
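
Canonicalization and a containment check can be combined in a short sketch using os.path.realpath and os.path.commonpath from Python's standard library. The function name and the single-base-directory design are assumptions. The check and the later open are still separate steps, so the race conditions discussed under TOCTTOU still apply.

```python
import os
import tempfile


def resolve_under(base_dir: str, user_path: str) -> str:
    """Return the canonical path of user_path inside base_dir, or raise."""
    base = os.path.realpath(base_dir)
    # realpath resolves '.', '..', and symbolic links; an absolute user_path
    # replaces base in os.path.join and is caught by the check below
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath compares whole components, so /srv/app-old is not inside /srv/app
    # (on Windows it raises ValueError for paths on different drives)
    if os.path.commonpath([base, candidate]) != base:
        raise PermissionError(f"path escapes {base}: {user_path!r}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(resolve_under(base, "docs/report.txt"))
        for attack in ["../../etc/passwd", "/etc/passwd", "docs/../../x"]:
            try:
                resolve_under(base, attack)
            except PermissionError as exc:
                print("blocked:", exc)
```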

Path traversal and character encoding attacks often overlap. Both exploit how systems interpret or normalize input paths. Both are prevented by consistent canonicalization: resolving paths and encodings to a standard form before applying security checks.

Character Encoding Issues

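Encoding attacks rely on multiple representations of the same character to bypass validation. Two common forms are overlong UTF-8 encodings and nested URL encodings. %C0%AF is an invalid, overlong two-byte encoding of /. %252e%252e%252f becomes ../ after two rounds of decoding. Both can slip through checks that run before the input is fully decoded.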

General rule: Decode and normalize before validating. Applications should reject ambiguous encodings and rely on standard, well-tested parsing libraries rather than custom decoders.
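
A minimal sketch applies "decode, then validate" to a percent-encoded path parameter, using urllib.parse.unquote. The policy is an assumption chosen for illustration: decode exactly once, reject anything still containing %XX escapes, and reject .. segments. Real web frameworks may already decode once before handing values to application code.

```python
import re
from urllib.parse import unquote

PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def decode_path_param(raw: str) -> str:
    """Decode a percent-encoded path parameter once, then validate it."""
    # errors="strict": invalid or overlong UTF-8 (e.g. %C0%AF) raises
    # UnicodeDecodeError, a subclass of ValueError, instead of being replaced
    decoded = unquote(raw, errors="strict")
    # Still contains %XX after one decode: the input was encoded twice, so reject it
    if PERCENT_ESCAPE.search(decoded):
        raise ValueError(f"nested percent-encoding: {raw!r}")
    # Validate the decoded form: no '..' segment with either separator
    if ".." in re.split(r"[/\\]", decoded):
        raise ValueError(f"path traversal: {raw!r}")
    return decoded


if __name__ == "__main__":
    print(decode_path_param("docs%2Freport%20v2.txt"))  # docs/report v2.txt
```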

Race Conditions (TOCTTOU)

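A time-of-check to time-of-use (TOCTTOU) vulnerability arises when a resource changes between validation and use. For example, a privileged program may confirm that the invoking user is allowed to write /tmp/report and then open it. In the gap between the check and the open, an attacker replaces /tmp/report with a symbolic link to a protected file, and the program writes to that file instead.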

Fixes:

Use atomic operations that check and act in one step
Manipulate open file descriptors rather than filenames
Avoid separate "check then act" logic whenever possible
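
On POSIX, os.open and os.fstat illustrate the difference between check-then-act and acting on a descriptor. O_NOFOLLOW only covers the final path component. Guarding the directories along the path as well requires opening relative to a trusted directory descriptor, for example via the dir_fd parameter of os.open. The function names are hypothetical.

```python
import os
import stat
import tempfile


def read_if_regular_racy(path: str) -> bytes:
    # RACY check-then-act: path can be replaced (e.g. by a symlink)
    # between these checks and open()
    if os.path.islink(path) or not os.path.isfile(path):
        raise PermissionError(f"refusing {path}")
    with open(path, "rb") as f:
        return f.read()


def read_if_regular(path: str) -> bytes:
    # Act first, then check the object actually opened. O_NOFOLLOW makes open()
    # fail if the last component is a symlink; O_NONBLOCK keeps open() from
    # hanging on a FIFO and has no effect on regular files.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK)
    try:
        # fstat inspects the open descriptor, so renaming the path later cannot matter
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise PermissionError(f"not a regular file: {path}")
    except BaseException:
        os.close(fd)
        raise
    with os.fdopen(fd, "rb") as f:  # the file object now owns and closes fd
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "data.txt")
        with open(target, "wb") as f:
            f.write(b"hello")
        os.symlink(target, os.path.join(d, "link"))
        print(read_if_regular(target))  # b'hello'
        try:
            read_if_regular(os.path.join(d, "link"))
        except OSError as exc:
            print("refused:", exc)
```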

File Descriptor Misuse

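Programs often assume that the standard input, output, and error descriptors (0, 1, 2) are valid. If an attacker closes these before running a privileged program, new files can reuse those descriptor numbers, because open() returns the lowest unused descriptor. Output intended for the terminal may then overwrite sensitive files. For example, if descriptor 2 was closed and the program then opens a sensitive file for writing, that file receives descriptor 2 and the program's error messages are written into it.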

Defense: Secure programs verify and reopen descriptors 0–2 before performing any file operations.
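
A sketch of that defense in Python for POSIX systems follows. It is meant to be called at the very start of a program, before any other file is opened. The function name is hypothetical.

```python
import errno
import os


def ensure_std_fds() -> None:
    """Make sure descriptors 0, 1 and 2 are open, filling gaps with /dev/null."""
    for fd in (0, 1, 2):
        try:
            os.fstat(fd)  # succeeds if fd is open
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise
            # open() returns the lowest free number, normally the missing fd itself
            new_fd = os.open(os.devnull, os.O_RDWR)
            if new_fd != fd:
                os.dup2(new_fd, fd)
                os.close(new_fd)
            # Python creates descriptors non-inheritable; 0-2 should be inherited
            os.set_inheritable(fd, True)
```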

Input Validation

Input validation underpins all injection defenses but is difficult to implement correctly.

Validation Approaches

Allowlisting (safest): Specify what is allowed. Accept only characters, patterns, or values that are explicitly permitted. Unknown inputs are rejected by default.
Denylisting (less safe): Specify what is forbidden. Reject input containing dangerous patterns. Attackers often find bypasses through creative encodings or edge cases.
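
A small sketch shows the difference quickly. A denylist that blocks only the literal string ../ is bypassed by backslash separators and percent-encoding. An allowlist rejects anything outside its pattern. The file-name policy is a hypothetical example.

```python
import re


def denylist_ok(name: str) -> bool:
    # Denylist: reject the one traversal spelling the author thought of
    return "../" not in name


# Allowlist (hypothetical policy): 1-64 ASCII letters, digits, '_' or '-',
# followed by a .txt or .csv extension
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}\.(?:txt|csv)")


def allowlist_ok(name: str) -> bool:
    return ALLOWED_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["report.csv", "..\\..\\boot.ini", "..%2f..%2fetc%2fpasswd", "notes.txt.exe"]:
        print(f"{name!r}: denylist={denylist_ok(name)} allowlist={allowlist_ok(name)}")
```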

Sanitization Techniques

When potentially dangerous input must be processed, sanitization modifies it to make it safe:

Escaping special characters: Add escape sequences to neutralize characters with special meaning in the target context (SQL, shell, etc.). Use established libraries like Python's shlex.quote() for shell commands rather than manual escaping.
Removing or replacing characters: Strip out or substitute dangerous characters entirely. This is simpler than escaping but may be too restrictive for legitimate input.

Important: Sanitization should be context-specific and used as a secondary defense alongside proper APIs that separate commands from data.
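
Two standard-library escapers, html.escape and shlex.quote, illustrate context specificity. Each neutralizes the special characters of its own context and leaves the other context's special characters untouched. The escape_for helper and its context names are hypothetical.

```python
import html
import shlex


def escape_for(context: str, value: str) -> str:
    """Escape value for one specific output context (hypothetical helper)."""
    if context == "html":
        return html.escape(value, quote=True)  # & < > " ' become entities
    if context == "posix_shell":
        return shlex.quote(value)  # wraps the value in single quotes for sh/bash
    raise ValueError(f"no escaper for context {context!r}")


if __name__ == "__main__":
    payload = "O'Brien <b>; rm -rf ~"
    as_html = escape_for("html", payload)
    as_shell = escape_for("posix_shell", payload)
    # HTML escaping leaves spaces and ';' alone, so a shell still splits the value
    # into several words (and a real shell would treat ';' as a command separator)
    print(shlex.split("echo " + as_html))
    # Shell quoting leaves '<b>' intact, so it would still be markup in a web page
    print(as_shell)
```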

Key Principles

Validate after decoding and normalization
Consider how the input will be used (context matters)
Length limits alone do not prevent injection
Use safe APIs that separate commands from data
Minimize trust boundaries in the system

Comprehension and Design Errors

Most injection flaws result from misunderstandings: programmers don't fully grasp how interpreters parse input or how system calls behave.

Common misunderstandings:

Not knowing all special characters that need escaping
Not realizing standard file descriptors can be closed
Assuming filenames don't contain special characters
Not understanding URL decoding order
Believing a simple string search prevents path traversal
Thinking validation and escaping are equivalent

Reducing errors:

Prefer simple, safe APIs that are hard to misuse
Provide secure examples in documentation
Make insecure options hard to reach (secure defaults)
Focus education and code review on platform-specific quirks

Defense in Depth

No single control can prevent all injection vulnerabilities. Secure systems rely on multiple layers:

Validate input at boundaries using allowlists where possible
Use APIs that isolate data from code (parameterized queries, argument arrays)
Run with least privilege and sandbox where possible
Audit and test for injection behaviors through code review and penetration testing
Monitor for suspicious activity through logging and anomaly detection

Command and input injection attacks persist because they exploit human assumptions about how software interprets input. Understanding those interpretations, and designing systems that never blur data and commands, is essential for secure programming.