pk.org: Computer Security/Lecture Notes

Command Injection and Input Validation Attacks

Study Guide

Paul Krzyzanowski – 2025-10-24

Command injection attacks exploit how programs interpret user input as executable commands rather than as data. They differ from memory corruption: the attacker alters what command runs instead of what code runs. These attacks affect databases, shells, file systems, and development environments, and remain among the most persistent classes of software vulnerabilities.

SQL Injection

SQL injection manipulates database queries by embedding SQL syntax in user input. It can expose, alter, or delete data and even execute administrative commands.

Primary Defenses

The core defense is to keep the query structure fixed and pass data separately, through parameterized queries (prepared statements).
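A minimal sketch of this separation, using Python's built-in sqlite3 module. The users table and the find_user helper are hypothetical and exist only for illustration; other database drivers use different placeholder syntax.

```python
import sqlite3

def find_user(conn, username):
    # The SQL text is a constant; the "?" placeholder marks where data goes.
    # The driver passes `username` separately, so it is never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

# Hypothetical in-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))          # an ordinary lookup
print(find_user(conn, "' OR '1'='1"))    # treated as a literal name, not as SQL
```

Because the injection string is only ever compared against the name column, the second lookup matches no rows instead of returning the whole table.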

Secondary Defense

Input validation and sanitization add a second layer but cannot be relied on alone. Use allowlists that specify what characters are permitted, not denylists that try to block dangerous patterns. Sanitization through escaping special characters (e.g., using database-specific escaping functions) can help but is error-prone and should never replace parameterized queries.

NoSQL Injection

NoSQL databases avoid SQL syntax but still parse user input that can include operators or code. Injection can happen when JSON or query operators are accepted unchecked.

Defense principles:
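One way to check input before it reaches the database is to enforce types. The sketch below assumes a MongoDB-style query language in which a nested object such as {"$ne": null} is interpreted as an operator; the function names and the request shape are hypothetical.

```python
def require_plain_string(value, field):
    # A JSON body such as {"token": {"$ne": null}} is parsed into a dict.
    # A MongoDB-style driver would treat that dict as a query operator, so
    # only plain strings are accepted here.
    if not isinstance(value, str):
        raise ValueError(f"{field} must be a string, got {type(value).__name__}")
    return value

def build_lookup_filter(body):
    # `body` is the already-parsed JSON request (hypothetical shape).
    return {
        "user": require_plain_string(body.get("user"), "user"),
        "token": require_plain_string(body.get("token"), "token"),
    }

print(build_lookup_filter({"user": "alice", "token": "abc123"}))
try:
    build_lookup_filter({"user": "alice", "token": {"$ne": None}})
except ValueError as err:
    print("rejected:", err)
```

The check runs on the parsed structure rather than on the raw text, so it does not depend on how the operator was spelled or encoded in the request.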

Shell Command Injection

Shell injection exploits programs that pass user input to command interpreters like sh, bash, or cmd.exe. Shell metacharacters (;, |, $(), backticks) enable attackers to append new commands or substitute results.

Safest defense: Avoid shells entirely and use system APIs that execute programs directly, passing arguments as separate parameters (e.g., execve() with argument array, Python's subprocess with shell=False).

When shell use is unavoidable, combine allowlist validation with proper sanitization (e.g., shlex.quote() in Python to escape shell metacharacters), and run the process with minimal privileges.
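Both approaches can be sketched in Python. The child process here is a stand-in (the current Python interpreter echoing its first argument) so that the example runs anywhere; echo_argument is a hypothetical helper name.

```python
import shlex
import subprocess
import sys

def echo_argument(user_input):
    # With a list and shell=False (the default), each element becomes exactly
    # one argv entry for the child, so no shell ever parses `user_input`.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")

hostile = "report.txt; echo INJECTED"
print(echo_argument(hostile))   # the whole string arrives as one argument

# If a shell command line must be built, quote each untrusted piece first.
print(shlex.quote(hostile))
```

In the first call the semicolon is just another character in a file name. In the second, shlex.quote() wraps the string in single quotes so that a POSIX shell treats it as one word.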

Environment Variable Attacks

Programs inherit environment variables that control their behavior. These can be exploited through two distinct attack vectors.

Command Resolution Attacks (PATH, ENV, BASH_ENV)

Attack mechanism: Control which executable runs when a program or script invokes a command by name.

PATH manipulation redirects command lookups by placing attacker-controlled directories early in the search path. When a script runs ls or wget, the shell searches PATH directories in order. An attacker who can modify PATH or write to an early PATH directory can substitute malicious executables.

ENV and BASH_ENV name initialization scripts that a shell reads at startup (bash consults BASH_ENV when it runs a script non-interactively). If an attacker controls these variables, arbitrary commands run before the script's own code, which can affect system scripts and cron jobs.

Defenses:
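One such defense can be sketched as building a fresh environment from an allowlist instead of passing the inherited one to child processes. The directory list and variable names below are assumptions to adjust for the target system, and clean_environment is a hypothetical helper.

```python
# Assumed trusted directories; adjust for the target system.
SAFE_PATH = "/usr/bin:/bin"
# Variables copied from the caller; everything else is dropped.
ALLOWED_VARS = ("LANG", "TZ", "HOME")

def clean_environment(inherited):
    # Start from nothing and copy only known-harmless variables, so ENV,
    # BASH_ENV, and any other unexpected variable never reach the child.
    env = {name: inherited[name] for name in ALLOWED_VARS if name in inherited}
    env["PATH"] = SAFE_PATH   # never trust the inherited search path
    return env

tainted = {
    "PATH": "/tmp/evil:/usr/bin",
    "BASH_ENV": "/tmp/evil/init.sh",
    "ENV": "/tmp/evil/rc",
    "LANG": "C",
}
print(clean_environment(tainted))
```

The result can be passed as the env argument of subprocess.run(), ideally together with an absolute path to the program so that PATH is not consulted at all.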

Library Loading Attacks (LD_PRELOAD, LD_LIBRARY_PATH, DLL Sideloading)

Attack mechanism: Control which shared libraries are loaded into running programs, allowing function-level hijacking rather than executable replacement.

LD_PRELOAD (Linux/Unix) specifies libraries to load before all others, enabling attackers to override standard library functions like malloc(), read(), or rand(). Through function interposition, the attacker's replacement function can call the original after modifying parameters or logging data. This makes the attack stealthy: the program continues to work normally while being monitored or manipulated.

LD_LIBRARY_PATH (Linux/Unix) redirects library searches to attacker-controlled directories before system directories.

DLL sideloading (Windows) exploits the DLL search order. Windows searches the executable's directory before system directories, allowing attackers to place malicious DLLs that will be loaded instead of legitimate system libraries.

Why library loading attacks are distinct:

Defenses:

Package and Dependency Attacks

Modern software depends heavily on third-party packages. Attackers exploit this through typosquatting (publishing packages with names similar to popular ones), dependency confusion (tricking a package manager into fetching a public package in place of an internal one with the same name), and malicious installation scripts.

These are supply chain attacks rather than direct code injection but have the same effect: untrusted code executes with developer privileges. They represent command injection at build time: they exploit the same trust failure but target development environments instead of running applications.
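To illustrate why typosquatting works, the sketch below flags dependency names that are close to, but not exactly, an approved name. This is an illustrative check, not a technique described in these notes; the approved set and the 0.8 similarity cutoff are arbitrary assumptions.

```python
import difflib

# Hypothetical allowlist of dependencies a project has reviewed.
APPROVED = {"requests", "numpy", "cryptography"}

def check_dependency(name):
    if name in APPROVED:
        return "approved"
    # A near miss against an approved name is a typosquatting warning sign.
    near = difflib.get_close_matches(name, sorted(APPROVED), n=1, cutoff=0.8)
    if near:
        return f"possible typosquat of {near[0]}"
    return "not approved"

for name in ("requests", "reqeusts", "leftpad"):
    print(name, "->", check_dependency(name))
```

A check like this only catches look-alike names. It does nothing about dependency confusion or malicious installation scripts, which need controls in the package manager configuration itself.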

Path Traversal

Path traversal occurs when user input controls file paths and uses relative path elements (..) to escape restricted directories. Attackers may exploit symbolic links, encoding tricks, or platform differences to bypass filters.

Path equivalence is a related vulnerability where multiple different path strings can reference the same file or directory. Operating systems and file systems may treat paths as equivalent even when they differ textually. Examples include redundant slashes (///file vs. /file), alternative separators (\ vs. / on Windows), case variations on case-insensitive file systems, and embedded . (current directory) components (/a/./file vs. /a/file). Attackers exploit path equivalence to bypass validation that checks for exact string matches, allowing access to restricted resources through alternate representations.

Defenses:

Path traversal and character encoding attacks often overlap. Both exploit how systems interpret or normalize input paths, and both are prevented by consistent canonicalization—resolving paths and encodings to a standard form before applying security checks.
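Canonicalize-then-check can be sketched as follows. The helper name resolve_inside is hypothetical, and the example assumes POSIX-style paths; Windows adds drive letters and case-insensitivity that need extra care.

```python
import os

def resolve_inside(base_dir, user_path):
    # Canonicalize both paths first: realpath() collapses "..", ".", repeated
    # slashes, and symbolic links into one absolute form.
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # Only then apply the security check, on the canonical forms.
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes {base_dir!r}: {user_path!r}")
    return target
```

For example, resolve_inside("/srv/files", "docs//./a.txt") resolves to a path under /srv/files, while "../../etc/passwd" or an absolute path such as "/etc/passwd" raises ValueError. Comparing canonical paths with commonpath() avoids the prefix-matching mistake in which /srv/files-private would pass a check for /srv/files.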

Character Encoding Issues

Encoding attacks rely on multiple representations of the same character to bypass validation. Overlong UTF-8 encodings and nested URL encodings can slip past checks that run before the input has been fully decoded.

General rule: Decode and normalize before validating. Applications should reject ambiguous encodings and rely on standard, well-tested parsing libraries rather than custom decoders.
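A minimal sketch of this rule for percent-encoded input, using Python's standard urllib.parse module. The function name decode_once_strict is hypothetical, and rejecting every nested encoding is a deliberately strict policy that may not suit all applications.

```python
from urllib.parse import unquote, unquote_to_bytes

def decode_once_strict(raw):
    # Percent-decode to bytes, then decode as strict UTF-8. Python's decoder
    # rejects overlong forms such as C0 AF (an overlong encoding of "/").
    text = unquote_to_bytes(raw).decode("utf-8", errors="strict")
    # If another round of decoding would still change the text, the input was
    # nested-encoded (e.g. %252e -> %2e -> "."); treat that as ambiguous.
    if unquote(text) != text:
        raise ValueError("nested percent-encoding rejected")
    return text

for sample in ("%2e%2e%2fetc", "%252e%252e%252fetc", "%c0%afetc"):
    try:
        print(repr(sample), "->", repr(decode_once_strict(sample)))
    except ValueError as err:   # UnicodeDecodeError is a subclass of ValueError
        print(repr(sample), "rejected:", err)
```

Validation (for example, a path traversal check) then runs on the returned text, which is the same form the rest of the program will use.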

Race Conditions (TOCTTOU)

A time-of-check to time-of-use (TOCTTOU) vulnerability arises when a resource changes between validation and use. This can allow an attacker to substitute a protected file or link after a permissions check.

Fixes:
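One common fix can be sketched as opening the file first and then checking the open descriptor, rather than checking the path and opening it afterwards. The helper name read_regular_file is hypothetical; O_NOFOLLOW is a POSIX flag, and the sketch falls back to no flag where it is unavailable.

```python
import os
import stat

def read_regular_file(path):
    # Open first, refusing to follow a final symbolic link where the
    # platform supports O_NOFOLLOW.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags)
    try:
        # Check the object that was actually opened. fstat() inspects the
        # descriptor, so relinking `path` afterwards cannot change the answer.
        info = os.fstat(fd)
        if not stat.S_ISREG(info.st_mode):
            raise PermissionError(f"not a regular file: {path!r}")
        with os.fdopen(fd, "rb") as f:
            fd = None            # the file object now owns the descriptor
            return f.read()
    finally:
        if fd is not None:
            os.close(fd)
```

The check and the use refer to the same kernel object, so there is no window in which an attacker can swap the file between them.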

File Descriptor Misuse

Programs often assume that the standard input, output, and error descriptors (0, 1, 2) are open and valid. If an attacker closes them before running a privileged program, the first files that program opens are assigned those descriptor numbers, because the kernel hands out the lowest unused descriptor. Output intended for the terminal may then overwrite sensitive files.

Defense: Secure programs verify and reopen descriptors 0–2 before performing any file operations.
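A minimal sketch of that check, meant to run at program start before any other file is opened. The function name is hypothetical, and pointing closed descriptors at the null device is one reasonable choice among several.

```python
import os

def ensure_standard_descriptors():
    repaired = []
    for fd in (0, 1, 2):
        try:
            os.fstat(fd)                 # succeeds only if fd is open
        except OSError:
            # Closed: point it at the null device so that a later open()
            # cannot be handed descriptor 0, 1, or 2.
            null_fd = os.open(os.devnull, os.O_RDWR)
            if null_fd != fd:
                os.dup2(null_fd, fd)
                os.close(null_fd)
            repaired.append(fd)
    return repaired

print("repaired descriptors:", ensure_standard_descriptors())
```

After this call, any stray write to standard output or standard error lands in the null device instead of in whatever file happened to be opened first.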

Input Validation

Input validation underpins all injection defenses but is difficult to implement correctly.

Validation Approaches

Allowlisting (safest): Specify what is allowed. Accept only characters, patterns, or values that are explicitly permitted. Unknown inputs are rejected by default.

Denylisting (less safe): Specify what is forbidden. Reject input containing dangerous patterns. Attackers often find bypasses through creative encodings or edge cases.
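The difference between the two approaches can be sketched with a small comparison. The username rule (1 to 32 ASCII letters, digits, or underscores) and the denylist entries are assumptions chosen for illustration.

```python
import re

# Allowlist: 1 to 32 ASCII letters, digits, or underscores, and nothing else.
USERNAME_ALLOWED = re.compile(r"[A-Za-z0-9_]{1,32}")

def allowlist_ok(value):
    # fullmatch() requires the entire string to match, so anything unknown
    # (spaces, quotes, newlines, non-ASCII look-alikes) is rejected by default.
    return USERNAME_ALLOWED.fullmatch(value) is not None

def denylist_ok(value):
    # Denylist for comparison: blocks only a few known-bad substrings.
    return not any(bad in value for bad in (";", "--", "'"))

for candidate in ("alice_01", "bob; rm -rf /", "carol`id`"):
    print(candidate, allowlist_ok(candidate), denylist_ok(candidate))
```

The third candidate uses backticks, which this denylist did not anticipate, so it passes the denylist while the allowlist rejects it without needing to know why it is dangerous.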

Sanitization Techniques

When potentially dangerous input must be processed, sanitization modifies it to make it safe:

Escaping special characters: Add escape sequences to neutralize characters with special meaning in the target context (SQL, shell, etc.). Use established libraries like Python's shlex.quote() for shell commands rather than manual escaping.

Removing or replacing characters: Strip out or substitute dangerous characters entirely. This is simpler than escaping but may be too restrictive for legitimate input.

Important: Sanitization should be context-specific and used as a secondary defense alongside proper APIs that separate commands from data.

Key Principles

Comprehension and Design Errors

Most injection flaws result from misunderstandings: programmers don't fully grasp how interpreters parse input or how system calls behave.

Common misunderstandings:

Reducing errors:

Defense in Depth

No single control can prevent all injection vulnerabilities. Secure systems rely on multiple layers:

  1. Validate input at boundaries using allowlists where possible

  2. Use APIs that isolate data from code (parameterized queries, argument arrays)

  3. Run with least privilege and sandbox where possible

  4. Audit and test for injection behaviors through code review and penetration testing

  5. Monitor for suspicious activity through logging and anomaly detection

Command and input injection attacks persist because they exploit human assumptions about how software interprets input. Understanding those interpretations and designing systems that never blur data and commands are both essential for secure programming.
