Final Exam Study Guide

The three-hour study guide for the final exam

Paul Krzyzanowski

February 2024

Disclaimer: This study guide attempts to touch upon the most important topics that may be covered on the exam but does not claim to necessarily cover everything that one needs to know for the exam. Finally, don't take the three hour time window in the title literally.

Last update: Tue Apr 23 17:10:47 EDT 2024

Introduction

Computer security is about keeping computers, their programs, and the data they manage “safe.” Specifically, this means safeguarding three areas: confidentiality, integrity, and availability. These three are known as the CIA Triad (no relation to the Central Intelligence Agency).

Confidentiality

Confidentiality means that we do not make a system’s data and its resources (the devices it connects to and its ability to run programs) available to everyone. Only authorized people and processes should have access. Privacy specifies limits on what information can be shared with others while confidentiality provides a means to block access to such information. Privacy is a reason for confidentiality. Someone being able to access a protected file containing your medical records without proper access rights is a violation of confidentiality.

Integrity

Integrity refers to the trustworthiness of a system. This means that everything is as you expect it to be: users are not imposters and processes are running correctly.

Data integrity means that the data in a system has not been corrupted.
Origin integrity means that the person or system sending a message or creating a file truly is that person and not an imposter.
Recipient integrity means that the person or system receiving a message truly is that person and not an imposter.
System integrity means that the entire computing system is working properly; that it has not been damaged or subverted. Processes are running the way they are supposed to.

Maintaining integrity means not just defending against intruders that want to modify a program or masquerade as others. It also means protecting the system against against accidental damage, such as from user or programmer errors.

Availability

Availability means that the system is available for use and performs properly. A denial of service (DoS) attack may not steal data or damage any files but may cause a system to become unresponsive.

Security is difficult. Software is incredibly complex. Large systems may comprise tens or hundreds of millions of lines of code. Systems as a whole are also complex. We may have a mix of cloud and local resources, third-party libraries, and multiple administrators. If security was easy, we would not have massive security breaches year after year. Microsoft wouldn’t have monthly security updates. There are no magic solutions … but there is a lot that can be done to mitigate the risk of attacks and their resultant damage.

We saw that computer security addressed three areas of concern. The design of security systems also has three goals.

Prevention: Prevention means preventing attackers from violating established security policies. It means that we can implement mechanisms into our hardware, operating systems, and application software that users cannot override – either maliciously or accidentally. Examples of prevention include enforcing access control rules for files and authenticating users with passwords.
Detection: Detection detects and reports security attacks. It is particularly important when prevention mechanisms fail. It is useful because it can identify weaknesses with certain prevention mechanisms. Even if prevention mechanisms are successful, detection mechanisms are useful to let you know that attempted attacks are taking place. An example of detection is notifying an administrator that a new user has been added to the system. Another example is being notified that there have been several consecutive unsuccessful attempts to log in.
Recovery: If a system is compromised, we need to stop the attack and repair any damage to ensure that the system can continue to run correctly and the integrity of data is preserved. Recovery includes forensics, the study of identifying what happened and what was damaged so we can fix it. An example of recovery is restoration from backups.

Security engineering is the task of implementing the necessary mechanisms and defining policies across all the components of the system. Like other engineering disciplines, designing secure systems involves making compromises. A highly secure system will be disconnected from any communication network, sit in an electromagnetically shielded room that is only accessible to trusted users, and run software that has been thoroughly audited. That environment is not acceptable for most of our computing needs. We want to download apps, carry our computers with us, and interact with the world. Even in the ultra-secure example, we still need to be concerned with how we monitor access to the room, who wrote the underlying operating system and compilers, and whether authorized users can be coerced to subvert the system. Systems have to be designed with some idea of who are likely potential attackers and what the threats are. Risk analysis is used to understand the difficulty of an attack on a system, who will be affected, and what the worst thing that can happen is. A threat model is a data flow model (e.g., diagram) that identifies each place where information moves into or out of the software or between subsystems of the program. It allows you to identify areas where the most effort should be placed to secure a system.

Secure systems have two parts to them: mechanisms and policies. A policy is a description of what is or is not allowed. For example, “users must have a password to log into the system” is a policy. Mechanisms* are used to implement and enforce policies. An example of a mechanism is the software that requests user IDs and passwords, authenticates the user, and allows entry to the system only if the correct password is used.

A vulnerability is a weakness in the security system. It could be a poorly defined policy, a bribed individual, or a flaw in the underlying mechanism that enforces security. An attack is the exploitation of a vulnerability in a system. An attack vector refers to the specific technique that an attacker uses to exploit a vulnerability. Example attack vectors include phishing, keylogging, and trying common passwords to log onto a system. An attack surface is the sum of possible attack vectors in a system: all the places where an attacker might try to get into the system.

A threat is the potential adversary who may attack the system. Threats may lead to attacks.

Threats fall into four broad categories:

Disclosure: Unauthorized access to data, which covers exposure, interception, interference, and intrusion. This includes stealing data, improperly making data available to others, or snooping on the flow of data.

Deception: Accepting false data as true. This includes masquerading, which is posing as an authorized entity; substitution or insertion of includes the injection of false data or modification of existing data; repudiation, where someone falsely denies receiving or originating data.

Disruption: Some change that interrupts or prevents the correct operation of the system. This can include maliciously changing the logic of a program, a human error that disables a system, an electrical outage, or a failure in the system due to a bug. It can also refer to any obstruction that hinders the functioning of the system.

Usurpation: Unauthorized control of some part of a system. This includes theft of service as well as any misuse of the system such as tampering or actions that result in the violation of system privileges.

The Internet increases opportunities for attackers. The core protocols of the Internet were designed with decentralization, openness, and interoperability in mind rather than security. Anyone can join the Internet and send messages … and untrustworthy entities can provide routing services. It allows bad actors to hide and to attack from a distance. It also allows attackers to amass asymmetric force: harnessing more resources to attack than the victim has for defense. Even small groups of attackers are capable of mounting Distributed Denial of Service (DDoS) attacks that can overwhelm large companies or government agencies.

Adversaries can range from lone hackers to industrial spies, terrorists, and intelligence agencies. We can consider two dimensions: skill and focus. Regarding focus, attacks are either opportunistic or targeted. Opportunistic attacks are those where the attacker is not out to get you specifically but casts a wide net, trying many systems in the hope of finding a few that have a particular vulnerability that can be exploited. Targeted attacks are those where the attacker targets you specifically. The term script kiddies is used to refer to attackers who lack the skills to craft their own exploits but download malware toolkits to try to find vulnerabilities (e.g., systems with poor or default passwords, hackable cameras). Advanced persistent threats (APT) are highly-skilled, well-funded, and determined (hence, persistent) attackers. They can craft their own exploits, pay millions of dollars for others, and may carry out complex, multi-stage attacks.

We refer to the trusted computing base (TCB) as the collection of hardware and software of a computing system that is critical to ensuring the system’s security. Typically, this is the operating system and system software but also includes the system firmware, bootloader, and any other software that, if attacked, can impact security. If the TCB is compromised, you no longer have assurance that any part of the system is secure. For example. the operating system may be modified to ignore the enforcement of file access permissions. If that happens, you no longer have assurance that any application is accessing files properly.

Command Injection

We looked at buffer overflow and printf format string attacks that enable the modification of memory contents to change the flow of control in the program and, in the case of buffer overflows, inject executable binary code (machine instructions). Other injection attacks enable you to modify inputs used by command processors, such as interpreted languages or databases. We will now look at these attacks.

SQL Injection

It is common practice to take user input and make it part of a database query. This is particularly popular with web services, which are often front ends for databases. For example, we might ask the user for a login name and password and then create a SQL query:

sprintf(buf,
	”SELECT * from logininfo WHERE username = '%s' AND password = '%s’;",
	uname, passwd);

Suppose that the user entered this for a password:

' OR 1=1 --

We end up creating this query string¹:

SELECT * from logininfo WHERE username = 'paul' AND password = '' OR 1=1 -- ';

The “--” after “1=1” is a SQL comment, telling it to ignore everything else on the line. In SQL, OR operations have precendence over AND so the query checks for a null password (which the user probably does not have) or the condition 1=1, which is always true. In essence, the user’s “password” turned the query into one that ignores the user’s password and unconditionally validates the user.

Statements such as this can be even more destructive as the user can use semicolons to add multiple statements and perform operations such as dropping (deleting) tables or changing values in the database.

This attack can take place because the programmer blindly allowed user input to become part of the SQL command without validating that the user data does not change the quoting or tokenization of the query. A programmer can avoid the problem by carefully checking the input. Unfortunately, this can be difficult. SQL contains too many words and symbols that may be legitimate in other contexts (such as passwords) and escaping special characters, such as prepending backslashes or escaping single quotes with two quotes can be error prone as these escapes differ for different database vendors. The safest defense is to use parameterized queries, where user input never becomes part of the query but is brought in as parameters to it. For example, we can write the previous query as:

uname = getResourceString("username");
passwd = getResourceString("password");
query = "SELECT * FROM users WHERE username = @0 AND password = @1";
db.Execute(query, uname, passwd);

A related safe alternative is to use stored procedures. They have the same property that the query statement is not generated from user input and parameters are clearly identified.

While SQL injection is the most common code injection attack, databases are not the only target. Creating executable statements built with user input is common in interpreted languages, such as Shell, Perl, PHP, and Python. Before making user input part of any invocable command, the programmer must be fully aware of parsing rules for that command interpreter.

Shell attacks

The various POSIX² shells (sh, csh, ksh, bash, tcsh, zsh) are commonly used as scripting tools for software installation, start-up scripts, and tying together workflow that involves processing data through multiple commands. A few aspects of how many of the shells work and the underlying program execution environment can create attack vectors.

system() and popen() functions

Both system and popen functions are part of the Standard C Library and are common functions that C programmers use to execute shell commands. The system function runs a shell command while the popen function also runs the shell command but allows the programmer to capture its output and/or send it input via the returned FILE pointer.

Here we again have the danger of turning improperly-validated data into a command. For example, a program might use a function such as this to send an email alert:

char command[BUFSIZE];
snprintf(command, BUFSIZE, "/usr/bin/mail –s \"system alert\" %s", user);
FILE *fp = popen(command, "w");

In this example, the programmer uses snprintf to create the complete command with the desired user name into a buffer. This incurs the possibility of an injection attack if the user name is not carefully validated. If the attacker had the option to set the user name, she could enter a string such as:

nobody; rm -fr /home/*

which will result in popen running the following command:

sh -c "/usr/bin/mail -s \"system alert\" nobody; rm -fr /home/*"

which is a sequence of commands, the latter of which deletes all user directories.

Other environment variables

The shell PATH environment variable controls how the shell searches for commands. For instance, suppose

PATH=/home/paul/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games

and the user runs the ls command. The shell will search through the PATH sequentially to find an executable filenamed ls:

/home/paul/bin/ls
/usr/local/bin/ls
/usr/sbin/ls
/usr/bin/ls
/bin/ls
/usr/local/games/ls

If an attacker can either change a user’s PATH environment variable or if one of the paths is publicly writable and appears before the “safe” system directories, then he can add a booby-trapped command in one of those directories. For example, if the user runs the ls command, the shell may pick up a booby-trapped version in the /usr/local/bin directory. Even if a user has trusted locations, such as /bin and /usr/bin foremost in the PATH, an intruder may place a misspelled version of a common command into another directory in the path. The safest remedy is to make sure there are no untrusted directories in PATH.

Some shells allow a user to set an ENV or BASH_ENV variable that contains the name of a file that will be executed as a script whenever a non-interactive shell is started (when a shell script is run, for example). If an attacker can change this variable then arbitrary commands may be added to the start of every shell script.

Shared library environment variables

In the distant past, programs used to be fully linked, meaning that all the code needed to run the program, aside from interactions with the operating system, was part of the executable program. Since so many programs use common libraries, such as the Standard C Library, they are not compiled into the code of an executable but instead are dynamically loaded when needed.

Similar to PATH, LD_LIBRARY_PATH is an environment variable used by the operating system’s program loader that contains a colon-separated list of directories where libraries should be searched. If an attacker can change a user’s LD_LIBRARY_PATH, common library functions can be overwritten with custom versions. The LD_PRELOAD environment variable allows one to explicitly specify shared libraries that contain functions that override standard library functions.

LD_LIBRARY_PATH and LD_PRELOAD will not give an attacker root access but they can be used to change the behavior of program or to log library interactions. For example, by overwriting standard functions, one may change how a program generates encryption keys, uses random numbers, sets delays in games, reads input, and writes output.

As an example, let’s suppose we have a trial program that checks the current time against a hard-coded expiration time:

#include <time.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	unsigned long expiration = 1483228800;
	time_t now;

	/* check software expiration */
	now = time(NULL);
	if (time(NULL) > (time_t)expiration) {
		fprintf(stderr, "This software expired on %s", ctime(&expiration));
		fprintf(stderr, "This time is now %s", ctime(&now));
	}
	else
		fprintf(stderr, "You're good to go: %lu days left in your trial.\n",
			(expiration-now)/(60*60*24));
	return 0;
}

When run, we may get output such as:

$ ./testdate
This software expired on Sat Dec 31 19:00:00 2016
This time is now Sun Feb 18 15:50:44 2018

Let us write a replacement time function that always returns a fixed value that is less than the one we test for. We’ll put it in a file called time.c:

unsigned long time() {
	return (unsigned long) 1483000000;
}

We compile it into a shared library:

gcc -shared -fPIC time.c -o newtime.so

Now we set LD_PRELOAD and run the program:

$ export LD_PRELOAD=$PWD/newtime.so
$ ./testdate
You're good to go: 2 days left in your trial.

Note that our program now behaves differently and we never had to recompile it or feed it different data!

Input sanitization

The important lesson in writing code that uses any user input in forming commands is that of input sanitization. Input must be carefully validated to make sure it conforms to the requirements of the application that uses it and does not try to execute additional commands, escape to a shell, set malicious environment variables, or specify out-of-bounds directories or devices.

File descriptors

POSIX systems have a convention that programs expect to receive three open file descriptors when they start up:

file descriptor 0: standard input
file descriptor 1: standard output
file descriptor 2: standard error

Functions such as printf, scanf, puts, getc and others expect these file desciptors to be available for input and output. When a program opens a new file, the operating system searches through the file descriptor table and allocates the first available unused file descriptor. Typically this will be file descriptor 3. However, if any of the three standard file descriptors are closed, the operating system will use one of those as an available, unused file descriptor.

The vulnerability lies in the fact that we may have a program running with elevated privileges (e.g., setuid root) that modifies a file that is not accessible to regular users. If that program also happens to write to the user via, say, printf, there is an opportunity to corrupt that file. The attacker simply needs to close the standard output (file descriptor 1) and run the program. When it opens its secret file, it will be given file descriptor 1 and will be able to do its read and write operations on the file. However, whenever the program will print a message to the user, the output will not be seen by the user as it will be directed to what printf assumes is the standard output: file descriptor 1. Printf output will be written onto the secret file, thereby corrupting it.

The shell command (bash, sh, or ksh) for closing the standard output file is an obscure-looking >&-. For example:

./testfile >&-

Comprehension Errors

The overwhelming majority of security problems are caused by bugs or misconfigurations. Both often stem from comprehension errors. These are mistakes created when someone – usually the programmer or administrator – does not understand the details and every nuance of what they are doing. Some example include:

Not knowing all possible special characters that need escaping in SQL commands.
Not realizing that the standard input, output, or error file descriptors may be closed.
Not understanding how access control lists work or how to configure mandatory access control mechanisms such as type enforcement correctly.

If we consider the Windows CreateProcess function, we see it is defined as:

BOOL WINAPI CreateProcess(
  _In_opt_    LPCTSTR               lpApplicationName,
  _Inout_opt_ LPTSTR                lpCommandLine,
  _In_opt_    LPSECURITY_ATTRIBUTES lpProcessAttributes,
  _In_opt_    LPSECURITY_ATTRIBUTES lpThreadAttributes,
  _In_        BOOL                  bInheritHandles,
  _In_        DWORD                 dwCreationFlags,
  _In_opt_    LPVOID                lpEnvironment,
  _In_opt_    LPCTSTR               lpCurrentDirectory,
  _In_        LPSTARTUPINFO         lpStartupInfo,
  _Out_       LPPROCESS_INFORMATION lpProcessInformation);

We have to wonder whether a programmer who does not use this frequently will take the time to understand the ramifications of correctly setting process and thread security attributes, the current directory, environment, inheritance handles, and so on. There’s a good chance that the programmer will just look up an example on places such as github.com or stackoverflow.com and copy something that seems to work, unaware that there may be obscure side effects that compromise security.

As we will see in the following sections, comprehension errors also apply to the proper understanding of things as basic as various ways to express characters.

Directory parsing

Some applications, notably web servers, accept hierarchical filenames from a user but need to ensure that they restrict access only to files within a specific point in the directory tree. For example, a web server may need to ensure that no page requests go outside of /home/httpd/html.

An attacker may try to gain access by using paths that include .. (dot-dot), which is a link to the parent directory. For example, an attacker may try to download a password file by requesting

http://poopybrain.com/../../../etc/passwd

The hope is that the programmer did not implement parsing correctly and might try simply suffixing the user-requested path to a base directory:

"/home/httpd/html/" + "../../../etc/passwd"

to form

/home/httpd/html/../../../etc/passwd

which will retrieve the password file, /etc/passwd.

A programmer may anticipate this and check for dot-dot but has to realize that dot-dot directories can be anywhere in the path. This is also a valid pathname but one that should be rejected for trying to escape to the parent:

http://poopybrain.com/419/notes/../../416/../../../../etc/passwd

Moreover, the programmer cannot just search for .. because that can be a valid part of a filename. All three of these should be accepted:

http://poopybrain.com/419/notes/some..other..stuff/
http://poopybrain.com/419/notes/whatever../
http://poopybrain.com/419/notes/..more.stuff/

Also, extra slashes are perfectly fine in a filename, so this is acceptable:

http://poopybrain.com/419////notes///////..more.stuff/

The programmer should also track where the request is in the hierarchy. If dot-dot doesn’t escape above the base directory, it should most likely be accepted:

http://poopybrain.com/419/notes/../exams/

These are not insurmountable problems but they illustrate that a quick-and-dirty attempt at filename processing may be riddled with bugs.

Unicode parsing

If we continue on the example of parsing pathnames in a web server, let us consider a bug in early releases of Microsoft’s IIS (Internet Information Services, their web server). IIS had proper pathname checking to ensure that attempts to get to a parent are blocked:

http://www.poopybrain.com/scripts/../../winnt/system32/cmd.exe

Once the pathname was validated, it was passed to a decode function that decoded any embedded Unicode characters and then processed the request.

The problem with this technique was that non-international characters (traditional ASCII) could also be written as Unicode characters. A “/” could also be written in HTML as its hexadecimal value, %2f (decimal 47). It could also be represented as the two-byte Unicode sequence %c0%af.

The reason for this stems from the way Unicode was designed to support compatibility with one-byte ASCII characters. This encoding is called UTF-8. If the first bit of a character is a 0, then we have a one-byte ASCII character (in the range 0..127). However, if the first bit is a 1, we have a multi-byte character. The number of leading 1s determine the number of bytes that the character takes up. If a character starts with 110, we have a two-byte Unicode character.

With a two-byte character, the UTF-8 standard defines a bit pattern of

110a bcde   10fg hijk

The values a-k above represent 11 bits that give us a value in the range 0..2047. The “/” character, 0x2f, is 47 in decimal and 0010 1111 in binary. The value represents offset 47 into the character table (called codepoint in Unicode parlance). Hence we can represent the “/” as 0x2f or as the two byte Unicode sequence:

1100 0000   1010 1111

which is the hexadecimal sequence %c0%af. Technically, this is disallowed. The standard states that codepoints less than 128 must be represented as one byte but the two byte sequence is supported by most Unicode parsers. We can also construct a valid three-byte sequence too.

Microsoft’s bug was that they ignored parsing %c0%af as being equivalent to a / because it should not have been used to represent the character. However, the Unicode parser was happy to translate it and attackers were able to use this to access any file in on a server running IIS. This bug also gave attackers the ability to invoke cmd.com, the command interpreter, and execute any commands on the server.

After Microsoft fixed the multi-byte Unicode bug, another problem came up. The parsing of escaped characters was recursive, so if the resultant string looked like a Unicode hexadecimal sequence, it would be re-parsed.

As an example of this, let’s consider the backslash (``````), which Microsoft treats as equivalent to a slash (/) in URLs since their native pathname separator is a backlash³.

The backslash can be written in a URL in hexadecimal format as %5c. The “%” character can be expressed as %25. The “5” character can be expressed as %35. The “c” character can be expressed as %63. Hence, if the URL parser sees the string %%35c, it would expand the %35 to the character “5”, which would result in %5c, which would then be converted to a \```. If the parser sees%25%35%63, it would expand each of the%nncomponents to get the string%5c, which would then be converted to a`. As a final example, if the parser comes across ```%255c```, it will expand ```%25``` to ```%``` to get the string ```%5c```, which would then be converted to a ```\`.

It is not trivial to know what a name relates to but it is clear that all conversions have to be done before the validity of the pathname is checked. As for checking the validity of the pathname in an application, it is error-prone. The operating system itself parses a pathname a component at a time, traversing the directory tree and checking access rights as it goes along. The application is trying to recreate a similar action without actually traversing the file system but rather by just parsing the name and mapping it to a subtree of the file system namespace.

TOCTTOU attacks

TOCTTOU stands for Time of Check to Time of Use. If we have code of the form:

if I am allowed to do something
	then do it

we may be exposing ourselves to a race condition. There is a window of time between the test and the action. If an attacker can change the condition after the check then the action may take place even if the check should have failed.

One example of this is the print spooling program, lpr. It runs as a setuid program with root privileges so that it can copy a file from a user’s directory into a privileged spool directory that serves as a queue of files for printing. Because it runs as root, it can open any file, regardless of permissions. To keep the user honest, it will check access permissions on the file that the user wants to print and then, only if the user has legitimate read access to the file, it will copy it over to the spool directory for printing. An attacker can create a link to a readable file and then run lpr in the background. At the same time, he can change the link to point to a file for which he does not have read access. If the timing is just perfect, the lpr program will check access rights before the file is re-linked but will then copy the file for which the user has no read access.

Another example of the TOCTTOU race condition is the set of temporary filename creation functions (tempnam, tempnam, mktemp, GetTempFileName, etc.). These functions create a unique filename when they are called but there is no guarantee that an attacker doesn’t create a file with the same name before that filename is used. If the attacker creates and opens a file with the same name, she will have access to that file for as long as it is open, even if the user’s program changes access permissions for the file later on.

The best defense for the temporary file race condition is to use the mkstemp function, which creates a file based on a template name and opens it as well, avoiding the race condition between checking the uniqueness of the name and opening the file.

Application confinement

Two lessons we learned from experience are that applications can be compromised and that applications may not always be trusted. Server applications in particular, such as web servers and mail servers, have been compromised over and over again. This is particularly harmful as they often run with elevated privileges and on systems on which normal users do not have accounts. The second category of risk is that we may not always trust an application. We trust our web server to work properly. Still, we cannot necessarily trust that the game we downloaded from some unknown developer will not try to upload our files, destroy our data, or try to change our system configuration. In fact, unless we have the ability to scrutinize the codebase of a service, we will not know for sure if it tries to modify any system settings or writes files to unexpected places.

With this resignation to security in mind, we need to turn our attention to limiting the resources available to an application and making sure that a misbehaving application cannot harm the rest of the system. These are the goals of confinement.

Our initial thoughts to achieving confinement may involve proper use of access controls. For example, we can run server applications as low-privilege users and make sure that we have set proper read/write/execute permissions on files, read/write/search permissions on directories, or even set up role-based policies.

However, access controls usually do not give us the ability to set permissions for “don’t allow access to anything else.” For example, we may want our web server to have access to all files in /home/httpd but nothing outside of that directory. Access controls do not let us express that rule. Instead, we are responsible for changing the protections of every file on the system and making sure it cannot be accessed by “other”. We also have to hope that no users change those permissions. In essence, we must disallow the ability for anyone to make files publicly accessible because we never want our web server to access them. We may be able to use mandatory access control mechanisms if they are available but, depending on the system, we may not be able to restrict access properly either. More likely, we will be at risk of comprehension errors and be likely to make a configuration error, leaving parts of the system vulnerable. To summarize, even if we can get access controls to help, we will not have high assurance that they do.

Access controls also only focus on protecting access to files and devices. A system has other resources, such as CPU time, memory, disk space, and network. We may want to control how much of these an application is allowed to use. POSIX systems provide a setrlimit system call that allows one to set limits on certain resources for the current process and its children. These controls include the ability to set file size limits, CPU time limits, various memory size limits, and maximum number of open files.

We also may want to control the network identity for an application. All applications share the same IP address on a system, but this may allow a compromised application to exploit address-based access controls. For example, you may be able to connect to or even log into a system that believes you are a trusted computer. An exploited application may end up confusing network intrusion detection systems.

Just limiting access through resource limits and file permissions is also insufficient for services that run as root. If an attacker can compromise an app and get root access to execute arbitrary functions, she can change resource limits (just call setrlimit with different values), change any file permissions, and even change the IP address and domain name of the system.

In order to truly confine an application, we would like to create a set of mechanisms that enforce access controls to all of a system’s resources, are easy to use so that we have high assurance in knowing that the proper restrictions are in place, and work with a large class of applications. We can’t quite get all of this yet, but we can come close.

chroot

The oldest app confinement mechanism is Unix’s chroot system call and command, originally introduced in 1979 in the seventh edition⁴. The chroot system call changes the root directory of the calling process to the directory specified as a parameter.

chroot("/home/httpd/html");

Sets the root of the file system to /home/httpd/html for the process and any processes it creates. The process cannot see any files outside that subset of the directory tree. This isolation is often called a chroot jail.

Jailkits

If you run chroot, you will likely get an error along the lines of:

# chroot newroot
chroot: failed to run command ‘/bin/bash’: No such file or directory

This is because /bin/bash is not within the root (in this case, the newroot directory). You’ll then create a bin subdirectory and try running chroot again and get the same error:

# mkdir newroot/bin
# ln /bin/bash newroot/bin/bash
# chroot newroot
chroot: failed to run command ‘/bin/bash’: No such file or directory

You’ll find that is also insufficient and that you’ll need to bring in the shared libraries that /bin/bash needs by mounting /lib, /lib64, and /usr/lib within that root just to enable the shell to run. Otherwise, it cannot load the libraries it needs since it cannot see above its root (i.e., outside its jail). To simplify this process, a jailkit simplifies the process of setting up a chroot jail by providing a set of utilities to make it easier to create the desired environment within the jail and populate it with basic accounts, commands, and directories.

Problems with chroot

Chroot only limits access to the file system namespace. It does not restrict access to resources and does not protect the machine’s network identity. Applications that are compromised to give the attacker root access make the entire system vulnerable since the attacker has access to all system calls.

Chroot is available only to administrators. If this was not the case then any user would be able to get root access within the chroot jail. You would: 1. Create a chroot jail 2. Populate it with the shell program and necessary support libraries 3. Link the su command (set user, which allows you to authenticate to become any user) 4. Create password files within the jail with a known password for root. 5. Use the chroot command to enter the jail. 6. Run su root to become the root user. The command will prompt you for a password and validate it against the password file. Since all processes run within the jail, the password file is the one you set up.

You’re still in the jail but you have root access.

Escaping from chroot

If someone manages to compromise an application running inside a chroot jail and become root, they are still in the jail but have access to all system calls. For example, they can send signals to kill all other processes or shut down the system. This would be an attack on availability.

Attaining root access also provides a few ways of escaping the jail. On POSIX systems, all non-networked devices are accessible as files within the filesystem. Even memory is accessible via a file (/dev/mem). An intruder in a jail can create a memory device (character device, major number = 1, minor number = 1):

mknod mem c 1 1

With the memory device, the attacker can patch system memory to change the root directory of the jail. More simply, an attacker can create a block device with the same device numbers as that of the main file system. For example, the root file system on my Linux system is /dev/sda1 with a major number of 8 and a minor number of 1. An attacker can recreate that in the jail:

mknod rootdisk b 8 1

and then mount it as a file system within the jail:

mount -t ext4 rootdisk myroot

Now the attacker, still in the jail, has full access to the entire file system, which is as good as being out of the jail. He can add user accounts, change passwords, delete log files, run any commands, and even reboot the system to get a clean login.

FreeBSD Jails

Chroot was good in confining the namespace of an application but useless against providing security if an application had root access and did nothing to restrict access to other resources.

FreeBSD Jails are an enhancement to the idea of chroot. Jails provide a restricted filesystem namespace, just like chroot does, but also place restrictions on what processes are allowed to do within the jail, including selectively removing privileges from the root user in the jail. For example, processes within a jail may be configured to:

Bind only to sockets with a specified IP address and specific ports
Communicate only with other processes within the jail and none outside
Not be able to load kernel modules, even if root
Have restricted access to system calls that include:
- Ability to create raw network sockets
- Ability to create devices
- Modify the network configuration
- Mount or unmount filesystems

FreeBSD Jails are a huge improvement over chroot since known escapes, such as creating devices and mounting filesystems, and even rebooting the system, are disallowed. Depending on the application, policies may be coarse. The changed root provides all-or-nothing access to a part of the file system. This does not make Jails suitable for applications such as a web browser, which may be untrusted but may need access to files outside of the jail. Think about web-based applications such as email, where a user may want to upload or download attachments. Jails also do not prevent malicious apps from accessing the network and trying to attack other machines … or from trying to crash the host operating system. Moreover, FreeBSD Jails is a BSD-only solution. With an estimated 0.95…1.7% share of server deployments, it is a great solution on an operating system that is not that widely used.

Linux namespaces, cgroups, and capabilities

Linux’s answer to FreeBSD Jails was a combination of three elements: control groups, namespaces, and capabilities.

Control groups (cgroups)

Linux control groups, also called cgroups, allow you to allocate resources such as CPU time, system memory, disk bandwidth, network bandwidth, and the ability to monitor resource usage among user-defined groups of processes. This allows, for example, an administrator to allocate a larger share of the processor to a critical server application.

An administrator creates one or more cgroups and assigns resource limits to each of them. Then any application can be assigned to a control group and will not be able to use more than the resource limits configured in that control group. Applications are unaware of these limits. Control groups are organized in a hierarchy similar to processes. Child cgroups inherit some attributes from the parents.

Linux namespaces

Chroot only restricted the filesystem namespace. The filesystem namespace is the best-known namespace in the system but not the only one. Linux namespaces Namespaces provide control over how processes are isolated in the following namespaces:

Namespace	Description	Controls
IPC	System V IPC, POSIX message queues	Objects created in an IPC namespace are only visible to other processes in that namespace (CLONE_NEWIPC)
Network	Network devices, stacks, ports	Isolates IP protocol stacks, IP routing tables, firewalls, socket port numbers ( CLONE_NEWNET)
Mount	Mount points	A set of processes can have their own distinct mount points and view of the file system (CLONE_NEWNS)
PID	Process IDs	Processes in different PID namespaces can have their process IDs – the child cannot see parent processes or other namespaces (CLONE_NEWPID)
User	User & group IDs	Per-namespace user/group IDs. Also, you can be root in a namespace but have restricted privileges ( CLONE_NEWUSER )
UTS	host name and domain name	setting hostname and domainname will not affect rest of the system (CLONE_NEWUTS)
Cgroup	control group	Sets a new control group for a process (CLONE_NEWCGROUP)

A process can dissociate any or all of these namespaces from its parent via the unshare system call. For example, by unsharing the PID namespace, a process gets a no longer sees other processes and will only see itself and any child processes it creates.

The Linux clone system call is similar to fork in that it creates a new process. However, it allows you to pass flags that will specify which parts of the execution context will be shared with the parent. For example, a cloned process may choose to share memory and open file descriptors, which will make it behave like threads. It can also choose to share – or not – any of the elements of the namespace.

Capabilities

A problem that FreeBSD Jails tackled was that of restricting the power of root inside a Jail. You could be a root user but still disallowed from executing certain system calls. POSIX (Linux) capabilities⁵ tackle this issue as well.

Traditionally, Unix systems distinguished privileged versus unprivileged processes. Privileged processes were those that ran with a user ID of 0, called the root user. When running as root, the operating system would allow access to all system calls and all access permission checks were bypassed. You could do anything.

Linux capabilities identify groups of operations, called capabilities, that can be controlled independently on a per-thread basis. The list is somewhat long, 38 groups of controls, and includes capabilities such as:

CAP_CHOWN: make arbitrary changes to file UIDs and GIDs
CAP_DAC_OVERRIDE: bypass read/write/execute checks
CAP_KILL: bypass permission checks for sending signals
CAP_NET_ADMIN: network management operations
CAP_NET_RAW: allow RAW sockets
CAP_SETUID: arbitrary manipulation of process UIDs
CAP_SYS_CHROOT: enable chroot

The kernel keeps track of four capability sets for each thread. A capability set is a list of zero or more capabilities. The sets are:

Permitted: If a capability is not in this set, the thread or its children can never require that capability. This limits the power of what a process and its children can do.
Inheritable: These capabilities will be inherited when a thread calls execve to execute a program (POSIX programs are executed with the same thread; we are not creating a new process)
Effective: This is the current set of capabilities that the thread is using. The kernel uses these to perform permission checks.
Ambient: This is similar to Inheritable and contains a set of capabilities that are preserved across an execve of a program that is not privileged. If a setuid or setgid program is run, will clear the ambient set. These are created to allow a partial use of root features in a controlled manner. It is useful for user-level device drivers or software that needs a specific privilege (e.g., for certain networking operations).

A child process created via fork (the standard way of creating processes) will inherit copies of its parent’s capability sets following the rules of which capabilities have been marked as inheritable.

A set of capabilities can be assigned to an executable file by the administrator. They are stored as a file’s extended attributes (along with access control lists, checksums, and arbitrary user-defined name-value pairs). When the program runs, the executing process may further restrict the set of capabilities under which it operates if it chooses to do so (for example, after performing an operation that required the capability and knowing that it will no longer need to do so).

From a security point of view, the key concept of capabilities is that they allow us to provide limited elevation of privileges to a process. A process does not need to run as root (user ID 0) but can still be granted specific privileges. For example, we can grant the ping command the ability to access raw sockets so it can send an ICMP ping message on the network but not have any other administrative powers. The application does not need to run as root and even if an attacker manages to inject code, the opportunities for attack will be restricted.

The Linux combination of cgroups, namespaces, and capabilities provides a powerful set of mechanisms to

Set limits on the system resources (processor, disk, network) that a group of processes will use.
Constrain the namespace, making parts of the filesystem or the existence of other processes or users invisible.
Give restricted privileges to specific applications so they do not need to run as root.

This enables us to create stronger jails and have a greater degree of control over which processes are or are not allowed to do within that jail.

While bugs have been found in these mechanisms, the more serious problem is that of comprehension. The system has become far, far more complex than it was in the days of chroot. A user has to learn quite a lot to use these mechanisms properly. Failure to understand their behavior fully can create vulnerabilities. For example, namespaces do not prohibit a process from making privileged system calls. They simply limit what a process can see. A process may not be able to send a kill signal to another process only because it does not share the same process ID namespace.

Together with capabilities, namespaces allow a restricted environment that also places limits on the abilities to perform operations even if a process is granted root privileges. This enables ordinary users to create namespaces. You can create a namespace and even create a process running as a root user (UID 0) within that namespace but it will have no capabilities beyond those that were granted to the user; the user ID of 0 gets mapped by the kernel to a non-privileged user.

Containers

Software rarely lives as an isolated application. Some software requires multiple applications and most software relies on the installation of other libraries, utilities, and packages. Keeping track of these dependencies can be difficult. Worse yet, updating one shared component can sometimes cause another application to break. What was needed was a way to isolate the installation, execution, and management of multiple software packages that run on the same system.

Various attempts were undertaken to address these problems.

The most basic was to fix problems when they occurred. This required carefully following instructions for installing, updating, and configuring software and extensive testing of all services on the system when anything changed. Should something break, the service would be unavailable until the problems were fixed.
A drastic, but thorough, approach to isolation was to simply run each service on its own computer. That avoids conflicts in library versions and other dependencies. However, it is an expensive solution, is cumbersome, and is often overkill in most environments.
Finally, administrators could deploy virtual machines. This is a technology that allows one to run multiple operating systems on one computer and gives the illusion of services running on distinct systems. However, this is a heavyweight solution. Every service needs its own installation of the operating system and all supporting software for the service as well as standard services (networking, device management, shell, etc.). It is not efficient in terms of CPU, disk, or memory resources – or even administration effort.

Containers are a mechanism that was originally created not for security but to make it easy to package, distribute, relocate, and deploy collections of software. The focus of containers is not to enable end users to install and run their favorite apps but rather for administrators to be able to deploy a variety of services on a system. A container encapsulates all the necessary software for a service, all of its dependencies, and its configuration into one package that can be easily passed around, installed, and removed.

In many ways, a container feels like a virtual machine. Containers provide a service with a private process namespace, its own network interface, and its own set of libraries to avoid problems with incompatible versions used by other software. Containers also allow an administrator to give the service restricted powers even if it runs with root (administrator) privileges. Unlike a virtual machine, however, multiple containers on one system all share the same operating system and kernel modules.

Containers are not a new mechanism. They are implemented using Linux’s control groups, namespaces, and capabilities to provide resource access, isolation, and privilege control, respectively. They also make use of a copy on write file system. This makes it easy to create new containers where the file system can track the changes made by that container over a clean base version of a file system. Containers can also take advantage of AppArmor, which is a Linux kernel module that provides a basic form of mandatory access controls based on the pathnames of files. It allows an administrator to restrict the ability of a program to access specific files even within its file system namespace.

The best-known and first truly popular container framework is Docker. A Docker Image is a file format that creates a package of applications, their supporting libraries, and other needed files. This image can be stored and deployed on many environments. Docker made it easy to deploy containers using git-like commands (docker push, docker commit) and also to perform incremental updates. By using a copy on write file system, Docker images can be kept immutable (read-only) while any changes to the container during its execution are stored separately.

As people found Docker to be useful, the next design goal was to make it easier to manage containers across a network of many computers. This is called container orchestration. There are many solutions for this, including Apache Mesos, Kubernetes, Nomad, and Docker Swarm. The best known of these is kubernetes, which was designed by Google. It coordinates storage of containers, failure of hardware and containers, and dynamic scaling: deploying the container on more machines to handle increased load. Kubernetes is coordination software, not a container system; it uses the Docker framework to run the actual container.

Even though containers were designed to simplify software deployment rather than provide security to services, they do offer several benefits in the area of security:

They make use of namespaces, cgroups, and capabilities with restricted capabilities configured by default. This provides isolation among containers.
Containers provide a strong separation of policy (defined by the container configuration) from the enforcement mechanism (handled by the operating system).
They improve availability by providing the ability to have a watchdog timer monitor the running of applications and restarting them if necessary. With orchestration systems such as Kubernetes, containers can be re-deployed on another system if a computer fails.
The environment created by a container is reproducible. The same container can be deployed on multiple systems and tested in different environments. This provides consistency and aids in testing and ensuring that the production deployment matches the one used for development and test. Moreover, it is easy to inspect exactly how a container is configured. This avoids problems encountered by manual installation of components where an administrator may forget to configure something or may install different versions of a required library.
While containers add nothing new to security, they help avoid comprehension errors. Even default configurations will provide improved security over the defaults in the operating system, and configuring containers is easier than learning and defining the rules for capabilities, control groups, and namespaces. Administrators are more likely to get this right or import containers that are already configured with reasonable restrictions.

Containers are not a security panacea. Because all containers run under the same operating system, any kernel exploits can affect the security of all containers. Similarly, any denial of service attacks, whether affecting the network or monopolizing the processor, will impact all containers on the system. If implemented and configured properly, capabilities, namespaces, and control groups should ensure that privilege escalation cannot take place. However, bugs in the implementation or configuration may create a vulnerability. Finally, one has to be concerned with the integrity of the container itself. Who configured it, who validated the software inside of it, and is there a chance that it may have been modified by an adversary either at the server or in transit?

Application Sandboxing

Application sandboxing provides a controlled environment to safely execute potentially harmful software, minimizing system-wide risks. It restricts program operations based on predefined rules, allowing only certain actions within the system. This method is crucial for running applications from unknown sources and is extensively utilized by security researchers to monitor software behavior and detect malware. Sandboxes enforce restrictions on file access, network usage, and system interactions, offering a fundamental layer of security by controlling application capabilities more granitely than traditional methods like containers or jails.

We previously looked at isolation via jails and containers, which use mechanisms that include namespaces, control groups, and capabilities. These constitute a widely used form of sandboxing. However, these techniques focus on isolating an application (or group of processes) from other processes, restricting access to parts of the file system, and/or providing a separate network stack with a new IP address.

While mechanisms like jails and containers, which include namespaces, control groups, and capabilities are great for creating an environment to run services without the overhead of deploying virtual machines, they do not fully address the ability of restricting what normal applications can do. For instance, they cannot block a process from calling a non-privileged system call. We want to protect users from their applications: give users the ability to run apps but define rules to restrict what those apps can do on a per-app basis, such as opening files only with a certain name or permitting only TCP networking.

Sandboxing is currently supported on a wide variety of platforms at either the kernel or application level.

Application sandboxing via system call interposition & user-level validation

An example of a user-level sandbox is the Janus sandbox. Application sandboxing with Janus involves creating policies to define permissible system calls for each application. Janus uses a kernel module to intercept these calls and sends them to a user-level monitor program that decides whether to allow or block the call based on the configured policy file. Challenges include maintaining system state across processes and handling complex scenarios like network and file operations, pathname parsing, and potential race conditions (TOCTTOU issues).

Application sandboxing with integrated OS support

The better alternative to having a user-level process decide on whether to permit system calls is to incorporate policy validation in the kernel. Some operating systems provide kernel support for sandboxing. These include the Android Application Sandbox, the iOS App Sandbox, the macOS sandbox, and AppArmor on Linux. Microsoft introduced the Windows Sandbox in December 2018, but this functions far more like a container than a traditional application sandbox, giving the process an isolated execution environment.

Seccomp-BPF (SECure COMPuting with Berkeley Packet Filters) is a Linux security framework that enables limits on which system calls a process can execute. It uses the Berkeley Packet Filter to evaluate system calls as “packets,” and applying rules that govern their execution. Though it doesn’t provide complete isolation on its own, Seccomp is an essential component for constructing robust application sandboxes when combined with other mechanisms like namespaces and control groups.

Process virtual machine sandboxes: Java

The Java Virtual Machine (JVM) was designed to run compiled Java applications in a controlled manner on any system regardless of the operating system or hardware architecture. The JVM employs three main components to ensure security:

Bytecode Verifier: It scrutinizes Java bytecode before execution to confirm it adheres strictly to Java’s standards without security breaches like bypassing access controls or array bounds.
Class Loader: This component safeguards against the loading of untrusted classes and ensures the integrity of runtime environments through Address Space Layout Randomization (ASLR), maintaining the security of essential class libraries.
Security Manager: This enforces protection domains that define permissible actions within the JVM. It intercepts calls to sensitive methods, verifying permissions against a security policy, which can restrict actions like file and network access, preventing operations not allowed by the policy.

Building an effective sandbox in Java has proven complex, highlighted by persistent bugs, especially in the underlying C libraries and across different JVM implementations. Moreover, Java’s allowance for native methods can bypass these security mechanisms, introducing potential risks.

Virtual Machines

As a general concept, virtualization is the addition of a layer of abstraction to physical devices. With virtual memory, for example, a process has the impression that it owns the entire memory address space. Different processes can all access the same virtual memory location and the memory management unit (MMU) on the processor maps each access to the unique physical memory locations that are assigned to the process.

Process virtual machines present a virtual CPU that allows programs to execute on a processor that does not physically exist. The instructions are interpreted by a program that simulates the architecture of the pseudo machine. Early pseudo-machines included o-code for BCPL and P-code for Pascal. The most popular pseudo-machine today is the Java Virtual Machine (JVM). This simulated hardware does not even pretend to access the underlying system at a hardware level. Process virtual machines will often allow “special” calls to invoke system functions or provide a simulation of some generic hardware platform.

Operating system virtualization is provided by containers, where a group of processes is presented with the illusion of running on a separate operating system but, in reality, shares the operating system with other groups of processes – they are just not visible to the processes in the container.

System virtual machines*, allow a physical computer to act like several real machines, with each machine running its own operating system (on a virtual machine) and applications that interact with that operating system. The key to this machine virtualization is not to allow each operating system to have direct access to certain privileged instructions in the processor. These instructions would allow an operating system to directly access I/O ports, MMU settings, the task register, the halt instruction, and other parts of the processor that could interfere with the processor’s behavior and with the other operating systems on the system. Instead, a trap and emulate approach is used. Privileged instructions, as well as system interrupts, are caught by the Virtual Machine Monitor (VMM), also known as a hypervisor. The hypervisor arbitrates access to physical resources and presents a set of virtual device interfaces to each guest operating system (including the memory management unit, I/O ports, disks, and network interfaces). The hypervisor also handles preemption. Just as an operating system may suspend a process to allow another process to run, the hypervisor will suspend an operating system to give other operating systems a chance to run.

The two configurations of virtual machines are hosted virtual machines and native virtual machines. With a hosted virtual machine (also called a type 2 hypervisor), the computer has a primary operating system installed that has access to the raw machine (all devices, memory, and file system). This host operating system does not run in a virtual environment. One or more guest operating systems can then be run on virtual machines. The VMM serves as a proxy, converting requests from the virtual machine into operations that get sent to and executed on the host operating system. A native virtual machine (also called a type 1 hypervisor) is one where there is no “primary” operating system that owns the system hardware. The hypervisor is in charge of access to the devices and provides each operating system drivers for an abstract view of all the devices.

Security implications

Virtual machines (VMs) provide a deep layer of isolation, encapsulating the operating system along with all the applications it runs and files it needs within a secure environment separate from the physical hardware. Unlike lighter confinement methods like containers, a VM-contained compromise affects only that VM, akin to a contained issue in a physical machine.

Despite this isolation, VMs can still pose risks if compromised. Malicious entities can exploit VMs to attempt attacks on other systems within the same physical environment, leveraging the shared physical resources. Such scenarios underscore potential vulnerabilities in even well-isolated environments, highlighting the need for vigilant security practices across all layers.

A specific threat in such environments is the creation of covert channels through side channel attacks. These channels exploit system behaviors like CPU load variations to clandestinely transmit information between VMs clandestinely, bypassing conventional communication restrictions. This technique reveals how attackers can bridge gaps between highly secure and less secure systems, manipulating physical resource signals to communicate stealthily.

Malware

Malware is a term that refers to any malicious software that is unintentionally installed on a computer system. Malware can be distributed in various ways: viruses, worms, unintentional downloads, or trojan horses. It may spy on user actions and collect information on them (spyware), or present unwanted ads (adware). It may disable components of the system or encrypt files, undoing its damage if the owner pays money (ransomware). The software may sit dormant and wait for directives from some coordinator (a command and control server), who assembled an arsenal of hundreds of thousands of computers ready to do its bidding (for example, launch a distributed denial of service, DDoS, attack). Some software might be legitimate but may contain backdoors – undocumented ways to allow an outsider to use that software to perform other operations on your system.

Functions of malware

Malware can perform a variety of functions:

Destruction and denial of service:

Wiper malware can delete files or format the entire file system, deleting even the operating system system. Denial of service (DoS) attacks can flood a network or server with requests to make services unavailable to legitimate users. Another form of a DoS attack can lock users from accessing their computers or destroy devices.

Exfiltration

: Exfiltration refers to stealing data. Malware can upload confidential files, authentication credentials, messages. Spyware can track a user’s activity, acquiring browsing history messages being sent or received, file access, keyboard operations via keyloggers, and capture camera and microphone inputs.

Bots

Bots are processes that are deployed by an attacker and usually sit dormant. They periodically contact a Command & Control (C&C) server that, at the right time, can give them directions for an attack. These directions will often require downloading additional software needed for an attack. Attackers can deploy bots across millions of compromised computers, creating an army of them that is called a botnet. This is instrumental in carrying out distributed denial of service (DDoS) attacks or compute-intensive crypto mining.

Backdoors

A backdoor is a type of malicious code that, once installed, allows an attacker remote access to a computer or network while remaining hidden. This access typically bypasses normal authentication processes, giving attackers the ability to remotely control the affected system, steal sensitive data, or deploy additional malware. For example, a backdoor in a computer system could allow an attacker to remotely execute commands, manipulate files, and monitor user activities without detection and without logging onto the system.

Ransomware

Ransomware is software that will typically lock users from being able to access their system or encrypt their files, demanding payment to re-enable access or avoid disclosure. It may include running a wiper to delete data permanently if the ransom isn’t paid. There are various forms of ransomware, which include:

Crypto ransomware: Denial of service malware that encrypts files or storage devices.
Locker ransomware: Denial of service malware that locks users out of their devices.
Extortion ransomware: Exfiltrates data to a remote site and threatens to expose it.
Double extortion ransomware: Exfiltrate data to a remote site before encrypting it and threaten to disclose it if ransom isn’t paid.

Adware

Adware is generally non-destructive but is unwanted. It automatically displays or downloads advertising material such as banners or pop-ups when a user is online. It’s often bundled with free software or services, providing revenue to developers while offering the software at no cost to the user. Adware may compromise privacy by tracking user behavior to target ads more effectively.

Malware Infiltration mechanisms

There are various ways in which malware gets onto a system but the mechanisms fall into two categories:

An attacker exploited some vulnerability to enable the malware to be installed.
You installed the malware unknowingly.

Zero-day vulnerabilities refer to software flaws that are unknown to those who would be interested in mitigating the vulnerability, such as the vendor. The term “zero-day” indicates that the developers have zero days to fix the issue because it has already been exploited in the wild. These vulnerabilities are highly sought after by attackers because they are effective until discovered and patched.

Example: If a hacker discovers an unknown vulnerability in a web browser that allows unauthorized administrative access and this flaw is exploited before the developer becomes aware and fixes it, that is a zero-day vulnerability.

N-day vulnerabilities, also known as known vulnerabilities, refer to software flaws that have been publicly disclosed and for which a patch is often available. The “N” in N-day represents the number of days that have elapsed since the vulnerability was disclosed. Unlike zero-day vulnerabilities, N-day vulnerabilities are already known to vendors and cybersecurity professionals, and patches or workarounds are typically developed to mitigate them.

Example: A vulnerability in an operating system that allows elevation of privileges is reported and patched. If attackers exploit this vulnerability after the patch is released, it is considered an N-day vulnerability, as the patch availability makes it “known.”

Worms and viruses

A virus is a type of malware that attaches itself to a legitimate program and requires human interaction, such as running the infected program, to spread and execute its malicious activities.

Conversely, a worm is a standalone malware that self-replicates and spreads independently across networks without the need for attachment to a specific program or human interaction. For example, a worm might exploit vulnerabilities in a network to spread itself, while a virus might spread via email attachments opened by unsuspecting users.

The distinction from a virus is that a worm runs as a standalone process while a virus requires a host program.

The popular use of both terms, worm and virus, has often blurred the distinctions between them. People often refer to any malware as a virus. Their malicious effects can be similar.

Malware components

Key components of malware include:

Infection Mechanism: The method by which malware spreads or inserts itself into a system, such as through email attachments or exploiting vulnerabilities.

Packer: A tool that compresses or encrypts malware to evade detection from anti-virus software, often making it harder to analyze or identify the malware.

Dropper: A small helper program that installs the main malware, often avoiding detection by not containing the malicious code itself.

Payload: The part of malware designed to perform malicious actions, ranging from data theft to system damage.

Trigger: A condition or event that activates the malware’s payload, like a specific date or user action.

File infector viruses

A file infector virus is a type of malware that attaches itself to executable files and spreads by modifying other executable files it can access. When an infected file is launched, the virus is executed, usually performing malicious actions while also seeking other files to infect. This used to be the dominant mechanism for malware propagation in the early days of PCs but is more challenging with systems where users have restricted permissions or where the OS validates the digital signature of applications and drivers.

Infected flash drives

Malware can spread through USB devices in several ways:

Unprotected USB Firmware: Some malware targets the firmware of USB devices, which can be rewritten to include malicious code. When such a compromised device is plugged into any computer, the malware in the firmware can activate and cause the USB device to, for example, behave like a keyboard in addition to a storage device and send keyboard events to invoke a shell and run commands.
USB Drop Attack: This method involves intentionally leaving infected USB drives in public or easily accessible places. Unsuspecting individuals who find and use these drives on their computers inadvertently trigger malware installation.
Malicious Software or Links: USB drives may contain files that, when executed, install malware directly, or they may include links that lead to malicious websites. Opening these files or following these links can initiate the download and installation of harmful software.

Macro viruses

Macro viruses are a type of malware that embed themselves in documents and are executed when the document is opened. They are commonly written in Visual Basic for Applications, targeting Microsoft Office applications. Once activated, they can infect not only the document in which they reside but also other documents, spreading rapidly. These viruses can perform a series of operations from simple annoyances to damaging actions like corrupting files or sending data to third parties.

Even though Microsoft would present a warning about macros, users often explicitly permit them because they believe the content they are accessing is legitimate. Microsoft patched bugs that allowed macros to run without the user’s authorization but, as of 2022, attackers still found ways around these barriers.

Social engineering

By far the most common way that malware enters a system is via deception: the legitimate user of the system installed it unknowingly. This uses a social engineering attack to convince the user that it is in his or her interest to install the software. Social engineering is the art of manipulating, influencing, or deceiving a user into taking some action that is not in his/her or the organization’s best interest.

Attackers exploit human psychology rather than technical hacking techniques to infiltrate systems. This can involve phishing emails, pretexting, baiting with infected media, or any form of communication designed to elicit trust, provoke fear, or create urgency, leading individuals to reveal passwords, install malware, or open malicious links.

Any information the attacker can get about a user can help an attacker create a more convincing social attack. The term pretexting refers to using a concocted scenario to contact a user and get additional information (e.g., an attacker can pretend to be a caller from the IT department or a high-level manager from another location to try to extract information; with some rudimentary information, the attacker can mention some employee, department, or project names to sound like a true insider).

Phishing is a type of cyber attack that involves tricking individuals into revealing sensitive information or downloading malware by masquerading as a trustworthy entity in electronic communications, typically through emails that appear to come from reputable sources. Spear phishing is a more targeted version of phishing, where the attacker chooses specific individuals or organizations and tailors the message based on their characteristics, job positions, or other personal information to increase the likelihood of success. This specificity makes spear phishing significantly more effective and dangerous than generic phishing.

Credential stuffing

An attacker may obtain collections of stolen email addresses (or usernames) and passwords. Since people often use the same name and password on multiple systems, this often give the attacker access to services on other websites on which the user has accounts. Accounts for banking sites are, of course, particularly valuable since they can be a direct conduit for transferring money. This attack is called credential stuffing.

In some situations, such as getting access to a user’s email accounts, an attacker can log onto the systems or services as the owner of the account and install malware, monitor the internal organization, and even send email, disguised as the user (e.g., contact other employees or friends), which becomes a powerful social engineering attack.

Where does malware live?

File infector virus

A file infector virus is a virus that adds itself to an executable program. The virus patches the program so that, upon running, control will flow to the the virus code. Ideally, the code will install itself in some unused area of the file so that the file length will remain unchanged. A comparison of file sizes with the same programs on other systems will not reveal anything suspicious. When the virus runs, it will run the infector to decide whether to install itself on other files. The trigger will then decide whether the payload should be executed. If not, the program will appear to run normally.

Bootloader malware

A bootkits, also known as a boot sector virus, is a type of malware that infects the master boot record (MBR) or similar critical startup sectors of a computer. It loads itself before the operating system starts, giving it high-level control over the system and making it extremely difficult to detect and remove. Boot kits are often used to bypass operating system security measures and provide persistent access to the infected machine, even surviving system reinstalls if the MBR is not specifically cleaned.

JavaScript and PDF files

JavaScript, like Visual Basic, has evolved into a full programming language. Most browsers have security holes that involve Javascript. JavaScript can not only modify the content and structure of a web page but can connect to other sites. This allows any malicious site to leverage your machine. For example, systems can perform port scans on a range of IP addresses and report any detected unsecured services.

PDF (Portable Document Format) files, would seem to be innocent printable documents, incapable of harboring executable code. However, PDF is a complex format that can contain a mix of static and dynamic elements. Dynamic elements may contain Javascript, dynamic action triggers (e.g., “on open”), and the ability to retrieve “live” data via embedded URLs. As with Visual Basic scripts, PDF readers warn users of dynamic content but, depending on the social engineering around the file, the user may choose to trust the file … or not even pay attention to the warning in yet-another-dialog-box.

Trojans

A Trojan horse is a program with two purposes: an overt purpose and a covert one. The overt purpose is what compels the user to get and run the program in the first place. The covert purpose is unknown to the user and is the malicious part of the program.

For example, a script with the name of a common Linux command might be added to a target user’s search path. When the user runs the command, the script is run. That script may, in turn, execute the proper command, leading the user to believe that all is well. As a side effect, the script may create a setuid shell to allow the attacker to impersonate that user or mail copy over some critical data. Users install Trojans because they believe they are installing useful software, such as an anti-virus tool (BTW, a lot of downloadable hacker tools contain Trojans: hackers hacking wannabe hackers). The side-effect of this software can activate cameras, enable key loggers, or deploy bots for anonymization servers, DDoS attacks, or spam attacks.

Trojans may include programs (games, utilities, anti-malware programs), downloading services, rootkits (see next) and backdoors (see next). They appear to perform a useful task that does not raise suspicion on the part of the victim.

Backdoors

A backdoor is software that is designed with some undocumented mechanism to allow someone who knows about it to be able to access the system or specific functions in a way that bypasses proper authentication mechanisms. In many cases, they are not designed for malicious use: they may allow a manufacturer to troubleshoot a device or a software author to push an update. However, if adversarial parties discover the presence of a backdoor, they can use it for malicious purposes.

Rootkits

A rootkit is software that is designed to allow an attacker to access a computer and hide the existence of the software … and sometimes hide the presence of the user on the system.

Historically, a basic rootkit would replace common administration commands (such as ps, ls, find, top, netstat, etc.) with commands that mimic their operation but hide the presence of intruding users, intruding processes, and intruding files. The idea is that a system administrator should be able to examine the system and believe that all is fine and the system is free of malware (or of unknown user accounts).

User mode rootkits: A user mode rootkit involves replacing commands, interposing libraries intercepting messages, and patching commonly-used APIs that may divulge the presence of the malware. A skilled administrator may find unmodified commands or import software to detect the intruding software.
Kernel mode rootkits: A kernel mode rootkit is installed as a kernel module. Being in the kernel gives the rootkit unrestricted access to all system resources and the ability to patch kernel structures and system calls. For example, directory listings from the getdents64 system call may not report any names that match the malware. Commands and libraries can be replaced and not give any indication that malicious software is resident in the system.
Hypervisor rootkits: The most insidious rootkits are hypervisor rootkits. A hypervisor rootkit is a type of rootkit that attacks virtualized environments by targeting the hypervisor layer that controls the virtual machines. By infecting the hypervisor, the rootkit can gain control over all the virtual machines running on the host, enabling it to monitor and manipulate operations on these machines. This level of control makes detection and removal exceptionally challenging, as the rootkit can hide its presence from both the operating system and antivirus programs running on the virtual machines.

Deceptive web sites

Quite often, malicious links in phishing attacks direct the user to a web site in order to obtain their login credentials. These sites masquerade as legitimate sites. The Proofpoint study mentioned earlier found that for every legitimate website, there are 20 malicious sites that mimic it. This is known as typosquatting. Such sites can be masqueraded banking sites, Google/Microsoft/Apple authentication pages, videoconferencing plugin-software downloads, etc.

File serving sites, including those that host software or those that provide services such as PDF or mp3 conversion are often ad-sponsored. Some of the ads on these sites, however, often look like download links and can trick a user into clicking on the ad instead of the link for the actual content. The

Defenses

Malware was particularly easy to spread on older Windows systems since user accounts, and hence processes, ran with full administrative rights, which made it easy to modify any files on the system and even install kernel drivers. Adding file protection mechanisms, such as a distinction between user and administrator accounts added a significant layer of protection. However, malware installed by the user would run with that user’s privileges and would have full access to all of a user’s files. If any files are read or write protected, the malware can change DAC permissions.

Systems took the approach of warning users if software wanted to install software or asked for elevated privileges. Social engineering hopes to convince users that they actually want to install the software (or view the document). They will happily grant permissions and install the malware. MAC permissions can stop some viruses as they will not be able, for instance, to override write permissions on executable files but macro viruses and the user files are still a problem.

In general, however, studies have shown that by simply taking away admin rights (avoiding privilege escalation) from users, 94% of the 530 Microsoft vulnerabilities that were reported in 2016 could be mitigated and 100% of vulnerabilities in Office 2016 could be mitigated.

Anti-virus (anti-malware) software

There is no way to recognize all possible viruses. Anti-virus software uses two strategies: signature-based and behavior-based approaches.

With signature-based systems, anti-virus programs look for byte sequences that match those in known malware. Each bit pattern is an excerpt of code from a known virus and is called a signature. A virus signature is simply a set of bytes that make up a portion of the virus and allow scanning software to see whether that virus is embedded in a file. The hope is that the signature is long enough and unique enough that the byte pattern will not occur in legitimate programs. This scanning process is called signature scanning. Lists of signatures (“virus definitions”) have to be updated by the anti-virus software vendor as new viruses are discovered. Signature-based detection is used by most anti-virus products.

A behavior-based system monitors the activities of a process (typically the system calls or standard library calls that it makes). Ideally, sandboxing is employed, to ensure that the suspected code is run within a sandbox or even in an interpreted environment within a sandbox to ensure that it cannot cause real damage. Behavior-based systems try to perform anomaly detection. If the observed activity is deemed suspicious, the process is terminated and the user alerted. Sandboxed, behavior-based analysis is often run by anti-malware companies to examine what a piece of suspected malware is actually doing and whether it should be considered to be a virus. A behavior-based can identify previously-unseen malware but these systems tend to have higher false positive rates of detection: it is difficult to characterize exactly what set of operations constitute suspicious behavior.

Malware Countermeasures

Some viruses will take measures to try to defend themselves from anti-virus software.

Signature scanning countermeasures

A common thing to do in malware is to use a packer on the code, unpacking it prior to execution. Packing can be one of several operations:

Simply obscure the malware payload by exclusive-oring (xor) with a repeating byte pattern (exclusive-oring the data with the same byte pattern reconstructs it.
Compress the code and then uncompress it upon loading it prior to execution.
Encrypt the code and decrypt it prior to execution.

All of these techniques will change the signature of a virus. One can scan for a signature of a compressed version of the virus but there are dozens of compression algorithms around, so the scanning process gets more complicated.

With encryption (xor is a simple form of encryption), only the non-encrypted part of the virus contains the unpacking software (decryption software and the key). A virus scanner will need to match the code for the unpacker component since the key and the encrypted components can change each time the virus propagates itself.

Polymorphic viruses mutate their code each time they run while keeping the algorithm the same. This involves replacing sequences of instructions with functionally-identical ones. For example, one can change additions to subtractions of negative numbers, invert conditional tests and branches, and insert or remove no-op instructions. This thwarts signature scanning software because the the byte pattern of the virus is different each time.

Access control countermeasures

Access controls help but do not stop the problem of malware. Containment mechanisms such as containers work well for server software but are usually impractical for user software (e.g., you want Microsoft Word to be able to read documents anywhere in a user’s directories). Application sandboxing is generally far more effective and is a dominant technique used in mobile software.

Trojans, deceptive downloads, and phishing attacks are insidiously difficult to defend against since we are dealing with human nature: users want to install the software or provide the data. They are conditioned to accepting pop-up messages and entering a password. Better detection in browsers & mail clients against suspicious content or URLs helps. However, malware distributors have been known to simply ask a user to rename a file to turn it into one that is recognized by the operating system as an executable file (or a disk image, PDF, or whatever format the malware come in and may otherwise be filtered by the mail server or web browser.

Sandboxing countermeasures

Virusus are unlikely to get through a sandbox (unless there are vulnerabilities or an improper configuration). However, there are areas where malware can address sandboxing:

Vendor examination
Anti-virus vendors often test software within a tightly configured sandboxed environment so they can detect whether the software is doing anything malicious (e.g., accessing files, devices, or the network in ways it is not supposed to). If they detect that they do have malware, they will dig in further and extract a signature so they can update and distribute their list of virus definitions. Viruses can try to get through this examination phase by setting a trigger to keep the virus from immediately performing malicious actions or to stay dormant for the first several invocations. The hope is that the anti-virus vendors will not see anything suspicious and the virus will never be flagged as such by their software.
User configuration (entitlements)
Virtually all mobile applications, and increasingly more desktop/laptop applications, are run with application sandboxes in place. These may disallow malware from accessing files, devices, or the network. However, it never hurts to ask. The software can simply ask the user to modify the sandbox settings. If social engineering is successful, the user may not even be suspicious and not wonder why a game wants access to contacts or location information.

Biometric authentication

Biometric authentication is the process of identifying a person based on their physical or behavioral characteristics as opposed to their ability to remember a password or their possession of some device. It is the third of the three factors of authentication: something you know, something you have, and something you are.

It is also fundamentally different than the other two factors because it does not deal with data that lends itself to exact comparisons. For instance, sensing the same fingerprint several times will not likely give you identical results each time. The orientation may differ, the pressure and angle of the finger may result in some parts of the fingerprint appearing in one sample but not the other, and dirt, oil, and humidity may alter the image. Biometric authentication relies on pattern recognition and thresholds: we have to determine whether two patterns are close enough to accept them as being the same.

A false acceptance rate (FAR) is when a pair of different biometric samples (e.g., fingerprints from two different people) is accepted as a match. A false rejection rate (FRR) is when a pair of identical biometric samples is rejected as a match. Based on the properties of the biometric data, the sensor, the feature extraction algorithms, and the comparison algorithms, each biometric device has a characteristic ROC (Receiver Operating Characteristic) curve. The name derives from early work on RADAR and maps the false acceptance versus false rejection rates for a given biometric authentication device. For password authentication, the “curve” would be a single point at the origin: no false accepts and no false rejects. For biometric authentication, which is based on thresholds that determine if the match is “close enough”, we have a curve.

At one end of the curve, we can have an incredibly low false acceptance rate (FAR). This is good as it means we will not have false matches: the enemy stays out. However, it also means the false reject rate (FRR) will be very high. If you think of a fingerprint biometric, the stringent comparison needed to yield a low FAR means that the algorithm will not be forgiving to a speck of dirt, light pressure, or a finger held at a different angle. We get high security at the expense of inconveniencing legitimate users, you may have to present their finger repeatedly for sensing, hoping that it will eventually be accepted.

At the other end of the curve, we have a very low false rejection rate (FRR). This is good since it provides convenience to legitimate users. Their biometric data will likely be accepted as legitimate, and they will not have to deal with the frustration of re-sensing their biometric, hoping that their finger is clean, not too greasy, not too dry, and pressed at the right angle with the correct pressure. The trade-off is that it’s more likely that another person’s biometric data will be considered close enough as well and accepted as legitimate.

Numerous biological components can be measured. They include fingerprints, irises, blood vessels on the retina, hand geometry, facial geometry, facial thermographs, and many others. Data such as signatures and voice can also be used, but these often vary significantly with one’s state of mind (your voice changes if you’re tired, ill, or angry). They are behavioral systems rather than purely physical systems, such as your iris patterns, length of your fingers, or fingerprints, and tend to have lower recognition rates. Other behavioral biometrics include keystroke dynamics, mouse use characteristics, gait analysis, and even cognitive tests.

Regardless of which biometric is used, the important thing to do to make it useful for authentication is to identify the elements that make it different. Most of us have swirls on our fingers. What makes fingerprints different from finger to finger are the various variations in those swirls: ridge endings, bifurcations, enclosures, and other elements beyond that of a gently sloping curve. These features are called minutia. The presence of minutia, their relative distances from each other and their relative positions can allow us to express the unique aspect of a fingerprint as a relatively compact stream of bits rather than a bitmap.

Two important elements of biometrics are robustness and distinctiveness. Robustness means that the biometric data will not change much over time. Your fingerprints will look mostly the same next year and the year after. Your fingers might grow fatter (or thinner) over the years and at some point in the future, you might need to re-register your hand geometry data.

Distinctiveness relates to the differences in the biometric pattern among the population. Distinctiveness is also affected by the precision of a sensor. A finger length sensor will not measure your finger length to the nanometer, so there will be quantized values in the measured data. Moreover, the measurements will need to account for normal hand swelling and shrinking based on temperature and humidity, making the data even less precise. Accounting for these factors, approximately one in a hundred people may have hand measurements similar to yours. A fingerprint sensor may typically detect 40–60 distinct features that can be used for comparing with other sensed fingerprints. An iris scan, on the other hand, will often capture over 250 distinct features, making it far more distinctive and more likely to identify a unique individual.

Some sensed data is difficult to normalize. Here, normalization refers to the ability to align different sensed data to some common orientation. For instance, identical fingers might be presented at different angles to the sensors. The comparison algorithm will have to account for possible rotation when comparing the two patterns. The inability to normalize data makes it difficult to perform efficient searches. There is no good way to search for a specific fingerprint short of performing a comparison against each stored pattern. Data such as iris scans lends itself to normalization, making it easier to find potentially matching patterns without going through an exhaustive search.

In general, the difficulty of normalization and the fact that no two measurements are ever likely to be the same makes biometric data not a good choice for identification. It is difficult, for example, to construct a system that will store hundreds of thousands of fingerprints and allow the user to identify and authenticate themselves by presenting their finger. Such a system will require an exhaustive search through the stored data and each comparison will itself be time-consuming as it will not be a simple bit-by-bit match test. Secondly, fingerprint data is not distinct enough for a population of that size. A more realistic system will use biometrics for verification and have users identify themselves through some other means (e.g., type their login name) and then present their biometric data. In this case, the software will only have to compare the pattern associated with that user.

The biometric authentication process comprises several steps:

Enrollment. Before any authentication can be performed, the system needs to store the user’s biometric data to later use it for comparison. The user will have to present the data to the sensor, distinctive features need to be extracted, and the resulting pattern stored. The system may also validate if the sensed data is of sufficiently high quality or ask the user to repeat the process several times to ensure consistency in the data.
Sensing. The biological component needs to be measured by presenting it to a sensor, a dedicated piece of hardware that can capture the data (e.g., a camera for iris recognition, a capacitive fingerprint sensor). The sensor captures the raw data (e.g., an image).
Feature extraction. This is a signal processing phase where the interesting and distinctive components are extracted from the raw sensed data to create a biometric pattern that can be used for matching. This process involves removing signal noise, discarding sensed data that is not distinctive or not useful for comparisons and determining whether the resulting values are of sufficiently good quality that it makes sense to use them for comparison. A barely-sensed fingerprint, for instance, may not present enough minutia to be considered useful.
Pattern matching. The extracted sample is now compared to the stored sample that was obtained during the enrollment phase. Features that match closely will have small distances. Given variations in measurements, it is unlikely that the distance will be zero, which would indicate a perfect match.
Decision. The “distance” between the sensed and stored samples is now evaluated to decide if the match is close enough. The decision determination decides whether the system favors more false rejects or more false accepts.

Security implications

Several security issues relate to biometric authentication.

Sensing: Unlike passwords or encryption keys, biometric systems require sensors to gather the data. The sensor, its connectors, the software that processes sensed data, and the entire software stack around it (operating system, firmware, libraries) must all be trusted and tamper-proof.
Secure communication and storage: The communication path after the data is captured and sensed must also be secure so that attackers will have no ability to replace a stored biometric pattern with one of their own.
Liveness: Much biometric data can be forged. Gummy fingerprints can copy real fingerprints, pictures of faces or eyes can fool cameras into believing they are looking at a real person, and recordings can be used for voice-based authentication systems.
Thresholds: Since biometric data relies on “close-enough” matches, you can never be sure of a certain match. You will need to determine what threshold is good enough and hope that you do not annoy legitimate users too much or make it too easy for the enemy to get authenticated.
Lack of compartmentalization: You have a finite set of biological characteristics to present. Fingerprints and iris scans are the most popular biometric sources. Unlike passwords, where you can have distinct passwords for each service, you cannot have this with biometric data.
Theft of biometric data: If someone steals your password, you can create a new one. If someone steals your fingerprint, you have nine fingerprints left and then none. If someone gets a picture of your iris, you have one more left. Once biometric data is compromised, it remains compromised.

Bitcoin & Blockchain

Bitcoin is considered to be the first blockchain-based cryptocurrency and was designed as an open, distributed, public system: there is no authoritative entity and anyone can participate in operating the servers.

With a centralized system, all trust resides in a trusted third party, such as a bank. The system fails if the bank disappears, the banker makes a mistake, or if the banker is corrupt. With Bitcoin, the goal was to create a completely decentralized, distributed system that allows people to manage transactions while preventing opportunities for fraud.

Cryptographic building blocks of bitcoin

Bitcoin uses a few key cryptographic structures.

Hash pointers

A hash pointer is similar to a traditional pointer, but instead of just containing the address (a reference to) of the next block of data, it also includes a cryptographic hash of the data in the next block. When a hash pointer points to a block of data, it effectively links to the data and provides a way to verify that the data has not been tampered with. If any alteration occurs in the data, the cryptographic hash will change, indicating a discrepancy between the expected hash (stored in the hash pointer) and the hash calculated from the altered data.

This feature of hash pointers is particularly crucial for implementing blockchains and distributed ledgers. In a blockchain, each block contains a hash pointer that points to the previous block, creating a secure and unbreakable chain of blocks. This structure ensures that if an attacker attempts to alter the data in any block, they would need to alter all subsequent blocks in the chain due to the interconnected hashes, a task that, as we will see, is computationally infeasible due to Bitcoin’s proof-of-work requirements.

Merkle trees

Merkle trees provide a way to efficiently and securely verify the contents of large data sets. A Merkle tree is a binary tree where each leaf node contains the hash of a block of data, and each non-leaf node contains the hash of the concatenation of its child nodes' hashes. At the top level is a single hash, known as the root hash or Merkle root, that represents the entirety of the data within the tree.

The beauty of a Merkle tree lies in its ability to quickly verify whether a specific piece of data is included in the set by traversing the tree of hashes from the target data’s hash up the Merkle root of the tree.

In a blockchain, each block contains a Merkle tree of all the transactions within that block. This allows for the verification of any single transaction without needing to inspect the entire block.

Public key cryptography and digital signatures

Public key cryptography is used in the inputs and outputs of transactions. Each user creates a public key, which can be shared with anyone, and a private key, which is kept secret by the owner. These keys are mathematically related but it is computationally infeasible to deduce the private key from the public key.

Digital signatures use the user’s private key to sign a message, creating a signature that anyone can verify using that user’s public key. Signing is effectively taking a hash of the message and encrypting it with a private key. The signature allows someone to use the corresponding public key to verify the integrity of the message.

The ledger and the Bitcoin network

Bitcoin maintains a complete list of every single transaction since its creation in January 2009. This list of transactions is called the ledger and is stored in a structure called a blockchain. Complete copies of the ledger are replicated at Bitcoin nodes around the world. There is no concept of a master node or master copies of the ledger. Anyone can download the software and un a Bitcoin node – they all run the same software. New systems get the names of some well-known nodes when they download the software. After connecting to one or more nodes, a Bitcoin node will ask each for a list of known Bitcoin nodes. This creates a peer discovery process that allows a node to get a complete list of other nodes in the network.

User identities and addresses

We know how to create unforgeable messages: just sign them. If Alice wants to transfer $500 to Charles, she can create a transaction record that describes this transfer and sign it with her private key (e.g., use a digital signature algorithm or create a hash of the transaction and encrypt it with her private key). Bitcoin uses public-private key pairs and digital signatures to sign transactions.

Bitcoin transactions — the movement of bitcoins from one account to another — are associated with public these keys and not users. Users are anonymous. Your identity is your public key and you can make transactions using this identity if you can prove that you have the corresponding private key.

There is never any association of your public key with your name. In fact, nothing stops you from creating multiple keys and having identities. The system does not care, or know, what your physical identity is or how many addresses you assigned to yourself. All that matters is that only you have the corresponding private keys to the public keys identified in your transactions so you are the only one who could have created valid signatures for your transactions.

In its initial deployment, your public key was your Bitcoin identity. If someone wanted to transfer money to you, they would create a transaction where your public key is identified as the recipient of the bitcoin. Bitcoin now identifies recipients by their Bitcoin address. Your Bitcoin address is essentially a hash of your public key that creates a shorter value than the public key..

Since a hash is a one-way function, someone can create (or verify) your address if they are presented with your public key. However, they cannot derive your public key if they have your address.

Bitcoin uses addresses only as destinations; an address can only receive funds. If Bob wants to send bitcoin to Alice, he will identify Alice as the output – the target of the money – by using her address. At some point in the future, Alice can use that money by creating a transaction whose source (input) refers to the transaction where she received the bitcoin. Any bitcoin node can validate this transaction:

Alice’s transaction will identify where the money comes from (inputs). Each input contains contains a reference to a past transaction, her public key, and her signature for the transaction (a hash of the transaction encrypted with her private key).
Any node can validate the signature by using Alice’s public key, which is also a field of the input. This proves that someone who owns the private key (Alice) that corresponds to that public key created the transaction.
That transaction’s input contains a reference to an older transaction where Alice’s address is identified as the output (listing a specific number of bitcoin that is being transferred).
A bitcoin node can hash Alice’s public key (from Alice’s transaction) to create the address and see that it is the same as in the output address of the past transaction that is referenced. That way, it validates that the older transaction is indeed transferring a specified amount of bitcoin to Alice.

User transactions (moving coins)

A transaction contains inputs and outputs. Inputs identify where the bitcoin comes from and outputs identify where to whom it is being transferred.

If Alice wants to send bitcoin to Bob, she creates a message that is a bitcoin transaction and sends it to one or more bitcoin nodes. When a node receives a message, it will forward the transaction to its peers (other nodes it knows about). Typically, within approximately five seconds, every bitcoin node on the network will have a copy of the transaction and can process it.

The bitcoin network is not a database. It is build around a ledger, a list of all transactions. There are no user accounts that can be queried. In her transaction, Alice needs to provide one or more links to previous transactions that will add up to at least the required amount of bitcoin that she’s sending. These links to earlier transactions are called inputs. Each input is an ID of an earlier transaction. Inputs are outputs of previous transactions.

When a bitcoin node receives a transaction, it performs several checks:

The signature of each input is validated by checking it against the public key in the transaction. This ensures that it was created by someone who has the private key that corresponds to the public key.
It hashes the public key in the transaction to create the address, which will be matched against the output addresses in the inputs.
The transactions listed in the inputs are validated to make sure that those transactions have not been used by any other transaction. This ensures there will be no double spending.
Finally, it makes sure that there is a sufficient quantity of bitcoin output by those input transactions.

A bitcoin transaction contains:

One or more inputs:: Each input identifies transactions where coins come from. These are references to past transactions. Each input also contains a signature and a public key that corresponds to the private key that was used to create the signature. A user may have multiple identities (keys) and reference past transactions that were directed to different addresses that belong to the user.
Output:: Destination address & amount – who the money goes to. This is simply the recipient’s bitcoin address.
Change:: The transaction owner’s address & bitcoin amount. Every input must be completely spent Any excess is generated as another output to the owner of the transaction.

Transaction fee (anywhere from 10¢ to a few $ per transaction). There is a limited amount of space (about 1 MB) in a block. A transaction is about 250 bytes. To get your transaction processed quickly, you need to outbid others.

Blocks and the blockchain

Transactions are sent to all the participating servers. Each system keeps a complete copy of the entire ledger, which records all transactions from the very first one. Currently the bitcoin ledger is about 560 GB.

Transactions are grouped into a block. A block is just a partial list of transactions. When a server is ready to do so, it can add the block to the ledger, which is a linked list of blocks that comprise the blockchain. In Bitcoin, a block contains ten minutes worth of transactions, all of which are considered to be concurrent.

Approximately every ten minutes, a new block of transactions is added to the blockchain. A block is approximately a 1.5–2 MB in size and holds around 2,000–4,000 transactions. To make it easy to locate and verify a specific transaction within a block, the blocks are stored in a Merkle tree.By validating the chain of hashes along the path, it is easy to validate that a specific transaction belongs to the block and has not been tampered.

Securing the Block

A critically important part of the Bitcoin blockchain is to make sure that blocks in the blockchain have not been modified. A blockchain is a linked list of blocks that are linked via hash pointers.

Each block contains a hash pointer to the previous block in the chain. A hash pointer not only to the previous block but also contains a SHA-256⁶ hash of that block. This creates a tamper-proof structure. If the contents of any block are modified (accidentally or maliciously), the hash pointer that points to that block will no longer be valid (the hashes won’t match).

To make a change to a block, an attacker will need to modify all the hash pointers from the most recent block back to the block that was changed. One way to prevent such a modification could have been to use signed hash pointers to ensure an attacker cannot change their values. However, that would require someone to be in charge of signing these pointers and there is no central authority in Bitcoin; anyone can participate in building the blockchain. We need a different way to protect blocks from modification.

Proof of Work

Bitcoin makes the addition of a new block – or modification of a block in a blockchain – difficult by creating a puzzle that needs to be solved before the block can be added to the blockchain. By having a node solve a sufficiently difficult puzzle, there will be only a tiny chance that two or more nodes will propose adding a block to the chain at the same time.

This puzzle is called the Proof of Work and is an idea that was adapted from an earlier system called hashcash. Proof of Work requires computing a hash of three components, hash(B, A, W) where:

B = block of transactions (which includes the hash pointer to the previous block)
A = address (i.e., hash of the public key) of the owner of the server doing the computation
W = the Proof of Work number

When servers are ready to commit a block of transactions onto the chain, they each compute this hash, trying various values of W until the hash result has a specific pre-defined property. The property they are searching for is a hash value that is less than some given number. Currently, it’s a value that where the leading 74 bits of the 256-bit hash are all be 0s). The property changes over time via a Difficulty Adjustment Algorihm to ensure that the puzzle never gets too easy regardless of how many nodes are in the network or how fast processors get. The difficulty is adjusted to ensure that one new block will be added approximately every ten minutes.

Recall that one property of a cryptographic hash function is the inability to deduce any of the input by looking at the output. Hence, we have no idea what values of W will yield a hash with the desired properties. Servers have to try trillions of values with the hope that they will get lucky and find a value that yields the desired hash. This process of searching for W is called mining.

When a server finds a value of W that yields the desired hash, it advertises that value to the entire set of bitcoin servers. Upon receiving this message, it is trivial for a server to validate the proof of work by simply computing hash(B, A, W) with the W sent in the message and checking the resultant value. The servers then add the block, which contains the Proof of Work number and the winner’s address, onto the blockchain.

Bitcoin’s mining difficulty is adjusted every 2,016 blocks, which corresponds to approximately 14 days, to keep the average rate at which blocks are added to the blockchain at 10 minutes. This allows the network to handle changes in the number of miners participating in computing the proof work.

Double Spending and modifying past transactions

A major concern with decentralized cryptocurrency systems is double spending. Double spending refers to sending the same funds (or tokens) to multiple parties: Alice sends $500 to Charles and $500 to David but only has $500 in her account. Bitcoin deals with this by having every server maintain the complete ledger, so Alice’s entire list of transactions can be validated before a new one is accepted.

Alice may decide to go back to an older transaction and modify it. For example, she might change change the transaction that sent bitcoin to Charles into one that sends money to David – or simply delete the fact that she paid Charles the full amount.

To do this, she would need to compute a new proof of work value for that block so the block hash will be valid. Since Bitcoin uses hash pointers, each block contains a hash pointer to the previous (earlier) block. Alice would thus need to compute new proof of work values for all newer blocks in the chain so that her modified version of the entire blockchain is valid. She ends up making a competing blockchain.

Recomputing the proof of work numbers is a computationally intensive process. Because of the requirement to generate the Proof of Work for each block, a malicious participant will not be able to catch up with the cumulative work of all the other participants. Because of errors or the rare instances where multiple nodes compute the proof of work concurrently, even honest participants may, on occasion, end up building a competing blockchain. Bitcoin’s policy is that the longest chain in the network is the correct one. The length of the chain is the chain’s score and the highest-scoring chain will be considered the correct one by the servers. A participant is obligated to update its chain with a higher-scoring one if it gets notice of a higher-scoring chain from another system. If it doesn’t update and insists on propagating its chain as the official one, its chain will simply be ignored by others.

51% Attack

Let us go back to the example of Alice maliciously modifying a past transaction. In addition to the work of modifying the existing blockchain, Alice will also need to process new transactions that are steadily arriving, and making the blockchain get longer as new blocks get added to it. She needs to change the existing blockchain and also compute proof of work values for new blocks faster than everyone else in the network so that she would have the longest valid chain and hence a high score.

If she can do this then her chain becomes the official version of the blockchain and everyone updates their copy. This is called a 51% attack. To even have a chance of succeeding, Alice would need more computing power than the reset of the systems in the Bitcoin network combined. Back in 2017, The Economist estimated that “bitcoin miners now have 13,000 times more combined number-crunching power than the world’s 500 biggest supercomputers,” so it is not feasible for even a nation-state attacker to harness sufficient power to carry out this attack on a popular cryptocurrency network such as Bitcoin. Blockchain works only because of the assumption that the majority of participants are honest … or at least not conspiring together to modify the same transactions.

Even if someone tried to do this attack, they’d likely only be able to modify transactions in very recent history – in the past few blocks of the blockchain. This is why For this reason, transactions further back in the blockchain are considered to be more secure.

Committing Transactions

Because of the chain structure, it requires more work to modify older transactions (more blocks = more proof of work computations). Modifying only the most recent block is not hugely challenging. Hence, the further back a transaction is in the blockchain, the less likely it is that anyone can amass the computing power to change it and create a competing blockchain.

A transaction is considered confirmed after some number, N, additional blocks are added to the chain. The value of N is up to the party receiving the transaction - a level of comfort. The higher the number, the deeper the transaction is in the blockchain and the harder it is to alter. Bitcoin recommends N=1 for low-value transactions (payments under $1,000; this enables them to be confirmed quickly), N=3 for deposits and mid-value transactions, and N=6 for large payments (e.g., $10k…$1M). Even larger values of N could be used for extremely large payments.

Rewards

Why would servers spend a huge amount of computation, which translates to huge investments in computing power and electricity, just to find a value that produces a hash with a certain property? To provide an incentive, the system rewards the first server (the miner) that advertises a successful Proof of Work number by depositing a certain number of Bitcoins into their account. To avoid rewarding false blockchains as well as to encourage continued mining efforts, the miner is rewarded only after 99 additional blocks have been added to the ledger.

The reward for computing a proof of work has been designed to decrease over time. The reward is cut in half every 210,000 blocks:

50 bitcoins for the first 4 years since 2008
25 bitcoins after block #210,000 on November 28, 2012
12.5 bitcoins after block #420,000 on July 9, 2016 2019
6.25 bitcoins at block #630,000 on May 11, 2020
3.125 bitcoins at block #840,00 on April 20, 2024

In total, there will be 32 Bitcoin halvings. After that, the reward will reach zero and there will be a maximum of around 21 million bitcoins in circulation. However, recall that each transaction has a fee associated with it. Whoever solves the puzzle first and gets a confirmed block into the blockchain will also reap the sum of all the transaction fees in that block.

Centralization

Bitcoin has been designed to operate as a large-scale, global, fully decentralized network. Anybody can download the software and operate a bitcoin node. All you need is sufficient storage to store the blockchain. There are currently over 9,000 reachable full nodes spread across 99 countries. It is estimated that there are over 100,000 total nodes, including those that are be running old versions of software or are not always reachable. In this sense, Bitcoin is truly decentralized. Note that there are different types of nodes. The nodes we discussed serve as full nodes. They maintain an entire copy of the blockchain and accept transactions. Light nodes are similar but store only a part of the blockchain, talking to a full node parent if they need to access other blocks.

Not everyone who operates a bitcoin node does mining (proof of work computation). Mining is incredibly time energy intensive. To make money on mining, one needs to buy dedicated ASIC mining hardware that is highly optimized to compute SHA-256 hashes. Conventional computers will cost more in energy than they will earn in bitcoin rewards. Because of this, mining tends to be concentrated among a far smaller number of players. It is not as decentralized as much as one would like.

Bitcoin software is open source but there is only a small set of trusted developers. The software effort is inspectable but not really decentralized. Bugs have been fixed but many nodes still run old and buggy versions. Bitcoin transactions cannot be undone even if they were created by buggy nodes or via compromised keys.

Network Security

The Internet is designed to interconnect various networks, each potentially using different hardware and protocols, with the Internet Protocol (IP) providing a logical structure atop these physical networks. IP inherently expects unreliability from underlying networks, delegating the task of packet loss detection and retransmission to higher layers like TCP or applications. Communication via IP involves multiple routers and networks, which may compromise security due to their unknown trust levels.

The OSI model helps describe the networking protocol stacks for IP:

Physical Layer: Involves the actual network hardware.
Data Link Layer: Manages protocols for local networks like Ethernet or Wi-Fi.
Network Layer: Handles logical networking and routing across physical networks via IP.
Transport Layer: Manages logical connections, ensuring reliable data transmission through TCP, or provides simpler, unreliable communication via UDP.

Each layer plays a critical role in ensuring data is transmitted securely and efficiently across the internet.

Data link layer

In an Ethernet network, the data link layer is handled by Ethernet transceivers and Ethernet switches. Security was not a consideration in the design of this layer and several fundamental attacks exist at this layer. Wi-Fi also operates at the data link layer and uses the same address structure as ethernet. It adds encryption on wireless data between the device and access point. Note that the encryption is not end-to-end, between hosts, but only to the access point.

Switch CAM table overflow

Sniff all data on the local area network (LAN).

A CAM table overflow attack exploits the self-learning mechanism of a network switch, which uses a content addressable memory (CAM) table to map MAC addresses to switch ports for efficient packet forwarding. By flooding the switch with fake MAC addresses, an attacker can overflow the CAM table. Once the table is full, the switch behaves like a hub, broadcasting packets to all ports, thus allowing the attacker to intercept data. To protect against this, port security can be configured to limit the number of MAC addresses allowed on a port, preventing unauthorized devices from overwhelming the CAM table.

VLAN hopping (switch spoofing)

Sniff all data from connected virtual local area networks.

A VLAN hopping attack exploits VLAN (Virtual Local Area Network) configurations to gain unauthorized access to multiple VLANs. VLANs segregate network traffic for enhanced security and efficiency. Since switches can connect to other switches, VLAN trunking, managed via the IEEE 802.1Q standard, allows multiple VLANs to share a single physical connection between switches.

Attackers can perform switch spoofing by emulating a switch, tricking a real switch into thinking it’s connected to another switch. This allows the attacker’s device to receive traffic across all VLANs. Defending against such attacks involves configuring managed switches to restrict trunking to authorized ports.

ARP cache poisoning

Redirect IP packets by changing the IP address to MAC address mapping.

An ARP cache poisoning attack exploits the Address Resolution Protocol (ARP), which is used by the operating system to map IP addresses to MAC addresses. Attackers can respond falsely to ARP queries or send gratuitous ARP responses not associated with a request, claiming their MAC address corresponds to another device’s IP address. This corrupts the ARP caches of devices on the network.

Defenses include Dynamic ARP Inspection on switches, which verifies ARP packets against a trusted list, and static ARP entries to prevent unauthorized changes.

DHCP spoofing

Configure new devices on the LAN with your choice of DNS address, router address, etc.

DHCP spoofing attacks target the Dynamic Host Configuration Protocol (DHCP), which networks use to assign IP addresses and network configuration parameters to devices dynamically.

The attack begins with a DHCP Discover message, which devices broadcast to find DHCP servers. Malicious actors respond to these messages before legitimate servers, directing devices to use attacker-specified DNS or gateway settings. This redirection allows attackers to intercept, manipulate, or block data.

The problem is challenging to mitigate because of the trust placed in network broadcasts and the speed of response. A defense mechanism, DHCP snooping, helps by validating DHCP messages on network switches and blocking unauthorized DHCP offers, thereby safeguarding against malicious server responses.

Network (IP) layer

The Internet Protocol (IP) layer is responsible for getting datagrams (packets) to their destination. It does not provide any guarantees on message ordering or reliable delivery. Datagrams may take different routes through the network and may be dropped by queue overflows in routers.

Source IP address authentication

Anyone can impersonate an IP datagram.

One aspect of the design of IP networking is that there is no source IP address authentication. Clients are expected to use their own source IP address but anybody can override this if they have administrative privileges on their system by using a raw sockets interface.

This enables an attacker to forge messages to appear that they come from another system. Any software that authenticates requests based on their IP addresses will be at risk.

Anonymous denial of service

The ability to set an arbitrary source address in an IP datagram can be used for anonymous denial of service attacks. If a system sends a datagram that generates an error, the error will be sent back to the source address that was forged in the query. For example, a datagram sent with a small time-to-live, or TTL, value will cause a router that is hit when the TTL reaches zero to respond back with an ICMP (Internet Control Message Protocol) Time to Live exceeded message. Error responses will be sent to the forged source IP address and it is possible to send a vast number of such messages from many machines (by assembling a botnet) across many networks, causing the errors to all target a single system.

Routers

Routers are computers with multiple network links and often with special-purpose hardware to facilitate the rapid movement of packets across interfaces. They run operating systems and have user interfaces for administration. As with many other devices that people don’t treat as “real” computers, there is a danger that they routers will have simple or even default passwords. Moreover, owners of routers may not be nearly as diligent in keeping the operating system and other software updated as they are with their computers.

Routers can be subject to some of the same attacks as computers. Denial of service (DoS) attacks can keep the router from doing its job. One way this is done is by sending a flood of ICMP datagrams. The Internet Control Message Protocol is typically used to send routing error messages and updates and a huge volume of these can overwhelm a router. Routers may also have input validation bugs and not handle certain improper datagrams correctly.

Route table poisoning is the modification of the router’s routing table either by breaking into a router or by sending route update datagrams over an unauthenticated protocol.

Transport layer (UDP, TCP)

UDP and TCP are transport layer protocols that allow applications to establish communication channels with each other. Each endpoint of such a channel is identified by a port number (a 16-bit integer that has nothing to do with Ethernet switch ports). The port number allows the operating system to direct traffic to the proper socket.

UDP, the User Datagram Protocol, is stateless, connectionless, and unreliable. As we saw with IP source address forgery, anybody can send UDP messages with forged source IP addresses.

TCP (Transmission Control Protocol) is a stateful, connection-oriented, and reliable protocol used in network communications. Being stateful, TCP keeps track of the connection’s state through sequence numbers, ensuring that packets are ordered correctly and no data is lost. As a connection-oriented protocol, TCP establishes a connection using a three-way handshake process before any data transfer. This handshake involves SYN (synchronize), SYN-ACK (synchronize acknowledgment), and ACK (acknowledgment) packets to synchronize and acknowledge connection establishment.

TCP’s three-way handshake not only establishes a connection but also initializes sequence numbers, which are crucial for ensuring data integrity and order. The process starts when the client sends a SYN packet to the server with a random initial sequence number. The server responds with a SYN-ACK packet, acknowledging the client’s sequence number by adding one, and provides its own random initial sequence number. The client completes the handshake by sending an ACK packet, acknowledging the server’s sequence number. This exchange of sequence numbers sets the foundation for a reliable, ordered data transmission.

TCP’s use of random initial sequence numbers is critical for security. By starting with a random sequence number, TCP mitigates sequence number prediction attacks, where an attacker predicts the sequence numbers of packets to spoof legitimate packets or hijack a session. This randomness helps in maintaining the integrity and security of the data exchange process.

SYN flooding

SYN flooding attacks target the TCP three-way handshake by flooding a server with SYN packets, often from spoofed IP addresses, leading to server resource exhaustion and service unavailability.

SYN cookies defend against SYN flooding attacks by having the server create an initial sequence number is a cryptographic hash of the source and destination IP addresses and ports, along with a secret number. This allows the server to verify the legitimacy of incoming ACK packets without needing to store state information prematurely, thus preventing resource exhaustion. By encoding this connection-specific information into the sequence number, the server ensures that only clients completing the valid handshake can establish a connection.

TCP Reset

A somewhat simple attack is to send a RESET (RST) segment to an open TCP socket. If the server sequence number is correct, then the connection will close. Hence, the tricky part is getting the correct sequence number to make it look like the RESET is part of the genuine message stream.

Sequence numbers are 32-bit values. The chance of successfully picking the correct sequence number is tiny: 1 in 2³², or approximately one in four billion. However, many systems will accept a large range of sequence numbers approximately in the correct range to account for the fact that packets may arrive out of order, so they shouldn’t necessarily be rejected just because the sequence number is incorrect. This can reduce the search space tremendously, and an attacker can send a flood of RST packets with varying sequence numbers and a forged source address until the connection is broken.

Routing protocols

Autonomous Systems (AS) sets of IP addresses that are under the control of a single network operator. The Border Gateway Protocol (BGP) is the protocol used by external routers at each AS to exchange routing information between each other. BGP enables AS to determine the best routes for sending network traffic and manage the pathways by which data packets travel across the Internet, thus ensuring efficient and reliable routing.

BGP Hijacking

BGP hijacking, also known as route hijacking, involves maliciously redirecting internet traffic by corrupting the routing tables used by Border Gateway Protocol (BGP). An attacker misleads other networks into believing that the best route to specific IP addresses goes through their malicious system. This can be used to intercept, inspect, or redirect internet traffic to fraudulent sites.

BGP Path Forgery attacks manipulate the Border Gateway Protocol (BGP) by falsely advertising optimal paths to specific network destinations. This type of attack exploits BGP’s trust-based nature, which lacks mechanisms for path verification, leading to traffic being misrouted through the attacker’s network. These actions enable the attacker to intercept or manipulate data traffic.

BGP Prefix Forgery involves malicious actors advertising unauthorized IP prefixes via BGP. By advertising more specific prefixes than those used legitimately, attackers can divert traffic to themselves. BGP favors the most specific route available, making this a particularly effective method for redirecting traffic. This can lead to data interception or denial of service as traffic is misrouted to the attacker’s network.

Two security measures that were added to BGP were RPKI and BGPsec. RPKI (Resource Public Key Infrastructure) enhances BGP security by allowing networks to use public keys and digital signatures to verify that a network is authorized to announce specific IP prefixes, thus preventing invalid route announcements. However, RPKI’s effectiveness is limited by partial adoption and the need for network operators to maintain accurate and up-to-date certificate information.

BGPsec secures BGP by providing cryptographic validation of the entire AS path, not just the origin. This helps prevent path manipulation attacks. The main drawbacks of BGPsec include its increased complexity, higher computational overhead, and slow adoption.

Domain Name System (DNS)

The Domain Name System (DNS) is a Hierarchical service that maps Internet domain names to IP addresses. A user’s computer runs the DNS protocol via a program known as a DNS stub resolver. It first checks a local file for specific preconfigured name-to-address mappings. Then it checks its cache of previously-found mappings. Finally, it contacts an external DNS resolver, which is usually located at the ISP or is run as a public service, such as Google Public DNS or OpenDNS.

We trust that the name-to-address mapping is legitimate. Web browsers, for instance, rely on this to enforce their same-origin policy. However, DNS queries and responses are sent using UDP with no authentication or integrity checks. The only check is that each DNS query contains a Query ID (QID). A DNS response must have a matching QID so that the client can match it to the query. These responses can be intercepted and modified or just forged. Malicious responses can return a different IP address that will direct IP traffic to different hosts

A solution called DNSsec has been proposed. It is a secure extension to the DNS protocol that provide authenticated requests & responses. However, few sites support it.

Pharming attack

A pharming attack is an attack on the configuration information maintained by a DNS server –either modifying the information used by the local DNS resolver or modifying that of a remote DNS server. By changing the name to IP address mapping, an attacker can cause software to send packets to the wrong system.

The most direct form of a pharming attack is to modify the local hosts file to add a malicious name-to-address mapping. Alternatively, malware may modify the DNS server settings on a system so that it would contact an attacker’s DNS server, which can provide the wrong IP address for certain domain names.

DNS cache poisoning (DNS spoofing attack)

DNS queries first check the local host’s DNS cache to see if the results of a past query have been cached. A DNS cache poisoning attack, also known as DNS spoofing, involves corrupting the DNS cache with false information to redirect users to malicious websites. In the general case, DNS cache poisoning refers to any mechanism where an attacker is able to provide malicious responses to DNS queries, resulting in those responses getting cached locally.

JavaScript on a malicious website can perform a DNS cache poisoning attack. This attack takes advantage of the fact that a DNS response for a subdomain, such as a.bank.com can contain information about a new DNS server for the entire bank.com domain.

The browser requests access to a legitimate site but with an invalid subdomain. For example, a.bank.com. Because the system will not have the address of a.bank.com cached, it sends a DNS query to an external DNS resolver using the DNS protocol.

The DNS query includes a query ID (QID) x₁. At the same time that the request for a.bank.com is made, JavaScript launches an attacker thread that sends 256 responses with random QIDs (y₁, y₂, y₃, …}. Each of these DNS responses tells the server that the DNS server for bank.com is at the attacker’s IP address.

If one of these responses happens to have a matching QUD, the host system will accept it as truth that all future queries for anything at bank.com should be directed to the name server run by the attacker. If the responses don’t work, the script can try again with a different subdomain, b.bank.com. The attack might take several minutes, but there is a high likelihood that it will eventually succeed.

Summary: An attacker can run a local DNS server that will attempt to provide spoofed DNS responses to legitimate domain name lookup requests. If the query ID numbers of the fake response match those of a legitimate query (trial and error), the victim will get the wrong IP address, which will redirect legitimate requests to an attacker’s service.

DNS Rebinding

Web application security is based on the same-origin policy. Browser scripts can access cookies and other data on pages only if they share the same origin, which is the combination of URI (protocol), host name, and port number. The underlying assumption is that resolving a domain name takes you to the correct server.

The DNS rebinding attack allows JavaScript code on a malicious web page to access private IP addresses in the victim’s network. The attacker configures the DNS entry for a domain name to have a short time to live (TTL). When the victim’s browser visits the page and downloads JavaScript from that site, that JavaScript code is allowed to interact with the domain thanks to the same origin policy. However, right after downloading the script, the attacker can reconfigure the DNS server so that future queries will return an address in the internal network. The JavaScript code can then try to request resources from that system since, as far as the browser is concerned, the origin is the same because the name of the domain has not changed.

Summary: short time-to-live values in DNS allow an attacker to change the address of a domain name so that scripts from that domain can now access resources inside the private network.

Virtual Private Networks (VPNs)

Network tunnels serve the purpose of securely transmitting data between different network segments or over the internet by encapsulating the data packets within the protocol of an underlying network. This enables moving data across networks that might not otherwise support the same communication protocols, creating a communication channel over a public network infrastructure. For example, an IP packet on a local area network that is directed to a local IP address at a branch can be encapsulated within an IP packet that is sent to the router at that branch office, which would then extract the packet and route it on the internal network.

A tunnel provides connectivity but not security. A VPN (Virtual Private Network) is created by adding security to a network tunnel. This usually involves encrypting the encapsulated packet and adding a message authentication code (MAC) to ensure that any data transmitted between the endpoints remains confidential and secure from potential eavesdropping or modification. Additionally, VPNs employ authentication methods to verify the identities of the endpoints, further securing the data exchange within the tunnel.

IPsec (Internet Protocol Security) is a set of VPN protocols used to secure Internet communications by authenticating and encrypting each IP packet in a data stream. Communications in IPsec use one of two two main protocols: AH (Authentication Header) or ESP (Encapsulating Security Payload).

AH ensures data integrity and authenticity by adding a message authentication code (MAC) to each datagram but does not provide encryption.

ESP provides the same assurance of integrity as AH but also adds encryption in addition to a MAC, ensuring the confidentiality, integrity, and authenticity of data.

IPsec can operate in two modes: Transport and Tunnel. Transport mode encrypts only the payload of the IP packet, leaving the header untouched, and is suitable for end-to-end communication between hosts. Tunnel mode encrypts the entire IP packet and encapsulates it within a new packet, and is used mainly for gateway-to-gateway communications, such as VPNs, where entire packets need to be protected as they traverse untrusted networks.

IPSec supports the use of:

HMAC for message authentication.
Diffie-Hellman key exchange to create random session keys for HMAC and encryption while assuring forward secrecy.
Symmetric encryption of data using AES for the ESP protocol
X.509 digital certificates or pre-shared keys for authentication of endpoints.

Transport Layer Security (TLS)

Virtual Private Networks (VPNs) operate at the network layer to connect entire networks, tunneling all IP traffic without differentiating between specific data streams. This approach does not directly provide application-to-application secure communication. In contrast, Transport Layer Security (TLS), evolved from Secure Sockets Layer (SSL), operates above TCP to provide authentication, integrity, and encryption directly to applications. TLS preserves the sockets interface, allowing developers to implement network security transparently. Applications like web browsers use HTTPS, which incorporates TLS for secure communication over HTTP.

TLS has been designed to provide:

Data encryption: Symmetric cryptography is used to encrypt data.
Key exchange: During the authentication sequence, TLS performs a Diffie-Hellman key exchange so that both sides can obtain random shared session keys. From the common key, TLS uses a pseudorandom generator to create all the keys it needs for encryption and integrity.
Data integrity: Ensure that we can detect if data in transit has not been modified and new data has not been injected. TLS includes an HMAC function based on the SHA-256 hash for each message.
Authentication: TLS authenticates the endpoints prior to sending data. Authentication can be unidirectional (the client may just authenticate the server) or bidirectional (each side authenticates the other). TLS uses public key cryptography and X.509 digital certificates as a trusted binding between a user’s public key and their identity.
Interoperability & evolution: TLS was designed to support different key exchange, encryption, integrity, & authentication protocols. The start of each session enables the protocol to negotiate what protocols to use for the session.

TLS sub-protocols

TLS operates through two main phases:

the handshake protocol and the record protocol.

The handshake protocol (authentication and setup):
During the handshake protocol, the client authenticates the server using X.509 digital certificates and digital signatures. They then use Ephemeral Diffie-Hellman key exchange to create a common key. This provides forward secrecy to the communication session.
The record protocol (communication):
Following the handshake, the record protocol encrypts application data using the agreed-upon symmetric encryption algorithm, ensuring confidentiality and using a hashed message authentication code (HMAC) to ensure message integrity as data is transmitted between the server and client.

Firewalls

A firewall protects the junction between an untrusted network (e.g., external Internet) and a trusted network (e.g., internal network). Two approaches to firewalls are packet filtering and proxies. A packet filter, or screening router, determines not only the route of a packet but whether the packet should be dropped based on contents in the IP header, TCP/UDP header, and the interface on which the packet arrived. It is usually implemented inside a border router, also known as the gateway router that manages traffic flow between the ISP and user’s network. The basic principle of firewalls is to never have a direct inbound connection from the originating host from the Internet to an internal host; all traffic must flow through a firewall and be inspected.

The packet filter evaluates a set of rules to determine whether to drop or accept a packet. This set of rules forms an access control list, often called a chain. Strong security follows a default deny model, where packets are dropped unless some rule in the chain specifically permits them.

First-generation packet filters implemented stateless inspection. A packet is examined on its own with no context based on previously-seen packets.

Second-generation packet filters track TCP connections and other information from previous connections. These stateful packet inspection (SPI) firewalls allow the router to keep track of outstanding TCP connections. For instance:

They can block TCP data traffic if a connection setup did not take place to avoid sequence number prediction attacks.
They can track that a connection has been established by a client to a remote server and allow return traffic to that client (which is essential for any interaction by someone inside the network with external services).
They can track connectionless UDP and ICMP messages and allow responses to be sent back to clients in the internal network. DNS queries and pings (ICMP echo-reply messages) are examples of these.
They also and understand the relationship between packets. For example, when a client establishes an FTP (file transfer protocol) connection to a server on port 21, the server establishes a connection back to the client on a different port when it needs to send data.

Packet filters traditionally do not look above the transport layer (UDP and TCP protocols and port numbers).

Third-generation packet filters incorporate deep packet inspection (DPI), which allows a firewall to examine application data as well and make decisions based on its contents. Deep packet inspection can validate the protocol of an application as well as check for malicious content such as malformed URLs or other security attacks. DPI is often considered to be part of Intrusion Prevention Systems. Examples are detecting application-layer protocols such as HTTP and then applying application-specific filters, such as checking for suspicious URLs or disallowing the download of certain ActiveX or Java applets.

Deep Packet Inspection (DPI) firewalls evolved to Deep Content Inspection (DCI) firewalls. These use the same concept but are capable of buffering large chunks of data from multiple packets that contain an entire object and acting on it, such as unpacking base64-encoded content from web and email messages and performing a signature analysis for malware.

Application proxies

Application proxies act as intermediaries for specific applications. They inspect and filter traffic at the application layer, ensuring that only valid protocol traffic passes between networks. By validating data exchanges against known protocols, they enhance security by preventing protocol-specific attacks. When running on dual-homed hosts, these proxies benefit from an added layer of isolation; one network interface connects to the public network and the other to the private network, thereby controlling and monitoring all inbound and outbound communication effectively.

DMZs

In a typical firewalled environment using a screened subnet architecture, two distinct subnets are established: the DMZ (**demilitarized zone)** for externally accessible services like web and mail servers, and another for internal systems shielded from external access. Traffic control and security are enforced by screening routers. The exterior router manages access to the DMZ, filtering incoming traffic to allowed services, while the interior router controls traffic from the DMZ to the internal network, ensuring only necessary communications pass. This setup can be simplified using a single router with detailed filtering rules for each interface to accomplish the same function.

Deperimeterization and zero trust

The trustworthiness of systems in internal networks diminished as people would move their laptops and phones between different environments, users would install random software on their systems, systems had to access cloud services, remote work become common, and there was an increased likelihood of malware getting installed on any computers in a company’s network. The breakdown of a secure boundary between a trusted internal and untrusted external network is called deperimiterization.

This shift led to the development of the Zero Trust model, which does not assume internal network traffic is automatically safe. Instead, it enforces strict identity verification and least privilege access for every user and device, regardless of their location relative to the traditional network perimeter.

Host-based firewalls

Firewalls generally intercept all packets entering or leaving a local area network. A host-based firewall, on the other hand, runs on a user’s computer. Unlike network-based firewalls, a host-based firewall can associate network traffic with individual applications. Its goal is to prevent malware from accessing the network. Only approved applications will be allowed to send or receive network data. Host-based firewalls are particularly useful in light of deperimiterization.. A concern with host-based firewalls is that if malware manages to get elevated privileges, it may be able to shut off the firewall or change its rules.

Intrusion detection/prevention systems

An enhancement to screening routers is the use of intrusion detection systems (IDS). Intrusion detection systems are often parts of DPI firewalls and try to identify malicious behavior. There are three forms of IDS:

A protocol-based IDS validates specific network protocols for conformance. For example, it can implement a state machine to ensure that messages are sent in the proper sequence, that only valid commands are sent, and that replies match requests.
A signature-based IDS is similar to a PC-based virus checker. It scans the bits of application data in incoming packets to try to discern if there is evidence of “bad data”, which may include malformed URLs, extra-long strings that may trigger buffer overflows, or bit patterns that match known viruses.
An anomaly-based IDS looks for statistical aberrations in network activity. Instead of having predefined patterns, normal behavior is first measured and used as a baseline. An unexpected use of certain protocols, ports, or even amount of data sent to a specific service may trigger a warning.

Anomaly-based detection implies that we know normal behavior and flag any unusual activity as bad. This is difficult since it is hard to characterize what normal behavior is, particularly since normal behavior can change over time and may exhibit random network accesses (e.g., people web surfing to different places). Too many false positives will annoy administrators and lead them to disregard alarms.

A signature-based system employs misuse-based detection. It knows bad behavior: the rules that define invalid packets or invalid application layer data (e.g., ssh root login attempts). Anything else is considered good.

Intrusion Detection Systems (IDS) monitor traffic entering and leaving the network and report any discovered problems. Intrusion Prevention Systems (IPS) serve the same function but are positioned to sit between two networks like a firewall and can actively block traffic that is considered to be a threat or policy violation.

Type	Description
Firewall (screening router)	1^st generation packet filter that filters packets between networks. Blocks/accepts traffic based on IP addresses, ports, protocols
Stateful inspection firewall	2^nd generation packet filter. Like a screening router but also takes into account TCP connection state and information from previous connections (e.g., related ports for TCP)
Deep Packet Inspection firewall	3^rd generation packet filter. Examines application-layer protocols
Application proxy	Gateway between two networks for a specific application. Prevents direct connections to the application from outside the network. Responsible for validating the protocol
IDS/IPS	Can usually do what a stateful inspection firewall does + examine application-layer data for protocol attacks or malicious content
Host-based firewall	Typically screening router with per-application awareness. Sometimes includes anti-virus software for application-layer signature checking
Host-based IPS	Typically allows real-time blocking of remote hosts performing suspicious operations (port scanning, ssh logins)

Web security

Early Web Browsers: Initially, browsers could only deal with static content. Because of this, they weren’t a useful target of attacks and security efforts were mainly directed at server-side attacks through malformed URLs, buffer overflows, and similar vulnerabilities.

Modern Browsers: As browsers evolved, they became more complex, with support for cookies, JavaScript, DOM, CSS, AJAX, WebSockets, and multimedia. All this introduces new security challenges since scripts can communicate over the network, access page contents, and modify them. WebAssembly and Google Native Client (NaCl) enable the execution of sandboxed binary software in browsers, enhancing performance but providing additional challenges in ensuring isolation and proper behavior.

Web security model

The web security model is designed to protect both users and providers of web applications by managing how scripts interact with different web resources. Central to this model is the Same-Origin Policy, which allows scripts running on web pages to only access data from the same site that delivered them.

The term same-origin refers to a policy where two resources are considered to be of the same origin if they have the same scheme (protocol), hostname, and port number. The policy helps to prevent malicious scripts on one site from obtaining access to sensitive data on another site through the user’s browser, thereby protecting user data and privacy.

Under the same-origin policy, each origin has access to common client-side resources that include:

Cookies: Key-value data that clients or servers can set. Cookies associated with the origin are sent with each http request.
JavaScript namespace: Any functions and variables defined or downloaded into a frame share that frame’s origin.
DOM tree: This is the JavaScript definition of the HTML structure of the page.
DOM storage: Local key-value storage.

Any JavaScript code downloaded into a frame will execute with the authority of its frame’s origin. For instance, if cnn.com loads a script from jQuery.com, the script runs with the authority of cnn.com.

Passive content, which is non-executable content such as CSS files and images, has no authority. This normally should not matter as passive content does not contain executable code but there have been attacks in the past that had code in passive content and made that passive content turn active.

Cross-origin content

A page may load content from multiple origins. The same-origin policy defines that JavaScript code loaded from anywhere runs with the authority of the frame’s origin. Content from other origins is generally not readable or writable by JavaScript. For example:

A frame can load images from other origins but cannot inspect that image.
A frame may embed CSS from any origin but cannot inspect the CSS content.
A frame can load JavaScript, which executes with the authority of the frame’s origin but if the code is downloaded from a different origin, it is executable but not readable.

Cross-Origin Resource Sharing (CORS) is a security feature that allows web applications running at one origin to request resources from a different origin. This is an extension of the Same-Origin Policy that restricts such cross-origin requests by default. CORS provides a way for server administrators to specify who can access their resources and under what conditions. This is done through the use of HTTP headers that send browsers an identification of sites that should be considered to be treated as if they share the same origin. For example, when a user downloads a page, a server on example.com can send an HTTP header that contains:

Access-Control-Allow-Origin: http://www.example.com

which tells the browser that the URL http://www.example.com will be treated as the same origin as the frame’s URL (e.g., http://example.com).

Cookies

Cookies are small pieces of data, name-value sets, sent from a website and stored on a user’s web browser. Every time the user loads the website, the browser sends relevant cookies back to the server to notify the website of the user’s previous activity.

Cookies serve three primary purposes on the web:

Session Management: Cookies can store login information, shopping cart data, and other details that keep track of user sessions, allowing users to pick up where they left off on previous visits without needing to re-enter information.

Personalization: They store user preferences, such as themes, language settings, and location, to tailor the browsing experience to the user’s needs and preferences.

Tracking: Cookies are used to monitor and analyze user behavior over time, helping websites and advertisers gather insights into browsing habits, which can be used for targeted advertising and optimizing the user experience.

There are two main types of cookies based on their lifetime:

Session cookies: These are temporary cookies that remain in the cookie file of your browser until you leave the site.

Persistent cookies: These remain in the cookie file of your browser for much longer (though how long will depend on the specified lifetime of the specific cookie). They are used to remember your preferences within an application and remain on your desktop after you close your browser.

A browser will handle cookies for multiple web sites (origins) and various parts of a site.

Browsers send and receive cookies but cookies don’t quite use the same concept of an origin. Cookies are bound by a scope that includes the domain and path where they were set. A cookie associated with a specific domain and path will only be sent to the server when a request is made that matches its scope. The domain attribute specifies which domain the cookie belongs to, while the path attribute restricts the cookie to a specific directory. A server at example.com might set a cookie with a path of /blog to ensure that the cookie is only sent when accessing parts of the site within the /blog directory. This provides a degree of isolation that can prevent cookies from being sent across different contexts, which can be important for security and compartmentalization of user sessions and preferences.

Third-party cookies: Placed by websites other than the one you are currently visiting, often by advertisers to track browsing history.

Security implications arise because cookies can store sensitive information such as user IDs, passwords, login state, and other personal details that might be exploitable. To enhance security, cookies often incorporate:

HttpOnly flag: This makes the cookie inaccessible to client-side scripts, reducing the risk of cross-site scripting (XSS) attacks.
Secure flag: This restricts the transmission of cookies to secure (HTTPS) connections, preventing them from being intercepted during the transmission over unsecured networks.

Cross-Site Request Forgery (XSRF)

Cross-Site Request Forgery (CSRF) is an attack that tricks a web browser into executing an unwanted action on a web service where a user is authenticated. An attacker crafts a malicious website or email with requests to a vulnerable web service where the user is already logged in. When the user interacts with the malicious content, the browser makes requests to the application, sending cookies with the user’s credentials, as if the user themselves made the request.

For example, if a user is logged into their banking site and then clicks on a deceptive link that requests a funds transfer, the banking site might process that request as legitimate. This vulnerability exploits the trust that a web application has in the user’s browser, and the mitigation often includes implementing anti-forgery tokens which must accompany each transaction, ensuring that the request was intentionally made by the user.

There are several defenses against Cross-site request forgery:

The server can validate the Referer header on the request. This will tell it whether the request came via a link or directly from a user (or from a link on a trusted site).
The server can require some unique token (an anti-CSRF token) to be present in the request. For instance, visiting netflix.com might cause the Netflix server to return a token that must be passed to any successive URL. An attacker will not be able to create a static URL on her site that will contain this random token.
The interaction with the server can use HTTP POST requests instead GET requests, placing all parameters into the body of the request rather than in the URL. State information can be passed via hidden input fields instead of cookies. This doesn’t solve the problem but gives the attacker the challenge of getting the victim to click on a malicious web page that can run a script to post a request rather than simply present a URL that contains parameters for the desired action.

Clickjacking

Clickjacking is a malicious technique of tricking a web user into clicking on something different from what the user perceives, effectively hijacking the clicks meant for another page. This is done by overlaying a transparent iframe over a visually appealing element, such as a video play button or a survey form. The user believes they are interacting with the genuine site, but the click is being routed to a hidden frame, leading to potential unauthorized actions, such as liking a page, sharing personal information, or enabling microphone access.

There are several ways for a web programmer to defend against clickjacking. JavaScript code can be added to a web page to prevent it from being framed. This script checks if the current window is the topmost window, and if it’s not, it can force the page to break out of the frame. Alternatively, an HTTP header can indicate whether a browser should be allowed to render a page in an iframe.

Input sanitization problems

Any user input needs to be parsed carefully before it can be made part of a URL, HTML content, or JavaScript. Consider a script that is generated with some in-line data that came from a malicious user:

<script> var x = "untrusted_data"; </script>

The malicious user might define that untrusted_data to be

Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="Bye

The resulting script to set the variable x now becomes

<script> var x = "Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="Bye"; </script>

Cross-site scripting

Cross-site scripting (XSS) is a web security vulnerability that enables attackers to inject malicious scripts into content from otherwise benign and trusted websites. This is achieved by manipulating web applications to return user input as part of their pages. XSS attacks occur when an application includes untrusted data without proper validation or escaping, allowing attackers to execute scripts in the victim’s browser context. This can lead to hijacked sessions, defacement of websites, or redirection to malicious sites.

The two main types of XSS are reflected and persistent (stored). Reflected XSS attacks involve crafting a URL that contains the malicious script. When the URL is visited, the script executes within the context of the user’s browser. Persistent XSS, on the other hand, stores the malicious script in the web server (like in a database – for example, user reviews or blogs), and the script is then served to users in the content of a normal page.

XSS can be avoided via input sanitization, which is the process of cleaning and validating user input, ensuring it’s safe for processing by the web application and not executable as code. This often involves escaping special characters, stripping out unwanted script elements, or using secure input handling frameworks to prevent the insertion of untrusted HTML content.

Homograph (homoglyph) attacks

Homograph attacks take advantage of characters that look alike used to deceive users. For example, the domain “paypaI.com” where the last letter is a capital ‘I’ instead of an ‘l’, mimicking “paypal.com” can be used in phishing scams to make users believe they are going to a valid website.

Unicode is a comprehensive system designed to represent over 128,000 characters, covering almost all of the world’s scripts and symbols, including alphabets like Latin, Greek, Cyrillic, and scripts for languages such as Arabic, Hindi, Chinese, and many more, along with emojis and ancient scripts.

Unicode’s design, which allows visually similar or identical characters from different scripts, poses risks for deception attacks. The introduction of IDNs (Internationalized Domain Names) allows the use of Unicode characters in domain names, which has further facilitated deceptive practices by enabling the creation of domain names that visually mimic legitimate ones but use characters from different scripts. For instance, using a Cyrillic ‘a’ instead of the Latin ‘a’ can mislead users into thinking they are accessing a familiar website. The characters look identical but are different to a DNS service.

Websites like “wikipedia.org” can be mimicked using characters from non-Latin scripts, such as Greek or Cyrillic, to create visually indistinguishable yet technically different URLs, misleading users and potentially leading to phishing or other forms of cyber fraud.

Tracking via images

The same-origin policy treats images as static content with no authority. It would seem that images should not cause problems. However, an image tag (IMG) can pass parameters to the server, just like any other URL:

<img src="http://evil.com/images/balloons.jpg?extra_information" height="300" width="400"/>

The parameter can be used to notify the server that the image was requested from a specific page. Unlike cookies, which can sometimes be disabled, users will not block images from loading.

An image itself can be hidden by setting its size to a single pixel … and even making it invisible:

<img src="https://attacker.com/onebyone.png" height="1" width="1" />

These images are called tracking pixels or spy pixels.

When a browser loads an image:

The server that hosts the image is contacted with an HTTP GET request for the content.
Any cookies for that server will be sent by the browser.
Any extra information that’s part of the image URL will be sent. This information can, for example, identify the website or page that is hosting the content.
The server logs the time and IP address that requested the image.
The HTTP headers also identify the browser version, operating system, and type of device.

A server can use the image data to identify the specific page and read a cookie to get a unique ID for the user. The ID can be used as a key for an object store or database and store every page a user visited. That enables tracking the user’s visits across different pages.

Mobile device security

What makes mobile devices unique?

In many ways, mobile devices should not be different from laptops or other computer systems. They run operating systems that are derived from those those systems, run multiple apps, and connect to the network. There are differences, however, that make them more attractive targets to attackers.

Users

Several user factors make phones different from most computing devices:

Mobile users often do not think of their phones as real computers. They may not have the same level of paranoia that malware may get in or their activities may be monitored.
Users tend to install a lot more apps on their phones than they do on their computers. These apps are more likely to be from unknown software vendors than those installed on computers.
Social engineering may work more easily on phones. People are often in distracted environments when using their phones and may not pay attention to realize they are experiencing a phishing attack.
Phones are small. Users may be less likely to notice some security indicators, such as an EV certificate indicator. It is also easier to lose the phone … or have it stolen.
A lot of phones are protected with bad PINs. Four-digit PINs still dominate and, as with passwords, people tend to pick bad ones – or at least common ones. In fact, four PINs (1234, 1111, 0000, 1212, 7777) account for over 20% of PINs chosen by users.
While phones have safeguards to protect resources that apps can access users may grant app permission requests without thinking: they will just click through during installation to get the app up and running.

Interfaces

Phones have many sensors built into them: GSM, Wi-Fi, Bluetooth, and NFC radios as well as a GPS, microphone, camera. 6-axis gyroscope and accelerometer, and even barometer. These sensors can enable attackers to monitor the world around you: identify where you are and whether you are moving. They can record conversations and even capture video. The sensors are so sensitive that it has been demonstrated that a phone on a desk next to a keyboard can pick up vibrations from a user typing on the neighboring keyboard. This led to a word recovery rate of 80%.

Apps

There are a lot of mobile apps. Currently, there are about 2.6 million Android apps and 2.2 million iOS apps. Most of these apps are written by unknown, and hence untrusted, parties. We would be be wary of downloading many of these on our PCs but think nothing of doing so on our phones. We place our trust in several areas:

The testing & approval process by Google (automated) and Apple (automated + manual)
The ability of the operating system to sandbox an application
The operating system’s requirement of users granting permissions to access certain resources.

This trust may be misplaced as the approval process is far from foolproof. Overtly misadvertised or malicious apps can be detected but it is impossible to discern what a program will do in the future. Sandboxes have been broken in the past and users may be too happy to grant permissions to apps. Moreover, apps often ask for more permissions than they use. For example, a security researcher surveyed flashlight apps available for Android and discovered that, of the 937 apps surveyed, the majority requested an average of 25 permissions per app.

Most apps do not get security updates. There is little economic incentive for a developer to support existing apps, particularly if newer ones have been deployed.

Platform

Mobile phones are comparable to desktop systems in complexity. In some cases, they may even be more complex. This points to the fact that, like all large systems, they will have bugs and some of these bugs will be security sensitive. For instance, in late March, 2017, Apple released an upgrade for iOS, stating that they fixed over 80 security flaws. This is almost 10 years after the release of the iPhone. You can be certain there are many more flaws lurking in the system and more will be added as new features are introduced.

Because of bugs in the system, malicious apps may be able to get root privileges. If they do, they can install rootkits, enabling long-term control while concealing their presence. A lot of malicious iOS apps, for instance, gained root privileges by exploiting heap overflow vulnerabilities.

Unlike desktop systems and laptops, phones enforce a single user environment. Although PCs are usually used as single-user systems, they support multiple user accounts and run a general-purpose timesharing operating system. Mobile devices are more carefully tuned to the single-user environment.

Threats

Mobile devices are are threats to personal privacy as well as at risk of traditional security violations. Personal privacy threats include identifying users and user location, accessing the camera and microphone, and leaking personal data from the phone over the network. Additional threats include traditional phishing attacks, malware installation, malicious Android intents (messages to other apps or services), and overly-broad access to system resources and sensors.

Android security

Android was conceived as an operating system for network-connected smart mobile devices. The company was acquired by Google in 2005 and the engineering effort shifted to developing it as a Linux-based operating system platform that would be provided for free to third-party phone manufacturers. Google would make money from services and apps.

Applications on Android had specific needs, some of which have not been priorities in the design of desktop operating systems. These include:

Integrity: The platform should ensure that the app is not modified between the time of its creation and its installation by users.
Isolation: Each app needs private data and app components as well as the private data need to be protected from other apps.
Sharing: Apps may need access to shared storage and devices, including the network. This includes the on-device file system, external storage, communication interfaces, and the various sensors available on phones and other smart devices.
Inter-app services: An app needs to be able to send messages to other apps to interact with services – but only when it is permitted to do so. This affects the design of apps. Desktop apps have a single launching point and generally run as single, monolithic process. Android apps, on the other hand, contain multiple independent components that include activities (user-facing parts), services (background components), and content providers (data access components).
Portability: The previous needs are general and apply to other mobile platforms, such as iOS. Since Android was designed as an operating system for a variety of hardware, apps should be able to run on different hardware architectures. A decision was made that apps should be distributed in a portable, architecture-neutral format. Apps run under the ART (Android Runtime) virtual machine, which is a variant of of the Java virtual machine (JVM). The original intention in Android was that apps would be written only in Java but it soon became clear that support for native (C and C++) code was needed and Google introduced the Native Development Kit to support this.

App package

An Android app is distributed as a package in an APK format, which is a single zip-format compressed file that contains several components:

Activity

Code for the user-visible component - the interface. The code component includes compiled code and needed resource files, such as strings, images, UI layouts, and data.

Service

Code background component - services that the app may offer to other programs

Content provider

A database for whatever persistent data the app needs to store. It implements APIs to provide access to structured data. For instance, user contacts will be stored in a content provider.

Broadcast receiver

Mailbox for received messages

Package manifest (META-INF)

This contains items needed to validate the origin of the package and that it has not been tampered:

Signed list of hashes
Application creator’s certificate

Application manifest

This enumerates what makes up the application and includes:

Name (e.g., com.example.myapp)
Components list
Device requirements
Intents: interfaces the app supports to activate services & activities
Permissions: access to services this app requires (e.g., android.permission.SEND_SMS)
Permissions other apps need to access services this app provides

App Integrity

All Android apps must be signed by the developer. This allows both developers and users to know that the application will be installed without modifications on their device. On installation, the Android Package Manager verifies the signature.

Applications do not have to be signed by any central authorities. Developers can use a self-signed certificate and Android does not perform verification of CAs (certification authorities) in the certificate.

Prior to distribution, the contents of the APK package are hashed and signed by the developer and then the signature along with the developer’s certificate is inserted into the APK.

The app package is verified prior to installation by creating a hash and validating it against the signature in the package. If the app is distributed through the Google Play store, Google performs the same checks.

For additional protection, Google optionally supports a feature called Google Play Protect. This validates the app before it is downloaded and checks the user’s device for potential malware. It warns the user about malware or any apps that violate Google’s Unwanted Software Policy, such as apps that hide or misrepresent information.

The Android app sandbox

Android relies on process sandboxing for most of its security. Android is based on Linux, which is a multi-user operating system. Under Linux, each user has a distinct user ID and all apps run by that user run with the privileges of the user (ignoring setUID apps). This allows any one app full access to all user data.

User IDs

Android supports only a single user and uses Linux user IDs for isolating app privileges. Under Android, each app normally runs under a different user ID. Hence, apps are isolated and can only access their resources. Access requests to other objects involve messages that pass through a gatekeeper, which validates access requests.

Core Android services also run in a similar manner, with each service running under its own unique user ID. For example:

user id	service
1001	Telephony
1002	Bluetooth
1003	Graphics
1004	Input devices
1005	Audio

Related apps may share the same Linux user ID if a sharedUserID attribute is set to the same domain for two or more applications as long as those apps are also signed by the same certificate. This would allow these related apps to share files and they can be configured to even share the same Dalvik virtual machine.

File permissions

Two mechanisms are used to enforce file access permissions:

Linux file permissions These provide discretionary access control, allowing the owner (and root) to change permissions to allow others access to the files. With this mechanism, an app can decide to share a data file.
SELinux mandatory access control Certain data and cache directories in Android are protected with the SELinux (Security-Enhanced Linux) mandatory access control (MAC) kernel extension. This ensures that even the owner cannot change access permissions for the files.

Internal storage provides a per-app private directory for files used by each application. External storage (e.g., attached microSD cards or USB devices) is shared among all apps and, of course, may be moved to other computers.

Intents

Android apps communicate with system services, between app components, and with other apps via intents. Intents are messaging objects. An intent is a message that contains: - requested action - data being sent to the action - the component and app that should handle the intent

Intents are declarations of app capabilities and the messaging format used to communicate with an app. They identify app components and how those components are started (e.g., foreground or background and what the entry points are).

Intents allow an app to:

Start a service (background task)
Start an activity (start a user-facing foreground task, such as a camera or phone)
Deliver notifications to one or more apps (broadcasts)

An app lists its available intents in its app manifest and these intents are registered when the app is installed. The intents form the list of services that the app exposes to other applications. If several apps register the same intent, the user selects which app should be launched. For example, you may have multiple browsers installed and are asked to pick the one that should be associated with implicit intents to display a URL.

Intents may explicit or implicit. With explicit intents, the app identifies the target component in the intent. That is, the intent is sent to a specific, named app. With implicit intents, the app asks Android to find a component based on the type of data being sent. For example, sending a URL to display a web page can be an explicit intent that will cause Android to open the default web browser.

Intents are used to invoke system services as well as services available on any installed apps. Common examples of intents are: add a calendar event, set an alarm, take a photo & return it, view a contact, add a contact, show a location on a map, retrieve a file, or initiate a phone call.

Intents pass through & are validated by Android’s gatekeeper.

Permissions

An app manifest also contains permissions, which can specify which apps or services are allowed access to the app’s services and whether a user needs to be prompted to grant permission. Permissions determine whether one app is allowed to access another app’s component.

Apps need permissions to access any services, which include:

System resources: logs, battery levels, …
System interfaces: Internet, Bluetooth, send SMS, send email, …
Sensitive data: SMS messages, contacts, email, …
Any app-defined services

Every service, whether a normal app or a system service is assigned a protection level that determines who may be able to access it:

Permission	Type
Normal	this is the default; there is no danger to the users or system if this service is accessed
Dangerous	Access can compromise the system or privacy. The user has to approve access during installation or runtime
Signature	Access granted only if the app signed by the same developer & contains the same certificate. This allows related apps to share services (e.g., Microsoft Office apps)
SignatureOrSystem	Similar to signature_ but access will be granted if a system application is requesting it

The application manifest file defines the type of permission and the service that is associated with permission name.

Permissions are managed in two forms:

Permission text strings: These are enforced by Android middleware. Sensitive resources such as the phone are only accessible via APIs and access is mediated through these APIs.
Linux group IDs: Group permissions are enforced by Linux file access checks. For efficiency, networking and file access operations do not go through APIs but directly to Linux. This includes access to Bluetooth, Wi-Fi, and external storage. To be able to access resources, the app needs to be a member of the group that corresponds to the resource. Android dynamically adds user IDs to various groups based on what permissions are granted to them.

Other protections

The Linux operating system provides per-process memory isolation and address space layout randomization (ASLR). Linux also uses no-execute (NX) protection on stack and heap memory pages if the processor supports it

The Java compiler provides provides stack canaries, and its memory management libraries provide some heap overflow protections (checks of backward & forward pointers in dynamically allocated structures).

Android supports whole disk encryption so that if a device is stolen, an attacker will not be able to easily recover file contents even with raw access to the flash file system.

Unlike iOS, Android supports the concurrent execution of multiple apps. It is up to the developer to think about being frugal with battery life. Apps store state their state in persistent memory so they can be stopped and restarted at any time. This ability to stop an app also helps with DoS attacks as the app is not accepting requests or using system resources.

Security concerns

An app can probe whether another app has specific permissions by specifying a permission with an intent method call to that app. This can help an attacker identify a target app. Receivers need to be able to handle malicious intents, even for actions they do not expect to handle and for data that might not make sense for the action.

Apps may also exploit permissions re-delegation. An app, not having a certain permission, may be able gain those privileges by communicating through another app. If a public component does not explicitly have an access permission listed in its manifest definition, Android permits any app to access it. For example, the Power Control Widget (a default Android widget) allows third-party apps to change protected system settings without requesting permissions to control those settings. This is done by presenting the user with a pop-up interface to control power-related settings. A malicious app can send a fake intent to the Power Control Widget while simulating the pressure of the widget button to switch power-related settings. It is effectively simulating a user’s actions on the screen.

By using external storage, apps can exercise permissions avoidance. By default, all apps have access to external storage. Many apps store data in external storage without specifying any protection, allowing other apps to access that data.

Another way permissions avoidance is used is that Android intents allow opening some system apps without requiring permission to do so. These apps include the camera, SMS, contact list, and browser. For instance, opening a browser via an intent can be dangerous since it enables data transmission, receiving remote commands, and even downloading files without user intervention.

iOS security

App signing

iOS requires mandatory code signing. Unlike Android, which accepts self-signed certificates, the app package must be signed using an Apple Developer certificate and apps are only available for This does not ensure trustworthiness of an app but identifies the registered developer and ensures that the app has not been modified after it has been signed.

Runtime protection

Apple’s iOS provides runtime protection via OS-level sandboxing. System resources and the kernel are shielded from user apps. The sandbox also limits which system calls an app can call Except through kernel exploits, an app cannot leave its sandbox.

The app sandbox restricts the ability of one app to access another app’s data and resources. Each app has its own sandbox directory. The OS enforces the sandbox and permits access only to files within that directory, as well as restricted access to to system preferences, the network, and other resources.

Inter-app communication can take place only through iOS APIs. Code generation by an app is prevented because data memory pages cannot be made executable and executable memory pages are not writable by user processes.

Data protection

All file contents are encrypted with a unique 256-bit AES per-file key, which is generated when the file is created.

This per-file key is encrypted with a class key and is stored along with the file’s metadata, which is the part of the file system that describes attributes of the file, such as size, modification time, and access permissions.

The class key is generated from a hardware key in the device and the user’s passcode. Unless the passcode is entered, the class key cannot be created and the file key cannot be decrypted.

The file system’s metadata is also encrypted. A file system key is used for this, which is derived directly from the hardware key, which is generated when iOS is installed. Keys are stored in Apple’s Secure Enclave, a separate processor and isolated memory that cannot be accessed directly by the main processor. Encrypting metadata encrypts the entire structure of the file system. Someone who rips out the flash memory from an iOS device and examines it will not be able to see neither file contents (they are encrypted with per-file keys) nor information about those files (the metadata is encrypted with a file system key).

A hardware AES engine encrypts and decrypts the file as it is written/read on flash memory so file encryption is done transparently and efficiently.

The iOS kernel partition is mounted read-only, so even if an app manages to break out of its sandbox due to some vulnerability and gain root access, it will still not have permission to modify the kernel.

Communication

The iOS sandbox restricts apps from accessing files stored by other apps or making changes to device settings. Each app is given a unique home directory for its files. System files and resources are shielded from the user’s apps

Unlike Android, where each app is assigned a unique user ID, apps under iOS run as a non-privileged user “mobile.”

The iOS framework grants entitlements to apps. These are digitally-signed key-value pairs that are granted to an app to allow access to specific services. It is essentially a capability. If you have an entitlement, then you can access a service.

Kernel protection

In addition to the sandbox, iOS also uses address space layout randomization (ASLR) and memory execute protection for stack and heap pages via ARM’s Execute Never (XN) memory page flag.

Masque attacks

While Apple normally expects users to install apps only from its App Store, users need to be able to deploy pre-production apps to friendly parties for testing and enterprises may need to deploy in-house apps to their employees. Apple supports a Developer Enterprise Program to create and distribute such in-house apps. This mechanism has been used to replace existing apps with private versions. The vulnerability has been patched.

iOS has been hit several times with masque attacks. While there have been various forms of these, the basic attack is to get users to install malicious apps that have been created with the same bundle identifier as some exiting legitimate app. This malicious app replaces the legitimate app and masquerades as that app. Since Apple will not host an app with a duplicate bundle identifier, the installation of these apps has to bypass the App Store. Enterprise provisioning is used to get users to install this. which typically requires the user going to a URL that redirects the user to an XML manifest file hosted on a server. The ability to launch this attack is somewhat limited as the user will generally need to have an enterprise certificate installed to make the installation seamless.

Web apps

Both iOS and Android have full web browsers that can be used to access web applications. They also permit web apps to appear as a regular app icon. The risks here are the same as those for web browsers in general: loading untrusted content and leaking cookies and URLs to foreign apps.

Mobile-focused web-based attacks can take advantage of the sensors on phones. The HTML5 Geolocation API allows JavaScript to find your location. A Use Current Location permission dialog appears, so the attacker has to hope the user will approve but there the attacker can provide incentives via a Trojan horse approach: provide a service that may legitimately need your location.

Recently, a proof of concept web attack showed how JavaScript could access the phone’s accelerometers to detect movements of the phone as a user enters a PIN. The team that implemented this achieved a 100% success rate of recognizing a four-digit PIN within five attempts of a user entering it. Apple patched this specific vulnerability but there may be more undiscovered ones.

Hardware support for security

All Android and iOS phones currently use ARM processors. ARM provides a dedicated security module, called TrustZone, that coexists with the normal processor. The hardware is separated into two “worlds”: secure (trusted) and non-secure (non-trusted) worlds. Any software resides in only one of these two worlds and the processor executes in only one world at a time.

Each of these worlds has its own operating system and applications. Android systems run an operating system called Trusty TEE in the secure world and, of course, Linux in the untrusted world.

Logically, you can think of the two worlds as two distinct processors, each running their own operating system with their own data and their own memory. Non-secure applications cannot access any own memory or registers of secure resources directly. The only way they can communicate is through a messaging API.

In practice, the hardware creates two virtual cores for each CPU core, managing separate registers and all processing state in each world.

The phone’s operating system and all applications reside in the non-trusted world. Secure components, such as cryptographic keys, signature services, encryption services, and payment services live in the trusted world. Even the operating system kernel does not have access to any of the code or data in the trusted world. Hence, even if an app manages a privilege escalation attack and gains root access, it will be unable to access certain security-critical data.

Applications for the trusted world include key management, secure boot, digital rights management, secure payment processing, mobile payments, and biometric authentication.

Apple Secure Enclave

Apple uses modified ARM processors for iPhones and iPads. In 2013, they announced Secure Enclave for their processors. The details are confidential but it appears to be similar in function to ARM’s TrustZone but designed as a physically separate coprocessor. As with TrustZone, the Secure Enclave coprocessor runs its own operating system (a modified L4 microkernel in this case).

The processor has its own secure bootloader and custom software update mechanism. It uses encrypted memory so that anything outside the Secure Enclave cannot access its data. It provides:

All cryptographic operations for data protection & key management.
Random number generation.
Secure key store, including Touch ID (fingerprint) and the Face ID neural network.
Data storage for payment processing.

The Secure Enclave maintains the confidentiality and integrity of data even if the iOS kernel has been compromised.

Steganography and Watermarking

Cryptography’s goal is to hide the contents of a message. Steganography’s goal is to hide the very existence of the message. Classic techniques included the use of invisible ink, writing a message on one’s head and allowing the hair to cover it, microdots, and carefully-clipped newspaper articles that together communicate the message.

A null cipher is one where the actual message is hidden among irrelevant data. For example, the message may comprise the first letter of each word (or each sentence, or every second letter, etc.). Chaffing and winnowing entails the transmission of a bunch of messages, of which only certain ones are legitimate. Each message is signed with a key known only to trusted parties (e.g., a MAC). Intruders can see the messages but can’t validate the signatures to distinguish the valid messages from the bogus ones.

Messages can be embedded into images. There are a couple of ways of hiding a message in an image:

A straightforward method to hide a message in an image is to use low-order bits of an image, where the user is unlikely to notice slight changes in color. An image is a collection of RGB pixels. You can mess around with the least-significant bits and nobody will notice changes in the image, so you can just encode the entire message by spreading the bits of the message among the least-significant bits of the image.
You can do a similar thing but apply a frequency domain transformation, like JPEG compression does, by using a Discrete Cosine Transform (DCT). The frequency domain maps the image as a collection ranging from high-frequency areas (e.g., “noisy” parts such as leaves, grass, and edges of things) through low-frequency areas (e.g., a clear blue sky). Changes to high frequency areas will mostly be unnoticed by humans: that’s why jpeg compression works. It also means that you can add your message into those areas and then transform it back to the spatial domain. Now your message is spread throughout the higher-frequency parts of the image and can be extracted if you do the DCT again and know where to look for the message.

Many laser printers embed a serial number and date simply by printing very faint color splotches.

Steganography is closely related to watermarking. and the terms “steganography” and “watermarking” are often used interchangeably.

The primary goal of watermarking is to create an indelible imprint on a message such that an intruder cannot remove or replace the message. It is often used to assert ownership, authenticity, or encode DRM rules. The message may be, but does not have to be, invisible.

The goal of steganography is to allow primarily one-to-one communication while hiding the existence of a message. An intruder – someone who does not know what to look for – cannot even detect the message in the data.

References

Injection

SQL Injection, The Open Web Application Security Project, April 10, 2016.

SQL Injection, Acunetix.

Simson Garfinkel & Gene Spafford, Section 11.5, Protecting Yourself, Practical UNIX & Internet Security, Second Edition, April 1996. Discusses shell attacks.

Directory traversal attack, Wikipedia.

Why does Directory traversal attack %C0%AF work?, Information Security Stack Exchange, September 9, 2016

Tom Rodriquez, What are unicode vulnerabilities on Internet Information Server (IIS)?, SANS.

The Unicode Consortium.

IDN homograph attack, Wikipedia.

Time of check to time of use, Wikipedia.

Michael Cobb, How to mitigate the risk of a TOCTTOU attack, TeachTarget, August 2011.

Ernst & Yount LLP Security & Technology Solutions, Using Attack Surface Area And Relative Attack Surface Quotient To Identify Attackability. Customer Information Paper.

Michael Howard, Back to the Future: Attack Surface Analysis and Reduction, Microsoft Secure Blog, February 14, 2011.

Olivier Sessink, Jailkit, November 18, 2015.

Confinement

Evan Sarmiento, Chapter 4. The Jail Subsystem, FreeBSD Architecture Handbook, The FreeBSD Documentation Project. 2001, Last modified: 2016–10–29.

Matteo Riondato, Chapter 14. System Administration: Jails, FreeBSD Architecture Handbook, The FreeBSD Documentation Project. 2001, Last modified: 2016–10–29.

Chapter 1. Introduction to Control Groups, Red Hat Enterprise Linux 6.8 Resource Management Guide.

José Manuel Ortega, Everything you need to know about Containers Security, OSDEM 2018 presentation.

Johan De Gelas, Hardware Virtualization: the Nuts and Bolts, AnandTech article, March 17 2008.

Note that sprintf is vulnerable to buffer overflow. We should use snprintf, which allows one to specify the maximum size of the buffer. ↩︎
Unix, Linux, macOS, FreeBSD, NetBSD, OpenBSD, Android, etc. ↩︎
the official Unicode name for the slash and backslash characters are solidus and reverse solidus, respectively. ↩︎
Note that Wikipedia and many other sites refer to this as “Version 7 Unix”. Unix has been under continuous evolution at Bell Labs from 1969 through approximately 1989. As such, it did not have versions. Instead, an updated set of manuals was published periodically. Installations of Unix have been referred to by the editions of their manuals. ↩︎
Linux capabilities are not to be confused with the concept of capability lists, which are a form of access control that Linux does not use). ↩︎
SHA-256 is the SHA-2 family of hash functions that produces a 256-bit output. The SHA-2 family also includes HA-224, SHA-256, SHA-384, and SHA-512. ↩︎