Exam 2 study guide

The one-hour study guide for exam 2

Paul Krzyzanowski

Latest update: Sun Mar 24 18:46:50 EDT 2019

Disclaimer: This study guide attempts to touch upon the most important topics that may be covered on the exam but does not claim to necessarily cover everything that one needs to know for the exam. Finally, don't take the one hour time window in the title literally.

Application Sandboxing

The goal of an application sandbox is to provide a controlled and restricted environment for code execution. This can be useful for applications that may come from untrustworthy sources, such as games from unknown developers or software downloaded from dubious sites. The program can run with minimal risk of causing widespread damage to the system. Sandboxes are also used by security researchers to observe how software behaves: what the program trying to do and whether it is attempting to access any resources in a manner that is suspicious for the application. This can help identify the presence of malware within a program. The sandbox defines and enforces what an individual application is allowed to do while executing in within its sandbox.

We previously looked at isolation via jails and containers, which use mechanisms that include namespaces, control groups, and capabilities. These constitute a widely-used form of sandboxing. However, these techniques focus on isolating an application (or group of processes) from other processes, restricting access to parts of the file system, and/or providing a separate network stack with a new IP address.

While this is great for running services without the overhead of deploying virtual machines, it does not sufficiently address the basic needs of running normal applications. We want to protect users from their applications: give users the ability to run apps but restrict what those apps can do on a per-app basis.

For example, you may want to make sure that a program accesses only files under your home directory with a suffix of “.txt”, and only for reading, without restricting the entire file system namespace as chroot would do, which would require creating a separate directory structure for shared libraries and other standard components the application may need. As another example, you might want an application to have access only to TCP networking. With a mechanism such as namespaces, we cannot exercise control over the names of files that an application can open or their access modes. Namespaces also do not allow us to control how the application interacts with the network. Capabilities allow us to restrict what a process running with root privileges can do but offer no ability to restrict more fundamental operations, such as denying a process the ability to read a file even if that file has read access enabled. The missing ingredient is rule-based policies to define precisely what system calls an application can invoke – down to the parameters of the system calls of interest.

Instead of building a jail (a container), we will add an extra layer of access control. An application will have the same view of the operating system as any other application but will be restricted in what it can do.

Sandboxing is currently supported on a wide variety of platforms at either the kernel or application level. We will examine four types of application sandboxes:

  1. User-level validation
  2. OS support
  3. Browser-based application sandboxing
  4. The Java sandbox

Note that there are many other sandbox implementations. This is just a representative sampling.

Application sandboxing via system call interposition & user-level validation

Applications interact with their environment via system calls to the operating system. Any interaction that an application needs to do aside from computation, whether legitimate or because it has been compromised, must be done through system calls: accessing files or devices, changing permissions, accessing the network, talking with other processes, etc.

An application sandbox will allow us to create policies that define which system calls are permissible to the application and in what way they can be used.

If the operating system does not provide us with the required support and we do not have the ability to recompile an application to force it to use alternate system call libraries, we can rely on system call interposition to construct a sandbox. System call interposition is the process of intercepting an app’s system calls and performing additional operations. The technique is also called hooking. In the case of a sandbox, it will intercept a system call, inspect its parameters, and decide whether to allow the system call to take place or return an error.

Example: Janus

One example of doing validation at the user level is the Janus sandboxing system, developed at UC Berkeley, originally for SunOS but later ported to Linux. Janus uses a loadable, lightweight, kernel module called mod_janus. The module initializes itself by setting up hooks to redirect system call requests to to itself. A hook is simply a mechanism that redirects an API request somewhere else and allows it to return back for normal processing. For example, a function can be hooked to simply log the fact that it has been called. The Janus kernel module copies the system call table to redirect the vector of calls to the mod_janus.

A user-configured policy file defines the allowable files and network operations for each sandboxed application. Users run applications through a Janus launcher/monitor program, which places the application in the sandbox. The monitor parses the policy file and spawns a child process for the user-specified program. The child process executes the actual application. The parent Janus process serves as the monitor, running a policy engine that receives system call notifications and decides whether to allow or disallow the system call.

Whenever a sandboxed application makes a system call, the call is redirected by the hook in the kernel to the Janus kernel module. The module blocks the thread (it is still waiting for the return from the system call) and signals the user-level Janus process that a system call has been requested. The user-level Janus process’ policy engine then requests all the necessary information about the call (calling process, type of system call, parameters). The policy engine makes a policy decision to determine whether, based on the policy, the process should be permitted to make the system call. If so, the system call is directed back to the operating system. If not, an error code is returned to the application.

Challenges of user-level validation

The biggest challenge with implementing Janus is that the user-level monitor must mirror the state of the operating system. If the child process forks a new process, the Janus monitor must also fork. It needs to keep track of not just network operations but the proper sequencing of the steps in the protocol to ensure that no improper actions are attempted on the network. This is a sequence of socket, bind, connect, read/write, and shutdown system calls. If one fails, chances are that the others should not be allowed to take place. However, the Janus monitor does not have the knowledge of whether a particular system call succeeded or not; approved calls are simply forwarded from the module to the kernel for processing. Failure to handle this correctly may enable attack vectors such as trying to send data on an unconnected socket.

The same applies with file operations. If a file failed to open, read and write operations should not be allowed to take place. Keeping track of state also gets tricky if file descriptors are duplicated (e.g., via the dup2 system call); it is not clear whether any requested file descriptor is a valid one or not.

Pathname parsing of file names has to be handled entirely by the monitor. We earlier examined the complexities of processing "../" sequences in pathnames. Janus has to do this in order to validate any policies on permissible file names or directories. It also has to keep track of relative filenames since the application may change the current directory at any time via the chdir system call. This means Janus needs to intercept chdir requests and process new pathnames within the proper context. Moreover, the application may change its entire namespace if the process calls chroot.

File descriptor can cause additional problems. A process can pass an open file descriptor to another process via UNIX domain sockets, which can then use that file descriptor (via a sendfd and recv_fd set of calls). Janus would be hard-pressed to know that this happened since that would require understanding the intent of the underlying sendmsg system calls and cmsg directives.

In addition to these difficulties, user-level validation suffers from possible TOCTTOU (time-of-check-to-time-of-use) race conditions. The environment present when Janus validates a request may change by the time the request is processed.

Application sandboxing with integrated OS support

The better alternative to having a user-level process decide on whether to permit system calls is to incorporate policy validation in the kernel. Some operating systems provide kernel support for sandboxing. These include the Android Application Sandbox, the iOS App Sandbox, the macOS sandbox, and AppArmor on Linux. Microsoft introduced the Windows Sandbox in December 2018, but this functions far more like a container than a traditional application sandbox, giving the process an isolated execution environment.


Seccomp-BPF, which stands for SECure COMPuting with Berkeley Packet Filters, is a sandboxing framework that is available on Linux systems. It allows the user to attach a system call filter to a process and all of the descendants of that process. Users can enumerate allowable system calls and also allow or disallow access to specific files or network protocols. Seccomp has been a core part of the Android security since the release of Android O in August 2017.

Seccomp uses the Berkeley Packet Filter (BPF) interpreter, which is a framework that was initially created for network socket filtering. With socket filtering, a user can create a filter to allow or disallow certain types of data to come through the socket. Since BPF is a framework that was initially created for sockets, seccomp sends “packets” that represent system calls to the BPF (Berkeley Packet Filter) interpreter. The filter allows the user to define rules that are applied to these system calls. These rules enable the inspection of each system call and its arguments and take subsequent action. Actions include allowing the call to run or not. If the call is not permitted, rules can specify whether an error is returned to the process, a SIGSYS signal is sent, or whether the process gets killed.

Seccomp is not designed to serve as a complete sandbox solution but is a tool for building sandboxes. For further process isolation, it can be used with other components, such as namespaces, capabilities, and control groups. The biggest downside of seccomp is the use of the BPF. BPF is a full interpreter – a processor virtual machine – that supports reading/writing registers, scratch memory operations, arithmetic, and conditional branches. Policies are compiled into BPF instructions before they are loaded into the kernel. It provides a low-level interface and the rules are not simple condition-action definitions. System calls are referenced by numbers, so it is important to check the system architecture in the filter as Linux system call numbers may vary across architectures. Once the user gets past this, the challenge is to apply the principle of least privilege effectively: restrict unnecessary operations but ensure that the program still functions correctly, which includes things like logging errors and other extraneous activities.

The Apple Sandbox

Conceptually, Apple’s sandbox is similar to seccomp in that it is a kernel-level sandbox, although it does not use the Berkeley Packet Filter. The sandbox comprises:

  • User-level library functions for initializing and configuring the sandbox for a process
  • A server process for handling logging from the kernel
  • A kernel extension that uses the TrustedBSD API to enforce sandbox policies
  • A kernel extension that provides support for regular expression pattern matching to enforce the defined policies

An application initializes the sandbox by calling sandbox_init. This function reads a human-friendly policy definition file and converts it into a binary format that is then passed to the kernel. Now the sandbox is initialized. Any function calls that are hooked by the TrustedBSD layer will be passed to the sandbox kernel extension for enforcement. Note that, unlike Janus, all enforcement takes place in the kernel. Enforcement means consulting a list of sandbox rules for the process that made the system call (the policy that was sent to the kernel by sandbox_init). In some cases, the rules may involve regular expression pattern matching, such as those that define filename patterns).

The Apple sandbox helps avoid comprehension errors by providing predefined sandbox profiles (entitlements). Certain resources are restricted by default and a sandboxed app must explicitly ask the user for permission. This includes accessing:

  • the system hardware (camera, microphone, USB)
  • network connections, data from other apps (calendar, contacts)
  • location data, and user files (photos, movies, music, user-specified files)
  • iCloud services

For mobile devices, there are also entitlements for push notifications and Apple Pay/Wallet access.

Once permission is granted, the sandbox policy can be modified for that application. Some basic categories of entitlements include:

  • Restrict file system access: stay within an app container, a group container, any file in the system, or temporary/global places
  • Deny file writing
  • Deny networking
  • Deny process execution

Browser-based Sandboxing: Chromium Native Client (NaCl)

Since the early days of the web, browsers have supported a plug-in architecture, where modules (containing native code) could be loaded into the browser to extend its capabilities. When a page specifies a specific plug-in via an <object> or <embed> element, the requested content is downloaded and the plug-in that is associated with that object type is invoked on that content. Examples of common plug-ins include Adobe Flash, Adobe Reader (for rendering pdf files), and Java, but there are hundreds of others. The challenge with this framework is how to keep the software in a plug-in from doing bad things.

An example of sandboxing designed to address the problem of running code in a plugin is the Chromium Native Client,called NaCl. Chromium is the open source project behind the Google Chrome browser and Chrome OS. The NaCl Browser plug-in designed to allow safe execution of untrusted native code within a browser, unlike JavaScript, which is run through an interpreter. It is built with compute-intensive applications in mind or interactive applications that use the resources of a client, such as games.

NaCl is a user-level sandbox and works by restricting the type of code it can sandbox. It is designed for the safe execution of platform-independent, untrusted native code inside a browser. The motivation was that some browser-based applications will be so compute-intensive that writing them in JavaScript will not be sufficient. These native applications may be interactive and may use various client resources but will need to do so in a controlled and monitored manner.

NaCl supports two categories of code: trusted and untrusted. Trusted code can run without a sandbox. Untrusted code must run inside a sandbox. This code has to be compiled using the NaCl SDK or any compiler that adheres to NaCl’s data alignment rules and instruction restrictions (not all machine instructions can be used). Since applications cannot access resources directly, the code is also linked with special NaCl libraries that provide access to system services, including the file system and network. NaCl includes a GNU-based toolchain that contains custom versions of gcc, binutils, gdb, and common libraries. This toolchain supports 32-bit ARM, 32-bit Intel x86 (IA–32), x86–64, and 32-bit MIPS architectures.

NaCl executes with two sandboxes in place:

  1. The inner sandbox uses Intel’s IA–32 architecture’s segmentation capabilities to isolate memory regions among apps so that even if multiple apps run in the same process space, their memory is still isolated. Before executing an application, the NaCl loader applies static analysis on thecode to ensure that there is no attempt to use privileged instructions or create self-modifying code. It also attempts to detect security defects in the code.

  2. The outer sandbox uses system call interposition to restrict the capabilities of apps at the system call level. Note that this is done completely at the user level via libraries rather than system call hooking.

Process virtual machine sandboxes: Java

A different type of sandbox is the Java Virtual Machine. The Java language was originally designed as a language for web applets, compiled Java programs that would get download and run dynamically upon fetching a web page. As such, confining how those applications run and what they can do was extremely important. Because the author of the application would not know what operating system or hardware architecture a client had, Java would compile to a hypothetical architecture called the Java Virtual Machine (JVM). An interpreter on the client would simulate the JVM and process the instructions in the application. The Java sandbox has three parts to it:

The bytecode verifier verifies Java bytecodes before they are executed. It tries to ensure that the code looks like valid Java byte code with no attempts to circumvent access restrictions, convert data illegally, bypass array bounds, or forge pointers.

The class loader enforces restrictions on whether a program is allowed to load additional classes and that key parts of the runtime environment are not overwritten (e.g., the standard class libraries). The class loader ensures that malicious code does not interfere with trusted code nad ensures that trusted class librares remain accessible and unmodified. It implements ASLR (Address Space Layout Randomization) by randomly laying out Runtime data areas (stacks, bytecodes, heap).

The security manager enforces the protection domain. It defines what actions are safe and which are not; it creates the boundaries of the sandbox and is consulted before any access to a resource is permitted. It is called at the time an application makes a call to specific methods so it can provide run-time verification of whether a program has been given rights to invoke the method, such as file I/O or network access. Any actions not allowed by the security policy result in a SecurityException being thrown. The security manager is the component that allows the user to restrict an application from accessing files or accessing the network, for example.A user can create a security policy file that enumerates what an application can or cannot do.

Java security is deceptively complex. After over twenty years of bugs one hopes that the truly dangerous ones have been fixed. Even though the Java language itself is pretty secure and provides dynamic memory management and array bounds checking, buffer overflows have been found in the underlying C support library, which has been buggy in general. Varying implementations of the JVM environment on different platforms make it unclear how secure any specific client will be. Moreover, Java supports the use of native methods, libraries that you can write in compiled languages such as C that interact with the operating system directly. These bypass the Java sandbox.


Malware is a term that refers to any malicious software that is unintentionally installed on a computer system. Malware can be distributed in various ways: viruses, worms, unintentional downloads, or trojan horses. It may spy on user actions and collect information on them (spyware), or present unwanted ads (adware). It may disable components of the system or encrypt files, undoing its damage if the owner pays money (ransomware). The software may sit dormant and wait for directives from some coordinator, who assembled an arsenal of hundreds of thousands of computers ready to do his bidding (for example, launch a distributed denial of service, DDoS, attack). Some software might be legitimate but may contain backdoors – undocumented ways to allow an outsider to use that software to perform other operations on your system.

Malware Motivation

A saying often paraphrased from Sun Tzu’s The Art of War is “know your enemy.” In the case of malware, it helps to understand why someone would want to install malicious software on your computer. There are numerous reasons. Some are:

Steal account credentials
If an attacker can obtain someone’s login and password credentials on one system, there is a good chance that those same credentials will work on other systems.
An attacker may have an interest in spying on the activities of a particular person. Traditionally, this would have been done through planting covert cameras and recording devices. Now it is often easier to accomplish the same results - and more - by installing software on the target’s computer. Such software is called spyware.
Data theft
An attacker may target a person at a specific company (or a student taking a certain class) in an attempt to exfiltrate data of strategic interest. Alternatively, an attacker may target people anonymously, with the hope of obtaining information of value, such as credit card data or bank account information.
There’s vengeance or vandalism: the attacker may want to destroy a target’s content or devices.
Host services
An attacker may need to harness computing, storage, or network resources. This can help hide the owner’s identity or amass a large collection of computers. An attacker can set up servers to host contraband data (e.g., stolen credit cards, login credentials, illegal material), send spam email on a large scale, mine cryptocurrency for free, or create a botnet for DDoS (distributed denial of service) attacks.
Finally, the attacker may want money. Ransomware installs software to encrypt files that will be (hopefully) decrypted if ransom is paid. The emergence of cryptocurrencies led to a huge increase in ransomware since they enabled anonymous payments.

Another saying paraphrased from The Art of War is “all warfare is based on deception.” This is also useful to consider with malware since it is most often installed willingly by the user of the system via some form of deception rather than through the exploitation of bugs in the system.

Malware Infiltration

There are several ways in which malware gets onto a system.

An attacker can exploit vulnerabilities in system services, particularly network services, to inject code that will download the malware. Zero-day vulnerabilities are particularly useful to attackers. These are bugs that have been discovered but not yet reported to the software vendor and patched. As such, an attacker can be confident that the exploit will work on virtually all systems and does not have to rely on targets who were not diligent enough to keep their systems patched. Ideally (for the attacker), the vulnerabilities will allow malware to run with elevated privileges so they can access all parts of a system or conceal itself more effectively.

Malware might be installed unknowingly via infected removable media, most commonly USB flash drives (in earlier years, it would have been CDs or floppy disks).

Social engineering

By far the most common way that malware enters a system is via deception: the legitimate user of the system installed it unknowingly. This uses a social engineering attack to convince the user that it is in his or her interest to install the software. Social engineering is the art of manipulating, influencing, or deceiving a user into taking some action that is not in his/her or the organization’s best interest. Goal of social engineers is to obtain your trust and get you to divulge information or provide them with some form of access. In computers, social engineering refers to any techniques used by an adversary to trick you into disclosing information, opening a file, downloading an attachment, reading a message, or running a program.

Websites may offer downloads of “security” software, system “cleaner” software, or software “updates,” none of which will do their purported task. An attacker may convince a user to click on a URL in an email attachment or a web page. Software obtained from file sharing services are also excellent venues for distributing malware. A user may try to avoid spending $4,000 for an AutoCAD license or $240/year for an Adobe Illustrator license and turn to a file sharing site to download a patched copy or a crack for the software that bypasses license checks. Quite often, these downloads contain malware instead of the desired software (what do you expect - the user is trying to be a thief downloading software from thieves).

An attacker may search collections of stolen email addresses (or usernames) and passwords. Since people often use the same name and password on multiple systems, this can often give the attacker access to other websites on which the user has accounts. Accounts for banking sites are, of course, particularly valuable since they can be a direct conduit for transferring money.

Given login information about a user, an attacker can log onto the systems or services as the owner of the account and install malware, monitor the internal organization, and even send email (e.g., contact other employees or friends).

Any information the attacker can get about a user can help an attacker create a more convincing social attack. The term pretexting refers to using a concocted scenario to contact a user and get additional information (e.g., an attacker can pretend to be a caller from the IT department or a high-level manager from another location to try to extract information; with some rudimentary information, the attacker can mention some employee, department, or project names to sound like a true insider).

Types of Malware

Worms and viruses

A virus is software that attaches itself to another piece of software. It may also be content, such as scripts inside a Microsoft Word document or PDF file, that will be accessed and hence executed by some software. It may also be an email attachment that contains a document or software with the malware or a link to the malware.

It might even be a modification of the boot loader of a computer or the firmware on a flash drive. The key point is that it does not run as an independent process. A virus may spread automatically by trying to

A virus is executed because another program ran. Viruses are often spread by sharing files or software. On a computer, a virus may replicate itself onto other files or software to maximize its chance of spreading and reduce its chance of being removed.

A worm is conceptually similar in that it can do the same damage to the computer as a virus can. The distinction from a virus is that a worm runs as a standalone process while a virus requires a host program.

Worms and virus may require human intervention to spread. In other cases, they can replicate themselves and spread to other systems automatically, exploiting weaknesses in software on those computers to allow themselves to infiltrate those machines. The popular use of both terms, worm and virus, has often blurred the distinctions between them.

When using non-legitimate ways of getting into a system or elevating their privileges, attackers often try to find zero-day vulnerabilities. These are vulnerabilities (bugs or configuration errors) that have not been publicly reported and hence are unpatched.

Virus and worm components

Viruses and worms contains three components:

Infection mechanism
The infection mechanism is the component of a worm or virus that enables it to spread. It can exploit software vulnerabilities to connect to other systems, patch certain files, or alter start-up scripts.
This is the malicious part of the virus and contains the code that does the actual harm to the system such as uploading personal information or deleting files. In some cases, the payload may be a generic service that contacts a command and control server from which it gets specific instructions on what to do (e.g., mine cryptocurrency, send spam, participate in a DDoS attack).
The trigger, also called a logic bomb, is code that is run whenever a file containing the virus is run. It makes the decision whether the payload should be executed. For example, some viruses may stay dormant until a particular date, number of runs, or upon getting directives from a command and control server.

Malware residence: where does it live?

File infector virus

A file infector virus is a virus that adds itself to an executable program. The virus patches the program so that, upon running, control will flow to the the virus code. Ideally, the code will install itself in some unused area of the file so that the file length will remain unchanged. A comparison of file sizes with the same programs on other systems will not reveal anything suspicious. When the virus runs, it will run the infector to decide whether to install itself on other files. The trigger will then decide whether the payload should be executed. If not, the program will appear to run normally.

Bootloader virus

Boot sector viruses have an infector that installs itself in the Master Boot Record (MBR) of a disk. In older BIOS-based PC systems, the first sector of the bootable storage device is read into memory and executed when the system boots, Normally, the code that is loaded is the boot loader that then loads the operating system. By infecting the master boot record, the virus can repeatedly re-infiltrate the operating system or files on the disk even if any malware on the system was previously detected and removed.

Boot sector viruses were popular in the early days of PCs when users often booted off floppy disks and shared these disks. The virus would often use DOS commands to install itself onto other disks that it detects. Users on those systems had full administrative rights to modify any part of the system.

These viruses have largely diminished as attackers found more appealing targets. However, there is no reason that malware that attacks on the bootloader should not be considered to be a continued threat. 2011 saw the emergence of ransomware that modified the boot loader to prevent the operating system from booting unless a ransom was paid. In 2016, Petya Trojan ransomware was deployed, which also infects the MBR and encrypts disk contents.

Infected flash drives

In the early days of PCs, people would share content by passing around floppy disks. This became a means for viruses to spread, which could be planted in either the boot sector or in files. These days, people share USB flash drives the way they used to share floppies.


In earlier Windows systems, Microsoft provided a feature called AutoRun. It was designed to make the CD (and, later, DVD and flash drive) experience better for users, particularly when using CDs for software installation. If the CD contained a file called autorun.inf, Windows would automatically execute a program identified within that file. While this made the experience of figuring out what to do after a CD is inserted easier for most users, it created a horrific security vulnerability: all that an adversary had to do was to get you to insert the media. Moreover, this functionality worked with any removable storage so that inserting a flash drive would automatically run a program defined within autorun.inf on the drive.

Microsoft eventually removed this capability from flash drives but some manufacturers created USB drives that emulated a CD drive to offer the “convenience” of AutoRun. Microsoft ultimately removed this functionality altogether in Windows 7. However, there are still old, unpatched versions of Windows out there that can be exploited with this vulnerability.

USB Firmware

The more insidious problem with USB flash drives now is unprotected firmware. A USB flash drive is a bunch of memory as well as firmware – embedded software on the chip. The firmware runs when you plug the drive into your computer. It identifies the drive as a USB storage device and manages the transferring of data. You don’t see this firmware and cannot tell if it has been changed. Because the firmware defines what the USB device is, modified firmware on the flash drive could present the drive as a keyboard and send a set of keyboard commands to the host system (for example, commands to open the terminal window and delete files).

A USB device can have multiple profiles associated with it and thus present itself as multiple devices, so the flash drive can tell the computer it is a keyboard but also a flash drive, so the user will still be able to use the device as a storage device. The firmware could also modify file contents as they pass between the USB storage device and host computer. The same attack can be user on other USB devices. For example, an ethernet adapter can redirect network messages to an attacker’s site.

Reprogramming the firmware has not been exploited by malware thus far, at least not in a widespread manner, but the vulnerability has been demonstrated and the source code to do this is freely and readily available.

Data leakage

The most common problem with flash drives is their portability and small size: they are easy to lose and easy to borrow. This makes them vulnerable to data leakage, which is just a fancy term that means some adversary may access your data simply by borrowing your flash drive.

In 2016, researchers at the University of Illinois ran an experiment where they scattered nearly 300 USB drives in public areas through the campus. Each of those drives was loaded with files that, when opened on a network-connected computer, would contact a server to tell it that the drive has been picked up and the file was opened. The results of the study showed that 98% of the drives were picked up and someone opened up at least one file on 45% of them[1]. Because of the risk of malicious firmware, even formatting a drive does not make it safe to use.

Inadvertent program execution

The portability of flash drives makes them a distribution mechanism. Experiments of scattering a number of them in parking lots revealed that many people are all too willing to plug a random drive into their system.

Even without automatic execution capabilities enabled, attackers can use flash drives as a distribution mechanism for malware. The Stuxnet attack exploited a windows bug in rendering shortcut icons where just viewing them in Windows Explorer enabled the execution of arbitrary code. Others have exploited a bug in video playback that allowed code execution. Even something as simple as an HTML file on a drive may direct the target to a website that can launch an attack.

Macro viruses

Some applications have support for macros, which allow the user to define a set of commands to avoid repetitive tasks and improve productivity. They are particularly common in text editors but are present in other applications as well, such as Photoshop and Microsoft Word and Excel. In some cases, as with Microsoft Office applications, macros are embedded in the document, which means they can be passed on to other users who access that document. Some macro capabilities are far more powerful than simply defining repetitive commands. Microsoft Office products, for example, provide Visual Basic scripting, which effectively allows users to embed complete programs into their documents. VBScript is based on Visual Basic and provides features that make it easy to access network printers, network files, special folders, user information, and even execute scripts on remote systems.

Scripts in Office documents can spread not only by having the user pass the original infected document around but by modifying the default template file, normal.dot. This will affect every other document on the system. With operating systems providing better access controls and users not running with administrative privileges, embedded scripts are a ripe area for attack. If you can convince somebody to open a document, they will run your program on their machine.

The challenge, of course, is to get a file with a malicious macro to target users and get them to open it. One of the most common techniques is to send it as an email attachment with some inducement to get the user to click on the document. This is an example of social engineering.

One hugely-successful virus that did this was the ILOVEYOU virus from 2000. The subject of the message stated that it is a letter from a secret admirer. The attachment wasn’t even a document; it was a visual basic script. To provide a better user experience, Microsoft would hide file extensions by default (macOS does this too). The file was named LOVE-LETTER-FOR-YOU.TXT.vbs but the .vbs suffix, which indicated that the file was a visual basic script, was hidden from users, so they only saw LOVE-LETTER-FOR-YOU.TXT. Not being aware of when extensions are hidden and when they are not, millions of users assumed they received an innocuous text file and clicked on it. Upon execution, the script would copy itself into various folders, modify and add new entries to the system registry, replace various types of files with copies of itself (targeting music and video files), and try to propagate itself through Internet relay Chat clients as well as email. If that wasn’t enough, it would download a file called WIN-BUGFIX.EXE and execute it. This was not a bug fixing program but rather a program that extracted user passwords and mailed them to the hacker.

The ILOVEYOU virus transmitted itself largely through email to contacts in infected computers, so your “secret admirer” message came from someone you knew and hence you were more likely to click on it. An earlier highly successful virus, Melissa, spread by offering a list of passwords for X-rated web sites. Email-based virus transmission is still a dominant mechanism. Sender headers and links are often disguised to make it look like the content is from a legitimate party.

JavaScript and PDF files

JavaScript, like Visual Basic, has evolved into a full programming language. Most browsers have security holes that involve Javascript. JavaScript can not only modify the content and structure of a web page but can connect to other sites. This allows any malicious site to leverage your machine. For example, systems can perform port scans on a range of IP addresses and report any detected unsecured services.

PDF (Portable Document Format) files, would seem to be innocent printable documents, incapable of harboring executable code. However, PDF is a complex format that can contain a mix of static and dynamic elements. Dynamic elements may contain Javascript, dynamic action triggers (e.g., “on open”), and the ability to retrieve “live” data via embedded URLs. As with Visual Basic scripts, PDF readers warn users of dynamic content but, depending on the social engineering around the file, the user may choose to trust the file … or not even pay attention to the warning in yet-another-dialog-box.

Trojan horses

A Trojan horse is a program with two purposes: an overt purpose and a covert one. The overt purpose is what compels the user to get and run the program in the first place. The covert purpose is unknown to the user and is the malicious part of the program.

For example, a script with the name of a common Linux command might be added to a target user’s search path. When the user runs the command, the script is run. That script may, in turn, execute the proper command, leading the user to believe that all is well. As a side effect, the script may create a setuid shell to allow the attacker to impersonate that user or mail copy over some critical data. Users install trojan horses because they believe they are installing useful software, such as an anti-virus tool (BTW, a lot of downloadable hacker tools contain trojan horses: hackers hacking wannabe hackers). The side-effect of this software can activate cameras, enable key loggers, or deploy bots for anonymization servers, DDoS attacks, or spam attacks.

Trojans may include programs (games, utilities, anti-malware programs), downloading services, rootkits (see next) and backdoors (see next). They appear to perform a useful task that does not raise suspicion on the part of the victim.


A backdoor is software that is designed with some undocumented mechanism to allow someone who knows about the mechanism to be able to access the system or specific functions in a way that bypasses proper authentication mechanisms. In many cases, they are not designed for malicious use: they may allow a manufacturer to troubleshoot a device or a software author to push an update. However, if adversarial parties discover the presence of a backdoor, they can use it for malicious purposes.

An old, but famous, example of a backdoor is the sendmail mail delivery server. The author of sendmail wanted to have development access on a production system that had the program installed so that he can continue to improve it. The system administrator refused such access. His next release of sendmail contained a password-protected backdoor that gave him access to the system via the sendmail server. The password was hard-coded in the program and soon became well-known. Robert Morris used the knowledge of this backdoor as one of the mechanisms for his worm to propagate to other systems. More recently, in 2014, some Samsung Galaxy phones were delivered with backdoors that provide remote access to the data on the phone.


A rootkit is software that is designed to allow an attacker to access a computer and hide the existence of the software … and sometimes hide the presence of the user on the system.

Historically, a basic rootkit would replace common administration commands (such as ps, ls, find, top, netstat, etc.) with commands that mimic their operation but hide the presence of intruding users, intruding processes, and intruding files. The idea is that a system administrator should be able to examine the system and believe that all is fine and the system is free of malware (or of unknown user accounts).

User mode rootkits

The rootkit just described is a user mode rootkit and involves replacing commands, intercepting messages, and patching commonly-used APIs that may divulge the presence of the malware. A skilled administrator may find unmodified commands or import software to detect the intruding software.

Kernel mode rootkits

A kernel mode rootkit is installed as a kernel module. Being in the kernel gives the rootkit unrestricted access to all system resources and the ability to patch kernel structures and system calls. For example, directory listings from the getdents64 system call may not report any names that match the malware. Commands and libraries can be replaced and not give any indication that malicious software is resident in the system.

Hypervisor rootkits

The most insidious rootkits are hypervisor rootkits. A hypervisor sits below the operating system and is responsible for translating between virtual device operations from operating systems and the underlying hardware. All I/O flows through the hypervisor. Most computer systems do not run virtual machines and hence have no hypervisor. These systems are prime targets for a hypervisor-based rootkit. Now you can have an environment where the entire operating system can run unmodified - or even be reinstalled - and be unaware that its operations are being intercepted at a lower level. The hypervisor does not have to virtualize all hardware interactions: just the ones it cares about. For example, it might want to grab keyboard events to record passwords and messages.

Hypervisor attacks have not been deployed but have been demonstrated as a proof of concept. Detection is difficult and often relies on measuring completion times of system calls. If they go through a hypervisor, they will take a longer time and the on-chip Time Stamp Counter (TSC), which counts CPU cycles, will show a longer value with a hypervisor in place. An alternative, and far more obscure, method of detection, is the use of an instruction that stores the interrupt descriptor table register (IDTR) into a memory location (the SIDT instruction). The hypervisor changes the register’s value and the instruction can detect that. However, this does not have to take place on a system with only one operating system, so measuring timing differences may still be the more foolproof approach.

Gathering information

Malware has varying goals. These goals may include spying on user activity, destroying content, assembling a collection of servers, or extracting money from a victim. One common goal is to gather information … or get the user to provide information. Your computer might not have anything of direct value to an adversary, but your PayPal, bank, Amazon, or eBay credentials might be useful.


Phishing is a social engineering attack whose most common purpose is to get personal information from someone, usually login credentials to some service. These are often carried out vie email with similar techniques that are used to spread infected files. A message announcing that your PayPal account is being canceled, that your bank detected a fraudulent transaction, or that FedEx could not deliver a package may prompt the receiver to panic and immediately click on a link in the message, which may result in the browser displaying a site crafted to look like PayPal, the bank, or FedEx and prompt the user for login and password information.

Phishing attacks are surprisingly effective. A 2018 study by Proofpoint found that 52% of all successful phishing emails are clicked on within one hour of being sent.

Spear phishing is a targeted form of phishing. A phishing attack sends the same message to a large set of users, hoping that some percentage of them will be fooled. A spear phishing attack sends a customized message that demonstrates some knowledge of the target, which will usually lead the target to think that the message is legitimate. For example, the 2016 Democratic National Committee (DNC) was facilitated by spear phishing. Targets were sent a message containing bit.ly links, which is a common URL shortening service that hid the actual underlying URLs. Once clicked, the web site would display what looked like a legitimate Google accounts login page, already pre-populated with the victim’s GMail address.

More recent GMail spear phishing attacks send email to contacts of compromised accounts. The email contains an innocent-looking attachment: a thumbnail image of a document. When the victim clicks on the attachment, a web page that looks like a legitimate Google sign-in page is presented. As soon as the victim enters a name and password, the attackers get the credentials, log into the account, and target people in the victim’s contact list. They use an image of an actual attachment in the victim’s email and an actual subject line to make the email look more legitimate.

A 2017 report by Webroot found that 1.385 million new and unique phishing sites are created each month. Some warning signs that a mail message may be a phishing attack are:

  1. From header: is it from an unknown or suspicious address?

  2. To header: if the message is sent to multiple people, do you recognize any other names on the header?

  3. Date header: if the message purports to be a personal message, was it sent during normal business hours?

  4. Subject header: is the suspicious and is it relevant to your activities?

  5. Message content: is the message a request to click on a link in order to avoid a negative consequence?

  6. Embedded links: are there any links that you are asked to click? If you look at the target of those links, are they misspelled, suspicious, or for a site different from that of the sender?

  7. Attachments: is there an unexpected attachment that you are expected to open, such as a Microsoft Word document or PDF file?

Deceptive web sites

Quite often, malicious links in phishing attacks direct the user to a web site in order to obtain their login credentials. These sites masquerade as legitimate sites. The Proofpoint study mentioned earlier found that for every legitimate website, there are 20 malicious sites that mimic it. This is known as typosquatting. Such sites can be masqueraded banking sites, Google/Microsoft/Apple authentication pages, videoconferencing plugin-software downloads, etc.

File serving sites, including those that host software or those that provide services such as PDF or mp3 conversion are often ad-sponsored. Some of the ads on these sites, however, often look like download links and can trick a user into clicking on the ad instead of the link for the actual content. The


Another way of obtaining information is to snoop on a user’s actions. Keyloggers record everything a victim types and allow a user to extract login names, passwords, and entire messages.

Keyloggers can be implemented in several ways:

Malicious hypervisor
Since a hypervisor provides virtual interfaces for all the resources of a computer, it can capture all keyboard, mouse, and even video data. These attacks are difficult since they rely on the ability to install a hypervisor.
Kernel-based rootkit
All input/output operations go through the operating system kernel. Modifying the kernel allows malicious software to log and upload keystroke data.
System call hooking
Some operating systems provide a system call hooking mechanism that allows data to and from system calls to be intercepted. We saw how this was used to implement sandboxing. Windows enables this without having to install any kernel-level drivers. The SetWindowsHookEx system call can be used to report WH_KEYBOARD and WH_MOUSE events, capturing keyboard and mouse activity.
Browser-based logging
JavaScript can be used to capture onKeyUp() events. These events will be captured for one page but other hacks can be used to create a broader context with embedded pages. Form submission can also be intercepted to get populated form data without having to reassemble key presses into coherent account credentials.
Hardware loggers
Although visible to the user, hardware key loggers can be used for USB-connected keyboards. Some of these have embedded Wi-Fi transceivers that enable an attacker to collect the data from a distance.


If we consider the goals of malware agin, one common goal was to extract money: even hackers need to monetize their efforts. An indirect way of accomplishing this was by collecting information to gain access to bank account data, PayPal data, or modifying accounts that may take money, such as eBay accounts. A more direct way of getting money is to demand it from the victim. Ransomware is a relatively new form of malware that locks a computer, keeps it from booting, or encrypts all files on the system. It then asks the suer to pay a ransom (usually via bitcoin) to get a decryption program.


Malware was particularly easy to spread on older Windows systems since user accounts, and hence processes, ran with full administrative rights, which made it easy to modify any files on the system and even install kernel drivers. Adding file protection mechanisms, such as a distinction between user and administrator accounts added a significant layer of protection. However, malware installed by the user would run with that user’s privileges and would have full access to all of a user’s files. If any files are read or write protected, the malware can change DAC permissions.

Systems took the approach of warning users if software wanted to install software or asked for elevated privileges. Social engineering hopes to convince users that they actually want to install the software (or view the document). They will happily grant permissions and install the malware. MAC permissions can stop some viruses as they will not be able, for instance, to override write permissions on executable files but macro viruses and the user files are still a problem.

In general, however, studies have shown that by simply taking away admin rights (avoiding privilege escalation) from users, 94% of the 530 Microsoft vulnerabilities that were reported in 2016 could be mitigated and 100% of vulnerabilities in Office 2016 could be mitigated.

Anti-virus (anti-malware) software

There is no way to recognize all possible viruses. Anti-virus software uses two strategies: signature-based and behavior-based approaches.

With signature-based systems, anti-virus programs look for byte sequences that match those in known malware. Each bit pattern is an excerpt of code from a known virus and is called a signature (not to be confused with digital signatures, discussed later in the course). A virus signature is simply a set of bytes that make up a portion of the virus and allow scanning software to see whether that virus is embedded in a file. The hope is that the signature is long enough and unique enough that the byte pattern will not occur in legitimate programs. This scanning process is called signature scanning. Lists of signatures (“virus definitions”) have to be updated by the anti-virus software vendor as new viruses are discovered. Signature-based detection is used by most anti-virus products.

A behavior-based system monitors the activities of a process (typically the system calls or standard library calls that it makes). Ideally, sandboxing is employed, to ensure that the suspected code is run within a sandbox or even in an interpreted environment within a sandbox to ensure that it cannot cause real damage. Behavior-based systems try to perform anomaly detection. If the observed activity is deemed suspicious, the process is terminated and the user alerted. Sandboxed, behavior-based analysis is often run by anti-malware companies to examine what a piece of suspected malware is actually doing and whether it should be considered to be a virus. A behavior-based can identify previously-unseen malware but these systems tend to have higher false positive rates of detection: it is difficult to characterize exactly what set of operations constitute suspicious behavior.

Windows Defender, as an example, makes use of both signature-based scanning as well as behavior-based process monitoring. It uses signature-based scanning on files and behavior-based analysis for running processes. Behavior monitoring includes scanning for suspicious file and registry changes (e.g., ransomware may try to encrypt all the files on a system and a lot of malware may try to modify the registry so that the software runs when the system is rebooted.


Some viruses try to defend themselves from anti-virus software.

Signature scanning countermeasures

A common thing to do is to use a packer on the code, unpacking it prior to execution. Packing can be one of several operations:

  • Simply obscure the malware payload by exclusive-oring (xor) with a repeating byte pattern (exclusive-oring the data with the same byte pattern reconstructs it.
  • Compress the code and then uncompress it upon loading it prior to execution.
  • Encrypt the code and decrypt it prior to execution.

All of these techniques will change the signature of a virus. One can scan for a signature of a compressed version of the virus but there are dozens of compression algorithms around, so the scanning process gets more complicated.

With encryption (xor is a simple form of encryption), only the non-encrypted part of the virus contains the unpacking software (decryption software and the key). A virus scanner will need to match the code for the unpacker component since the key and the encrypted components can change each time the virus propagates itself.

Polymorphic viruses mutate their code each time they run while keeping the algorithm the same. This involves replacing sequences of instructions with functionally-identical ones. For example, one can change additions to subtractions of negative numbers, invert conditional tests and branches, and insert or remove no-op instructions. This thwarts signature scanning software because the the byte pattern of the virus is different each time.

Sandboxing countermeasures

Viruses can try to get through the sandboxed examination that anti-virus often use by setting a trigger to keep the virus from immediately performing malicious actions or to stay dormant for the first several invocations.

Access control countermeasures

Access controls help but do not stop the problem of malware. Containment mechanisms such as containers work well for server software but are usually impractical for user software (e.g., you want Microsoft Word to be able to read documents anywhere in a user’s directories). Application sandboxing is generally far more effective and is a dominant technique used in mobile software.

Trojans, deceptive downloads, and phishing attacks are insidiously difficult to defend against since we are dealing with human nature: users want to install the software or provide the data. They are conditioned to accepting pop-up messages and entering a password. Better detection in browsers & mail clients against suspicious content or URLs helps. However, malware distributors have been known to simply ask a user to rename a file to turn it into one that is recognized by the operating system as an executable file (or a disk image, PDF, or whatever format the malware come in and may otherwise be filtered by the mail server or web browser.

  1. Matthew Tischer, Zakir Durumeric, et al., Users Really Do Plug in USB Drives They Find, University of Illinois,  ↩


Cryptography deals with encrypting plaintext using a cipher, also known as an encryption algorithm, to create ciphertext, which is unintelligible to anyone unless they can decrypt the ciphertext. It is a tool that helps build protocols that address:

Showing that the user really is that user.
Validating that the message has not been modified.
Binding the origin of a message to a user so that she cannot deny creating it.
Hiding the contents of a message.

A restricted cipher is one where the workings of the cipher must be kept secret. There is no reliance on any key and the secrecy of the cipher is crucial to the value of the algorithm. This has obvious flaws: people in the know leaking the secret, designers coming up with a poor algorithm, and reverse engineering.

For any serious use of encryption, we use well-tested, non-secret algorithms that rely on secret keys. A key is a parameter to a cipher that alters the resulting ciphertext. Knowledge of the key is needed to decrypt the ciphertext. Kerckhoffs’s Principle states that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge. We expect algorithms to be publicly known and all security to rest entirely on the secrecy of the key.

A symmetric encryption algorithm uses the same secret key for encryption and decryption.

An alternative to symmetric ciphers are asymmetric ciphers. An asymmetric, or public key cipher uses two related keys. Data encrypted with one key can only be decrypted with the other key.

Properties of good ciphers

For a cipher to be considered good, ciphertext should be indistinguishable from random values. Given ciphertext, there should be no way to extract the original plaintext or the key that was used to create it except by of enumerating over all possible keys. This is called a brute-force attack. The keys used for encryption should be large enough that a brute force attack is not feasible. Each additional bit in a key doubles the number of possible keys and hence doubles the search time.

Classic cryptography

Monoalphabetic substitution ciphers

The earliest form of cryptography was the monoalphabetic substitution cipher. In this cipher, each character of plaintext is substituted with a character of ciphertext based on a substitution alphabet (a lookup table). The simplest of these is the Cæsar cipher, known as a shift cipher, in which a plaintext character is replaced with a character that is n positions away in the alphabet. The key is the simply the the shift value: the number n. Substitution ciphers are vulnerable to frequency analysis attacks, in which an analyst analyzes letter frequencies in ciphertext and substitutes characters with those that occur with the same frequency in natural language text (e.g., if “x” occurs 12% of the time, it’s likely to really be an “e” since “e” occurs in English text approximately 12% of the time while “x” occurs only 0.1% of the time).

Polyalphabetic substitution ciphers

Polyalphabetic substitution ciphers were designed to increase resiliency against frequency analysis attacks. Instead of using a single plaintext-to-ciphertext mapping for the entire message, the substitution alphabet may change periodically. In the Alberti cipher (essentially a secret decoder ring), the substitution alphabet changes every n characters as the ring is rotated one position every n characters. The Vigenère cipher is a grid of Cæsar ciphers that uses a repeating key. A repeating key is a key that repeats itself for as long as the message. Each character of the key determines which Cæsar cipher (which row of the grid) will be used for the next character of plaintext. The position of the plaintext character identifies the column of the grid.

One-time Pads

The one-time pad is the only provably secure cipher. It uses a random key that is as long as the plaintext. Each character of plaintext is permuted by a character of ciphertext (e.g., add the characters modulo the size of the alphabet or, in the case of binary data, exclusive-or the next byte of the text with the next byte of the key). The reason this cryptosystem is not particularly useful is because the key has to be as long as the message, so transporting the key securely becomes a problem. The challenge of sending a message securely is now replaced with the challenge of sending the key securely. The position in the key (pad) must by synchronized at all times. Error recovery from unsynchronized keys is not possible. Finally, for the cipher to be secure, a key must be composed of truly random characters, not ones derived by an algorithmic pseudorandom number generator. The key can never be reused.

The one-time pad provides perfect secrecy (not to be confused with forward secrecy, also called perfect forward secrecy, which will be discussed later), which means that the ciphertext conveys no information about the content of the plaintext. It has been proved that perfect secrecy can be achieved only if there are as many possible keys as the plaintext, meaning the key has to be as long as the message.

Stream ciphers

A stream cipher simulates a one-time pad by using a keystream generator to create a set of key bytes that is as long as the message. A keystream generator is a pseudorandom number generator that is seeded, or initialized, with a key that drives the output of all the bytes that the generator spits out. The keystream generator is fully deterministic: the same key will produce the same stream of output bytes each time. Because of this, receivers only need to have the key to be able to decipher a message. However, because the keystream generator does not generate true random numbers, the stream cipher is not a true substitute for a one-time pad. Its strength rests on the strength of the key. A keystream generator will, at some point, will reach an internal state that is identical to some previous internal state and produce output that is a repetition of previous output. This also limits the security of a stream cipher but the repetition may not occur for a long time, so stream ciphers can still be useful for many purposes.

Rotor machines

A rotor machine is an electromechanical device that implements a polyalphabetic substitution cipher. It uses a set of disks (rotors), each of which implements a substitution cipher. The rotors rotate with each character in the style of an odometer: after a complete rotation of one rotor, the next rotor advances one position. Each successive character gets a new substitution alphabet applied to it. The multi-rotor mechanism allows for a huge number of substitution alphabets to be employed before they start repeating when the rotors all reach their starting position. The number of alphabets is cr, where c is the number of characters in the alphabet and r is the number of rotors.

Transposition ciphers

Instead of substituting one character of plaintext for a character of ciphertext, a transposition cipher scrambles the position of the plaintext characters. Decryption is the knowledge of how to unscramble them.

A skytale is an ancient implementation of a transposition cipher where text written along a strip of paper is wrapped around a rod and the resulting sequences of text are read horizontally. This is equivalent to entering characters in a two-dimensional matrix horizontally and reading them vertically. Because the number of characters might not be a multiple of the width of the matrix, extra characters might need to be added at the end. This is called padding and is essential for block ciphers, which encrypt chunks of data at a time.

Block ciphers

Most modern ciphers are block ciphers, meaning that they encrypt a chunk of bits, or block, of plaintext at a time. The same key is used to encrypt each successive block of plaintext.

AES and DES are two popular symmetric block ciphers. Symmetric block ciphers are usually implemented as iterative ciphers. The encryption of each block of plaintext iterates over several rounds. Each round uses a subkey, which is a key generated from the main key via a specific set of bit replications, inversions, and transpositions. The subkey is also known as a round key since it is applied to only one round, or iteration. This subkey determines what happens to the block of plaintext as it goes through a substitution-permutation (SP) network. The SP network, guided by the subkey, flips some bits by doing a substitution, which is a table lookup of an input bit pattern to get an output bit pattern and a permutation, which is a scrambling of bits in a specific order. The output bytes are fed into the next round, which applies a substitution-permutation step onto a different subkey. The process continues for several rounds (16 rounds for DES, 10–14 rounds for AES). and the resulting bytes are the ciphertext for the input block.

Feistel ciphers

A Feistel cipher is a form of block cipher where a block plaintext is split into two parts. The substitution-permutation round is applied to only one part. That output is then XORed with the other part and the two halves are swapped. At each round, half of the input block remains unchanged. DES, the Data Encryption Standard, is an example of a Feistel cipher. AES, the Advanced Encryption Standard, is not.


Two popular symmetric block ciphers are DES, the Data Encryption Standard, and AES, the Advanced Encryption Standard. DES was adopted as a federal standard in 1976 and is a block cipher based on the Feistel cipher that encrypts 64-bit blocks using a 56-bit key.

DES has been shown to have some minor weaknesses against cryptanalysis. Key can be recovered using 247 chosen plaintexts or 243 known plaintexts. Note that this is not a practical amount of data to get for a real attack. The real weakness of DES is not the algorithm but but its 56-bit key. An exhaustive search requires 255 iterations on average (we assume that, on average, the plaintext is recovered halfway through the search). This was a lot for computers in the 1970s but is not much for today’s dedicated hardware or distributed efforts.


Triple-DES (3DES) solves the key size problem of DES and allows DES to use keys up to 168 bits. It does this by applying three layers of encryption:

  1. C’ = Encrypt M with key K1
  2. C’’ = Decrypt C’ with key K2
  3. C = Encrypt C’’ with key K3

If K1, K2, and K3 are identical, we have the original DES algorithm since the decryption in the second step cancels out the encryption in the first step. If K1 and K3 are the same, we effectively have a 112-bit key and if all three keys are different, we have a 168-bit key.

Cryptanalysis is not effective with 3DES: the three layers of encryption use 48 rounds instead of 16 making it infeasible to reconstruct the substitutions and permutations that take place. DES is relatively slow compared with other symmetric ciphers, such as AES. It was designed with hardware encryption in mind 3DES is, of course, three times slower than DES.


AES, the Advanced Encryption Standard, was designed as a successor to DES and became a federal government standard in 2002. It uses a larger block size than DES: 128 bits versus DES’s 64 bits and supports larger key sizes: 128, 192, and 256 bits. Even 128 bits is complex enough to prevent brute-force searches.

No significant academic attacks have been found thus far beyond brute force search. AES is also typically 5–10 times faster in software than 3DES.

Block cipher modes

Electronic codebook (ECB)

When data is encrypted with a block cipher, it is broken into blocks and each block is encrypted separately. This leads to two problems. If different encrypted messages contain the same substrings and use the same key, an intruder can see that the same data is encrypted. Secondly, a malicious party can delete, add, or replace blocks (perhaps with random junk or perhaps with blocks that were captured from previous messages). This basic form of a block cipher is called an electronic codebook (ECB). Think of the codebook as a database of encrypted content. You can look up a block of plaintext and find the corresponding ciphertext.

Cipher block chaining (CBC)

Cipher block chaining (CBC) addresses these problems. Every block of data is still encrypted with the same key. However, prior to being encrypted, the data block is exclusive-ored with the previous block of ciphertext. The receiver does the process in reverse: a block of received data is decrypted and then exclusive-ored with the previously-received block of ciphertext to obtain the original data. The very first block is exclusive-ored with a random initialization vector, which must be transmitted to the remote side. Note that CBC does not make the encryption more secure; it simply makes the result of each block of data dependent on all previous previous blocks so that data cannot be inserted or deleted in the message stream.

Counter mode (CTR)

Counter mode (CTR) also addresses these problems but in a different way. The ciphertext of each block is a function of its position in the message. Encryption starts with a message counter. The counter is incremented for each block of input. Only the counter is encrypted. The resulting ciphertext is then exclusive-ored with the corresponding block of plaintext, producing a block of message ciphertext. To decrypt, the receiver does the same thing and needs to know the starting value of the counter as well as the key. An advantage of CTR mode is that each block has no dependance on other blocks and encryption on multiple blocks can be done in parallel.


The goal of cryptanalysis is break codes. Most often, it is to identify some non-random behavior of an algorithm that will give the analyst an advantage over an exhaustive search of the key space.

Differential cryptanalysis seeks to identify non-random behavior by examining how changes in plaintext input affect changes in the output ciphertext. It tries to find whether certain bit patterns are unlikely for certain keys or whether the change in plaintext results in likely changes in the output.

Linear cryptanalysis tries to create equations that attempt to predict the relationships between ciphertext, plaintext, and the key. An equation will never be equivalent to a cipher but any correlation of bit patterns give the analyst an advantage.

Neither of these methods will break a code directly but may help find keys or data that are more likely are that are unlikely. It reduces the keys that need to be searched.

Public key cryptography

Public key algorithm, also known as asymmetric ciphers, use one key for encryption and another key for decryption. One of these keys is kept private (known only to the creator) and is known as the private key. The corresponding key is generally made visible to others and is known as the public key.

Anything encrypted with the private key can only be decrypted with the public key. This is the basis for digital signatures. Anything that is encrypted with a public key can be encrypted only with the corresponding private key. This is the basis for authentication and covert communication.

Public and private keys are related but, given one of the keys, there is no feasible way of computing the other. They are based on trapdoor functions, which are one-way functions – there is no known way to compute the inverse unless you have extra data: the other key.

RSA public key cryptography

The RSA algorithm is the most popular algorithm for asymmetric cryptography. Its security is based on the difficulty of finding the factors of the product of two large prime numbers. Unlike symmetric ciphers, RSA encryption is a matter of performing arithmetic on large numbers. It is also a block cipher and plaintext is converted to ciphertext by the formula:

c = me mod n

Where m is a block of plaintext, e is the encryption key, and n is an agreed-upon modulus that is the product of two primes. Given the ciphertext c, e, and n, there is no efficient way to compute the inverse to obtain m. To decrypt the ciphertext, you need the decryption key, d:

m = cd mod n

Elliptic curve cryptography (ECC)

Elliptic curve cryptography (ECC) is a more recent public key algorithm that is an alternative to RSA. It is based on finding points along a prescribed elliptic curve, which is an equation of the form:

y2 = x3 + ax + b

Elliptic curves have nothing to do with ellipses or conic sections. Here, the security rests not our inability to factor numbers but our inability to perform discrete logarithms in a finite field.

The RSA algorithm is still the most widely used public key algorithm, but ECC has some advantages:

  • ECC can use far shorter keys for the same degree of security. Security comparable to 256 bit AES encryption requires a 512-bit ECC key but a 15,360-bit RSA key

  • ECC requires less CPU consumption and uses less memory than RSA.

  • Generating ECC keys is faster than RSA (but much slower than AES, where a key is just a random number).

On the downside, ECC is more complex to implement and encryption is slower than with RSA.

Secure communication

Symmetric cryptography

Communicating securely with symmetric cryptography is easy. All communicating parties must share the same secret key. Plaintext is encrypted with the secret key to create ciphertext and then transmitted or stored. It can be decrypted by anyone who has the secret key.

Asymmetric cryptography

Communicating securely with asymmetric cryptography is a bit different. Anything encrypted with one key can be decrypted only by the other related key. For Alice to encrypt a message for Bob, she encrypts it with Bob’s public key. Only Bob has the corresponding key that can decrypt the message: Bob’s private key.

Hybrid cryptography

Asymmetric cryptography, however, is considerably slower than symmetric cryptography. AES, for example, is approximately 1,500 times faster for decryption than RSA and 40 times faster for encryption. AES is also much faster than ECC. Key generation is far slower with RSA or ECC than it is with symmetric algorithms, where the key is just a random number rather than a set of carefully chosen numbers with specific properties. Moreover, certain keys with RSA may be weaker than others. Because of these factors, RSA (and ECC) is rarely used to encrypt large chunks of information. Instead, it is common to use hybrid cryptography, where a public key algorithm is used to encrypt a randomly-generated key that will encrypt the message with a symmetric algorithm. This randomly-generated key is called a session key, since it is generally used for one communication session and then discarded.

Key Exchange

The biggest problem with symmetric cryptography is key distribution. For Alice and Bob to communicate, they must share a secret key that no adversaries can get. However, Alice cannot send the key to Bob since it would be visible to adversaries. She cannot encrypt it because Alice and Bob do not share a key yet.

Key exchange using a trusted third party

For two parties to communicate using symmetric ciphers they need to share the same key. The ways of doing this are:

  1. Share the key via some trusted mechanism outside of the network, such are reading it over the phone or sending a flash drive via FedEx.

  2. Send the key using a public key algorithm.

  3. Use a trusted third party.

We will first examine the use of a trusted third party. A trusted third party is a trusted system that has everyone’s key. Hence, only Alice and the trusted party (whom we will call Trent) have Alice’s secret key. Only Bob and Trent have Bob’s secret key.

The simplest way of using a trusted third party is to ask it to come up with a session key and send it to the parties that wish to communicate. Foe example, Alice sends a message to Trent requesting a session key to communicate with Bob. This message is encrypted with Alice’s secret key so that Trent knows the message could have only come from Alice.

Trent generates a random session key and encrypts it with Alice’s secret key. He also encrypts the same key with Bob’s secret key. Alice gets both keys and passes the one encrypted for Bob to Bob. Now Alice and Bob have a session key that was encrypted with each of their secret keys and they can communicate by encrypting messages with that session key.

This simple scheme is vulnerable to replay attacks. An eavesdropper, Eve, can record messages from Alice to Bob and replay them at a later time. Eve might not be able to decode the messages but she can confuse Bob by sending him seemingly valid encrypted messages.

The second problem is that Alice sends Trent an encrypted session key but Trent has no idea that Alice is requesting to communicate with him. While Trent authenticated Alice (simply by being able to decrypt her request) and authorized her to talk with Bob (by generating the session key), that information has not been conveyed to Bob.

Needham-Schroeder: nonces

The Needham-Schroeder protocol improves the basic key exchange protocol by adding nonces to messages. A nonce is simply a random string – a random bunch of bits. Alice sends a request to Trent, asking to talk to Bob. This time, it doesn’t have to even be encrypted. As part of the request she sends a nonce.

Trent responds with a message that contains:

  • Alice’s ID
  • Bob’s ID
  • the nonce
  • the session key
  • a ticket: a message encrypted for Bob containing Alice’s ID and the same session key

This entire message is encrypted with Alice’s secret key. Alice can validate that the message is a response to her message because:

  • It is encrypted for her: nobody but Alice and Trent has Alice’s secret key.
  • It contains the same nonce as in her request, so it is not a replay of some earlier message.

Alice sends the ticket (the message encrypted with Bob’s key) to Bob. He can decrypt it and knows:

  • The message must have been generated by Trent since only Trent and Bob know Bob’s key.
  • That he will be communicating with Alice because Trent placed Alice’s ID in that ticket.
  • The session key since Trent placed that in the ticket too.

Bob can now communicate with Alice but he will first authenticate Alice to be sure that he’s really communicating with her. He’ll believe it’s Alice if she can prove that she has the session key. To do this, Trent creates another nonce, encrypts it with the session key, and sends it to Alice. Alice decrypts the message, subtracts one from the nonce, encrypts the result, and sends it back to Trent. She just demonstrated that she could decrypt a message using the session key and return back a known modification of the message. Needham-Schroeder is a combined authentication and key exchange protocol.

Denning-Sacco modification: timestamps to avoid key replay

One flaw in the Needham-Schroeder algorithm is when Alice sends the ticket to Bob. The ticket is encrypted with Bob’s secret key and contains Alice’s ID as well as the session key. If an attacker grabbed a communication session and managed to decrypt the session key, she can replay the transmission of the ticket to Bob. Bob won’t know that he received that same session key in the past. He will proceed to validate “Alice” by asking her to prove that she indeed knows the session key. In this case, Eve, our eavesdropper, does know it; that’s why she sent the ticket to Bob. Bob completes the authentication and thinks he is talking with Alice when in reality he is talking to Eve.

A fix for this is to add a timestamp to the ticket. When Trent creates the ticket that Alice will give to Bob, it is a message encrypted for Bob and contains Alice’s ID, the session key, and a timestamp.

When Bob receives a ticket, he checks the timestamp. If it is older than some recent time (e.g., a few seconds), Bob will simply discard the ticket, assuming that he is getting a replay attack.

Otway-Rees protocol: session IDs instead of timestamps

A problem with timestamps is that their use relies on all entities having synchronized clocks. If Bob’s clock is significantly off from Trent’s, he may falsely accept or falsely reject a ticket that Alice presents to him. Time synchronization becomes an attack vector for this protocol. If an attacker can change Bob’s concept of time, she may be able to convince Bob to accept an older ticket. To do this, she can create fake NTP (network time protocol) responses to force Bob’s clock to synchronize to a different value or, if Bob is paranoid and uses a GPS receiver to synchronize time, create fake GPS signals.

A way to avoid the replay of the ticket without using timestamps is to use a session ID with each message. The Otway-Rees protocol differs a bit from Needham-Schroeder but is conceptually very similar.

  1. Alice sends a message to Bob that contains a session ID, both of their IDs, and a message encrypted with Alice’s secret key. This message contains Alice and Bob’s IDs as well as the session ID.

  2. Bob sends Trent a request to communicate with Alice, containing Alice’s message as well as a message encrypted with his secret key that also contains the session ID.

  3. Trent now knows that Alice wants to talk to Bob since the session ID is inside her encrypted message and that Bob agrees to talk to Alice since that same session ID is inside his encrypted message.

  4. Trent creates a random session key encrypted for Bob and the same key encrypted for Alice and sends both of those to Bob, along with the session key.

The protocol also incorporates nonces to ensure that there is no replay attack on Trent’s response even if an attacker sends a message to Bob with a new session ID and old encrypted session keys (that were cracked by the attacker).


Kerberos is a trusted third party authentication, authorization, and key exchange protocol using symmetric cryptography and based closely on the Needham-Schroeder protocol with the Denning Sacco modification (the use of timestamps).

When Alice wands to talk with Bob (they can be users and services), she first needs to ask Kerberos. If access is authorized, Kerberos will send her two messages. One is encrypted with Alice’s secret key and contains the session key for her communication with Bob. The other message is encrypted with Bob’s secret key. Alice cannot read or decode this second message. It a ticket, also known as a sealed envelope. It contains the same session key that Alice received but is encrypted for Bob. Alice will send that to Bob. When Bob decrypts it, he knows that the message must have been generated by an entity that knows its secret key: Kerberos. Now that Alice and Bob both have the session key, they can communicate securely by encrypting all traffic with that session key.

To avoid replay attacks, Kerberos places a timestamp in Alice’s response and in the ticket. For Alice to authenticate herself to Bob, she needs to prove that she was able to extract the session key from the encrypted message Kerberos sent her. She proves this by generating a new timestamp, encrypting it with the session key, and sending it to Bob. Bob now needs to prove to Alice that he can decode messages encrypted with the session key. He takes Alice’s timestamp, adds one (just to permute the value), and sends it back to Alice, encrypted with their session key.

Since your secret key is needed to decrypt every service request you make of Kerberos, you’ll end up typing your password each time you want to access a service. Storing the key in a file to cache it is not a good idea. Kerberos handles this by splitting itself into two components that run the same protocol: the authentication server (AS) and the ticket granting server (TGS). The authentication server handles the initial user request and provides a session key to access the TGS. This session key can be cached for the user’s login session and allows the user to send requests to the TGS without re-entering a password. The TGS is the part of Kerberos that handles requests for services. It also returns two messages to the user: a different session key for the desired service and a ticket that must be provided to that service.

Diffie-Hellman key exchange

The Diffie-Hellman key exchange algorithm allows two parties to establish a common key without disclosing any information that would allow any other party to compute the same key. Each party generates a private key and a public key. Despite their name, these are not encryption keys; they are just numbers. Diffie-Hellman does not implement public key cryptography. Alice can compute a common key using her private key and Bob’s public key. Bob can compute the same common key by using his private key and Alice’s public key.

Diffie-Hellman uses the one-way function abmod c. Its one-wayness is due to our inability to compute the inverse: a discrete logarithm. Anyone may see Alice and Bob’s public keys but will be unable to compute their common key. Although Diffie-Hellman is not a public key encryption algorithm, it behaves like one in the sense that it allows us to exchange keys without having to use a trusted third party.

Key exchange using public key cryptography

With public key cryptography, there generally isn’t a need for key exchange. As long as both sides can get each other’s public keys from a trusted source, they can encrypt messages using those keys. However, we rarely use public key cryptography for large messages. It can, however, be used to transmit a session key. This use of public key cryptography to transmit a session key that will be used to apply symmetric cryptography to messages is called hybrid cryptography. For Alice to send a key to Bob:

  1. Alice generates a random session key.
  2. She encrypts it with Bob’s public key & sends it to Bob.
  3. Bob decrypts the message using his private key and now has the session key.

Bob is the only one who has Bob’s private key to be able to decrypt that message and extract the session key. A problem with this is that anybody can do this. Charles can generate a random session key, encrypt it with Bob’s public key, and send it to Bob. For Bob to be convinced that it came from Alice, she can encrypt it with her private key (this is signing the message).

  1. Alice generates a random session key.
  2. She signs it by encrypting the key with her private key.
  3. She encrypts the result with Bob’s public key & sends it to Bob.
  4. Bob decrypts the message using his private key.
  5. Bob decrypts the resulting message with Alice’s public key and gets the session key.

If anybody other than Alice created the message, the result that Bob gets by decrypting it with Alice’s public key will not result in a valid key for anyone. We can enhance the protocol by using a standalone signature (encrypted hash) so Bob can identify a valid key from a bogus one.

Forward secrecy

If an attacker steals, for example, Bob’s private key, he will be able to go through old messages and decrypt old session keys (the start of every message to Bob contained a session key encrypted with his public key). Forward secrecy, also called perfect forward secrecy, is the use of keys and key exchange protocols where the compromise of a key does not compromise past session keys. There is no secret that one can steal that will allow the attacker to decrypt multiple past messages.

Diffie-Hellman enables forward secrecy. Alice and Bob can each generate a key pair and send their public key to each other. They can then compute a common key that nobody else will know and use that to communicate. Achieving forward secrecy requires single-use (ephemeral) keys. Next time Alice and Bob want to communicate, they will generate a new set of keys and compute a new common key. At no time do we rely on long-term keys, such as Alice’s secret key or RSA private key. Encrypting a session key with a long-term key, such as Bob’s public key, will not achieve forward secrecy. If an attacker ever finds Bob’s private key, she will be able to extract the session key.

Difie-Hellman is good for for achieving forward secrecy because it is efficient to create new new key pairs on the fly. RSA or ECC keys can be used as well but key generation is far less efficient. Because of this, RSA keys tend to be used only for long-term keys (e.g., for authentication).

Message Integrity

One-way functions

A one-way function is one that can be computed relatively easily in one direction but there is no known way of computing the inverse function. One-way functions are crucial in a number of cryptographic algorithms, including digital signatures, Diffie-Hellman key exchange, and both RSA and elliptic curve public key cryptography. For Diffie-Hellman and public key cryptography, they ensure that someone cannot generate the corresponding private key when presented with a public key.

Hash functions

A particularly useful form of a one-way function is the cryptographic hash function. This is a one-way function whose output is always a fixed number of bits for any input. Hash functions are commonly used in programming to construct hash tables, which provide O(1) lookups of keys.

Cryptographic hash functions produce far longer results than those used for hash tables. Common lengths are 224, 256, 384, or 512 bits. Good cryptographic hash functions (e.g., SHA–1, SHA–2, SHA–3) have several properties:

  1. Like all hash functions, take arbitrary-length input and produce fixed-length output

  2. Also like all hash functions, they are deterministic; they produce the same result each time when given identical input.

  3. They exhibit pre-image resistance, or hiding. Given a hash H, it should not be feasible to find a message M where H=hash(M).

  4. The output of a hash function should not give any information about any of the input. For example, changing a byte in the message should not cause any predictable change in the hash value.

  5. They are collision resistant. While hash collisions can exist (the number of possible hashes is smaller than than number of possible messages; see the pigeonhole principle), it is not feasible to find any two different messages that hash to the same value. Similarly, it is not feasible to modify the plaintext without changing its resultant hash.

  6. They should be relatively efficient to compute. We would like to use hash functions as message integrity checks and generate them for each message without incurring signifficant overhead.

The cryptographic hash function is the basis for message authentication codes and digital signatures.

Because of these properties, we have high assurance that a message would no longer hash to the same value if it is modified in any way. The holy grail for an attacker is to be able to construct a message that hashes to the same value as another message. That would allow the attacker to substitute a new message for some original one (for example, redirecting a money transfer). Searching for a collision with a pre-image (known message) is much harder than searching for any two messages that produce the same hash. The birthday paradox tells us that the search for a collision of any two messages is approximately the square root of the complexity of searching for a collision on a specific message. This means that the strength of a hash function for a brute-force collision attack is approximately half the number of bits of the hash. A 256-bit hash function has a strength of approximately 128 bits.

Popular hash functions include SHA–1 (160 bits), SHA–2 (commonly 256 and 512 bits), and SHA–3 (256 and 512 bits).

Message Integrity and Hash Pointers

A cryptographic hash serves as a checksum for a message. If a message has been modified, it will yield a different hash. By associating a hash with a message, we have a basis for managing the integrity of that message: being able to detect if the message gets changed.

Tamper-resistant linked-lists: blockchains

One way of associating a hash with a message is via the use of hash pointers. Pointers are used in data structures to allow one data element to refer to another. In processes, a pointer is a memory location. In distributed systems, a pointer may be an IP address and object identifier. A hash pointer is a tuple that contains a traditional pointer along with the hash of the data element that is being pointed to. It allows us ot validate that the information being pointed to has not been modified.

The same structures that use pointers can be modified to use hash pointers and create tamper-evident structures. For example, a linked list can be constructed with each element containing a hash pointer to the next element instead of a pointer.


Adding a new block is easy. You allocate the block, copy the head hash pointer into it (the next pointer), and update the head hash pointer to point to the new block and contain a hash of that block.

If an adversary modifies, say, data block 1, we can detect that. The hash pointer in Data–2 will point to Data–1 but the hash of Data–1 will no longer match the hash in the pointer. For a successful attack, the adversary will also need to modify the hash value in the hash pointer in block 2. That will make the hash pointer in block 3 invalid, so that will need to be changed. The adversary will need to change all the hash pointers leading up to the head of the list. If we’re holding on to the head of the list (e.g., in a variable) so that the adversary cannot modify it, then we will always be able to detect tampering. A linked list using hash pointers is called a blockchain.

Merkle Trees

Another useful structure using hash pointers in place of conventional pointers is a binary tree, called a Merkle tree when implemented with hash pointers.

Merkle Tree
Merkle Tree

Leaf nodes of a Merkle tree contain conventional hash pointers: pointers to the data blocks and the hashes of those data blocks. Non-leaf child nodes contain left and right pointers along with the hash of the two hashes they point to. As with binary trees, Merkle trees give us the advantage of being able to locate data in O(log n) time instead of linear time. Similarly, we can validate any data in O(log n) time by traversing from the root down to the last hash pointer at the leaf.

Applications of blockchains and Merkle trees

With Merkle trees, the information in a leaf does not generally contain information to help you traverse the tree to search for data; we’re not building a search tree. The entire purpose of the tree is to make it efficient to manage and validate the integrity of the underlying data. That is, the hash pointer strucute is there just to allow you to validate the underlying data rather than search for stuff. If you want to search, you can add extra information to each node – in general, we’re not concerned with secrecy and you can build whatever search structures an application needs. Hash pointers are all about helping assess the integrity of data. Structures such as hash-pointer-based linked lists and Merkle trees were designed with peer-to-peer systems in mind where data can come from various untrusted peers. You just need to get the root hash from a trusted place.

The top-level pointer (the root in the case of a tree; the head in the case of linked lists) represents the integrity of the entire set of data. If any data block changes, that top level pointer will allow the user to detect that there has been a change. Therefore, it is important that this value be stored securely and obtained via a trustworthy mechanism.

What a Merkle tree allows you to do is check the integrity of replicated data on a branch-by-branch basis in an efficient manner. Merkle trees are designed for environments where data is replicated among multiple systems and you want each system to be able to validate the integrity of the entire file. This helps in two cases:

  1. You can validate downloaded data without having to wait for the entire set of data to be downloaded.

  2. You can efficiently compare your data with that on another system.

Suppose you have a file and want to check whether any blocks in your version are corrupted with respect to a version on another server. Both you and another system assembled your own structure of hash pointers.

With a linked list of hash pointers, you’d start at the head of the list and compare hashes. If the hashes match, you are confident that your files match. If you have a mismatch, you need to compare the next hash. If it matches what you have then you know that first block has been modified. If it doesn’t then you need to get the hash after that. Ultimately, you may need to traverse the entire list linearly.

With Merkle trees, it becomes easier to find the block (or blocks) that have changed. If the root hash matches, you know that your entire data set matches. If not, you request the left & right hashes and compare those with your tree. If one doesn’t match then you can compare the hashes under that subtree, iterating down the tree until you find the mismatched data block. You do not need to iterate through the entire list of blocks. This is attractive for replicated data sets where we have tens of millions of data blocks, for example, and sending a hash list is not efficient. It is essentially a tree search to find the block that is inconsistent.

Message Authentication Codes (MACs)

A cryptographic hash helps us ensure message integrity: it serves as a checksum that allows us to determine if a message has been modified. If the message is modified, it no longer hashes to the same value as before. However, if an attacker modifies a message, she may be able to modify the hash value as well. To prevent this, we need a hash that relies on a key for validation. This is a message authentication code, or MAC. Two forms of MACs are hash-based ones and block cipher based ones:

Hash-based MAC (HMAC):
A hash-based MAC uses a cryptographic hash function to hash the message and the key. Anyone who does not know the key will not be able to recreate the hash.
Block cipher-based MAC (CBC-MAC):
Recall that cipher block chaining assures us that every encrypted block is a function of all previous blocks. CBC-MAC uses a zero initialization vector and runs through a cipher block chained encryption, discarding all output blocks except for the last one, which becomes the MAC. Any changes to the message will be propagated to that final block and the same encryption cannot be performed by someone without the key.

Digital signatures

Message authentication codes rely on a shared key. Anybody who posesses the key can modify and re-sign a message. There is no assurance that the action was done by the author of the message. Digital signatures have stronger properties than MACs:

  1. Only you can sign a message but anybody should be able to validate it.
  2. You cannot copy the signature from one message and have it be valid on another message.
  3. An adversary cannot forge a signature, even after inspecting an arbitrary number of signed messages.

Digital signatures require three operations:

  1. Key generation: {private_key, verification_key } := gen_keys(keysize)
  2. Signing: signature := sign(message, private_key)
  3. Validation: isvalid := verify(message, signature, verification_key)

Since we trust hashes to be collision-free, it makes sense to apply the signature to the hash of a message instead of the message itself. This ensures that the signature will be a small, fixed size and makes it easy to embed in hash pointers and other structures.

There are several commonly-used digital signature algorithms:

DSA, the Digital Signature Algorithm
The current NIST standard that generates key pairs that are secure because of the difficulty of computing discrete logarithms.
ECDSA, Elliptic Curve Digital Signature Algorithm
A variant of DSA that uses elliptic curve cryptography
Public key cryptographic algorithms
RSA or Elliptic Curve Cryptography applied to message hashes.

All these algorithms generate public and private key pairs. The first two are not general-purpose encryption algorithms but are designed solely for digital signatures.

We saw how public key cryptography can be used to encrypt messages: Alice encrypts a message using Bob’s public key to ensure that only Bob could decrypt it with his private key. We can use public key backwards: Alice can encrypt a message using her private key. Anyone can decrypt the message using her public key but, in doing so, would know that the message was encrypted by Alice.

A digital signature can be constructed by simply encrypting the hash of a message with the creator’s (signer’s) private key. Anyone who has the message signer’s public key can decrypt the hash and thus validate the hash against the message. Other parties cannot recreate the signature.

Note that, with a MAC, the recipient or anyone in possession of the shared key can create the same MAC. With a digital signature, the signature can only be created by the owner of the private key. Unlike MACs, digital signatures provide non-repudiation – proof of identity. Alice cannot claim that she did not create a signature because nobody but Alice has her private key. Also unlike MACs, anyone can validate a signature since public keys are generally freely distributed. as with MACs, digital signatures also provide proof of integrity, assurance that the original message has not been modified.

Covert and authenticated messaging

We ignored the encryption of a message in the preceding discussion; our interest was assuring integrity. However, there are times when we may want to keep the message secret and validate that it has not been modified. Doing this involves sneding a signature of the message along with the encrypted message.

A basic way for Alice to send a signed and encrypted message to Bob is for her to use hybrid cryptography and:

1. Create a signature of the message. This is a hash of the message encrypted with her private key.
2. Create a session key for encrypting the message. This is a throw-away key that will not be needed beyond the communication session.
3. Encrypt the message using the session key. She will use a fast symmetric algorithm to encrypt this message.
4. Package up the session key for Bob: she encrypts it with Bob's public key. Since only Bob has the corresponding private key, only Bob will be able to decrypt the session key.
5. She sends Bob: the encrypted message, encrypted session key, and signature.

Anonymous identities

A signature verification key (e.g., a public key) can be treated as an identity. You posess the corresponding private key and therefore only you can create valid signatures that can be verified with the public key. This identity is anonymous; it is just a bunch of bits. There is nothing that identifies you as the holder of the key. You can simply assert your identity by being the sole person who can generate valid signatures.

Since you can generate an arbitrary number of key pairs, you can create a new identity at any time and create as many different identities as you want. When you no longer need an identity, you can discard your private key for that corresponding public key.

Identity binding: digital certificates

While public keys provide a mechanism for asserting integrity via digital signatures, they are themselves anonymous. We’ve discussed a scenario where Alice uses Bob’s public key but never explained how she can assert that the key really belongs to Bob and was not planted by an adversary. Some form of identity binding of the public key must be implemented for you to know that you really have my public key instead of someone else’s. How does Alice really know that she has Bob’s public key?

X.509 digital certificates provide a way to do this. A certificate is a data structure that contains user information (called a distinguished name) and the user’s public key. This data structure also contains a signature of the certification authority. The signature is created by taking a hash of the rest of the data in the structure and encrypting it with the private key of the certification authority. The certification authority (CA) is responsible for setting policies of how they validate the identity of the person who presents the public key for encapsulation in a certificate.

To validate a certificate, you would hash all the certificate data except for the signature. Then you would decrypt the signature using the public key of the issuer. If the two values match, then you know that the certificate data has not been modified since it has been signed. The challenge is how to get the public key of the issuer. Public keys are stored in certificates, so the issuer would have a certificate containing its public key. This certificate can be signed by yet another issuer. This kind of process is called certificate chaining. For example, Alice can have a certificate issued by the Rutgers CS Department. The Rutgers CS Department’s certificate may be issued by Rutgers University. Rutgers University’s certificate could be issued by the State of New Jersey Certification Authority, and so on. At the very top level, we will have a certificate that is not signed by any higher-level certification authority. A certification authority that is not underneath any other CA is called a root CA. In practice, this type of chaining is rarely used. More commonly, there are hundreds of autonomous certification authorities acting as root CAs that issue certificates to companies, users, and services. The certificates for many of the trusted root CAs are preloaded into operating systems or, in some cases, browsers. See here for Microsoft’s trusted root certificate participants and here for Apple’s trusted root certificates.

Every certificate has an expiration time (often a year or more in the future). This provides some assurance that even if there is a concerted attack to find a corresponding private key to the public key in the certificate, such a key will not be found until long after the certificate expires. There might be cases where a private key might be leaked or the owner may no longer be trustworthy (for example, an employee leaves a company). In this case, a certificate can be revoked. Each CA publishes a certificate revocation list, or CRL, containing lists of certificates that they have previously issued that should no longer be considered valid. To prevent spoofing the CRL, the list is, of course, signed by the CA. Each certificate contains information on where to obtain revocation information.

The challenge with CRLs is the TOCTTOU problem: not everyone may check the certificate revocation list in a timely manner and some systems may accept a certificate not knowing that it was revoked. Some systems, particularly embedded systems, may not even be configured to handle CRLs.