Exam 3 study guide

The one-hour study guide for exam 3

Paul Krzyzanowski

Latest update: Fri Mar 24 15:36:33 EDT 2017

Disclaimer: This study guide attempts to touch upon the most important topics that may be covered on the exam but does not claim to necessarily cover everything that one needs to know for the exam. Finally, don't take the one hour time window in the title literally.

File Systems for flash memory

There are a couple of differences between solid state flash memory and magnetic disk drives. NAND flash memory presents itself as a block-addressable device: data is read or written a page (block) at a time and each page is uniquely addressed. In this way, it is very similar to a magnetic disk drive. Because there is no moving disk head that must seek to a specific track to read or write a block, there is no seek latency with NAND flash, making the use of elevator algorithms pointless.

Unlike a disk, data cannot be written directly to a page of NAND flash: the contents of the page must be erased first, turning each write operation into an erase-write sequence. Unlike disk drives, each page of NAND flash can handle only a limited number of erase-write cycles before it wears out. This number is typically between 100,000 and 1,000,000. Because of this, a goal in NAND flash is wear leveling: distribute writes across all pages of the storage.

Dynamic wear leveling monitors high and low use areas of flash memory and at some point swaps high-use blocks (pages) with low-use blocks. Blocks that are not getting modified are not affected. A more aggressive form of wear leveling is static wear leveling where even unchanging (static) contents are periodically moved to other blocks to give all blocks a chance to wear out evenly. Wear leveling is implemented either in a NAND flash controller or in a layer of software between the flash device driver and the block device. This layer of software is called a Flash Translation Layer (FTL). The FTL maintains a block lookup table that maps logical to physical blocks so the block can be identified and located even after it is moved.
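
As a rough illustration of the FTL's role, here is a minimal sketch (in C) of a logical-to-physical block mapping table. The names (ftl_map, ftl_remap) and sizes are made up for illustration; a real FTL would also track erase counts, free blocks, and its wear-leveling policy.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_BLOCKS 1024
#define UNMAPPED   0xFFFFFFFFu

static uint32_t ftl_map[NUM_BLOCKS];    /* logical block number -> physical block number */

void ftl_init(void) {
    memset(ftl_map, 0xFF, sizeof(ftl_map));    /* every entry starts out as UNMAPPED */
}

uint32_t ftl_lookup(uint32_t logical) {        /* where does this logical block live now? */
    return ftl_map[logical];
}

void ftl_remap(uint32_t logical, uint32_t new_physical) {
    ftl_map[logical] = new_physical;           /* wear leveling moved the data; only the table entry changes */
}

int main(void) {
    ftl_init();
    ftl_remap(7, 123);                         /* pretend logical block 7 was moved to physical block 123 */
    printf("logical 7 -> physical %u\n", (unsigned)ftl_lookup(7));
    return 0;
}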

Log-structured file systems

To avoid reusing the same blocks over and over again, a log-structured file system is attractive for flash memory that does not have a controller that implements wear leveling. This is a file system where each file operation is written as an entry to a transaction log (e.g., “new file 111: name=ABC”, “new data block 0 for file 111: …”). One entry may make an older entry obsolete. When played from start to end, the log allows us to construct an up-to-date file system. Unlike traditional file systems, which implement a data structure to navigate among directories and file contents, the entire file system is stored simply as a sequence of log entries.

YAFFS is a popular log-structured file system. All data is written to a log as chunks. Each chunk is either a data chunk or an object header, which represents file metadata or a directory. New chunks may cause certain old chunks to become obsolete. If all the chunks in a block are no longer valid, the block can be erased and reused. Garbage collection is performed periodically to reclaim free blocks. Free blocks can also be created by copying live (in-use) chunks from blocks that contain mostly deleted chunks onto other blocks.

On startup, the YAFFS driver scans the log and constructs an in-memory view of the file system hierarchy (minus the data, which is still located in, and read from, flash). Checkpointing is the process of periodically writing out this file system hierarchy onto the file system. It speeds up mounts since the file system driver can read the last checkpoint and no longer needs to reconstruct the entire file system hierarchy from the first transaction.

Pseudo devices and special file systems

Pseudo devices

The classic view of a device driver is that of software that interfaces with some physical device and allows user processes to read, write, and otherwise control the device. The device, however, need not be a real device. Drivers can be written to create data streams for various purposes. A few examples of this are the null device (which discards any writes and always returns an end-of-file on reads), zero device (similar to the null device but always returns bytes containing values of 0), and the random device, which returns random numbers.

The loop device is a pseudo device that is given a file name (via an ioctl system call) and creates a block device interface to that file. Recall that a block device interface is one where the system can only read and write fixed-size blocks of data. File systems are formatted on top of block devices (they are usually disk drives or NAND flash memory). Since a file can now present itself as a block device, one can now create a file system within that file and mount it just as one would a physical disk or flash memory.
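
As a hedged sketch of how this attachment might be done programmatically on Linux, the LOOP_SET_FD ioctl binds a backing file to a loop device. The device name /dev/loop0 (assumed to be free) and the file name disk.img are assumptions, and error handling is trimmed.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/loop.h>

int main(void) {
    int loopfd = open("/dev/loop0", O_RDWR);   /* assumed to be a free loop device */
    int filefd = open("disk.img", O_RDWR);     /* the regular file that will back it */
    if (loopfd < 0 || filefd < 0) { perror("open"); return 1; }

    if (ioctl(loopfd, LOOP_SET_FD, filefd) < 0) {   /* bind the file to the loop device */
        perror("LOOP_SET_FD");
        return 1;
    }
    /* /dev/loop0 now behaves like a block device: it can be formatted with a file system and mounted. */
    return 0;
}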

Special file systems

Our classic view of a file system is that of a file system driver which lays out data structures on top of a block device and allows one to create, access, and delete files and directories. A file system driver can be more abstract, however. The process file system, for example, creates a file system view of processes and other aspects of the kernel. Each process is presented as a directory. Various aspects of the process, such as its command line, open files, memory map, signals, and statistics, are presented as files and directories within the process directory. A device file system presents all the kernel’s devices as device files so that one does not need to explicitly create and remove them from a /dev directory in a physical file system. FUSE is a kernel module that allows one to create user-level file systems. The VFS layer sends requests to FUSE that, in turn, sends them to a user-level process that interprets the requests as it sees fit.

Client-server Networking

Computers communicate with each other over a data communications network. In many cases, particularly over a wide area, multiple computers share the same communication channel (the same set of wires or radio spectrum). Figuring out how to do this is the job of Multiple Access Protocols. There are three approaches:

  1. Channel partitioning. Each communication session either gets a fixed set of time slots (this is called Time Division Multiplexing, or TDM) or a specific band of frequencies it can use (this is called Frequency Division Multiplexing, or FDM). The downside with this approach is that bandwidth is allocated even if the communication channel is idle.

  2. Taking turns. Various protocols enable one node to know that it is its turn to communicate. One way to do this is by having a network master that informs nodes when they can communicate. Another is a token passing protocol where a machine passes a token (special message) to its neighbor. Ownership of the token means that the system can communicate. These solutions have the danger of failing (lost token) or requiring the complexity of a master coordinator.

  3. Random access. Random access protocols rely on statistical multiplexing. That’s just a fancy way of saying that messages from multiple systems can be mixed together, one after another. There are no scheduled time slots. There is a danger of a collision occurring if two systems send messages at the same time so retransmission may be needed in these cases. It turns out that random access is the best of the three multiple access protocols in terms of using the capacity of communication links.

Two common ways of communicating on a network are circuit-switching and packet switching. A circuit-switched network uses channel partitioning and divides the network into short, fixed-length time slots and each node is allowed the use of only specific time slots. This is called Time Division Multiplexing (TDM). With circuit switching, a dedicated path (route) is established between two endpoints. This path is called a virtual circuit. Circuit switching provides guaranteed bandwidth and constant latency but does not use networking resources efficiently since time slots may go unused if a node has nothing to transmit. The public telephone network is an example of circuit switching, providing a maximum delay of 150 milliseconds and digitizing voice to a constant 64 kbps data rate. Packet-switched networking uses random access and variable-length time slots. Data is segmented into variable-size packets and each packet must be identified and addressed. These are called datagrams. Packet-switching generally cannot provide guaranteed bandwidth or constant latency. Ethernet is an example of a packet-switched network.

Data networking is usually implemented as layers — a stack of several protocols, each responsible for a specific aspect of networking. The OSI reference model defines seven layers of network protocols. Some of the more interesting ones are: the data link, network, transport, and presentation layers.

  • The data link layer (layer 2) transfers data within a physical local area network.
  • The network layer (layer 3) manages routing: the journey of packets from one machine to another possibly across multiple networks.
  • The transport layer (layer 4) manages the communication of data from one application to another rather than machine-to-machine delivery.
  • The presentation layer (layer 6) manages the representation of data and handles any necessary conversion of data types across architectures (for example, different byte ordering in integers, different character representations).

IP Networking

The Internet Protocol (IP) handles the interconnection of multiple local and wide-area networks. It is a logical packet-switched network whose data is transported by one or more physical networks (such as Ethernet, for example). In the OSI model, IP is a layer 3 (network) protocol, while ethernet is a layer 2 (data link) protocol. Each machine on an IP network must have a unique IP address. As IP is a logical overlay network that connects multiple physical networks together, the IP address is separate from, and unrelated to, the address of any underlying network (e.g., ethernet MAC address).

Since IP is a logical network, any machine that needs to send out IP packets must do so via the physical network. Typically, this is Wi-Fi or Ethernet, which uses a 48-bit Ethernet address that is unrelated to a 32-bit IP (or 128-bit IPv6) address. To send an IP packet out, the system needs to find the physical (Ethernet) destination address (the MAC address, or Media Access Control address) that corresponds to the desired IP destination. The Address Resolution Protocol, or ARP, accomplishes this. It works by broadcasting a request containing an IP address (do you know the corresponding MAC address for this IP address?) and then waiting for a response from the machine with the corresponding IP address. To avoid doing this for every outgoing packet, it maintains a cache of most recently used addresses.
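
A toy sketch of what an ARP cache lookup might look like follows; the structure layout, table size, and function names are illustrative only, not the kernel's actual implementation.

#include <stdint.h>
#include <stdio.h>

struct arp_entry {
    uint32_t ip;          /* IPv4 address */
    uint8_t  mac[6];      /* corresponding 48-bit Ethernet (MAC) address */
    int      valid;
};

#define ARP_CACHE_SIZE 64
static struct arp_entry arp_cache[ARP_CACHE_SIZE];

const uint8_t *arp_lookup(uint32_t ip) {
    for (int i = 0; i < ARP_CACHE_SIZE; i++)
        if (arp_cache[i].valid && arp_cache[i].ip == ip)
            return arp_cache[i].mac;            /* cache hit: no need to broadcast */
    return NULL;                                /* miss: broadcast an ARP request and wait for a reply */
}

int main(void) {
    arp_cache[0] = (struct arp_entry){ .ip = 0x0a000001, .mac = {0xde,0xad,0xbe,0xef,0x00,0x01}, .valid = 1 };
    printf(arp_lookup(0x0a000001) ? "cache hit\n" : "cache miss\n");    /* 10.0.0.1 */
    return 0;
}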

Humans aren’t happy working with IP addresses and prefer to use textual names instead (e.g., ls.cs.rutgers.edu). The Domain Name System, DNS, is a service that manages the mapping of hierarchical human-friendly names to their corresponding IP addresses. It is implemented as a worldwide set of distributed servers that refer queries to other servers as they traverse the hierarchy of a domain name.

There are two transport-layer protocols on top of IP: TCP and UDP. TCP (Transmission Control Protocol) simulates virtual circuit (connection-oriented) service. This layer of software ensures that packets arrive in order to the application and lost or corrupt packets are retransmitted. It permits bidirectional communication and tries to “play nice” on the network by lowering its bandwidth if it detects packet loss (which it assumes is due to network congestion). The transport layer keeps track of the destination so that the application can have the illusion of a connected data stream.

UDP (User Datagram Protocol) is a lightweight layer on top of IP that provides datagram (connectionless) service. UDP drops packets with corrupt data and does not request retransmits. It does not ensure in-order delivery or reliable delivery.

Port numbers in both TCP and UDP form the transport address and are used to allow the operating system to direct the data to the appropriate application endpoint (or, more precisely, to the socket that is associated with the communication stream). A source port number allows an application to know where to send return data.

Protocol encapsulation keeps the layers of the networking stack separate. A TCP or UDP packet, along with associated packet data, is treated simply as data within an IP packet. The IP header has an 8-bit protocol field that identifies the type of protocol that is encapsulated within it so that an IP driver can send the packet to the right transport-level protocol (e.g., to the TCP module). Likewise, an Ethernet packet (called a “frame”) treats the entire IP packet as data. It has a two-byte value that identifies the type of protocol that it is encapsulating so that an ethernet driver can send the data to the correct protocol module (e.g., to the IP driver).
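
A small illustrative sketch of this demultiplexing: the IP layer reads the 8-bit protocol field and hands the payload to the matching transport module. The values 6 and 17 are the standard protocol numbers for TCP and UDP; the handler functions here are trivial stand-ins.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

static void tcp_input(const uint8_t *data, size_t len) { (void)data; printf("to TCP: %zu bytes\n", len); }
static void udp_input(const uint8_t *data, size_t len) { (void)data; printf("to UDP: %zu bytes\n", len); }

void ip_demux(uint8_t protocol, const uint8_t *payload, size_t len) {
    switch (protocol) {
    case 6:  tcp_input(payload, len); break;    /* a TCP segment is encapsulated in this packet */
    case 17: udp_input(payload, len); break;    /* a UDP datagram is encapsulated in this packet */
    default: break;                             /* unknown protocol: drop the packet */
    }
}

int main(void) {
    uint8_t payload[128] = {0};
    ip_demux(6, payload, sizeof(payload));      /* pretend we received a TCP segment */
    return 0;
}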

Sockets

Sockets are the interface to the network that is provided to applications by the operating system. A socket is a communication endpoint that allows one application to communicate with another application. Writing data to a socket allows a corresponding socket on another application to receive that data. The interface is usually a data network between machines but doesn’t have to be — sockets can also be used for interprocess communication within the same system. Because sockets are designed to deal with application-to-application communication, they almost always interact at the transport layer of the OSI reference model (UDP/IP and TCP/IP if IP communications is used).

Sockets are created with the socket system call and are assigned an address and port number with the bind system call. For connection-oriented protocols, a socket on the server is set to listen for connections with the listen system call. The accept call blocks until a connection is received at a listening socket, at which point the server process is given a new socket dedicated to the new connection. A client can establish a connection with a server socket via the connect system call. After this, sending and receiving data is compatible with file operations: the same read and write system calls can be used. This is a key feature of sockets: once the network interface has been set up, the application can use file system I/O to communicate and does not have to think about networking. When communication is complete, the socket can be closed with the shutdown system call or the regular file system close system call.
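
Below is a minimal connection-oriented server sketch using these calls; the port number (5000) is arbitrary and error handling is omitted for brevity.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    int s = socket(AF_INET, SOCK_STREAM, 0);    /* create the endpoint */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5000);                /* arbitrary port number */

    bind(s, (struct sockaddr *)&addr, sizeof(addr));    /* assign the address and port */
    listen(s, 5);                               /* mark the socket as accepting connections */

    int conn = accept(s, NULL, NULL);           /* blocks until a client connects */

    char buf[256];
    ssize_t n = read(conn, buf, sizeof(buf));   /* the same calls used for files */
    if (n > 0)
        write(conn, buf, n);                    /* echo the data back to the client */

    close(conn);
    close(s);
    return 0;
}

A client would create its own socket, fill in the server's address, and call connect; after that it uses the same read and write calls.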

User processes interact with sockets either through socket-specific system calls (socket, listen, connect, etc.) or through file system calls (read, write, etc.). However, sockets are not implemented as a file system under the VFS layer of the operating system. The distinction between files and sockets occurs in the kernel data structure referenced by the socket’s file descriptor. Every socket has its own socket structure associated with it. This structure identifies the set of operations that can be performed on it as well as its data send and receive queues. The structure also identifies the networking protocol that is associated with it (e.g., TCP) so that it can queue outbound data for processing by the appropriate protocol layer.

The generic networking interface provided by sockets interacts with protocol-specific implementations. Just like the file system and device drivers, network protocols are modular and can be installed, registered, and removed dynamically.

We love layered protocols because they nicely partition the different things that need to happen in a networking stack. What we don’t love is moving data around. Allocating, copying, and freeing data as it moves between layers of the protocol stack will severely impact performance. Instead, a packet is placed in a socket buffer (sk_buff) as soon as it is created: either at the system call (for sending data) or at the device driver (for received data). Instead of copying data, a pointer to the socket buffer moves between the queues of each layer of the networking stack. When an application needs to transmit data, the socket layer allocates a socket buffer with enough extra space to hold the process’ data along with all the headers that will be needed as the packet goes down the protocol stack (e.g., TCP, IP, and ethernet headers). The data is copied from the user’s process into the socket buffer in kernel space. The next time it is copied is when all protocol processing is complete and the data, with all of its headers, moves from the socket buffer out to the device.

An abstract device interface sits below the networking layers and right above the actual device drivers. This interface contains generic functions for initializing devices, sending data and allocating socket buffers for received data. The network device driver (e.g., an Ethernet or 802.11b/g/n/ac driver) interacts with the physical network and is responsible for actually transmitting data out to the network and grabbing received data from the hardware. As with character and block device drivers, network device drivers are modules that can be inserted, registered, and deleted dynamically.

With fast networks, the rate at which packets come in can generate thousands to hundreds of thousands of interrupts per second. To avoid this, Linux NAPI (“New” API) disables network device interrupts when a packet is received and reverts to periodic polling. If, at some time, a poll yields no data then interrupts are re-enabled and polling is stopped. This avoids unnecessary polling when there is no incoming traffic. Once a network interrupt is serviced, interrupts are again disabled and we repeat the same process.

Remote Procedure Calls

One problem with the interface offered by sockets was that it provided a send-receive model of interaction. However, most programs use a functional (procedure call) model of interaction. Remote procedure calls are a programming language construct (something provided by the compiler), as opposed to an operating system construct, such as sockets. They provide the illusion of calling a procedure on a remote machine. During this time, execution of the local thread stops until the results are returned. The programmer is spared the work of packaging data, sending and receiving messages, and parsing results.

The illusion of a remote procedure call is accomplished by generating stub functions. On the client side, the client stub is a function with the same interface as the desired remote procedure. Its function, however, is to take the parameters, marshal them into a network message, send them to the server (using sockets), await a reply, and then unmarshal the results and return them to the caller. On the server side, the server stub (sometimes known as a skeleton) is responsible for being the main program that registers the service and awaits incoming requests for running the remote procedure. It unmarshals the data in the request, calls the user’s procedure, and marshals the results into a network message that is sent back to the recipient.

An Interface Definition Language (IDL) allows programmers to define remote interfaces. An rpc compiler takes the IDL as input and generates client and server stubs as output. The programmer does not have to deal with writing marshaling code or even any networking code. The generated client stub looks like the desired remote function except all it does is marshal the parameters and send a message to the server. The generated server listens for incoming messages and calls the requested server function. The operating system is not involved. It only knows about sockets. However, in some cases, client stubs can be used inside the kernel (see NFS, the Network File System, covered later).
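
Here is a hand-written sketch of what generated stubs boil down to, for a hypothetical remote function add(x, y). The message format and names are made up, and the “network” is simulated by calling the server dispatcher directly; a generated stub would instead marshal the message and send it over a socket.

#include <stdint.h>
#include <stdio.h>

struct message { int32_t op; int32_t args[2]; int32_t result; };

/* server side: the real procedure plus a dispatcher (the skeleton) */
static int32_t add(int32_t x, int32_t y) { return x + y; }

static void server_dispatch(struct message *m) {
    if (m->op == 1)                             /* opcode 1 means "add" */
        m->result = add(m->args[0], m->args[1]);
}

/* client side: the stub has the same interface as the remote function */
int32_t add_stub(int32_t x, int32_t y) {
    struct message m = { .op = 1, .args = { x, y }, .result = 0 };   /* marshal the parameters */
    server_dispatch(&m);                        /* stand-in for "send request, await reply" */
    return m.result;                            /* unmarshal and return the result */
}

int main(void) {
    printf("%d\n", (int)add_stub(2, 3));        /* the caller never sees the plumbing */
    return 0;
}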

Network attached storage

There are two basic models for implementing distributed file systems: the download/upload model or the remote access model. In the download/upload model, an entire file is downloaded to the local machine when that file is opened and, if modified, uploaded when closed. In the remote access model, individual requests are forwarded to the remote server as remote procedures.

In a stateful file system, the server maintains varying amounts of state about client access to files (e.g., whether a file is open, whether a file has been downloaded, cached blocks, modes of access). In a stateless file system, the server maintains no state about a client’s access to files. The design of a file system will influence the access semantics to files. Sequential semantics are what we commonly expect to see in file systems, where reads return the results of previous writes. Session semantics occur when an application owns the file for the entire access session, writing the contents of the file only upon close, thereby making the updates visible to others after the file is closed, and overwriting any modifications made by others prior to that.

Network file systems: examples

We examine three different network file systems. While they all provide users with the expected hierarchical file system view of remote files, they were designed with different goals. Sun’s NFS had a focus on fault-tolerance and rapid recovery from crashes due to its stateless design. CMU’s AFS was designed to enable long-term whole-file caching of files to reduce load on servers and support huge numbers of users. Microsoft’s SMB was designed with the goal of providing an identical experience to the local file system, even if it meant using a connection-oriented protocol and minimizing the amount of caching done at clients.

NFS

NFS was originally designed as a stateless, RPC-based model implementing commands such as read bytes, write bytes, link files, create a directory, and remove a file. Since the server does not maintain any state, there is no need for remote open or close procedures: these are used only on the client to keep track of the state locally. NFS works well in faulty environments: there’s no state to restore if a client or server crashes. To improve performance, regardless of how little data an application requests, a client requests remote data a large chunk at a time (8 KB by default, but negotiated based on network speeds) and performs read-ahead (fetching future blocks before they are needed). It also caches data for some period of time. NFS suffers from ambiguous semantics because the server, as well as other clients, has no idea what blocks the client has cached and the client does not know whether its cached blocks are still valid. The system checks modification times if there are file operations to the server but otherwise invalidates the blocks after a few seconds. File locking could not be supported because of NFS’s stateless design but was added through a separate lock manager that maintained the state of locks.

AFS

AFS was designed as an improvement over NFS to support file sharing on a massive scale. NFS suffered because clients would never cache data for a long time (not knowing if it would become obsolete) and had to frequently contact the server. AFS introduced the use of a partition on a client’s disk to cache large amounts of remote files for a long time: a model of whole file caching and long-term caching. It supports a file download-upload model. The entire file is downloaded on first access (whole file download) and uploaded back to the server after a close only if it was modified. Because of this behavior, AFS provides session semantics: the last one to close a modified file wins and other changes (earlier closes) are lost.

During file access, the client need never bother the server: it already has the file. When a client first downloads a file, the server makes a callback promise: it maintains a list of each client that has downloaded a copy of a certain file. Whenever it gets an update for that file, the server goes through the list and sends a callback to each client that may have a cached copy so that it can be invalidated on the client. The next time the client opens that file, it will download it from the server. Files under AFS are shared in units called volumes. A volume is just a directory (with its subdirectories and files) on a file server that is assigned a unique ID among the cell of machines. If an administrator decides to move the volume to another server, the old server can issue a referral to the new server. This allows the client to remain unaware of resource movement.

SMB

Microsoft’s Server Message Block protocol was designed as a proprietary connection-oriented, stateful file system with the goal of providing strong consistency, full support for locking as well as other features provided by the native file system rather than client caching and performance. While it does not use remote procedure calls, its access principle is the same: it uses a remote access model as opposed to whole file downloads. Requests (message blocks) are functional messages, providing file access commands such as open, create, rename, read, write, and close.

Protection & Security

Protection is the mechanism that provides and enforces controlled access of resources to users and processes. Security is the set of policies that define authorized access to a system.

The principle of least privilege is an approach to security that specifies that any entity (users, processes, functions) should be able to access only the resources that are essential to completing its task and no more. Privilege separation is a design where we segment a process into multiple parts, each granted only the privileges that it needs to operate. This allows us to partition a component that needs administrative access or needs to access a critical object away from other components and hence minimize any damage if the other components get compromised.

In the most general sense, an operating system is responsible for providing processes and threads with controlled and protected access to resources. The process scheduler is responsible for providing threads with access to the CPU; the memory manager and the MMU are responsible for giving processes protected access to their own memory address space; device drivers and the buffer cache provide access to peripheral devices; sockets provide access to the network; and the file system provides access to logical data on disks.

An operating system is responsible for ensuring that threads can access objects (devices, files, networks, etc.) only in accordance with security policies. Every thread operates in a protection domain, which defines what resources it is allowed to access. We can model the full system-wide set of access rights as an access matrix. Rows represent protection domains (e.g., users or group IDs under which a thread runs) and columns represent objects. The row, column intersection defines the access rights of a specific domain on a specific object. An access matrix is not an efficient structure to implement, so operating systems typically model it with access control lists. An access control list (ACL) is a column of an access matrix: a list of permissions for each domain that are assigned to a specific object. For example, a file may specify different read, write, or execute rights for different users. In Linux and most current operating systems, a domain is associated with a user.
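
A toy sketch of an ACL check follows; the permission bits, structures, and default-deny policy shown here are purely illustrative rather than any particular operating system's implementation.

#include <stdio.h>

#define PERM_READ  0x4
#define PERM_WRITE 0x2
#define PERM_EXEC  0x1

struct acl_entry { int uid; int perms; };

struct object {
    const char *name;
    struct acl_entry acl[4];    /* this object's column of the access matrix */
    int acl_len;
};

int allowed(const struct object *obj, int uid, int wanted) {
    for (int i = 0; i < obj->acl_len; i++)
        if (obj->acl[i].uid == uid)
            return (obj->acl[i].perms & wanted) == wanted;
    return 0;                   /* no entry for this domain: deny by default */
}

int main(void) {
    struct object f = { "report.txt", { { 100, PERM_READ | PERM_WRITE }, { 200, PERM_READ } }, 2 };
    printf("uid 200 write: %s\n", allowed(&f, 200, PERM_WRITE) ? "allowed" : "denied");
    return 0;
}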

A challenge with access control lists is that they may potentially be long and hence will not fit into a fixed-length inode structure. To address this, UNIX-derived systems provide a severely limited form of ACLs by having the inode contain read, write, and execute permissions for only the file owner’s ID, the file’s group ID, and everyone else. If a full ACL is used, it is stored as an extended attribute and uses additional disk block(s) apart from the inode.

Virtually all modern operating systems implement some form of access control lists. An alternate approach is a capability list, which represents a row of an access matrix: for each domain, an enumeration of all the objects in the system and the operations that the domain (user or group) can perform on them. Capabilities are typically implemented with items such as network services, where an authorization server may present a process with a capability token that it can pass on to an object and thereby prove that it is allowed to access the object.

The access control mechanisms available in most systems (and the ones we use most) follow a model called discretionary access control (DAC). This allows the thread to access objects to which it has permissions and also pass that information to other processes or write it to other objects. Access to objects is restricted based on the identity of users or groups. A mandatory access control (MAC) model is one where a centrally-controlled policy restricts the actions of domains on objects. Users cannot modify or override the policy. MAC models are often associated with the enforcement of a Multi-Level Secure (MLS) access model, of which the Bell-LaPadula model is the most widely used example. In this model, objects are classified in a security hierarchy: unclassified, confidential, secret, and top-secret. Users (domains) are also assigned a security clearance. The overall policy is no read up; no write down. This means that a user cannot read objects that belong to a higher clearance level than the user is assigned. The user also cannot write or modify objects that belong to a lower clearance level.

This model, however, has limited use outside of governments as those policies do not apply well to civilian life. Another form of MAC is the use of an application sandbox (see later discussion). This allows an administrator or domain owner to specify restrictions on a per-process basis regardless of the user ID (protection domain) under which the process is executing. The process cannot override these restrictions.

Cryptography

Cryptography deals with encrypting plaintext using a cipher (also known as an encryption algorithm) to create ciphertext, which is unintelligible to anyone unless they can decrypt the message.

A restricted cipher is one where the workings of the cipher must be kept secret. There is no reliance on a key and the secrecy of the cipher is crucial to the value of the algorithm. This has obvious flaws: people in the know leaking the secret, coming up with a poor algorithm, and reverse engineering. For any use of serious encryption, we use well-tested, non-secret algorithms that rely on secret keys.

A one-way function is one that can be computed relatively easily in one direction but there is no known way of computing the inverse function. One-way functions are crucial in a number of cryptographic algorithms, including digital signatures, Diffie-Hellman key exchange, and RSA public key cryptography. For RSA keys, they ensure that someone cannot generate the corresponding private key when presented with a public key. A particularly useful form of a one-way function is the hash function. This is a one-way function whose output is always a fixed number of bits for any input. For good cryptographic hash functions, it is highly unlikely that two messages will ever hash to the same value, it is extremely difficult to construct text that hashes to a specific value, and it is extremely difficult to modify the plaintext without changing its hash. This is called collision resistance. The hash function is the basis for message authentication codes and digital signatures. Note that when we talk about cryptography and use phrases such as “extremely difficult”, we mean “impossible for all practical purposes,” not that “you can do it if you spend an entire day or two working on the problem.”

Secure communication

To communicate securely using a symmetric cipher, both parties need to have a shared secret key. Alice will encode a message to Bob using the key and Bob will use the key to decode the message. If Alice wants to communicate with Charles, she and Charles will also need a secret key. The fact that every pair of entities will need a secret key leads to a phenomenon known as key explosion. Overall, in a system with n users, there will be O(n²) keys.

The biggest problem with symmetric cryptography is dealing with key distribution: how can Alice and Bob establish a key so they can communicate securely?

Using true public key cryptography, such as the RSA algorithm, if Alice encrypts a message with Bob’s public key, Bob will be the only one who can decrypt it since doing so will require Bob’s private key. Likewise, Bob can encrypt messages with Alice’s public key, knowing that only Alice will be able to decrypt them with her private key.

Session keys

A session key is a random key that is generated for encryption during a single communication session. It is useful because if the key is ever compromised, no lasting information is obtained: future communication sessions will use different keys. A hybrid cryptosystem uses public key cryptography to send a session key securely. The originator generates a random session key and encrypts it with the recipient’s public key. The recipient decrypts the message with the corresponding private key to extract the session key. After that, symmetric cryptography is used for communication, with messages encrypted with the session key. This has the advantages of higher performance (public key cryptography is much, much slower than symmetric cryptography) and ease of communicating with multiple parties (just encrypt the session key with the public keys of each of the recipients). It also allows the bulk of data to be encrypted with session keys instead of the hardly-ever-changing public keys. Generating a public-private key pair is also much, much slower than generating a symmetric key, which is just a random number.

Digital signatures

With public key cryptography, a digital signature is simply the act of encrypting a hash of a message with the creator’s private key. Anyone who has the public key can decrypt the hash and thus validate it against the message. Other parties cannot recreate the signature since they do not have the private key even though they can create the hash.

Authentication

The three factors of authentication are: something you have (such as a key or a card), something you know (such as a password or PIN), and something you are (biometrics). Combining these into a multi-factor authentication scheme can increase security against the chance that any one of the factors is compromised. For example, the use of a password combined with an access card is two-factor authentication. Using two passwords, however, is still single-factor authentication because the same factor is used twice.

Password Authentication Protocol (PAP)

The classic authentication method is the use of reusable passwords. This is known as the password authentication protocol, or PAP. With PAP, a password is associated with each user and stored in a file. The system asks you to identify yourself (login name) and then enter a password. If the password matches that which is associated with the login name on the system then you are authenticated.

One problem with the protocol is that if someone gets hold of the password file on the system, then they have all the passwords. The common way to thwart this is to store hashes of passwords instead of the passwords themselves. This takes advantage of one-way functions. To authenticate a user, check if


hash(password) == stored_hashed_password(user)

Even if someone gets hold of the password file, they won’t be able to reconstruct the original password from the hash. They’ll have to resort to an exhaustive search or a dictionary attack and try different passwords, searching for one that hashes to the value in the file.

Hashes are vulnerable to a dictionary attack with pre-computed hashes. An attacker can, ahead of time, create a table of a few million passwords and their corresponding hashes. Then, when confronted with a hash value, one simply searches the table for a matching hash to find the password that created it. A similar vulnerability exists if Alice notices that Bob’s password hash is the same as hers. She now knows that she and Bob share the same password.

Salt is extra data that is added to the password prior to hashing it. What salt does is change the input text and hence the resultant hash. When you created your password and stored its hash, the system generated a random salt to go along with it (and stored the value along with your identity and hashed password). Suppose your salt is “abc” and your password is “test123”. We now hash “test123abc” (or some such combination) and get a value that is NOT hash(“test123”). The attacker can see the salt value in the password file but there’s no way to remove its effect from the hash value. Now, the attacker needs a much, much larger set of data to account for all possible salt values. This will generally not be feasible for decent-sized salt values. For example, Apache and BSD use a salt that’s up to 8 characters. Even with just alphanumeric characters, each possible password can have over 200 trillion variations as an input to the hash function.
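
A toy sketch of the effect of a salt follows. The hash used here (djb2) is NOT a cryptographic hash and is used only to show that the same password with a salt appended produces a different stored value, so precomputed tables of hash(password) no longer match.

#include <stdint.h>
#include <stdio.h>

static uint32_t toy_hash(const char *s) {       /* djb2: illustrative only, not cryptographic */
    uint32_t h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

int main(void) {
    const char *password = "test123";
    const char *salt = "abc";                   /* stored in the clear alongside the hash */

    char salted[64];
    snprintf(salted, sizeof(salted), "%s%s", password, salt);

    printf("hash(password)      = %08x\n", (unsigned)toy_hash(password));
    printf("hash(password+salt) = %08x\n", (unsigned)toy_hash(salted));   /* different value */
    return 0;
}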

The other problem with reusable passwords is that, if a network is insecure, an eavesdropper may sniff the password from the network. A potential intruder may also simply observe the user typing a password. To thwart this, we turn to one-time passwords. If someone sees you type a password (or gets it from the network stream), it won’t matter because that password will be useless for future logins.

Challenge Handshake Authentication

One way to avoid the hazards of reusable passwords is never to send data over the network that an attacker can capture and replay to authenticate onto the system in the future. The Challenge Handshake Authentication Protocol, CHAP, does this by generating and sending a “challenge”, which is a random bunch of bits (more generally called a nonce). Both the client (e.g., user) and server share a secret key. The client generates a hash on data that includes the challenge and the secret key and sends the hash back to the server. The server can compute the same hash since it also has the challenge and knows the shared secret. If the computed hash matches the one it received, then authentication was successful. Anyone sniffing the network can get a challenge but will never see the shared secret and hence will not be able to generate a valid response.

challenge: nonce
response: hash(key, nonce)

SecurID®

RSA’s SecurID is a two-factor authentication system that generates one-time passwords to be used in response to a user login prompt. It relies on a user password (a PIN: Personal ID Number) and a small computing device called a token (an authenticator card, fob, or software). The token generates a new number every 30 seconds. The number is a function of the time of day and a seed: a number that is unique for each card. To authenticate to a server, you send a combination of your PIN and the number from the token in lieu of a password. A legitimate remote system will have your PIN as well as the token seed and will be able to compute the same value to validate your password. An intruder would not know both the PIN and the token’s seed and will never see the data on the network.

password: hash(PIN, time, seed)

Public key authentication

A nonce is a random bunch of bits that is generated on the fly. It is usually presented to the other party as a challenge to prove that they are capable of encrypting something with a specific key that they possess. The use of a nonce is central to public key authentication (e.g., the kind used in SSL, the secure sockets layer). If I send you a nonce and you encrypt it with your private key and give me the result, I can decrypt that message using your public key. If the decryption matches the original nonce, this will convince me that only you could have encrypted the message since you are the only one who has access to your private key.

challenge: nonce
response: E_k(nonce), where k is your private key

Digital certificates

While public keys simplify authentication (just decrypt this with my public key and you know that I was the only one who could have encrypted it), identity binding of the public key must be preserved. That is, you need to be convinced that my public key really is mine and not an impersonator’s. X.509 digital certificates provide a solution for this. A certificate is a data structure that contains user information and the user’s public key. This data structure also contains the identity of the certification authority (CA) and its signature. The signature is a hash of the certificate data that is encrypted using the certification authority’s private key. The certification authority is responsible for setting policies to validate the identity of the person who presents the public key for encapsulation in a certificate. You can validate a user’s certificate if you have the CA’s public key. To validate, you generate a hash of the certificate (minus its signature). You then decrypt the signature on the certificate using the CA’s public key. If the decrypted signature and your hash match, then you are convinced that no data in the certificate has been altered.

The CA’s public key is present in another certificate: the CA’s certificate. CA certificates for several hundred well-known CAs are pre-installed in places such as iOS’s Trust Store, OS X’s keychain, and Microsoft’s Trusted Root Certification Authorities store.

Security

Most security attacks on systems are not cryptographic: they do not find weaknesses in cryptographic algorithms and try to break ciphers. Most rely on bugs in software or the trust of individuals.

For protocols with no encryption that use a public (or sniffable) network, one can sniff the network traffic to extract logins and passwords (there’s a lot of software that makes this easy; snort, for example, is one of the oldest and most popular). If one cannot find a password for a user, one can try guessing it. If one can’t do that then one can try all combinations. An exhaustive search through every possible password may be time-prohibitive. A dictionary attack is one where you go through a dictionary of words and names and test them as potential passwords, applying common tricks such as prefixing and suffixing digits and substituting numbers that look like letters. Performing this attack becomes a lot easier if you are lucky enough to get the hashed password that you are searching for. Then you can perform the search on your own machine without going through the login service on the target system.

A social engineering attack is one where you convince a person to give you the necessary credentials. You might do this by posing as a system administrator and simply asking. A Trojan horse is a program that masquerades as a legitimate program and tricks you into believing that you are interacting with the trusted program. A common example is a program that masquerades as a login program and obtains an unsuspecting user’s login and password. Phishing is an attack in which email purporting to come from a trusted party (such as your bank) attempts to get the user to click on a link that will take them to what they believe is that party’s web site. In reality, it is an intruder’s site that is designed to look like the legitimate one. The goal here is also to collect data such as your login and password or perhaps your bank account, social security, or credit card numbers.

A buffer overflow bug is one where software expects to read a small amount of data into a fixed-size buffer but never checks to see whether the incoming data is bigger than the buffer. What ends up happening is that the software continues to append data to the buffer but is now writing into memory beyond that which is allocated to the buffer. If the buffer was declared as a local variable within a function, then its memory resides on the stack. At some point, the overflow data will clobber the return address for that function. A carefully crafted data stream can ensure that the return address gets modified with the address of other code in this same stream, which will get executed as soon as the function attempts to return. This technique is known as stack smashing.
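
Below is a minimal example of the kind of bug being described; the buffer size and the use of strcpy are just illustrative.

#include <stdio.h>
#include <string.h>

void greet(const char *name) {
    char buf[16];               /* fixed-size buffer on the stack */
    strcpy(buf, name);          /* no bounds check: this is the bug */
    printf("hello, %s\n", buf);
}

int main(int argc, char **argv) {
    if (argc > 1)
        greet(argv[1]);         /* an attacker controls the length and contents of the input */
    return 0;
}

Any argument longer than the buffer writes past it on the stack and eventually clobbers greet's saved return address.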

The first approach to guard against stack smashing was to turn off execute permission in the memory management unit (MMU) for memory pages that are allocated for the stack. For a long time this was not possible because neither AMD nor Intel MMUs supported per-page execute permissions. With this protection in place, an attacker cannot inject executable code onto the stack.

This defense was not sufficient, however. Since the return address can still be overwritten with a buffer overflow attack, an attacker can find some code within the program or any shared libraries that the program uses that performs a desired task and place that address as the return address on the stack. It is unlikely that there is a segment of code that will do everything the attacker needs. A clever enhancement of this technique led to return oriented programming, or ROP. Here, the attacker finds a sequence of useful bits of code in various libraries. All of the code must be at the tail end of functions with a return instruction at the end. Once the function returns, it “returns” to an address that is read from the stack and is under control of the attacker. The return address can then take the program to another snippet of code. All together, the attacker adds a sequence of return addresses on the stack to get each block of code to execute in the desired order.

Two additional defenses make this technique more difficult:

stack canaries
The compiler places a random value (called a canary) on the stack just before the region of data that is allocated to local variables, including buffers. A buffer overflow attack will overwrite this value before it overwrites the return address on the stack. Before returning from the function, the compiler adds code to check the value of the canary. If it is corrupt, the function will abort the program instead of returning.
address space layout randomization (ASLR)
The exact addresses where the stack, heap, and libraries are placed are randomized each time the program is loaded. This makes it impossible for the attacker to jump to a known address in a library. However, it requires that all libraries be compiled to generate position-independent code.

A rootkit is code that hides the presence of users or additional software from the user. Sometimes, this is done by replacing commands that would present this data (e.g., ps, ls, who, … on UNIX/Linux systems). In more sophisticated implementations, the rootkit modifies the operating system to intercept system calls.

The four A’s of security are:

  1. Authentication: the process of binding an identity to the user. Note the distinction between authentication and identification. Identification is simply the process of asking you to identify yourself (for example, ask for a login name). Authentication is the process of proving that the identification is correct.

  2. Authorization: given an identity, making a decision on what access the user is permitted. Authorization is responsible for access control.

  3. Accounting: logging system activity so that any breaches can be identified (intrusion detection) or a post facto analysis can be performed.

  4. Auditing: inspecting the software and system configuration for security flaws.

Sandboxes

A sandbox is an environment designed for running programs while restricting their access to certain resources, such as disk space, file access, amount of memory, and network connections. It allows users to set restriction policies so that they can run untrusted code with minimal risk of harming their system. It is a form of mandatory access control in that the process, even though it runs under the protection domain of the user, loses specific privileges and cannot regain them.

The simplest approach to sandboxing is the use of the chroot system call on Unix systems. This changes the root directory of the file system for a process from the real root to one that is specified by the parameter to chroot. For example, the call

chroot("/var/spool/postfix");

will set the root directory of the process to /var/spool/postfix. No files outside of this directory will be visible. This form of sandbox is known as a chroot jail. Even if an attacker compromises the application, she will not be able to access any files or run any programs outside the jail. One weakness is that if the attacker gets administrative privileges, she can use the mknod system call to create a device file for the disk that holds the root file system and re-mount the file system. This approach also does not restrict network operations.
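
Below is a hedged sketch of a service confining itself at startup: it enters the jail, changes its working directory into it, and then drops root so that the mknod escape described above is no longer possible. The directory, the UID/GID values, and the server path are placeholders.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    if (chroot("/var/spool/postfix") != 0) { perror("chroot"); exit(1); }   /* must be running as root */
    if (chdir("/") != 0)                   { perror("chdir");  exit(1); }   /* don't keep a working directory outside the jail */

    if (setgid(1001) != 0 || setuid(1001) != 0) {   /* give up root permanently; 1001 is a placeholder ID */
        perror("drop privileges");
        exit(1);
    }

    /* From here on, "/" refers to /var/spool/postfix and the process is unprivileged. */
    execl("/bin/server", "server", (char *)NULL);   /* placeholder path; must exist inside the jail */
    perror("execl");
    return 1;
}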

A rule-based sandbox provides the ability to set finer-grained policies on what an application can and cannot do. For example, an application may be disallowed from reading the file /etc/passwd; or be disallowed from writing any files; or not be allowed to establish a TCP/IP connection. Sandboxing is currently supported on a wide variety of platforms at either the kernel or application level. One example is the Apple Sandbox on OS X. It allows a detailed set of policies to be enumerated that govern networking and file system reads and writes. These policies can include patterns to allow one to restrict file operations to specific directories or to files matching certain names. These policies are parsed and loaded into the kernel. The kernel runs the TrustedBSD subsystem. This subsystem was originally designed to enforce mandatory access control policies but in this case passes requests to a kernel extension that goes through the list of rules to determine whether a specific instance of a system call should be allowed or not.

A popular example of a sandbox that operates outside of the operating system is the Java Virtual Machine, initially designed for running Java applets: compiled Java programs that would be loaded and run dynamically upon fetching a web page. The Java sandbox has three parts to it:

  1. The byte-code verifier tries to ensure that the code looks like valid Java byte code with no attempts to circumvent access restrictions, convert data illegally, or forge pointers.

  2. The class loader enforces restrictions on whether a program is allowed to load additional applets and ensures that an applet is not allowed to access classes belonging to other applets.

  3. The security manager is invoked to provide run-time verification of whether a program has rights to invoke certain methods, such as file I/O or network access.

Signed Software

Signed software allows an operating system to check the signature associated with an executable file to ensure that the file’s contents have not been modified since they left the publisher (for example, a virus is not attached to it). Recall that a signature is an encrypted hash. The basic idea of signing software is to do the same as we would for signing any other digital data: generate a hash of the entire package, encrypt it with the software publisher’s private key, and attach it to the software as a signature. To validate the signature, you would compute the hash of the program and compare it with the decrypted signature. You would use the software publisher’s public key to decrypt the signature. This key would be present in a digital certificate that is obtained from that publisher.

For example, Apple OS X’s Gatekeeper allows users to ensure that only apps from the Apple App Store or trusted publishers are allowed to run. All 64-bit versions of Microsoft Windows require that all kernel software, including drivers, must have an accompanying digital signature in order to be loaded.

Computing a hash for software before we run it would involve scanning through the entire code base. Demand-paged virtual memory systems load pages of the program as needed, so this would greatly increase the startup time of the software. Instead, signed software will often contain a signature for every memory page. When a particular page is loaded, the operating system would check the signature for that page.

Virtualization

As a general concept, virtualization is the addition of a layer of abstraction to physical devices. With virtual memory, a process has the impression that it owns the entire memory address space. Different processes can all access the same virtual memory location and the memory management unit (MMU) on the processor maps each access to the unique physical memory locations that are assigned to the process.

With storage virtualization, a computer gets a logical view of disks connected to a machine. In reality, those disks may be networked to a computer via a fibre channel switch or ethernet interface and may be parts of physical disks or collections of disks that appear to the computer as one disk.

CPU virtualization allows programs to execute on a machine that does not really exist. The instructions are interpreted by a program that simulates the architecture of the pseudo machine. Early pseudo-machines included o-code for BCPL and P-code for Pascal. The most popular pseudo-machine today is the Java virtual machine.

Machine virtualization allows a physical computer to act like several real machines with each machine running its own operating system (on a virtual machine) and applications that interact with that operating system. The key to machine virtualization is to not allow each guest operating system to have direct access to certain privileged instructions in the processor. These instructions would allow an operating system to directly access I/O ports, MMU settings, the task register, the halt instruction and other parts of the processor that could interfere with the processor’s behavior and with other operating systems. Instead, such instructions as well as system interrupts are intercepted by the Virtual Machine Monitor (VMM), also known as a hypervisor. The hypervisor arbitrates access to physical resources and presents a set of virtual device interfaces to each guest operating system (including the memory management unit, I/O ports, disks, and network interfaces).

Two configurations of virtual machines are hosted virtual machines and native virtual machines. With a hosted virtual machine, the computer has a primary operating system installed that has access to the raw machine (all devices, memory, and file system). One or more guest operating systems can be run on virtual machines. The VMM serves as a proxy, converting requests from the virtual machine into operations that get executed on the host operating system. A native virtual machine is one where there is no “primary” operating system that owns the system hardware. The hypervisor is in charge of access to the devices and provides each operating system drivers for an abstract view of all the devices.

The latest processors from Intel and AMD support the concept of a virtual machine layer and the ability to intercept privileged instructions. Prior to that, one of two approaches was used to implement virtualization.

Binary translation pre-scans the instruction stream of any code that has to run in privileged mode and replaces all privileged instructions with interrupts to the VMM. Paravirtualization requires modifying the operating system to replace privileged instructions with calls to a VMM API. This, of course, requires access to the source code of the operating system.