pk.org: CS 417/Lecture Notes

Network-Attached Storage

Distributed File Systems, Caching, and Consistency

Paul Krzyzanowski – 2026-03-03

In the early days of networked computing, sharing data meant either logging in to the remote machine and working there, or explicitly copying files back and forth with tools like FTP. Both approaches put the burden of managing data locations on the user. If you moved a file to a different server, everyone who depended on it had to update their scripts and habits. This remains common in many environments today, where users log in to another system (via a command like ssh or a virtual desktop), copy files via scp or sftp, or transfer content via email, text, or a removable flash drive.

The goal of network-attached storage (NAS) is access transparency: remote files should look and behave like local files. Processes should be able to open, read, and write files on a remote server using the same system calls they use for local files, without knowing or caring where the storage physically lives. The applications do not change.

In this discussion, we cover the major architectural approaches to remote file access: NFS and SMB, which represent opposite design philosophies from the same era; AFS, which solved NFS’s scalability problem through whole-file caching and a uniform global namespace; and Coda, which extended AFS for disconnected operation. We then look at how NFS and SMB each evolved to absorb the better ideas from those systems.


Access Transparency and the Virtual File System

Achieving access transparency requires tricking the operating system into treating remote files the same way it treats local ones. The mechanism that makes this possible is the Virtual File System (VFS) layer, first introduced in SunOS in 1985 and since adopted widely across Unix and Unix-like operating systems (Linux, macOS, *BSD). Windows has a conceptually similar abstraction layer via the Installable File System (IFS) driver interface.

Before VFS, an operating system had one file system implementation compiled in. Supporting a second type of file system (say, a different disk format from another vendor) required modifying the kernel. VFS solved this by introducing a clean abstraction layer: a standard interface of file system operations (open, read, write, close, stat, and so on) that any file system driver must implement. The kernel always talks to this interface; the driver underneath handles the specifics.

This makes supporting multiple file system types straightforward. Linux can simultaneously mount an ext4 partition, a FAT-formatted USB drive, and a ZFS pool because each has a driver that implements the VFS interface. The kernel does not need to know which is which.

Mount points are how the VFS layer stitches these different file systems into a single unified directory tree. When you mount a device or a remote share, you attach it at some path in the existing tree. Accessing /mnt/data/report.txt works exactly the same whether /mnt/data is a local disk, a USB stick, or a remote server. The VFS layer intercepts the path lookup and dispatches it to whichever driver owns that part of the tree.

A remote file system client is just another VFS driver. Instead of issuing disk read and write commands, it translates VFS operations into network requests and sends them to a server. When the response arrives, it returns the result back up through the VFS interface to the waiting application. From the application’s perspective, nothing is different. The path /home/paul/paper.tex is just a path, regardless of which server holds it. This is how NFS, AFS, and SMB all achieve their access transparency.
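
The dispatch idea can be sketched in a few lines of Python. This is an illustrative simplification, not a real kernel API: the class names, the longest-prefix mount lookup, and the single read() operation are all stand-ins for vnode tables and full operation sets.

```python
# Minimal sketch of a VFS-style dispatch layer. Each "driver" implements the
# same interface; the VFS picks the driver whose mount point is the longest
# prefix of the requested path.

class LocalFS:
    def __init__(self, files):
        self.files = files          # path (relative to mount) -> bytes
    def read(self, relpath):
        return self.files[relpath]

class RemoteFS:
    """Stands in for a network client: same interface, different backend."""
    def __init__(self, server):
        self.server = server        # hypothetical server object
    def read(self, relpath):
        return self.server.fetch(relpath)   # would be an RPC in reality

class VFS:
    def __init__(self):
        self.mounts = {}            # mount point -> driver
    def mount(self, point, driver):
        self.mounts[point] = driver
    def read(self, path):
        # Longest-prefix match decides which driver owns this path.
        point = max((m for m in self.mounts if path.startswith(m)), key=len)
        return self.mounts[point].read(path[len(point):].lstrip("/"))

class FakeServer:
    def fetch(self, relpath):
        return b"remote:" + relpath.encode()

vfs = VFS()
vfs.mount("/", LocalFS({"etc/motd": b"hello"}))
vfs.mount("/mnt/data", RemoteFS(FakeServer()))

# The caller issues the same call either way; it cannot tell which backend
# served the request.
local = vfs.read("/etc/motd")              # served by the local driver
remote = vfs.read("/mnt/data/report.txt")  # served by the "remote" driver
```

The application-facing call is identical in both cases, which is the whole point of the abstraction.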


Design Considerations

Before studying specific systems, it is worth understanding the design dimensions that all networked file systems must navigate.

Consistency

If two clients cache the same file and one modifies it, what does the other see? In a local file system, this is solved by the operating system kernel, which has a single coherent buffer cache. In a distributed file system, you have multiple independent caches on separate machines with no shared memory. Keeping them consistent requires either frequent communication with the server or a protocol where the server actively notifies clients of changes.

State

State is any information the server stores about what clients are doing. A completely stateless server stores nothing about client activity between requests. Every request must carry enough context to be executed independently. A stateful server tracks which clients have which files open, which locks are held, and what data each client has cached.

Stateless servers make crash and restart behavior much easier to support: there is nothing to recover. But statelessness makes it impossible for the server to coordinate activity between clients. For example, the server cannot tell you whether another client holds a lock, because it does not track locks. It also cannot inform a client that its cached data is no longer valid, because it does not track what clients have cached.

Stateful servers enable richer semantics but must handle the recovery problem: after a crash and restart, what happens to clients that had state (open files, locks) with the old server instance? The server must either restore that state from persistent storage, negotiate a recovery protocol with clients to rebuild it, or declare all previous state invalid and force clients to re-establish their sessions. Each approach has costs. Stateful servers must also absorb the overhead of tracking information about every client that uses the service.

Caching

Accessing remote files will usually be slower than accessing local files. There is the overhead of the network, of the extra traversal through the local kernel, and of processing the request on the file server, which may itself be heavily loaded.

Caching reduces latency and server load by keeping copies of recently accessed data close to the user. The challenge is keeping those copies consistent. Options include:

Write-through: Modifications are sent to the server immediately, keeping the server always up to date. Other clients still need to learn about the change, either by checking with the server or by receiving an invalidation.

Write-behind (delayed write): Modifications are accumulated locally and sent to the server later. More efficient in bulk, but the server may have stale data in the interim.

Write-on-close (session semantics): All modifications are sent when the file is closed. Efficient for many workloads, but if two clients are both modifying the same file, the last one to close it overwrites the other’s changes.

Callbacks: The server keeps track of which clients have cached a file. When the file is modified, the server proactively sends invalidation messages to those clients. This is more efficient than polling but requires the server to maintain per-file lists of which clients hold cached copies – it is inherently a stateful mechanism. A stateless server cannot send callbacks because it does not know who has cached what.

On the read side, clients commonly use read-ahead: fetching blocks beyond what the application has explicitly requested, on the assumption that sequential access is likely. This hides network latency for streaming reads and is effective for workloads that scan files sequentially.
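
The difference between the first two write policies can be made concrete with a toy sketch. The class names and interfaces below are invented for illustration; a real client cache operates on blocks and handles failures, which this ignores.

```python
# Sketch of two write policies against a toy "server" (a dict standing in
# for server-side storage).

class Server:
    def __init__(self):
        self.data = {}

class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}
    def write(self, name, data):
        self.cache[name] = data
        self.server.data[name] = data      # server updated immediately

class WriteBehindCache:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()
    def write(self, name, data):
        self.cache[name] = data            # buffered locally
        self.dirty.add(name)               # server is stale until flush
    def flush(self):
        for name in self.dirty:
            self.server.data[name] = self.cache[name]
        self.dirty.clear()

srv = Server()
wt = WriteThroughCache(srv)
wt.write("a.txt", b"v1")
on_server_now = srv.data.get("a.txt")      # b"v1": visible at once

wb = WriteBehindCache(srv)
wb.write("b.txt", b"v1")
stale = "b.txt" not in srv.data            # True: server has not seen it yet
wb.flush()
flushed = srv.data.get("b.txt")            # b"v1": visible only after the flush
```

Write-on-close is write-behind with the flush tied to close(), and callbacks are what let other clients learn about either kind of update without polling.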

Security

How does the server know who is making a request? This is not a trivial question when client and server are separate machines. A network file system must decide how much it trusts the identity claims in each request.

The original NFS approach was to trust the user ID and group IDs that the client included in each RPC call. This is simple but completely insecure: any user with sufficient privilege on their client machine can claim to be any other user on the server. In a closed, trusted network of a few dozen machines all administered by the same team, this was acceptable. In a large or open environment, it is not.

More secure systems authenticate the client’s identity through a trusted third party (typically Kerberos), so the server can verify that the claimed identity is genuine. NFSv4 made this mandatory. A networked file system’s security model is a design choice with real consequences for what environments the system can safely be deployed in.

Access Models

There are two fundamental approaches to remote file access.

In the remote access model, the server provides a functional interface: read N bytes at offset M, write a chunk of data, create a file. The client sends fine-grained requests for specific file operations, and the server responds. Sending requests to the server helps ensure that the server always has the authoritative copy of the data. However, latency is higher since every file operation involves a network round trip.

In the upload/download model, the entire file is downloaded to the client when the file is opened (often fetching it in chunks as needed). The client then works with its local copy. On close, if the file was modified, it is uploaded back to the server. File access during the session is local and fast. The tradeoffs are that downloading an entire large file when you only need a few bytes is wasteful, and concurrent modifications by multiple clients lead to last-write-wins conflicts.

Access Semantics

The access model a system uses determines what consistency guarantees applications can rely on. It is worth being explicit about this before looking at specific systems.

Unix (POSIX) semantics define the ideal: a write by one process is immediately visible to all other processes reading the same file. The local kernel enforces this through a single shared buffer cache. Every reader sees the most recent write, always.

Networked file systems cannot provide this without a network round trip on every read – or an always-up-to-date push from the server on every write. Neither is practical at scale.

Close-to-open consistency (used by NFS) is a pragmatic relaxation. The server’s version is consulted when a file is opened, but reads between opens may return stale cached data. Two clients can simultaneously hold stale copies of the same file and diverge. The inconsistency resolves on the next open.

Session semantics (used by AFS and Coda) go further: a client’s modifications are not visible to any other client until the file is closed and the updated version is uploaded. Within a single open session, the client works entirely from its local copy. If two clients modify the same file during overlapping sessions, the last one to close wins. This is weaker than close-to-open consistency in one sense – updates are invisible longer – but it enables the aggressive caching that makes AFS fast.

Understanding which semantics a system provides is essential for reasoning about whether an application will behave correctly when run against that system.


NFS: Simple, Stateless, Interoperable

Motivation

NFS (Network File System) was developed at Sun Microsystems and first described publicly in 1985. Sun’s goal was not to build a perfect distributed file system. It was to build a simple, vendor-neutral file sharing protocol that any Unix system (and many other operating systems) could implement quickly. The design goals were deliberately conservative:

  1. Cross-platform interoperability: Any networked system should be able to act as an NFS client or server, not just Unix. NFS was built on Sun’s ONC RPC (Open Network Computing RPC) and XDR (External Data Representation) for encoding, both of which Sun published openly. The protocol was quickly ported to DOS, VMS, OS/2, VM/CMS, and other platforms, in both client and server roles. Interoperability was tested at annual “Connectathon” events starting in 1986.

  2. Statelessness: If the server crashes and reboots, clients should be able to continue without any recovery protocol. All necessary context arrives with each request. This made simple request retransmission feasible and made UDP a natural early choice: if a request is lost, the client can retry it, and many operations are safe to reissue. Later deployments commonly used TCP as networks and workloads changed.

  3. Simplicity: The protocol should be a minimal RPC interface, exposing only the operations necessary for basic file access.

An important consequence of these goals is that NFS was explicitly not designed to preserve all Unix file system semantics. It was designed to be implementable on a wide variety of backends, some of which did not support Unix semantics at all (DOS, VMS, and others implemented NFS servers). Where a tradeoff arose between full semantic fidelity and interoperability, NFS chose interoperability.

The NFSv2 RPC Interface

The simplicity of NFS’s original protocol is best understood by looking at its complete list of remote procedures from RFC 1094:

NULL – No-op; used to test server reachability
GETATTR – Get file attributes (size, timestamps, type, permissions)
SETATTR – Set file attributes
LOOKUP – Look up a filename in a directory; return a file handle
READLINK – Read the target path of a symbolic link
READ – Read bytes from a file at a given offset
WRITE – Write bytes to a file at a given offset
CREATE – Create a new file
REMOVE – Remove a file
RENAME – Rename a file
LINK – Create a hard link
SYMLINK – Create a symbolic link
MKDIR – Create a directory
RMDIR – Remove a directory
READDIR – Read entries from a directory
STATFS – Get file system statistics (total blocks, free blocks, etc.)

There is no OPEN, no CLOSE, no LOCK, no SEEK, and no APPEND. These operations would all require the server to maintain state.

The file handle is an opaque identifier the server returns from LOOKUP or CREATE. The client uses it in subsequent requests. It persists across server restarts (the server encodes enough information in the handle to reconstruct which file it refers to), which is what makes the stateless model work.
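
The self-contained nature of each request can be sketched as follows. The field names are illustrative, not the actual XDR layout from RFC 1094; the point is that the handle plus an explicit offset gives the server everything it needs, with no session table to consult.

```python
# Sketch of why statelessness works: every request carries complete context
# (an opaque file handle plus an explicit offset), so the server keeps no
# memory of earlier requests and a reboot loses nothing.

from dataclasses import dataclass

@dataclass(frozen=True)
class ReadRequest:
    fhandle: bytes   # opaque handle from a prior LOOKUP; stable across reboots
    offset: int      # the client tracks its own position; there is no server-side seek
    count: int

class StatelessServer:
    def __init__(self, files_by_handle):
        self.files = files_by_handle
    def read(self, req):
        # Nothing is looked up from a session table: the request is enough.
        data = self.files[req.fhandle]
        return data[req.offset:req.offset + req.count]

server = StatelessServer({b"fh-42": b"hello, nfs"})
r1 = server.read(ReadRequest(b"fh-42", 0, 5))

# A "rebooted" server with the same backing files answers identically,
# because the handle itself encodes which file is meant.
server = StatelessServer({b"fh-42": b"hello, nfs"})
r2 = server.read(ReadRequest(b"fh-42", 7, 3))
```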

What NFS Cannot Do Well

Because NFS was stateless and kept its RPC set minimal, several Unix file access behaviors are either unsupported or semantically broken.

File locking. Locks require state: the server must remember who holds which lock. The original NFS had no locking support at all. File locking was added later through a separate service, the Network Lock Manager (NLM), which ran alongside the NFS server and maintained lock state independently. If the NLM crashed separately from the NFS server, lock state could be lost or inconsistent. The Network Status Monitor (NSM) protocol was added alongside NLM to detect client crashes and release their locks, but it was an awkward retrofit.

Appending to a file. There is no APPEND procedure. To append, a client first calls GETATTR to get the current file size, then calls WRITE at that offset. But between those two calls, another client may have appended to the file as well. The second client's write then lands at the same offset and overwrites the first client's data. This race condition is inherent to the stateless model: without a lock or an atomic append operation, concurrent appends are not safe.
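
The lost update can be demonstrated with a forced interleaving. The server model below is a simplification (a single byte string with absolute-offset writes); real clients hit this race nondeterministically.

```python
# Sketch of the stateless append race: both clients GETATTR, then both WRITE
# at the size they saw. The interleaving is forced here to make the lost
# update deterministic.

class FileServer:
    def __init__(self):
        self.data = b""
    def getattr_size(self):
        return len(self.data)
    def write(self, offset, payload):
        # Write at an absolute offset, extending the file if needed.
        buf = bytearray(self.data.ljust(offset + len(payload), b"\0"))
        buf[offset:offset + len(payload)] = payload
        self.data = bytes(buf)

srv = FileServer()
srv.write(0, b"log:")

# Both clients read the size before either writes.
size_a = srv.getattr_size()   # 4
size_b = srv.getattr_size()   # 4
srv.write(size_a, b"A")       # client A appends at offset 4
srv.write(size_b, b"B")       # client B also writes at offset 4: A's data is lost

# srv.data is now b"log:B" -- client A's append has been overwritten.
```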

Open file reference counting. On a local Unix file system, if a process deletes a file while another process has it open, the file data persists until the last file descriptor referring to it is closed. The directory entry disappears immediately, but the inode and its data remain. NFS has no equivalent mechanism: the server has no knowledge of which files clients have open. If a client deletes a file and another client still has it open, the file disappears from the server. The second client’s subsequent reads may fail or return errors. NFS client implementations work around this with silly renames: before deleting a file, the client checks if it has the file open locally, and if so, renames it to a hidden name (like .nfsXXXXXX) before sending the REMOVE to the server. When the local file descriptor is eventually closed, the client removes the renamed file.
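
The client-side logic can be sketched like this. The structure and names are illustrative; real clients track opens per file descriptor in the kernel, and the hidden-name format varies by implementation.

```python
# Sketch of the silly-rename workaround: the client intercepts unlink, and if
# the file is still open locally, renames it server-side and defers the real
# REMOVE until the last local close.

class SillyRenameClient:
    def __init__(self, server_files):
        self.server = server_files   # name -> contents (stands in for the server)
        self.open_counts = {}        # name -> number of local opens
        self.renamed = {}            # original name -> hidden .nfsXXXXXX name
        self._seq = 0

    def open(self, name):
        self.open_counts[name] = self.open_counts.get(name, 0) + 1

    def unlink(self, name):
        if self.open_counts.get(name, 0) > 0:
            hidden = ".nfs%06d" % self._seq     # hidden placeholder name
            self._seq += 1
            self.server[hidden] = self.server.pop(name)
            self.renamed[name] = hidden         # real REMOVE deferred to close()
        else:
            del self.server[name]               # nobody has it open: remove now

    def close(self, name):
        self.open_counts[name] -= 1
        if self.open_counts[name] == 0 and name in self.renamed:
            del self.server[self.renamed.pop(name)]

files = {"draft.txt": b"contents"}
c = SillyRenameClient(files)
c.open("draft.txt")
c.unlink("draft.txt")
survives = any(n.startswith(".nfs") for n in files)  # data still on the server
c.close("draft.txt")                                 # now it is really removed
```

Note that this only protects opens on the same client; another client's open of the file is invisible, so cross-client delete-while-open still breaks.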

Security. The original NFS used AUTH_UNIX authentication, which simply sends the client’s user ID and group IDs in each RPC call. The server trusts these values. Any user who can become root on their client machine can impersonate any other user to the NFS server. This was acceptable in the closed, trusted networks of 1985, but it is a significant weakness in modern environments. NFSv4 addressed this with mandatory support for RPCSEC_GSS and Kerberos.

Caching in NFS

NFS clients cache data in blocks and prefetch additional blocks ahead of what the application has requested. The maximum transfer size in NFSv2 was 8 KB. Because the server is stateless and tracks no client caches, there is no callback mechanism. Instead, NFS uses timestamp validation: the client periodically checks the file’s modification time on the server and discards cached data if the server’s version is newer.

“Periodically” is the problem. NFS checks the modification time only when a file is opened (close-to-open consistency) or after a short cache validity timeout of a few seconds. Between checks, a client may read stale data. This is weaker than the sequential semantics of a local file system, and it has caused real bugs in distributed applications that assumed tighter guarantees.
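
The stale-read window can be sketched as follows. The timeout value and class structure are illustrative; real clients use tunable attribute-cache timeouts and revalidate with a GETATTR rather than a full fetch.

```python
# Sketch of NFS-style cache validation: cached data is reused without any
# network traffic until a validity timeout expires, at which point the
# server's modification time is compared and stale data is refetched.

CACHE_TIMEOUT = 3.0   # seconds; real clients use a few seconds, tunable

class TinyServer:
    def __init__(self):
        self.data, self.mtime = b"v1", 100
    def fetch(self):
        return self.data, self.mtime

class CachedFile:
    def __init__(self, server, now):
        self.server = server
        self.data, self.mtime = server.fetch()
        self.checked_at = now

    def read(self, now):
        if now - self.checked_at >= CACHE_TIMEOUT:
            # Revalidate against the server; refetch if our copy is stale.
            if self.server.mtime != self.mtime:
                self.data, self.mtime = self.server.fetch()
            self.checked_at = now
        return self.data            # between checks, this may be stale

srv = TinyServer()
f = CachedFile(srv, now=0.0)
srv.data, srv.mtime = b"v2", 101   # another client updates the file

stale = f.read(now=1.0)            # within the timeout: stale b"v1" returned
fresh = f.read(now=4.0)            # timeout expired: revalidates, gets b"v2"
```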

NFS in Practice

Despite its consistency limitations, NFS became ubiquitous because it solved the right problem at the right time. It was simple, worked across multiple vendors, and was good enough for the workloads of the era. Non-shared content and read-only shared directories (documentation, software installations, home directories with mostly personal files) worked fine. Concurrent writes to shared files were rare in practice, and the consistency model was acceptable when users understood it.


AFS: Scaling Through Whole-File Caching

Motivation

AFS grew out of the Andrew Project at Carnegie Mellon University, which began in 1982 as a collaboration between CMU and IBM to build a campus-wide distributed computing environment.

The design of AFS was shaped by careful empirical measurement of how users actually accessed files. A 1985 paper by Satyanarayanan and colleagues analyzed file system workloads across the CMU campus and found that the vast majority of file accesses were reads, files were overwhelmingly accessed by a single user, and most files were small and accessed in their entirety. These observations pointed directly to a design: if you can cache the entire file at the client and trust the server to tell you when it changes, you can serve nearly all accesses locally and dramatically reduce server load.

AFS was designed to fix NFS’s scalability problem directly. NFS clients had to frequently validate their caches against the server because the server had no way to push notifications, so the server was constantly fielding validation requests even when nothing had changed. AFS’s design aimed to make that constant back-and-forth unnecessary.

Whole-File Caching

When a process opens a file under AFS, the client caches the file on local disk, typically fetching it in chunks on demand. Reads and writes during the session hit this cached copy with no server involvement. When the process closes the file, if it was modified, the client uploads the updated contents back to the server.

This is the upload/download model taken to its logical conclusion. Because the client works from a local disk copy, file access is as fast as a local disk regardless of network conditions. The client never stalls waiting for the network while reading or writing.

The consistency behavior is session semantics: changes made by one client become visible to others only after the file is closed and uploaded. If two clients write the same file concurrently, the last one to close wins, and earlier changes can be overwritten. This differs from NFS close-to-open consistency: under AFS, other clients do not see your updates until you close the file.

Callbacks

The key mechanism that makes AFS’s aggressive caching safe is the callback. When a client downloads a file, the server records the download in a per-file callback list and makes a callback promise to the client: “I will notify you if this file changes.” The client can then use its local cached copy indefinitely without asking the server for updates.

When a client closes a modified file and uploads it, the server sends callback revocations to every other client on the callback list for that file. Those clients mark their cached copies as invalid. The next time one of them opens the file, it downloads the new version and gets a fresh callback promise.

This inversion, where the server pushes invalidations rather than clients polling for changes, is what gives AFS its scalability. The server only communicates with a client when something interesting happens, not on every access. In practice, a file is opened and read far more often than it is modified, so most client accesses proceed without any server interaction.
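
The callback bookkeeping can be sketched as follows. The class and method names are invented for illustration; real AFS servers track callbacks with expiration times and handle client failures, which this omits.

```python
# Sketch of AFS-style callbacks: the server records which clients cached each
# file and pushes invalidations when a modified copy is stored.

class AFSClient:
    def __init__(self):
        self.cache = {}              # path -> (data, valid_flag)
    def invalidate(self, path):      # callback revocation from the server
        data, _ = self.cache[path]
        self.cache[path] = (data, False)

class AFSServer:
    def __init__(self):
        self.files = {}
        self.callbacks = {}          # path -> set of clients holding a promise

    def fetch(self, path, client):
        # Record the callback promise: "I will notify you if this changes."
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data, writer):
        self.files[path] = data
        # Revoke every other client's callback promise for this file.
        for c in self.callbacks.get(path, set()) - {writer}:
            c.invalidate(path)
        self.callbacks[path] = {writer}

srv = AFSServer()
srv.files["/afs/cell/notes.txt"] = b"v1"

a, b = AFSClient(), AFSClient()
a.cache["/afs/cell/notes.txt"] = (srv.fetch("/afs/cell/notes.txt", a), True)
b.cache["/afs/cell/notes.txt"] = (srv.fetch("/afs/cell/notes.txt", b), True)

# Client a closes a modified copy; the server revokes b's callback.
srv.store("/afs/cell/notes.txt", b"v2", writer=a)
_, b_valid = b.cache["/afs/cell/notes.txt"]   # False: b refetches on next open
```

Until the store happens, neither client generates any server traffic at all, which is where the scalability comes from.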

Naming and the Global Namespace

One of AFS’s most significant departures from NFS is how it handles naming. In NFS, each client machine mounts remote directories at whatever local path the system administrator chooses. On one machine, a file system might be mounted at /home; on another, the same file system might appear at /nfs/users. There is no guaranteed consistency. A script that works on one machine may fail on another because the same data is at a different path.

AFS enforces a uniform global namespace. All AFS content appears under /afs on every client machine. The second level of the path is the cell name, which corresponds to an administrative domain (typically a domain name, such as cs.rutgers.edu or athena.mit.edu). So a file at /afs/cs.rutgers.edu/user/paul/notes.txt has exactly that same path on every AFS client machine in the world, regardless of which server stores it and regardless of which client the user is sitting at.

This was a major usability improvement over NFS. Users could move between machines without worrying whether the same paths were available. Shell scripts and makefiles could be shared without concern for whether the paths they reference would resolve correctly on another system.

AFS organizes the file system into volumes: a subtree of the file system namespace, typically corresponding to a user’s home directory or a shared software repository. Each volume has a globally unique ID within an AFS cell. If an administrator needs to move a volume to a different server for load balancing or maintenance, the old server can issue a referral pointing to the new location. The client transparently follows the referral. Applications are never aware that the storage has moved. The file’s path in the global namespace remains stable even as the underlying server changes.


Coda: Disconnected Operation

Coda keeps AFS’s client caching and callback-based coherence, but adds replication and a mode where the client continues operating while offline.

Motivation

AFS assumed that clients always have network access. In the late 1980s, researchers at Carnegie Mellon began building Coda on top of AFS to support laptops and mobile workstations that might be intermittently disconnected. The design question was: can a client continue to work with cached files when the network is unavailable, and then reconcile changes when connectivity is restored?

Replicated Storage

Coda extended AFS’s volume concept to allow volumes to be replicated across multiple servers. The set of servers hosting a volume is called a Volume Storage Group (VSG). At any moment, the subset of those servers that the client can currently reach is the Accessible Volume Storage Group (AVSG).

When a client opens a file, it contacts the accessible servers and compares their version vectors to verify they are mutually consistent. If they disagree, the client initiates a resolution to synchronize the out-of-date servers before proceeding. This check happens on every open, whether for read or write. Reads can then be served by any accessible server. Writes go to all accessible servers simultaneously.
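
The version check can be sketched with simple per-server update counters. This is a simplification of Coda's actual version vector scheme, and the resolution shown handles only the no-conflict case (one replica strictly newer); a true conflict, where neither vector dominates, would be flagged instead.

```python
# Sketch of the open-time replica check: compare version vectors across the
# accessible servers and bring stale replicas up to date before the open.

def dominates(v1, v2):
    """True if v1 is at least as new as v2 in every component."""
    keys = set(v1) | set(v2)
    return all(v1.get(k, 0) >= v2.get(k, 0) for k in keys)

def resolve(replicas):
    """Copy the newest replica onto stale ones (no-conflict case only)."""
    newest = max(replicas, key=lambda r: sum(r["vv"].values()))
    for r in replicas:
        if r is not newest and dominates(newest["vv"], r["vv"]):
            r["data"], r["vv"] = newest["data"], dict(newest["vv"])

# Two reachable servers (the AVSG); s2 missed one update.
s1 = {"data": b"v2", "vv": {"s1": 2, "s2": 1}}
s2 = {"data": b"v1", "vv": {"s1": 1, "s2": 1}}

consistent = dominates(s1["vv"], s2["vv"]) and dominates(s2["vv"], s1["vv"])
# consistent is False, so the client triggers resolution before the open.
resolve([s1, s2])
# s2 now holds b"v2" and the open can proceed against any accessible server.
```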

Disconnected Mode

When no server in the VSG is reachable, the client enters disconnected operation mode. In this mode, the client works entirely from its local disk cache. File system operations are recorded in a client modification log (CML) rather than being sent to the server.

The CML is a log of operations – store, create, remove, rename, mkdir – not a copy of file contents. The actual modified file data stays in the local disk cache. This distinction matters: if a file is modified multiple times during disconnection, only the final cached version is uploaded on reconnection, even if the CML contains multiple store entries for it. It also creates a dependency between the CML and the cache: if a modified file is evicted from the local cache before reconnection, the CML has a store entry pointing to content that is no longer available, and that data is lost.

When network connectivity is restored, Coda replays the CML in order, uploading cached file contents for each store operation and applying other operations to the server. It also receives invalidations from the server for files that changed during the disconnection. If a file was modified both by the disconnected client and by another client during the disconnection period, a conflict occurs. Coda detects these conflicts and flags them for user resolution. There is no automatic merge; the user must manually reconcile the conflicting versions. The operation-log structure is also what makes conflict detection possible: Coda can compare the sequence of operations in the CML against what the server recorded during the disconnection and identify files that were touched by both sides.

Hoarding

Disconnected operation only works if the files you need are in the local cache when you lose connectivity. Coda supports hoarding: explicit, user-directed cache prefetching. Before leaving the office (and the network), a user can tell Coda which files to ensure are in the local cache. The hoard database tracks these preferences. Coda periodically fills the cache with hoarded files so they are available offline.

AFS and Coda Today

Neither AFS nor Coda is widely used in modern deployments. AFS survives at some universities and research institutions, where it was deeply embedded in campus infrastructure, and OpenAFS is still maintained as an open-source project. But its operational complexity, aging authentication model (it was built around Kerberos 4), and lack of mainstream OS integration have made it difficult to justify in new deployments. NFS and SMB do the job well enough for most workloads, and object storage handles many of the large-scale use cases that AFS was built for.

Coda remained a research prototype and was never widely deployed outside of CMU. Its ideas, particularly the client modification log and conflict detection on reconnection, influenced later work on optimistic replication and offline-capable systems, but Coda itself is no longer actively developed.


Evolution: NFS and SMB After AFS

AFS introduced ideas that fundamentally changed how people thought about networked file systems: long-term caching backed by server callbacks, server-to-client notification, and location transparency through referrals. Both NFS and Microsoft’s SMB protocol eventually absorbed these ideas, each in their own way.

SMB: Stateful from the Start

Microsoft’s Server Message Block (SMB) protocol, which became the native file sharing mechanism for Windows, was designed in the 1980s with exactly the opposite philosophy from NFS.

Where NFS was stateless and simple, SMB was connection-oriented and deeply stateful. Clients opened persistent sessions with servers, and the server tracked every open file, every lock, and every byte range under lock.

SMB’s primary goal was correctness under Windows semantics, not cross-platform interoperability. Windows applications expect strong locking guarantees: if you open a file for exclusive write access in Word, no other application can open it for writing. NFS’s advisory locking retrofitted via NLM was too weak for this. SMB enforced access modes at the server.

The stateful model enabled something NFS could not do: proper mandatory locking, byte-range locks, and the ability to support the exact file-sharing semantics Windows users expected. The cost was that server crashes were disruptive: all sessions, all open files, and all locks were lost.

Opportunistic Locks (Oplocks) in SMB

As client-side caching became important for performance in Windows networking, Microsoft added opportunistic locks (oplocks) to SMB. An oplock is a server-granted capability that tells the client what kind of caching is safe. The core mechanism is that the server monitors file access and intervenes when a conflict arises – exactly the same idea as AFS callbacks, but applied at a finer granularity and with multiple modes.

There are four classic oplock types worth understanding:

Exclusive (Level 1) oplock. The server grants this when only one client has the file open. The client can cache both reads and writes locally and need not contact the server for either. This is the highest-performance mode. Example: a user opens a Word document that nobody else is editing. Word gets an exclusive oplock and buffers all edits locally. Reads and writes never touch the network until the file is closed or the oplock is broken.

Level 2 oplock. Granted to multiple clients when all of them are reading the same file and none is writing. Each client can cache reads locally. If any client attempts a write, the server breaks all Level 2 oplocks first. Example: several users have a shared reference document open simultaneously. Each caches the data locally and reads without network traffic, knowing no one is modifying it.

Batch oplock. Originally designed for build tools and batch scripts that repeatedly open and close the same file in rapid succession. With a batch oplock, the client keeps the file “open” on the server even after the local process closes it, so that the next open can proceed without a round trip. Example: a C compiler opens, reads, and closes the same header file dozens of times during a build. A batch oplock collapses this into a single persistent open on the server.

Filter oplock. Granted exclusively but yields immediately when any other process opens the file. Primarily useful for anti-virus scanners and file system filters that need to inspect a file before other processes access it, without blocking those processes for long.

When a conflict arises, the server sends an oplock break to the client holding the oplock. The client must flush any cached writes to the server and acknowledge the break before the server can honor the conflicting request. The conflicting client waits during this exchange (with a timeout, typically 35 seconds, after which the operation may fail). This is why a second client can appear to hang on open: it is waiting for the first client to flush cached state and acknowledge the break.

For example, Alice has an exclusive oplock on budget.xlsx and has been editing for ten minutes without saving. Bob opens the same file. The server sends Alice an oplock break: her client immediately flushes her pending writes to the server, breaks to a Level 2 oplock (or no oplock if Bob needs write access), and the server then allows Bob’s open to complete.
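
The break exchange can be sketched as follows. It is modeled sequentially for clarity (the real protocol is asynchronous, with a timeout on the ack), and the class and method names are invented for illustration.

```python
# Sketch of an oplock break: the second open cannot complete until the oplock
# holder flushes its cached writes and acknowledges the break.

class OplockClient:
    def __init__(self, name):
        self.name = name
        self.pending_writes = []     # locally cached, not yet on the server

    def on_break(self, server, path):
        # Required response to a break: flush cached writes, then acknowledge.
        for data in self.pending_writes:
            server.files[path] = data
        self.pending_writes.clear()
        server.ack_break(path, self)

class OplockServer:
    def __init__(self):
        self.files = {}
        self.oplocks = {}            # path -> client holding an exclusive oplock

    def open_exclusive(self, path, client):
        holder = self.oplocks.get(path)
        if holder and holder is not client:
            holder.on_break(self, path)   # send the break; wait for the ack
        self.oplocks[path] = client       # only now can the new open complete

    def ack_break(self, path, client):
        del self.oplocks[path]

srv = OplockServer()
alice, bob = OplockClient("alice"), OplockClient("bob")

srv.open_exclusive("budget.xlsx", alice)          # Alice holds the exclusive oplock
alice.pending_writes.append(b"ten minutes of edits")

srv.open_exclusive("budget.xlsx", bob)            # triggers the break to Alice
# Alice's cached edits reached the server before Bob's open completed, and
# Bob now holds the (ex-)exclusive grant.
```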

Later versions of Windows generalized oplocks into leases (introduced in SMB 2.1 with Windows 7). Leases provide the same caching permissions but with cleaner semantics: they are named by a client-generated key rather than by a file handle, which means the caching grant can survive a file being closed and reopened, and can cover both file data and directory metadata in a single grant. The practical effect is more efficient caching for applications that open the same file repeatedly.

SMB 2 and Beyond

The original SMB protocol was verbose and chatty, designed for slow local area networks of the 1980s. It was dramatically redesigned in SMB 2 (introduced with Windows Vista in 2007). The redesign reduced the command set from over 100 operations to 19, added pipelining (sending multiple requests before receiving responses), and compounding (packing multiple related operations into a single network message). Both reduce round-trip latency on higher-latency networks.

SMB 2 also added durable handles: a client that loses network connectivity temporarily can reconnect and resume with all its file handles intact, rather than having to reopen every file and re-establish every lock. This is a significant improvement in recovery behavior.

Apple dropped its proprietary AFP protocol in favor of SMB 2 in macOS 10.9 (Mavericks, 2013), acknowledging that SMB 2’s performance and interoperability had surpassed what AFP could offer. SMB remains the default file sharing protocol on macOS today. macOS also includes a built-in NFS client and can mount NFS shares, making it a capable participant in Unix-oriented environments, but SMB is what macOS uses by default when connecting to file servers.

SMB 3, introduced with Windows 8 and Windows Server 2012, added capabilities relevant to high-availability and datacenter deployments. Transparent Failover allows a client connected to a clustered file server to survive the failure of one cluster node and reconnect to another without losing its open files or locks – session state is shared across the cluster nodes. SMB Multichannel allows a single SMB session to use multiple network interfaces simultaneously, increasing throughput and providing redundancy if one path fails. SMB 3 also introduced end-to-end encryption at the protocol level, replacing the older approach of tunneling SMB through IPsec.

NFS Version 4: Abandoning Statelessness

NFSv4, standardized by the IETF in the early 2000s, is a substantial departure from the original NFS design principles. After two decades of experience with stateless NFS, the conclusion was clear: statelessness was preventing better consistency guarantees and forcing awkward workarounds for locking and caching.

NFSv4 introduced a stateful server. Clients now open and close files explicitly, and the server tracks open file state. With statefulness came the ability to grant delegations, which are conceptually the same as SMB’s oplocks: the server grants the client the right to perform certain operations on a file (read, write, or both) without contacting the server, knowing it will notify the client if another client creates a conflict.

NFSv4 also added compound RPC: multiple operations can be packed into a single request, reducing round trips for common sequences of operations (such as looking up a path component by component). The protocol moved from UDP to mandatory TCP, and added strong authentication and encryption, which original NFS lacked almost entirely.
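A toy model shows how a compound collapses a component-by-component lookup into one exchange. The operation names mirror NFSv4 (PUTROOTFH, LOOKUP, GETFH), but the dictionary encoding and file-handle ids here are invented for illustration.

```python
# Toy NFSv4-style COMPOUND: the per-component lookups for
# /home/alice/notes.txt travel in one request instead of several.

def compound(server, ops):
    """Execute the packed operations in order, as one request/response."""
    results, fh = [], None
    for op, arg in ops:
        if op == "PUTROOTFH":
            fh = server["/"]             # start at the root file handle
        elif op == "LOOKUP":
            fh = fh["children"][arg]     # descend one path component
        elif op == "GETFH":
            results.append(fh["id"])     # report the current file handle
    return results                       # one reply for the whole batch

server = {"/": {"id": 1, "children": {
    "home": {"id": 2, "children": {
        "alice": {"id": 3, "children": {
            "notes.txt": {"id": 4, "children": {}}}}}}}}}

ops = [("PUTROOTFH", None),
       ("LOOKUP", "home"),
       ("LOOKUP", "alice"),
       ("LOOKUP", "notes.txt"),
       ("GETFH", None)]
print(compound(server, ops))   # [4] -- one round trip instead of five
```

A real server also returns a status for each operation and stops at the first failure; that detail is omitted here.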

NFSv4 borrowed location transparency from AFS: a server can issue referrals directing clients to wherever content actually lives, even if that is a different server. This is an administrative tool for planned migrations and namespace federation – the administrator moves content to a new server while the old one is still running, and the old server issues referrals so clients follow transparently. Once the migration is complete, the old server can be decommissioned. The original NFS had no equivalent mechanism; moving content to a different server required updating every client’s mount configuration manually.

The Convergence

Looking at NFSv4 and modern SMB side by side, the distance between them has narrowed considerably. The table below shows where each protocol stands on the key mechanisms we have covered.

| Mechanism | NFS (v2/v3) | NFSv4 | SMB 1 | SMB 2+ |
|---|---|---|---|---|
| Stateful server | No | Yes | Yes | Yes |
| Compound/pipelined requests | No | Yes (COMPOUND RPC) | No | Yes (compounding + pipelining) |
| Client caching grants | No | Yes (delegations) | Yes (oplocks) | Yes (oplocks + leases) |
| Server-to-client notification | No | Yes (delegation recall) | Yes (oplock break) | Yes (oplock break / lease break) |
| Referrals | No | Yes | Yes (via DFS) | Yes (via DFS) |
| Strong authentication | Optional (Kerberos via RPCSEC_GSS) | Mandatory | NTLM / Kerberos | Kerberos / NTLMv2 |
| Transport | UDP or TCP | TCP only | TCP (NetBIOS) | TCP |

Each of these mechanisms addresses a specific limitation of the original stateless, single-request-per-round-trip designs.

The remaining differences are deployment context and interoperability stance. NFS is dominant in Linux and Unix environments and is the standard choice for shared storage in HPC clusters and virtualization infrastructure. SMB is native to Windows and dominant in enterprise file server environments, though it runs fine on Linux via Samba and on macOS natively (macOS also supports NFS).


Consistency Semantics: A Summary

It is worth pausing to compare the consistency models we have seen.

POSIX/Unix semantics (local files): Every read observes the data from the most recent completed write, because all processes share the kernel’s single coherent buffer cache.

NFS (close-to-open consistency): When a file is closed, the client flushes modified data to the server; when a file is opened, the client checks the file’s attributes against the server to decide whether its cached copy is stale. While a file is held open, stale reads are possible. Not POSIX-compliant, but acceptable for most workloads.

AFS/Coda (session semantics): Changes made by one client become visible to others only after the file is closed. The last client to close a modified file wins. Weaker than NFS but enables aggressive caching.

NFSv4/SMB with delegations or oplocks: The server actively manages which clients can cache what, and revokes permissions when conflicts arise. In the absence of conflict, clients can cache aggressively. In the presence of conflict, the server coordinates. Closer to POSIX semantics than NFS or AFS, but still not identical.
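The close-to-open check is small enough to sketch. This is a minimal model, not the real NFS client: the server uses a logical clock in place of timestamps, and all names are invented for illustration.

```python
# Sketch of close-to-open validation: one attribute check per open
# decides whether the cached copy can be reused.

class Server:
    def __init__(self):
        self.clock = 0
        self.files = {}   # path -> (change time, data)

    def write(self, path, data):
        self.clock += 1
        self.files[path] = (self.clock, data)

    def getattr(self, path):
        return {"mtime": self.files[path][0]}

    def read(self, path):
        return self.files[path][1]

class NfsClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # path -> (mtime at fetch, data)

    def open_read(self, path):
        attrs = self.server.getattr(path)   # one attribute check per open
        cached = self.cache.get(path)
        if cached and cached[0] == attrs["mtime"]:
            return cached[1]                # cache revalidated: no refetch
        data = self.server.read(path)       # stale or missing: refetch
        self.cache[path] = (attrs["mtime"], data)
        return data

srv = Server()
srv.write("/doc", "v1")
c = NfsClient(srv)
print(c.open_read("/doc"))   # v1 (fetched and cached)
srv.write("/doc", "v2")      # another client updates the file
print(c.open_read("/doc"))   # v2: the open revalidates and refetches
```

Between the two opens, a read from the cache would have returned v1 with no server contact: that window is exactly where close-to-open permits stale reads.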

None of these systems provide full POSIX semantics across a network without significant performance cost. The history of networked file systems is the history of finding acceptable compromises between consistency, performance, and simplicity.


Access Transparency: File Systems vs. Coordination Services

It is worth highlighting a fundamental difference between the networked file systems we covered in this lecture and the coordination services we discussed earlier.

NFS, AFS, and SMB all achieve access transparency: applications interact with remote files using standard file system calls (open, read, write, close) with no awareness that the storage is remote. The VFS layer hides the distinction. A program compiled against local files runs unchanged against NFS-mounted files.

Chubby, ZooKeeper, and etcd make no attempt at access transparency. Applications talk to them through a specific client library or HTTP/gRPC API. There is no pretense that these are ordinary files, and they are not mounted into the local directory tree. They are explicitly distributed coordination services, and applications are written explicitly to use them as such.

This is the right design choice for coordination services. The semantics of a coordination service (strong consistency, leases, watches, distributed locks) do not map cleanly onto the POSIX file API. Trying to hide that complexity behind open() and read() would mislead developers about what guarantees they are getting. Explicit APIs make the coordination model visible.

