Naming and binding

Paul Krzyzanowski

October 30, 2015

What’s in a name? That which we call a rose By any other word would smell as sweet.Shakespeare, Romeo and Juliet 2.2.43–44

Definitions

One danger in any discussion of naming is that of getting carried away with both the metaphysical and semantic definitions of naming. We will attempt to avoid that here. John F. Shoch’s definitions in Inter-Network Naming, Addressing, Routing (RFC 1498) does a fine job in defining the basic terms:

Name
A name identifies what you want.
Address
An address identifies where it is.
Route
A route identifies a way to get there.

A few useful additional definitions put forth by J.H. Saltzer in his 1978 paper include:

Binding
Binding is the process of mapping a name to an address. J. H. Saltzer defines it as choosing “a lower-level implementation for a particular higher-level semantic construct.” We can avoid the semantic problems of differentiating names from addresses by thinking of binding as the process of mapping a name into some lower-level name.
Context
A context is a particular set of bindings. A name only has meaning relative to some context.
Directory or Naming Network
A set of catalogs (name→object binding tables) that may include other directories
Naming hierarchy
A naming network organized in a tree-structured form.
Pathname
A multi-component name traversing a path in a naming hierarchy.
Root
A starting catalog in a naming network.
Indirect entry
An entry in a catalog that binds to a name instead of the underlying object.
Name service
A service that provides a binding function. A computing environment will have many names, each relevant within a specific context. On a networked system, we may need names for:
Name Description
Services for example, time of day, file service, domain name service, …
Nodes identify a computer that can run services or programs
Network attachment points ports on a network: places where a node is attached (not exactly applicable for an ethernet)
Paths the route between network attachment points
Objects within services for example, file names on a file server

A name can be anything that is convenient or makes sense in describing some object or service. For humans, it may be a human-readable character string. For machines, it may be a binary identifier (“address”). A service may run at one or more nodes. It may need to move from node to node without losing its identity. A node may be connected to one or more network attachment points and may need to move from one to another place on the network without losing its identity. Multiple paths may exist between network attachment points or the path might change. These conditions warn us that it is not a good idea for a name to contain an implicit binding, one that can be derived from the name alone (for example, having a node name be a textual representation of the machine’s Ethernet address).

Names may be considered pure or impure. A pure name is one that contains no information about the object or where that object may be found. For example, a user ID is a pure name. An ethernet MAC address is a pure name (a low level one) since it does not tell you anything about the device or where it is located. Pure names contain no discernable context. An impure name contains information about the underlying object encoded within the name. For example, an email address is an impure name since it contains a domain name as a context – supplementary information about the name. An IP domain name, such as cs.rutgers.edu, is another example of an impure name since it contains embedded within it a hierarchy that identifies the context of that name: “cs” is a pure name but it’s within “rutgers,” which is within “edu.” A file pathname is impure since it contains identification about where the object is located. If the object is moved, the name is no longer relevant.

Note that impure names are not inherently bad. They can make lookup of names easier in many cases. The basic problem with impure names is with moving objects around in the namespace; it may lead to their name changing. A problem with pure names is managing the namespace: how do you look up an object efficiently if you have billions of names? How do you know for certain that an object does not exist?

Uniqueness for names can be a challenge, particularly when there is a huge amount of names involved and multiple users (or organziations) that choose them. A way of achieving uniqueness is to use a hierarchy. A hierarchy is a collection of pure names that creates a compound name. This leads to an impure name but each level of the hierarchy can manage the scope of names at that level. For example, the .edu administrators manage the uniqueness of the name “rutgers.edu.” The rutgers.edu administrators manage the uniqueness of the name “cs.rutgers.edu” and the CS department at Rutgers can pick whatever names they want as long as they are unique within that domain. Examples of compound names are domain names (www.cs.rutgers.edu), URLs (http://pk.org/417/notes/naming.html), and file pathnames (/usr/share/dict/words).

Binding

Let us consider the basic mechanism of sending a data packet to a service and the bindings that are involved in doing this.

  1. Find a node on which the required service resides. This requires service name resolution. The binding is service name to node name.

  2. Find a network attachment point to which the node is connected. This requires locating the node name. The binding is node name to attachment point (or address).

  3. Find a path from this location to that point. This is the routing service. The binding is address to route.

As an example of naming and binding, we can consider the machine names on the Internet. A name of cs.rutgers.edu may bind to the IP address of 128.6.4.2. In turn, the IP address 128.6.4.2 may bind to the Ethernet address 08:00:20:1f:13:83. Note that the term address becomes context-dependent. We tend to think of addresses as the “lower-level” representation of a name. In essence, addresses are just names. A human is comfortable dealing with cs.rutgers.edu. An IP driver finds it easier to work with the 32-bit name 128.6.4.2, and an Ethernet driver prefers the 48-bit name 08:00:20:1f:13:83.

If we consider naming and binding in file systems, we have the user-friendly and programmer-friendly textual names that are bound to internal names that are a function of the file system implementation. For example, a pathname of /bin/ls may bind to {device major number=8, minor number=1, inode 1311206}.

An important issue in binding is that of when the binding takes place. It’s desirable to delay it for as long as possible to ensure that we have the very latest binding in case names change or resources move but we need to balance that with performance implications. A machine’s address may change while its name remains the same, yet the services on that machine should be accessible. A service may run on a different port, yet a client should be able to locate it. The inode allocated to a file may change, yet that should not cause problems in accessing the file by name.

The least flexible binding is static binding. This is essentially a hard-coded binding. For example, a program may assume that SMTP mail service is always available on port 25 and simply access port 25 instead of attempting to resolve the binding through other means. Fortunately, well-followed conventions will often save that program. A more dangerous example is that of a program attempting to contact a machine by using a hard-coded IP or Ethernet address.

An alternative to static binding is dynamic binding. With dynamic binding, we no longer rely on a hard-coded name address binding but have some mechanism for resolving the name on demand. One form of dynamic binding is early binding. In this case a binding operation is actually performed, but it is performed some time before the binding is needed. For example, if a program needs to contact a server multiple times during a long period of execution, it might perform a name to IP address binding once at the start for efficiency. The danger here is that the server’s address may change during the execution of this program and the program will be unable to contact the server at some point in time (or will contact the wrong server). At times, early binding is a crucial optimization. IP address to Ethernet address binding is an example of a case where it would be prohibitively expensive to resolve an IP address for every packet sent from a machine. It makes sense to maintain a cache (the ARP cache) of previously used bindings. Problems arise if the bindings change. With ethernet addresses, the problem is usually that of not being able to reach the destination machine and the system attempts to perform another binding. Early bindings hurt in a dynamically changing environment.

Most flexible is late binding, a form of dynamic binding where the binding is performed on demand, just before use. Previous bindings are not cached. An example where this is useful is accessing a file system. Consider a user editing a file and frequently saving changes. Assume that the editor is implemented in such a way that it writes into a temporary file during the edit and renames the file to the permanent name upon save. In this case, a different inode (or FAT index) is allocated each time the file is saved, yet the name remains the same. The only way of assuring that the correct file is opened (say, by another process) is by performing the exernal to internal name binding at the time of open. The caveat with dynamic binding is that the name resolution process often takes quite a bit of time. If the same name has to be resolved over and over again, early binding may yield considerable performance gains.

Name servers

It’s one thing to say that we have a name for an object and have a name-to-address binding for that object but where dooes this list of bindings reside? How do we look up names? A solution, of course, is to provide a naming system for a particular set of names. This system is often made available as a network service: a name server that is accessible on some well-known system on a known port. The domain name server, DNS, which resolves IP domain names to IP addresses, is an example of a name server. On a large scale, a single server can become a bottleneck so it may be implemented as a distributed collection of name servers.

Replication of name server contents is useful for both scalability and fault tolerance. It helps with scalability because several servers with replicated data can take on client load. It helps with fault tolerance because a client can contact another replicated server if one is not accessible. Caching is also a form of replication since a copy of frequently used name-to-address bindings resides in the cache.
An ARP cache (a cache of recently used IP address to Ethernet address bindings) is an example of this. Any form of replication is at risk of consistency problems. If a master copy is modified, replicated copies will have stale bindings. Even if some form of synchronization is employed, the process may take time. The hope in cases where replication is used is that it often does not matter too much if the data is stale and the binding may be performed again.

References

  • John F. Shoch, Inter-Network Naming, Addressing, Routing, RFC 1498. 1978 [easy and quick reading]

  • J. H. Saltzer, On Naming and Binding. 1978, MIT. [dated terminology and overly-philosophical, yet this is a good paper to read]

  • J. H. Saltzer, Naming and Binding of Objects, Chapter 3A. © 1978 J. H. Saltzer.

  • Sape Mullender, Ed., Distributed Systems, Chapter 12: Names by Roger M. Needham, © 1993 Addison-Wesley [good coverage, but make sure you read Saltzer’s paper first]

This is a revision of an original version that was written on september 29, 2012.