Remote Procedure Call (RPC) provides an abstraction for process-to-process communication that lets a program call functions on remote systems without handling data formatting and message parsing itself. RPC transforms network communication from I/O-based read/write interfaces into familiar procedure call semantics.
RPC Operation
The ability to call a remote function is provided by stub functions. Instead of calling a remote function, the client calls a local client stub. On the server side, the server stub receives the request and calls the local function on that machine.
A client stub (proxy) is a local function with the same interface as the remote procedure. It packages the parameters, sends them to the server, waits for a reply, extracts the results, and returns them to the caller. The client calls the stub as if it were a local function – because it is.
A server stub (skeleton) registers the service, awaits incoming requests, extracts data from requests, calls the actual procedure, packages results, and sends responses back to clients. The server’s actual procedure doesn’t know it’s being called remotely.
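To make the flow concrete, here is a minimal Python sketch. The message format (a JSON object with a function name and argument list) is invented for illustration, and the network round trip is simulated by a direct function call.

```python
import json

# --- Server side ----------------------------------------------------------
def add(a, b):
    return a + b                      # the actual procedure; unaware it is remote

def server_stub(request_bytes):
    """Skeleton: unmarshal the request, call the real procedure, marshal the reply."""
    request = json.loads(request_bytes)
    result = add(*request["args"])
    return json.dumps({"result": result}).encode()

# --- Client side ----------------------------------------------------------
def add_stub(a, b):
    """Proxy: same interface as the remote add(); hides all messaging."""
    request = json.dumps({"func": "add", "args": [a, b]}).encode()
    reply = server_stub(request)      # stands in for a real network round trip
    return json.loads(reply)["result"]

print(add_stub(2, 3))                 # 5; the caller sees ordinary call semantics
```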
Static stub generation uses an RPC compiler to read an interface definition and generate stub code before compilation. Dynamic stub generation creates proxy objects at runtime via reflection, eliminating the need for separate compilation. Languages like Java, Python, Ruby, and JavaScript support dynamic generation. Languages like C, C++, Rust, and Go require static generation.
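As a rough sketch of the dynamic approach, a Python proxy can intercept attribute lookups at runtime, so no stub code exists before the program runs. The transport function and message shape here are assumptions for illustration.

```python
class DynamicProxy:
    """Runtime-generated stub: every attribute access becomes a remote call."""
    def __init__(self, transport):
        self._transport = transport   # function that ships a request to the server

    def __getattr__(self, name):
        def remote_call(*args):
            return self._transport({"func": name, "args": list(args)})
        return remote_call

# usage: DynamicProxy(send).add(2, 3) builds and sends
# {"func": "add", "args": [2, 3]} with no precompiled stub
```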
Marshalling is the process of packaging data into a network message. Serialization converts data elements into a flat byte array suitable for transmission. These terms are often used interchangeably in RPC contexts, though marshalling may include additional metadata like function identifiers or version numbers.
RPC is synchronous by default: the client blocks until the server responds. This matches local procedure call behavior but can be problematic for slow services. Some frameworks offer asynchronous variants where the client continues execution and handles responses later through callbacks or futures.
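A sketch of the asynchronous variant using futures; remote_square is a placeholder for a blocking remote call.

```python
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    return x * x                           # placeholder for a blocking RPC

with ThreadPoolExecutor() as pool:
    future = pool.submit(remote_square, 7) # returns immediately
    # ... the client does other work here ...
    print(future.result())                 # block only when the result is needed
```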
Challenges in RPC
Partial failure is the fundamental challenge that distinguishes distributed systems from local computing. Local procedure calls either work or the entire process crashes. With remote calls, the server may fail, the network may be slow or unreliable, or responses may be lost. The client cannot tell the difference between a slow server and a failed one.
Parameter passing poses challenges. Most parameters are passed by value, which is easy to support remotely: the value itself is sent in the message. Some parameters, however, are passed by reference, and a memory address is meaningless on a remote machine. The common solution is copy-restore (also called copy-in/copy-out): send the referenced data to the remote side, let the server function use a local reference, then send the possibly modified data back to the client. This can be expensive for large data structures.
Data representation differs across machines. Different processors use different byte ordering: big-endian systems store the most significant byte first, while little-endian systems store the least significant byte first. Intel and AMD processors use little-endian. Network protocols traditionally use big-endian (network byte order). Serialization formats must handle these differences so that data can be correctly interpreted on any machine.
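For illustration, Python’s struct module can serialize the same 32-bit integer in either order:

```python
import struct

print(struct.pack(">I", 1).hex())   # big-endian (network order): 00000001
print(struct.pack("<I", 1).hex())   # little-endian:              01000000
```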
Two strategies exist for handling data representation differences:
- A canonical format requires all senders to convert data to a standard format before transmission. ONC RPC used this approach with XDR, which always uses big-endian byte order. The disadvantage is that conversion happens on every message, even when communicating between identical architectures.
- Receiver-makes-right allows the sender to transmit data in its native format, and the receiver converts only if necessary. DCE RPC used this approach with NDR. This avoids unnecessary conversion when machines share the same architecture.
Interface Definition and Type Systems
Interface Definition Languages (IDLs) describe service interfaces in a language-neutral way. IDLs enable stub generation for multiple languages and provide a contract between clients and servers.
Schemas define data structure, field types, and encoding rules. They enable validation, code generation from definitions, and structured evolution as interfaces change over time. Common schema formats include Protocol Buffers, JSON Schema, Avro, and Thrift IDL.
Schemas define the data that gets serialized. Serialization matters not only for marshalling RPC messages but also for storing data in file systems or object stores.
Serialization formats use one of two approaches for encoding data:
- Implicit typing transmits only values without type information, requiring the receiver to know the expected sequence of parameters. This is efficient but does not easily support optional parameters.
- Explicit typing transmits type information with each value, making data self-describing but larger. JSON and XML use explicit typing.
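A small Python comparison of the two approaches; the field names are made up for illustration.

```python
import json
import struct

# Implicit typing: values only; the receiver must already know the layout
implicit = struct.pack(">If", 42, 1.5)                       # 8 bytes on the wire

# Explicit typing: field names and structure travel with the data
explicit = json.dumps({"count": 42, "ratio": 1.5}).encode()  # 27 bytes

count, ratio = struct.unpack(">If", implicit)  # works only with the agreed layout
```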
Protocol Buffers (often called protobuf) is a binary serialization format developed at Google. Each field in a message has a unique numeric tag that identifies it in the binary format. Fields may be omitted and will take default values. Protocol Buffers is compact, efficient, strongly typed, and supports schema evolution because new fields can be added with new tags and old code simply ignores fields it does not recognize. It is typically 3 to 10 times smaller than equivalent JSON and parsed 10 to 100 times faster. Other binary serialization formats exist, such as Apache Avro (used in big data ecosystems) and Apache Thrift.
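The tag-based evolution idea can be sketched with a toy encoding (this is not the real protobuf wire format): fields travel as (tag, value) pairs, and a decoder skips tags it does not recognize, which is why old code tolerates messages from newer senders.

```python
def decode(fields, known_tags):
    """Keep known fields; silently skip unknown tags."""
    return {known_tags[tag]: value for tag, value in fields if tag in known_tags}

OLD_SCHEMA = {1: "id", 2: "name"}                    # old code knows tags 1 and 2
new_message = [(1, 42), (2, "alice"), (3, "extra")]  # a newer sender added tag 3

print(decode(new_message, OLD_SCHEMA))               # {'id': 42, 'name': 'alice'}
```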
Versioning allows interfaces to change while maintaining compatibility. Forward compatibility means new code can read old data. Backward compatibility means old code can read new data. Strategies include adding optional fields, avoiding field removal, and maintaining multiple service versions simultaneously.
Versioning is critical in distributed environments because you cannot count on all clients getting updates. For serialized stored data, versioning is critical because new software may encounter older data formats.
Early RPC Systems
ONC RPC (Sun’s Open Network Computing RPC) was one of the first RPC systems to achieve widespread use, thanks to the popularity of Sun workstations and NFS. It introduced several concepts that became standard:
- An interface definition language, with the rpcgen compiler generating stubs from it.
- Program numbers for service identification.
- A name service (the port mapper) that clients query with a program number to find the service’s current port.
- Versioning support to allow gradual client migration.
- XDR (eXternal Data Representation) as a canonical binary encoding format.
DCE RPC (Distributed Computing Environment RPC), defined by the Open Group, addressed some ONC RPC limitations. It introduced:
- UUIDs (128-bit Universal Unique Identifiers) to replace manually chosen program numbers and eliminate collision risk.
- Cells as administrative groupings of machines with directory servers for location transparency.
- Receiver-makes-right data representation (NDR), which allows the sender to transmit data in its native format so the receiver converts only if necessary, avoiding unnecessary conversion between machines with identical architectures.
DCE RPC became the foundation for Microsoft’s RPC implementation and later DCOM.
Service Discovery
Services must be located before they can be invoked. Service discovery mechanisms include name servers that map service names to network locations, DNS for domain-based resolution, configuration files with hardcoded endpoints, and service meshes that manage service-to-service communication with sidecars.
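At its simplest, a name server is a mapping from service names to network locations; the interface below is a hypothetical sketch (real systems add caching, health checks, and replication).

```python
registry = {}                            # service name -> (host, port)

def register(name, host, port):
    registry[name] = (host, port)

def lookup(name):
    return registry[name]

register("inventory", "10.0.0.5", 8443)
host, port = lookup("inventory")         # resolve the name before each call
```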
Security
RPC calls often traverse networks where messages could be intercepted or modified. Security concerns include authentication (verifying the identity of clients and servers) and encryption (preventing eavesdroppers from reading message contents). Modern RPC systems typically use TLS to address both concerns.
Reliability and Failure Handling
Timeouts prevent clients from waiting indefinitely for unresponsive servers. A timeout specifies a duration, such as waiting at most 5 seconds.
Deadlines specify an absolute time by which an operation must complete, such as completing by 10:30:00 UTC. Deadlines are propagated through call chains so downstream services know how much time remains. With deadline propagation, if insufficient time remains, a service can fail fast rather than starting work that it cannot complete.
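A sketch of deadline propagation, assuming each service receives an absolute deadline with the request and checks its remaining budget before doing work; the 0.5-second cost estimate is invented.

```python
import time

def call_downstream(request, deadline):
    return "ok"                          # placeholder for the next hop

def handle(request, deadline):
    """Fail fast if the remaining budget cannot cover this step's estimated cost."""
    if deadline - time.time() < 0.5:     # 0.5 s: assumed cost of this step
        raise TimeoutError("insufficient time remaining; failing fast")
    # do local work, then pass the SAME absolute deadline to the next hop
    return call_downstream(request, deadline)

deadline = time.time() + 5.0             # the edge converts a timeout into a deadline once
print(handle({"op": "read"}, deadline))
```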
Cancellation allows clients to terminate in-progress requests when results are no longer needed, freeing resources on both client and server.
Retries handle transient failures but must be used carefully.
- Idempotent operations produce the same result when executed multiple times and are safe to retry. Examples include retrieving the contents of a shopping cart or setting a user’s name to a specific value.
- Non-idempotent operations produce side effects if run multiple times and require careful handling to avoid duplicate processing. Examples include transferring money between accounts or adding an item to a shopping cart.
- RPC frameworks may offer at-least-once or at-most-once semantics for function calls. With at-least-once semantics, the RPC library may resend the request if it does not receive a timely response, so a remote procedure may execute one or more times. With at-most-once semantics, the RPC system tries to ensure the server executes the procedure no more than once, typically by tagging requests with unique IDs and suppressing duplicates. Local procedure calls have exactly-once semantics, but achieving exactly-once for remote calls is extremely difficult because you cannot distinguish “the server never received the request” from “the server processed it, but the response was lost.”
- Idempotency keys provide a solution for non-idempotent operations that must be retryable. The client generates a unique identifier (typically a UUID) for each logical request. If a retry occurs, the same key is sent. The server stores results keyed by this identifier and returns the cached result for duplicate requests, avoiding re-executing the operation. Design challenges include determining how long to store keys and results, ensuring identifiers are never reused, and handling storage persistence across system restarts.
Idempotency keys differ from request IDs in their purpose. An idempotency key ensures an operation executes at most once by deduplicating requests at the application level. A request ID (or correlation ID) is used for observability and debugging, allowing you to trace a single request through multiple services in logs and traces.
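A server-side sketch of idempotency-key handling; the charge operation and in-memory table are placeholders (a real system would persist the table and expire old keys).

```python
import uuid

completed = {}                           # idempotency key -> cached result

def charge(amount):
    return {"status": "charged", "amount": amount}   # non-idempotent placeholder

def handle(idempotency_key, amount):
    if idempotency_key in completed:     # duplicate request: skip re-execution
        return completed[idempotency_key]
    completed[idempotency_key] = charge(amount)
    return completed[idempotency_key]

key = str(uuid.uuid4())                  # one key per logical request
handle(key, 100)
handle(key, 100)                         # the retry returns the cached result
```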
Exponential backoff progressively increases the delay between retry attempts to avoid overwhelming struggling services. Random jitter is added to each delay to prevent synchronized retry storms in which many clients retry at exactly the same time.
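A sketch of exponential backoff with full jitter; this pattern is only safe for idempotent operations or ones protected by an idempotency key.

```python
import random
import time

def retry(call, attempts=5, base=0.1, cap=10.0):
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                    # out of attempts: surface the failure
            # full jitter: sleep a random time up to the exponential bound
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```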
Circuit breakers prevent cascading failures by failing fast when a dependency is unhealthy. A circuit breaker monitors failure rates and trips when failures exceed a threshold, immediately failing subsequent requests without attempting the call. After a cooling-off period, the circuit breaker allows test requests through, and if these succeed, normal operation resumes. This prevents requests from piling up waiting for timeouts and gives the failing service time to recover.
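A simplified circuit-breaker sketch; real implementations track failure rates over a window and model an explicit half-open state.

```python
import time

class CircuitBreaker:
    def __init__(self, threshold=5, cooldown=30.0):
        self.failures = 0
        self.threshold = threshold
        self.cooldown = cooldown
        self.opened_at = None            # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # cooling-off over: allow a test request
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0                # success closes the circuit again
        return result
```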
Distributed Objects and Lifecycle Management
Traditional RPC invokes stateless procedures. Distributed objects extend RPC to support object-oriented programming with stateful remote objects. Objects have identity (each instance is distinct), state (maintained between method calls), and lifecycle (objects must be instantiated and destroyed). This introduces challenges around object identity, state management, and lifecycle.
Microsoft DCOM (Distributed Component Object Model) extended COM to support remote objects. It used a binary interface standard supporting cross-language interoperability and tracked object instances via interface pointer identifiers.
DCOM introduced surrogate processes, which are generic host processes on the server that dynamically load components as clients request them. This eliminated the need for developers to write custom server processes and provided fault isolation since component crashes affected only the surrogate, not the client.
DCOM faced challenges with configuration complexity, platform dependence (Windows only), and stateful object management. It evolved into COM+, which added transactions, security, and object pooling.
Java RMI (Remote Method Invocation) is Java’s native approach to distributed objects. Remote objects implement interfaces extending java.rmi.Remote and are registered with an RMI registry. Clients receive stubs that transparently handle marshalling and communication. RMI uses dynamic stub generation through reflection and Java’s native serialization. It remains relevant primarily in legacy enterprise Java applications.
Distributed Garbage Collection
Because objects are created when needed, they must be deleted when no longer needed. In a local program, the garbage collector tracks object references and frees memory when an object has no more references. In a distributed system, references span machine boundaries, and the garbage collector on one machine cannot see references held by another.
Reference counting tracks how many clients hold references to each object and deletes an object when its count reaches zero. This fails when clients crash without releasing references, when messages are lost, or during network partitions. Systems like Python’s RPyC use reference counting. Microsoft DCOM also used reference counting but supplemented it with keep-alive pinging to detect vanished clients.
Leases provide time-limited references to objects. Clients must periodically renew leases before they expire. If a client crashes or loses connectivity, the lease expires, and the server can safely delete the object. Leases use explicit time bounds: “you own this reference until time T, unless you renew.”
Heartbeats (also called keep-alive pinging) require clients to periodically send messages listing all objects they are using. The server treats silence as abandonment. The key difference from leases is that heartbeats imply continuous proof of liveness: “I will assume you’re dead if I stop hearing from you.” Java RMI uses lease-based garbage collection with explicit dirty/clean messages. Microsoft DCOM/COM+ uses keep-alive pinging, where clients periodically send ping sets listing active objects.
Both approaches accept that objects might occasionally be deleted prematurely if network delays prevent timely renewal, or kept alive slightly longer than necessary. These tradeoffs are generally acceptable given the alternative of memory leaks from crashed clients.
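A server-side lease sketch; the duration and object table are illustrative.

```python
import time

LEASE_SECONDS = 30.0
leases = {}                              # object id -> expiry time

def grant(obj_id):
    leases[obj_id] = time.monotonic() + LEASE_SECONDS

def renew(obj_id):                       # clients call this periodically
    leases[obj_id] = time.monotonic() + LEASE_SECONDS

def sweep(objects):
    """Reclaim objects whose clients stopped renewing (crashed or partitioned)."""
    now = time.monotonic()
    for obj_id in [o for o, expiry in leases.items() if expiry < now]:
        del leases[obj_id]
        objects.pop(obj_id, None)
```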
Why Web Services
Traditional RPC systems (ONC RPC, DCE RPC, DCOM) were designed for enterprise networks. They faced challenges on the internet: proprietary binary formats created interoperability problems across organizations, dynamic ports were blocked by firewalls, stateful objects complicated replication and load balancing, and strict request-response patterns could not handle streaming or notifications.
Web services emerged to solve these problems. By using HTTP as the transport, services could work through firewalls and proxies. By using text-based formats such as XML and JSON, services could interoperate across different platforms and languages. The web’s existing infrastructure for security, caching, and load balancing could be reused.
XML-RPC was one of the first web service protocols, marshalling data into XML messages transmitted over HTTP. It was human-readable, used explicit typing, and worked through firewalls. It failed to gain widespread adoption due to limited data types, lack of extensibility, XML’s verbosity, and missing features for interface definitions and schemas.
SOAP (formerly Simple Object Access Protocol) extended XML-RPC with user-defined types, message routing, and extensibility. SOAP supported various interaction patterns: request-response, request-multiple-response, asynchronous notification, and publish-subscribe. WSDL (Web Services Description Language) served as SOAP’s IDL for describing data types, messages, operations, and protocol bindings. SOAP declined due to complexity, heavyweight frameworks, interoperability issues between implementations, and verbosity.
REST
REST (Representational State Transfer) treats the web as a collection of resources rather than remote procedures. Resources are data identified by URLs. HTTP methods operate on resources using CRUD operations (Create, Read, Update, Delete):
- POST creates new resources and is not idempotent because repeating the same POST may create multiple resources
- GET retrieves resources and is idempotent because repeating the same GET returns the same result
- PUT updates entire resources and is idempotent because repeating the same PUT produces the same state
- PATCH updates parts of resources and may or may not be idempotent depending on the implementation
- DELETE removes resources and is idempotent because repeating the same DELETE on a removed resource has no additional effect
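A sketch of the CRUD mapping using Python’s requests library against a hypothetical carts API (the URL and fields are invented).

```python
import requests

BASE = "https://api.example.com/carts"                    # hypothetical service

r = requests.post(BASE, json={"owner": "alice"})          # create: NOT idempotent
cart = f"{BASE}/{r.json()['id']}"

requests.get(cart)                                        # read: idempotent
requests.put(cart, json={"owner": "alice", "items": []})  # full replace: idempotent
requests.patch(cart, json={"owner": "bob"})               # partial update
requests.delete(cart)                                     # delete: idempotent
```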
REST is stateless, meaning each request contains all information needed to process it. Servers do not maintain client session state between requests. This improves scalability but shifts state management to clients or to the server application using session IDs and data stores.
REST typically uses JSON for data serialization. JSON is human-readable, simple, and has native browser support. However, it lacks schemas, requires parsing overhead, and has larger message sizes than binary formats.
REST emphasizes discoverability, with responses containing links to related resources that allow clients to navigate the API. REST is well-suited for CRUD operations but can be awkward for complex operations that do not map cleanly to resources.
gRPC
gRPC was developed at Google and uses Protocol Buffers for serialization and HTTP/2 for transport. HTTP/2 allows multiple requests and responses to flow over a single TCP connection simultaneously, uses compact binary headers, and supports server-initiated messages.
gRPC supports four communication patterns: unary (traditional request-response), server streaming (one request, multiple responses), client streaming (multiple requests, one response), and bidirectional streaming (multiple requests and responses interleaved).
gRPC includes built-in support for deadlines, cancellation, and metadata propagation. It has become the dominant choice for internal microservice communication.
gRPC is preferred for internal service-to-service communication, performance-critical paths, streaming requirements, and strongly typed contracts. REST over HTTP remains the default for public APIs, browser-based applications, simple CRUD operations, and situations where human readability is important.
Service-Oriented Architecture and Microservices
Service-Oriented Architecture (SOA) is an architectural approach that treats applications as integrations of network-accessible services with well-defined interfaces. Services are designed to be reusable components that communicate through standardized protocols. SOA emerged in the late 1990s and early 2000s alongside SOAP. SOA emphasized loose coupling between services, allowing them to evolve independently.
Microservices are a modern evolution of SOA principles. Both approaches structure applications as collections of services with well-defined APIs. The key differences are in implementation philosophy:
- SOA typically used heavyweight middleware (enterprise service buses) for communication and orchestration, while microservices favor lightweight protocols like REST and gRPC with direct service-to-service communication.
- SOA services were often large and shared databases, while microservices emphasize smaller services where each service owns its own data.
- SOA governance was centralized, while microservices favor decentralized governance where teams choose their own technologies.
Microservices can be viewed as SOA implemented with modern tooling and a bias toward simplicity. The core insight is the same: decompose applications into independent services that can evolve separately.
Benefits of microservices include independent scaling of components, technology flexibility per service, fault isolation, and parallel development by multiple teams. Drawbacks include distributed system complexity, network overhead, complex debugging and testing, operational burden, and eventual consistency challenges.
The modular monolith alternative keeps a single deployable unit while enforcing internal module boundaries. It captures organizational benefits like clearer ownership and cleaner interfaces without paying the cost of distributed communication.
Microservices make sense when independent change is the real problem: different teams, different release schedules, or very different scaling requirements. They are a poor fit when the main problem is product iteration, when teams are small, or when operational overhead is unaffordable.
Observability
Observability is the ability to reconstruct what happened from the signals a system emits. When requests cross multiple services, failures and slowdowns rarely show up where the problem actually is.
Logs record discrete events and are most useful when they include a request ID.
Traces record the path of a single request through the system. A trace consists of spans, where each span represents one operation and includes timing and parent-child relationships. Tracing answers “where did the time go?” across multiple services.
Request IDs (also called correlation IDs or trace IDs) are generated when a request enters the system and propagated through all downstream calls. The ID is included in logs and trace context to reconstruct the full request path during debugging. Request IDs are for observability, helping you understand what happened. They differ from idempotency keys, which ensure operations are not duplicated.
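A propagation sketch; the x-request-id header name is a common convention rather than a standard.

```python
import logging
import uuid

def handle(headers):
    # reuse an inbound ID if one exists; otherwise generate one at the edge
    request_id = headers.get("x-request-id", str(uuid.uuid4()))
    logging.info("handling request [request_id=%s]", request_id)
    return {"x-request-id": request_id}  # attach to every downstream call
```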
Circuit breakers prevent cascading failures by failing fast when a dependency is unhealthy, rather than letting requests pile up waiting for timeouts.
General Summary
RPC transforms low-level network communication into familiar procedure call interfaces through generated stubs, marshalling, and interface definitions. The evolution from traditional RPC through SOAP and REST to gRPC reflects the changing landscape of distributed computing: from enterprise networks to the internet, from XML to JSON and Protocol Buffers, from simple request-response to streaming.
Communication in distributed systems requires careful attention to reliability and failure handling. Timeouts prevent resource exhaustion. Retries handle transient failures. Idempotency and idempotency keys make operations safe to retry. Circuit breakers prevent cascading failures. Observability with request IDs and tracing enables understanding system behavior.
The right technology choice depends on context: REST for public APIs and simplicity, gRPC for internal communication and performance, with many systems using multiple technologies for different purposes.
Microservices are a tool for organizational scaling, not a technical silver bullet. They make sense when independent deployment is the real problem, but introduce significant operational complexity.
What You Don’t Need to Study
- The differences between big-endian and little-endian (just know that there are differences in encoding data)
- Any specific port numbers or network configurations
- Historical dates or version numbers
- Specific programming language APIs or library names beyond the concepts they illustrate
- Modern RPC frameworks other than gRPC (Apache Thrift, Smithy, Finagle implementation details)
- Binary serialization formats other than Protocol Buffers (Apache Avro, Thrift, Cap’n Proto, FlatBuffers)
- Java RMI API specifics (registry methods, UnicastRemoteObject details, rmic tool)
- DCOM specifics beyond understanding its goals, surrogate processes, and garbage collection approach
- Security implementations (you don’t need to know how TLS works; just that using HTTPS provides a secure transport)
- XML-RPC message format syntax
- SOAP message structure or WSDL syntax
- Specific company examples for RPC or web services deployments (Twilio Segment, Amazon Prime Video migrations)
- Specific HTTP header names or status codes beyond the basic methods (GET, POST, PUT, PATCH, DELETE)
- Details of HTTP/2 implementation (like frame types, flow control mechanisms, HPACK compression); just know it’s binary and offers multiplexed communication channels
- Traces, Health Checks, Metrics
- GraphQL
- Message queues (these will be covered in later lectures)