Remote Procedure Calls and Web Services

Core RPC Concepts

Asynchronous RPC: An RPC variant where the client continues execution immediately after making a call and handles the response later through callbacks or futures, rather than blocking until the server responds.
Big-endian: A byte ordering convention where the most significant byte is stored first, used by network protocols (network byte order) and some processors.
Client stub (proxy): A local function with the same interface as a remote procedure that handles marshalling parameters, sending requests, and returning results to the caller.
Copy-restore (copy-in/copy-out): A technique for handling pass-by-reference parameters in RPC by sending the referenced data to the server, letting the server modify its local copy, and sending the modified data back to the client to overwrite the original.
Dynamic stub generation: Creation of proxy objects at runtime using reflection, eliminating the need for a separate stub compilation step.
Little-endian: A byte ordering convention where the least significant byte is stored first, used by Intel and AMD processors.
Marshalling: The process of packaging data into a network message for transmission, often including additional metadata like function identifiers or version numbers.
Partial failure: The fundamental challenge of distributed systems where some components may fail while others continue operating, unlike local systems where failures typically affect the entire process.
Remote Procedure Call (RPC): A mechanism that allows a process on one machine to invoke a function on another machine with procedure call semantics, hiding network communication details inside stub functions.
Server stub (skeleton): The server-side component that registers the service, awaits requests, extracts data, calls the actual procedure, and packages results for the client.
Static stub generation: Use of an RPC compiler to read an interface definition and generate stub code before program compilation.
Synchronous RPC: The default RPC execution model where the client blocks and waits for the server to respond before continuing, matching the behavior of local procedure calls.
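
The byte-ordering and marshalling entries above can be illustrated with a minimal Python sketch. The message layout here (a 16-bit function identifier, a 16-bit version number, and two 32-bit integer arguments) and the function names are assumptions chosen for illustration, not the wire format of any real RPC system.

```python
import struct

# Hypothetical message layout (an assumption for illustration):
# function id (uint16), version (uint16), two int32 arguments.
NETWORK_FORMAT = "!HHii"   # "!" selects network byte order (big-endian)

def marshal_add_request(func_id, version, a, b):
    """Client stub side: pack the call into a flat byte string."""
    return struct.pack(NETWORK_FORMAT, func_id, version, a, b)

def unmarshal_add_request(message):
    """Server stub side: extract the fields from the received bytes."""
    return struct.unpack(NETWORK_FORMAT, message)

# The same 32-bit value laid out in each byte order.
big = struct.pack(">I", 0x01020304)      # most significant byte first
little = struct.pack("<I", 0x01020304)   # least significant byte first

request = marshal_add_request(7, 1, 20, 22)
```

Here big holds the bytes 01 02 03 04 and little holds 04 03 02 01. Because both stubs use the same format string, the server recovers exactly the values the client packed regardless of either machine's native byte order.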

RPC Systems and Implementations

DCE RPC: The Distributed Computing Environment RPC defined by the Open Group that introduced UUIDs for service identification, cells for administrative grouping with directory servers, and receiver-makes-right data representation (NDR) to avoid unnecessary conversions between identical architectures.
DCOM (Distributed Component Object Model): Microsoft’s distributed object system that extended COM to support remote objects using a binary interface standard for cross-language interoperability, interface pointer identifiers for object identity, and keep-alive pinging for distributed garbage collection.
Java RMI (Remote Method Invocation): Java’s native distributed object system that uses dynamic stub generation, the RMI registry for service discovery, and lease-based distributed garbage collection for remote object lifecycle management.
ONC RPC: Sun’s Open Network Computing RPC, one of the first widely-used RPC systems, that introduced interface definition with rpcgen, program numbers for service identification, versioning support, and XDR (eXternal Data Representation) for canonical binary encoding.
RPyC: A Python RPC framework that uses dynamic stub generation through reflection to create proxy objects at runtime.

Data and Interfaces

Backward compatibility: The ability of new code to successfully read and process data produced by old code, typically achieved by treating newly added fields as optional so that older messages lacking them remain valid.
Canonical format: A data representation strategy where all senders convert data to a standard format before transmission, requiring conversion on every message even between identical architectures.
Explicit typing: A serialization approach that transmits type information with each value, making data self-describing but larger, as used by JSON and XML.
Forward compatibility: The ability of old code to successfully read and process data produced by new code, typically achieved by ignoring unknown fields.
Implicit typing: A serialization approach that transmits only values without type information, requiring the receiver to know the expected data format, which is efficient but inflexible.
Interface Definition Language (IDL): A language-neutral specification format that describes service interfaces, enabling stub generation for multiple programming languages and providing a contract between clients and servers.
JSON (JavaScript Object Notation): A human-readable text-based data serialization format that is simple and has native browser support but lacks schemas and requires parsing overhead.
Protocol Buffers: A binary serialization format developed by Google that is compact, efficient, strongly-typed, and includes schema evolution support through numeric field tags.
Receiver-makes-right: A data representation strategy where the sender transmits data in its native format and the receiver converts only if necessary, avoiding unnecessary conversion between machines with identical architectures.
Schema: A definition of data structure, field types, and encoding rules that enables validation, code generation, and structured evolution as interfaces change.
Serialization: The conversion of data elements into a flat byte array suitable for network transmission.
XML (eXtensible Markup Language): A verbose but human-readable text format using tags to describe data structure, supporting schemas through XSD files for validation but slow to parse.
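
As a rough illustration of the typing and compatibility entries above, the following Python sketch contrasts an implicitly typed binary encoding with an explicitly typed JSON encoding, then shows readers of different ages handling each other's messages. The field names (user_id, temperature, unit) and the default value are hypothetical.

```python
import json
import struct

# Implicit typing: only values are sent. Both sides must already agree
# that the message is an int32 followed by a float64.
implicit = struct.pack("!id", 42, 98.6)
user_id, temperature = struct.unpack("!id", implicit)

# Explicit typing: field names and value types travel with the data,
# so the message is self-describing but larger.
explicit = json.dumps({"user_id": 42, "temperature": 98.6}).encode("utf-8")

# Old code reading a message from newer code: ignore unknown fields.
KNOWN_FIELDS = {"user_id", "temperature"}

def read_with_old_code(raw):
    data = json.loads(raw)
    return {k: v for k, v in data.items() if k in KNOWN_FIELDS}

# New code reading a message from older code: the added field is
# optional, so supply a default when it is missing.
def read_with_new_code(raw):
    data = json.loads(raw)
    data.setdefault("unit", "F")   # hypothetical default value
    return data

newer_message = json.dumps({"user_id": 42, "temperature": 98.6, "unit": "F"})
```

The binary message is 12 bytes, while the JSON message is several times larger because it carries the field names. The binary form offers no comparable way to skip an unexpected field, which is the inflexibility the implicit typing entry refers to.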

Reliability and Failure Handling

At-least-once semantics: RPC execution semantics where a remote procedure may execute one or more times because the RPC library may resend requests if it does not receive a timely response.
At-most-once semantics: RPC execution semantics where the RPC system tries to ensure the server executes the procedure no more than once, typically by tagging requests with unique IDs and suppressing duplicates.
Circuit breaker: A mechanism that prevents cascading failures by monitoring failure rates and failing fast when a dependency is unhealthy, rather than letting requests pile up waiting for timeouts.
Deadline: An absolute time by which an operation must complete, propagated through call chains so downstream services know how much time remains.
Exponential backoff: A retry strategy that progressively increases the delay between retry attempts to avoid overwhelming struggling services, often with randomized jitter to prevent synchronized retry storms.
Idempotency key: A unique identifier generated by the client for each logical request, allowing the server to deduplicate retries and return cached results for non-idempotent operations.
Idempotent operation: An operation that produces the same result when executed multiple times, making it safe to retry without causing duplicate side effects.
Non-idempotent operation: An operation that produces additional side effects each time it runs, requiring careful handling such as idempotency keys to avoid duplicate processing.
Retry: The practice of re-attempting failed operations to handle transient failures, but requiring careful consideration of idempotency to avoid duplicate processing.
Timeout: A maximum duration to wait for an operation to complete before giving up, preventing clients from waiting indefinitely for unresponsive servers.
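
The retry, exponential backoff, and idempotency key entries above fit together, and a small Python sketch may help show how. Everything here is hypothetical: the PaymentServer class, the parameter values, and the simulated lost replies are assumptions for illustration, and the sleep function is a no-op placeholder where real code would call time.sleep.

```python
import random
import uuid

def backoff_delays(base=0.1, factor=2.0, cap=5.0, attempts=5, rng=random):
    """Yield one delay per attempt: exponential growth, capped, with jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * factor ** attempt)
        yield rng.uniform(0, ceiling)    # random jitter spreads out retries

class PaymentServer:
    """Hypothetical server that deduplicates requests by idempotency key."""
    def __init__(self):
        self.results = {}     # idempotency key -> cached result
        self.charges = 0      # number of times the side effect really ran

    def charge(self, key, amount):
        if key in self.results:          # retry of a request already handled
            return self.results[key]
        self.charges += 1                # the non-idempotent side effect
        result = f"charged {amount}"
        self.results[key] = result
        return result

def call_with_retries(server, amount, lost_replies=2, sleep=lambda s: None):
    """Client: one key per logical request, reused on every retry."""
    key = str(uuid.uuid4())
    lost = 0
    for delay in backoff_delays():
        result = server.charge(key, amount)
        if lost < lost_replies:          # simulate a reply lost in transit
            lost += 1
            sleep(delay)                 # wait before the next attempt
            continue
        return result
    raise TimeoutError("gave up after retries")
```

With two lost replies, the server receives the request three times but performs the charge only once, because the second and third attempts carry the same key and receive the cached result.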

Service Lifecycle and Discovery

Distributed garbage collection: Mechanisms for managing the lifecycle of remote objects, typically using leases or heartbeats to determine when objects are no longer referenced and can be safely deleted.
Heartbeat: A periodic message sent by a client to indicate it is still using remote resources, with the server treating silence as abandonment.
Lease: A time-limited reference to a remote object that must be periodically renewed by the client, automatically expiring if the client crashes or loses connectivity.
Reference counting: A distributed garbage collection approach that tracks how many clients hold references to each object, deleting objects when the count reaches zero, but vulnerable to crashes and message loss.
Service discovery: Mechanisms for locating services, including name servers, DNS, configuration files, and service meshes.
Versioning: The practice of allowing interfaces to change while maintaining compatibility with existing clients through strategies like optional fields and avoiding field removal.
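
A lease-based approach to distributed garbage collection, as described in the entries above, might be sketched in Python as follows. The LeaseTable class and its method names are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow; a real system would read a clock.

```python
class LeaseTable:
    """Hypothetical server-side table of leases on remote objects."""
    def __init__(self, duration):
        self.duration = duration
        self.expiry = {}                 # (client, object id) -> expiry time

    def grant(self, client, obj, now):
        self.expiry[(client, obj)] = now + self.duration

    renew = grant                        # renewing simply extends the expiry

    def collect(self, now):
        """Drop expired leases; return objects with no remaining lease."""
        held_before = {obj for _, obj in self.expiry}
        self.expiry = {k: t for k, t in self.expiry.items() if t > now}
        held_after = {obj for _, obj in self.expiry}
        return held_before - held_after
```

A client that crashes never renews, so its leases expire without any explicit release message. This is what makes leases more robust than plain reference counting, where a lost decrement leaves the count above zero indefinitely.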

Web Services and Protocols

CRUD: An acronym for Create, Read, Update, Delete, representing the four basic operations for persistent storage that map to HTTP methods in REST APIs.
DELETE: An HTTP method used in REST APIs to remove a resource, which is idempotent because repeating the same DELETE on a removed resource has no additional effect.
GET: An HTTP method used in REST APIs to retrieve a resource, which is idempotent because repeating the same GET returns the same result without side effects.
gRPC: A modern RPC framework developed at Google that uses Protocol Buffers for serialization and HTTP/2 for transport, supporting unary calls, streaming, deadlines, and cancellation.
POST: An HTTP method used in REST APIs to create new resources, which is not idempotent because repeating the same POST may create multiple resources.
PUT: An HTTP method used in REST APIs to update or replace an entire resource, which is idempotent because repeating the same PUT produces the same state.
REST (Representational State Transfer): An architectural style that treats the web as a collection of resources identified by URLs, using HTTP methods to operate on those resources in a stateless manner.
SOAP (formerly Simple Object Access Protocol): An XML-based web service protocol that extended XML-RPC with user-defined types, message routing, and extensibility, but declined due to complexity and verbosity.
WSDL (Web Services Description Language): An XML-based interface definition language for SOAP services that describes data types, messages, operations, and protocol bindings.
XML-RPC: An early web service protocol that marshalled data into XML messages transmitted over HTTP, featuring simplicity but limited adoption due to restricted data types and lack of extensibility.
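
The idempotency properties of the HTTP methods listed above can be demonstrated without a network. The following Python sketch uses a hypothetical in-memory NoteStore class as a stand-in for a REST resource collection; it is not a web framework API.

```python
import itertools

class NoteStore:
    """In-memory stand-in for a REST resource collection (hypothetical)."""
    def __init__(self):
        self.notes = {}
        self._ids = itertools.count(1)

    def post(self, body):            # create: each call makes a new resource
        note_id = next(self._ids)
        self.notes[note_id] = body
        return note_id

    def put(self, note_id, body):    # replace: repeating leaves the same state
        self.notes[note_id] = body

    def get(self, note_id):          # read: no side effects
        return self.notes.get(note_id)

    def delete(self, note_id):       # repeating on a removed resource is a no-op
        self.notes.pop(note_id, None)
```

Calling post twice with the same body creates two notes, while calling put or delete twice leaves the store in the same state as calling it once. That difference is why a client can safely retry PUT, GET, and DELETE but needs extra care with POST.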

Architecture and Design

Microservices: An architectural style that structures applications as collections of small, independently deployable services, each owning its data and communicating via well-defined APIs.
Modular monolith: A single deployable unit that enforces internal module boundaries to capture organizational benefits of microservices without the cost of distributed communication.
Service-Oriented Architecture (SOA): An architectural approach that treats applications as integrations of network-accessible services with well-defined interfaces, emphasizing loose coupling and reusability.

Observability

Correlation ID: See Request ID.
Log: A record of discrete events in a system, most useful when it includes a request ID to follow one request across multiple services.
Metric: A summary of system behavior over time, such as request rate, error rate, or latency percentiles, that indicates what changed and when but not what happened for a specific request.
Observability: The ability to reconstruct what happened in a distributed system from the signals it emits, typically through logs, metrics, and traces.
Request ID (correlation ID): An identifier generated when a request enters the system and propagated through all downstream calls to reconstruct the full request path during debugging.
Trace: A record of the path of a single request through a distributed system, made of spans that show where time was spent across multiple services.
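
Request ID propagation, as described above, can be sketched in a few lines of Python. The service names, the in-memory LOG list, and the function names are all hypothetical stand-ins for real services and a real logging pipeline.

```python
import uuid

LOG = []   # stand-in for a real logging pipeline

def log(service, request_id, event):
    LOG.append({"service": service, "request_id": request_id, "event": event})

def inventory_service(request_id):
    log("inventory", request_id, "reserved item")

def order_service(request_id):
    log("orders", request_id, "order received")
    inventory_service(request_id)        # the ID is passed downstream unchanged

def gateway():
    request_id = str(uuid.uuid4())       # generated where the request enters
    order_service(request_id)
    return request_id

def events_for(request_id):
    """Reconstruct one request's path by filtering logs on its ID."""
    return [(e["service"], e["event"]) for e in LOG
            if e["request_id"] == request_id]
```

Even when log entries from many requests are interleaved, filtering on a single request ID recovers the ordered path of that request through both services.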
