Message Queues and Event Streaming
- Message broker
- A service that sits between producers and consumers, storing messages durably and allowing each side to operate independently at its own pace.
- Producer (publisher)
- A process that writes messages to a message broker without knowing which consumers will receive them.
- Consumer (subscriber)
- A process that reads messages from a message broker without knowing which producers wrote them.
- Topic
- A named category or stream of messages; producers write to a topic, and consumers subscribe to the topics they want to read.
- AMQP (Advanced Message Queuing Protocol)
- The protocol underlying RabbitMQ, which defines exchanges, queues, and binding-based routing.
- Exchange
- In RabbitMQ, the component that receives messages from producers and routes them to one or more queues according to binding rules.
- Binding
- In RabbitMQ, a configured link from an exchange to a queue; optionally carries a binding key that the exchange matches against incoming message routing keys to decide whether to route a message to that queue.
- Fanout exchange
- A RabbitMQ exchange type that broadcasts every message to all queues bound to it, ignoring routing keys.
- Direct exchange
- A RabbitMQ exchange type that routes a message to queues whose binding key exactly matches the message’s routing key.
- Topic exchange
- A RabbitMQ exchange type that routes messages based on wildcard pattern matching against routing keys.
- Message acknowledgment
- A signal sent by a consumer to the broker confirming successful message processing; the broker retains unacknowledged messages and redelivers them on consumer failure.
- At-most-once delivery
- A messaging guarantee in which a message is sent once and not retried; fast, but the message is lost if the consumer is unavailable or a failure occurs.
- At-least-once delivery
- A messaging guarantee in which the system retries until it receives an acknowledgment; no message is lost, but duplicates are possible. Consumers must be idempotent or deduplicate.
- Exactly-once effect
- The observable guarantee that each message produces its intended effect exactly once, achieved by combining at-least-once delivery with idempotent or transactional processing at the consumer. Requires cooperation from the source, the broker, and the output destination; not a property of the broker alone.
- Backpressure
- The condition in which a consumer or downstream system cannot keep up with the rate at which a producer generates data; addressed through buffering (absorb bursts in a queue), dropping (discard when the buffer is full), or flow control (signal the producer to slow down).
- Partition
- In Kafka, an ordered, append-only log within a topic; once a record is written it is never modified or overwritten. Each partition has a leader replica on one broker and is replicated independently to follower replicas on other brokers.
- Offset
- In Kafka, a sequential integer that uniquely identifies a record’s position within a partition; consumers track their own offsets to control their position in the log.
- Consumer group
- In Kafka, a named set of consumers that collectively consume a topic; each partition is assigned to exactly one group member at a time, distributing work across the group.
- Log compaction
- A Kafka retention policy in which the broker retains at least the most recent record for each key, discarding older records with the same key during background cleaning. The compacted log holds the latest value per key rather than a complete history.
- Event sourcing
- An architectural pattern in which all state changes are recorded as an ordered log of events, allowing state to be reconstructed by replaying the log.
- Sequential I/O
- Disk access that reads or writes a continuous stream of data without seeking; orders of magnitude faster than random I/O and the basis for Kafka’s performance.
- Page cache
- An operating system mechanism that caches recently accessed disk blocks in RAM; Kafka exploits this to serve reads at memory speeds without a separate in-memory cache.
- In-sync replica (ISR)
- In Kafka, a follower that is sufficiently caught up with the leader; the `acks=all` setting requires all ISR members to confirm a write before the producer receives an acknowledgment.
- Event time
- The timestamp at which an event actually occurred, as recorded by the source system; the correct basis for time-based aggregations.
- Processing time
- The timestamp at which the stream processing system receives and processes an event; easier to implement than event time but incorrect when data arrives late or out of order.
- Tumbling window
- A fixed-size, non-overlapping time window; each event belongs to exactly one window.
- Sliding window
- A fixed-size time window that advances by a configurable step smaller than the window size, producing overlapping windows in which some events appear multiple times.
- Session window
- A time window that groups events separated by less than a configurable inactivity gap; window size is not fixed and reflects natural bursts of user activity.
- Watermark
- A progress estimate in stream processing: a timestamp before which the system assumes no further events will arrive. Events with earlier timestamps are treated as too late to wait for, and the system uses the watermark to decide when to close a window and emit results. Commonly derived by subtracting a configured lag from the latest event timestamp seen.
- Micro-batch
- The execution model used by Spark Structured Streaming, in which events are collected into small batches that are processed as a series of short batch jobs rather than one event at a time.
- Unbounded table
- The Spark Structured Streaming abstraction that treats an incoming stream as a table that grows indefinitely; users write queries against it using standard Spark APIs.
- Checkpoint (streaming)
- A durable snapshot of a stream processor’s progress (offsets) and accumulated state, saved periodically so the system can recover from a failure by replaying from the last checkpoint rather than starting over. Provides at-least-once semantics on its own; exactly-once additionally requires an output destination that supports idempotent writes or transactional commits.
- Sink
- In stream processing, the output destination where results are written; must support idempotent writes or transactional commits to achieve exactly-once semantics end-to-end.
- Output mode
- In Spark Structured Streaming, the policy for what portion of the result table is written to the sink on each trigger: append (new rows only), complete (full result table), or update (changed rows only).
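
The tumbling window, sliding window, and watermark entries above can be illustrated with a short sketch. This is not any engine's implementation: the function names (`tumbling_window`, `sliding_windows`, `watermark`, `is_closed`) are hypothetical, timestamps are assumed to be integer seconds, windows are half-open intervals aligned to time zero, and a window is assumed to be closable once the watermark reaches its end.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open interval [start, end), in seconds


def tumbling_window(event_time: int, size: int) -> Window:
    """Return the single fixed-size window that contains event_time."""
    start = (event_time // size) * size
    return (start, start + size)


def sliding_windows(event_time: int, size: int, step: int) -> List[Window]:
    """Return every window of length `size`, advancing by `step`,
    that contains event_time (several windows when step < size)."""
    windows = []
    # Latest step-aligned window start that is <= event_time.
    start = (event_time // step) * step
    # Walk backwards while the window still covers event_time.
    while start > event_time - size:
        windows.append((start, start + size))
        start -= step
    return sorted(windows)


def watermark(max_event_time_seen: int, allowed_lag: int) -> int:
    """Assumed rule: latest event timestamp seen minus a configured lag."""
    return max_event_time_seen - allowed_lag


def is_closed(window: Window, wm: int) -> bool:
    """Assumed rule: a window is final once the watermark reaches its end."""
    return wm >= window[1]


if __name__ == "__main__":
    print(tumbling_window(125, 60))      # (120, 180)
    print(sliding_windows(125, 60, 30))  # [(90, 150), (120, 180)]
    wm = watermark(max_event_time_seen=200, allowed_lag=30)
    print(wm, is_closed((90, 150), wm), is_closed((120, 180), wm))  # 170 True False
```

An event at second 125 lands in exactly one 60-second tumbling window but in two 60-second sliding windows with a 30-second step, which is the overlap the sliding window definition describes.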
Content Delivery Networks
- Flash crowd
- A sudden large surge in demand for a resource, typically caused by a news event or popular content release, that overwhelms the capacity of a single origin server.
- Origin server
- The content provider’s authoritative server; the source of truth for all content in a CDN.
- Edge server
- A CDN server located close to end users, typically inside ISPs or at internet exchange points, that serves cached content to reduce latency and origin load.
- Parent server
- A CDN server in the tier between edge servers and the origin, used as a shared cache for edge servers in a region to reduce repeated fetches from the origin.
- Push CDN
- A CDN model in which the content provider explicitly uploads content to CDN storage nodes ahead of demand.
- Pull CDN
- A CDN model in which edge servers fetch content from the origin on the first request and cache it for subsequent requests.
- CNAME (Canonical Name)
- A DNS record type that maps one hostname to another; used by CDN customers to delegate their domain name to the CDN’s DNS infrastructure.
- Dynamic DNS
- A DNS server that returns different IP addresses for the same hostname based on real-time factors such as user location, server load, and network conditions.
- Tiered distribution
- The CDN content lookup strategy in which a cache miss at the edge triggers a search through progressively higher cache tiers (regional peers → parent) before the request falls through to the origin.
- Cache-Control
- An HTTP response header that instructs caches (browsers, proxies, CDN edge servers) how to store and validate a resource; directives include `max-age`, `no-store`, `no-cache`, `public`, and `private`.
- Edge Side Includes (ESI)
- A markup language that allows a CDN to assemble a page from independently cached fragments at the edge, enabling partial caching of pages that include some dynamic content.
- HTTP Live Streaming (HLS)
- An Apple-developed protocol that delivers video by breaking it into short segments served as regular HTTP files; each segment can be cached and served by CDN edge servers.
- MPEG-DASH (Dynamic Adaptive Streaming over HTTP)
- An open international standard for adaptive bitrate video streaming, used by Netflix, YouTube, Amazon Prime Video, and most other major platforms on non-Apple devices. Like HLS, it delivers video as short HTTP segments with a manifest file; unlike HLS, it is codec-agnostic and not controlled by a single company.
- Adaptive bitrate (ABR)
- A video delivery technique that encodes content at multiple quality levels; the player automatically selects the appropriate level based on current network conditions, allowing graceful degradation on slow connections.
- Overlay network
- An application-level network built on top of the public internet, used by CDNs to route traffic between nodes along paths selected by measured performance rather than BGP routing policy.
- Anycast
- A network addressing scheme in which multiple servers worldwide share a single IP address; BGP routing directs each client’s connection to the nearest server advertising that address, based on routing path length.
- BGP (Border Gateway Protocol)
- The routing protocol that governs how traffic flows between autonomous systems (independently operated networks such as ISPs) on the internet; used both by CDNs for anycast routing and as a source of topology information for DNS-based routing decisions.
- Distributed denial-of-service (DDoS)
- An attack that floods a target with traffic from many sources to exhaust its bandwidth or processing capacity; CDNs mitigate DDoS by absorbing attack traffic across their distributed infrastructure.
- TLS termination
- Decrypting an HTTPS connection at a CDN edge server before forwarding the request to the origin; reduces handshake latency for users and offloads cryptographic work from the origin.
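
The pull CDN and tiered distribution entries above describe a lookup that climbs from the edge toward the origin on a miss. The sketch below illustrates that flow under simplifying assumptions: each cache tier is a plain dictionary, expiry and `Cache-Control` handling are ignored, and the names `tiered_fetch` and `fetch_origin` are hypothetical rather than any CDN's API.

```python
from typing import Callable, Dict, List, Tuple


def tiered_fetch(
    url: str,
    tiers: List[Dict[str, bytes]],
    fetch_origin: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Look up url in each cache tier, nearest first, then fall back to the origin.

    Returns the content and a label for where it was found. Every tier
    that missed stores a copy so later requests are served closer to the user.
    """
    missed = []
    for level, cache in enumerate(tiers):
        if url in cache:
            body, source = cache[url], f"tier-{level}"
            break
        missed.append(cache)
    else:
        # No tier had the content: this is the only path that reaches the origin.
        body, source = fetch_origin(url), "origin"
    for cache in missed:
        cache[url] = body
    return body, source


if __name__ == "__main__":
    origin_hits = []

    def origin(url: str) -> bytes:
        origin_hits.append(url)
        return b"<html>hello</html>"

    edge_a, edge_b, parent = {}, {}, {}
    print(tiered_fetch("/index.html", [edge_a, parent], origin)[1])  # origin
    print(tiered_fetch("/index.html", [edge_a, parent], origin)[1])  # tier-0
    print(tiered_fetch("/index.html", [edge_b, parent], origin)[1])  # tier-1
    print(len(origin_hits))                                          # 1
```

The third request shows the point of a parent tier: a second edge server that has never seen the URL is filled from the shared parent cache, so the origin is contacted only once.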
BitTorrent
- Swarm
- In BitTorrent, the collection of all peers currently downloading or seeding a particular file.
- Piece
- In BitTorrent, a fixed-size chunk of the file being distributed; each piece has an associated hash for integrity verification.
- Seeder
- A BitTorrent peer that has the complete file and is uploading pieces to other peers.
- Leecher
- A BitTorrent peer that is downloading a file and has not yet acquired all pieces; leechers upload the pieces they have while continuing to download.
- Rarest-first
- The BitTorrent piece selection strategy in which a peer preferentially downloads pieces that the fewest other peers currently have, ensuring that rare pieces are quickly distributed through the swarm.
- Tracker
- In BitTorrent, a server that maintains lists of peers participating in a swarm and responds to peer discovery queries.
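
The rarest-first entry above can be illustrated with a small sketch. It assumes the peer already knows which pieces each connected peer holds, and it breaks ties by lowest piece index for determinism (a real client might randomize ties instead). The function name `pick_rarest_piece` and the data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def pick_rarest_piece(
    have: Set[int],
    peer_pieces: Dict[str, Set[int]],
) -> Optional[int]:
    """Return the index of the still-needed piece held by the fewest peers.

    `have` is the set of piece indices this peer already owns;
    `peer_pieces` maps a peer id to the pieces that peer advertises.
    Returns None when no connected peer has a piece we still need.
    """
    counts: Counter = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - have)  # only count pieces we still need
    if not counts:
        return None
    # Fewest holders first; lowest index breaks ties (an assumption of this sketch).
    return min(counts, key=lambda piece: (counts[piece], piece))


if __name__ == "__main__":
    swarm = {"a": {0, 1, 2}, "b": {1, 2}, "c": {2, 3}}
    print(pick_rarest_piece({0}, swarm))           # 3
    print(pick_rarest_piece({0, 1, 2, 3}, swarm))  # None
```

In the example, piece 3 is held by a single peer, so it is requested before pieces 1 and 2, which are already more widely replicated in the swarm.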
Edge Computing
- V8 isolate
- A lightweight sandbox used by Cloudflare Workers to execute JavaScript; isolates provide memory isolation between concurrent workers without the overhead of separate processes, and can start in a few milliseconds.
- Edge computing
- The practice of executing application logic on CDN edge nodes close to users rather than at a centralized origin, reducing round-trip latency for dynamic operations.