Distributed systems force us to think about security differently from monolithic applications. In a single program on one machine, an operating system tracks identities and enforces access permissions. In a distributed system, every network hop is a boundary, every service call is a trust decision, and every credential can become a stepping stone for moving through the rest of the system.
Computer security is a huge subject. For distributed systems, the useful subset is the part that changes design choices: how services protect data in transit, how they authenticate each other, how they authorize requests, and how identities and trust are managed across many machines. The interesting part is not the mathematics in isolation but the way these mechanisms shape system architecture.
Historically, many systems treated the internal network as trusted. Firewalls and network location were expected to provide most of the protection. That model does not fit cloud systems, microservices, multi-tenant infrastructure, mobile clients, remote workers, or third-party APIs. Modern distributed systems do not treat network location as evidence of trust. Services may be compromised, and every request must be evaluated on its own merits.
Security goals
Before looking at mechanisms, it helps to separate the main goals.
Confidentiality means keeping data secret from unauthorized parties.
Integrity means detecting unauthorized modification.
Authentication means establishing who is on the other end of a connection or who created a message.
Authorization means deciding what an authenticated principal is allowed to do. A principal is any entity that can be identified and granted access: a user, a service, a device, or a background process.
Non-repudiation means being able to prove that a principal created or approved some data. In practice, this usually comes from digital signatures and audit logs.
One distinction worth keeping straight: confidentiality and integrity are separate properties. An encrypted message can be modified in transit without the recipient knowing if there is no integrity check. Conversely, a message can be fully visible to an attacker but arrive unmodified. Encryption alone is not enough.
Why security looks different in distributed systems
These goals apply to any computing system. What changes in a distributed system is how much harder they are to achieve and where the failure points appear.
A monolithic application running on one machine makes many implicit trust decisions. The OS enforces process boundaries. The same binary handles all operations. There is one identity model and typically one administrator.
A distributed system removes most of those implicit guarantees.
There is no single trust boundary. Clients call gateways, gateways call services, services call other services, and data may cross clouds, regions, and organizations.
Identity is not just about users. Services, workloads, containers, batch jobs, and background agents all need identities. A request that arrives at the payments service may have passed through four other services. Each of those services made a trust decision. Any of them could be compromised or misconfigured.
Authorization is not a one-time check. A request may cross many services, each of which needs to make a local decision about what the caller is allowed to do.
Secrets and keys are distributed across many machines. This creates problems of provisioning, rotation, revocation, and blast radius (how far the damage spreads if a credential is compromised).
One stolen authentication token or one misconfigured service can open a path through the rest of the system.
Distributed systems security ends up being mostly about identity, trust, and policy enforcement across boundaries.
The threat landscape
It helps to be concrete about what can go wrong.
Eavesdropping: An attacker captures unencrypted traffic between services.
Tampering: An attacker modifies messages in transit.
Replay: An attacker captures a valid message and retransmits it later. The original message was authentic and unmodified, so a simple integrity check does not catch it.
Stolen credentials: Authentication tokens, API keys, or certificates are extracted from a compromised machine, source repository, or container image.
Service impersonation: An attacker convinces one service that it is talking to a legitimate peer and receives the data or triggers the operations that a legitimate caller would.
Confused deputy: A service with broad permissions is tricked into using those permissions on behalf of an attacker. For example, a reporting service that can read any order might be induced by a crafted request to return orders the requesting user is not authorized to see.
Lateral movement: Once one component is compromised, the attacker moves through the system using the credentials and access that component holds.
Over-privileged services: A service has broader cloud or data permissions than it needs, so a compromise of that service becomes a much larger incident than it should be.
Broken authorization: A service authenticates callers correctly but fails to check whether the specific caller is allowed to perform the specific operation on the specific resource.
Leaked secrets: Credentials end up in configuration files, container images, source repositories, or log output.
Limiting permissions to only what each service needs, a principle called least privilege, reduces how much damage a compromised component can do.
Each threat maps directly to a design choice:
- Eavesdropping: Encrypt all traffic, including traffic inside the cluster.
- Replay: Include freshness mechanisms in every authenticated message or request.
- Stolen credentials: Prefer short-lived tokens that expire quickly.
- Lateral movement: Apply least privilege and segment the network so a compromised component can only reach what it needs.
The cryptographic foundation
Securing a distributed system requires a set of cryptographic building blocks. This section recaps the key ideas quickly, focusing on the vocabulary and on why each piece exists.
Symmetric encryption
Symmetric encryption uses the same secret key for both encryption and decryption. Alice and Bob share key K. Alice encrypts with K, and Bob decrypts with K.
Symmetric encryption is fast and efficient, which makes it the right choice for bulk data. The problem is key distribution: how do Alice and Bob obtain a shared key without exposing it to an attacker? In a large distributed system, pairwise manual key management does not scale.
While symmetric encryption is the standard choice for bulk traffic, it requires a shared key to already exist. We need something else to establish that key in the first place.
Asymmetric cryptography
Asymmetric cryptography, also called public key cryptography, uses a pair of mathematically related keys: a public key that can be shared openly and a private key that must remain secret.
The relationship between the keys is what makes public key cryptography work. A message encrypted with the public key can be decrypted only with the corresponding private key.
This gives us two important capabilities.
- A sender can encrypt data using a recipient’s public key, so only the holder of the corresponding private key can decrypt it. In distributed systems, this is usually not used to encrypt large amounts of application data directly. Asymmetric operations are relatively expensive, so protocols typically use them during setup to establish shared symmetric keys. Those shared keys are then used to encrypt application data with a symmetric algorithm.
- The holder of a private key can create a digital signature on data. Anyone with the corresponding public key can verify that signature. This does not mean “encrypting the data with the private key.” A signature is a separate operation that produces a verifiable value derived from the data and the private key. If the data changes, signature verification fails. If the signature verifies correctly, the receiver knows that the signer possessed the private key and that the data was not modified after it was signed.
This second capability is what makes origin authentication possible without a pre-shared secret between every sender and receiver. A service can prove that it controls a private key corresponding to a known public key without ever revealing that private key.
Asymmetric cryptography therefore makes key establishment and authentication practical without pre-shared secrets. This is the foundation of certificates, public key infrastructure, and many distributed identity systems.
Hashes, MACs, and digital signatures
A cryptographic hash function takes input of any size and produces a fixed-size digest. A good hash function is deterministic, fast to compute, hard to invert, and collision-resistant: it should be infeasible to deliberately find two different inputs that produce the same digest.
Hashes are useful for fingerprints and integrity checks, but a plain hash does not authenticate the sender. Anyone who changes the message can recompute the hash.
A message authentication code (MAC) adds a secret key. The sender computes a keyed digest over the message, and the receiver verifies it using the same shared secret. A MAC therefore provides both integrity and origin authentication, but only between parties that already share that secret key.
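The hash-versus-MAC distinction fits in a few lines of Python using only the standard library. A sketch (the shared key here is illustrative; a real deployment would use a randomly generated key):

```python
import hashlib
import hmac

message = b"transfer $1000 to account 42"

# A plain hash: an attacker who modifies the message can simply
# recompute it. It detects accidental corruption, not tampering.
digest = hashlib.sha256(message).hexdigest()

# A MAC mixes in a secret key. Only parties holding the key can
# produce or verify the tag, so it also authenticates the origin.
shared_key = b"example-shared-key"  # illustrative only
tag = hmac.new(shared_key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, msg: bytes, received_tag: str) -> bool:
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(expected, received_tag)

assert verify(shared_key, message, tag)
assert not verify(shared_key, b"transfer $9999 to account 42", tag)
```

Note the use of `hmac.compare_digest` rather than `==`: constant-time comparison prevents an attacker from guessing a valid tag byte by byte via timing differences.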
A digital signature provides similar assurances without requiring a shared secret between sender and receiver.
The sender first computes a hash of the message. The sender then uses its private key to produce a signature over that hash. The receiver recomputes the hash of the received message and uses the sender’s public key to verify the signature. If verification succeeds, two things follow:
- The message was signed by someone who possessed the private key.
- The message was not modified after it was signed, since even a small change would produce a different hash and cause verification to fail.
This is why digital signatures provide both origin authentication and integrity.
The distinction among these mechanisms is worth keeping clear:
| Mechanism | Detects modification | Authenticates sender | Requires shared secret |
|---|---|---|---|
| Hash | Yes, against accidental or unauthenticated change | No | No |
| MAC | Yes | Yes | Yes |
| Digital signature | Yes | Yes | No |
Digital signatures can also support stronger forms of accountability than MACs, because any verifier with the public key can check the signature. With good key management, identity binding, and audit records, they can support non-repudiation.
Freshness and replay resistance. Integrity checks tell you a message was not modified. They tell you nothing about whether it is a copy of an older message. An attacker who captures a valid signed “transfer $1000” request can replay it, and the signature check still passes because the original message was genuine. Distributed protocols combine integrity with freshness mechanisms. A nonce is a random value generated fresh for each session or exchange. The receiver checks that the nonce in an incoming message matches the one issued for the current session, and rejects any message that does not, so a replayed message from an earlier session fails even if its signature is valid. Timestamps, sequence numbers, and explicit expiration times work similarly. Replay attacks are easy to miss if you think only in terms of encryption and integrity.
TLS
Transport Layer Security, or TLS, is where these pieces come together into a protocol for securing communications over an untrusted network. It protects HTTPS traffic, gRPC connections, and service-to-service calls in most modern systems.
During the handshake, the server presents a certificate. The client validates the certificate against a trusted certificate authority and verifies that the server knows the corresponding private key. The two parties then use public-key mechanisms for authentication and key agreement, from which symmetric keys for the session are derived. All subsequent data is encrypted and integrity-protected using those symmetric keys.
The channel you get provides confidentiality, integrity, and server authentication. With appropriate configuration, TLS also authenticates the client.
TLS is a protocol that assembles public-key cryptography, symmetric encryption, hashes, signatures, and certificate validation into a secure channel.
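Python's standard `ssl` module exposes these pieces directly. A sketch of a client-side context with the checks that make TLS meaningful enabled (CA roots come from the system trust store):

```python
import ssl

# create_default_context loads the system's trusted CA roots and
# enables both halves of server authentication: the certificate
# must chain to a trusted CA, and its name must match the host
# we intended to reach.
ctx = ssl.create_default_context()

assert ctx.check_hostname is True
assert ctx.verify_mode == ssl.CERT_REQUIRED

# Wrapping a socket with this context runs the handshake:
# certificate validation, key agreement, and the switch to
# symmetric session keys all happen before application data flows.
#
# with socket.create_connection(("example.com", 443)) as sock:
#     with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
#         ...
```

Disabling either `check_hostname` or `verify_mode` (a common "fix" for certificate errors in development) silently removes server authentication while leaving encryption in place, which is precisely the confidentiality-without-authentication trap.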
Certificates and public key infrastructure
A public key alone is not enough. We also need a way to bind that key to an identity.
A certificate is a signed statement asserting that a particular public key belongs to a particular subject. A certificate authority, or CA, issues the signature. If you trust the CA and the certificate is valid, you can trust the binding between key and identity.
In browser-based HTTPS, the subject is usually a domain name. In distributed systems, the subject may be a service name, a workload identifier, or a namespace within a cluster.
Trust is organized as a chain. A service validating a certificate trusts a root CA, which may sign intermediate CAs, which sign the certificates issued to specific services, workloads, or users. This allows large public key infrastructure (PKI) deployments to be managed without every system needing to directly trust every other system.
Certificate expiration, revocation, and rotation are security design constraints, not just operational chores. If certificates cannot be replaced quickly and automatically, the system becomes brittle when a compromise occurs.
Mutual TLS
Standard TLS authenticates only the server. When a browser connects to a website, it validates the server’s certificate, but the server typically does not require the browser to present one in return.
Mutual TLS, or mTLS, authenticates both sides of the connection. Each party presents a certificate, verifies the other’s certificate against a trusted CA, and confirms that the peer holds the private key belonging to its certificate. Both sides come away with a cryptographically verified identity for the other, not just an IP address or an unverified claim in a request header.
In a microservice system, mTLS changes the trust model substantially. Consider the payments service receiving a connection. Without mTLS, it can see that the connection arrived from an IP address inside the cluster. With mTLS, it can verify that the caller is specifically the orders service, holding a certificate issued by the same CA that manages workload identities for this deployment. When the question is whether to execute a payment, that is a meaningful distinction.
mTLS turns “inside the cluster” from a passive trust assumption into a verifiable property of each individual connection.
The main operational challenge with mTLS is certificate management at scale. If issuance and rotation are manual or fragile, mTLS becomes a maintenance burden. Service meshes, covered later in these notes, exist largely to solve this problem.
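In code, the difference between TLS and mTLS on the server side is one setting: requiring a client certificate. A sketch with Python's `ssl` module (file paths are illustrative and left commented out):

```python
import ssl

# Server-side context that REQUIRES a client certificate,
# turning one-way TLS into mutual TLS.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid cert

# The server's own identity (paths illustrative):
# ctx.load_cert_chain(certfile="payments.crt", keyfile="payments.key")
#
# The CA that issues workload certificates; only clients whose
# certificates chain to this CA will pass the handshake:
# ctx.load_verify_locations(cafile="cluster-ca.crt")
```

After the handshake, the server can read the verified peer certificate (via `getpeercert()` on the wrapped socket) and use its subject as the caller's identity for authorization decisions.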
Authentication and authorization
Authentication asks: who are you?
Authorization asks: what are you allowed to do?
These are different questions, and a system can answer the first correctly while getting the second completely wrong.
Consider an API endpoint that returns order details. A request arrives with a valid token. Authentication succeeds. But if the service does not also check whether this authenticated caller is allowed to read this specific order for this specific customer, an attacker with any valid token can read any order by changing the order ID in the request. Authentication passed. Authorization failed.
This failure pattern is called broken object-level authorization, and it appears consistently in security audits of real systems. It is not limited to naive implementations. It occurs in systems built by experienced engineers because the authentication check is obvious and centralized, while the authorization check requires knowing the business context of each resource and operation being accessed.
In distributed systems, authorization must happen at multiple layers.
The edge checks whether a client is allowed to call this API at all.
A backend service checks whether the caller is allowed to invoke this specific operation.
A data service checks whether this caller can access this specific record or field.
Each of these is a separate decision. None of them can be fully delegated to the layer above. A valid token at the gateway does not mean every resource inside the system is accessible.
Authentication alone is never enough.
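The broken object-level authorization pattern can be made concrete. In the sketch below (all names and data are hypothetical), the line that is easy to forget is the ownership check: a valid identity is not the same as permission to read this particular record:

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    owner: str
    total: int

# Hypothetical order store.
ORDERS = {
    "o-1": Order("o-1", owner="alice", total=120),
    "o-2": Order("o-2", owner="bob", total=75),
}

def get_order(caller: str, order_id: str) -> Order:
    order = ORDERS.get(order_id)
    if order is None:
        raise KeyError("no such order")
    # Authentication established who `caller` is. This is the
    # separate, object-level authorization check: is THIS caller
    # allowed to read THIS order?
    if order.owner != caller:
        raise PermissionError("caller may not read this order")
    return order

assert get_order("alice", "o-1").total == 120
try:
    get_order("alice", "o-2")   # valid identity, wrong resource
    raise AssertionError("should have been rejected")
except PermissionError:
    pass
```

Without the ownership comparison, any authenticated caller could enumerate order IDs and read every record, exactly the failure described above.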
Tokens
A token is a credential issued by an authentication service after a successful login or identity verification. A caller authenticates once, receives a token, and presents that token on subsequent requests. This avoids sending a password on every request, which would require every receiving service to verify passwords and would expose the password repeatedly over the network.
The receiving service validates the token: it checks that the token was issued by a trusted authority, that it has not expired, and that it is intended for this service. If validation passes, the service treats the caller as authenticated.
Tokens have a limited lifetime by design. A password is a long-lived secret that the user remembers and reuses indefinitely. A token is issued fresh, used for a bounded period, and then expires. If a token is stolen, the attacker can use it only until it expires. If a password is stolen, the attacker can use it until the user changes it. Short token lifetimes are one of the most practical limits on the damage from a credential theft.
Tokens can also carry claims, which are structured fields describing the caller. A token might state who the caller is, which permissions they hold, when the token was issued, and when it expires. This allows the receiving service to make an authorization decision from the token itself, without a separate database lookup.
Modern distributed systems standardize token issuance and validation through OAuth and OpenID Connect. The next sections cover these protocols and the JWT format used to encode them.
OAuth 2.1
OAuth is an authorization framework, not an authentication protocol. It lets a client obtain limited, delegated access to a protected resource, either on behalf of a user or for machine-to-machine communication. The question it answers is “what access is this client presenting?” not “who is this user?” The output of an OAuth flow is an access token. The service being called validates the token and uses its claims or associated metadata to decide what access to grant.
OAuth 2.1 consolidates the original OAuth 2.0 specification with several years of accumulated security improvements. It is still a draft at the IETF, but its practices are already widely followed and it is the version to use in new systems.
The most common flow is the authorization code flow. A user wants to access a resource through a client application. The client redirects the user to the authorization server, where the user authenticates directly and approves the requested scope of access. The authorization server redirects back to the client with a short-lived authorization code. The client exchanges that code for an access token (and optionally a refresh token) by making a back-channel request to the authorization server. The client then presents the access token to the resource server on each subsequent request. In the normal authorization code flow, the user’s password is entered at the authorization server rather than the client application.
Access tokens are short-lived, typically minutes to hours. A refresh token is a longer-lived credential the client presents to the auth server when the access token expires, to obtain a new access token without requiring the user to log in again.
OpenID Connect
OpenID Connect, or OIDC, is an identity layer built on top of OAuth. It allows a client to verify the identity of an end user based on authentication performed by an identity provider. An identity provider, or IdP, is a dedicated service that authenticates users and issues credentials confirming who they are. Google, Microsoft Entra, Okta, and Auth0 are common examples.
OAuth and OIDC are often used together, and the two protocols do different things.
OAuth answers: what access is being delegated? It produces an access token that backend services validate.
OIDC answers: who is the authenticated user? It produces an ID token that describes the user’s identity.
A web application typically uses OIDC to handle user login, establishing who the user is, then uses OAuth to obtain access tokens for calling backend APIs on that user’s behalf.
Connecting OAuth, OIDC, and JWTs
These three things often appear together in the same system, but they are not interchangeable.
A JSON Web Token, or JWT, is a compact, self-contained format for packaging and signing claims, not an authorization framework or an identity protocol. It has three parts: a header containing metadata about the signing algorithm, a payload containing the claims, and a signature.
OAuth access tokens and OIDC ID tokens are frequently encoded as JWTs, but they do not have to be. A JWT by itself has no security meaning until you know the protocol context it comes from. “We use JWTs” is not a description of a security architecture.
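The three-part structure is easy to see by building a token by hand. The sketch below constructs and verifies an HMAC-signed (HS256-style) JWT with only the standard library; real systems should use a vetted JWT library, and the signing key here is illustrative:

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_jwt(claims: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_jwt(token: str, key: bytes) -> dict:
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(expected, sig):
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

key = b"illustrative-signing-key"
token = sign_jwt({"sub": "alice", "scope": "orders:read"}, key)
assert verify_jwt(token, key)["sub"] == "alice"

# Tampering with the payload invalidates the signature.
h, p, s = token.split(".")
forged = f"{h}.{b64url(json.dumps({'sub': 'mallory'}).encode())}.{s}"
try:
    verify_jwt(forged, key)
    raise AssertionError("forged token must not verify")
except ValueError:
    pass
```

Note that the header and payload are only encoded, not encrypted: anyone holding the token can read the claims. The signature protects integrity and origin, not confidentiality.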
Signed JWTs are convenient because a receiver can verify the signature locally without contacting the issuer. This reduces latency and removes a runtime dependency on the identity provider. The cost is that revocation becomes hard. If a token is valid for one hour and an account is suspended five minutes in, the token remains usable for fifty-five more minutes unless the receiving service performs additional revocation checks. The tradeoff is between local validation and centralized revocation control: local validation is fast and requires no runtime dependency on the issuer, but enforcing revocation globally requires coordination with the issuer.
Verifying a JWT signature tells you the token is authentic, but not whether it is still valid. There are a few common approaches to the revocation problem. The most widely used is short-lived access tokens combined with a longer-lived refresh token. When the access token expires, the client presents the refresh token to the auth server to obtain a new one. That renewal is validated server-side, so a revoked account is cut off at the next renewal. The damage window from a stolen access token is bounded by its lifetime.
For situations where no gap between revocation and enforcement is acceptable, token introspection is an option: the receiving service calls the auth server on every request to confirm the token is still valid. This eliminates the revocation window but reintroduces a runtime dependency on the auth server, the same round-trip cost that self-contained tokens were designed to avoid.
Workload identity
Users are not the only principals in a distributed system.
Every service, container, virtual machine, batch job, and serverless function also needs an identity. When the orders service calls the payments service, payments needs to know not just that a valid token was presented, but specifically which workload is calling. Authorization decisions depend on caller identity. The orders service may be permitted to initiate a payment. An arbitrary internal process should not be allowed to do that.
Services need a way to prove their identity to each other, but the usual approaches do not work.
- Usernames and passwords are designed for people.
- Credentials baked into config files or container images end up in version control, container registries, and log systems, and they tend to stay there long after they should have been rotated.
SPIFFE, which stands for Secure Production Identity Framework for Everyone, is an open standard that defines what a workload identity looks like. Each service receives a short-lived X.509 certificate (the standard format used by TLS) with a URI that encodes its identity, such as spiffe://cluster.local/ns/default/sa/orders-service. Because it is a standard certificate, it works directly with TLS and mTLS with no special protocol needed. The identity is cryptographically verifiable, short-lived, and automatically rotated.
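Because a SPIFFE ID is an ordinary URI, its parts can be pulled out with standard URI parsing. A sketch using the example ID from the text (the namespace/service-account path layout is the common Kubernetes convention):

```python
from urllib.parse import urlparse

spiffe_id = "spiffe://cluster.local/ns/default/sa/orders-service"
parsed = urlparse(spiffe_id)

# The scheme marks this as a SPIFFE ID, the authority names the
# trust domain, and the path identifies the workload within it.
assert parsed.scheme == "spiffe"
assert parsed.netloc == "cluster.local"              # trust domain
assert parsed.path == "/ns/default/sa/orders-service"

# In a Kubernetes-style layout the path encodes the namespace and
# service account, which authorization policies can match against.
_, ns_label, namespace, sa_label, service = parsed.path.split("/")
assert (namespace, service) == ("default", "orders-service")
```

An authorization policy can then be written against the trust domain and path ("allow callers whose ID is under /ns/default/sa/orders-service") rather than against IP addresses.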
SPIRE is the most widely deployed production-ready implementation of SPIFFE. It runs a central server and an agent on every node.
When a workload starts and requests a certificate, the agent needs to confirm the workload is who it claims to be. It does this using platform-level information about the running process: in a Kubernetes cluster, for example, it confirms which pod is making the request. Only after that check passes does the agent issue the certificate. The agent renews it automatically before it expires. The workload never manages certificate rotation itself.
Cloud IAM
Cloud identity and access management (IAM) extends these ideas to cloud resources.
A service running in a cloud environment may need to read from object storage, publish to a message queue, write to a database, or retrieve secrets from a secret manager. Each of those operations requires the service to have both an identity and a set of permissions in the cloud provider’s control plane.
Cloud IAM policies bind identities to permissions on specific resources. Over-privileged identities become major escalation paths when compromised, the same pattern that appears at every layer of distributed system security. A microservice granted broad read/write access to all storage buckets because that was the easiest configuration becomes a far larger incident when compromised than one restricted to the specific bucket it needs.
Workload identity federation is a pattern in which a workload presents its verified identity credential to the cloud provider, which issues a short-lived access token tied to specific cloud permissions. This is increasingly preferred over shipping long-lived service account key files with the container image.
Zero Trust
Zero Trust is an architectural principle, not a product.
The traditional perimeter model treated everything inside the corporate network or data center as trusted. Firewalls enforced the boundary by allowing only traffic to specific internal IP addresses and ports. Remote workers connected through VPNs, which extended the trusted network to their machines over an encrypted link and granted them the same implicit trust as a local connection. Anything inside the perimeter was presumed legitimate, and security effort concentrated at the boundary rather than within it.
That assumption breaks down when services are distributed across clouds, when employees work from unmanaged devices outside any VPN, and when the internal network may itself contain compromised machines.
The Zero Trust principle is that network location should not by itself imply trust. A request arriving from inside the cluster still needs to be authenticated, authorized against policy, and evaluated on its own merits.
For a distributed system, Zero Trust leads to a specific set of design decisions:
- Authenticate every service and workload, not just users at the edge.
- Encrypt internal traffic as well as external.
- Push authorization decisions as close to the resource as possible.
- Prefer short-lived credentials over static secrets.
- Apply least privilege at every layer.
- Assume that partial compromise will happen.
Zero Trust does not mean treating every request as hostile in a way that makes the system unusable. It means trust is earned by presenting verified credentials and satisfying policy, not by appearing to be in the right network location.
An important consequence of assuming partial compromise is designing the system so that a breach of one component does not automatically spread. Micro-segmentation is one of the primary tools for achieving that.
Micro-segmentation
Micro-segmentation means dividing a system into fine-grained trust domains and explicitly controlling which communications are permitted between them.
The older perimeter model treated the entire internal network as a single trusted zone. Once an attacker gained access to any machine inside, lateral movement was relatively straightforward because internal communication was largely unrestricted.
Micro-segmentation reduces the blast radius of a compromise. If the orders service is not permitted to communicate directly with the database tier, then a compromise of the orders service does not automatically give an attacker database access. The attacker must also compromise a service that is permitted to make that connection.
In a microservice environment, micro-segmentation typically means specific services are allowed to talk to specific other services, on specific ports or protocols, and only when the caller presents a verified workload identity. It is implemented through a combination of network-level access controls (firewall rules that restrict which services can connect to which others), service mesh authorization rules, and cloud IAM.
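At its core, a micro-segmentation policy is an explicit allowlist of permitted (caller, callee) pairs, with everything else denied by default. A sketch (service names are hypothetical):

```python
# Explicit allowlist: any pair not listed is denied by default.
ALLOWED_CALLS = {
    ("orders-service", "payments-service"),
    ("payments-service", "fraud-service"),
    ("reporting-service", "orders-db"),
}

def connection_permitted(caller: str, callee: str) -> bool:
    # `caller` must be a verified workload identity (for example,
    # taken from an mTLS client certificate), never a name the
    # caller merely asserts about itself.
    return (caller, callee) in ALLOWED_CALLS

assert connection_permitted("orders-service", "payments-service")
# A compromised orders service cannot reach the database directly:
assert not connection_permitted("orders-service", "orders-db")
```

Real enforcement lives in firewall rules, service mesh policies, and cloud IAM rather than application code, but the shape of the decision is the same: default deny, explicit allow, identity verified cryptographically.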
Service meshes and API gateways
These two components are often discussed together but solve different problems.
API gateways
An API gateway is the single entry point for all external traffic. It acts as a reverse proxy: clients connect to the gateway, not to individual backend services, which means internal service structure is never exposed directly to the outside world.
The gateway accepts each incoming request, processes it against defined policies, and routes it to the appropriate backend service. For some requests, it may call multiple backend services and combine their responses before returning a result to the client.
The gateway handles several functions that apply to all incoming traffic:
- TLS termination: The gateway often terminates incoming HTTPS connections. Internal services communicate over separate connections behind the gateway.
- Authentication and token validation: The gateway checks that each request carries a valid credential before passing it inward. Requests without valid tokens are rejected at the boundary.
- Rate limiting: The gateway tracks request volume per client and rejects requests that exceed defined limits, protecting backend services from being overwhelmed.
- Routing: The gateway maps incoming requests to the correct internal service based on the URL path, headers, or other request attributes.
The gateway is not sufficient for resource-level or business-level authorization. It makes coarse-grained decisions: reject unauthenticated requests, block a client exceeding its rate limit. It does not know the business meaning of each request. Whether a specific user is allowed to access a specific order is a question only the service handling that data can answer. Each internal service is responsible for its own authorization decisions, regardless of what the gateway accepted.
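Of the gateway functions above, rate limiting is the most self-contained to sketch. A minimal token-bucket limiter, one bucket per client (the structure and numbers are illustrative):

```python
import time

class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# A burst up to capacity is allowed; requests beyond it are rejected
# until the bucket refills.
bucket = TokenBucket(rate=5, capacity=2)
results = [bucket.allow() for _ in range(3)]
assert results[:2] == [True, True]   # burst of 2 allowed
assert results[2] is False           # third immediate request rejected
```

A real gateway keeps one bucket per client identity (API key, token subject, or source), which is why the gateway is the natural place for this check: it sees all inbound traffic.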
Service meshes
The API gateway handles traffic entering the system from outside. Inside the system, services call each other constantly: the orders service calls payments, payments calls a fraud detection service, and so on. Securing all of that internal traffic presents a different problem. Each service would need to implement mTLS, manage certificates, enforce authorization policies, and emit telemetry. With dozens or hundreds of services, that is an enormous amount of duplicated effort, and it is easy for individual services to get it wrong.
A service mesh solves this by moving those responsibilities out of the application code and into the infrastructure. It inserts a small proxy process, called a sidecar, alongside each service. In the classic sidecar model, inbound and outbound service traffic passes through the proxy rather than going directly to or from the service. The application code does not change. It still makes plain network connections. The proxy intercepts those connections and automatically handles mTLS, workload identity verification, authorization policy enforcement, and traffic telemetry.
From the service’s perspective, it is making a normal connection. From the network’s perspective, every connection is mutually authenticated and encrypted. Authorization policies are enforced consistently across the whole system without any service having to implement them individually.
Istio and Linkerd are the most widely deployed service meshes. Both provide mTLS, workload identity, and policy enforcement as infrastructure-level services.
The architectural distinction is straightforward:
API gateways handle north-south traffic: requests entering or leaving the system from external clients.
Service meshes handle east-west traffic: service-to-service calls inside the system.
Both are needed. Gateways protect the front door. Service meshes and local authorization checks protect what happens inside.
Secret management
Systems need API keys, database passwords, signing keys, TLS private keys, and other credentials. The challenge is not just storing them. It is also distributing them safely to services at runtime, rotating them, revoking them when compromised, auditing their use, and containing the blast radius when something leaks.
Common failure modes include secrets in source code or configuration files that end up in version control, credentials baked into container images, secrets passed through environment variables that appear in process listings or log output, and long-lived secrets that are never rotated.
A well-designed system minimizes long-lived static secrets. Where possible, it issues short-lived credentials to authenticated workloads at runtime. Dedicated secret management systems such as HashiCorp Vault, AWS Secrets Manager, Google Cloud Secret Manager, and Azure Key Vault provide controlled storage with access auditing and support for automatic rotation.
One mistake that appears with surprising frequency: base64 is an encoding, not encryption. A base64-encoded secret is fully readable by anyone who can see the encoded string.
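The point takes two lines to demonstrate (the password here is made up):

```python
import base64

# base64 obscures nothing: anyone who can see the encoded string can decode it.
encoded = base64.b64encode(b"db-password-hunter2").decode()
print(encoded)                    # looks scrambled to the eye...
print(base64.b64decode(encoded))  # ...but decodes instantly, no key required
```

If a value must be unreadable to an attacker who obtains it, it needs real encryption with a separately managed key, not an encoding.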
Key and certificate rotation. Any credential can leak. A secure system is designed so that replacing credentials is a routine, automated event rather than an emergency. If rotating a certificate requires manual edits on hundreds of machines, downtime, or redeployment of multiple services, the design is brittle. Short certificate lifetimes are preferable to long ones: a certificate that expires in 24 hours limits the damage from a stolen credential far more than one that expires in a year. Rotation should be tested as a normal operational procedure, not only rehearsed during a compromise.
Common design mistakes
Specific mistakes are easier to remember than abstract warnings. The most common:
- Assuming the internal network is trusted.
- Authenticating users but not services.
- Using long-lived shared secrets everywhere.
- Treating JWTs as a security architecture rather than a token format.
- Relying on the API gateway to provide all security and ignoring internal calls.
- Checking authentication but not authorization on individual objects or operations.
- Over-privileging service accounts and cloud IAM roles.
- Storing secrets in repositories, images, or environment variable dumps.
- Failing to plan for revocation, rotation, and replay resistance.
These are design failures, not just implementation bugs. They reflect trust assumptions that were wrong from the start.
A worked example
Consider an online store implemented as a set of microservices: a gateway, a front end, an orders service, a payments service, an inventory service, and a user-profile service.
A user logs in through an identity provider. The provider runs an OIDC flow and issues an ID token that confirms the user’s identity and an OAuth access token that authorizes API access.
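The access token typically travels as a JWT: three base64url segments (header, claims, signature) joined by dots. A minimal sketch of that structure using an HMAC signature; the key and claims are made up for illustration, and real validation also checks expiry, issuer, and audience:

```python
import base64, hashlib, hmac, json

def b64url(data: bytes) -> str:
    # JWT uses unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Hypothetical signing key and claims.
key = b"demo-signing-key"
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
claims = b64url(json.dumps({"sub": "user-123", "scope": "orders:read"}).encode())
signature = b64url(hmac.new(key, f"{header}.{claims}".encode(), hashlib.sha256).digest())
token = f"{header}.{claims}.{signature}"

def verify(token: str, key: bytes) -> bool:
    # Recompute the signature over header.claims; compare in constant time.
    h, c, s = token.split(".")
    expected = b64url(hmac.new(key, f"{h}.{c}".encode(), hashlib.sha256).digest())
    return hmac.compare_digest(s, expected)
```

Anyone can decode the claims; only a holder of the key (or, with asymmetric algorithms, the private key) can produce a signature that verifies. That is why a JWT is a token format, not a security architecture by itself.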
The API gateway receives the incoming request, validates the access token, applies rate limits and coarse-grained policy, and routes the request inward.
The frontend calls the orders service. That call travels over mTLS. The orders service receives it with a cryptographically verified workload identity for the caller, not just a network address. The orders service performs its own authorization check: is this the frontend service, and is this caller allowed to read this specific order for this specific user?
If the orders service needs to call the payments service, that call is also over mTLS. The payments service verifies the caller’s workload identity and applies its own local authorization policy before doing anything with the request.
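The payments service's check might combine the caller identity verified by mTLS with a local policy. The SPIFFE IDs, operations, and allow-list below are illustrative:

```python
# Callers permitted to invoke each operation, keyed by SPIFFE workload
# identity. Identities and operations are illustrative.
ALLOWED_CALLERS = {
    "charge": {"spiffe://shop.example/ns/prod/sa/orders"},
    "refund": {"spiffe://shop.example/ns/prod/sa/orders",
               "spiffe://shop.example/ns/prod/sa/support"},
}

def authorize(caller_spiffe_id: str, operation: str) -> bool:
    # The mTLS layer has already verified caller_spiffe_id cryptographically;
    # this decides whether that verified caller may perform this operation.
    return caller_spiffe_id in ALLOWED_CALLERS.get(operation, set())
```

The identity comes from the certificate presented during the handshake, not from a header the caller could forge; the decision itself stays local to the payments service.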
Database credentials are not baked into the container image. They are retrieved at runtime from a secrets manager, or issued as short-lived credentials through workload identity federation.
All service certificates are short-lived and rotated automatically by the service mesh.
Read from the outside in, the full stack looks like this:
- OIDC authenticates the user.
- OAuth access tokens convey delegated authorization for API calls.
- JWTs carry these tokens in a compact, verifiable format.
- TLS and mTLS protect all communication channels.
- Workload identity (SPIFFE) authenticates services to each other.
- The API gateway secures north-south traffic.
- The service mesh secures east-west traffic.
- Local authorization checks protect individual resources and operations.
- Secret management and certificate rotation ensure that a credential compromise is a recoverable event rather than a permanent one.