Zero trust is not a product. It's a network architecture decision.

Every network security vendor has rebranded their product as "zero trust" over the past five years. Firewalls are now "zero trust firewalls." VPNs are "zero trust network access." The marketing has obscured what zero trust actually means: an architectural principle that network location is not a trust signal. A request coming from inside your private network deserves exactly the same scrutiny as one coming from the public internet. This post covers what that means in practice for SaaS infrastructure.

The perimeter model and why it fails

Traditional network security assumes a hard boundary: outside the perimeter is untrusted, inside is trusted. Services inside the network talk to each other freely. The model worked when compute was on-premises and your employees were in one building.

It fails for several reasons:

  • Cloud infrastructure erases network boundaries
  • Employees work from anywhere
  • Internal services can be compromised individually
  • Lateral movement after a perimeter breach is trivially easy when internal traffic is unrestricted

The 2020 SolarWinds breach illustrated this failure mode: attackers who gained a foothold through a trusted internal tool (the Orion monitoring platform) could move laterally to extremely sensitive systems because internal trust was implicit.

Never trust, always verify

The zero trust principle: every request, from any source, must authenticate and be authorized independently. Network adjacency is not authentication. Being on the same VPC subnet is not trust.

In practice, this means:

  • Every service-to-service call carries an authentication credential
  • Every service verifies that credential before processing the request
  • Credentials are short-lived and automatically rotated
  • Authorization is checked per-request, not per-connection
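As a minimal illustration of the last two points, here is a sketch of a per-request credential check. The claim names follow JWT conventions, but the `ServiceCredential` shape and `verifyCredential` helper are hypothetical, not from any specific library:

```typescript
// Hypothetical per-request check of a short-lived service credential.
// Claim names (sub, aud, exp) follow JWT conventions.
interface ServiceCredential {
  sub: string;   // caller identity, e.g. 'api-gateway'
  aud: string;   // intended recipient, e.g. 'auth-service'
  exp: number;   // expiry, seconds since epoch
}

function verifyCredential(
  cred: ServiceCredential,
  expectedAudience: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  // Reject expired credentials -- short lifetimes shrink the replay window
  if (cred.exp <= nowSeconds) return false;
  // Reject credentials minted for a different service
  if (cred.aud !== expectedAudience) return false;
  return true;
}
```

Signature verification is omitted here; a real service would cryptographically validate the credential (e.g. with a JWT library) before trusting any of its claims.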

mTLS for service-to-service authentication

Mutual TLS (mTLS) extends standard TLS by requiring both parties to present certificates. In a microservices environment, each service has its own certificate that identifies it. When service A calls service B, B verifies A's certificate — and A verifies B's. Both sides are authenticated.

// Node.js mTLS server setup (assumes an Express app)
import https from 'https';
import fs from 'fs';
import { TLSSocket } from 'tls';
import express from 'express';

const app = express();

const server = https.createServer({
  // Server's own certificate
  cert: fs.readFileSync('/etc/certs/auth-service.crt'),
  key:  fs.readFileSync('/etc/certs/auth-service.key'),

  // Trust only certs signed by your internal CA
  ca: fs.readFileSync('/etc/certs/internal-ca.crt'),

  // Require and verify a client certificate on every connection
  requestCert: true,
  rejectUnauthorized: true,
}, app);

// In middleware: extract the verified client identity from the cert
app.use((req, res, next) => {
  const cert = (req.socket as TLSSocket).getPeerCertificate();
  if (!cert || !cert.subject) {
    return res.status(401).json({ error: 'client_cert_required' });
  }
  // e.g. cert.subject.CN = 'spiffe://cluster.local/ns/default/sa/api-gateway'
  // (SPIFFE deployments usually carry the ID in the URI SAN rather than the CN)
  (req as any).callerIdentity = cert.subject.CN;
  next();
});

// mTLS client: present your own certificate when calling other services.
// undici's Agent passes TLS options through to the underlying connection.
import fs from 'fs';
import { Agent, fetch } from 'undici';

const dispatcher = new Agent({
  connect: {
    cert: fs.readFileSync('/etc/certs/api-gateway.crt'),
    key:  fs.readFileSync('/etc/certs/api-gateway.key'),
    ca:   fs.readFileSync('/etc/certs/internal-ca.crt'),
  },
});

const response = await fetch('https://auth-service.internal/validate', {
  dispatcher,
  method: 'POST',
  body: JSON.stringify({ token }),
});

Short-lived certs vs long-lived certs

Certificate rotation is the operational nightmare of mTLS. With traditional PKI, certs last 1–2 years. If one is compromised, you're exposed until it expires or you manually revoke it — and certificate revocation (CRL/OCSP) has its own reliability problems.

Short-lived certificates are the better model for zero trust infrastructure. Issue certs whose validity is measured in hours rather than years and rotate them automatically. A compromised cert becomes worthless within hours, and you don't need revocation infrastructure if certs expire before you could realistically react anyway.

# Using SPIRE to issue a short-lived SVID (SPIFFE Verifiable Identity Document)
# The workload API is available at a Unix socket on every workload node

# Fetch an X.509 SVID valid for 1 hour
spire-agent api fetch x509 \
  -socketPath /tmp/spire-agent/public/api.sock \
  -write /tmp/certs/

# Files written: svid.0.pem (cert), svid.0.key (private key), bundle.0.pem (CA bundle)
# The agent rotates certificates automatically before expiry (by default,
# around the midpoint of the SVID's lifetime)
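A client consuming rotated certs needs to know when to reload them. The half-life rotation point can be computed directly; the function name here is ours, not part of SPIRE:

```typescript
// Compute when a cert should be proactively rotated: at the midpoint of its
// validity window, mirroring SPIRE's half-life rotation behavior.
function rotationDeadline(notBefore: Date, notAfter: Date): Date {
  const halfLifeMs = (notAfter.getTime() - notBefore.getTime()) / 2;
  return new Date(notBefore.getTime() + halfLifeMs);
}

// A 1-hour cert issued at 12:00 should rotate at 12:30
const deadline = rotationDeadline(
  new Date('2024-01-01T12:00:00Z'),
  new Date('2024-01-01T13:00:00Z'),
);
```

In practice you would pair this with a file watcher on the SVID directory so the HTTPS agent is rebuilt whenever the agent writes fresh certs.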

The BeyondCorp model

Google's BeyondCorp (2014) is the canonical zero trust implementation for user access. The key insight: stop using VPNs. Instead, make every internal service accessible on the public internet, and use an identity-aware proxy to authenticate every request with a combination of device posture and user identity.

The BeyondCorp components translate to modern tooling:

  • Device inventory → MDM platform (Jamf, Intune) with continuous compliance checking
  • Identity-aware proxy → Cloudflare Access, Google IAP, or a self-hosted equivalent
  • Access policies → "user is in engineering group AND device is managed AND device OS is patched"
  • Trust scoring → dynamic, not static — a device that starts failing compliance checks loses access mid-session
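The access-policy and trust-scoring bullets above can be sketched as a pure decision function, evaluated on every request rather than once at session start. The field and group names are illustrative:

```typescript
// Hypothetical user/device posture, re-evaluated per request -- a device
// that falls out of compliance loses access mid-session.
interface AccessContext {
  userGroups: string[];
  deviceManaged: boolean;  // enrolled in MDM
  osPatched: boolean;      // passes current compliance check
}

function allowEngineeringAccess(ctx: AccessContext): boolean {
  return (
    ctx.userGroups.includes('engineering') &&
    ctx.deviceManaged &&
    ctx.osPatched
  );
}
```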

SPIFFE and SPIRE for workload identity

SPIFFE (Secure Production Identity Framework for Everyone) is an open standard for workload identity in dynamic infrastructure. Every workload gets a SPIFFE ID: a URI like spiffe://cluster.local/ns/production/sa/auth-service. SPIRE is the reference implementation that issues SPIFFE identities.

In a Kubernetes cluster, SPIRE's Kubernetes workload attestor queries the kubelet to verify that a workload is who it claims to be based on its pod's namespace and service account — not just what IP address it's on.

# SPIRE registration entry: register the auth-service workload
spire-server entry create \
  -spiffeID spiffe://cluster.local/ns/production/sa/auth-service \
  -parentID spiffe://cluster.local/spire/agent/k8s_sat/production/NODE_UID \
  -selector k8s:ns:production \
  -selector k8s:sa:auth-service \
  -ttl 3600

With SPIRE, the mTLS certificate the auth-service presents carries a SPIFFE ID in the Subject Alternative Name. Any receiver that validates the cert knows with cryptographic confidence that they're talking to the auth-service in the production namespace — not just something that managed to reach the internal network.
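In Node, `getPeerCertificate()` exposes the Subject Alternative Names as a single `subjectaltname` string. A small parser for pulling the SPIFFE ID out of it (this helper is ours, not part of any SPIFFE library):

```typescript
// Extract a SPIFFE ID from Node's subjectaltname string, which looks like
// 'URI:spiffe://cluster.local/ns/production/sa/auth-service', possibly
// alongside other comma-separated entries such as DNS names.
function spiffeIdFromSan(subjectAltName: string): string | null {
  for (const entry of subjectAltName.split(', ')) {
    if (entry.startsWith('URI:spiffe://')) {
      return entry.slice('URI:'.length);
    }
  }
  return null;
}
```

A receiving service would call this on the peer certificate and compare the result against its allow-list before processing the request.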

Authorization at every hop

Zero trust authentication (proving who you are) is only half the picture. You also need authorization at every service boundary. Just because the API gateway is allowed to call the auth service doesn't mean every API gateway request should be able to hit every auth service endpoint.

Use a policy engine like Open Policy Agent (OPA) for fine-grained service-to-service authorization:

# OPA policy: which services can call which endpoints
package authz.service

default allow = false

allow {
  caller_is_authorized
}

caller_is_authorized {
  # The caller's SPIFFE ID, extracted from the validated mTLS cert
  caller_id := input.caller_spiffe_id
  allowed_callers[input.resource][input.action][caller_id]
}

allowed_callers := {
  "/internal/validate-token": {
    "POST": {
      "spiffe://cluster.local/ns/production/sa/api-gateway": true,
      "spiffe://cluster.local/ns/production/sa/webhook-service": true
    }
  }
}
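A service enforcing this policy typically POSTs an input document to OPA's Data API and acts on the boolean result. A sketch, assuming an OPA sidecar listening on its default port (the URL and deployment shape are assumptions):

```typescript
// Build the input document the Rego policy expects.
function buildOpaInput(callerSpiffeId: string, resource: string, action: string) {
  return { input: { caller_spiffe_id: callerSpiffeId, resource, action } };
}

// Query a local OPA sidecar via the Data API; deny on any failure (fail closed).
async function isAuthorized(
  callerSpiffeId: string,
  resource: string,
  action: string,
): Promise<boolean> {
  try {
    const res = await fetch('http://127.0.0.1:8181/v1/data/authz/service/allow', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(buildOpaInput(callerSpiffeId, resource, action)),
    });
    const data = (await res.json()) as { result?: boolean };
    return data.result === true;
  } catch {
    return false; // treat OPA being unreachable as a deny
  }
}
```

Failing closed is the important design choice: if the policy engine is down, requests are denied rather than silently trusted.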

Zero trust is a journey, not a switch. Start with mTLS on your most sensitive internal service boundaries and expand from there. Trying to implement full zero trust across your entire infrastructure in one pass will stall; pick the highest-value boundaries first.