Token introspection (RFC 7662): when JWTs aren't enough

JWTs have a well-known and well-documented limitation: once issued, a valid JWT remains valid until its expiry time. You cannot revoke a JWT without maintaining a denylist — and at that point you're doing a database lookup on every request anyway, which eliminates the main performance argument for JWTs. Token introspection, defined in RFC 7662, offers an explicit, standards-based alternative: issue an opaque token (just a random string), and let resource servers call a protected endpoint on the authorization server to validate it in real time.

Understanding when to choose each approach requires understanding what each model is actually trading away.

The JWT revocation problem in concrete terms

A user's account is compromised. You lock their account in your database. Your API receives a request bearing their access token. The token was issued 14 minutes ago with a 15-minute TTL, so for another minute it remains cryptographically valid and your middleware will accept it — unless you've implemented a revocation check.

The standard workarounds for JWT revocation each have costs:

  • Short TTL (1-5 minutes): Limits the revocation window but forces more frequent refresh token exchanges and increases load on the token endpoint.
  • Token denylist in Redis: Effective but requires a cache lookup on every request. You've added the network hop that JWTs were supposed to avoid.
  • Token version in user record: Cheap check — compare token.version == user.token_version — but requires fetching the user record, again defeating stateless validation.
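The token-version workaround, for example, fits in a few lines. A minimal sketch — the names (token_version, TokenRevokedError) are illustrative, not from any particular library:

```python
# Hypothetical sketch of the token-version workaround: the user record
# carries a counter, and every issued JWT embeds the counter's value
# at issuance time.
class TokenRevokedError(Exception):
    pass

def check_token_version(claims: dict, user: dict) -> None:
    # A JWT issued before the last revocation carries a stale version.
    if claims.get("token_version") != user.get("token_version"):
        raise TokenRevokedError("token issued before last revocation")

def revoke_all_tokens(user: dict) -> None:
    # One increment invalidates every outstanding JWT for this user --
    # at the cost of fetching the user record on every request.
    user["token_version"] = user.get("token_version", 0) + 1
```

The check itself is cheap; the cost is that you must load the user record to get the current version, which is exactly the stateful lookup JWTs were meant to avoid.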

Opaque tokens with introspection sidestep this entirely. The authorization server is the only source of truth, and resource servers ask it directly.

How introspection works (RFC 7662)

The introspection endpoint is a protected resource that accepts a token and returns a JSON object describing its state. The critical field is active: a boolean that indicates whether the token is currently valid.

# Resource server calls the introspection endpoint
curl -X POST https://auth.example.com/oauth/introspect \
  -H "Authorization: Basic $(echo -n 'rs_client_id:rs_client_secret' | base64)" \
  -d "token=fFAGRNJru1FTz70BzhT3Zg&token_type_hint=access_token"

# Response:
{
  "active": true,
  "scope": "read:documents write:documents",
  "client_id": "mobile_app_v2",
  "username": "alice@example.com",
  "sub": "user_01J3KXYZ",
  "aud": "https://api.example.com",
  "iss": "https://auth.example.com",
  "exp": 1722470400,
  "iat": 1722466800,
  "jti": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

If the token has been revoked, or never existed, the response is simply {"active": false}. No error code, no detail — this prevents information leakage about token validity to unauthorized callers.

The introspection endpoint must be protected. A compromised client that can introspect arbitrary tokens can check whether any string is a valid token, which is an enumeration risk. Authenticate callers with Basic auth (client_id + client_secret) or mutual TLS. Only resource servers should have introspection credentials — not end-user clients.
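On the authorization-server side, the endpoint logic reduces to authenticating the caller and then looking the token up. A minimal in-memory sketch — TOKEN_STORE, RS_CREDENTIALS, and handle_introspect are hypothetical names; a real server would sit behind a web framework and a database:

```python
import secrets
import time

# Hypothetical in-memory stores standing in for the auth server's database.
TOKEN_STORE: dict[str, dict] = {}                  # token value -> claims
RS_CREDENTIALS = {"rs_client_id": "rs_client_secret"}

def handle_introspect(client_id: str, client_secret: str, token: str) -> dict:
    # Authenticate the caller first: only registered resource servers
    # may introspect (RFC 7662 section 2.1).
    expected = RS_CREDENTIALS.get(client_id)
    if expected is None or not secrets.compare_digest(expected, client_secret):
        raise PermissionError("unauthorized introspection caller")

    claims = TOKEN_STORE.get(token)
    # Unknown, revoked, and expired tokens all get the same minimal
    # response -- no detail leaks about why the token is inactive.
    if claims is None or claims.get("exp", 0) <= time.time():
        return {"active": False}
    return {"active": True, **claims}
```

Note that the constant-time comparison and the uniform {"active": false} response are both part of the endpoint's security posture, not optional polish.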

Implementing introspection in a resource server

import hashlib
import time
from typing import Optional

import httpx

INTROSPECT_URL = "https://auth.example.com/oauth/introspect"
INTROSPECT_CREDS = ("rs_client_id", "rs_client_secret")

# Simple in-process cache: token_hash -> (result, expires_at)
_cache: dict[str, tuple[dict, float]] = {}
CACHE_TTL = 30  # seconds — keep short for timely revocation propagation

def introspect_token(raw_token: str) -> Optional[dict]:
    token_hash = hashlib.sha256(raw_token.encode()).hexdigest()
    now = time.time()

    # Check cache
    if token_hash in _cache:
        result, expires_at = _cache[token_hash]
        if now < expires_at:
            return result if result.get("active") else None

    # Call introspection endpoint
    try:
        resp = httpx.post(
            INTROSPECT_URL,
            data={"token": raw_token, "token_type_hint": "access_token"},
            auth=INTROSPECT_CREDS,
            timeout=2.0,
        )
        resp.raise_for_status()
        result = resp.json()
    except (httpx.HTTPError, ValueError):
        # Fail open vs fail closed: your threat model decides.
        # For most APIs, fail closed (return None) is correct.
        return None

    # Cache positive results only. Never cache inactive tokens —
    # you want revocation to propagate immediately.
    if result.get("active"):
        # Don't cache beyond the token's own expiry
        token_exp = result.get("exp", now + CACHE_TTL)
        cache_until = min(now + CACHE_TTL, token_exp)
        _cache[token_hash] = (result, cache_until)

    return result if result.get("active") else None


class AuthError(Exception):
    """Carries an HTTP status code and message for auth failures."""
    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status


def require_token(request) -> dict:
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        raise AuthError(401, "Missing bearer token")

    token = auth_header[7:]
    claims = introspect_token(token)
    if not claims:
        raise AuthError(401, "Token invalid or revoked")

    return claims

Never cache an active: false response. If you cache it and the token is later re-issued with the same value (extremely unlikely with a proper random generator, but defensive coding matters), you'll deny a legitimate request. Only cache positive results, and only for short windows.

Caching introspection responses

The main criticism of opaque tokens is latency: every API request triggers a network call to the authorization server. Caching is the standard mitigation, but the cache TTL is a security parameter, not just a performance parameter.

A 30-second cache TTL means revoked tokens remain usable for up to 30 seconds after revocation. A 5-minute TTL extends that window. For most applications, 30 seconds is an acceptable tradeoff. For high-security contexts (financial, medical), use a shorter TTL or no cache at all and provision your auth server to handle the full request volume.

For production deployments, use Redis as a shared cache across your API fleet rather than per-process dictionaries:

import hashlib
import json
import time
from typing import Optional

import redis

r = redis.Redis(host="redis", port=6379, decode_responses=True)

def introspect_with_redis_cache(raw_token: str) -> Optional[dict]:
    cache_key = f"introspect:{hashlib.sha256(raw_token.encode()).hexdigest()}"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # call_introspection_endpoint wraps the httpx introspection POST shown earlier.
    result = call_introspection_endpoint(raw_token)
    if result and result.get("active"):
        ttl = min(30, result.get("exp", 0) - int(time.time()))
        if ttl > 0:
            r.setex(cache_key, ttl, json.dumps(result))

    return result if result and result.get("active") else None

Revocation propagation: the real advantage

With JWTs and a Redis denylist, revocation means writing the jti to the denylist and then waiting for every service that checks it to pick up the entry within its own cache window. With introspection, revocation is one operation: mark the token inactive in the authorization server. Resource servers with a 30-second cache will automatically stop accepting it within 30 seconds, with no coordination required and no denylist to synchronize across regions.

This simplicity compounds at scale. A 10-service microservice architecture using JWT denylists means 10 separate Redis lookups and 10 separate cache invalidation concerns. With introspection and a shared Redis cache keyed by token hash, you have one cache that all services read, and a single source of truth at the auth server. Revoke once, propagate automatically.
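As a sketch of what that single operation looks like at the authorization server — ACTIVE_TOKENS and the function names are hypothetical stand-ins for the server's token table:

```python
import time

# Hypothetical sketch of single-write revocation at the authorization
# server. ACTIVE_TOKENS stands in for the server's token table.
ACTIVE_TOKENS: dict[str, dict] = {}

def revoke(token: str) -> None:
    # One write at the source of truth. No denylist fan-out, no
    # cross-region synchronization.
    if token in ACTIVE_TOKENS:
        ACTIVE_TOKENS[token]["revoked_at"] = time.time()

def is_active(token: str) -> bool:
    # What the introspection endpoint consults on each call. Every
    # resource server converges on this answer within its cache TTL.
    claims = ACTIVE_TOKENS.get(token)
    return claims is not None and "revoked_at" not in claims
```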

When JWTs are still the right choice

Opaque tokens with introspection are not universally superior. JWTs win when:

  • Third-party resource servers need to validate tokens without network access to your auth server. A vendor API that accepts your tokens cannot call your introspection endpoint — it needs a self-contained JWT it can verify with your public key.
  • High-throughput APIs where even a Redis lookup is too slow. Stateless JWT validation with a short TTL and acceptable revocation lag is a legitimate architectural choice.
  • Edge validation at a CDN or API gateway where a network call to a central auth server is not feasible.

A common pattern is to use opaque tokens externally (what clients receive and send) and have the authorization server issue JWTs internally for service-to-service calls. The introspection endpoint becomes a gateway that exchanges an opaque external token for a short-lived internal JWT, validated locally by downstream services. This gives you external revocability with internal performance.