A freshly issued access token from a naive implementation can easily reach 2KB. That token is sent as a header on every API request, so over a session with hundreds of requests you retransmit hundreds of kilobytes of redundant authorization data. Worse, HTTP servers and proxies enforce header size limits (Apache's LimitRequestFieldSize defaults to 8190 bytes per header field, for example), and hitting a limit manifests as cryptic 400 or 431 errors that are hard to diagnose. Token size is worth optimizing deliberately.
Where the bytes come from
A typical bloated token payload might look like this:
{
  "sub": "usr_01H9XM3K5V8N2Q4P7RWTJ6YACB",
  "iss": "https://auth.example.com",
  "aud": ["https://api.example.com", "https://admin.example.com",
          "https://billing.example.com", "https://analytics.example.com"],
  "iat": 1698012345,
  "exp": 1698015945,
  "jti": "tok_01H9XN7F2M3P4Q5R6SWTJ7YACB",
  "email": "alice@example.com",
  "email_verified": true,
  "name": "Alice Nguyen",
  "picture": "https://lh3.googleusercontent.com/a/AAcHTtd...[100 chars]",
  "organization_id": "org_01H9XM3K5V8N2Q4P7RWTJ6Y",
  "organization_name": "Acme Corp Engineering Team",
  "organization_slug": "acme-corp",
  "roles": ["admin", "billing_manager", "developer", "support"],
  "permissions": [
    "projects:read", "projects:write", "projects:delete",
    "members:read", "members:write", "members:invite",
    "billing:read", "billing:write",
    "settings:read", "settings:write",
    "audit_logs:read", "api_keys:read", "api_keys:write"
  ],
  "subscription_plan": "enterprise",
  "subscription_status": "active",
  "feature_flags": ["new_dashboard", "beta_api_v3", "advanced_analytics"]
}
The profile fields (name, picture, email) alone can add 300 bytes. The permissions array with descriptive strings adds another 300-400 bytes. The audience array grows with every service you add. Base64url encoding inflates everything by roughly 33%. Sign with RS256 under a 2048-bit key and the 256-byte signature adds about 342 more characters once encoded. The result is a token that re-encodes the same authorization decision on every single request, even though that decision rarely changes.
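The arithmetic above can be sketched as a small back-of-envelope estimator. These are hypothetical helpers for illustration, assuming RS256 with a 2048-bit key:

```typescript
// Unpadded base64url: 4 output characters per 3 input bytes, rounded up
function base64urlLen(rawBytes: number): number {
  return Math.ceil((4 * rawBytes) / 3);
}

// Rough on-the-wire size of an RS256-signed JWT for a given payload size.
// Assumes a 2048-bit RSA key (256-byte raw signature).
function estimateJwtSize(payloadJsonBytes: number): number {
  const header = base64urlLen(27);     // {"alg":"RS256","typ":"JWT"} is 27 bytes
  const payload = base64urlLen(payloadJsonBytes);
  const signature = base64urlLen(256); // 256-byte signature -> ~342 characters
  return header + payload + signature + 2; // plus two '.' separators
}
```

A 1KB JSON payload comes out around 1.7KB on the wire before any HTTP framing overhead.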
Short claim names
The registered claims defined in RFC 7519 and tracked in the IANA JWT Claims Registry already have short names: sub, iss, aud, iat, exp, jti, nbf. The problem is custom claims, which teams often name verbosely to be self-documenting.
Compare the byte cost of equivalent payloads:
// Verbose: ~82 bytes serialized (minified)
{
  "organization_id": "org_01H9XM3K5V8N2Q4P7RWTJ6Y",
  "subscription_plan": "enterprise"
}
// Abbreviated: ~57 bytes, roughly a 30% reduction
{
  "org": "org_01H9XM3K5V8N2Q4P7RWTJ6Y",
  "plan": "enterprise"
}
Record the abbreviation mapping in your token documentation so the issuer and every consumer stay in sync. The token is a machine-to-machine format; it does not need to be human-readable. A decoding library reads org as readily as organization_id.
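One way to keep that mapping from drifting is to make it code rather than prose. A minimal sketch, with hypothetical claim names:

```typescript
// Shared map from verbose internal names to the short names actually
// emitted in tokens; both issuer and consumers import the same map.
const CLAIM_NAMES: Record<string, string> = {
  organization_id: "org",
  subscription_plan: "plan",
  feature_flags: "ff",
};

// Shrink a payload object before signing; unknown keys pass through.
function abbreviate(payload: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(payload)) {
    out[CLAIM_NAMES[key] ?? key] = value;
  }
  return out;
}
```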
Reference tokens vs inline permissions
The most significant size reduction comes from moving from self-contained tokens to reference tokens for permission data. Instead of embedding the full permissions list in the token, the token contains only an opaque reference ID. The resource server calls a token introspection endpoint to fetch the full permission set.
// Self-contained token payload: grows without bound as permissions accumulate
{
  "sub": "usr_01H9XM",
  "org": "org_01H9YN",
  "permissions": ["projects:read", "projects:write", ...13 more]
}
// Token size: ~800 bytes

// Reference token payload: fixed size regardless of permission count
{
  "sub": "usr_01H9XM",
  "jti": "tok_01H9ZP"
}
// Token size: ~280 bytes
// Resource server introspects on the first request, then caches by jti
async function getPermissions(jti: string): Promise<string[]> {
  const cached = await redis.get(`perm:${jti}`);
  if (cached) return JSON.parse(cached);

  // Note: RFC 7662 requires the introspection endpoint to authenticate
  // its callers; add client credentials here in a real deployment
  const response = await fetch('https://auth.example.com/oauth/introspect', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ token: jti }),
  });
  if (!response.ok) throw new Error(`introspection failed: ${response.status}`);

  const data = await response.json();
  // Cache for the remaining token lifetime
  const ttl = data.exp - Math.floor(Date.now() / 1000);
  await redis.setex(`perm:${jti}`, ttl, JSON.stringify(data.permissions));
  return data.permissions;
}
The introspection call adds latency on the first request, but with Redis caching keyed to the token's jti, subsequent requests within the same token's lifetime pay only the cache lookup cost (~0.5ms). The tradeoff is that your resource servers now have a runtime dependency on the introspection endpoint.
Audience restrictions
Broad audience arrays are a common source of bloat. Each additional audience in the aud claim adds its full URL length. More importantly, a token accepted by four different services is a larger blast radius if the token is stolen — it works everywhere.
Issue audience-restricted tokens: each token is valid for exactly one API. The authorization server issues different tokens for different resource servers, each with a targeted audience. Clients that need to call multiple APIs request multiple tokens. This is the intended model under OAuth 2.0's resource indicators extension (RFC 8707).
// Request a token scoped to a specific resource
const tokenResponse = await fetch('https://auth.example.com/oauth/token', {
  method: 'POST',
  body: new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: process.env.CLIENT_ID!,
    client_secret: process.env.CLIENT_SECRET!,
    // RFC 8707 resource indicator — token is only valid for this API
    resource: 'https://api.example.com',
    scope: 'projects:read projects:write',
  }),
});
// The issued token has aud: ["https://api.example.com"] only
// Attempting to use it at https://admin.example.com will fail aud validation
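The enforcement half lives on the resource server, which must reject any token whose aud claim does not name it. A minimal sketch; SELF is an assumed configuration value:

```typescript
const SELF = "https://api.example.com";

// RFC 7519 allows aud to be a single string or an array of strings,
// so normalize before checking membership.
function checkAudience(payload: { aud?: string | string[] }): boolean {
  const audiences = Array.isArray(payload.aud)
    ? payload.aud
    : payload.aud ? [payload.aud] : [];
  return audiences.includes(SELF);
}
```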
Stripping profile data from access tokens
Access tokens are for authorizing API calls, not for conveying user profile information. Fields like name, email, picture, and email_verified belong in the ID token (consumed by the frontend) or fetched from a /userinfo endpoint. They have no business in an access token that APIs use for authorization decisions.
Remove all profile claims from your access token template. An API endpoint that needs the user's email should query your user service using the sub claim as the lookup key, not read it from the token.
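The lookup-by-sub pattern might look like the following sketch. The user-service URL and response shape are assumptions, and the fetch function is injected so the helper can be exercised without a network:

```typescript
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<any>;
}>;

// Resolve the caller's email from the user service using the token's
// sub claim as the key, instead of reading email off the access token.
async function getEmailForCaller(
  sub: string,
  fetchFn: FetchLike = fetch as unknown as FetchLike,
): Promise<string> {
  const response = await fetchFn(`https://users.internal.example.com/users/${sub}`);
  if (!response.ok) throw new Error(`user lookup failed: ${response.status}`);
  const user = await response.json();
  return user.email;
}
```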
Compression tradeoffs
DEFLATE compression applied to the JWT payload before base64url encoding can reduce a 1KB payload to 400-500 bytes. Note, however, that the zip header parameter is defined only for encrypted tokens (JWE, RFC 7516); signed JWTs have no standard compression mechanism, so compressing a JWS payload requires a custom scheme. Compressing attacker-influenced data also opens the door to CRIME/BREACH-style compression-oracle attacks in certain contexts, and most JWT libraries do not implement compression at all. The additional complexity (custom serialization, library support gaps) is rarely worth the savings when the other techniques above achieve similar size reductions without any novel attack surface.
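If you want to verify the savings for your own payloads before ruling compression in or out, a quick measurement using Node's zlib (names here are illustrative):

```typescript
import { deflateRawSync } from "node:zlib";

// Compare a payload's serialized size with its raw-DEFLATE size,
// i.e. what compression would buy before base64url encoding.
function measureCompression(payload: object): { raw: number; deflated: number } {
  const json = Buffer.from(JSON.stringify(payload));
  return { raw: json.length, deflated: deflateRawSync(json).length };
}
```

Permission lists compress especially well because the resource prefixes repeat.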