MFA recovery codes: generating, storing, and invalidating them safely

Recovery codes are the escape hatch of the MFA system. When a user loses their phone, swaps authenticator apps, or gets a new device, a recovery code is the only thing that gets them back into their account without a support ticket. This makes them critically important for user experience — and critically dangerous if you get the implementation wrong. A poorly implemented recovery code system is effectively a backdoor that bypasses the MFA you've worked to enforce.

This post covers cryptographic generation, hashed storage, the display-once UX contract, single-use enforcement, batch invalidation, and the admin recovery flow that you'll need eventually.

Generating codes cryptographically

Recovery codes must be generated with a cryptographically secure random number generator. Math.random(), Python's random module, and timestamp-based UUIDs are not acceptable, because their output can be predicted. You want a generator backed by OS-level entropy: crypto.randomBytes in Node.js, the secrets module in Python, crypto/rand in Go.

import secrets
import string

def generate_recovery_codes(count: int = 10) -> list[str]:
    """
    Generate recovery codes in the format XXXXX-XXXXX (10 chars, human-readable).
    Each code has ~49.5 bits of entropy (10 characters from a 31-symbol alphabet),
    which is sufficient for rate-limited login.
    """
    alphabet = string.ascii_uppercase + string.digits
    # Remove ambiguous characters: 0/O, 1/I/L
    alphabet = alphabet.replace('0', '').replace('O', '').replace('1', '')
    alphabet = alphabet.replace('I', '').replace('L', '')

    codes = []
    for _ in range(count):
        part1 = ''.join(secrets.choice(alphabet) for _ in range(5))
        part2 = ''.join(secrets.choice(alphabet) for _ in range(5))
        codes.append(f"{part1}-{part2}")

    return codes

# Example output: ['B4KWR-X9MZP', 'QFVNH-3TYCG', ...]

The hyphenated 5-5 format is a deliberate UX decision. Groups of 5 characters are easy for humans to read, transcribe, and verify. Including only unambiguous characters prevents support tickets from users who typed a zero instead of an O.
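To sanity-check how much entropy this format carries, the arithmetic can be sketched as below. The helper name code_entropy_bits is made up for illustration; the alphabet is rebuilt the same way as in generate_recovery_codes.

```python
import math
import string

# Same alphabet as generate_recovery_codes: A-Z and 0-9 minus 0, O, 1, I, L.
AMBIGUOUS = set("0O1IL")
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in AMBIGUOUS
)


def code_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random code: length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)


bits = code_entropy_bits(len(ALPHABET), 10)
print(len(ALPHABET), round(bits, 1))  # 31 symbols, about 49.5 bits
```

Removing five ambiguous characters costs only a fraction of a bit per character, which is a cheap trade for fewer transcription errors.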

Storage: hashed, not plaintext

Recovery codes should be stored hashed, just like passwords. If your database is breached, plaintext recovery codes let the attacker bypass the second factor for every user immediately. bcrypt is a solid choice: it is deliberately slow and salts every hash.

import bcrypt
import uuid
from datetime import datetime, timezone

def store_recovery_codes(user_id: str, codes: list[str], db) -> None:
    # Invalidate any existing codes first, recording why
    db.execute(
        "UPDATE recovery_codes SET invalidated_at = NOW(), invalidation_reason = 'regenerated' "
        "WHERE user_id = $1 AND used_at IS NULL AND invalidated_at IS NULL",
        user_id
    )

    # One batch ID per generation event; generation_batch is NOT NULL in the schema
    batch_id = str(uuid.uuid4())

    for code in codes:
        # gensalt() produces a fresh salt for every code
        code_hash = bcrypt.hashpw(
            code.upper().replace('-', '').encode(),
            bcrypt.gensalt(rounds=10)
        ).decode()

        db.execute(
            """
            INSERT INTO recovery_codes (user_id, code_hash, created_at, generation_batch)
            VALUES ($1, $2, $3, $4)
            """,
            user_id, code_hash, datetime.now(timezone.utc), batch_id
        )


def verify_recovery_code(user_id: str, submitted_code: str, db) -> bool:
    """
    Try submitted code against all active recovery codes for this user.
    O(n) bcrypt comparisons — acceptable since n is small (10) and rate limiting is enforced.
    """
    normalized = submitted_code.upper().replace('-', '').replace(' ', '').encode()

    active_codes = db.fetchall(
        """
        SELECT id, code_hash FROM recovery_codes
        WHERE user_id = $1 AND used_at IS NULL AND invalidated_at IS NULL
        """,
        user_id
    )

    for row in active_codes:
        if bcrypt.checkpw(normalized, row['code_hash'].encode()):
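            # Mark as used immediately (single-use enforcement). The extra
            # predicates stop the row from being claimed twice when two requests
            # race on the same code; a production version should also check the
            # affected-row count before returning True.
            db.execute(
                "UPDATE recovery_codes SET used_at = NOW() "
                "WHERE id = $1 AND used_at IS NULL AND invalidated_at IS NULL",
                row['id']
            )
            return True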

    return False

At a cost factor of 10 (rounds=10), bcrypt takes on the order of 100 ms per comparison; the exact figure depends on hardware. With 10 active codes, a failed attempt is checked against every hash and can take up to about a second. This is fine, and it is actually a feature: it slows down guessing. Still, rate limit recovery code attempts aggressively: 5 attempts per hour per user, with exponential backoff and account lockout at 10 failed attempts.
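The limits above can be sketched with a small in-memory limiter. This is only an illustration under assumptions: the class name RecoveryAttemptLimiter and its methods are hypothetical, state lives in process memory (a real deployment would more likely keep it in a shared store), and exponential backoff is left out to keep the sketch short.

```python
import time
from collections import defaultdict, deque


class RecoveryAttemptLimiter:
    """Sliding-window attempt limit plus lockout after repeated failures."""

    def __init__(self, max_per_window=5, window_seconds=3600, lockout_after=10):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.lockout_after = lockout_after
        self._attempts = defaultdict(deque)  # user_id -> attempt timestamps
        self._failures = defaultdict(int)    # user_id -> failures since last success

    def allow(self, user_id, now=None) -> bool:
        """Return True and record the attempt if the user may try a code now."""
        now = time.time() if now is None else now
        if self._failures[user_id] >= self.lockout_after:
            return False  # locked out until an admin or other flow resets it
        window = self._attempts[user_id]
        # Drop attempts that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_per_window:
            return False
        window.append(now)
        return True

    def record_result(self, user_id, success: bool) -> None:
        """Track failures so the lockout threshold can be enforced."""
        if success:
            self._failures[user_id] = 0
        else:
            self._failures[user_id] += 1
```

A caller would check allow() before running verify_recovery_code and pass the outcome to record_result() afterwards, so the expensive bcrypt comparisons never run for a throttled user.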

The display-once contract

Recovery codes must be shown to the user exactly once: immediately after generation, in the response to the same request that stores their hashes. After the user dismisses the page, the plaintext is gone. This is intentional. It forces users to store the codes themselves, rather than assuming they can always retrieve them from your app.

The UX pattern that works: show all 10 codes, require the user to acknowledge "I have saved these codes" (a checkbox, not just a button), then dismiss. Offer a "Download as text file" button that downloads a plain text file — this is more durable than screenshots.
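As a rough illustration of what the downloaded file might contain, here is a small formatter. The function name render_recovery_codes_file and the exact wording of the file are assumptions, not a prescribed format; the point is that the file is built from the plaintext held in the one response and is never written to your own storage.

```python
from datetime import datetime, timezone


def render_recovery_codes_file(
    account_label: str, codes: list[str], generated_at: datetime
) -> str:
    """Build the plain-text body for a 'Download as text file' button."""
    lines = [
        f"Recovery codes for {account_label}",
        f"Generated: {generated_at.isoformat()}",
        "Each code can be used once. Keep this file somewhere safe.",
        "",
    ]
    # Number the codes so users can tick them off as they use them.
    lines.extend(f"{i:2d}. {code}" for i, code in enumerate(codes, start=1))
    return "\n".join(lines) + "\n"


text = render_recovery_codes_file(
    "alice@example.com",
    ["B4KWR-X9MZP", "QFVNH-3TYCG"],
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(text)
```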

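// POST /api/mfa/recovery-codes/generate
// Assumes Express-style req/res; generateRecoveryCodes, storeRecoveryCodes, db, and auditLog are defined elsewhere.
// The handler has its own name so it does not shadow the generateRecoveryCodes helper it calls.
export async function handleGenerateRecoveryCodes(req: Request, res: Response) {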
  const user = req.user!;

  // Verify they've completed MFA setup — don't allow generation before TOTP is active
  if (!user.mfaEnabled) {
    return res.status(400).json({ error: 'MFA must be enabled before generating recovery codes' });
  }

  const plainCodes = generateRecoveryCodes(10);

  // Store hashes in DB (invalidates old codes)
  await storeRecoveryCodes(user.id, plainCodes, db);

  // Log the event
  await auditLog.write({
    userId: user.id,
    action: 'recovery_codes_generated',
    ip: req.ip,
    userAgent: req.headers['user-agent'],
  });

  // Return plaintext ONCE — never stored anywhere in our system after this response
  return res.json({
    codes: plainCodes,
    warning: 'Store these codes securely. They will not be shown again.',
    generatedAt: new Date().toISOString(),
  });
}

// GET /api/mfa/recovery-codes — only return count, never the codes themselves
export async function getRecoveryCodeStatus(req: Request, res: Response) {
  const count = await db.recoveryCode.countActive(req.user!.id);
  return res.json({ activeCount: count, total: 10 });
}

Batch invalidation on use

When a user successfully authenticates with a recovery code, invalidate all remaining codes immediately. This sounds aggressive, but the logic is sound: if a recovery code was used, either the user lost their authenticator (in which case they need to re-enroll MFA and get fresh codes), or an attacker used a stolen code (in which case the remaining codes are compromised too).

def consume_recovery_code(user_id: str, submitted_code: str, db) -> bool:
    if not verify_recovery_code(user_id, submitted_code, db):
        return False

    # Invalidate all remaining codes
    db.execute(
        """
        UPDATE recovery_codes
        SET invalidated_at = NOW(), invalidation_reason = 'sibling_used'
        WHERE user_id = $1 AND used_at IS NULL AND invalidated_at IS NULL
        """,
        user_id
    )

    # Flag the account: user must re-enroll MFA or generate new codes
    db.execute(
        "UPDATE users SET mfa_recovery_required = TRUE WHERE id = $1",
        user_id
    )

    return True

After using a recovery code, route the user through a flow that re-enrolls their authenticator (or explicitly skips it with a warning about running without MFA). Generate new recovery codes as part of that re-enrollment. Never leave a user with zero active codes and no way to generate new ones.
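Marking a code as used and invalidating its siblings should happen in one transaction, so a crash or a concurrent request cannot leave the batch half-consumed. The sketch below shows one way to do that. It makes several assumptions for the sake of being runnable: it uses SQLite in place of PostgreSQL, a trimmed-down table, and SHA-256 as a stand-in for bcrypt. The function name consume_atomically is hypothetical.

```python
import hashlib
import hmac
import sqlite3


def _hash(code: str) -> str:
    # Stand-in for bcrypt so the sketch runs with the standard library only.
    normalized = code.upper().replace("-", "").replace(" ", "")
    return hashlib.sha256(normalized.encode()).hexdigest()


def consume_atomically(conn: sqlite3.Connection, user_id: int, submitted: str) -> bool:
    candidate = _hash(submitted)
    with conn:  # commits on normal exit, rolls back if an exception escapes
        rows = conn.execute(
            "SELECT id, code_hash FROM recovery_codes "
            "WHERE user_id = ? AND used_at IS NULL AND invalidated_at IS NULL",
            (user_id,),
        ).fetchall()
        for code_id, code_hash in rows:
            if not hmac.compare_digest(candidate, code_hash):
                continue
            # Guarded update: only one caller can flip used_at from NULL.
            cur = conn.execute(
                "UPDATE recovery_codes SET used_at = CURRENT_TIMESTAMP "
                "WHERE id = ? AND used_at IS NULL AND invalidated_at IS NULL",
                (code_id,),
            )
            if cur.rowcount != 1:
                return False  # lost a race; treat it as a failed attempt
            # Invalidate the siblings in the same transaction.
            conn.execute(
                "UPDATE recovery_codes SET invalidated_at = CURRENT_TIMESTAMP, "
                "invalidation_reason = 'sibling_used' "
                "WHERE user_id = ? AND used_at IS NULL AND invalidated_at IS NULL",
                (user_id,),
            )
            return True
    return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recovery_codes (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "code_hash TEXT NOT NULL, used_at TEXT, invalidated_at TEXT, invalidation_reason TEXT)"
)
for code in ["B4KWR-X9MZP", "QFVNH-3TYCG", "HM7PD-K2WRX"]:
    conn.execute(
        "INSERT INTO recovery_codes (user_id, code_hash) VALUES (?, ?)",
        (1, _hash(code)),
    )
conn.commit()

first = consume_atomically(conn, 1, "b4kwr-x9mzp")    # matches after normalization
replay = consume_atomically(conn, 1, "B4KWR-X9MZP")   # same code again
sibling = consume_atomically(conn, 1, "QFVNH-3TYCG")  # a sibling of the used code
print(first, replay, sibling)
```

The first call succeeds, while the replay and the sibling are both rejected. Setting the mfa_recovery_required flag would belong in the same transaction.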

Admin recovery flow

You will have users who lose both their authenticator and their recovery codes. This happens. You need an out-of-band recovery process that doesn't undermine your security posture. The key principles:

Identity verification before any recovery: photo ID, video call, or other out-of-band channel depending on your security requirements.
Admin-initiated recovery generates a one-time recovery link, not a bypass code. The link expires in 15 minutes.
The recovery action is logged with the admin's identity and rationale.
The user is notified by email that account recovery was performed.
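A one-time recovery link along these lines could be sketched as follows. Everything here is illustrative: RecoveryLinkStore and its methods are hypothetical names, and the in-memory dict stands in for a database table. Because the token is long and random, storing a plain SHA-256 digest of it is assumed to be sufficient; the raw token only ever appears in the link sent to the user.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone

LINK_TTL = timedelta(minutes=15)


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


class RecoveryLinkStore:
    """In-memory stand-in for a table of pending admin recovery links."""

    def __init__(self):
        self._pending = {}  # token digest -> record

    def issue(self, user_id, admin_id, rationale, now=None) -> str:
        """Create a link token; the admin identity and rationale are kept for audit."""
        now = datetime.now(timezone.utc) if now is None else now
        token = secrets.token_urlsafe(32)
        self._pending[_digest(token)] = {
            "user_id": user_id,
            "admin_id": admin_id,
            "rationale": rationale,
            "expires_at": now + LINK_TTL,
        }
        return token

    def redeem(self, token, now=None):
        """Return the user_id for a valid token, or None. Tokens work once."""
        now = datetime.now(timezone.utc) if now is None else now
        record = self._pending.pop(_digest(token), None)  # pop makes it single-use
        if record is None or now >= record["expires_at"]:
            return None
        return record["user_id"]
```

Redeeming the link would then drop the user into MFA re-enrollment rather than straight into a session, and the notification email would go out at issue time.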

For reference, the table that backs the code throughout this post:

CREATE TABLE recovery_codes (
  id                  BIGSERIAL PRIMARY KEY,
  user_id             BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  code_hash           TEXT NOT NULL,
  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  used_at             TIMESTAMPTZ,
  invalidated_at      TIMESTAMPTZ,
  invalidation_reason VARCHAR(64),  -- 'regenerated', 'sibling_used', 'admin_reset'
  generation_batch    UUID NOT NULL  -- groups codes created together
);

CREATE INDEX idx_recovery_codes_user_active
  ON recovery_codes(user_id)
  WHERE used_at IS NULL AND invalidated_at IS NULL;

The generation_batch field lets you invalidate all codes from a specific generation event if you discover a compromise, without touching codes from later batches. It also gives you audit visibility into exactly when each batch was created and how many have been consumed.
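The two uses of generation_batch described above might look like the sketch below. It runs against SQLite with a simplified table so it is self-contained; the function names batch_summary and invalidate_batch are hypothetical, and the SQL would need small adjustments for PostgreSQL (for example, COUNT(*) FILTER (WHERE ...) instead of summing a boolean).

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recovery_codes (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "code_hash TEXT NOT NULL, used_at TEXT, invalidated_at TEXT, "
    "invalidation_reason TEXT, generation_batch TEXT NOT NULL)"
)


def batch_summary(conn, user_id):
    """Per batch: (batch id, total codes, codes used, codes still active)."""
    return conn.execute(
        "SELECT generation_batch, COUNT(*), COUNT(used_at), "
        "SUM(used_at IS NULL AND invalidated_at IS NULL) "
        "FROM recovery_codes WHERE user_id = ? GROUP BY generation_batch",
        (user_id,),
    ).fetchall()


def invalidate_batch(conn, batch_id, reason="admin_reset"):
    """Invalidate every still-active code in one batch; returns rows affected."""
    with conn:
        cur = conn.execute(
            "UPDATE recovery_codes "
            "SET invalidated_at = CURRENT_TIMESTAMP, invalidation_reason = ? "
            "WHERE generation_batch = ? AND used_at IS NULL AND invalidated_at IS NULL",
            (reason, batch_id),
        )
    return cur.rowcount


# Demo data: one batch of three codes, one of which has already been used.
batch = str(uuid.uuid4())
for i in range(3):
    conn.execute(
        "INSERT INTO recovery_codes (user_id, code_hash, generation_batch) VALUES (?, ?, ?)",
        (1, f"hash-{i}", batch),
    )
conn.execute("UPDATE recovery_codes SET used_at = CURRENT_TIMESTAMP WHERE code_hash = 'hash-0'")
conn.commit()

before = batch_summary(conn, 1)
affected = invalidate_batch(conn, batch)
after = batch_summary(conn, 1)
print(before, affected, after)
```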