Checking passwords against 850 million breached credentials without storing them

The HaveIBeenPwned (HIBP) Pwned Passwords dataset contains over 850 million real-world passwords collected from data breaches. Every one of them has appeared in a breach and should never be accepted as a new password. The challenge is checking against this dataset without sending your users' passwords, even in hashed form, to an external service. The solution is k-anonymity, and it is elegant.

The naive approach and why it fails

The obvious approach is to send the full SHA-1 hash of the password to a lookup service and ask whether it is in the dataset. But this reveals the exact hash to that service. HIBP is trustworthy and free to use, but you would still be creating a privacy dependency on an external party: if the service or the network path to it were ever compromised, every hash you sent could be attacked offline. For common passwords, an unsalted SHA-1 hash is effectively the password itself, because it can be reversed with a precomputed lookup table. The same concern applies if you query your own internal breach database mirror over the network with full hashes.
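To make the risk concrete, here is a minimal sketch of how anyone holding a full unsalted SHA-1 hash can reverse it when the password is common. The wordlist and the names sha1_hex, REVERSE_TABLE, and recover_password are illustrative assumptions; real precomputed tables are far larger.

```python
import hashlib
from typing import Optional


def sha1_hex(password: str) -> str:
    """Uppercase hex SHA-1 digest of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


# Hypothetical stand-in for an attacker's precomputed table.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "password123", "letmein"]
REVERSE_TABLE = {sha1_hex(p): p for p in COMMON_PASSWORDS}


def recover_password(full_hash: str) -> Optional[str]:
    """Return the plaintext if the full hash is in the table, else None."""
    return REVERSE_TABLE.get(full_hash.upper())
```

A hash of a long random passphrase would not be in such a table, but the passwords users actually choose often are, which is why the full hash should stay on your side.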

K-anonymity and hash prefix matching

The HIBP Pwned Passwords API uses a k-anonymity model proposed by Junade Ali at Cloudflare and implemented in collaboration with Troy Hunt. It works as follows:

  1. Compute the SHA-1 hash of the candidate password
  2. Take only the first 5 hexadecimal characters (20 bits) of the hash
  3. Send that 5-character prefix to the API: GET /range/{prefix}
  4. The API returns all hash suffixes in the database that share that prefix, along with how many times each has appeared in breaches
  5. Compare the remaining 35 characters of your local hash against the returned list
  6. If your suffix appears in the list, the password is compromised

The server never sees your full hash. Any given 5-character prefix matches approximately 500 to 1,000 entries in the database, so the server cannot determine which specific hash you were looking for. That is the k-anonymity property: your query is indistinguishable from a query for any of the other hashes that share the same prefix.
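The split and the bucket-size arithmetic can be sketched as follows. The function name split_hash and the constants are illustrative; the 850 million figure is the dataset size quoted above, and the real distribution across prefixes is not perfectly even.

```python
import hashlib
from typing import Tuple

PREFIX_LENGTH = 5  # hex characters sent to the API (20 bits)


def split_hash(password: str) -> Tuple[str, str]:
    """Return (prefix, suffix) of the uppercase hex SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:PREFIX_LENGTH], digest[PREFIX_LENGTH:]


# Number of distinct 5-character hex prefixes: 16^5 = 1,048,576.
PREFIX_COUNT = 16 ** PREFIX_LENGTH

# Average number of hashes per prefix if 850 million hashes were spread evenly.
AVERAGE_BUCKET = 850_000_000 / PREFIX_COUNT  # roughly 810
```

Only the prefix returned by split_hash leaves your server; the 35-character suffix is compared locally.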

Implementation in Python

import hashlib
import requests

def is_password_pwned(password: str) -> int:
    """
    Returns the number of times this password has appeared in breaches.
    Returns 0 if not found. Never sends the full hash to the API.
    """
    sha1 = hashlib.sha1(password.encode('utf-8')).hexdigest().upper()
    prefix = sha1[:5]
    suffix = sha1[5:]

    response = requests.get(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"Add-Padding": "true"},  # pads the response with dummy entries to resist traffic analysis
        timeout=5
    )
    response.raise_for_status()

    # Response format: "SUFFIX:COUNT\r\nSUFFIX:COUNT\r\n..."
    # partition() tolerates blank lines; padded entries carry a count of 0.
    for line in response.text.splitlines():
        hash_suffix, _, count = line.partition(':')
        if hash_suffix == suffix:
            return int(count)

    return 0

# Usage
count = is_password_pwned("password123")
if count > 0:
    # Reject the password and prompt the user to choose another.
    # The count stays server-side; the user-facing message omits it.
    print("This password has appeared in data breaches and cannot be used.")

Implementation in JavaScript (Node.js)

import crypto from 'crypto';

async function isPasswordPwned(password) {
  const sha1 = crypto
    .createHash('sha1')
    .update(password, 'utf8')
    .digest('hex')
    .toUpperCase();

  const prefix = sha1.slice(0, 5);
  const suffix = sha1.slice(5);

  const response = await fetch(
    `https://api.pwnedpasswords.com/range/${prefix}`,
    { headers: { 'Add-Padding': 'true' } }
  );

  if (!response.ok) {
    // Fail open — don't block registration if HIBP is down
    console.warn('HIBP check failed:', response.status);
    return 0;
  }

  const text = await response.text();
  for (const line of text.split(/\r?\n/)) {
    const [hashSuffix, count] = line.split(':');
    if (hashSuffix === suffix) {
      return parseInt(count, 10);
    }
  }

  return 0;
}

// Browser implementation using Web Crypto API
async function isPasswordPwnedBrowser(password) {
  const encoder = new TextEncoder();
  const data = encoder.encode(password);
  const hashBuffer = await crypto.subtle.digest('SHA-1', data);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  const sha1 = hashArray.map(b => b.toString(16).padStart(2, '0')).join('').toUpperCase();

  const prefix = sha1.slice(0, 5);
  const suffix = sha1.slice(5);

  const res = await fetch(`https://api.pwnedpasswords.com/range/${prefix}`);
  if (!res.ok) return 0; // fail open, matching the Node.js version
  const text = await res.text();

  for (const line of text.split(/\r?\n/)) {
    const [s, c] = line.split(':');
    if (s === suffix) return parseInt(c, 10);
  }
  return 0;
}
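
The Add-Padding: true header (introduced by Cloudflare) pads the response with dummy entries so that its size no longer reflects how many real entries the prefix matches. According to the HIBP API documentation, padded responses contain between 800 and 1,000 lines, and the dummy entries carry a count of 0 so clients can ignore them. Without padding, response size alone could leak information about the prefix. Enable it in production.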
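As a sketch of how a client can treat padded responses, the parser below looks up a suffix in a response body and returns its count. It assumes padding entries carry a count of 0, as the HIBP documentation describes, so they naturally read as "not breached". The function name find_count and the sample body are made up for illustration; the counts are not real data.

```python
def find_count(range_body: str, suffix: str) -> int:
    """Return the breach count for suffix in a /range response body, or 0."""
    wanted = suffix.upper()
    for line in range_body.splitlines():
        candidate, separator, count = line.strip().partition(":")
        if not separator:
            continue  # skip blank or malformed lines
        if candidate.upper() == wanted:
            # A padding entry would yield 0 here, the same as "not found".
            return int(count)
    return 0


# Illustrative body: two entries with invented counts and one padding entry.
SAMPLE_BODY = (
    "0018A45C4D1DEF81644B54AB7F969B88D65:3\r\n"
    "1E4C9B93F3F0682250B6CF8331B7EE68FD8:42\r\n"
    "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:0\r\n"
)
```

Because the caller only tests whether the count is greater than zero, padding needs no special handling beyond this.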

When to check

There are three points in your authentication flow where a breach check adds value:

Registration

Block any password that appears in the breach database. Return a clear error: "This password has appeared in data breaches and cannot be used. Please choose a different password." Do not tell the user how many times it appeared; the count is not actionable for them and leaks information about how common the password is.

Password change

Same check as registration. A user choosing a new password should not be able to set a compromised one.
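One way to share the check between registration and password change is a single validation function that takes the breach lookup as a parameter. This is a sketch under assumptions: validate_new_password and the fail_open flag are hypothetical names, and whether to accept passwords when the lookup is unavailable is a policy decision for your application.

```python
from typing import Callable, Optional

BREACH_MESSAGE = (
    "This password has appeared in data breaches and cannot be used. "
    "Please choose a different password."
)


def validate_new_password(
    password: str,
    breach_count: Callable[[str], int],
    fail_open: bool = True,
) -> Optional[str]:
    """Return an error message if the password must be rejected, else None."""
    try:
        count = breach_count(password)
    except Exception:
        if fail_open:
            return None  # lookup unavailable: accept rather than block sign-up
        return "Password check is temporarily unavailable. Please try again."
    # The message deliberately omits the count.
    return BREACH_MESSAGE if count > 0 else None
```

Passing the lookup in as breach_count keeps the function testable without network access; in production it could be the is_password_pwned function shown earlier.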

Login (optional, higher stakes)

If a user successfully logs in with a password that is in the breach database, you can flag their account and prompt for a password change on the next page load. This is more invasive than blocking at registration, because the user has already authenticated, but it catches users who set a compromised password before you added the check. Display a non-alarming message: "For your security, we recommend updating your password. It matches a password found in a known data breach." Strongly encourage the update, but let the user skip the prompt the first time it appears.
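The login-time flow can be sketched as a flag on the account plus a counter for dismissed prompts. Everything here is hypothetical: the Account fields, the function names, and the choice of allowing exactly one dismissal are assumptions made for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Account:
    username: str
    password_flagged: bool = False  # set when the login password is breached
    prompts_dismissed: int = 0      # how often the user skipped the prompt


def after_successful_login(
    account: Account, password: str, breach_count: Callable[[str], int]
) -> None:
    """Flag the account if the password just used appears in the breach data."""
    if breach_count(password) > 0:
        account.password_flagged = True


def should_prompt(account: Account) -> bool:
    """Show the 'please update your password' prompt on the next page load."""
    return account.password_flagged


def may_dismiss(account: Account, max_dismissals: int = 1) -> bool:
    """Allow skipping the prompt only a limited number of times."""
    return account.prompts_dismissed < max_dismissals
```

The check has to run during login, while the plaintext password is still in memory; it cannot be derived later from the stored password hash.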

Running your own HIBP mirror

For high-throughput applications or air-gapped environments, you can download the full Pwned Passwords dataset and run checks locally. HIBP has historically distributed it as a torrent and more recently through its official downloader tool; expect approximately 40GB uncompressed. The dataset can be obtained sorted by hash, which enables binary search. A sorted flat file accessed through mmap can answer lookups in microseconds without a network call. The tradeoff is that you own the update cadence: HIBP keeps adding new hashes, and you need a process to pull updates.
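A minimal sketch of a local lookup follows. It assumes you have converted the downloaded text file into fixed-width binary records (a 20-byte SHA-1 digest followed by a 32-bit count) sorted by digest. That record layout is an assumption made here to keep the binary search simple; it is not the format HIBP ships. The names build_index and lookup are likewise illustrative.

```python
import hashlib
import mmap
import struct
from typing import Iterable, Tuple

# One record: 20-byte SHA-1 digest + unsigned 32-bit count, big-endian.
RECORD = struct.Struct(">20sI")


def build_index(path: str, entries: Iterable[Tuple[str, int]]) -> None:
    """Write (hex_hash, count) pairs as fixed-width records sorted by digest."""
    records = sorted((bytes.fromhex(h), c) for h, c in entries)
    with open(path, "wb") as f:
        for digest, count in records:
            f.write(RECORD.pack(digest, count))


def lookup(path: str, password: str) -> int:
    """Binary-search the memory-mapped index; return the count or 0."""
    target = hashlib.sha1(password.encode("utf-8")).digest()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            lo, hi = 0, len(mm) // RECORD.size
            while lo < hi:
                mid = (lo + hi) // 2
                digest, count = RECORD.unpack_from(mm, mid * RECORD.size)
                if digest == target:
                    return count
                if digest < target:
                    lo = mid + 1
                else:
                    hi = mid
    return 0
```

A long-running service would keep the file mapped rather than reopening it per lookup; the sketch reopens it each time only to stay short.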
