Auth incident response: what to do when you suspect account takeover at scale

Account takeover at scale is qualitatively different from a single compromised account. When hundreds or thousands of accounts are affected simultaneously (through credential stuffing, a phishing campaign, or session token exfiltration), the response must be coordinated, fast, and automated. Doing it manually is too slow; doing it with the wrong automated response can cause more damage than the attack itself. This is the sequence that works.

Detection signals

Before you can respond, you need to know you have an incident. The signals that should trigger investigation:

Login success rate anomaly: a successful credential stuffing campaign will show an unusual combination of high attempt volume and abnormally low (1–3%) success rate, followed by a cluster of successful logins from unfamiliar IPs.
Geographic velocity anomalies: a cluster of impossible travel events or first-time-country logins across multiple unrelated accounts in the same time window.
Session anomalies: accounts with simultaneous active sessions from geographically distant locations, or sessions starting from datacenter IPs (not typical user IPs).
Downstream activity signals: spike in password change requests, email address changes, or high-value actions (payment method changes, data exports) from recently-authenticated sessions.
Support ticket volume: a sudden increase in "I didn't do that" or "someone changed my account" tickets is a lagging indicator but should trigger a proactive investigation.
Third-party notification: a partner, CERT, or breach notification service informs you that credentials associated with your service have appeared in a breach database.
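
The first signal in the list can be approximated with a simple windowed ratio check. This is a minimal sketch, assuming login attempts are already aggregated per time window. The `LoginWindow` shape and both thresholds are illustrative assumptions and should be tuned against your own traffic baseline.

```typescript
// Hypothetical shape: login attempts aggregated per time window.
interface LoginWindow {
  windowStart: Date;
  attempts: number;
  successes: number;
}

// Assumed thresholds, not recommended values.
const VOLUME_MULTIPLIER = 5; // attempts must exceed 5x the normal volume
const MAX_SUCCESS_RATE = 0.05; // stuffing runs succeed only rarely

// Flags a window that combines a volume spike with a very low success rate.
function looksLikeCredentialStuffing(
  current: LoginWindow,
  baselineAttempts: number
): boolean {
  if (current.attempts === 0) return false;
  const successRate = current.successes / current.attempts;
  const volumeSpike = current.attempts > baselineAttempts * VOLUME_MULTIPLIER;
  return volumeSpike && successRate <= MAX_SUCCESS_RATE;
}
```

A positive result is only a reason to investigate; the successful logins inside a flagged window are the ones worth examining by IP and geography.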

-- Anomaly detection query example (Postgres)
-- Accounts with a successful login from a first-time country in the last hour
-- that also changed password, email, or billing within an hour of that login
SELECT
  u.id,
  u.email,
  le.ip_address,
  ge.country_code,
  le.occurred_at,
  COUNT(ae.id) AS suspicious_actions
FROM login_events le
JOIN users u ON u.id = le.user_id
JOIN geo_events ge ON ge.ip = le.ip_address
-- Inner join: only logins followed by at least one sensitive action are kept
JOIN audit_events ae ON
  ae.actor_id = le.user_id AND
  ae.event_type IN ('user.email_changed', 'user.password_changed', 'billing.modified') AND
  ae.occurred_at BETWEEN le.occurred_at AND le.occurred_at + INTERVAL '1 hour'
WHERE
  le.outcome = 'success' AND
  le.occurred_at > NOW() - INTERVAL '1 hour' AND
  -- NOT EXISTS instead of NOT IN, so a NULL country_code in the history
  -- cannot silently empty the result. Only earlier successful logins count
  -- as "known" countries; failed attacker attempts must not whitelist one.
  NOT EXISTS (
    SELECT 1
    FROM login_events le2
    JOIN geo_events ge2 ON ge2.ip = le2.ip_address
    WHERE le2.user_id = le.user_id
      AND le2.outcome = 'success'
      AND le2.occurred_at < NOW() - INTERVAL '1 hour'
      AND ge2.country_code = ge.country_code
  )
GROUP BY u.id, u.email, le.ip_address, ge.country_code, le.occurred_at
ORDER BY suspicious_actions DESC;
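
The geographic velocity signal can be checked per account by comparing consecutive logins. The sketch below uses the haversine great-circle distance and flags a pair of logins whose implied speed is implausible. The `GeoLogin` shape, the 900 km/h ceiling, and the 100 km minimum distance (to absorb geo-IP imprecision) are assumptions for illustration.

```typescript
// Hypothetical shape: a login already resolved to coordinates via geo-IP.
interface GeoLogin {
  userId: string;
  lat: number; // degrees
  lon: number; // degrees
  at: Date;
}

const EARTH_RADIUS_KM = 6371;
const MAX_PLAUSIBLE_KMH = 900; // assumed: roughly airliner cruising speed
const MIN_DISTANCE_KM = 100; // assumed: ignore hops within geo-IP error

// Great-circle distance between two points (haversine formula).
function haversineKm(a: GeoLogin, b: GeoLogin): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// True when the user could not plausibly have travelled between the two logins.
function isImpossibleTravel(prev: GeoLogin, next: GeoLogin): boolean {
  const km = haversineKm(prev, next);
  if (km < MIN_DISTANCE_KM) return false;
  const hours = Math.abs(next.at.getTime() - prev.at.getTime()) / 3600000;
  if (hours === 0) return true;
  return km / hours > MAX_PLAUSIBLE_KMH;
}
```

A single hit is often a VPN or a mobile carrier; the incident signal is many unrelated accounts producing hits in the same time window.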

Scope assessment

Before taking any remediation action, determine the scope of the incident. Premature, overly broad responses cause unnecessary user impact; delayed, under-scoped responses leave attackers in accounts they have already compromised. Questions to answer in the first 15 minutes:

How many accounts are potentially affected?
What is the attack vector? Credential stuffing (known-breached passwords), session theft, phishing, or something else?
What is the attacker's apparent objective? Data exfiltration, financial fraud, pivoting to connected systems?
Is the attack ongoing or historical?
Are specific account types, organizations, or user segments disproportionately affected?

-- Scope assessment queries
-- How many accounts logged in from the attack IP range in the last 24h
SELECT COUNT(DISTINCT user_id) as affected_accounts
FROM login_events
WHERE
  outcome = 'success' AND
  occurred_at > NOW() - INTERVAL '24 hours' AND
  ip_address <<= '192.0.2.0/24';  -- attacker's observed CIDR

-- What actions did these accounts take in the same period
SELECT
  ae.event_type,
  COUNT(*) as count
FROM audit_events ae
WHERE ae.actor_id IN (
  SELECT DISTINCT user_id FROM login_events
  WHERE outcome = 'success' AND
  occurred_at > NOW() - INTERVAL '24 hours' AND
  ip_address <<= '192.0.2.0/24'
)
AND ae.occurred_at > NOW() - INTERVAL '24 hours'
GROUP BY ae.event_type
ORDER BY count DESC;
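
Several of the scope questions (how many accounts, ongoing or historical, which segments) can be answered from the same set of suspicious logins once it has been pulled out of the database. The following is a rough sketch only: the `SuspiciousLogin` shape and the 15-minute "ongoing" window are assumptions, not part of the schema used in the queries.

```typescript
// Hypothetical row shape for logins already attributed to the attacker.
interface SuspiciousLogin {
  userId: string;
  orgId: string;
  occurredAt: Date;
}

interface ScopeSummary {
  affectedAccounts: number;
  firstSeen: Date | null;
  lastSeen: Date | null;
  ongoing: boolean;
  accountsByOrg: Map<string, number>;
}

// Assumed: activity within the last 15 minutes means the attack is still live.
const ONGOING_WINDOW_MS = 15 * 60 * 1000;

function summarizeScope(logins: SuspiciousLogin[], now: Date): ScopeSummary {
  const users = new Set<string>();
  const usersByOrg = new Map<string, Set<string>>();
  let firstMs: number | null = null;
  let lastMs: number | null = null;

  for (const login of logins) {
    users.add(login.userId);
    const orgUsers = usersByOrg.get(login.orgId) ?? new Set<string>();
    orgUsers.add(login.userId);
    usersByOrg.set(login.orgId, orgUsers);

    const t = login.occurredAt.getTime();
    if (firstMs === null || t < firstMs) firstMs = t;
    if (lastMs === null || t > lastMs) lastMs = t;
  }

  // Count distinct accounts per organization to spot concentrated targeting.
  const accountsByOrg = new Map<string, number>();
  usersByOrg.forEach((set, org) => accountsByOrg.set(org, set.size));

  return {
    affectedAccounts: users.size,
    firstSeen: firstMs === null ? null : new Date(firstMs),
    lastSeen: lastMs === null ? null : new Date(lastMs),
    ongoing: lastMs !== null && now.getTime() - lastMs <= ONGOING_WINDOW_MS,
    accountsByOrg,
  };
}
```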

Token revocation at scale

Once you have identified the affected account set, revoke all active sessions and tokens for those accounts. The implementation must be fast — an attacker with an active session will continue to have access until the session is revoked, and an account takeover's damage escalates with time.

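// Mass token revocation for an incident
// `db`, `messageQueue`, and `logger` are application-level clients defined elsewhere
import { randomUUID } from 'node:crypto';

async function revokeAffectedSessions(
  affectedUserIds: string[],
  reason: string
): Promise<void> {
  const incidentId = randomUUID();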
  const revokedAt = new Date();

  // Process in batches to avoid database timeouts
  const BATCH_SIZE = 500;
  for (let i = 0; i < affectedUserIds.length; i += BATCH_SIZE) {
    const batch = affectedUserIds.slice(i, i + BATCH_SIZE);

    await db.transaction(async (tx) => {
      // Revoke all refresh tokens
      await tx.refreshTokens.updateMany(
        { user_id: { $in: batch }, revoked: false },
        {
          revoked: true,
          revoked_at: revokedAt,
          revocation_reason: reason,
          revocation_incident_id: incidentId
        }
      );

      // Revoke all active sessions
      await tx.sessions.updateMany(
        { user_id: { $in: batch }, ended_at: null },
        {
          ended_at: revokedAt,
          end_reason: 'incident_revocation',
          incident_id: incidentId
        }
      );
    });

    // Publish revocation events so distributed services can clear caches
    await messageQueue.publishBatch(
      batch.map(userId => ({
        type: 'user.sessions_revoked',
        user_id: userId,
        incident_id: incidentId,
        reason
      }))
    );

    logger.info('Revoked sessions batch', {
      incident_id: incidentId,
      batch_start: i,
      batch_size: batch.length
    });
  }
}

// After revocation, force re-authentication with additional verification
async function flagForReauthentication(
  userIds: string[],
  requireMfaReenrollment: boolean
): Promise<void> {
  await db.users.updateMany(
    { id: { $in: userIds } },
    {
      require_reauth: true,
      require_mfa_reenrollment: requireMfaReenrollment,
      reauth_reason: 'security_incident'
    }
  );
}
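
Publishing `user.sessions_revoked` only helps if downstream services act on it. One common pattern for services that validate stateless access tokens is to keep a per-user cutoff timestamp and reject any token issued at or before it. The sketch below is one way to do that under stated assumptions: the `RevocationCache` class is hypothetical, the revocation time is assumed to arrive with the event (or be taken from the message's publish time), and tokens are assumed to carry `sub` and an `iat` claim in seconds.

```typescript
// Assumed token claims: subject (user ID) and issued-at in seconds since epoch.
interface AccessTokenClaims {
  sub: string;
  iat: number;
}

// Hypothetical in-memory cache held by each service that validates tokens.
class RevocationCache {
  // userId -> cutoff in milliseconds since epoch
  private revokedBefore = new Map<string, number>();

  // Call from the consumer of user.sessions_revoked events.
  onSessionsRevoked(userId: string, revokedAt: Date): void {
    const previous = this.revokedBefore.get(userId) ?? 0;
    // Keep the latest cutoff so out-of-order events cannot move it backwards.
    this.revokedBefore.set(userId, Math.max(previous, revokedAt.getTime()));
  }

  // Conservative: a token issued in the same second as the cutoff is rejected.
  isTokenRevoked(claims: AccessTokenClaims): boolean {
    const cutoff = this.revokedBefore.get(claims.sub);
    if (cutoff === undefined) return false;
    return claims.iat * 1000 <= cutoff;
  }
}
```

Entries can be dropped once the cutoff is older than the maximum access token lifetime, since every token issued before it will have expired by then.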

Forced MFA re-enrollment

If MFA devices may have been enrolled by an attacker during the incident window, you must force re-enrollment — not just re-authentication. Forced MFA re-enrollment removes all existing MFA factors and requires the user to add a new one at next login. This ensures that an attacker who added their own authenticator app during an account takeover cannot continue to use it after the account is recovered.

// MFA re-enrollment enforcement at login
async function handleLoginPostIncident(
  userId: string,
  sessionId: string
): Promise<LoginResult> {
  const user = await db.users.findById(userId);

  if (user.require_mfa_reenrollment) {
    // Delete all existing MFA factors, including any the attacker enrolled.
    // Safe to repeat if the user abandons setup and logs in again.
    await db.mfaFactors.deleteAll(userId);

    // Leave require_mfa_reenrollment set. The MFA setup handler should clear
    // it (and record mfa_reenrolled_at) only after a new factor is verified.
    // Clearing it here would let an abandoned setup fall through to a normal
    // login with no MFA factor at all.
    return {
      status: 'mfa_reenrollment_required',
      message: 'For your security, please re-enroll your two-factor authentication.',
      setup_url: `/auth/mfa/setup?session=${encodeURIComponent(sessionId)}`
    };
  }

  // Normal login flow
  return { status: 'authenticated' };
}
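
Deciding who needs forced re-enrollment comes down to finding accounts with a factor enrolled inside the incident window. A minimal sketch of that check follows; the `MfaFactor` shape and its `enrolledAt` field are assumptions about how factors are stored.

```typescript
// Hypothetical stored factor record.
interface MfaFactor {
  id: string;
  userId: string;
  enrolledAt: Date;
}

// Returns the distinct users with at least one factor enrolled in the window.
function usersWithFactorsEnrolledInWindow(
  factors: MfaFactor[],
  windowStart: Date,
  windowEnd: Date
): string[] {
  const startMs = windowStart.getTime();
  const endMs = windowEnd.getTime();
  const flagged = new Set<string>();
  for (const factor of factors) {
    const t = factor.enrolledAt.getTime();
    if (t >= startMs && t <= endMs) flagged.add(factor.userId);
  }
  return Array.from(flagged);
}
```

The resulting list is the input for `flagForReauthentication` with `requireMfaReenrollment` set to `true`.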

User communication

Communication timing and content during an incident are a balance between transparency and not handing attackers a roadmap for further attacks. Do not delay communication to avoid bad press: delayed communication makes for worse press and is potentially a legal violation under GDPR, CCPA, and state breach notification laws.

The initial notification to affected users should:

Be sent as soon as the affected account list is confirmed — do not wait for root cause analysis
State clearly what happened (your account was accessed from an unrecognized location)
State what you have done (we have ended all active sessions)
State what the user should do (reset your password and re-enroll MFA when you log back in)
Provide a direct link to the recovery flow, not a generic login link
Not look like a phishing email: include specific account-identifying information that a phisher would not have
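
The checklist above maps onto a small message builder. This is only a sketch of the shape such a notice might take: the `AffectedUser` fields, the wording, and the recovery URL passed in are all placeholders, and the account ID suffix stands in for whatever identifying detail your service can safely include.

```typescript
// Hypothetical fields; use whatever identifying details your service holds.
interface AffectedUser {
  displayName: string;
  accountIdSuffix: string; // e.g. last four characters of the account ID
  sessionsEndedAt: Date;
}

interface Notice {
  subject: string;
  body: string;
}

function buildIncidentNotice(user: AffectedUser, recoveryUrl: string): Notice {
  const endedOn = user.sessionsEndedAt.toISOString().slice(0, 10);
  const body = [
    `Hi ${user.displayName},`,
    "",
    // What happened, with a detail a phisher would not have.
    `Your account ending in ${user.accountIdSuffix} was accessed from an unrecognized location.`,
    // What we have done.
    `We ended all active sessions on ${endedOn}.`,
    // What the user should do.
    "When you log back in, reset your password and re-enroll MFA.",
    "",
    // Direct link to the recovery flow, not a generic login page.
    `Start recovery here: ${recoveryUrl}`,
  ].join("\n");
  return { subject: "Security notice: action required on your account", body };
}
```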

Preserve all logs from the incident period before doing any cleanup. The logs you need for the post-incident analysis are the same logs that may be evidence for law enforcement or a regulatory inquiry. Establish a chain of custody for the log data immediately: copy it to a separately controlled storage location with write protection. Do not run destructive queries or cleanup jobs during the active incident.
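
One simple building block for chain of custody is a hash manifest: record a SHA-256 digest of each log export at the moment it is copied, so any later alteration is detectable. The sketch below uses Node's built-in `crypto` module; the `CustodyRecord` fields and function names are illustrative assumptions, and the write-protected storage itself is out of scope here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical manifest entry for one preserved log export.
interface CustodyRecord {
  name: string;
  sha256: string;
  sizeBytes: number;
  recordedAt: string; // ISO 8601 timestamp
  recordedBy: string;
}

function sha256Hex(contents: Uint8Array): string {
  return createHash("sha256").update(contents).digest("hex");
}

// Create the manifest entry at the time the log is copied to protected storage.
function custodyRecord(
  name: string,
  contents: Uint8Array,
  recordedBy: string,
  now: Date = new Date()
): CustodyRecord {
  return {
    name,
    sha256: sha256Hex(contents),
    sizeBytes: contents.byteLength,
    recordedAt: now.toISOString(),
    recordedBy,
  };
}

// Later, confirm the preserved copy still matches what was recorded.
function verifyCustody(record: CustodyRecord, contents: Uint8Array): boolean {
  return (
    record.sizeBytes === contents.byteLength &&
    record.sha256 === sha256Hex(contents)
  );
}
```

Store the manifest separately from the logs it describes, so that someone who can modify one cannot quietly modify the other.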

Post-incident, conduct a retrospective focused on detection gap (how long did the attack run before you noticed?), scope accuracy (did your initial scope assessment match the final count?), and recovery effectiveness (how long did remediation take, and what delayed it?). The answers drive the improvements to your detection and response tooling for the next incident.
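
The three retrospective questions reduce to a few numbers that are worth computing the same way after every incident. A small sketch, assuming the timeline has been reconstructed from logs; the `IncidentTimeline` fields are hypothetical names for those reconstructed values.

```typescript
// Hypothetical timeline reconstructed during the retrospective.
interface IncidentTimeline {
  attackStartedAt: Date;
  detectedAt: Date;
  remediatedAt: Date;
  initialScopeEstimate: number; // accounts in the first scope assessment
  finalAffectedCount: number; // accounts confirmed affected at close
}

interface RetroMetrics {
  detectionGapMinutes: number;
  remediationMinutes: number;
  // initial estimate / final count; below 1 means the first scope was too small
  scopeAccuracy: number | null;
}

function retrospectiveMetrics(t: IncidentTimeline): RetroMetrics {
  const minutes = (from: Date, to: Date): number =>
    (to.getTime() - from.getTime()) / 60000;
  return {
    detectionGapMinutes: minutes(t.attackStartedAt, t.detectedAt),
    remediationMinutes: minutes(t.detectedAt, t.remediatedAt),
    scopeAccuracy:
      t.finalAffectedCount === 0
        ? null
        : t.initialScopeEstimate / t.finalAffectedCount,
  };
}
```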
