An audit log is only useful during an incident or an audit. The problem is that these are exactly the situations where an attacker has motivation to tamper with it, or where your normal database might not survive. An audit log stored in the same database as your application data, writable by the same credentials, is not an audit log — it is a list of events that may or may not reflect what actually happened. Building a real audit log requires architectural decisions that run counter to normal application database design.
The core requirement: append-only
An audit log must be append-only. You can insert new records but never modify or delete existing ones. This sounds simple but requires enforcement at multiple layers. Even if your application code never calls UPDATE on the audit table, if the application's database user has that privilege, a compromised application can modify the log. The database role used to write audit events should have only INSERT permission on the audit table, not UPDATE or DELETE.
-- PostgreSQL: restricted audit writer role
CREATE ROLE audit_writer;
GRANT INSERT ON audit_events TO audit_writer;
-- No UPDATE, DELETE, or TRUNCATE privileges granted

-- Row-level security as an additional guard
ALTER TABLE audit_events ENABLE ROW LEVEL SECURITY;
CREATE POLICY audit_insert_only ON audit_events
  FOR INSERT TO audit_writer
  WITH CHECK (true); -- allow all inserts
-- No SELECT/UPDATE/DELETE policies for audit_writer

-- The reader role is separate, with SELECT only
CREATE ROLE audit_reader;
GRANT SELECT ON audit_events TO audit_reader;
Event schema: who, what, when, where, outcome
Every audit event should answer five questions. Skimping on any of them makes the log less useful during an incident.
// Audit event schema
interface AuditEvent {
// Identity
id: string; // Unique event ID (UUID v7 for sortability)
event_type: string; // "user.login", "api_key.created", "user.deleted"
// Who
actor_type: 'user' | 'api_key' | 'service' | 'system';
actor_id: string; // user_id, api_key_id, service_name
actor_email?: string; // Denormalized — survives user deletion
org_id?: string;
// What — the target of the action
target_type?: string; // "user", "organization", "api_key"
target_id?: string; // ID of the affected resource
target_display?: string; // Human-readable name, denormalized
// Context before/after for change events
changes?: {
field: string;
before: unknown;
after: unknown;
}[];
// When
occurred_at: Date; // Authoritative timestamp from the server
received_at: Date; // When the log service received it (detect clock skew)
// Where
ip_address?: string;
user_agent?: string;
country_code?: string;
request_id?: string; // Correlate with application logs
// Outcome
outcome: 'success' | 'failure' | 'error';
error_code?: string;
error_message?: string;
// Hash chain
previous_hash: string; // Hash of the preceding event in this stream
event_hash: string; // SHA-256 of this event's canonical fields
}
// Example events
const loginEvent: AuditEvent = {
id: '018c8a2b-1234-7abc-9def-012345678901',
event_type: 'user.login',
actor_type: 'user',
actor_id: 'user_abc123',
actor_email: 'alice@example.com',
org_id: 'org_xyz789',
occurred_at: new Date(),
received_at: new Date(),
ip_address: '203.0.113.5',
user_agent: 'Mozilla/5.0...',
country_code: 'US',
outcome: 'success',
previous_hash: 'abc...def',
event_hash: '123...456'
};
Hash chain integrity
A hash chain provides tamper evidence: each event's hash includes the hash of the previous event. Modifying any past event invalidates all subsequent hashes, making the tampering detectable. This is the same principle used in blockchain and certificate transparency logs.
import { createHash } from 'node:crypto';

// Computing the event hash
function computeEventHash(
  event: Omit<AuditEvent, 'event_hash'>
): string {
  // Canonical representation — deterministic field ordering
  const canonical = JSON.stringify({
    id: event.id,
    event_type: event.event_type,
    actor_id: event.actor_id,
    occurred_at: event.occurred_at.toISOString(),
    outcome: event.outcome,
    previous_hash: event.previous_hash
    // Include all fields that must be tamper-evident
  });
  return createHash('sha256').update(canonical).digest('hex');
}
interface ChainVerifyResult {
  valid: boolean;
  totalEvents: number;
  brokenAt: string[];
}

// Verify hash chain integrity for an org's event stream
async function verifyHashChain(orgId: string): Promise<ChainVerifyResult> {
  // Order must match the order in which events were chained
  const events = await db.auditEvents
    .findByOrg(orgId)
    .orderBy('occurred_at', 'asc');
  let previousHash = GENESIS_HASH; // Known starting value
  const broken: string[] = [];
  for (const event of events) {
    if (event.previous_hash !== previousHash) {
      broken.push(event.id);
    }
    // Recompute the hash from the event's fields, excluding the stored hash
    const { event_hash, ...rest } = event;
    const expectedHash = computeEventHash(rest);
    if (event_hash !== expectedHash) {
      broken.push(`${event.id}:hash_mismatch`);
    }
    previousHash = event.event_hash;
  }
  return {
    valid: broken.length === 0,
    totalEvents: events.length,
    brokenAt: broken
  };
}
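To see the tamper-evidence property concretely, here is a self-contained sketch using a simplified three-field event. The `MiniEvent` shape, `append`, and `verify` helpers are illustrative stand-ins, not the production schema above; the genesis value is an assumed constant.

```typescript
import { createHash } from 'node:crypto';

const GENESIS_HASH = '0'.repeat(64); // assumed well-known starting value

interface MiniEvent {
  id: string;
  event_type: string;
  previous_hash: string;
  event_hash: string;
}

function hashOf(e: Omit<MiniEvent, 'event_hash'>): string {
  // Deterministic field order, mirroring the canonicalization idea above
  return createHash('sha256')
    .update(JSON.stringify({
      id: e.id,
      event_type: e.event_type,
      previous_hash: e.previous_hash
    }))
    .digest('hex');
}

function append(chain: MiniEvent[], id: string, event_type: string): MiniEvent[] {
  const previous_hash =
    chain.length > 0 ? chain[chain.length - 1].event_hash : GENESIS_HASH;
  const partial = { id, event_type, previous_hash };
  return [...chain, { ...partial, event_hash: hashOf(partial) }];
}

function verify(chain: MiniEvent[]): boolean {
  let prev = GENESIS_HASH;
  for (const e of chain) {
    if (e.previous_hash !== prev || e.event_hash !== hashOf(e)) return false;
    prev = e.event_hash;
  }
  return true;
}

let chain = append([], 'evt_1', 'user.login');
chain = append(chain, 'evt_2', 'api_key.created');
chain = append(chain, 'evt_3', 'user.deleted');
console.log(verify(chain)); // true

// Tamper with the middle event: its stored hash no longer matches
chain[1].event_type = 'user.login';
console.log(verify(chain)); // false
```

Note that detecting a tampered event does not recover the original data; the chain only proves that something was altered, which is why the log itself still needs restrictive write permissions.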
What to log
Log every authentication event (success and failure), every privileged action (admin operations, API key creation, permission changes), and every destructive operation (user deletion, data export, bulk operations). For auth-specific events:
- User login (success, failure, MFA challenge, lockout)
- Password change and reset
- MFA enrollment and removal
- API key creation, rotation, and revocation
- OAuth application authorization and revocation
- SSO connection creation and modification
- User invitation, provisioning, and deprovisioning
- Role and permission changes
- Admin impersonation sessions
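A thin emit helper keeps these events consistent across call sites. This is a hedged sketch: the in-memory `sink` stands in for an INSERT through the restricted writer role, `emitAuditEvent` and its option names are hypothetical, and `randomUUID()` (UUID v4) substitutes for the v7 IDs the schema recommends.

```typescript
import { createHash, randomUUID } from 'node:crypto';

// Hypothetical in-memory sink; a real implementation would INSERT
// through the audit_writer role described earlier.
const sink: Record<string, unknown>[] = [];
let lastHash = '0'.repeat(64); // assumed genesis value

interface EmitOptions {
  event_type: string;
  actor_type: 'user' | 'api_key' | 'service' | 'system';
  actor_id: string;
  outcome: 'success' | 'failure' | 'error';
  target_id?: string;
  error_code?: string;
}

function emitAuditEvent(opts: EmitOptions): void {
  const partial = {
    id: randomUUID(), // real systems: UUID v7 for time-sortability
    ...opts,
    occurred_at: new Date().toISOString(),
    previous_hash: lastHash
  };
  const event_hash = createHash('sha256')
    .update(JSON.stringify(partial))
    .digest('hex');
  sink.push({ ...partial, event_hash });
  lastHash = event_hash;
}

// A failed login followed by the resulting lockout
emitAuditEvent({
  event_type: 'user.login',
  actor_type: 'user',
  actor_id: 'user_abc123',
  outcome: 'failure',
  error_code: 'invalid_password'
});
emitAuditEvent({
  event_type: 'user.lockout',
  actor_type: 'system',
  actor_id: 'auth-service',
  outcome: 'success',
  target_id: 'user_abc123'
});
```

Funneling every call site through one helper also guarantees that no event is written without its chain fields populated.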
Retention and storage
SOC 2 typically requires 1 year of audit log retention. HIPAA requires 6 years. PCI DSS requires 1 year with 3 months immediately available. Size your storage accordingly — a busy application can generate millions of audit events per day.
A tiered storage approach works well: hot tier (queryable database) for recent events (90 days), warm tier (compressed object storage like S3) for older events that may be needed for investigations, cold tier (Glacier or equivalent) for archival beyond 1 year. Automate the tier transition and document the retrieval process so you are not scrambling to find old events during an audit.
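The tier boundaries above can be encoded as a small policy function that an archival job consults. A minimal sketch, assuming the 90-day and 1-year cutoffs from this section; the function name and tier labels are illustrative.

```typescript
type StorageTier = 'hot' | 'warm' | 'cold';

// Assumed cutoffs from the text: 90 days hot, 1 year warm, cold beyond that
function tierFor(occurredAt: Date, now: Date = new Date()): StorageTier {
  const ageDays = (now.getTime() - occurredAt.getTime()) / 86_400_000;
  if (ageDays <= 90) return 'hot';
  if (ageDays <= 365) return 'warm';
  return 'cold';
}
```

An archival job would run this against each partition's newest event and move whole partitions, not individual rows, so the hash chain within a partition stays contiguous.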