Storing user PII — phone numbers, addresses, government IDs, date of birth — in plaintext in your database means a single SQL injection or misconfigured backup bucket exposes every user's personal data. Encryption at rest at the disk or database level (transparent data encryption) does not help: those protections only guard against physical theft of storage media. Application-level or column-level encryption means the data is ciphertext in the database, unreadable without the application keys even if someone gets direct database access.
Two models: column-level vs application-level
Column-level encryption is typically provided by the database itself. PostgreSQL does not have built-in column encryption (pgcrypto exists but is rarely appropriate at scale), but databases like SQL Server with Always Encrypted or MySQL Enterprise Transparent Data Encryption support it. The database handles encrypt/decrypt transparently during read/write operations.
Application-level encryption moves the encrypt/decrypt operations into your application code. The database stores opaque ciphertext. Your application layer holds the keys and performs the cryptographic operations before writes and after reads. This is the model with the strongest key isolation — the database server never sees plaintext, and a compromised database connection yields only ciphertext.
AES-256-GCM at the application layer
AES-256-GCM is authenticated encryption — it provides both confidentiality and integrity in a single operation. The authentication tag detects tampering. Each encryption uses a unique 12-byte nonce; the nonce is stored alongside the ciphertext (it is not secret).
import crypto from 'crypto';
const ALGORITHM = 'aes-256-gcm';
const KEY_LENGTH = 32; // 256 bits
const IV_LENGTH = 12; // 96 bits for GCM
const TAG_LENGTH = 16; // 128-bit auth tag
// Encrypt a field value
function encryptField(plaintext, key) {
const iv = crypto.randomBytes(IV_LENGTH);
const cipher = crypto.createCipheriv(ALGORITHM, key, iv);
const encrypted = Buffer.concat([
cipher.update(plaintext, 'utf8'),
cipher.final(),
]);
const authTag = cipher.getAuthTag();
// Store iv + authTag + ciphertext as a single blob
// Format: 12 bytes IV | 16 bytes tag | N bytes ciphertext
return Buffer.concat([iv, authTag, encrypted]).toString('base64');
}
// Decrypt a field value
function decryptField(encryptedBase64, key) {
const buf = Buffer.from(encryptedBase64, 'base64');
const iv = buf.subarray(0, IV_LENGTH);
const authTag = buf.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH);
const ciphertext = buf.subarray(IV_LENGTH + TAG_LENGTH);
const decipher = crypto.createDecipheriv(ALGORITHM, key, iv);
decipher.setAuthTag(authTag);
return Buffer.concat([
decipher.update(ciphertext),
decipher.final(),
]).toString('utf8');
}
// Usage
const fieldKey = Buffer.from(process.env.PII_ENCRYPTION_KEY, 'base64');
const encrypted = encryptField(user.phoneNumber, fieldKey);
// Store `encrypted` in the database column
const decrypted = decryptField(storedValue, fieldKey); // original plaintext
Per-tenant key derivation
For multi-tenant SaaS, using a single application-wide encryption key is inadequate: a compromised key exposes all tenants. The better model is per-tenant key derivation from a master key stored in a hardware security module (HSM) or secrets manager like AWS KMS or HashiCorp Vault.
import { createHmac } from 'crypto';
// Derive a per-tenant key from master key using HKDF
// (node's built-in crypto.hkdfSync was added in v15.0)
function deriveTenantKey(masterKey, tenantId) {
// HKDF: Hash-based Key Derivation Function
// info binds the key to a specific purpose and tenant
return crypto.hkdfSync(
'sha256',
masterKey, // IKM: input key material
Buffer.alloc(32), // salt: 32 zero bytes (or a stored per-tenant salt)
Buffer.from(`pii-encryption:${tenantId}`), // info: context binding
KEY_LENGTH
);
}
// When writing tenant PII
async function encryptTenantPII(tenantId, fieldValue) {
const masterKey = await fetchMasterKeyFromKMS(); // fetched, never stored in memory long-term
const tenantKey = deriveTenantKey(masterKey, tenantId);
return encryptField(fieldValue, Buffer.from(tenantKey));
}
// Key rotation: re-encrypt fields with new derived key
// Old master key -> derive old tenant key -> decrypt
// New master key -> derive new tenant key -> encrypt
// Update master key reference in KMS, not per-row
Python: Fernet for simpler use cases
For Python applications, the cryptography library's Fernet scheme provides a clean authenticated encryption interface. Fernet uses AES-128-CBC with HMAC-SHA256 and handles IV generation, timestamp embedding, and base64 encoding automatically.
from cryptography.fernet import Fernet, MultiFernet
import base64
import os
# Generate a key (store this in your secrets manager, not here)
key = Fernet.generate_key()
fernet = Fernet(key)
# Encrypt
encrypted = fernet.encrypt(b"user@example.com")
# Returns URL-safe base64: gAAAAAB... (includes version, timestamp, IV, ciphertext, HMAC)
# Decrypt
plaintext = fernet.decrypt(encrypted)
# Key rotation with MultiFernet — tries each key in order
# New key is first (used for encryption), old keys follow (decryption only)
old_key = Fernet(os.environ['OLD_PII_KEY'])
new_key = Fernet(os.environ['NEW_PII_KEY'])
multi = MultiFernet([new_key, old_key])
# Re-encrypt all rows during rotation window
def rotate_field(encrypted_value):
return multi.rotate(encrypted_value) # decrypts with old, re-encrypts with new
Searchable encryption trade-offs
Encrypting columns removes the ability to search them. You cannot do WHERE phone_number = ? against ciphertext. There are three common approaches:
- Blind index: store a keyed HMAC of the plaintext alongside the ciphertext. Query the HMAC column. Reveals nothing about the plaintext but enables equality lookups. Does not support range queries or LIKE.
- Deterministic encryption: use a deterministic cipher (same plaintext always produces same ciphertext). Enables equality queries at the cost of leaking whether two rows have the same value. Weaker than randomized encryption.
- Decrypt-and-search: fetch all rows, decrypt in application, filter. Works only for small tables or with aggressive pre-filtering on unencrypted columns.
// Blind index: HMAC for equality search
import { createHmac } from 'crypto';
function blindIndex(value, indexKey) {
return createHmac('sha256', indexKey)
.update(value.toLowerCase().trim())
.digest('base64');
}
// Store both
await db.query(
'INSERT INTO users (phone_encrypted, phone_idx) VALUES ($1, $2)',
[encryptField(phoneNumber, piiKey), blindIndex(phoneNumber, indexKey)]
);
// Search by phone
const idx = blindIndex(searchPhone, indexKey);
const rows = await db.query(
'SELECT * FROM users WHERE phone_idx = $1',
[idx]
);
// Then decrypt the matched rows for exact comparison
GDPR data minimization
GDPR Article 5(1)(c) requires that personal data be adequate, relevant, and limited to what is necessary. Before encrypting a field, ask whether you need it at all. Common over-collection patterns in auth systems include storing full date of birth when only age verification is needed, storing full phone numbers when only the last four digits are needed for display, and storing home addresses when only country is needed for tax purposes.
Encryption is not a substitute for data minimization — but it is the correct defense for data you genuinely need. Column-level AES-256-GCM with per-tenant derived keys, combined with blind indexes for searchable fields and a master key in an HSM, is the architecture that satisfies both security and compliance requirements for most SaaS applications handling regulated personal data.