Stop logging PII: a configurable Node.js sanitizer logger

Node.js May 15, 2026

Logging is one of those topics that looks harmless until it is not.

A developer adds a request object to a debug statement. A payment error includes a card number. A support workflow logs an email address, a phone number, and a bearer token because "we only need it for troubleshooting." Two months later those logs are in a SIEM, a data lake, three alert rules, and a backup nobody remembers.

That is the part that bothers me about PII in logs: the first mistake is small, but the copies multiply quietly.

So I built a small Node.js example that sanitizes data at the logging boundary:

github.com/SBajonczak/PiiSanitizer

The idea is simple: before anything leaves your application as a log line, it passes through a configurable sanitizer. Common PII gets masked. Known sensitive object keys get redacted. Domain-specific identifiers can be configured without changing the logger code.

What the logger does

The repository currently provides a lightweight Node.js module with no runtime dependencies.

It can:

  • mask email addresses, phone numbers, IBAN-like values, and credit card numbers
  • redact sensitive keys such as password, token, authorization, and apiKey
  • walk nested objects and arrays without mutating the original input
  • emit either object records or JSON lines
  • accept custom rules for application-specific identifiers

The important design choice is that configuration owns the sensitive patterns. The logger should not need a new release every time a project discovers a new internal ID format.

Basic usage

Here is the smallest useful example:

import { createPiiLogger } from './src/index.js';

const logger = createPiiLogger({ format: 'json' });

logger.info('Login for jane.doe@example.com', {
  password: 'correct-horse-battery-staple',
  phone: '+49 170 1234567',
});

The output is safe to ship to container logs:

{
  "level": "info",
  "timestamp": "2026-05-14T12:51:45.760Z",
  "args": [
    "Login for [EMAIL]",
    {
      "password": "[REDACTED]",
      "phone": "[PHONE]"
    }
  ]
}

This is deliberately boring. Good logging infrastructure should be boring. The exciting part is everything that does not leak.

Configuring domain-specific rules

Every company has identifiers that do not look sensitive to a generic library.

In SAP-heavy landscapes, for example, a personnel number may show up as PERNR 12345678. Depending on the context, that can absolutely be personal data. A generic email masker will not catch it.

So the sanitizer supports custom regex rules and key rules:

import { createPiiLogger, defaultRules } from './src/index.js';

const logger = createPiiLogger({
  format: 'json',
  rules: [
    ...defaultRules,
    {
      name: 'sapPersonnelNumber',
      type: 'regex',
      pattern: '\\bPERNR[ -]?\\d{8}\\b',
      replacement: '[SAP_PERSONNEL_NUMBER]',
    },
    {
      name: 'employeeIdKey',
      type: 'key',
      keys: ['employeeId', 'pernr'],
      replacement: '[EMPLOYEE_ID]',
    },
  ],
});

logger.info('User jane.doe@example.com opened PERNR 12345678', {
  employeeId: '12345678',
  authorization: 'Bearer never-log-this',
});

Output:

{
  "level": "info",
  "args": [
    "User [EMAIL] opened [SAP_PERSONNEL_NUMBER]",
    {
      "employeeId": "[EMPLOYEE_ID]",
      "authorization": "[REDACTED]"
    }
  ]
}

The key rule is intentionally separate from the regex rule. Sometimes the value itself is harmless without context, but the field name makes it sensitive. A number called employeeId should be treated differently from the same number inside items[0].quantity.
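A minimal sketch of how such a rule list could be applied helps make the distinction concrete. This is not the repository's implementation; `applyRegexRules` and `applyKeyRules` are hypothetical helper names, and only the rule shape mirrors the config above:

```javascript
// Hypothetical sketch: regex rules rewrite string content,
// key rules replace a value based on its field name.
const rules = [
  { name: 'email', type: 'regex', pattern: '[\\w.+-]+@[\\w-]+\\.[\\w.]+', replacement: '[EMAIL]' },
  { name: 'employeeIdKey', type: 'key', keys: ['employeeId', 'pernr'], replacement: '[EMPLOYEE_ID]' },
];

// Run every regex rule over a string value.
function applyRegexRules(value) {
  return rules
    .filter((r) => r.type === 'regex')
    .reduce((acc, r) => acc.replace(new RegExp(r.pattern, 'g'), r.replacement), value);
}

// Replace the whole value if its field name matches a key rule.
function applyKeyRules(key, value) {
  const keyRule = rules.find((r) => r.type === 'key' && r.keys.includes(key));
  return keyRule ? keyRule.replacement : value;
}
```

The two rule types answer different questions: "does this string contain something sensitive?" versus "is this field sensitive regardless of what it contains?"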

Why sanitize at the logger boundary?

You can try to make every developer remember what not to log.

I would not bet a privacy incident on that.

The safer pattern is a boundary:

application code
  -> logger
  -> sanitizer
  -> stdout / log collector / SIEM

Application code can still make mistakes. The sanitizer catches the common ones before they leave the process.
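The boundary can be sketched as a thin wrapper that runs every argument through a sanitizer before anything reaches the sink. This is a sketch with a trivial email masker standing in for the real rule set; `createBoundaryLogger` is a made-up name, not the library's API:

```javascript
// Sketch of the logging boundary: nothing reaches the sink unsanitized.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function sanitizeString(value) {
  return typeof value === 'string' ? value.replace(EMAIL, '[EMAIL]') : value;
}

function createBoundaryLogger(sink = (line) => console.log(line)) {
  return {
    info(...args) {
      // Sanitize every argument at the boundary, then hand off to the sink.
      sink(JSON.stringify({ level: 'info', args: args.map(sanitizeString) }));
    },
  };
}
```

Application code keeps calling `logger.info` as usual; the sanitizer sits between it and stdout, so a forgotten email address never leaves the process in the clear.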

This does not remove the need for good logging discipline. I still would not log raw HTTP requests, raw SAP payloads, payroll data, banking data, or identity documents. But a logger-level sanitizer gives you a last line of defense against accidental leakage.

A few implementation details

The sanitizer walks values recursively:

const output = sanitize({
  user: {
    name: 'Jane Doe',
    email: 'jane.doe@example.com',
    password: 'secret',
    profile: {
      phone: '+49 170 1234567',
    },
  },
});

Result:

{
  user: {
    name: 'Jane Doe',
    email: '[EMAIL]',
    password: '[REDACTED]',
    profile: {
      phone: '[PHONE]',
    },
  },
}

The original object is not mutated. That matters because logging should not change application state.
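A minimal version of that recursive, non-mutating walk might look like this. It is a sketch, not the repository's code; it handles a fixed key set and a single email pattern:

```javascript
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'apikey']);
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;

function sanitize(value) {
  if (typeof value === 'string') return value.replace(EMAIL, '[EMAIL]');
  if (Array.isArray(value)) return value.map(sanitize); // new array, input untouched
  if (value !== null && typeof value === 'object') {
    // Build a fresh object so the caller's data is never mutated.
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) =>
        SENSITIVE_KEYS.has(k.toLowerCase()) ? [k, '[REDACTED]'] : [k, sanitize(v)]
      )
    );
  }
  return value; // numbers, booleans, null pass through unchanged
}
```

Every object and array is rebuilt on the way down, which is what guarantees the caller's data stays intact.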

Credit card masking uses a Luhn check, so the sanitizer does not replace every long number it sees. That reduces false positives in boring business data, which is important if people are expected to keep this enabled.
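The Luhn step can be sketched like this. It is the standard checksum algorithm, not necessarily the repository's exact code:

```javascript
// Luhn checksum: double every second digit from the right,
// fold two-digit results, sum everything, check mod 10.
function luhnCheck(candidate) {
  const digits = candidate.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false; // typical card lengths only
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

A random long order number will usually fail the checksum and stay readable, while a real card number passes and gets masked.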

Key matching ignores case, spaces, underscores, and dashes. These should all match the same rule:

accessToken
access_token
Access Token
access-token
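That normalization amounts to lowercasing and stripping separators before comparing. A sketch, with a hypothetical `normalizeKey` helper rather than the library's actual function name:

```javascript
// Normalize a field name so accessToken, access_token,
// "Access Token", and access-token all compare equal.
function normalizeKey(key) {
  return key.toLowerCase().replace(/[\s_-]/g, '');
}

function keyMatches(key, sensitiveKeys) {
  const normalized = normalizeKey(key);
  return sensitiveKeys.some((k) => normalizeKey(k) === normalized);
}
```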

Tests first, because this kind of code needs trust

I built the example test-first with Node's built-in test runner. No Jest, no Vitest, no dependency tree just to prove the point.

Run it with:

npm test

The tests cover:

  • masking common PII in strings
  • recursive object sanitization
  • custom SAP-style identifier rules
  • logger output through a custom sink
  • JSON-line logging for container environments

You can also run the example directly:

npm run example

What this is not

This is not a magic GDPR shield.

It does not decide whether you are allowed to process data. It does not classify every possible personal attribute. It does not replace data minimization, retention policies, access control, or a proper DPIA when one is needed.

It is a practical guardrail for a common failure mode: sensitive values accidentally ending up in logs.

And honestly, that is already useful.

Where I would take it next

The current repository is a small working foundation. The next useful steps would be:

  • package it properly for npm
  • add TypeScript type definitions
  • add adapters for popular loggers like Pino and Winston
  • add rule presets for common enterprise domains
  • add structured audit metadata so teams can see which rules fired
  • add benchmarks before anyone puts it on a hot path

But the core idea will stay the same: log what helps you operate the system, not whatever happened to be nearby in memory.

Source code: github.com/SBajonczak/PiiSanitizer
