Mask PII

In this recipe, we'll show you how to mask personally identifiable information (PII) captured in your application logs.

Guide setup

This guide assumes you've created a Datable.io account.

This recipe works best after you have standardized your log data, and it is a good complement to the standardize logs recipe.

Code Sample

We've noticed that some of our log data includes PII: a payment service is logging credit card information. To stay compliant with regulations, we need to make sure this data never reaches our data infrastructure.

First, we create a new pipeline, or a new transformation step if we're adding to an existing pipeline.

You will see the following pre-populated code in your transform step:

javascript
/***
* You have access to the following inputs:
*  - `metadata`:  { timestamp, datatype }
*    -> datatype is a string, and can be 'logs' or 'traces'
*  - `record`:    { resource, body, ... }
*/

// These are the key attributes of an opentelemetry formatted record
const { attributes, resource, body } = record
const { timestamp, datatype } = metadata

// Here we only allow records tagged as 'logs' to pass through,
if (datatype !== 'log') return null

Pass-through unrelated data

Next, we want to narrow down our data stream to just the service that is emitting sensitive information. All other records are forwarded to the next step.

javascript
if (record.resource['service.name'] !== "paymentService") return record
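
The check above assumes every record carries a `resource` object. As a minimal sketch, a more defensive version of the same comparison could look like the following. The helper name `isPaymentRecord` and the sample records are hypothetical, and the record shape is assumed from the snippet above rather than taken from Datable's documentation.

```javascript
// Sketch only: the record shape below is assumed from the snippet above,
// and `isPaymentRecord` is a hypothetical helper name.
function isPaymentRecord(record) {
  // Optional chaining yields undefined instead of throwing when
  // `record` or `record.resource` is missing.
  return record?.resource?.['service.name'] === 'paymentService';
}

// Hypothetical sample records for illustration.
const samples = [
  { resource: { 'service.name': 'paymentService' }, body: 'charge accepted' },
  { resource: { 'service.name': 'cartService' }, body: 'item added' },
  { body: 'record without a resource' },
];

// Only the first sample would continue to the masking step.
console.log(samples.map(isPaymentRecord)); // [ true, false, false ]
```

Inside a transform step, the equivalent guard would be `if (!isPaymentRecord(record)) return record`, which forwards everything else untouched.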

Identify and mask credit card information

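Now that unrelated logs have been passed along, only records from the payment service remain. If the body contains no credit card number, we return the record unchanged. Otherwise, we replace every character of each match except the last four with `x`.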

javascript
if (!regex.creditCard.test(record.body)) return record;

record.body = record.body.replace(regex.creditCard, match => {
    return 'x'.repeat(match.length - 4) + match.slice(-4);
});
return record
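
The snippet above relies on `regex.creditCard`, which is not defined in this recipe. To try the masking logic outside of a pipeline, a rough stand-in could look like the sketch below. The pattern is an assumption (13 to 19 digits, optionally separated by single spaces or hyphens) and is not necessarily the pattern Datable uses. The function name `maskCardNumbers` is also hypothetical.

```javascript
// Hypothetical stand-in for `regex.creditCard`. Assumption: a card number is
// 13 to 19 digits, optionally separated by single spaces or hyphens.
const creditCard = /\b\d(?:[ -]?\d){12,18}\b/g;

// Mask everything in each match except its last four characters,
// using the same replacement callback as the recipe.
function maskCardNumbers(text) {
  // Leave non-string bodies (for example structured objects) untouched.
  if (typeof text !== 'string') return text;
  // `replace` returns the input unchanged when nothing matches,
  // so no separate `test` call is needed here.
  return text.replace(creditCard, (match) => {
    return 'x'.repeat(match.length - 4) + match.slice(-4);
  });
}

console.log(maskCardNumbers('charged card 4111111111111111 for $20'));
// prints "charged card xxxxxxxxxxxx1111 for $20"
```

Because the callback masks by character count, separators inside a match are replaced with `x` as well. A pattern this loose can also match other long digit runs, such as order IDs, so it may be worth tightening it for your own log formats.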

Credit card numbers embedded as text in log messages are now masked so that only the last four digits remain visible, in line with PCI DSS guidance on masking card numbers.

Datable makes it easy to perform any transformation on your data with pure JavaScript. Check out our recipes on tail-based sampling for log data to start reducing your cloud costs.