Sampling Logs

In this recipe, we'll walk through how to sample log data, which can dramatically reduce the cost of storing and processing your logs downstream.

Guide setup

This guide assumes you have created a Datable.io account, and are already ingesting log data into Datable from your applications.

In this example, we will assume the log data in our current transformation step has gone through a standardization process to align it with the OpenTelemetry specification.

Sample Code

First, we create a new pipeline, or a new transformation step if we're adding to an existing pipeline.

You will see the following pre-populated code in your transform step:

javascript
/***
* You have access to the following inputs:
*  - `metadata`:  { timestamp, datatype }
*    -> datatype is a string, and can be 'logs' or 'traces'
*  - `record`:    { resource, body, ... }
*/

// These are the key attributes of an opentelemetry formatted record
const { attributes, resource, body } = record
const { timestamp, datatype } = metadata

// Here we only allow records tagged as 'logs' to pass through.
if (datatype !== 'logs') return null
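To make the inputs concrete, here is a hypothetical record and metadata pair shaped like the inputs above (the field values are illustrative, not from a real pipeline):

```javascript
// A hypothetical OpenTelemetry-style log record and its metadata.
const metadata = { timestamp: 1700000000000, datatype: 'logs' };
const record = {
  resource: { 'service.name': 'checkout' },
  attributes: { 'http.status_code': 500 },
  body: 'payment failed',
  severityNumber: 17, // ERROR per the OpenTelemetry severity scale
};

// The guard above would let this record continue through the step.
const passes = metadata.datatype === 'logs';
console.log(passes); // true
```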

Define sampling rates

First, we'll define our sampling rates in a plain JS object.

javascript

const samplingRates = {
  error: 1.0,
  warn: 0.5,
  info: 0.1,
  debug: 0.01,
};

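To get a feel for what these rates mean in practice, here is a rough sketch of how they translate into retained volume, using hypothetical daily counts per severity level:

```javascript
const samplingRates = { error: 1.0, warn: 0.5, info: 0.1, debug: 0.01 };

// Hypothetical daily log counts per severity level.
const dailyCounts = { error: 1_000, warn: 10_000, info: 500_000, debug: 2_000_000 };

// Expected number of records kept per level after sampling.
const retained = Object.fromEntries(
  Object.entries(dailyCounts).map(([level, count]) => [level, count * samplingRates[level]])
);
const total = Object.values(dailyCounts).reduce((a, b) => a + b, 0);
const kept = Object.values(retained).reduce((a, b) => a + b, 0);

console.log(retained); // { error: 1000, warn: 5000, info: 50000, debug: 20000 }
console.log(`${((kept / total) * 100).toFixed(1)}% of logs retained`); // 3.0% of logs retained
```

Every error survives, while the high-volume debug noise is cut by 99% — that asymmetry is the whole point of severity-weighted rates.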

Define shouldSample

Next, we'll define a "shouldSample" function that samples each record at a rate determined by its OpenTelemetry severity number.

javascript
function shouldSample(record) {
    // severityNumber follows the OpenTelemetry log severity scale
    const { severityNumber } = record;

    let samplingRate = 0.001;
    if (severityNumber >= 17) { // ERROR
        samplingRate = samplingRates.error;
    } else if (severityNumber >= 13) { // WARN
        samplingRate = samplingRates.warn;
    } else if (severityNumber >= 9) { // INFO
        samplingRate = samplingRates.info;
    } else if (severityNumber >= 5) { // DEBUG
        samplingRate = samplingRates.debug;
    }

    return Math.random() < samplingRate;
}
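One quick way to sanity-check the severity-to-rate mapping is to stub Math.random so the outcome is deterministic. This sketch redefines samplingRates and shouldSample so it runs on its own:

```javascript
const samplingRates = { error: 1.0, warn: 0.5, info: 0.1, debug: 0.01 };

function shouldSample(record) {
  const { severityNumber } = record;
  let samplingRate = 0.001;
  if (severityNumber >= 17) samplingRate = samplingRates.error;
  else if (severityNumber >= 13) samplingRate = samplingRates.warn;
  else if (severityNumber >= 9) samplingRate = samplingRates.info;
  else if (severityNumber >= 5) samplingRate = samplingRates.debug;
  return Math.random() < samplingRate;
}

// Stub Math.random with a fixed draw for a repeatable check.
const originalRandom = Math.random;
Math.random = () => 0.3;

const errorKept = shouldSample({ severityNumber: 17 }); // 0.3 < 1.0 -> true
const infoKept = shouldSample({ severityNumber: 9 });   // 0.3 < 0.1 -> false

Math.random = originalRandom;
console.log(errorKept, infoKept); // true false
```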

Execute shouldSample

Lastly, we run our record through shouldSample to determine if it will be forwarded downstream.

javascript
return shouldSample(record) ? record : null
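Putting the pieces together, the whole transform step might look like the sketch below. In Datable, record and metadata are provided by the platform; here we wrap the logic in a hypothetical transform function and simulate the inputs so the snippet runs standalone:

```javascript
const samplingRates = { error: 1.0, warn: 0.5, info: 0.1, debug: 0.01 };

function shouldSample(record) {
  const { severityNumber } = record;
  let samplingRate = 0.001;
  if (severityNumber >= 17) samplingRate = samplingRates.error;
  else if (severityNumber >= 13) samplingRate = samplingRates.warn;
  else if (severityNumber >= 9) samplingRate = samplingRates.info;
  else if (severityNumber >= 5) samplingRate = samplingRates.debug;
  return Math.random() < samplingRate;
}

// Hypothetical wrapper standing in for the transform step's body.
function transform(record, metadata) {
  if (metadata.datatype !== 'logs') return null;
  return shouldSample(record) ? record : null;
}

// An ERROR record is always kept, since the error rate is 1.0.
const out = transform(
  { body: 'boom', severityNumber: 17 },
  { timestamp: Date.now(), datatype: 'logs' }
);
console.log(out !== null); // true
```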

And that's all it takes to implement severity-based sampling of log data in Datable.

Check out our recipe on masking PII for an example of how Datable can not only filter your data, but transform it to ensure regulatory compliance.