Sampling Logs
In this recipe, we'll walk through how to implement log sampling, which can dramatically reduce the cost of storing and processing your logs downstream.
Guide setup
This guide assumes you have created a Datable.io account, and are already ingesting log data into Datable from your applications.
In this example, we will assume the log data in our current transformation step has gone through a standardization process to align it with the OpenTelemetry specification.
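For reference, a standardized record might look something like the following sketch (field names follow the OpenTelemetry log data model; the values are purely illustrative):
// An illustrative OpenTelemetry-formatted log record (values are hypothetical)
const exampleRecord = {
  body: 'User login failed',
  severityText: 'ERROR',
  severityNumber: 17, // OTel severity scale is 1-24; 17-20 maps to ERROR
  attributes: { 'user.id': '12345', 'http.status_code': 401 },
  resource: { 'service.name': 'auth-service' },
};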
Sample Code
First, we create a new pipeline, or a new transformation step if we're adding to an existing pipeline.
You will see the following pre-populated code in your transform step:
/***
* You have access to the following inputs:
* - `metadata`: { timestamp, datatype }
* -> datatype is a string, and can be 'logs' or 'traces'
* - `record`: { resource, body, ... }
*/
// These are the key attributes of an OpenTelemetry-formatted record
const { attributes, resource, body } = record
const { timestamp, datatype } = metadata
// Here we only allow records tagged as 'logs' to pass through
if (datatype !== 'logs') return null
Define sampling rates
We'll start by defining our sampling rates in a plain JS object.
const samplingRates = {
error: 1.0,
warn: 0.5,
info: 0.1,
debug: 0.01,
};
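With these rates, every error is kept, half of all warnings, one in ten info logs, and one in a hundred debug logs. Adjust the values to balance visibility against storage cost.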
Define shouldSample
Next, we'll define a "shouldSample" function that will dynamically sample the log data based on its severity number.
function shouldSample(record) {
  // Pull the severity number from the OpenTelemetry-formatted record
  const { severityNumber } = record;
  // Default rate for records with a missing or very low severity
  let samplingRate = 0.001;
  if (severityNumber >= 17) { // ERROR
    samplingRate = samplingRates.error;
  } else if (severityNumber >= 13) { // WARN
    samplingRate = samplingRates.warn;
  } else if (severityNumber >= 9) { // INFO
    samplingRate = samplingRates.info;
  } else if (severityNumber >= 5) { // DEBUG
    samplingRate = samplingRates.debug;
  }
  return Math.random() < samplingRate;
}
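These thresholds follow the OpenTelemetry log data model, where severity numbers 1-4 map to TRACE, 5-8 to DEBUG, 9-12 to INFO, 13-16 to WARN, 17-20 to ERROR, and 21-24 to FATAL. Any record with no severityNumber, or one below 5, falls through to the conservative 0.001 default rate.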
Execute shouldSample
Lastly, we run our record through shouldSample to determine whether it will be forwarded downstream.
return shouldSample(record) ? record : null
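Putting the pieces together, the complete transform step might look like this:
/***
 * You have access to the following inputs:
 * - `metadata`: { timestamp, datatype }
 * - `record`: { resource, body, ... }
 */
const { datatype } = metadata;

// Only records tagged as 'logs' pass through this step
if (datatype !== 'logs') return null;

// Sampling rates by severity level
const samplingRates = {
  error: 1.0,
  warn: 0.5,
  info: 0.1,
  debug: 0.01,
};

// Decide whether to keep a record based on its OTel severity number
function shouldSample(record) {
  const { severityNumber } = record;
  let samplingRate = 0.001; // default for missing or very low severities
  if (severityNumber >= 17) { // ERROR
    samplingRate = samplingRates.error;
  } else if (severityNumber >= 13) { // WARN
    samplingRate = samplingRates.warn;
  } else if (severityNumber >= 9) { // INFO
    samplingRate = samplingRates.info;
  } else if (severityNumber >= 5) { // DEBUG
    samplingRate = samplingRates.debug;
  }
  return Math.random() < samplingRate;
}

// Forward the sampled record downstream, or drop it
return shouldSample(record) ? record : null;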
And that's all it takes to implement severity-based sampling of log data in Datable.
Check out our recipe on masking PII for an example of how Datable can not only filter your data, but transform it to ensure regulatory compliance.