Standardize Logs

In this recipe, we'll walk through a foundational step: structuring application logs and standardizing them to the OpenTelemetry (OTel) log data model.

Guide setup

This guide assumes you have created a Datable.io account.

In this example, Datable ingests logs from three sources, each forwarding data in a different format: plain-text stdout logs, semi-structured Nginx server logs, and structured JSON logs from backend services.

Sample Log formats

Standard out

bash
2024-07-27T15:23:47.123Z [INFO] Server started on port 3000
2024-07-27T15:23:48.456Z [DEBUG] Connected to database 'users_db'
2024-07-27T15:23:50.789Z [INFO] Received request: GET /api/users
2024-07-27T15:23:50.901Z [DEBUG] Query executed: SELECT * FROM users LIMIT 100
2024-07-27T15:23:51.012Z [INFO] Request completed: GET /api/users (200 OK) - 223ms
2024-07-27T15:23:52.345Z [WARN] High CPU usage detected: 85%

Semi-structured Nginx server logs

text
192.168.1.100 - john [10/Jul/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/91.0.4472.124 Safari/537.36"
203.0.113.195 - - [10/Jul/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/91.0.4472.124 Safari/537.36"
192.168.1.100 - john [10/Jul/2023:13:55:37 +0000] "POST /api/login HTTP/1.1" 302 0 "http://www.example.com/login" "Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Safari/604.1"
10.0.0.1 - - [10/Jul/2023:13:55:38 +0000] "GET /images/logo.png HTTP/1.1" 304 0 "http://www.example.com/about" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"
203.0.113.195 - - [10/Jul/2023:13:55:39 +0000] "GET /css/main.css HTTP/1.1" 200 4891 "http://www.example.com/index.html" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/91.0.4472.124 Safari/537.36"
192.168.1.100 - john [10/Jul/2023:13:55:40 +0000] "GET /dashboard HTTP/1.1" 200 5632 "http://www.example.com/login" "Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Mobile/15E148 Safari/604.1"

Structured logs (Pino/Node.js)

json
{"level":30,"time":1722187223456,"pid":12345,"hostname":"server-1","msg":"Application started successfully"}
{"level":20,"time":1722187224789,"pid":12345,"hostname":"server-1","msg":"User authentication attempt","username":"john_doe"}
{"level":40,"time":1722187225123,"pid":12345,"hostname":"server-1","msg":"Slow query detected","executionTime":3245}
{"level":50,"time":1722187226234,"pid":12345,"hostname":"server-1","msg":"Failed to process payment","orderId":"ORD-12345", "err":{"type":"PaymentError","message":"Insufficient funds"}}

Each log source is collected and forwarded by Fluent Bit, which itself stores the original log in a "log" field.

With such disparate formats, querying for specific information and correlating events across data sources is difficult. Datable lets you standardize all of these sources before they reach downstream consumers for analytics, monitoring, and BI.

Sample Code

Given the above, we are ready to create a new pipeline to pre-process our data. Navigate to the pipelines page and click "Create Pipeline" to build a new DAG for your log data.

From here, click on the Code block for logs to enter the code editing environment.

You will see the following pre-populated code:

javascript

/***
* You have access to the following inputs:
*  - `metadata`:  { timestamp, datatype }
*    -> datatype is a string, and can be 'logs' or 'traces'
*  - `record`:    { resource, body, ... }
*/

// These are the key attributes of an opentelemetry formatted record
const { attributes, resource, body } = record
const { timestamp, datatype } = metadata

// Here we only allow records tagged as 'logs' to pass through,
if (datatype !== 'log') return null

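
The template runs with `record` and `metadata` already in scope, and returning `null` drops the record. To try the same filter outside the editor, one option is to wrap the body in a function. This is only a sketch: the `transform` name and the sample inputs are hypothetical, and the `'log'` tag simply mirrors the check in the template.

```javascript
// Hypothetical local wrapper around the code block body (not a Datable API).
function transform(record, metadata) {
  const { datatype } = metadata;

  // Same filter as the template: drop anything that is not a log record.
  if (datatype !== 'log') return null;

  return record;
}

// Made-up sample input, for illustration only.
const sample = { attributes: {}, resource: {}, body: "Server started on port 3000" };

console.log(transform(sample, { timestamp: Date.now(), datatype: 'log' }));    // the record passes through
console.log(transform(sample, { timestamp: Date.now(), datatype: 'traces' })); // null
```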

Normalize record

Next, we'll write one parsing function per source. Each function maps the source's own notion of level (a string such as INFO, an HTTP status code, or a numeric Pino level) to the corresponding OTel SeverityNumber.

javascript
function normalizeStdOut(record) {
  // Map the bracketed level name to an OTel SeverityNumber (0 = unspecified)
  function getSeverityNumber(level) {
    const severityMap = {
      TRACE: 1,
      DEBUG: 5,
      INFO: 9,
      WARN: 13,
      ERROR: 17,
      FATAL: 21
    };
    return severityMap[level] || 0;
  }

  // Captures: ISO-8601 timestamp, level name, message
  const regex = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) \[(\w+)\] (.+)$/;
  const match = record.attributes.log.match(regex);

  // Pass lines that don't fit the expected format through unchanged
  if (!match) return record;

  const [, , level, message] = match;

  return {
    ...record,
    severityText: level,
    severityNumber: getSeverityNumber(level),
    body: message,
  };
}


function normalizeWebServer(record) {
  function getSeverityText(statusCode) {
    if (statusCode >= 500) return "ERROR";
    if (statusCode >= 400) return "WARN";
    return "INFO";
  }

  function getSeverityNumber(statusCode) {
    if (statusCode >= 500) return 17;
    if (statusCode >= 400) return 13;
    return 9;
  }

  const regex = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\d+) "([^"]*)" "([^"]*)"$/;
  const match = record.attributes.log.match(regex);

  // Pass lines that don't match the access-log format through unchanged
  if (!match) return record;

  const [
    ,
    ipAddress,
    clientId,
    userId,
    dateTime,
    method,
    path,
    httpVersion,
    statusCode,
    bytesSent,
    referer,
    userAgent
  ] = match;

  return {
    ...record,
    severityText: getSeverityText(parseInt(statusCode)),
    severityNumber: getSeverityNumber(parseInt(statusCode)),
    body: `${method} ${path} ${httpVersion} ${statusCode}`,
    attributes: {
      ...record.attributes,
      "http.client_ip": ipAddress,
      "http.user_id": userId,
      "http.method": method,
      "http.path": path,
      "http.version": httpVersion,
      "http.status_code": parseInt(statusCode),
      "http.response_content_length": parseInt(bytesSent),
      "http.referer": referer,
      "http.user_agent": userAgent

    },
    resource: {
      ...record.resource,
      "service.name": "web-server",
      "service.instance.id": "instance-001" // Example value
    }
  };
}


function normalizeJson(record) {
  // Pino levels: 10 trace, 20 debug, 30 info, 40 warn, 50 error, 60 fatal
  function getSeverityText(level) {
    if (level >= 60) return "FATAL";
    if (level >= 50) return "ERROR";
    if (level >= 40) return "WARN";
    if (level >= 30) return "INFO";
    if (level >= 20) return "DEBUG";
    return "TRACE";
  }

  // Map the Pino level onto the same OTel SeverityNumber scale used above
  function getSeverityNumber(level) {
    if (level >= 60) return 21;
    if (level >= 50) return 17;
    if (level >= 40) return 13;
    if (level >= 30) return 9;
    if (level >= 20) return 5;
    return 1;
  }

  // Assumes the "log" field has already been parsed from JSON into an object
  const log = record.attributes.log;

  const attributes = {
    ...record.attributes,
    "process.pid": log.pid,
    "order.id": log.orderId,
    "error.type": log.err?.type,
    "error.message": log.err?.message
  };

  const resource = {
    ...record.resource,
    "service.name": "payment-service", // Example value
    "host.name": log.hostname
  };

  return {
    ...record,
    severityText: getSeverityText(log.level),
    severityNumber: getSeverityNumber(log.level),
    body: log.msg,
    attributes,
    resource
  };
}
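
Before wiring these functions into a pipeline, it can help to sanity-check the two regular expressions against the sample lines shown earlier. The sketch below copies the patterns from `normalizeStdOut` and `normalizeWebServer` unchanged; the `parseStdOut` and `parseAccessLog` helper names are introduced here purely for illustration and are not part of Datable.

```javascript
// Patterns copied from normalizeStdOut and normalizeWebServer.
const stdOutPattern = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z) \[(\w+)\] (.+)$/;
const accessLogPattern = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d+) (\d+) "([^"]*)" "([^"]*)"$/;

// Hypothetical helper: returns the captured fields, or null when the line does not match.
function parseStdOut(line) {
  const match = line.match(stdOutPattern);
  if (!match) return null;
  const [, timestamp, level, message] = match;
  return { timestamp, level, message };
}

// Hypothetical helper: same idea for an access-log line.
function parseAccessLog(line) {
  const match = line.match(accessLogPattern);
  if (!match) return null;
  const [, ipAddress, , userId, dateTime, method, path, httpVersion, statusCode, bytesSent] = match;
  return {
    ipAddress,
    userId,
    dateTime,
    method,
    path,
    httpVersion,
    statusCode: parseInt(statusCode, 10),
    bytesSent: parseInt(bytesSent, 10),
  };
}

// Sample lines taken from the formats shown earlier in this recipe.
const stdOutLine = "2024-07-27T15:23:52.345Z [WARN] High CPU usage detected: 85%";
const accessLine = '10.0.0.1 - - [10/Jul/2023:13:55:38 +0000] "GET /images/logo.png HTTP/1.1" 304 0 "http://www.example.com/about" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36"';

console.log(parseStdOut(stdOutLine));
console.log(parseAccessLog(accessLine));
console.log(parseStdOut("not a log line")); // null
```

Returning `null` for a non-matching line makes the failure case explicit, which is the situation the normalizers need to handle before destructuring the match.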

With our helper functions in place, we are ready to put everything together.

javascript
const { attributes, resource, body } = record
const { timestamp, datatype } = metadata

if (datatype !== 'log') return null

/** 
 * Function definitions...
 */

const formattedRecord = formatFluentBitToOpenTelmetry(record)
if (formattedRecord.resource["service.name"] === "stdOutServiceName") return normalizeStdOut(formattedRecord)
if (formattedRecord.resource["service.name"] === "webServerServiceName") return normalizeWebServer(formattedRecord)
if (formattedRecord.resource["service.name"] === "jsonServiceName") return normalizeJson(formattedRecord)

return record
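
The dispatch code calls `formatFluentBitToOpenTelmetry`, which this recipe does not define. A minimal sketch of what such a helper might do follows. It assumes the Fluent Bit payload arrives in `record.body` with the raw line in a `log` field and a source `tag` next to it; those field locations, the tag names, and the tag-to-service mapping are all assumptions made for illustration, not documented Datable or Fluent Bit behavior.

```javascript
// Hypothetical mapping from a Fluent Bit tag to the service names used in the dispatch code.
const SERVICE_BY_TAG = {
  "app.stdout": "stdOutServiceName",
  "nginx.access": "webServerServiceName",
  "app.json": "jsonServiceName",
};

// Hypothetical sketch: the input shape (body.log, body.tag) is an assumption.
function formatFluentBitToOpenTelmetry(record) {
  const body = record.body ?? {};
  const rawLog = typeof body === "string" ? body : body.log;

  // Structured sources emit one JSON object per line; parse it so fields can be read directly.
  let log = rawLog;
  if (typeof rawLog === "string" && rawLog.trim().startsWith("{")) {
    try {
      log = JSON.parse(rawLog);
    } catch {
      log = rawLog; // keep the raw string when the line is not valid JSON
    }
  }

  return {
    ...record,
    // The normalizers read the original line (or parsed object) from attributes.log.
    attributes: { ...record.attributes, log },
    resource: {
      ...record.resource,
      "service.name":
        record.resource?.["service.name"] ?? SERVICE_BY_TAG[body.tag] ?? "unknown",
    },
  };
}

// Made-up input, for illustration only.
const formatted = formatFluentBitToOpenTelmetry({
  body: { log: '{"level":30,"msg":"Application started successfully"}', tag: "app.json" },
  attributes: {},
  resource: {},
});
console.log(formatted.resource["service.name"], formatted.attributes.log.msg);
```

Parsing JSON lines at this stage means the JSON normalizer can read fields such as `level` and `pid` directly, while plain-text sources keep the raw string for regex matching.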

Now, our log data has been normalized, sanitized and structured for further downstream transformations and consumption.
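
One way to spot-check the result is to assert a few properties of each normalized record. The sketch below is an illustration rather than part of Datable: the `checkNormalized` name is hypothetical, and the 1 to 24 range is the SeverityNumber range defined by the OpenTelemetry log data model.

```javascript
// Hypothetical validator: returns a list of problems, empty when the record looks normalized.
function checkNormalized(record) {
  const problems = [];

  if (typeof record.severityText !== "string" || record.severityText === "") {
    problems.push("missing severityText");
  }

  // OTel SeverityNumber values run from 1 (TRACE) to 24 (FATAL4); 0 means unspecified.
  const n = record.severityNumber;
  if (!Number.isInteger(n) || n < 1 || n > 24) {
    problems.push("severityNumber outside 1-24");
  }

  if (record.body === undefined || record.body === null) {
    problems.push("missing body");
  }

  return problems;
}

// Made-up records, for illustration only.
console.log(checkNormalized({ severityText: "WARN", severityNumber: 13, body: "High CPU usage detected: 85%" }));
console.log(checkNormalized({ severityNumber: 30, body: "Application started successfully" }));
```

The second record fails two checks because a raw Pino level of 30 is not a valid SeverityNumber and no severity text was set, which is the kind of slip this check is meant to surface.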

After you standardize your log data, check out our recipes for filtering and sampling to start reducing the amount of log data you're sending to vendors.