Custom Log Formats

Not every log is the same...

Gonzo supports custom log formats through YAML configuration files, allowing you to parse logs from any application and convert them to OpenTelemetry (OTLP) attributes for analysis.

Using Built-in Formats

Gonzo includes pre-built formats in the formats directory:

Available formats:

  • loki-stream.yaml - Grafana Loki streaming (individual entries)

  • loki-batch.yaml - Loki batch format with multi-entry expansion

  • vercel-stream.yaml - Vercel logs

  • nodejs.yaml - Node.js application logs

  • apache-combined.yaml - Apache/Nginx access logs

Setup:

# Install a format file (copy it from the formats directory)
mkdir -p ~/.config/gonzo/formats
cp <format-file>.yaml ~/.config/gonzo/formats/

# Use the format
gonzo --format=loki-stream -f logs.json

# List available formats
ls ~/.config/gonzo/formats/

Examples:

# Loki with logcli
logcli query --addr=http://localhost:3100 --follow '{service=~".+"}' -o jsonl 2>/dev/null | gonzo --format=loki-stream

# Loki Live Tail API using "wscat" (batch format)
wscat -c 'ws://localhost:3100/loki/api/v1/tail?query={service_name=~".%2B"}&limit=50' | gonzo --format=loki-batch

# Vercel logs
vercel logs <deployment_id> -j | gonzo --format=vercel-stream

# File with custom format
gonzo --format=nodejs -f application.log

Creating Your Own Custom Formats

Quick Start

1. Create a Format File

Create a YAML file in the ~/.config/gonzo/formats/ directory:

mkdir -p ~/.config/gonzo/formats
vim ~/.config/gonzo/formats/myapp.yaml

2. Define Your Format

name: myapp
description: My Application Log Format
type: text

pattern:
  use_regex: true
  main: '^(?P<timestamp>[\d\-T:\.]+)\s+\[(?P<level>\w+)\]\s+(?P<message>.*)$'

mapping:
  timestamp:
    field: timestamp
    time_format: rfc3339
  severity:
    field: level
  body:
    field: message

3. Use the Format

gonzo --format=myapp -f application.log
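
To check what the main pattern from step 2 captures before running Gonzo, the regex can be tried against a sample line. The following is a minimal sketch in Python, used here only for illustration; it is not how Gonzo itself evaluates the pattern, and the helper name parse_line and both sample lines are hypothetical. Python's re module accepts the same (?P<name>...) named-group syntax used above.

```python
import re
from typing import Optional

# Main pattern copied verbatim from the Quick Start format file.
MAIN = re.compile(
    r'^(?P<timestamp>[\d\-T:\.]+)\s+\[(?P<level>\w+)\]\s+(?P<message>.*)$'
)

def parse_line(line: str) -> Optional[dict]:
    """Return the named groups for a matching line, or None if it does not match."""
    match = MAIN.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line in the shape the pattern expects.
print(parse_line("2024-01-15T10:30:45.123 [ERROR] Database connection failed"))

# The timestamp class has no "Z" or "+", so a zone suffix prevents a match.
print(parse_line("2024-01-15T10:30:45Z [INFO] Service started"))
```

The second call returns None. If your timestamps carry a trailing Z or a numeric offset, the character class in the timestamp group needs to allow those characters.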

Basic Structure

# Metadata
name: format-name           # Required: Unique identifier
description: Description    # Optional: Human-readable description
author: Your Name           # Optional: Format author
type: text|json|structured  # Required: Format type

# Pattern Configuration (for text/structured types)
pattern:
  use_regex: true|false     # Use regex or template matching
  main: "pattern"           # Main pattern for parsing
  fields:                   # Additional field patterns
    field_name: "pattern"

# JSON Configuration (for json type)
json:
  fields:                   # Field mappings
    internal_name: json_path
  array_path: "path"        # For nested arrays
  root_is_array: true|false # If root is an array

# Field Mapping
mapping:
  timestamp:                # Timestamp extraction
    field: field_name
    time_format: format
    default: value

  severity:                 # Log level/severity
    field: field_name
    transform: operation
    default: value

  body:                     # Main log message
    field: field_name
    template: "{{.field}}"

  attributes:               # Additional attributes
    attr_name:
      field: source_field
      pattern: "regex"
      transform: operation
      default: value

Format Types

text - Plain text logs with regex patterns:

type: text
pattern:
  use_regex: true
  main: 'your-regex-pattern-here'

json - JSON structured logs:

type: json
json:
  fields:
    timestamp: $.timestamp
    message: $.msg

structured - Fixed position logs (Apache-style):

type: structured
pattern:
  use_regex: true
  main: 'pattern-with-named-groups'

Common Regex Patterns

Pattern        Description           Example
[\d\-T:\.]+    ISO timestamp         2024-01-15T10:30:45.123
\w+            Word characters       ERROR, INFO
\d+            Digits                12345
[^\]]+         Everything except ]   Content inside brackets
.*             Any characters        Rest of line
\S+            Non-whitespace        Token or word

Time Formats

Format                   Example                Description
rfc3339                  2024-01-15T10:30:45Z   ISO 8601
unix                     1705316445             Unix seconds
unix_ms                  1705316445123          Unix milliseconds
unix_ns                  1705316445123456789    Unix nanoseconds
auto                     Various                Auto-detect format
"2006-01-02 15:04:05"    2024-01-15 10:30:45    Custom Go format
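
The three epoch-based formats differ only in their unit. The sketch below, written in Python purely for illustration, converts the three example values from the table to UTC and shows a strptime counterpart of the custom Go layout. The function name epoch_to_utc and the strptime format string are assumptions made for this example; Gonzo itself takes Go layout strings.

```python
from datetime import datetime, timezone

# Units per second for each epoch-based time_format value.
UNITS_PER_SECOND = {"unix": 1, "unix_ms": 1_000, "unix_ns": 1_000_000_000}

def epoch_to_utc(value: int, time_format: str) -> datetime:
    """Convert an epoch value to a UTC datetime, truncated to whole seconds."""
    seconds = value // UNITS_PER_SECOND[time_format]
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# The three example values from the table fall within the same second.
for fmt, value in [("unix", 1705316445),
                   ("unix_ms", 1705316445123),
                   ("unix_ns", 1705316445123456789)]:
    print(fmt, epoch_to_utc(value, fmt).isoformat())

# The Go layout "2006-01-02 15:04:05" corresponds to this strptime format.
parsed = datetime.strptime("2024-01-15 10:30:45", "%Y-%m-%d %H:%M:%S")
print(parsed.isoformat())
```

A value parsed with the wrong unit lands decades away from the real time, which is usually the first sign that unix, unix_ms, and unix_ns have been mixed up.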

Field Transforms

  • uppercase: Convert to uppercase (info → INFO)

  • lowercase: Convert to lowercase (ERROR → error)

  • trim: Remove whitespace (" text " → "text")

  • status_to_severity: HTTP status to severity (200→INFO, 404→WARN, 500→ERROR)

Complete Examples

Example 1: Node.js Application Logs

Log format: [Backend] 5300 LOG [Module] Message +6ms

# Format for: [Backend] 5300 LOG [Module] Message +6ms
name: nodejs
type: text

pattern:
  use_regex: true
  main: '^\[(?P<project>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<level>\w+)\s+\[(?P<module>[^\]]+)\]\s+(?P<message>[^+]+?)(?:\s+\+(?P<duration>\d+)ms)?$'

mapping:
  severity:
    field: level
    transform: uppercase
  body:
    field: message
  attributes:
    project:
      field: project
    pid:
      field: pid
    module:
      field: module
    duration_ms:
      field: duration
      default: "0"
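
As a rough check of this format, the sketch below applies the same main pattern in Python and then imitates the mapping (uppercase transform on level, default of "0" for a missing duration). It is an illustration under assumptions, not Gonzo's implementation: the function name parse_nodejs is hypothetical, and the second and third sample lines are invented variations of the documented one.

```python
import re

# Main pattern copied verbatim from the nodejs format above (split across two lines).
MAIN = re.compile(
    r'^\[(?P<project>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<level>\w+)\s+'
    r'\[(?P<module>[^\]]+)\]\s+(?P<message>[^+]+?)(?:\s+\+(?P<duration>\d+)ms)?$'
)

def parse_nodejs(line: str):
    """Apply the pattern, then imitate the mapping section of the format."""
    match = MAIN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "severity": fields["level"].upper(),            # transform: uppercase
        "body": fields["message"],
        "attributes": {
            "project": fields["project"],
            "pid": fields["pid"],
            "module": fields["module"],
            "duration_ms": fields["duration"] or "0",   # default: "0"
        },
    }

print(parse_nodejs("[Backend] 5300 LOG [Module] Message +6ms"))
print(parse_nodejs("[Backend] 5300 LOG [Module] Message"))     # no duration suffix
print(parse_nodejs("[Backend] 5300 LOG [Module] retry 1+1"))   # "+" inside the message
```

The last call returns None: because the message group is written as [^+]+?, any message that itself contains a plus sign will not match this pattern.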

Example 2: Kubernetes/Docker JSON Logs

Format configuration:

name: k8s-json
type: json

json:
  fields:
    timestamp: time
    message: log
    stream: stream

mapping:
  timestamp:
    field: timestamp
    time_format: rfc3339
  body:
    field: message
  attributes:
    stream:
      field: stream
    container_name:
      field: kubernetes.container_name
    pod_name:
      field: kubernetes.pod_name
    namespace:
      field: kubernetes.namespace_name

Example 3: Apache Access Logs

Log format: 192.168.1.1 - - [14/Oct/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234

name: apache-access
type: structured

pattern:
  use_regex: true
  main: '^(?P<ip>[\d\.]+).*?\[(?P<timestamp>[^\]]+)\]\s+"(?P<method>\w+)\s+(?P<path>[^\s]+).*?"\s+(?P<status>\d+)\s+(?P<bytes>\d+)'

mapping:
  timestamp:
    field: timestamp
    time_format: "02/Jan/2006:15:04:05 -0700"
  body:
    template: "{{.method}} {{.path}} - {{.status}}"
  attributes:
    client_ip:
      field: ip
    http_method:
      field: method
    http_path:
      field: path
    http_status:
      field: status
    response_bytes:
      field: bytes
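
The sketch below runs the same pattern against the sample line in Python, parses the timestamp, and builds the body in the shape of the template. It is illustrative only: parse_access is a hypothetical name, the strptime format is an assumed counterpart of the Go layout "02/Jan/2006:15:04:05 -0700", and the second sample line is invented.

```python
import re
from datetime import datetime

# Main pattern copied verbatim from the apache-access format above.
MAIN = re.compile(
    r'^(?P<ip>[\d\.]+).*?\[(?P<timestamp>[^\]]+)\]\s+"(?P<method>\w+)\s+'
    r'(?P<path>[^\s]+).*?"\s+(?P<status>\d+)\s+(?P<bytes>\d+)'
)

def parse_access(line: str):
    """Apply the pattern and imitate the timestamp, body, and attribute mapping."""
    match = MAIN.match(line)
    if match is None:
        return None
    f = match.groupdict()
    # strptime counterpart of the Go layout used in time_format.
    timestamp = datetime.strptime(f["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "timestamp": timestamp.isoformat(),
        # Same shape as the template "{{.method}} {{.path}} - {{.status}}".
        "body": f"{f['method']} {f['path']} - {f['status']}",
        "attributes": {
            "client_ip": f["ip"],
            "http_method": f["method"],
            "http_path": f["path"],
            "http_status": f["status"],
            "response_bytes": f["bytes"],
        },
    }

sample = '192.168.1.1 - - [14/Oct/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234'
print(parse_access(sample))

# Apache writes "-" instead of a byte count when no body is sent.
print(parse_access('192.168.1.1 - - [14/Oct/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 304 -'))
```

The second call returns None because the bytes group only accepts digits. If your logs contain such lines, the group would need to allow "-" as well.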

Advanced Features

Batch Processing

For logs where a single line contains multiple entries (like Loki batch format):

batch:
  enabled: true
  expand_path: "streams[].values[]"    # Arrays to expand
  context_paths: ["streams[].stream"]  # Metadata to preserve

How it works:

  1. Original line: {"streams":[{"stream":{"service":"app"},"values":[["1234","msg1"],["5678","msg2"]]}]}

  2. Gets expanded to: 2 separate log entries

  3. Each entry retains the stream metadata

Common patterns:

  • logs[] - Expand top-level array

  • streams[].values[] - Expand nested arrays (Loki)

  • events[].entries[] - Multi-level expansion
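
The expansion described under "How it works" can be sketched as follows. This is a Python illustration of the idea for the streams[].values[] case only, not Gonzo's implementation; the function name expand_batch and the output keys (timestamp, body, attributes) are assumptions chosen for readability.

```python
import json

def expand_batch(line: str) -> list:
    """Expand streams[].values[] into one entry per value, keeping streams[].stream."""
    document = json.loads(line)
    entries = []
    for stream in document.get("streams", []):
        context = stream.get("stream", {})        # metadata to preserve
        for value in stream.get("values", []):    # array to expand
            entries.append({
                "timestamp": value[0],
                "body": value[1],
                "attributes": dict(context),      # copy so entries stay independent
            })
    return entries

# The original line from step 1 above.
line = '{"streams":[{"stream":{"service":"app"},"values":[["1234","msg1"],["5678","msg2"]]}]}'
for entry in expand_batch(line):
    print(entry)
```

Each of the two resulting entries carries its own copy of the stream metadata, which is what context_paths is meant to preserve.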

Nested JSON Fields

Access nested fields using dot notation:

attributes:
  user_id:
    field: user.id
  user_name:
    field: user.profile.name
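
Dot notation amounts to walking one key at a time through nested objects. A minimal Python sketch of that lookup, assuming plain nested dictionaries (the helper get_path and the sample record are hypothetical, and Gonzo's own resolution rules may differ):

```python
def get_path(record: dict, path: str, default=None):
    """Follow a dot-separated path through nested dicts; return default if a key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical parsed JSON log entry.
record = {"user": {"id": 42, "profile": {"name": "Ada"}}}

print(get_path(record, "user.id"))
print(get_path(record, "user.profile.name"))
print(get_path(record, "user.profile.email"))   # missing key falls back to the default
```

A path only resolves when every intermediate key exists, so a typo anywhere in the path shows up as a missing attribute rather than an error.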

Pattern Extraction

Extract values from within a field:

attributes:
  error_code:
    field: message
    pattern: 'ERROR\[(\d+)\]'  # Extracts code from "ERROR[404]: Not found"

Conditional Defaults

Use defaults when fields are missing:

attributes:
  environment:
    field: env
    default: "production"
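
The pattern and default options can be pictured together as "take the field, optionally narrow it with a regex, otherwise fall back". The Python sketch below illustrates that reading. The helper extract_attribute is hypothetical, and applying the default when the pattern fails to match is an assumption of this sketch, not documented Gonzo behavior.

```python
import re
from typing import Optional

def extract_attribute(record: dict, field: str,
                      pattern: Optional[str] = None,
                      default: Optional[str] = None) -> Optional[str]:
    """Read a field, optionally keep only the first capture group, else use the default."""
    value = record.get(field)
    if value is not None and pattern is not None:
        match = re.search(pattern, value)
        value = match.group(1) if match else None   # assumption: no match means no value
    return value if value is not None else default

# Pattern extraction: pulls the code out of the message text.
print(extract_attribute({"message": "ERROR[404]: Not found"}, "message",
                        pattern=r'ERROR\[(\d+)\]'))

# Default: the env field is absent, so the default is used.
print(extract_attribute({"message": "started"}, "env", default="production"))
```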

HTTP Status Code to Severity Mapping

For web server logs, use the status_to_severity transform:

severity:
  field: http_status
  transform: status_to_severity

Status code mapping:

  • 1xx (100-199): DEBUG (Informational)

  • 2xx (200-299): INFO (Success)

  • 3xx (300-399): INFO (Redirection)

  • 4xx (400-499): WARN (Client Error)

  • 5xx (500-599): ERROR (Server Error)
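
The mapping above is a simple range lookup. A Python sketch of the same table, for illustration only (returning None for codes outside 100-599 is an assumption; the source does not say how such values are handled):

```python
from typing import Optional

# (low, high, severity) ranges taken from the status code mapping list above.
SEVERITY_RANGES = [
    (100, 199, "DEBUG"),
    (200, 299, "INFO"),
    (300, 399, "INFO"),
    (400, 499, "WARN"),
    (500, 599, "ERROR"),
]

def status_to_severity(status: str) -> Optional[str]:
    """Map an HTTP status code (captured as text by the regex) to a severity name."""
    code = int(status)
    for low, high, severity in SEVERITY_RANGES:
        if low <= code <= high:
            return severity
    return None  # assumption: codes outside 100-599 are left unmapped

for status in ("200", "404", "500"):
    print(status, status_to_severity(status))
```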

Multiple Pattern Matching

Define additional patterns for specific fields:

pattern:
  use_regex: true
  main: '^(?P<base>.*)'
  fields:
    request_id: 'RequestID:\s*(\w+)'
    user_id: 'UserID:\s*(\d+)'
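
One way to read this configuration is that the main pattern captures the whole line and each entry under fields is then searched for independently. The Python sketch below illustrates that reading; it assumes the field patterns are searched against the full line and that the first capture group becomes the value, neither of which is confirmed by the text above. The helper extract_fields and the sample line are hypothetical.

```python
import re

# Field patterns copied verbatim from the configuration above.
FIELD_PATTERNS = {
    "request_id": re.compile(r'RequestID:\s*(\w+)'),
    "user_id": re.compile(r'UserID:\s*(\d+)'),
}

def extract_fields(line: str) -> dict:
    """Capture the whole line as base, then add any field pattern that matches."""
    fields = {"base": line}   # main: '^(?P<base>.*)' captures the entire line
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(line)
        if match:
            fields[name] = match.group(1)
    return fields

print(extract_fields("Processing order RequestID: abc123 UserID: 42"))
print(extract_fields("Cache warmed RequestID: zz9"))   # user_id is simply absent
```

Because each field pattern is optional, lines that carry only some of the markers still parse; the missing fields are left out.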

Testing & Troubleshooting

Test your format:

# Test with small sample
head -n 10 app.log | gonzo --format=myformat

# Test without TUI
gonzo --format=myformat -f app.log --test-mode

Common issues:

  1. Pattern not matching: Test regex at regex101.com, verify named groups (?P<name>...)

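  2. Wrong timestamps: Check that time_format matches the log's timestamps exactly; custom formats use Go layout syntax, written in terms of the reference time 2006-01-02 15:04:05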

  3. Missing attributes: Verify field paths (use dot notation for nested: user.profile.name)

  4. Performance issues: Use specific patterns instead of .*, avoid overly complex regex

Debug tips:

  • Start with simple patterns, add complexity gradually

  • Use defaults for optional fields

  • Test with various log samples

  • Check Gonzo output for parsing errors
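
When a pattern is not matching, it helps to see exactly which lines fall through. The following Python sketch counts matches and lists the misses for a sample of lines. It is a helper written for this guide, not part of Gonzo: match_report is a hypothetical name and the sample lines are invented. If Gonzo relies on Go's regexp package (an assumption here), note that Python accepts constructs such as lookarounds and backreferences that Go's engine rejects, so keep test patterns to the simple syntax shown in this guide.

```python
import re

def match_report(pattern: str, lines) -> tuple:
    """Return (number of matching lines, list of lines that did not match)."""
    regex = re.compile(pattern)
    lines = list(lines)
    unmatched = [line for line in lines if regex.match(line.rstrip("\n")) is None]
    return len(lines) - len(unmatched), unmatched

# Main pattern from the Quick Start format.
PATTERN = r'^(?P<timestamp>[\d\-T:\.]+)\s+\[(?P<level>\w+)\]\s+(?P<message>.*)$'

# Hypothetical sample, including a stack-trace continuation line.
sample_lines = [
    "2024-01-15T10:30:45.123 [ERROR] Database connection failed",
    "    at com.example.Db.connect(Db.java:42)",
    "2024-01-15T10:30:46.001 [INFO] Retrying",
]

matched, unmatched = match_report(PATTERN, sample_lines)
print(f"matched {matched} of {len(sample_lines)} lines")
for line in unmatched:
    print("no match:", line)
```

To run it against a real file, pass an open file object in place of sample_lines.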

Best Practices

  • Document your format: Add description and example log lines

  • Use meaningful names: Descriptive field names aid understanding

  • Handle edge cases: Provide defaults for optional fields

  • Test thoroughly: Verify with various log samples

  • Version control: Keep formats in Git for team sharing

  • Optimize patterns: Specific patterns perform better than generic ones

Additional Resources

  • Format Examples: https://github.com/control-theory/gonzo/tree/main/formats

  • Full Guide: https://github.com/control-theory/gonzo/blob/main/guides/CUSTOM_FORMATS.md

  • Quick Reference: https://github.com/control-theory/gonzo/blob/main/guides/FORMAT_QUICK_REFERENCE.md

  • Issue Tracker: https://github.com/control-theory/gonzo/issues
