# Custom Log Formats

Gonzo supports custom log formats through YAML configuration files, allowing you to parse logs from any application and convert them to OpenTelemetry (OTLP) attributes for analysis.

### Using Built-in Formats

Gonzo includes pre-built formats in the [formats directory](https://github.com/control-theory/gonzo/tree/main/formats):

**Available formats:**

* `loki-stream.yaml` - Grafana Loki streaming (individual entries)
* `loki-batch.yaml` - Loki batch format with multi-entry expansion
* `vercel-stream.yaml` - Vercel logs
* `nodejs.yaml` - Node.js application logs
* `apache-combined.yaml` - Apache/Nginx access logs

**Setup:**

```bash
# Copy a format file downloaded from the repository into your local formats directory
mkdir -p ~/.config/gonzo/formats
cp <format-file>.yaml ~/.config/gonzo/formats/

# Use the format
gonzo --format=loki-stream -f logs.json

# List available formats
ls ~/.config/gonzo/formats/
```

**Examples:**

```bash
# Loki with logcli
logcli query --addr=http://localhost:3100 --follow '{service=~".+"}' -o jsonl 2>/dev/null | gonzo --format=loki-stream

# Loki Live Tail API using "wscat" (batch format)
wscat -c 'ws://localhost:3100/loki/api/v1/tail?query={service_name=~".%2B"}&limit=50' | gonzo --format=loki-batch

# Vercel logs
vercel logs <deployment_id> -j | gonzo --format=vercel-stream

# File with custom format
gonzo --format=nodejs -f application.log
```

### Creating Your Own Custom Formats

#### Quick Start

#### 1. Create a Format File

Create a YAML file in the `~/.config/gonzo/formats/` directory:

```bash
mkdir -p ~/.config/gonzo/formats
vim ~/.config/gonzo/formats/myapp.yaml
```

#### 2. Define Your Format

```yaml
# Matches lines such as: 2024-01-15T10:30:45Z [INFO] Server started
name: myapp
description: My Application Log Format
type: text

pattern:
  use_regex: true
  main: '^(?P<timestamp>\S+)\s+\[(?P<level>\w+)\]\s+(?P<message>.*)$'

mapping:
  timestamp:
    field: timestamp
    time_format: rfc3339
  severity:
    field: level
  body:
    field: message
```

#### 3. Use the Format

```bash
gonzo --format=myapp -f application.log
```

#### Basic Structure

```yaml
# Metadata
name: format-name           # Required: Unique identifier
description: Description    # Optional: Human-readable description
author: Your Name           # Optional: Format author
type: text|json|structured  # Required: Format type

# Pattern Configuration (for text/structured types)
pattern:
  use_regex: true|false     # Use regex or template matching
  main: "pattern"           # Main pattern for parsing
  fields:                   # Additional field patterns
    field_name: "pattern"

# JSON Configuration (for json type)
json:
  fields:                   # Field mappings
    internal_name: json_path
  array_path: "path"        # For nested arrays
  root_is_array: true|false # If root is an array

# Field Mapping
mapping:
  timestamp:                # Timestamp extraction
    field: field_name
    time_format: format
    default: value

  severity:                 # Log level/severity
    field: field_name
    transform: operation
    default: value

  body:                     # Main log message
    field: field_name
    template: "{{.field}}"

  attributes:               # Additional attributes
    attr_name:
      field: source_field
      pattern: "regex"
      transform: operation
      default: value
```

#### Format Types

**text** - Plain text logs with regex patterns:

```yaml
type: text
pattern:
  use_regex: true
  main: 'your-regex-pattern-here'
```

**json** - JSON structured logs:

```yaml
type: json
json:
  fields:
    timestamp: $.timestamp
    message: $.msg
```

**structured** - Fixed position logs (Apache-style):

```yaml
type: structured
pattern:
  use_regex: true
  main: 'pattern-with-named-groups'
```

#### Common Regex Patterns

| Pattern       | Description                         | Example                 |
| ------------- | ----------------------------------- | ----------------------- |
| `[\d\-T:\.]+` | ISO timestamp (stops at `Z` or `+`) | 2024-01-15T10:30:45.123 |
| `\w+`         | Word characters                     | ERROR, INFO             |
| `\d+`         | Digits                              | 12345                   |
| `[^\]]+`      | Any character except `]`            | Content inside brackets |
| `.*`          | Any characters (greedy)             | Rest of line            |
| `\S+`         | Non-whitespace                      | Token or word           |
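To see why the table distinguishes `[^\]]+` from `.*`, this Go sketch extracts the first bracketed token of a Node.js-style line both ways. It assumes Go `regexp` semantics, which pick the same submatch a Perl-style backtracking engine would: the greedy `.*` runs to the last `]` on the line, while the negated class stops at the first one.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Greedy: .* extends to the last "]" on the line.
	greedy = regexp.MustCompile(`\[(.*)\]`)
	// Bounded: [^\]]+ cannot cross a "]", so it stops at the first one.
	bounded = regexp.MustCompile(`\[([^\]]+)\]`)
)

// firstGroup returns capture group 1, or "" when the pattern does not match.
func firstGroup(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	line := "[Backend] 5300 LOG [Module] Message"
	fmt.Println(firstGroup(greedy, line))  // Backend] 5300 LOG [Module
	fmt.Println(firstGroup(bounded, line)) // Backend
}
```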

#### Time Formats

| Format                  | Example              | Description        |
| ----------------------- | -------------------- | ------------------ |
| `rfc3339`               | 2024-01-15T10:30:45Z | ISO 8601           |
| `unix`                  | 1705316445           | Unix seconds       |
| `unix_ms`               | 1705316445123        | Unix milliseconds  |
| `unix_ns`               | 1705316445123456789  | Unix nanoseconds   |
| `auto`                  | Various              | Auto-detect format |
| `"2006-01-02 15:04:05"` | 2024-01-15 10:30:45  | Custom Go format   |
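A hypothetical `parseTime` helper in Go illustrates what each `time_format` value implies, using only the standard library. It is a sketch of the conventions in the table, not Gonzo's implementation, and it does not attempt `auto` detection. Any value that is not one of the named formats is treated as a Go layout, which is written with Go's reference time `Mon Jan 2 15:04:05 MST 2006` rather than `%Y`-style directives.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseTime converts a captured timestamp according to a time_format name
// from the table; any other value is used as a Go layout string.
// "auto" is not handled in this sketch.
func parseTime(value, format string) (time.Time, error) {
	switch format {
	case "rfc3339":
		return time.Parse(time.RFC3339, value)
	case "unix", "unix_ms", "unix_ns":
		n, err := strconv.ParseInt(value, 10, 64)
		if err != nil {
			return time.Time{}, err
		}
		switch format {
		case "unix":
			return time.Unix(n, 0), nil
		case "unix_ms":
			return time.UnixMilli(n), nil
		default:
			return time.Unix(0, n), nil
		}
	default:
		// Custom layout, e.g. "2006-01-02 15:04:05" for 2024-01-15 10:30:45.
		return time.Parse(format, value)
	}
}

func main() {
	for _, c := range [][2]string{
		{"2024-01-15T10:30:45Z", "rfc3339"},
		{"1705316445", "unix"},
		{"1705316445123", "unix_ms"},
		{"1705316445123456789", "unix_ns"},
		{"2024-01-15 10:30:45", "2006-01-02 15:04:05"},
	} {
		t, err := parseTime(c[0], c[1])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-22s %s\n", c[1], t.UTC().Format(time.RFC3339Nano))
	}
}
```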

#### Field Transforms

* `uppercase`: Convert to uppercase (info → INFO)
* `lowercase`: Convert to lowercase (ERROR → error)
* `trim`: Remove whitespace (" text " → "text")
* `status_to_severity`: HTTP status to severity (200→INFO, 404→WARN, 500→ERROR)
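The three string transforms correspond directly to Go standard-library functions; this sketch maps each name to one. The `applyTransform` name and the pass-through for unknown names are assumptions for illustration, not Gonzo's API.

```go
package main

import (
	"fmt"
	"strings"
)

// applyTransform applies one of the string transforms by name.
// Unknown names return the value unchanged (an assumption).
func applyTransform(value, transform string) string {
	switch transform {
	case "uppercase":
		return strings.ToUpper(value)
	case "lowercase":
		return strings.ToLower(value)
	case "trim":
		return strings.TrimSpace(value)
	default:
		return value
	}
}

func main() {
	fmt.Println(applyTransform("info", "uppercase"))
	fmt.Println(applyTransform("ERROR", "lowercase"))
	fmt.Printf("%q\n", applyTransform(" text ", "trim"))
}
```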

### Complete Examples

#### Example 1: Node.js Application Logs

**Log format:** `[Backend] 5300 LOG [Module] Message +6ms`

```yaml
# Format for: [Backend] 5300 LOG [Module] Message +6ms
name: nodejs
type: text

pattern:
  use_regex: true
  main: '^\[(?P<project>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<level>\w+)\s+\[(?P<module>[^\]]+)\]\s+(?P<message>[^+]+?)(?:\s+\+(?P<duration>\d+)ms)?$'

mapping:
  severity:
    field: level
    transform: uppercase
  body:
    field: message
  attributes:
    project:
      field: project
    pid:
      field: pid
    module:
      field: module
    duration_ms:
      field: duration
      default: "0"
```
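Applying the `main` pattern above to the sample line in Go shows where each named group lands and how the optional `(?:\s+\+(?P<duration>\d+)ms)?` group behaves. In Go's `regexp`, a group that did not take part in the match yields an empty string; the sketch treats that as missing so `default: "0"` applies. Whether Gonzo treats an empty capture the same way is an assumption, and `extract` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// nodePattern is the main pattern from the nodejs format.
var nodePattern = regexp.MustCompile(`^\[(?P<project>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<level>\w+)\s+\[(?P<module>[^\]]+)\]\s+(?P<message>[^+]+?)(?:\s+\+(?P<duration>\d+)ms)?$`)

// extract maps named groups to captures and fills the duration default
// when the optional group did not match. Returns nil on no match.
func extract(line string) map[string]string {
	m := nodePattern.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range nodePattern.SubexpNames() {
		if name != "" {
			out[name] = m[i]
		}
	}
	if out["duration"] == "" {
		out["duration"] = "0" // mirrors default: "0"
	}
	return out
}

func main() {
	fmt.Println(extract("[Backend] 5300 LOG [Module] Message +6ms"))
	fmt.Println(extract("[Backend] 5300 LOG [Module] Message"))
}
```

Because the message group is `[^+]+?`, a line whose message contains a literal `+` (for example `a + b`) does not match this pattern at all.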

#### Example 2: Kubernetes/Docker JSON Logs

**Format configuration:**

```yaml
name: k8s-json
type: json

json:
  fields:
    timestamp: time
    message: log
    stream: stream

mapping:
  timestamp:
    field: timestamp
    time_format: rfc3339
  body:
    field: message
  attributes:
    stream:
      field: stream
    container_name:
      field: kubernetes.container_name
    pod_name:
      field: kubernetes.pod_name
    namespace:
      field: kubernetes.namespace_name
```

#### Example 3: Apache Access Logs

**Log format:** `192.168.1.1 - - [14/Oct/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234`

```yaml
name: apache-access
type: structured

pattern:
  use_regex: true
  main: '^(?P<ip>[\d\.]+).*?\[(?P<timestamp>[^\]]+)\]\s+"(?P<method>\w+)\s+(?P<path>[^\s]+).*?"\s+(?P<status>\d+)\s+(?P<bytes>\d+)'

mapping:
  timestamp:
    field: timestamp
    time_format: "02/Jan/2006:15:04:05 -0700"
  body:
    template: "{{.method}} {{.path}} - {{.status}}"
  attributes:
    client_ip:
      field: ip
    http_method:
      field: method
    http_path:
      field: path
    http_status:
      field: status
    response_bytes:
      field: bytes
```

### Advanced Features

#### Batch Processing

For logs where a single line contains multiple entries (like Loki batch format):

```yaml
batch:
  enabled: true
  expand_path: "streams[].values[]"    # Arrays to expand
  context_paths: ["streams[].stream"]  # Metadata to preserve
```

**How it works:**

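1. Input line: `{"streams":[{"stream":{"service":"app"},"values":[["1234","msg1"],["5678","msg2"]]}]}`
2. `expand_path: "streams[].values[]"` expands it into 2 separate log entries, one per `[timestamp, message]` pair
3. Each entry retains the stream metadata (`"service":"app"`) selected by `context_paths`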
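The expansion steps above can be sketched in Go for this specific Loki shape. The sketch hard-codes `streams[].values[]` as the expansion path and copies each stream's `stream` labels onto its entries, which is what `context_paths: ["streams[].stream"]` describes. The types and `expand` function are hypothetical, and the sketch assumes every value is a two-element `[timestamp, line]` string array (Loki can also append a structured-metadata object as a third element, which this sketch fails to decode).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lokiBatch models only the parts of a batch line used here.
type lokiBatch struct {
	Streams []struct {
		Stream map[string]string `json:"stream"`
		Values [][]string        `json:"values"`
	} `json:"streams"`
}

// entry is one expanded record plus the labels of its parent stream.
type entry struct {
	Timestamp string
	Line      string
	Labels    map[string]string
}

// expand emits one entry per element of streams[].values[].
func expand(raw []byte) ([]entry, error) {
	var b lokiBatch
	if err := json.Unmarshal(raw, &b); err != nil {
		return nil, err
	}
	var out []entry
	for _, s := range b.Streams {
		for _, v := range s.Values {
			if len(v) < 2 {
				continue // skip malformed pairs
			}
			out = append(out, entry{Timestamp: v[0], Line: v[1], Labels: s.Stream})
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"streams":[{"stream":{"service":"app"},"values":[["1234","msg1"],["5678","msg2"]]}]}`)
	entries, err := expand(raw)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.Timestamp, e.Line, e.Labels["service"])
	}
}
```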

**Common patterns:**

* `logs[]` - Expand top-level array
* `streams[].values[]` - Expand nested arrays (Loki)
* `events[].entries[]` - Multi-level expansion

#### Nested JSON Fields

Access nested fields using dot notation:

```yaml
attributes:
  user_id:
    field: user.id
  user_name:
    field: user.profile.name
```

#### Pattern Extraction

Extract values from within a field:

```yaml
attributes:
  error_code:
    field: message
    pattern: 'ERROR\[(\d+)\]'  # Extracts code from "ERROR[404]: Not found"
```
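The comment in the example implies that the pattern's first capture group becomes the attribute value. A Go sketch under that assumption, with a fallback value when the pattern does not match (combining `pattern` with a `default`); `extractFirstGroup` is a hypothetical name, not part of Gonzo.

```go
package main

import (
	"fmt"
	"regexp"
)

var errorCode = regexp.MustCompile(`ERROR\[(\d+)\]`)

// extractFirstGroup returns capture group 1 of re within value, or def
// when the pattern does not match.
func extractFirstGroup(re *regexp.Regexp, value, def string) string {
	if m := re.FindStringSubmatch(value); len(m) > 1 {
		return m[1]
	}
	return def
}

func main() {
	fmt.Println(extractFirstGroup(errorCode, "ERROR[404]: Not found", ""))
	fmt.Println(extractFirstGroup(errorCode, "all good", "none"))
}
```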

#### Conditional Defaults

Use defaults when fields are missing:

```yaml
attributes:
  environment:
    field: env
    default: "production"
```

#### HTTP Status Code to Severity Mapping

For web server logs, use the `status_to_severity` transform:

```yaml
severity:
  field: http_status
  transform: status_to_severity
```

**Status code mapping:**

* 1xx (100-199): DEBUG (Informational)
* 2xx (200-299): INFO (Success)
* 3xx (300-399): INFO (Redirection)
* 4xx (400-499): WARN (Client Error)
* 5xx (500-599): ERROR (Server Error)
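The status-class table translates to a small Go function. Captured fields are strings, so the sketch parses the code first; the `INFO` fallback for non-numeric input or codes outside 100 to 599 is an assumption for illustration, not documented Gonzo behavior.

```go
package main

import (
	"fmt"
	"strconv"
)

// statusToSeverity maps an HTTP status string to a severity by class.
// The INFO fallback for unparseable or out-of-range codes is an assumption.
func statusToSeverity(status string) string {
	code, err := strconv.Atoi(status)
	if err != nil {
		return "INFO"
	}
	switch {
	case code >= 100 && code < 200:
		return "DEBUG"
	case code >= 200 && code < 400:
		return "INFO"
	case code >= 400 && code < 500:
		return "WARN"
	case code >= 500 && code < 600:
		return "ERROR"
	default:
		return "INFO"
	}
}

func main() {
	for _, s := range []string{"101", "200", "304", "404", "503"} {
		fmt.Println(s, statusToSeverity(s))
	}
}
```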

#### Multiple Pattern Matching

Define additional patterns for specific fields:

```yaml
pattern:
  use_regex: true
  main: '^(?P<base>.*)'
  fields:
    request_id: 'RequestID:\s*(\w+)'
    user_id: 'UserID:\s*(\d+)'
```
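A Go sketch of how the `fields` patterns could be applied, assuming each one is searched for anywhere in the raw line and its first capture group becomes the field value; the map and `extraFields` function are hypothetical. A field whose pattern does not match is simply absent.

```go
package main

import (
	"fmt"
	"regexp"
)

// fieldPatterns mirrors the pattern.fields entries of the example.
var fieldPatterns = map[string]*regexp.Regexp{
	"request_id": regexp.MustCompile(`RequestID:\s*(\w+)`),
	"user_id":    regexp.MustCompile(`UserID:\s*(\d+)`),
}

// extraFields returns only the fields whose pattern matched the line.
func extraFields(line string) map[string]string {
	out := make(map[string]string)
	for name, re := range fieldPatterns {
		if m := re.FindStringSubmatch(line); len(m) > 1 {
			out[name] = m[1]
		}
	}
	return out
}

func main() {
	fmt.Println(extraFields("handled GET /orders RequestID: abc123 UserID: 42"))
	fmt.Println(extraFields("RequestID:xyz no user here"))
}
```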

### Testing & Troubleshooting

**Test your format:**

```bash
# Test with small sample
head -n 10 app.log | gonzo --format=myformat

# Test without TUI
gonzo --format=myformat -f app.log --test-mode
```

**Common issues:**

1. **Pattern not matching**: Test the regex at regex101.com (select the Golang flavor) and verify that named groups use the `(?P<name>...)` syntax
2. **Wrong timestamps**: Check that `time_format` exactly matches the captured timestamp; custom formats use Go layout syntax (for example `"2006-01-02 15:04:05"`)
3. **Missing attributes**: Verify field paths, using dot notation for nested fields (`user.profile.name`)
4. **Performance issues**: Prefer specific patterns such as `[^\]]+` or `\S+` over `.*`, and avoid overly complex regex
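Several of these issues can be caught before running Gonzo. Assuming Gonzo compiles patterns with Go's `regexp` package (RE2 syntax, so no lookarounds or backreferences), this hypothetical `checkPattern` helper reports a compile error, a named group the mapping needs but the pattern lacks, or a sample line that fails to match.

```go
package main

import (
	"fmt"
	"regexp"
)

// checkPattern compiles pattern, confirms every required named group
// exists, and checks that sample matches.
func checkPattern(pattern string, required []string, sample string) error {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return fmt.Errorf("pattern does not compile: %w", err)
	}
	have := make(map[string]bool)
	for _, name := range re.SubexpNames() {
		have[name] = true
	}
	for _, name := range required {
		if !have[name] {
			return fmt.Errorf("missing named group %q", name)
		}
	}
	if !re.MatchString(sample) {
		return fmt.Errorf("sample line does not match: %q", sample)
	}
	return nil
}

func main() {
	pattern := `^(?P<timestamp>\S+)\s+\[(?P<level>\w+)\]\s+(?P<message>.*)$`
	err := checkPattern(pattern, []string{"timestamp", "level", "message"}, "2024-01-15T10:30:45Z [INFO] ok")
	fmt.Println(err)
}
```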

**Debug tips:**

* Start with simple patterns, add complexity gradually
* Use defaults for optional fields
* Test with various log samples
* Check Gonzo output for parsing errors

### Best Practices

* **Document your format**: Add description and example log lines
* **Use meaningful names**: Descriptive field names aid understanding
* **Handle edge cases**: Provide defaults for optional fields
* **Test thoroughly**: Verify with various log samples
* **Version control**: Keep formats in Git for team sharing
* **Optimize patterns**: Specific patterns perform better than generic ones

### Additional Resources

* **Format Examples**: <https://github.com/control-theory/gonzo/tree/main/formats>
* **Full Guide**: <https://github.com/control-theory/gonzo/blob/main/guides/CUSTOM_FORMATS.md>
* **Quick Reference**: <https://github.com/control-theory/gonzo/blob/main/guides/FORMAT_QUICK_REFERENCE.md>
* **Issue Tracker**: <https://github.com/control-theory/gonzo/issues>

