# Custom Log Formats

Gonzo supports custom log formats through YAML configuration files, allowing you to parse logs from any application and convert them to OpenTelemetry (OTLP) attributes for analysis.

### Using Built-in Formats

Gonzo includes pre-built formats in the [formats directory](https://github.com/control-theory/gonzo/tree/main/formats):

**Available formats:**

* `loki-stream.yaml` - Grafana Loki streaming (individual entries)
* `loki-batch.yaml` - Loki batch format with multi-entry expansion
* `vercel-stream.yaml` - Vercel logs
* `nodejs.yaml` - Node.js application logs
* `apache-combined.yaml` - Apache/Nginx access logs

**Setup:**

```bash
# Download and install format
mkdir -p ~/.config/gonzo/formats
cp <format-file>.yaml ~/.config/gonzo/formats/

# Use the format
gonzo --format=loki-stream -f logs.json

# List available formats
ls ~/.config/gonzo/formats/
```

**Examples:**

```bash
# Loki with logcli
logcli query --addr=http://localhost:3100 --follow '{service=~".+"}' -o jsonl 2>/dev/null | gonzo --format=loki-stream

# Loki Live Tail API using "wscat" (batch format)
wscat -c 'ws://localhost:3100/loki/api/v1/tail?query={service_name=~".%2B"}&limit=50' | gonzo --format=loki-batch

# Vercel logs
vercel logs <deployment_id> -j | gonzo --format=vercel-stream

# File with custom format
gonzo --format=nodejs -f application.log
```

### Creating Your Own Custom Formats

#### Quick Start

#### 1. Create a Format File

Create a YAML file in the `~/.config/gonzo/formats/` directory:

```bash
mkdir -p ~/.config/gonzo/formats
vim ~/.config/gonzo/formats/myapp.yaml
```

#### 2. Define Your Format

```yaml
name: myapp
description: My Application Log Format
type: text

pattern:
  use_regex: true
  main: '^(?P<timestamp>[\d\-T:\.]+)\s+\[(?P<level>\w+)\]\s+(?P<message>.*)$'

mapping:
  timestamp:
    field: timestamp
    time_format: rfc3339
  severity:
    field: level
  body:
    field: message
```
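
Before wiring the pattern into Gonzo, you can sanity-check it against a sample line. This sketch uses Python's `re`, which accepts the same `(?P<name>...)` named-group syntax as Go's `regexp` (the engine Gonzo uses); the sample line is hypothetical:

```python
import re

# The main pattern from the format file above.
pattern = r"^(?P<timestamp>[\d\-T:\.]+)\s+\[(?P<level>\w+)\]\s+(?P<message>.*)$"

# Hypothetical sample line in the myapp format.
line = "2024-01-15T10:30:45.123 [INFO] Server started"

m = re.match(pattern, line)
print(m.group("timestamp"))  # 2024-01-15T10:30:45.123
print(m.group("level"))      # INFO
print(m.group("message"))    # Server started
```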

#### 3. Use the Format

```bash
gonzo --format=myapp -f application.log
```

#### Basic Structure

```yaml
# Metadata
name: format-name           # Required: Unique identifier
description: Description    # Optional: Human-readable description
author: Your Name           # Optional: Format author
type: text|json|structured  # Required: Format type

# Pattern Configuration (for text/structured types)
pattern:
  use_regex: true|false     # Use regex or template matching
  main: "pattern"           # Main pattern for parsing
  fields:                   # Additional field patterns
    field_name: "pattern"

# JSON Configuration (for json type)
json:
  fields:                   # Field mappings
    internal_name: json_path
  array_path: "path"        # For nested arrays
  root_is_array: true|false # If root is an array

# Field Mapping
mapping:
  timestamp:                # Timestamp extraction
    field: field_name
    time_format: format
    default: value

  severity:                 # Log level/severity
    field: field_name
    transform: operation
    default: value

  body:                     # Main log message
    field: field_name
    template: "{{.field}}"

  attributes:               # Additional attributes
    attr_name:
      field: source_field
      pattern: "regex"
      transform: operation
      default: value
```

#### Format Types

**text** - Plain text logs with regex patterns:

```yaml
type: text
pattern:
  use_regex: true
  main: 'your-regex-pattern-here'
```

**json** - JSON structured logs:

```yaml
type: json
json:
  fields:
    timestamp: $.timestamp
    message: $.msg
```

**structured** - Fixed-position logs (Apache-style):

```yaml
type: structured
pattern:
  use_regex: true
  main: 'pattern-with-named-groups'
```

#### Common Regex Patterns

| Pattern       | Description         | Example                 |
| ------------- | ------------------- | ----------------------- |
| `[\d\-T:\.]+` | ISO timestamp       | 2024-01-15T10:30:45.123 |
| `\w+`         | Word characters     | ERROR, INFO             |
| `\d+`         | Digits              | 12345                   |
| `[^\]]+`      | Everything except ] | Content inside brackets |
| `.*`          | Any characters      | Rest of line            |
| `\S+`         | Non-whitespace      | Token or word           |

#### Time Formats

| Format                  | Example              | Description        |
| ----------------------- | -------------------- | ------------------ |
| `rfc3339`               | 2024-01-15T10:30:45Z | ISO 8601           |
| `unix`                  | 1705316445           | Unix seconds       |
| `unix_ms`               | 1705316445123        | Unix milliseconds  |
| `unix_ns`               | 1705316445123456789  | Unix nanoseconds   |
| `auto`                  | Various              | Auto-detect format |
| `"2006-01-02 15:04:05"` | 2024-01-15 10:30:45  | Custom Go format   |
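
To see how the epoch variants relate (values taken from the table; note the rows are independent examples, not the same instant), this Python sketch decodes them, and shows the `strptime` equivalent of the Go reference layout:

```python
from datetime import datetime, timezone

# Epoch examples from the table: same timestamp at different resolutions.
unix_s = 1705316445
unix_ms = 1705316445123

dt = datetime.fromtimestamp(unix_s, tz=timezone.utc)
dt_ms = datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)
print(dt.isoformat())                # 2024-01-15T11:00:45+00:00
print((dt_ms - dt).total_seconds())  # 0.123

# The Go reference layout "2006-01-02 15:04:05" corresponds to
# Python's strptime format "%Y-%m-%d %H:%M:%S".
parsed = datetime.strptime("2024-01-15 10:30:45", "%Y-%m-%d %H:%M:%S")
print(parsed.hour)  # 10
```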

#### Field Transforms

* `uppercase`: Convert to uppercase (info → INFO)
* `lowercase`: Convert to lowercase (ERROR → error)
* `trim`: Remove whitespace (" text " → "text")
* `status_to_severity`: HTTP status to severity (200→INFO, 404→WARN, 500→ERROR)

### Complete Examples

#### Example 1: Node.js Application Logs

**Log format:** `[Backend] 5300 LOG [Module] Message +6ms`

```yaml
# Format for: [Backend] 5300 LOG [Module] Message +6ms
name: nodejs
type: text

pattern:
  use_regex: true
  main: '^\[(?P<project>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<level>\w+)\s+\[(?P<module>[^\]]+)\]\s+(?P<message>[^+]+?)(?:\s+\+(?P<duration>\d+)ms)?$'

mapping:
  severity:
    field: level
    transform: uppercase
  body:
    field: message
  attributes:
    project:
      field: project
    pid:
      field: pid
    module:
      field: module
    duration_ms:
      field: duration
      default: "0"
```
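
You can verify this pattern against the documented sample line before using it; Python's `re` shares the named-group syntax of Go's `regexp`:

```python
import re

# The main pattern from the nodejs format above, split for readability.
pattern = (r"^\[(?P<project>[^\]]+)\]\s+(?P<pid>\d+)\s+(?P<level>\w+)\s+"
           r"\[(?P<module>[^\]]+)\]\s+(?P<message>[^+]+?)"
           r"(?:\s+\+(?P<duration>\d+)ms)?$")

line = "[Backend] 5300 LOG [Module] Message +6ms"

m = re.match(pattern, line)
print(m.group("project"), m.group("pid"), m.group("level"))  # Backend 5300 LOG
print(m.group("message"))   # Message
print(m.group("duration"))  # 6
```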

#### Example 2: Kubernetes/Docker JSON Logs

**Format configuration:**

```yaml
name: k8s-json
type: json

json:
  fields:
    timestamp: time
    message: log
    stream: stream

mapping:
  timestamp:
    field: timestamp
    time_format: rfc3339
  body:
    field: message
  attributes:
    stream:
      field: stream
    container_name:
      field: kubernetes.container_name
    pod_name:
      field: kubernetes.pod_name
    namespace:
      field: kubernetes.namespace_name
```

#### Example 3: Apache Access Logs

**Log format:** `192.168.1.1 - - [14/Oct/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234`

```yaml
name: apache-access
type: structured

pattern:
  use_regex: true
  main: '^(?P<ip>[\d\.]+).*?\[(?P<timestamp>[^\]]+)\]\s+"(?P<method>\w+)\s+(?P<path>[^\s]+).*?"\s+(?P<status>\d+)\s+(?P<bytes>\d+)'

mapping:
  timestamp:
    field: timestamp
    time_format: "02/Jan/2006:15:04:05 -0700"
  body:
    template: "{{.method}} {{.path}} - {{.status}}"
  attributes:
    client_ip:
      field: ip
    http_method:
      field: method
    http_path:
      field: path
    http_status:
      field: status
    response_bytes:
      field: bytes
```
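
The same check works for the Apache pattern; the timestamp can then be parsed with the `strptime` equivalent of the Go layout `"02/Jan/2006:15:04:05 -0700"`:

```python
import re
from datetime import datetime

# The main pattern from the apache-access format above.
pattern = (r'^(?P<ip>[\d\.]+).*?\[(?P<timestamp>[^\]]+)\]\s+'
           r'"(?P<method>\w+)\s+(?P<path>[^\s]+).*?"\s+'
           r'(?P<status>\d+)\s+(?P<bytes>\d+)')

line = '192.168.1.1 - - [14/Oct/2024:10:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234'

m = re.match(pattern, line)
print(m.group("method"), m.group("path"), m.group("status"))  # GET /api/users 200

# "%d/%b/%Y:%H:%M:%S %z" is Python's equivalent of the Go layout.
ts = datetime.strptime(m.group("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
print(ts.year, ts.hour)  # 2024 10
```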

### Advanced Features

#### Batch Processing

For logs where a single line contains multiple entries (like Loki batch format):

```yaml
batch:
  enabled: true
  expand_path: "streams[].values[]"    # Arrays to expand
  context_paths: ["streams[].stream"]  # Metadata to preserve
```

**How it works:**

1. Original line: `{"streams":[{"stream":{"service":"app"},"values":[["1234","msg1"],["5678","msg2"]]}]}`
2. Expanded into two separate log entries
3. Each entry retains the stream metadata

**Common patterns:**

* `logs[]` - Expand top-level array
* `streams[].values[]` - Expand nested arrays (Loki)
* `events[].entries[]` - Multi-level expansion
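
The expansion step can be sketched in a few lines. This is illustrative, not Gonzo's actual code: each element of `streams[].values[]` becomes its own entry, carrying the parent stream labels along as context:

```python
import json

# The sample batch line from "How it works" above.
line = ('{"streams":[{"stream":{"service":"app"},'
        '"values":[["1234","msg1"],["5678","msg2"]]}]}')

entries = []
for stream in json.loads(line)["streams"]:
    labels = stream["stream"]              # context_paths metadata, kept per entry
    for ts, msg in stream["values"]:
        entries.append({"timestamp": ts, "body": msg, **labels})

print(len(entries))                               # 2
print(entries[0]["body"], entries[0]["service"])  # msg1 app
```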

#### Nested JSON Fields

Access nested fields using dot notation:

```yaml
attributes:
  user_id:
    field: user.id
  user_name:
    field: user.profile.name
```
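
Dot-notation lookup walks the nested object one key at a time. A minimal sketch of the idea (not Gonzo's resolver), using a hypothetical record:

```python
# Resolve a dotted path like "user.profile.name" in a nested dict.
def lookup(obj, path):
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None   # missing field -> no value (a default may apply)
        obj = obj[key]
    return obj

record = {"user": {"id": 42, "profile": {"name": "ada"}}}
print(lookup(record, "user.profile.name"))  # ada
print(lookup(record, "user.missing"))       # None
```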

#### Pattern Extraction

Extract values from within a field:

```yaml
attributes:
  error_code:
    field: message
    pattern: 'ERROR\[(\d+)\]'  # Extracts code from "ERROR[404]: Not found"
```
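
The first capture group becomes the attribute value; checking the pattern against the example message:

```python
import re

# Pull the numeric code out of the example message from above.
msg = "ERROR[404]: Not found"
m = re.search(r"ERROR\[(\d+)\]", msg)
print(m.group(1))  # 404
```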

#### Conditional Defaults

Use defaults when fields are missing:

```yaml
attributes:
  environment:
    field: env
    default: "production"
```

#### HTTP Status Code to Severity Mapping

For web server logs, use the `status_to_severity` transform:

```yaml
severity:
  field: http_status
  transform: status_to_severity
```

**Status code mapping:**

* 1xx (100-199): DEBUG (Informational)
* 2xx (200-299): INFO (Success)
* 3xx (300-399): INFO (Redirection)
* 4xx (400-499): WARN (Client Error)
* 5xx (500-599): ERROR (Server Error)
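
The documented mapping reduces to simple range checks. A sketch of equivalent logic (not Gonzo's implementation):

```python
# Map an HTTP status code to a severity per the table above.
def status_to_severity(status: int) -> str:
    if status < 200:
        return "DEBUG"   # 1xx informational
    if status < 400:
        return "INFO"    # 2xx success, 3xx redirection
    if status < 500:
        return "WARN"    # 4xx client error
    return "ERROR"       # 5xx server error

print(status_to_severity(204), status_to_severity(301),
      status_to_severity(404), status_to_severity(503))
# INFO INFO WARN ERROR
```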

#### Multiple Pattern Matching

Define additional patterns for specific fields:

```yaml
pattern:
  use_regex: true
  main: '^(?P<base>.*)'
  fields:
    request_id: 'RequestID:\s*(\w+)'
    user_id: 'UserID:\s*(\d+)'
```
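
A sketch of the behavior implied by that config (hypothetical line and field names; the per-field patterns are searched within the matched line and their first capture group kept):

```python
import re

line = "handled request RequestID: abc123 UserID: 42"

# The per-field patterns from the config above.
field_patterns = {
    "request_id": r"RequestID:\s*(\w+)",
    "user_id": r"UserID:\s*(\d+)",
}

# Keep the first capture group of each pattern that matches.
extracted = {name: m.group(1)
             for name, pat in field_patterns.items()
             if (m := re.search(pat, line))}
print(extracted)  # {'request_id': 'abc123', 'user_id': '42'}
```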

### Testing & Troubleshooting

**Test your format:**

```bash
# Test with small sample
head -n 10 app.log | gonzo --format=myformat

# Test without TUI
gonzo --format=myformat -f app.log --test-mode
```

**Common issues:**

1. **Pattern not matching**: Test regex at regex101.com, verify named groups `(?P<name>...)`
2. **Wrong timestamps**: Check that `time_format` matches the log exactly; custom layouts use Go reference-time syntax
3. **Missing attributes**: Verify field paths (use dot notation for nested: `user.profile.name`)
4. **Performance issues**: Use specific patterns instead of `.*`, avoid overly complex regex

**Debug tips:**

* Start with simple patterns, add complexity gradually
* Use defaults for optional fields
* Test with various log samples
* Check Gonzo output for parsing errors

### Best Practices

* **Document your format**: Add description and example log lines
* **Use meaningful names**: Descriptive field names aid understanding
* **Handle edge cases**: Provide defaults for optional fields
* **Test thoroughly**: Verify with various log samples
* **Version control**: Keep formats in Git for team sharing
* **Optimize patterns**: Specific patterns perform better than generic ones

### Additional Resources

* **Format Examples**: <https://github.com/control-theory/gonzo/tree/main/formats>
* **Full Guide**: <https://github.com/control-theory/gonzo/blob/main/guides/CUSTOM_FORMATS.md>
* **Quick Reference**: <https://github.com/control-theory/gonzo/blob/main/guides/FORMAT_QUICK_REFERENCE.md>
* **Issue Tracker**: <https://github.com/control-theory/gonzo/issues>
