# Format Detection

Gonzo's intelligent format detection automatically recognizes and parses different log formats, extracting structured data for powerful analysis. From JSON to plain text, Gonzo adapts to your logs without configuration.

{% hint style="info" %}
**Automatic Operation:** Format detection works automatically - no configuration required. Gonzo analyzes each log entry and applies the appropriate parsing strategy in real-time.
{% endhint %}

### Supported Log Formats

Gonzo intelligently detects and processes multiple log formats simultaneously:

| Format            | Detection Method    | Structured Data         | Best For                              |
| ----------------- | ------------------- | ----------------------- | ------------------------------------- |
| **JSON**          | Valid JSON syntax   | Complete object parsing | Modern applications, microservices    |
| **Logfmt**        | Key=value pairs     | Field extraction        | Go applications, Heroku-style logs    |
| **Plain Text**    | Pattern recognition | Smart field extraction  | Traditional applications, system logs |
| **Mixed Formats** | Per-line analysis   | Format-specific parsing | Legacy systems, log aggregation       |

### JSON Log Format

#### Automatic JSON Detection

Gonzo automatically detects JSON logs and extracts all fields:

```json
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "error",
  "service": "payment-api",
  "message": "Database connection timeout",
  "user_id": "12345",
  "request_id": "req_abc123",
  "duration": 30000,
  "endpoint": "/api/payment/process",
  "error": {
    "type": "TimeoutError",
    "details": "Connection pool exhausted"
  },
  "metadata": {
    "trace_id": "trace_xyz789",
    "span_id": "span_456def"
  }
}
```

**What Gonzo Extracts:**

* **All top-level fields** appear in Attributes panel
* **Nested objects** are flattened (e.g., `error.type`, `metadata.trace_id`)
* **Arrays** are handled intelligently
* **Data types** are preserved (strings, numbers, booleans)

#### JSON Processing Features

{% tabs %}
{% tab title="Field Extraction" %}
**Automatic Field Detection:**

```
Attributes Panel Shows:
├── timestamp = "2024-01-15T10:30:00Z"
├── level = "error"  
├── service = "payment-api"
├── message = "Database connection timeout"
├── user_id = "12345"
├── request_id = "req_abc123"
├── duration = 30000
├── endpoint = "/api/payment/process"
├── error.type = "TimeoutError"
├── error.details = "Connection pool exhausted"
├── metadata.trace_id = "trace_xyz789"
└── metadata.span_id = "span_456def"
```

**Search and Filter by Fields:**

```bash
# Filter by any extracted field
/"level":"error"
/"service":"payment-api"
/"user_id":"12345"
/error\.type.*Timeout
/metadata\.trace_id.*xyz789
```

{% endtab %}

{% tab title="Nested Object Handling" %}
**Deep Object Parsing:**

```json
{
  "request": {
    "method": "POST",
    "url": "/api/users",
    "headers": {
      "authorization": "Bearer token123",
      "content-type": "application/json"
    },
    "body": {
      "user": {
        "name": "John Doe",
        "email": "john@example.com"
      }
    }
  }
}
```

**Flattened in Attributes:**

```
request.method = "POST"
request.url = "/api/users"  
request.headers.authorization = "Bearer token123"
request.headers.content-type = "application/json"
request.body.user.name = "John Doe"
request.body.user.email = "john@example.com"
```

{% endtab %}

{% tab title="Array Processing" %}
**Array Handling:**

```json
{
  "errors": ["validation failed", "missing field"],
  "tags": ["production", "critical", "database"],
  "users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
  ]
}
```

**Array Representation:**

```
errors[0] = "validation failed"
errors[1] = "missing field"
tags[0] = "production"
tags[1] = "critical"
tags[2] = "database"
users[0].id = 1
users[0].name = "Alice"
users[1].id = 2
users[1].name = "Bob"
```

{% endtab %}
{% endtabs %}

#### JSON Best Practices

**Optimize Your JSON Logs:**

```json
{
  // ✅ Good: Consistent field naming
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "error",
  "service": "payment-api",
  "message": "Clear, descriptive message",
  
  // ✅ Good: Structured context
  "context": {
    "user_id": "12345",
    "request_id": "req_abc123",
    "trace_id": "trace_xyz789"
  },
  
  // ✅ Good: Consistent data types
  "duration": 1500,           // Number, not string
  "success": false,           // Boolean, not "false"
  "retry_count": 3,           // Number for analysis
  
  // ❌ Avoid: Inconsistent field names
  // Sometimes "userId", sometimes "user_id", sometimes "UserID"
  
  // ❌ Avoid: Everything as strings
  // "duration": "1500ms" instead of duration: 1500
}
```

### Logfmt Format

#### Automatic Logfmt Detection

Gonzo detects logfmt (key=value) format commonly used by Go applications:

```
time=2024-01-15T10:30:00Z level=error service=payment-api msg="Database connection timeout" user_id=12345 request_id=req_abc123 duration=30000ms endpoint="/api/payment/process" error="connection pool exhausted" trace_id=trace_xyz789
```

**Parsing Features:**

* **Key-value extraction** - All key=value pairs become searchable fields
* **Quoted value support** - Handles `msg="complex message with spaces"`
* **Type inference** - Numbers and booleans detected automatically
* **Space handling** - Robust parsing of various spacing patterns

#### Logfmt Processing Examples

{% tabs %}
{% tab title="Basic Parsing" %}
**Input Logfmt:**

```
level=info service=web-api method=GET path=/health status=200 duration=45ms
```

**Extracted Attributes:**

```
level = "info"
service = "web-api"  
method = "GET"
path = "/health"
status = 200
duration = "45ms"
```

**Filtering Examples:**

```bash
level=error                    # Error level logs
service=web-api               # Specific service
method=POST                   # POST requests only
status=[45][0-9]{2}           # 4xx and 5xx status codes
duration=[1-9][0-9]+ms        # Duration > 10ms
```

{% endtab %}

{% tab title="Complex Values" %}
**Quoted String Handling:**

```
time=2024-01-15T10:30:00Z level=error msg="User authentication failed: invalid token format" user_id=12345 ip="192.168.1.100" user_agent="Mozilla/5.0 (compatible; bot)"
```

**Parsed Results:**

```
time = "2024-01-15T10:30:00Z"
level = "error"
msg = "User authentication failed: invalid token format"
user_id = 12345
ip = "192.168.1.100"
user_agent = "Mozilla/5.0 (compatible; bot)"
```

**Special Character Support:**

* Handles spaces in quoted values
* Escapes quotes within values
* Supports empty values: `key=""`
* Handles special characters in keys and values
  {% endtab %}

{% tab title="Mixed Data Types" %}
**Type Inference:**

```
level=warn count=42 rate=3.14 enabled=true disabled=false empty="" path="/api/users"
```

**Detected Types:**

```
level = "warn"        # String
count = 42           # Number (integer)
rate = 3.14          # Number (float)
enabled = true       # Boolean
disabled = false     # Boolean  
empty = ""           # Empty string
path = "/api/users"  # String
```

**Benefits for Analysis:**

* Numeric fields enable mathematical comparisons
* Boolean fields allow true/false filtering
* Type-aware searching and sorting
  {% endtab %}
  {% endtabs %}

#### Logfmt Optimization Tips

**Structure Your Logfmt Logs:**

```go
// ✅ Good: Consistent key naming
log.Info("Request processed",
    "service", "web-api",
    "method", "POST", 
    "path", "/api/users",
    "status", 201,
    "duration", time.Since(start),
    "user_id", userID,
    "request_id", requestID,
)

// ✅ Good: Use structured context
log.Error("Database error",
    "error", err.Error(),
    "query", "SELECT * FROM users WHERE id = ?",
    "query_duration", queryTime,
    "connection_pool_size", poolSize,
    "active_connections", activeConns,
)

// ❌ Avoid: Inconsistent formatting
// Sometimes: user_id=123
// Sometimes: userId=123  
// Sometimes: UserID=123
```

### Plain Text Format

#### Intelligent Plain Text Processing

Gonzo analyzes plain text logs and extracts structured information using pattern recognition:

```
2024-01-15 10:30:00 [ERROR] payment-api: Database connection timeout for user 12345 (duration: 30000ms, endpoint: /api/payment/process)
```

**Automatic Extraction:**

* **Timestamp detection** - Various timestamp formats
* **Log level identification** - ERROR, WARN, INFO, DEBUG, etc.
* **Service name extraction** - Service or component names
* **Contextual information** - IDs, durations, endpoints extracted via patterns

#### Plain Text Pattern Recognition

{% tabs %}
{% tab title="Timestamp Patterns" %}
**Supported Timestamp Formats:**

```bash
# ISO 8601 formats
2024-01-15T10:30:00Z
2024-01-15T10:30:00.123Z
2024-01-15 10:30:00

# Common log formats  
Jan 15 10:30:00
15/Jan/2024:10:30:00
2024-01-15 10:30:00.123

# Syslog formats
Jan 15 10:30:00
<134>Jan 15 10:30:00

# Custom formats (auto-detected)
[2024-01-15 10:30:00]
(2024-01-15 10:30:00)
```

**What Gonzo Extracts:**

* Normalizes all timestamps for time-series analysis
* Maintains original format in log display
* Enables time-based filtering and sorting
  {% endtab %}

{% tab title="Log Level Detection" %}
**Recognized Log Levels:**

```bash
# Standard levels (case-insensitive)
ERROR, ERR, E
WARN, WARNING, W
INFO, I
DEBUG, DBG, D
TRACE, T
FATAL, F

# Bracketed formats
[ERROR], [WARN], [INFO]
(ERROR), (WARN), (INFO)

# Prefixed formats  
ERROR:, WARN:, INFO:
ERR-, WARN-, INFO-

# Numeric levels
0 (Emergency), 1 (Alert), 2 (Critical)
3 (Error), 4 (Warning), 5 (Notice)
6 (Info), 7 (Debug)
```

**Color Coding:**

* All detected levels get appropriate color coding
* Consistent with JSON/logfmt level handling
* Enables severity-based filtering
  {% endtab %}

{% tab title="Service Extraction" %}
**Service Name Patterns:**

```bash
# Common service patterns Gonzo recognizes:

# After timestamp and level
2024-01-15 10:30:00 [ERROR] web-api: message
2024-01-15 10:30:00 ERROR web-api: message

# In brackets/parentheses
2024-01-15 10:30:00 [web-api] ERROR: message
2024-01-15 10:30:00 (web-api) ERROR: message

# Colon-separated
web-api: 2024-01-15 10:30:00 ERROR message
web-api ERROR: message

# Structured prefixes
[service=web-api] ERROR: message
service:web-api ERROR: message
```

**Service Distribution:**

* Extracted service names appear in service distribution analysis
* Enables service-based filtering
* Supports multi-service correlation
  {% endtab %}
  {% endtabs %}

#### Enhanced Plain Text Processing

**Contextual Information Extraction:**

```bash
# Gonzo automatically extracts common patterns:

# User IDs
"user 12345", "user_id: 12345", "uid=12345"
→ Extracted as user context

# Request IDs  
"request abc123", "req_id: abc123", "rid=abc123"
→ Extracted as request correlation

# Durations
"took 1500ms", "duration: 30s", "elapsed 2.5s"
→ Extracted as performance metrics

# HTTP Status
"status 404", "returned 500", "HTTP/1.1 200"
→ Extracted as response status

# IP Addresses
"from 192.168.1.100", "client 10.0.0.1"
→ Extracted as network context

# Error Codes
"error E001", "code TIMEOUT", "errno 404"
→ Extracted as error classification
```

### Mixed Format Handling

#### Multi-Format Log Sources

Gonzo handles mixed format sources intelligently:

```bash
# Example: Combined application and infrastructure logs
gonzo -f app.log -f nginx.log -f syslog --follow

# app.log (JSON format):
{"timestamp":"2024-01-15T10:30:00Z","level":"error","service":"app","message":"DB timeout"}

# nginx.log (Combined Log Format):
192.168.1.100 - - [15/Jan/2024:10:30:00 +0000] "GET /api/users HTTP/1.1" 504 0

# syslog (Traditional format):
Jan 15 10:30:00 server01 app[1234]: Database connection failed
```

**Per-Line Format Detection:**

* Each log line analyzed independently
* Appropriate parser applied automatically
* Consistent field extraction across formats
* Unified analysis in Attributes panel

#### Format Transition Handling

**Real-World Scenario:** Application log format changes over time

```bash
# Old format (plain text):
2024-01-15 10:29:59 ERROR: Database connection failed

# Transition period (mixed):
2024-01-15 10:30:00 ERROR: Starting JSON logging
{"timestamp":"2024-01-15T10:30:01Z","level":"info","message":"JSON logging active"}

# New format (JSON):
{"timestamp":"2024-01-15T10:30:02Z","level":"error","message":"Database timeout","duration":30000}
```

**Gonzo Handles This Seamlessly:**

* Detects format change automatically
* Continues processing without interruption
* Maintains consistent analysis across formats
* No configuration changes required

### Format Detection Optimization

#### Performance Considerations

**Optimizing Format Detection:**

```bash
# For high-volume logs, format detection adds minimal overhead:
# JSON: ~0.1ms per line (fastest)
# Logfmt: ~0.2ms per line  
# Plain text: ~0.5ms per line (most complex patterns)

# Monitor performance impact:
time gonzo -f large-log-file.log --test-mode

# Optimize for known formats:
# If you know your logs are JSON, ensure valid JSON syntax
# If using logfmt, follow consistent key=value patterns
# For plain text, use consistent timestamp/level patterns
```

#### Memory Usage Optimization

**Field Extraction Memory Management:**

```bash
# Gonzo automatically manages memory for extracted fields:
# - Limits number of unique field names tracked
# - Uses efficient string interning for repeated values
# - Automatically garbage collects old field data

# Monitor memory usage:
ps aux | grep gonzo

# Tune memory limits if needed:
gonzo -f logs.log --memory-size=20000  # Increase field storage
gonzo -f logs.log --memory-size=5000   # Reduce for constrained systems
```

### Format-Specific Features

#### JSON-Specific Enhancements

**Advanced JSON Processing:**

```bash
# Schema Evolution Detection
# Gonzo adapts when JSON schema changes:
# - New fields appear automatically in Attributes
# - Removed fields don't break processing
# - Type changes are handled gracefully

# Performance Optimization for JSON:
# - Streaming JSON parser for large objects
# - Lazy field extraction (only parse displayed fields)
# - Efficient handling of repeated field names
```

#### Logfmt-Specific Enhancements

**Logfmt Processing Optimizations:**

```bash
# Key Standardization
# Gonzo normalizes key variations:
# user_id, userId, UserID → unified for analysis

# Value Type Inference
# Automatic detection of:
# - ISO timestamps
# - Duration formats (1s, 500ms, 2h30m)
# - Size formats (1KB, 2MB, 1.5GB)
# - Boolean variations (true/false, yes/no, 1/0)
```

#### Plain Text Intelligence

**Advanced Pattern Learning:**

```bash
# Gonzo learns from your log patterns:
# - Adapts to your specific timestamp formats
# - Recognizes custom log level naming
# - Learns service name patterns
# - Extracts domain-specific identifiers

# Custom Pattern Support:
# While automatic, you can influence detection:
# - Use consistent formatting for better recognition
# - Include context clues (brackets, colons, equals signs)
# - Maintain consistent field ordering
```

### Troubleshooting Format Detection

#### Common Issues and Solutions

**Field Not Extracted:**

```bash
# Issue: Expected field doesn't appear in Attributes panel
# Solutions:

# 1. Check log format consistency
# Ensure: {"user_id": "123"}  # Consistent naming
# Avoid: {"userId": "123"} then {"user_id": "123"}  # Mixed naming

# 2. Verify JSON validity
echo '{"test": "value"}' | gonzo  # Test with simple JSON

# 3. Check logfmt syntax
echo 'level=info msg="test message"' | gonzo  # Verify parsing

# 4. Review plain text patterns
# Use consistent timestamp and level formats
```

**Performance Issues:**

```bash
# Issue: Slow processing with format detection
# Solutions:

# 1. Check for malformed logs
# Invalid JSON can slow parsing significantly
grep -v '^{' json.log  # Find non-JSON lines in JSON logs

# 2. Optimize log structure
# Avoid deeply nested objects in JSON
# Use consistent field ordering in logfmt

# 3. Monitor resource usage
htop  # Check CPU/memory during processing

# 4. Adjust buffer sizes
gonzo -f logs.log --log-buffer=2000  # Smaller buffer for memory
gonzo -f logs.log --update-interval=5s  # Slower updates
```

**Mixed Format Confusion:**

```bash
# Issue: Wrong format detection on some lines
# Solutions:

# 1. Separate by format when possible
# JSON logs: gonzo -f app-json.log
# Plain text: gonzo -f app-text.log

# 2. Use format-specific tools for preprocessing
# Clean data before Gonzo analysis
grep '^{' mixed.log | gonzo  # JSON lines only

# 3. Verify line endings and encoding
file log-file.log  # Check file encoding
dos2unix log-file.log  # Fix Windows line endings
```

### Best Practices

#### 🎯 **Optimize Log Format Design**

1. **Choose appropriate formats** - JSON for rich structure, logfmt for simplicity, plain text for human readability
2. **Maintain consistency** - Use the same field names, timestamp formats, and log levels across your application
3. **Include context** - Always include request IDs, user IDs, and correlation identifiers
4. **Structure your data** - Group related fields in objects (JSON) or use consistent prefixes (logfmt)

#### 📊 **Maximize Analysis Value**

1. **Use structured logging** - JSON and logfmt provide much richer analysis than plain text
2. **Include performance metrics** - Add duration, memory usage, and other quantitative data
3. **Add business context** - Include user IDs, feature flags, and business-relevant information
4. **Maintain type consistency** - Use numbers for numeric data, booleans for flags

#### ⚡ **Performance Optimization**

1. **Validate JSON syntax** - Invalid JSON significantly slows processing
2. **Use consistent patterns** - Helps Gonzo optimize parsing strategies
3. **Avoid extreme nesting** - Deep JSON objects can impact performance
4. **Monitor resource usage** - Adjust buffer sizes based on log volume and system capacity

#### 🔍 **Debugging Format Issues**

1. **Test with small samples** - Verify format detection with simple examples
2. **Check encoding and line endings** - Ensure logs are UTF-8 with Unix line endings
3. **Monitor field extraction** - Verify expected fields appear in Attributes panel
4. **Use format-specific tools** - Validate JSON with `jq`, test logfmt patterns manually

### Real-World Format Examples

Learn how to create custom formats for popular tools:

* [Live Tailing Grafana Loki Logs with Gonzo](https://www.controltheory.com/blog/live-tailing-grafana-loki-logs-with-gonzo/) - Loki format integration

### What's Next?

Now that you understand format detection, explore how it enhances other Gonzo features:

* **Log Analysis** - Structured data improves pattern detection and analytics
* **AI Integration** - Better format detection provides richer context for AI analysis
* **Configuration** - Tune format detection settings for your specific logs
* **Integration Examples** - See format detection in real-world scenarios

Or start optimizing your log formats immediately:

```bash
# Test your current log formats
gonzo -f your-logs.log

# Check what fields are extracted in the Attributes panel
# Optimize your logging code based on what Gonzo detects
# Use structured formats (JSON/logfmt) for richer analysis
```

***

**You now understand how Gonzo intelligently processes any log format!** 🎯 From automatic JSON parsing to intelligent plain text recognition, format detection ensures you get maximum analytical value from any log source without manual configuration.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.controltheory.com/backup/format-detection.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
