# AI Providers Guide (v1)

Master each AI provider option for Gonzo. From cloud-based OpenAI to privacy-focused local models, this comprehensive guide helps you choose and configure the perfect AI solution for your needs.

{% hint style="info" %}
**Already completed basic setup?** This guide provides advanced configuration, optimization tips, and provider-specific best practices for production use.
{% endhint %}

### Provider Comparison Matrix

| Provider         | Setup Complexity | Privacy Level    | Cost Model    | Performance | Best Use Case                               |
| ---------------- | ---------------- | ---------------- | ------------- | ----------- | ------------------------------------------- |
| **OpenAI**       | Low              | Cloud            | Pay-per-use   | Excellent   | Production incidents, complex analysis      |
| **Ollama**       | Medium           | Complete         | Hardware only | Good        | Privacy-sensitive, unlimited usage          |
| **LM Studio**    | Low-Medium       | Complete         | Hardware only | Good        | Development, testing, experimentation       |
| **Azure OpenAI** | Medium           | Enterprise Cloud | Pay-per-use   | Excellent   | Enterprise compliance, hybrid cloud         |
| **Custom APIs**  | High             | Configurable     | Varies        | Varies      | Specialized models, existing infrastructure |

### OpenAI Provider Deep Dive

#### Model Selection Strategy

**Production Deployment:**

```bash
# Tier 1: Critical incidents (use best model)
export GONZO_PROD_MODEL="gpt-4"
alias gonzo-incident='gonzo --ai-model="$GONZO_PROD_MODEL"'

# Tier 2: Regular monitoring (balanced cost/performance)
export GONZO_MONITOR_MODEL="gpt-3.5-turbo"
alias gonzo-monitor='gonzo --ai-model="$GONZO_MONITOR_MODEL"'

# Tier 3: Development/testing (cost-optimized)
export GONZO_DEV_MODEL="gpt-3.5-turbo"
alias gonzo-dev='gonzo --ai-model="$GONZO_DEV_MODEL"'
```

#### Advanced OpenAI Configuration

**Cost Optimization Settings:**

```bash
# ~/.config/gonzo/openai-optimized.yml
ai-provider: "openai"
ai-model: "gpt-3.5-turbo"              # Default to cheaper model

# Context management for cost control
ai-context-size: 2000                  # Reduce token usage
ai-smart-context: true                 # Intelligent context trimming
ai-batch-requests: true                # Batch multiple questions

# Usage limits
ai-daily-token-limit: 100000          # Daily token budget
ai-warn-at-usage: 80000               # Warning threshold
ai-auto-downgrade: true                # Auto-switch to cheaper model when near limit
```

**Enterprise OpenAI Setup:**

```bash
# Enterprise organization configuration
export OPENAI_ORG_ID="org-your-organization-id"
export OPENAI_API_KEY="sk-your-enterprise-key"

# Advanced request configuration
export OPENAI_REQUEST_TIMEOUT=60       # Longer timeout for complex analysis
export OPENAI_MAX_RETRIES=5           # More retries for reliability
export OPENAI_BACKOFF_FACTOR=2        # Exponential backoff
```

#### OpenAI Model Characteristics

{% tabs %}
{% tab title="GPT-4 Family" %}
**GPT-4 (Recommended for Production)**

* **Cost:** $0.03/1K input tokens, $0.06/1K output
* **Context:** 8K tokens
* **Strengths:** Best reasoning, complex log analysis, accurate root cause identification
* **Best for:** Critical incidents, complex debugging, production monitoring

**GPT-4 Turbo**

* **Cost:** $0.01/1K input tokens, $0.03/1K output
* **Context:** 128K tokens
* **Strengths:** Large context, cost-effective, latest training data
* **Best for:** Large log files, comprehensive analysis, cost-sensitive production

```bash
# Production incident response
gonzo -f incident-logs.log --ai-model="gpt-4"

# Large log file analysis
gonzo -f huge-logfile.log --ai-model="gpt-4-turbo"
```

{% endtab %}

{% tab title="GPT-3.5 Family" %}
**GPT-3.5 Turbo (Best Balance)**

* **Cost:** $0.0015/1K input tokens, $0.002/1K output
* **Context:** 4K tokens
* **Strengths:** Fast, cost-effective, good quality
* **Best for:** Development, regular monitoring, routine analysis

**GPT-3.5 Turbo 16K**

* **Cost:** $0.003/1K input tokens, $0.004/1K output
* **Context:** 16K tokens
* **Strengths:** Larger context than standard, still affordable
* **Best for:** Medium-large log analysis, development with context needs

```bash
# Daily monitoring
gonzo -f /var/log/app.log --follow --ai-model="gpt-3.5-turbo"

# Development debugging
gonzo -f debug.log --ai-model="gpt-3.5-turbo-16k"
```

{% endtab %}

{% tab title="Usage Recommendations" %}
**Cost-Performance Matrix:**

| Use Case              | Recommended Model | Monthly Est. Cost\* |
| --------------------- | ----------------- | ------------------- |
| Development (daily)   | gpt-3.5-turbo     | $5-15               |
| Production monitoring | gpt-3.5-turbo     | $20-50              |
| Incident response     | gpt-4             | $10-30              |
| Large-scale analysis  | gpt-4-turbo       | $15-40              |

\*Based on typical log analysis usage patterns

**Model Switching Strategy:**

```bash
# Start with cheaper model, escalate as needed
gonzo -f logs.log --ai-model="gpt-3.5-turbo"
# If analysis needs more capability, press 'm' and switch to gpt-4
```

{% endtab %}
{% endtabs %}

#### OpenAI Rate Limiting and Quotas

**Understanding Rate Limits:**

```bash
# Check your current limits
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/dashboard/billing/subscription

# Monitor usage in real-time
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
     https://api.openai.com/v1/dashboard/billing/usage
```

**Handling Rate Limits:**

```bash
# Configure automatic retry with backoff
export OPENAI_MAX_RETRIES=5
export OPENAI_RETRY_DELAY=1

# Use batch processing for high-volume analysis
# Filter logs first to reduce API calls
tail -f /var/log/app.log | grep ERROR | gonzo --ai-model="gpt-3.5-turbo"
```

### Ollama Provider Deep Dive

#### Model Selection for Log Analysis

**Recommended Models by Use Case:**

| Model          | Size  | RAM Required | Quality     | Best For                                    |
| -------------- | ----- | ------------ | ----------- | ------------------------------------------- |
| **llama3:8b**  | 4.7GB | 8GB+         | Excellent   | General log analysis, production ready      |
| **llama3:70b** | 40GB  | 64GB+        | Outstanding | Complex analysis, enterprise use            |
| **mistral**    | 4.1GB | 8GB+         | Good        | Fast analysis, resource-constrained systems |
| **codellama**  | 3.8GB | 8GB+         | Good        | Technical logs, code-related issues         |
| **mixtral**    | 26GB  | 32GB+        | Excellent   | Complex reasoning, multi-language logs      |

#### Advanced Ollama Configuration

**Performance Optimization:**

```bash
# ~/.ollama/config.json
{
  "gpu_layers": 35,                    # Use GPU acceleration if available
  "context_length": 4096,              # Context size for analysis
  "batch_size": 512,                   # Batch processing size
  "threads": 8,                        # CPU threads for processing
  "numa": true                         # NUMA optimization
}
```

**Memory Management:**

```bash
# Conservative memory usage
export OLLAMA_MAX_LOADED_MODELS=1     # Only keep one model in memory
export OLLAMA_FLASH_ATTENTION=1       # Memory-efficient attention

# High-performance setup
export OLLAMA_MAX_LOADED_MODELS=3     # Keep multiple models loaded
export OLLAMA_GPU_LAYERS=35           # Use GPU acceleration
```

#### Multi-Model Ollama Setup

**Strategy: Different Models for Different Tasks**

```bash
# Download specialized models
ollama pull llama3:8b                 # General analysis
ollama pull codellama                 # Technical/code logs  
ollama pull mistral                   # Fast analysis
ollama pull mixtral                   # Complex reasoning

# Model selection automation
cat > ~/.config/gonzo/ollama-smart.yml << EOF
ai-provider: "ollama"
ai-auto-select: true

# Model preferences by log type
model-preferences:
  error-analysis: "llama3:8b"
  performance-analysis: "mixtral"
  security-analysis: "llama3:8b"
  code-analysis: "codellama"
  quick-analysis: "mistral"
EOF
```

**Smart Model Switching:**

```bash
#!/bin/bash
# ollama-smart-switch.sh

select_model_for_analysis() {
    local log_type="$1"
    
    case "$log_type" in
        "error"|"exception"|"failure")
            echo "llama3:8b"
            ;;
        "performance"|"slow"|"timeout")
            echo "mixtral"
            ;;
        "security"|"auth"|"access")
            echo "llama3:8b"
            ;;
        "code"|"compile"|"syntax")
            echo "codellama"
            ;;
        *)
            echo "mistral"  # Default fast model
            ;;
    esac
}

# Usage: gonzo -f logs.log --ai-model="$(select_model_for_analysis error)"
```

#### Ollama Performance Tuning

**System Optimization:**

```bash
# CPU optimization
export OMP_NUM_THREADS=8              # Match your CPU cores
export OLLAMA_NUM_PARALLEL=2          # Parallel requests

# Memory optimization
export OLLAMA_MAX_LOADED_MODELS=1     # Conservative memory usage
sudo sysctl vm.swappiness=10          # Reduce swapping

# GPU optimization (if available)
export OLLAMA_GPU_LAYERS=35           # Use GPU acceleration
export CUDA_VISIBLE_DEVICES=0         # Select specific GPU
```

**Monitoring Ollama Performance:**

```bash
# Monitor resource usage
watch -n 1 'ps aux | grep ollama | head -5'

# Check model loading time
time ollama run llama3:8b "test"

# Monitor GPU usage (if applicable)
nvidia-smi -l 1
```

### LM Studio Provider Deep Dive

#### Model Recommendations for LM Studio

**Balanced Models (8-16GB RAM):**

```bash
# In LM Studio, search and download:
- "microsoft/DialoGPT-medium"          # 1.5GB, fast responses
- "meta-llama/Llama-2-7b-chat-hf"     # 7GB, good quality
- "mistralai/Mistral-7B-Instruct-v0.1" # 7GB, excellent instruction following
```

**High-Performance Models (32GB+ RAM):**

```bash
# Premium models for powerful systems:
- "meta-llama/Llama-2-13b-chat-hf"    # 13GB, excellent reasoning
- "meta-llama/Llama-2-70b-chat-hf"    # 70GB, GPT-4 level quality
- "microsoft/WizardCoder-15B-V1.0"    # 15GB, specialized for code/logs
```

#### LM Studio Configuration

**Server Settings:**

```json
{
  "server": {
    "port": 1234,
    "host": "localhost",
    "cors": true,
    "max_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.9,
    "context_length": 4096
  }
}
```

**Model-Specific Tuning:**

```bash
# For log analysis, optimize for factual responses
# In LM Studio server settings:
# - Temperature: 0.3 (more factual, less creative)
# - Top-p: 0.8 (focused responses)
# - Max tokens: 1024 (concise analysis)
```

#### LM Studio Best Practices

**Model Management:**

```bash
# Keep models organized
# Create folders in LM Studio:
# - "Log Analysis" folder for specialized models
# - "General" folder for multipurpose models
# - "Fast" folder for quick response models

# Monitor disk space (models are large)
du -sh ~/LMStudio/models/

# Regular cleanup of unused models
# Remove old/unused models to free space
```

**Performance Optimization:**

```bash
# System settings for LM Studio
# Allocate more RAM to LM Studio process
# Close other applications when running large models
# Use SSD storage for model files

# Network optimization
# Ensure localhost networking is optimal
ping localhost
curl -o /dev/null -s -w "%{time_total}" http://localhost:1234/v1/models
```

### Azure OpenAI Service

#### Enterprise Setup

**Azure Resource Configuration:**

```bash
# Azure CLI setup
az login
az account set --subscription "your-subscription-id"

# Create Azure OpenAI resource
az cognitiveservices account create \
  --name "gonzo-openai" \
  --resource-group "your-rg" \
  --kind "OpenAI" \
  --sku "S0" \
  --location "eastus"

# Get endpoint and keys
az cognitiveservices account show \
  --name "gonzo-openai" \
  --resource-group "your-rg" \
  --query "properties.endpoint"

az cognitiveservices account keys list \
  --name "gonzo-openai" \
  --resource-group "your-rg"
```

**Gonzo Configuration for Azure:**

```bash
# Azure OpenAI environment variables
export AZURE_OPENAI_KEY="your-azure-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2023-12-01-preview"

# Configure Gonzo for Azure
export OPENAI_API_KEY="$AZURE_OPENAI_KEY"
export OPENAI_API_BASE="$AZURE_OPENAI_ENDPOINT"
export OPENAI_API_TYPE="azure"
export OPENAI_API_VERSION="$AZURE_OPENAI_API_VERSION"
```

**Model Deployment in Azure:**

```bash
# Deploy models in Azure OpenAI Studio
# 1. Visit Azure OpenAI Studio
# 2. Go to "Deployments" section  
# 3. Create new deployment:
#    - Model: gpt-35-turbo or gpt-4
#    - Deployment name: gonzo-analysis
#    - Version: Latest

# Use deployed model
gonzo -f logs.log --ai-model="gonzo-analysis"
```

#### Azure-Specific Features

**Private Endpoints:**

```bash
# Configure private endpoint for enhanced security
az network private-endpoint create \
  --resource-group "your-rg" \
  --name "gonzo-openai-pe" \
  --vnet-name "your-vnet" \
  --subnet "your-subnet" \
  --private-connection-resource-id "/subscriptions/.../openai-resource" \
  --connection-name "gonzo-connection"
```

**Managed Identity:**

```bash
# Use managed identity for authentication
export AZURE_CLIENT_ID="your-managed-identity-id"
export AZURE_USE_MANAGED_IDENTITY="true"

# No API key needed with managed identity
unset OPENAI_API_KEY
```

### Custom API Providers

#### AWS Bedrock Integration

**Setup with Bedrock Proxy:**

```bash
# Use bedrock-proxy for OpenAI compatibility
# Install: https://github.com/yourcompany/bedrock-proxy

# Configuration
export AWS_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"

# Proxy configuration
export OPENAI_API_KEY="bedrock-proxy-key"
export OPENAI_API_BASE="https://your-bedrock-proxy.amazonaws.com/v1"

# Available Bedrock models through proxy
gonzo -f logs.log --ai-model="claude-3-sonnet"
gonzo -f logs.log --ai-model="llama2-70b"
```

#### Google Cloud Vertex AI

**Vertex AI Proxy Setup:**

```bash
# Use vertex-ai-proxy for OpenAI compatibility
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"

# Proxy configuration
export OPENAI_API_KEY="vertex-proxy-key"
export OPENAI_API_BASE="https://your-vertex-proxy.googleapis.com/v1"

# Available models
gonzo -f logs.log --ai-model="text-bison"
gonzo -f logs.log --ai-model="chat-bison"
```

#### Self-Hosted Models

**Hugging Face Transformers:**

```bash
# Self-hosted API using transformers
# Deploy with: https://github.com/huggingface/text-generation-inference

export OPENAI_API_KEY="hf-local-key"
export OPENAI_API_BASE="http://your-hf-server:8080/v1"

# Available self-hosted models
gonzo -f logs.log --ai-model="CodeLlama-34B-Instruct"
gonzo -f logs.log --ai-model="Mistral-7B-Instruct"
```

### Provider Selection Decision Tree

#### Choose Based on Your Needs

```bash
# Decision flowchart:

# Data Privacy Critical?
if [ "$PRIVACY_CRITICAL" = "yes" ]; then
    # Use local models
    if [ "$SYSTEM_RAM" -gt 16 ]; then
        echo "Use Ollama with llama3:8b or larger"
    else
        echo "Use LM Studio with smaller models"
    fi
    
# Cost Sensitivity High?
elif [ "$COST_SENSITIVE" = "yes" ]; then
    if [ "$USAGE_VOLUME" = "high" ]; then
        echo "Use Ollama for unlimited usage"
    else
        echo "Use OpenAI gpt-3.5-turbo"
    fi
    
# Enterprise Compliance Required?
elif [ "$ENTERPRISE" = "yes" ]; then
    echo "Use Azure OpenAI or custom enterprise API"
    
# Maximum Quality Needed?
else
    echo "Use OpenAI gpt-4"
fi
```

#### Multi-Provider Strategy

**Hybrid Approach:**

```bash
# ~/.config/gonzo/multi-provider.yml
providers:
  development:
    provider: "ollama"
    model: "mistral"
    endpoint: "http://localhost:11434"
    
  monitoring:
    provider: "openai"
    model: "gpt-3.5-turbo"
    cost_limit: 50  # Monthly USD limit
    
  incidents:
    provider: "openai"  
    model: "gpt-4"
    priority: "high"
    
  enterprise:
    provider: "azure"
    model: "gpt-4"
    endpoint: "https://company.openai.azure.com/"
```

**Provider Fallback Chain:**

```bash
#!/bin/bash
# provider-fallback.sh

try_providers() {
    local log_file="$1"
    
    # Try primary provider (OpenAI)
    if gonzo -f "$log_file" --ai-model="gpt-4" --timeout=30; then
        return 0
    fi
    
    # Fallback to local Ollama
    echo "OpenAI failed, trying Ollama..."
    if gonzo -f "$log_file" --ai-model="llama3" --timeout=60; then
        return 0
    fi
    
    # Final fallback to LM Studio
    echo "Ollama failed, trying LM Studio..."
    OPENAI_API_BASE="http://localhost:1234/v1" \
    gonzo -f "$log_file" --timeout=60
}
```

### Performance Comparison

#### Benchmark Results

**Analysis Speed (avg response time):**

| Provider  | Model         | Simple Query | Complex Analysis | Large Context |
| --------- | ------------- | ------------ | ---------------- | ------------- |
| OpenAI    | gpt-4         | 2-3s         | 8-12s            | 15-25s        |
| OpenAI    | gpt-3.5-turbo | 1-2s         | 3-5s             | 8-12s         |
| Ollama    | llama3:8b     | 5-8s         | 15-25s           | 30-45s        |
| Ollama    | mistral       | 3-5s         | 10-15s           | 20-30s        |
| LM Studio | Various       | 4-10s        | 12-30s           | 25-60s        |

**Quality Assessment (log analysis accuracy):**

| Provider | Model         | Technical Accuracy | Context Understanding | Actionable Insights |
| -------- | ------------- | ------------------ | --------------------- | ------------------- |
| OpenAI   | gpt-4         | 95%                | 90%                   | 85%                 |
| OpenAI   | gpt-3.5-turbo | 85%                | 80%                   | 75%                 |
| Ollama   | llama3:8b     | 80%                | 75%                   | 70%                 |
| Ollama   | mistral       | 75%                | 70%                   | 65%                 |

#### Resource Usage

**Memory Requirements:**

```bash
# OpenAI: Minimal local memory (API-based)
# Memory usage: ~50MB (Gonzo only)

# Ollama: Model-dependent memory
# llama3:8b: ~8GB RAM
# mistral: ~6GB RAM  
# mixtral: ~24GB RAM

# LM Studio: Model-dependent memory
# Similar to Ollama but with GUI overhead
# Add ~500MB for LM Studio application

# Monitor actual usage:
ps aux | grep -E "(gonzo|ollama|lmstudio)" | awk '{print $4, $11}'
```

### Best Practices by Provider

#### OpenAI Best Practices

✅ **Do:**

* Use gpt-3.5-turbo for development and routine analysis
* Reserve gpt-4 for complex incidents and production issues
* Monitor API usage and costs regularly
* Implement token budgets and alerts
* Use specific, targeted questions for better responses

❌ **Don't:**

* Send sensitive data without understanding OpenAI's data policies
* Use gpt-4 for simple queries that gpt-3.5-turbo can handle
* Ignore rate limits and quotas
* Include unnecessary context that increases token usage

#### Ollama Best Practices

✅ **Do:**

* Keep Ollama service running as a daemon
* Use appropriate model sizes for your hardware
* Monitor system resources during model loading
* Download models during off-peak hours
* Use GPU acceleration when available

❌ **Don't:**

* Load multiple large models simultaneously without sufficient RAM
* Ignore model update notifications
* Run Ollama on systems with insufficient memory
* Use CPU-only inference for large models

#### LM Studio Best Practices

✅ **Do:**

* Organize models in logical folders
* Test models before production use
* Monitor disk space for model storage
* Use appropriate model settings for log analysis
* Keep LM Studio updated

❌ **Don't:**

* Download models without checking system requirements
* Run multiple models simultaneously without adequate resources
* Ignore model performance metrics
* Use default settings without optimization

### What's Next?

Now that you understand all AI provider options, learn how to use them effectively:

* **Using AI Features** - Master AI-powered workflows and practical usage patterns
* **Log Analysis** - Combine AI insights with algorithmic analysis
* **Configuration** - Set up provider-specific configurations

Or start using your chosen provider immediately:

```bash
# Quick provider test
gonzo -f your-logs.log --ai-model="your-chosen-model"

# Try AI features:
# - Press 'i' for log analysis
# - Press 'c' for interactive chat  
# - Press 'm' to compare different models
```

***

**You now have complete mastery over AI provider selection and configuration!** 🚀 Whether you choose cloud-based APIs for maximum quality or local models for privacy and cost control, you can optimize your AI setup for any scenario.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.controltheory.com/backup/ai-providers-guide-v1.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
