Custom & Local Models

gptcgt is not locked to commercial providers. You can add your own local Ollama endpoints, vLLM instances, or any OpenAI-compatible API, and a single OpenRouter key unlocks its marketplace of 200+ hosted models.

OpenRouter Integration

The easiest way to access a wide variety of models is through OpenRouter. With a single API key, you get access to hundreds of models from every major provider.

  1. Get an API key from openrouter.ai
  2. Add it in the onboarding wizard or settings
  3. gptcgt automatically fetches the latest available models from OpenRouter's API
  4. Enable specific models in your settings under openrouter_active_models
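The fetch-then-enable flow above can be sketched in a few lines. The payload below is a trimmed sample mirroring the shape of OpenRouter's `GET /api/v1/models` response; the filtering helper and the specific model IDs are illustrative, not gptcgt internals.

```python
import json

# Sample payload in the shape of OpenRouter's GET /api/v1/models response,
# trimmed to the fields used here.
sample = json.loads("""
{"data": [
  {"id": "anthropic/claude-3.5-sonnet", "context_length": 200000},
  {"id": "meta-llama/llama-3-70b-instruct", "context_length": 8192},
  {"id": "openai/gpt-4o-mini", "context_length": 128000}
]}
""")

# Models the user enabled under openrouter_active_models in settings.
active = {"anthropic/claude-3.5-sonnet", "openai/gpt-4o-mini"}

def enabled_models(payload, active_ids):
    """Keep only the fetched models the user has explicitly enabled."""
    return [m for m in payload["data"] if m["id"] in active_ids]

models = enabled_models(sample, active)
print(sorted(m["id"] for m in models))
```

Everything not in `openrouter_active_models` is fetched but stays dormant, so new marketplace models never silently enter your routing pool.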

Custom Model Definitions

Add custom models to your global config:

# ~/.gptcgt/global.toml

[[custom_models]]
id = "custom/llama-3-70b"
name = "Local LLaMA 3 70B"
provider = "custom"
context_window = 8192
input_price_per_1k = 0.0    # Free if running locally
output_price_per_1k = 0.0
quality_tier = "standard"
supported_features = ["tools", "streaming"]

Connecting to Ollama

If you're running Ollama locally:

  1. Start Ollama: ollama serve
  2. Pull a model: ollama pull llama3
  3. Add the custom model config above with provider = "custom"
  4. Set the base URL environment variable:
    export CUSTOM_API_BASE=http://localhost:11434

The CustomAgent class routes requests through the same LiteLLM client, so all features (streaming, tool calls, token counting) work seamlessly.
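Because the backend speaks the OpenAI Chat Completions format, the request that ultimately reaches Ollama looks like the hand-built sketch below. This is illustrative only: the model name and prompt are placeholders, and it assumes a base URL without a trailing `/v1` (as in the Ollama export above).

```python
import json
import os

# Base URL as exported in step 4; the localhost default here is just for
# illustration when the variable is unset.
base = os.environ.get("CUSTOM_API_BASE", "http://localhost:11434")

# Minimal OpenAI-style chat completion request body.
body = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "stream": False,
}

# Ollama serves the OpenAI-compatible surface under /v1.
url = f"{base}/v1/chat/completions"
print(url)
print(json.dumps(body))
```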

vLLM & Other OpenAI-Compatible Servers

Any server that implements the OpenAI Chat Completions API works:

# Example for a vLLM server
export CUSTOM_API_BASE=http://your-gpu-server:8000/v1

Set the model ID in your custom definition to match the model name your server expects.
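Note that the vLLM export above already ends in /v1, so the chat endpoint is formed by appending only the resource path. A small sketch of that joining, with an illustrative model name (the `vllm serve` launch command is an example, not gptcgt output):

```python
import os

# Use the exported base if present; this value is the doc's example server.
os.environ.setdefault("CUSTOM_API_BASE", "http://your-gpu-server:8000/v1")
base = os.environ["CUSTOM_API_BASE"].rstrip("/")

# The base already contains /v1, so only the resource path is appended.
endpoint = f"{base}/chat/completions"

# The model field must match the name the server was started with,
# e.g. something like `vllm serve meta-llama/Meta-Llama-3-70B-Instruct`.
model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
print(endpoint, model_id)
```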

Model Registry

gptcgt maintains a ModelRegistry that catalogs all available models. On startup, it:

  1. Loads bundled model definitions from src/data/models.json
  2. Loads your custom models from ~/.gptcgt/global.toml
  3. Fetches live pricing from LiteLLM's pricing database (1.5s timeout)
  4. If you have an OpenRouter key, fetches available models from their API

Models are searchable by ID, provider, quality tier, and capability. The router considers all registered models when selecting the best one for your task.
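The load-then-search flow can be pictured with the hypothetical sketch below. The class and field names are illustrative, not the real ModelRegistry API; the key idea is that later sources (your custom config, OpenRouter) override earlier bundled entries by model ID.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    id: str
    provider: str
    quality_tier: str
    features: list = field(default_factory=list)

class Registry:
    """Hypothetical sketch of the registry's load-then-search behavior."""

    def __init__(self):
        self.models = {}

    def register(self, model):
        # Registered in order: bundled, custom, OpenRouter. A later source
        # with the same ID replaces the earlier entry.
        self.models[model.id] = model

    def search(self, provider=None, tier=None, feature=None):
        out = []
        for m in self.models.values():
            if provider and m.provider != provider:
                continue
            if tier and m.quality_tier != tier:
                continue
            if feature and feature not in m.features:
                continue
            out.append(m)
        return out

registry = Registry()
registry.register(Model("openai/gpt-4o", "openai", "max", ["tools", "streaming"]))
registry.register(Model("custom/llama-3-70b", "custom", "standard", ["tools", "streaming"]))

print([m.id for m in registry.search(tier="standard")])
```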

Quality Tiers

Every model is assigned a quality tier that determines when it gets selected:

  • budget — Fast and cheap (GPT-3.5, Gemini Flash)
  • standard — Good balance of quality and cost (GPT-4o mini, Claude 3.5 Haiku)
  • max — Best available quality (Claude 3.5 Sonnet, GPT-4o, Gemini 2.5 Pro)

Custom models default to the standard tier unless you set quality_tier explicitly in their definition.
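The tier ordering and the standard-tier default can be sketched as follows. The ranking function and model list are illustrative; the real router weighs more signals than tier alone.

```python
# Higher rank means higher quality (and usually higher cost).
TIER_RANK = {"budget": 0, "standard": 1, "max": 2}

def tier_of(model_def):
    # Custom models fall back to "standard" when quality_tier is omitted.
    return model_def.get("quality_tier", "standard")

def best_by_tier(models):
    """Pick the highest-tier model (a sketch, not the real router)."""
    return max(models, key=lambda m: TIER_RANK[tier_of(m)])

models = [
    {"id": "custom/llama-3-70b"},                               # no tier -> standard
    {"id": "gemini/gemini-flash", "quality_tier": "budget"},
    {"id": "anthropic/claude-3.5-sonnet", "quality_tier": "max"},
]
print(best_by_tier(models)["id"])  # picks the max-tier model
```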