Custom & Local Models
gptcgt is not locked to commercial providers. You can add your own local Ollama endpoints, vLLM instances, or any OpenAI-compatible API — including models from OpenRouter's marketplace of 200+ models.
OpenRouter Integration
The easiest way to access a wide variety of models is through OpenRouter. With a single API key, you get access to hundreds of models from every major provider.
- Get an API key from openrouter.ai
- Add it in the onboarding wizard or settings
- gptcgt automatically fetches the latest available models from OpenRouter's API
- Enable specific models in your settings under `openrouter_active_models`
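As an illustration, the enabled-model list could look like this in your global config (the exact shape of this key is an assumption; the model IDs are placeholders):

```toml
# ~/.gptcgt/global.toml (hypothetical model IDs)
openrouter_active_models = [
  "anthropic/claude-3.5-sonnet",
  "google/gemini-flash-1.5",
]
```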
Custom Model Definitions
Add custom models to your global config:
```toml
# ~/.gptcgt/global.toml
[[custom_models]]
id = "custom/llama-3-70b"
name = "Local LLaMA 3 70B"
provider = "custom"
context_window = 8192
input_price_per_1k = 0.0  # Free if running locally
output_price_per_1k = 0.0
quality_tier = "standard"
supported_features = ["tools", "streaming"]
```

Connecting to Ollama
If you're running Ollama locally:
- Start Ollama: `ollama serve`
- Pull a model: `ollama pull llama3`
- Add the custom model config above with `provider = "custom"`
- Set the base URL environment variable: `export CUSTOM_API_BASE=http://localhost:11434`
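Ollama also exposes an OpenAI-compatible endpoint under `/v1`, so a chat request against it can be sketched with just the standard library. The helper names below are illustrative, not part of gptcgt:

```python
import json
import os
import urllib.request


def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(prompt: str, model: str = "llama3", timeout: int = 60) -> str:
    """POST to the OpenAI-compatible server named by CUSTOM_API_BASE."""
    base = os.environ.get("CUSTOM_API_BASE", "http://localhost:11434")
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Any client that speaks this request shape works against the same endpoint, which is what makes the `CUSTOM_API_BASE` approach provider-agnostic.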
The `CustomAgent` class routes requests through the same LiteLLM client, so all features (streaming, tool calls, token counting) work seamlessly.
vLLM & Other OpenAI-Compatible Servers
Any server that implements the OpenAI Chat Completions API format works:
```shell
# Example for a vLLM server
export CUSTOM_API_BASE=http://your-gpu-server:8000/v1
```

Set the model ID in your custom definition to match the model name your server expects.
Model Registry
gptcgt maintains a `ModelRegistry` that catalogs all available models. On startup, it:
- Loads bundled model definitions from `src/data/models.json`
- Loads your custom models from `~/.gptcgt/global.toml`
- Fetches live pricing from LiteLLM's pricing database (1.5s timeout)
- If you have an OpenRouter key, fetches available models from their API
Models are searchable by ID, provider, quality tier, and capability. The router considers all registered models when selecting the best one for your task.
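A registry with that kind of searchability can be sketched as follows. This is a minimal illustration of the idea, not gptcgt's actual `ModelRegistry` implementation:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelDef:
    id: str
    provider: str
    quality_tier: str = "standard"  # budget | standard | max
    supported_features: List[str] = field(default_factory=list)


class ModelRegistry:
    """Catalog of models, searchable by provider, tier, and capability."""

    def __init__(self) -> None:
        self._models: dict = {}

    def register(self, model: ModelDef) -> None:
        self._models[model.id] = model

    def get(self, model_id: str) -> Optional[ModelDef]:
        return self._models.get(model_id)

    def find(
        self,
        provider: Optional[str] = None,
        tier: Optional[str] = None,
        feature: Optional[str] = None,
    ) -> List[ModelDef]:
        """Return all models matching every given filter."""
        results = []
        for m in self._models.values():
            if provider is not None and m.provider != provider:
                continue
            if tier is not None and m.quality_tier != tier:
                continue
            if feature is not None and feature not in m.supported_features:
                continue
            results.append(m)
        return results
```

A router built on top of this can call `find(tier=..., feature=...)` to narrow the candidate set before picking a model for the task.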
Quality Tiers
Every model is assigned a quality tier that determines when it gets selected:
- budget — Fast and cheap (GPT-3.5, Gemini Flash)
- standard — Good balance of quality and cost (GPT-4o mini, Claude 3.5 Haiku)
- max — Best available quality (Claude 3.5 Sonnet, GPT-4o, Gemini 2.5 Pro)
Custom models default to standard tier unless you specify otherwise.
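To place a local model in a different tier, set `quality_tier` explicitly in its definition (the entry below is hypothetical):

```toml
[[custom_models]]
id = "custom/llama-3-405b"   # hypothetical local deployment
name = "Local LLaMA 3 405B"
provider = "custom"
context_window = 8192
quality_tier = "max"         # opt this model into max-tier selection
```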