ELO Arena

gptcgt includes a built-in competitive ranking system for AI models. Every time models compete in Ensemble or Battle mode, their ELO ratings are updated — just like chess rankings. Over time, the system learns which models perform best and routes tasks to them more often.

How ELO Works

The ELO rating system was originally designed for chess. In gptcgt, it works the same way:

Every model starts at 1200 ELO
When a model wins a head-to-head comparison (Ensemble or Battle), its rating goes up
The losing model's rating goes down
If a weak model beats a strong model, the point swing is larger (upset bonus)
If a strong model beats a weak model, the swing is smaller (expected outcome)
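
The rules above match the standard Elo update, which can be sketched as follows. This is a minimal illustration, not gptcgt's actual code: the K-factor of 32, the 400-point scale, and the function names `expected_score` and `update_pair` are assumptions borrowed from common chess practice. Only the 1200 starting rating comes from this page.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_pair(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return (new_winner, new_loser) after one head-to-head result.

    k is an assumed K-factor; gptcgt's real value is not documented here.
    """
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Two fresh models at the 1200 starting rating: the swing is K/2.
print(update_pair(1200, 1200))  # (1216.0, 1184.0)

# Upset (1100 beats 1300) versus expected result (1300 beats 1100).
upset_gain = update_pair(1100, 1300)[0] - 1100
expected_gain = update_pair(1300, 1100)[0] - 1300
print(upset_gain > expected_gain)  # True
```

The less likely the result, the closer the swing gets to the full K value, which is where the upset bonus comes from.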

When Matches Happen

Ensemble Mode — 3 models compete. The Arbiter picks the winner. The 2 losers each take an ELO hit.
Battle Mode — 2 models compete. You manually select the winner.

Multi-Way Dampening

In Ensemble mode, one model beats the other 2 in a single match, so applying a full pairwise gain for each loser would inflate the winner's rating. To keep ratings stable over time, the system dampens the winner's gain by dividing the delta by a factor based on the number of losers.
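
One way the dampening could work is sketched below. It assumes each loser takes its full pairwise hit while the winner receives the summed gain divided by the number of losers. The exact factor gptcgt applies is not stated on this page, and `update_ensemble` and the K-factor of 32 are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ensemble(winner: float, losers: list[float], k: float = 32.0):
    """Return (new_winner, new_losers) for a one-versus-many result."""
    # Pairwise gain the winner would earn against each loser separately.
    raw_gains = [k * (1.0 - expected_score(winner, r)) for r in losers]
    # Each loser takes its own pairwise hit.
    new_losers = [r - g for r, g in zip(losers, raw_gains)]
    # Assumed dampening: divide the summed gain by the number of losers.
    new_winner = winner + sum(raw_gains) / len(losers)
    return new_winner, new_losers


# Three fresh models: the winner gains 16 rather than 32.
print(update_ensemble(1200, [1200, 1200]))  # (1216.0, [1184.0, 1184.0])
```

Under this assumption the update is not zero-sum: the losers give up more points in total than the winner gains.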

How Routing Uses ELO

When the router selects a model for your task, it uses ELO as a tiebreaker:

First, it filters models by your quality tier (Standard, Max, etc.)
Then, it filters by task complexity
Among the remaining candidates, higher ELO models are preferred
Cost is used as a secondary tiebreaker — if two models have similar ELO, the cheaper one wins

This means the more you use gptcgt, the better informed its model selection becomes, because the ratings reflect match results from your own projects and coding style.
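
The selection order described above can be sketched as a filter followed by two tiebreakers. Everything named here is hypothetical: the `Candidate` fields, the `pick_model` function, and the 25-point `elo_band` used to decide when two ratings count as "similar" are assumptions for illustration, not gptcgt's router.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    tier: str              # quality tier, e.g. "Standard" or "Max"
    max_complexity: int    # highest task complexity the model is allowed to take
    elo: float
    cost_per_1k: float     # assumed cost unit


def pick_model(candidates: list[Candidate], tier: str, complexity: int,
               elo_band: float = 25.0) -> Optional[Candidate]:
    # Steps 1 and 2: filter by quality tier, then by task complexity.
    pool = [c for c in candidates
            if c.tier == tier and c.max_complexity >= complexity]
    if not pool:
        return None
    # Step 3: prefer higher ELO; keep everything close to the top rating.
    best = max(c.elo for c in pool)
    near_top = [c for c in pool if best - c.elo <= elo_band]
    # Step 4: among similar ratings, the cheaper model wins.
    return min(near_top, key=lambda c: (c.cost_per_1k, -c.elo))


models = [
    Candidate("model-a", "Max", 3, 1260.0, 0.03),
    Candidate("model-b", "Max", 3, 1250.0, 0.01),
    Candidate("model-c", "Standard", 3, 1400.0, 0.005),
    Candidate("model-d", "Max", 1, 1300.0, 0.002),
]
print(pick_model(models, tier="Max", complexity=2).name)  # model-b
```

Here model-c is excluded by tier and model-d by complexity. model-a and model-b are within 25 points of each other, so the cheaper model-b is chosen.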

Leaderboard

ELO ratings, match counts, win rates, and total spend per model are stored in a SQLite database at ~/.gptcgt/elo.db. The leaderboard is visible in the application's settings panel.

Data Tracked

Field	Description
ELO Rating	Current competitive rating
Matches Won	Total head-to-head wins
Matches Lost	Total head-to-head losses
Win Rate	Percentage of matches won
Total Spent	Cumulative $ spent on this model
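
To show how these fields relate, the sketch below builds a leaderboard query against an in-memory SQLite database. The table name `model_stats`, its column names, and the sample rows are invented for illustration. The real schema of ~/.gptcgt/elo.db is not documented on this page. Win rate is derived from wins and losses rather than stored.

```python
import sqlite3

# In-memory stand-in; the real database lives at ~/.gptcgt/elo.db.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE model_stats (          -- hypothetical schema
        model       TEXT PRIMARY KEY,
        elo         REAL    NOT NULL DEFAULT 1200,
        wins        INTEGER NOT NULL DEFAULT 0,
        losses      INTEGER NOT NULL DEFAULT 0,
        total_spent REAL    NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO model_stats VALUES (?, ?, ?, ?, ?)",
    [
        ("model-a", 1248.0, 6, 2, 1.75),   # made-up sample rows
        ("model-b", 1184.0, 1, 3, 0.40),
        ("model-c", 1200.0, 0, 0, 0.00),
    ],
)


def leaderboard(db: sqlite3.Connection) -> list[tuple]:
    """Rows of (model, elo, wins, losses, win_rate_percent, total_spent)."""
    return db.execute("""
        SELECT model, elo, wins, losses,
               CASE WHEN wins + losses = 0 THEN NULL
                    ELSE 100.0 * wins / (wins + losses) END AS win_rate,
               total_spent
        FROM model_stats
        ORDER BY elo DESC
    """).fetchall()


for row in leaderboard(conn):
    print(row)
```

A model with no matches yet gets a NULL win rate instead of a misleading 0%.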