Autonomous Mode

Autonomous mode lets you give gptcgt a high-level goal — like “Build a REST API for user management” — and walk away. The system plans, implements, tests, and iterates without human intervention, pausing only when it needs your input or reaches a safety boundary.

How It Works

  1. Plan Generation — The orchestrator drafts a project plan with numbered subtasks, stored in .gptcgt/phase.md
  2. Subtask Execution Loop — Each subtask flows through the full DAG pipeline: intent analysis → context gathering → model routing → code generation → testing → arbiter verification
  3. Self-Healing — If a test fails, the TesterAgent regenerates the code (up to 3 attempts) using the failure output as feedback
  4. Phase Tracking — Progress is written to .gptcgt/phase.md in real time, so the AI always knows which subtask it is on
  5. Completion Summary — When all subtasks finish, you get a summary of what was done, what failed, and what it cost
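The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not gptcgt's implementation: `Subtask`, `RunSummary`, `run_autonomous`, and `write_phase_file` are hypothetical names, the `generate` and `test` callables stand in for the full DAG pipeline and the TesterAgent, the checkbox format written to the phase file is invented for the example, and "up to 3 attempts" is read as three attempts in total.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# Assumed reading of "up to 3 attempts": three generation attempts in total.
MAX_HEAL_ATTEMPTS = 3


@dataclass
class Subtask:
    number: int
    description: str
    status: str = "pending"  # becomes "done" or "failed"


@dataclass
class RunSummary:
    done: List[int] = field(default_factory=list)
    failed: List[int] = field(default_factory=list)
    cost: float = 0.0


def write_phase_file(path: Path, subtasks: List[Subtask]) -> None:
    """Rewrite the (hypothetical) phase file with one checkbox line per subtask."""
    lines = [
        f"- [{'x' if t.status == 'done' else ' '}] {t.number}. {t.description} ({t.status})"
        for t in subtasks
    ]
    path.write_text("\n".join(lines) + "\n")


Generate = Callable[[Subtask, Optional[str]], Tuple[str, float]]  # returns (code, cost)
RunTests = Callable[[str], Tuple[bool, str]]                      # returns (passed, output)


def run_autonomous(subtasks: List[Subtask], generate: Generate, test: RunTests,
                   phase_path: Path) -> RunSummary:
    summary = RunSummary()
    for task in subtasks:
        feedback: Optional[str] = None
        for _ in range(MAX_HEAL_ATTEMPTS):
            code, cost = generate(task, feedback)  # stands in for the whole DAG pipeline
            summary.cost += cost
            passed, output = test(code)
            if passed:
                task.status = "done"
                break
            feedback = output  # failure output is fed back into the next attempt
        else:
            task.status = "failed"  # every attempt was used up
        (summary.done if task.status == "done" else summary.failed).append(task.number)
        write_phase_file(phase_path, subtasks)  # progress is persisted after each subtask
    return summary


def fake_generate(task: Subtask, feedback: Optional[str]) -> Tuple[str, float]:
    # Subtask 1 gets fixed once it sees failure feedback; subtask 2 never does.
    return ("fixed" if feedback and task.number == 1 else "broken"), 0.05


def fake_test(code: str) -> Tuple[bool, str]:
    return (True, "") if code == "fixed" else (False, "AssertionError: expected 201")


with tempfile.TemporaryDirectory() as tmp:
    phase = Path(tmp) / "phase.md"
    tasks = [Subtask(1, "Create user model"), Subtask(2, "Add password reset")]
    summary = run_autonomous(tasks, fake_generate, fake_test, phase)
    phase_text = phase.read_text()
```

In this run, subtask 1 passes on its second attempt and subtask 2 is marked failed after three, which is the information the completion summary reports alongside the accumulated cost.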

Safety Boundaries

Autonomous mode has 4 independent safety checks that prevent runaway execution:

1. Budget Guard

Before every subtask, the system checks total spend against your configured limits. If spending exceeds either limit, execution pauses immediately.

# In ~/.gptcgt/global.toml
max_spend_per_task = 2.0    # $2 max per individual task
daily_spend_limit = 10.0    # $10/day hard stop (BYOK mode)
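As a rough sketch, the pre-subtask check amounts to comparing two running totals with the two keys above. `BudgetExceeded` and `check_budget` are hypothetical names, spend is assumed to be tracked in dollars, and the sketch ignores the fact that the daily limit applies in BYOK mode only.

```python
class BudgetExceeded(Exception):
    """Raised to pause the autonomous loop before the next subtask starts."""


def check_budget(task_spend: float, daily_spend: float, config: dict) -> None:
    """Compare current spend (in dollars) against the two limits from global.toml."""
    per_task = config["max_spend_per_task"]
    daily = config["daily_spend_limit"]
    if task_spend > per_task:
        raise BudgetExceeded(f"task spend ${task_spend:.2f} exceeds ${per_task:.2f} per task")
    if daily_spend > daily:
        raise BudgetExceeded(f"daily spend ${daily_spend:.2f} exceeds ${daily:.2f}/day")


config = {"max_spend_per_task": 2.0, "daily_spend_limit": 10.0}
check_budget(1.50, 8.00, config)  # within both limits, so the loop continues

try:
    check_budget(2.10, 8.00, config)  # the current task is over its $2 cap
except BudgetExceeded as exc:
    pause_reason = str(exc)
```

Raising an exception rather than returning a flag makes it hard for a caller to ignore the limit by accident.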

2. Iteration Cap

A hard ceiling on how many subtasks the autonomous loop will execute before pausing for your review.

# In ~/.gptcgt/global.toml
max_autonomous_iterations = 50
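The cap behaves like a counter on the subtask loop. The sketch below assumes one iteration equals one subtask, as described above; `run_with_iteration_cap` and its return shape are hypothetical.

```python
from typing import Callable, List, Tuple


def run_with_iteration_cap(subtasks: List[str], execute: Callable[[str], None],
                           max_iterations: int) -> Tuple[List[str], List[str]]:
    """Run subtasks until the cap is hit; return (completed, pending_review)."""
    completed: List[str] = []
    for index, subtask in enumerate(subtasks):
        if index >= max_iterations:
            # Hard ceiling reached: stop and hand the remaining subtasks back for review.
            return completed, subtasks[index:]
        execute(subtask)
        completed.append(subtask)
    return completed, []


log: List[str] = []
done, pending = run_with_iteration_cap(["models", "routes", "tests", "docs"], log.append,
                                       max_iterations=2)
```

Here only the first two subtasks run; the other two are returned untouched so you can review them before resuming.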

3. Token Cap

Each individual task is capped at 500,000 tokens by default. This prevents a single task from consuming an unexpectedly large number of tokens when it processes large files.

max_tokens_per_task = 500000
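One way to enforce such a cap is to check each request before it is sent, so an oversized prompt is refused instead of billed. This is a hedged sketch: `TokenBudget`, `reserve`, and `TokenCapExceeded` are hypothetical, and the token estimate is assumed to come from elsewhere. Only the 500,000 default comes from this page.

```python
class TokenCapExceeded(Exception):
    """Raised when a request would push the task past max_tokens_per_task."""


class TokenBudget:
    def __init__(self, max_tokens_per_task: int = 500_000) -> None:
        self.limit = max_tokens_per_task  # default mirrors the documented 500,000
        self.used = 0

    def reserve(self, estimated_tokens: int) -> int:
        """Check a request before it is sent; return the tokens left afterwards."""
        projected = self.used + estimated_tokens
        if projected > self.limit:
            # Refuse the call instead of letting one huge prompt blow through the cap.
            raise TokenCapExceeded(f"{projected} tokens would exceed the {self.limit}-token cap")
        self.used = projected
        return self.limit - self.used


budget = TokenBudget()
remaining = budget.reserve(120_000)  # a normal request
try:
    budget.reserve(400_000)  # e.g. a very large file pulled into context
except TokenCapExceeded as exc:
    blocked = str(exc)
```

Because the rejected request is never recorded, the task still has its remaining 380,000 tokens available for smaller requests.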

4. User Cancellation

Press Ctrl+C or Escape at any time. The system cancels the current subtask gracefully and preserves all work done so far.
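Graceful cancellation can be illustrated with Python's `asyncio`, assuming (purely for illustration) an asyncio-based runtime. Key handling for Ctrl+C and Escape is not shown; the keypress is simulated with `Task.cancel()`. `run_subtask` and `autonomous_loop` are hypothetical names.

```python
import asyncio
from typing import List


async def run_subtask(name: str, completed: List[str]) -> None:
    await asyncio.sleep(0.01)  # stand-in for model calls, diffs, and tests
    completed.append(name)     # recorded only after the subtask fully finishes


async def autonomous_loop(subtasks: List[str], completed: List[str]) -> None:
    try:
        for name in subtasks:
            await run_subtask(name, completed)
    except asyncio.CancelledError:
        # The in-flight subtask is abandoned; everything in `completed` is kept.
        print(f"Cancelled; preserved {len(completed)} completed subtask(s)")
        raise  # re-raise so the task is correctly marked as cancelled


async def main() -> List[str]:
    completed: List[str] = []
    runner = asyncio.create_task(autonomous_loop(["a", "b", "c", "d"], completed))
    while len(completed) < 2:  # wait until two subtasks are done...
        await asyncio.sleep(0)
    runner.cancel()            # ...then simulate the user pressing Ctrl+C or Escape
    try:
        await runner
    except asyncio.CancelledError:
        pass
    return completed


completed = asyncio.run(main())
```

Subtask "c" is interrupted mid-flight and never recorded, while "a" and "b" survive the cancellation.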

Agent Communication

In autonomous mode, multiple agents collaborate via a PubSub message bus:

  • Orchestrator — Plans and coordinates the overall workflow
  • Coder Agent — Generates code changes as unified diffs
  • Tester Agent — Generates and runs tests in an isolated sandbox
  • Arbiter — Scores and validates each change before approval

Agents communicate through structured AgentMessage objects, each with a type, sender, recipient, and payload. Messages appear in the bottom-left log panel, so you can see exactly what each agent is doing.
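A minimal in-process version of this pattern might look like the following. Only the four AgentMessage fields come from this page; `MessageBus`, the message type strings, and the list standing in for the log panel are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class AgentMessage:
    type: str  # e.g. "diff_ready" (illustrative value)
    sender: str
    recipient: str
    payload: Dict[str, Any]


class MessageBus:
    """Minimal publish/subscribe bus that routes messages by recipient name."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[AgentMessage], None]]] = {}
        self.log: List[str] = []  # stands in for the bottom-left log panel

    def subscribe(self, agent_name: str, handler: Callable[[AgentMessage], None]) -> None:
        self._subscribers.setdefault(agent_name, []).append(handler)

    def publish(self, message: AgentMessage) -> None:
        self.log.append(f"{message.sender} -> {message.recipient}: {message.type}")
        for handler in self._subscribers.get(message.recipient, []):
            handler(message)


bus = MessageBus()
received: List[AgentMessage] = []
bus.subscribe("tester", received.append)
bus.publish(AgentMessage("diff_ready", "coder", "tester", {"file": "api/users.py"}))
```

Logging inside `publish` means every message is visible regardless of whether any agent is subscribed to receive it.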

Agent Memory

The system maintains memory across sessions:

  • .gptcgt/phase.md — Project file map with line counts, modification dates, and development phases
  • .gptcgt/project.md — Auto-detected tech stack summary (language, framework, test runner, linter)
  • .gptcgt/memory.json — Telemetry entries recording which models were used, costs, and success rates
  • .gptcgt/agents/tester.md — The TesterAgent's memory file of failure patterns it has learned from

This memory helps the agents avoid repeating past mistakes and lets them understand the project's structure without re-analyzing it every session.
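The exact schema of .gptcgt/memory.json isn't documented on this page. Assuming, hypothetically, a JSON array of entries with `model`, `cost`, and `success` fields, per-model success rates could be derived like this (`summarize_telemetry` and the model names are illustrative):

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict


def summarize_telemetry(path: Path) -> Dict[str, dict]:
    """Aggregate hypothetical memory.json entries into per-model statistics."""
    entries = json.loads(path.read_text())
    stats: Dict[str, dict] = defaultdict(lambda: {"runs": 0, "successes": 0, "cost": 0.0})
    for entry in entries:
        s = stats[entry["model"]]
        s["runs"] += 1
        s["successes"] += int(entry["success"])
        s["cost"] += entry["cost"]
    return {model: {**s, "success_rate": s["successes"] / s["runs"]} for model, s in stats.items()}


sample = [
    {"model": "model-a", "cost": 0.10, "success": True},
    {"model": "model-a", "cost": 0.20, "success": False},
    {"model": "model-b", "cost": 0.05, "success": True},
]
with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.json"
    memory_file.write_text(json.dumps(sample))
    report = summarize_telemetry(memory_file)
```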

Crash Recovery

If gptcgt crashes during an autonomous run, your work isn't lost:

  • A PID-based running.lock file lets gptcgt detect the crash on the next startup
  • State is auto-saved atomically to .gptcgt/recovery/state.json
  • Unapplied diffs are backed up to .gptcgt/recovery/diffs/
  • On restart, gptcgt offers to resume from where it left off
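The two mechanisms above are standard techniques: a lock file holding the owner's PID, and an atomic write via a temporary file plus `os.replace`. The sketch below shows both under assumptions: it is POSIX-only (`os.kill(pid, 0)` as a liveness probe), the lock file's location and the state contents are hypothetical, and `detect_crash`, `acquire_lock`, and `save_state_atomically` are illustrative names rather than gptcgt's API.

```python
import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """POSIX check: signal 0 tests whether a process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def detect_crash(lock_path: Path) -> bool:
    """A lock file whose recorded PID is no longer alive means the last run crashed."""
    if not lock_path.exists():
        return False
    return not pid_is_running(int(lock_path.read_text().strip()))


def acquire_lock(lock_path: Path) -> None:
    lock_path.write_text(str(os.getpid()))


def save_state_atomically(state: dict, state_path: Path) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=state_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_name, state_path)  # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".gptcgt"
    root.mkdir()
    lock = root / "running.lock"
    # Simulate a previous run that died: record the PID of a process that has exited.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    lock.write_text(str(child.pid))
    crashed = detect_crash(lock)        # stale lock: its owner is gone
    acquire_lock(lock)                  # this process now owns the lock
    crashed_after = detect_crash(lock)  # our own PID is alive
    save_state_atomically({"subtask": 3}, root / "recovery" / "state.json")
    restored = json.loads((root / "recovery" / "state.json").read_text())
```

Because readers only ever see the old file or the fully written new one, a crash in the middle of a save cannot leave a half-written state.json behind.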

Best Practices

  • Always commit before starting — Run git commit so you have a clean rollback point
  • Be specific with your goal — “Build a user registration system with email verification and password reset” works better than “build auth”
  • Start with a smaller max_autonomous_iterations value while you learn how the system behaves
  • Review .gptcgt/phase.md after an autonomous run to understand what was done