Overview

Kimchi Coding is an autonomous AI coding agent. You describe a task in plain English — "add error handling to the API routes", "refactor this module to async/await", "write tests for the auth service" — and the agent executes it: reading files, writing code, running commands, and checking its own output.

This is different from AI code completion (Copilot-style inline suggestions). Kimchi Coding handles multi-step tasks with minimal hand-holding — you hand over a job and the agent does it.

How it works

A coding harness is the runtime that turns an LLM into an autonomous agent. Kimchi Coding's harness manages the full loop: it reads your codebase, plans the work, writes code, runs commands, and verifies the results — all in a terminal session.

%%{init: {'theme': 'dark'}}%%
flowchart LR
    A["You describe a task"] --> B["Kimchi Coding harness"]
    B --> C["Reads, writes, runs, verifies"]
    C --> D["Completed work"]
    style B fill:#2d2d2d,stroke:#888,color:#e0e0e0

Under the hood, the harness runs an orchestrator that classifies your task, breaks it into phases, and assigns each subtask to the right model. A planning step gets a reasoning-capable model; a bulk code-generation step gets a fast, high-throughput executor. You don't pick the model — the harness does.

Why Kimchi Coding?

	Claude Code / Cursor	Other API providers	Kimchi Coding
Rate limits	Yes — sessions cut off mid-task	Varies	No rate limits
Cost	$100–200/month subscriptions	Pay-per-token, no routing intelligence	Pay-per-token, smart routing cuts costs
Model lock-in	Anthropic-only	Single model per request	Automatically picks the right model per task
Data residency	Anthropic's / provider's infra	Provider's infra	Your cluster (self-hosted) or Kimchi's GPUs
Claude models	Direct from Anthropic	Not available	Via Kimchi proxy — same models, no lock-in

Key capabilities

Multi-model orchestration

By default, kimchi runs in multi-model mode. The orchestrator classifies each task and delegates subtasks to specialised models across five roles:

Role	Responsibility
Orchestrator	Runs the main loop, classifies tasks, delegates work
Planner	Designs the approach, writes specs
Builder	Code implementation — picks heavier models for complex tasks
Reviewer	Code review — picks the strongest model by tier
Explorer	Codebase exploration and research — light models for scans, heavy for analysis

You describe the task. The orchestrator picks the right model for each subtask based on complexity and model capabilities. Configure role assignments with the /multi-model command or in ~/.config/kimchi/harness/settings.json.

When multi-model mode is active, subagents can only use models that are configured in the role pool. If a subagent is spawned with an explicit model that is not part of the configured roles, the request is rejected and the allowed models are listed. Omitting the model parameter falls back to the current session model.

Built-in models have tier and description baked in. External models (Anthropic, OpenAI, or any non-builtin provider) default to standard tier, vision: false, and an auto-generated description. To give the orchestrator better routing information, add a modelMetadata section to settings.json:

{
  "modelRoles": {
    "builder": ["kimchi-dev/minimax-m2.7", "anthropic/claude-sonnet-4-5"]
  },
  "modelMetadata": {
    "anthropic/claude-sonnet-4-5": {
      "tier": "heavy",
      "description": "Strong general-purpose model — use for complex builds and thorough reviews."
    }
  }
}

Field	Default	Description
`tier`	`standard`	`light`, `standard`, or `heavy`. Used for complexity-based routing.
`description`	Auto-generated	Shown to the orchestrator so it can match model strengths to task requirements.
`vision`	`false`	Whether the model supports image input.

Metadata can also be managed interactively via /multi-model → "Edit model metadata" — this is the only in-app path for configuring or overriding metadata, so model selection stays uninterrupted. Custom overrides can be reset to defaults from the same menu. Metadata for builtin models can be overridden the same way.

Phase tracking

Every LLM request is tagged with a work phase for analytics and cost attribution:

Phase	Description
explore	Reading files, tracing imports, understanding code structure
plan	Designing, breaking down tasks, writing specs
build	Writing, modifying, or refactoring code
review	Verifying correctness, analysing output
research	External sources: web docs, library APIs, version changelogs

Phases appear in the footer (e.g. ↳ build) and are included in every request tag for cost reporting.

Ferment — autonomous project mode

For multi-step projects that span sessions, Ferment provides structured planning, execution tracking, and persistent context. Describe a goal — "Build Tetris", "Add Google OAuth login" — and the harness breaks it into phases, executes each one using specialised subagent workers, grades the results, and picks up exactly where it left off if a session ends.

Start a Ferment session at any time with the /ferment command. See the Ferment overview for the full mental model.

Session persistence

Sessions (including agent runs) are saved to disk and fully recoverable. Prompt history is loaded into up/down arrow navigation so you can reuse past prompts without retyping.

Kimchi Coding vs. Kimchi Inference

Kimchi offers two products:

Kimchi Coding (this section) — an autonomous AI coding agent with multi-model orchestration, phase tracking, and cost attribution. You describe a task and the agent handles it.
Kimchi Inference — a serverless API for sending LLM requests from any OpenAI-compatible client. You control the model, prompt, and integration.

Kimchi Coding uses Kimchi Inference under the hood for model access.

Next steps

Quickstart

Install Kimchi, get an API key, and run your first coding session.

Ferment

Autonomous project mode — structured planning, execution, and grading across sessions.

CLI reference

Commands, flags, configuration, and resource management.