Overview

Kimchi gives you instant access to production-ready open-source LLMs — Kimi K2.6, Kimi K2.5, MiniMax M2.7, and Nemotron — via an OpenAI-compatible API. No GPUs to provision, no clusters to manage.

Quickstart

IDE setup recipes

The CLI handles setup automatically, but if you prefer manual configuration or need tool-specific tweaks, follow these step-by-step guides:

Available models

Model IDBest forContextOutput
kimi-k2.6Agentic coding, image analysis (latest)260K32K
kimi-k2.5Agentic coding, image analysis260K32K
minimax-m2.7Code execution, debugging200K32K
nemotron-3-super-fp4Fast inference, cost-efficient tasks128K32K

A common pattern is to pair Kimi K2.6 for planning with MiniMax M2.7 for code execution.

Optional — GSD multi-agent setup

Get Shit Done (GSD) orchestrates multiple AI agents — planner, researcher, executor, and verifier — each using the model best suited to the task.

If you skipped GSD during CLI setup, install it manually:

For OpenCode (GSD 1.x):

npx gsd-opencode

For GSD 2.0 (standalone TUI):

gsd config

See the OpenCode recipe for full GSD configuration examples with model assignments per agent role.

How it works

Your IDE / CLI ──► Kimchi config ──► https://llm.kimchi.dev/openai/v1 ──► Open-Source Model
                                              │
                                     OpenAI-compatible API
  • Serverless Model APIs: Kimchi runs models on its own GPU infrastructure. You pay per token.
  • Self-hosted deployments: Run the same models on your own Kubernetes cluster when per-token costs exceed compute costs or compliance requires it. Learn more

Pricing

Pay-per-token. Separate rates for input and output tokens. The free tier has generous limits and no credit card requirement.

When you need more throughput, upgrade to a paid plan. See rate limits for details.

Next steps