Deploy a Custom Model with Kimchi

This tutorial walks you through taking a model stored in Google Cloud Storage and deploying it with Kimchi, so you can call it through an OpenAI-compatible API.

Introduction

Base model: a general-purpose model such as Llama 3 or Mistral.
Fine-tuned model: that base model, further trained on your data (for your domain, tone of voice, or tasks).
Custom model in Kimchi: any model whose weights live in your storage and are deployed through Kimchi into your Kubernetes cluster.

Prerequisites

Kimchi set up and connected to a cluster
A fine-tuned model saved in a GCS bucket

Typical layout:

gs://my-ml-bucket/models/llama3-support-bot-v1/
├── model.safetensors
├── tokenizer.json
├── config.json
└── metadata.json

Note: If you're unsure about formats, start with Hugging Face-style folders and Safetensors weight files.
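Before uploading, it can help to confirm that a local export matches the typical layout shown above. The following is a minimal sketch, assuming a Hugging Face-style folder on local disk. The function name `check_model_dir` and its list of required files are illustrative assumptions drawn from that layout, not a documented Kimchi requirement.

```shell
#!/bin/sh
# Hypothetical helper: check that a local model folder contains the files
# from the typical layout. The required-file list is an assumption.
check_model_dir() {
  dir="$1"
  rc=0
  for f in config.json tokenizer.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      rc=1
    fi
  done
  # Expect at least one Safetensors weight file.
  set -- "$dir"/*.safetensors
  if [ ! -e "$1" ]; then
    echo "missing: *.safetensors" >&2
    rc=1
  fi
  return "$rc"
}

# Usage (not run automatically):
#   check_model_dir ./llama3-support-bot-v1 && echo "layout looks complete"
```

The function returns a non-zero status and prints each missing file to stderr, so it can gate an upload step in a script.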

Step 1: Prepare your model artifacts in GCS

Place the model files in a dedicated folder in your bucket, for example:

gs://my-ml-bucket/models/llama3-custom-model/
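One way to get the artifacts into that location is to copy a local export with the Google Cloud CLI. This is a sketch under assumptions: `gcloud` is installed and authenticated, and the local folder name in the usage comment is a placeholder. The helpers `model_uri` and `upload_model` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Assumed values: replace with your own bucket and model folder name.
BUCKET="my-ml-bucket"
MODEL_NAME="llama3-custom-model"

# Hypothetical helper: build the GCS destination prefix for a model folder.
model_uri() {
  printf 'gs://%s/models/%s/\n' "$1" "$2"
}

# Hypothetical helper: copy every file in a local folder into the model's
# GCS prefix, then list the prefix to confirm what was uploaded.
upload_model() {
  local_dir="$1"
  dest="$(model_uri "$BUCKET" "$MODEL_NAME")"
  gcloud storage cp "$local_dir"/* "$dest"
  gcloud storage ls "$dest"
}

# Usage (not run automatically):
#   upload_model ./llama3-custom-model
```

Keeping one folder per model version under a shared `models/` prefix makes the bucket path you paste in the next step unambiguous.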

Step 2: Connect your repository in Kimchi

Log in to the Cast AI console
Navigate to Kimchi → Hosted Models → Repositories
Click Connect
Paste your bucket path
Run the provided script

Step 3: Register your custom model

Select a model from the detected list
Pick the base model
Add a description
If needed, enable the LoRA adapter option

Step 5: Call your custom model

kubectl run test-client --rm -i --tty --image=alpine/curl -- /bin/sh

curl http://castai-ai-optimizer-proxy.castai-agent.svc.cluster.local:443/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H "Authorization: Bearer $CASTAI_API_KEY" \
  -v -X POST -d '{
    "model": "Llama-3.2-3B-Instruct-abliterated",
    "messages": [
      {
        "role": "user",
        "content": "What kind of instance types to use in GCP for running an AI training model?"
      }
    ]
  }'
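The request above relies on `CASTAI_API_KEY` being set inside the test pod. One option, assuming the variable is already exported on your workstation, is to start the pod with `kubectl run test-client --rm -i --tty --image=alpine/curl --env="CASTAI_API_KEY=$CASTAI_API_KEY" -- /bin/sh`; note that the value then appears in the pod spec. The sketch below wraps the same call in two small functions. `chat_payload` and `ask_model` are hypothetical helper names, and the endpoint URL is copied from the command above.

```shell
#!/bin/sh
# Hypothetical helper: build a minimal chat-completions request body.
# It does no JSON escaping, so keep quotes and backslashes out of the arguments.
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' "$1" "$2"
}

# Hypothetical helper: send one prompt to the proxy endpoint.
# Assumes CASTAI_API_KEY is set in the current shell.
ask_model() {
  chat_payload "$1" "$2" | curl -sS \
    http://castai-ai-optimizer-proxy.castai-agent.svc.cluster.local:443/openai/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $CASTAI_API_KEY" \
    -d @-
}

# Usage inside the test pod (not run automatically):
#   ask_model "Llama-3.2-3B-Instruct-abliterated" "Hello"
```

The Authorization header uses double quotes so the shell expands the variable; inside single quotes the literal text `$CASTAI_API_KEY` would be sent instead.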