Deploy a custom model with Kimchi
This tutorial walks you through taking a model stored in Google Cloud Storage and deploying it with Kimchi, so you can call it through an OpenAI-compatible API.
Introduction
- Base model: a general-purpose model such as Llama 3 or Mistral.
- Fine-tuned model: that base model, further trained on your data (for your domain, tone of voice, or tasks).
- Custom model in Kimchi: any model whose weights live in your storage and are deployed through Kimchi into your Kubernetes cluster.
Prerequisites
- Kimchi set up and connected to a cluster
- A fine-tuned model saved in a GCS bucket
Typical layout:
gs://my-ml-bucket/models/llama3-support-bot-v1/
├── model.safetensors
├── tokenizer.json
├── config.json
└── metadata.json
If you're unsure about formats, start with Hugging Face-style folders and Safetensors.
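Before uploading, it can help to confirm that a local model folder matches the layout shown above. The following is a minimal sketch, not part of Kimchi: `check_model_dir` is a hypothetical helper, and the list of required files (`config.json`, `tokenizer.json`, at least one `*.safetensors` file) is an assumption taken from the typical layout, not a documented Kimchi requirement.

```shell
#!/bin/sh
# Hypothetical pre-upload check (not a Kimchi or Cast AI tool).
# The required-file list mirrors the "Typical layout" and is an assumption.
check_model_dir() {
  dir=$1
  status=0

  # Files assumed to be needed alongside the weights.
  for f in config.json tokenizer.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done

  # At least one Safetensors weight file should be present.
  found=0
  for w in "$dir"/*.safetensors; do
    if [ -f "$w" ]; then
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "missing: *.safetensors"
    status=1
  fi

  # 0 when the folder looks complete, 1 otherwise.
  return "$status"
}
```

For example, `check_model_dir ./llama3-support-bot-v1` prints one line per missing file and returns a non-zero status if anything is absent. Once the folder passes, it can be copied to the bucket with a tool such as `gcloud storage cp --recursive`.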
Step 1: Prepare your model artifacts in GCS
gs://my-ml-bucket/models/llama3-custom-model/
Step 2: Connect your repository in Kimchi
- Log in to the Cast AI console
- Navigate to Kimchi → Hosted Models → Repositories
- Click Connect
- Paste your bucket path
- Run the provided script
Step 3: Register your custom model
- Select a model from the detected list
- Pick the base model
- Add a description
- If needed, enable LoRA adapter
Step 5: Call your custom model
kubectl run test-client --rm -i --tty --image=alpine/curl -- /bin/sh
curl http://castai-ai-optimizer-proxy.castai-agent.svc.cluster.local:443/openai/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H "Authorization: Bearer $CASTAI_API_KEY" \
-v -X POST -d '{
"model": "Llama-3.2-3B-Instruct-abliterated",
"messages": [
{
"role": "user",
"content": "What kind of instance types to use in GCP for running an AI training model?"
}
]
}'
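If you call the model repeatedly with different prompts, the request above can be wrapped in a few shell functions. This is a sketch under stated assumptions: `json_escape`, `build_chat_payload`, and `call_model` are hypothetical helper names, not Cast AI tooling; the endpoint URL is the one used in this tutorial; and `CASTAI_API_KEY` is assumed to be exported in the shell where the functions run. The escaping covers only backslashes and double quotes, so prompts containing newlines or control characters would need a proper JSON tool.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kimchi or Cast AI tooling).

# Escape backslashes and double quotes so a string can sit inside a JSON
# string literal. Newlines and control characters are NOT handled here.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a chat-completions request body with a single user message.
# $1 = model name, $2 = prompt text
build_chat_payload() {
  model=$(json_escape "$1")
  content=$(json_escape "$2")
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' \
    "$model" "$content"
}

# Send the request to the endpoint used in this tutorial.
# Assumes CASTAI_API_KEY is set; double quotes let the shell expand it.
call_model() {
  curl -sS -X POST \
    "http://castai-ai-optimizer-proxy.castai-agent.svc.cluster.local:443/openai/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' \
    -H "Authorization: Bearer ${CASTAI_API_KEY}" \
    -d "$(build_chat_payload "$1" "$2")"
}
```

Usage would look like `call_model "Llama-3.2-3B-Instruct-abliterated" "What instance types suit AI training on GCP?"`. Keeping the payload builder separate from the network call makes it possible to inspect the JSON with `build_chat_payload` alone before sending anything.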
