v0.24.0 — Latest
Free & Open Source

Fine-tune any LLM in one command

Stop wrestling with training scripts. Soup gives you 11 training methods, 30 ready-made recipes, and 2-5x faster training with Unsloth — all from a single YAML config.

MIT License · Python 3.9+ · No vendor lock-in
1. Install
$ pip install soup-cli
2. Configure
$ soup init
✓ Created soup.yaml
3. Train
$ soup train
> Training started...

Integrates with your entire ML stack

HuggingFace
Ollama
vLLM
DeepSpeed
Unsloth
ONNX
NVIDIA TensorRT
W&B
SGLang
FlashAttention

Performance that pays for itself

Spend less on GPU hours. Train faster. Ship sooner.

2-5x faster training (Unsloth backend)
60% memory savings (Liger Kernel)
2-4x attention speedup (FlashAttention)
Linear scaling across N GPUs (DeepSpeed / FSDP)

Everything you need. Nothing you don't.

One CLI replaces your entire fine-tuning stack. No scripts, no boilerplate, no 200-line configs.

11 Training Methods

SFT, DPO, GRPO, PPO, KTO, ORPO, SimPO, IPO, Pretrain, Embedding, and Reward Model — all from a single CLI.
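As a sketch of how method selection could look in practice, the auto-generated config shown further down this page uses `task: sft`; assuming the same key accepts the other method names (an assumption, not documented behavior), switching methods would be a one-line change:

```yaml
# Hypothetical sketch: selecting a training method via the task key.
# Assumes `task` accepts method names beyond the `task: sft` shown
# in Soup's auto-generated config on this page.
base: meta-llama/Llama-3.1-8B
task: dpo            # e.g. sft, dpo, grpo, ppo, kto, orpo, ...

data:
  train: ./data/preferences.jsonl

training:
  epochs: 1
  lr: 5e-6
```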

Vision + Audio Multimodal

Fine-tune models with text, image, and audio data. Full multimodal support out of the box.

One YAML Config

Configure everything — model, dataset, training params — in a single, readable YAML file.

30 Ready-Made Recipes

Pre-built configs for Llama 3.2, Qwen 3, Gemma 3, Phi-4, DeepSeek R1, Mistral — search, preview, and use instantly.

Unsloth 2-5x Speedup

Built on the Unsloth backend with Liger Kernel, FlashAttention, curriculum learning, freeze training, and loss watchdog.

Export Anywhere

Export to GGUF, ONNX, TensorRT, AWQ, GPTQ — deploy to Ollama, serve with vLLM/SGLang, or migrate from competitors in one command.

60 seconds to your first model

Three commands. That's it.

No boilerplate. No setup guides. No “just follow these 47 steps”. Install, init, train. Done.

Step 1: Install
$ pip install soup-cli
Collecting soup-cli
Installing collected packages: soup-cli
Successfully installed soup-cli-0.24.0

You're spending too much time on infrastructure

Other tools make you fight the plumbing. Soup lets you focus on what actually matters — your model and your data.

The Old Way → With Soup
200+ lines of training script → 1 YAML config file
Manual GPU & memory tuning → Auto-detected, zero config
No standard eval pipeline → Built-in benchmarking & eval
Complex export & conversion → soup export --format gguf
Rewrite config for each tool → soup migrate --from axolotl
Hours of boilerplate setup → soup recipes use llama3-sft

Every hour spent on training infrastructure is an hour not spent improving your model.

Reclaim Your Time
New in v0.24

Already using another tool? Switch in 30 seconds

One command converts your existing config. No rewriting, no guessing — just migrate and train.

LLaMA-Factory → auto-converted to Soup
$ soup migrate --from llamafactory config.yaml
Axolotl → auto-converted to Soup
$ soup migrate --from axolotl config.yml
Unsloth → auto-converted to Soup
$ soup migrate --from unsloth notebook.ipynb

See the difference

LLaMA-Factory config
llama3_lora_sft.yaml
model_name_or_path: meta-llama/Llama-3.1-8B
stage: sft
finetuning_type: lora
lora_rank: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target: all
dataset: alpaca_en
template: llama3
cutoff_len: 2048
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
num_train_epochs: 3
learning_rate: 2.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
quantization_bit: 4
output_dir: ./saves/llama3-lora
Soup config (auto-generated)
soup.yaml
base: meta-llama/Llama-3.1-8B
task: sft

data:
  train: ./data/alpaca_en.jsonl
  max_length: 2048

training:
  epochs: 3
  lr: 2e-5
  quantization: 4bit
  lora:
    r: 64
    alpha: 16

output: ./saves/llama3-lora

Soup auto-detects everything else — optimizer, scheduler, target modules, batch size.
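If you would rather pin the auto-detected settings explicitly, a config like the following seems plausible. Note that the override keys here (`optimizer`, `scheduler`, `batch_size`, `target_modules`) are illustrative assumptions, not options confirmed by Soup's documentation:

```yaml
# Illustrative sketch only — these override keys are assumed,
# not taken from Soup's documentation.
training:
  optimizer: adamw
  scheduler: cosine
  batch_size: 4
  lora:
    r: 64
    alpha: 16
    target_modules: [q_proj, v_proj]
```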

Full Pipeline, One Tool

From training to deployment — Soup covers the entire LLM workflow.

1. Train: fine-tune with any of the 11 methods
2. Chat: test your model interactively
3. Eval: benchmark performance
4. Export: GGUF, ONNX, TensorRT, AWQ, GPTQ
5. Serve: vLLM, SGLang, transformers
6. Push: upload to HuggingFace Hub
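The six pipeline steps above could plausibly map onto subcommands like the following. Only `soup train` and `soup export --format gguf` appear elsewhere on this page; the `chat`, `eval`, `serve`, and `push` subcommand names are assumptions inferred from the step names:

```shell
# End-to-end sketch; subcommands marked (assumed) are inferred
# from the pipeline step names, not confirmed by the docs.
soup train                     # 1. fine-tune per soup.yaml
soup chat                      # 2. interactive smoke test (assumed)
soup eval                      # 3. benchmark performance (assumed)
soup export --format gguf      # 4. export for Ollama / llama.cpp
soup serve                     # 5. serve locally (assumed)
soup push                      # 6. upload to HuggingFace Hub (assumed)
```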

From pip install to deployed model in under 5 minutes

Start Building Now
340k+ compatible models on HuggingFace Hub
11 training methods, SFT to full RLHF
30 ready-made recipes

Built for the ML Stack you already use

First-class integrations with the tools powering production ML. Deploy anywhere, track everything.

Deploy & Serve

Ollama: one-command local deploy
vLLM: 2-4x inference throughput
SGLang: RadixAttention backend
llama.cpp: GGUF export

Training & Infra

Unsloth: 2-5x faster training
DeepSpeed: ZeRO 2/3 multi-GPU
FSDP2: PyTorch-native sharding
FlashAttention: v2/v3 auto-detect

Ecosystem

HuggingFace: push models to Hub
OpenAI API: compatible server
W&B: cloud experiment tracking
TensorBoard: local metrics visualization

Export Formats

GGUF: Ollama & llama.cpp
ONNX: ONNX Runtime deploy
TensorRT: high-throughput GPU inference
AWQ/GPTQ: quantized deployment

Works with your favorite models

Fine-tune any Hugging Face-compatible model. Soup supports all major architectures out of the box.

Llama 3.1
Llama 3.2
Qwen 2.5
Qwen 3
Gemma 3
Phi-4
DeepSeek R1
DeepSeek V3
Mistral
Mixtral MoE
Qwen2-VL
Qwen2-Audio
StarCoder
TinyLlama
Yi

...and any model on Hugging Face Hub

Free forever. MIT Licensed.

Your competitors are already fine-tuning.

Every day without custom models is a day your product falls behind. Soup gets you from zero to fine-tuned in under 5 minutes.

11 training methods · 30 ready recipes · <60s setup time
No credit card · No sign-up · No vendor lock-in · Works offline