v0.24.0 — Latest
Free & Open Source

Fine-tune any LLM in one command

Stop wrestling with training scripts. Soup gives you 11 training methods, 30 ready-made recipes, and 2-5x faster training with Unsloth — all from a single YAML config.

MIT License · Python 3.9+ · No vendor lock-in
1. Install
$ pip install soup-cli
2. Configure
$ soup init
✓ Created soup.yaml
3. Train
$ soup train
> Training started...

Integrates with your entire ML stack

HuggingFace
Ollama
vLLM
DeepSpeed
Unsloth
ONNX
NVIDIA TensorRT
W&B
SGLang
FlashAttention

Performance that pays for itself

Spend less on GPU hours. Train faster. Ship sooner.

2-5x faster training (Unsloth backend)
60% memory savings (Liger Kernel)
2-4x attention speedup (FlashAttention)
Linear scaling across N GPUs (DeepSpeed / FSDP)

Everything you need. Nothing you don't.

One CLI replaces your entire fine-tuning stack. No scripts, no boilerplate, no 200-line configs.

11 Training Methods

SFT, DPO, GRPO, PPO, KTO, ORPO, SimPO, IPO, Pretrain, Embedding, and Reward Model — all from a single CLI.
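As a sketch of how method selection could look in practice, the auto-generated config shown further down this page uses `task: sft`; assuming the same key accepts the other method names (an assumption, not documented behavior), switching methods would be a one-line change:

```yaml
# Hypothetical sketch: selecting a training method via the task key.
# Assumes `task` accepts method names beyond the `task: sft` shown
# in Soup's auto-generated config on this page.
base: meta-llama/Llama-3.1-8B
task: dpo            # e.g. sft, dpo, grpo, ppo, kto, orpo, ...

data:
  train: ./data/preferences.jsonl

training:
  epochs: 1
  lr: 5e-6
```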

Vision + Audio Multimodal

Fine-tune models with text, image, and audio data. Full multimodal support out of the box.

One YAML Config

Configure everything — model, dataset, training params — in a single, readable YAML file.

30 Ready-Made Recipes

Pre-built configs for Llama 3.2, Qwen 3, Gemma 3, Phi-4, DeepSeek R1, Mistral — search, preview, and use instantly.

Unsloth 2-5x Speedup

Built on the Unsloth backend with Liger Kernel, FlashAttention, curriculum learning, freeze training, and loss watchdog.

Export Anywhere

Export to GGUF, ONNX, TensorRT, AWQ, GPTQ — deploy to Ollama, serve with vLLM/SGLang, or migrate from competitors in one command.

60 seconds to your first model

Three commands. That's it.

No boilerplate. No setup guides. No “just follow these 47 steps”. Install, init, train. Done.

Step 1: Install
$ pip install soup-cli
Collecting soup-cli
Installing collected packages: soup-cli
Successfully installed soup-cli-0.24.0

You're spending too much time on infrastructure

Other tools make you fight the plumbing. Soup lets you focus on what actually matters — your model and your data.

The Old Way → With Soup
200+ lines of training script → 1 YAML config file
Manual GPU & memory tuning → Auto-detected, zero config
No standard eval pipeline → Built-in benchmarking & eval
Complex export & conversion → soup export --format gguf
Rewrite config for each tool → soup migrate --from axolotl
Hours of boilerplate setup → soup recipes use llama3-sft

Every hour spent on training infrastructure is an hour not spent improving your model.

Reclaim Your Time
New in v0.24

Already using another tool? Switch in 30 seconds

One command converts your existing config. No rewriting, no guessing — just migrate and train.

LLaMA-Factory → auto-converted to Soup
$ soup migrate --from llamafactory config.yaml
Axolotl → auto-converted to Soup
$ soup migrate --from axolotl config.yml
Unsloth → auto-converted to Soup
$ soup migrate --from unsloth notebook.ipynb

See the difference

LLaMA-Factory config
llama3_lora_sft.yaml
model_name_or_path: meta-llama/Llama-3.1-8B
stage: sft
finetuning_type: lora
lora_rank: 64
lora_alpha: 16
lora_dropout: 0.05
lora_target: all
dataset: alpaca_en
template: llama3
cutoff_len: 2048
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
num_train_epochs: 3
learning_rate: 2.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
quantization_bit: 4
output_dir: ./saves/llama3-lora
Soup config (auto-generated)
soup.yaml
base: meta-llama/Llama-3.1-8B
task: sft

data:
  train: ./data/alpaca_en.jsonl
  max_length: 2048

training:
  epochs: 3
  lr: 2e-5
  quantization: 4bit
  lora:
    r: 64
    alpha: 16

output: ./saves/llama3-lora

Soup auto-detects everything else — optimizer, scheduler, target modules, batch size.
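If you would rather pin the auto-detected settings explicitly, a config like the following seems plausible. Note that the override keys here (`optimizer`, `scheduler`, `batch_size`, `target_modules`) are illustrative assumptions, not options confirmed by Soup's documentation:

```yaml
# Illustrative sketch only — these override keys are assumed,
# not taken from Soup's documentation.
training:
  optimizer: adamw
  scheduler: cosine
  batch_size: 4
  lora:
    r: 64
    alpha: 16
    target_modules: [q_proj, v_proj]
```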

Full Pipeline, One Tool

From training to deployment — Soup covers the entire LLM workflow.

1. Train: fine-tune with any of the 11 methods
2. Chat: test your model interactively
3. Eval: benchmark performance
4. Export: GGUF, ONNX, TensorRT, AWQ, GPTQ
5. Serve: vLLM, SGLang, transformers
6. Push: upload to HuggingFace Hub
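The six pipeline steps above could plausibly map onto subcommands like the following. Only `soup train` and `soup export --format gguf` appear elsewhere on this page; the `chat`, `eval`, `serve`, and `push` subcommand names are assumptions inferred from the step names:

```shell
# End-to-end sketch; subcommands marked (assumed) are inferred
# from the pipeline step names, not confirmed by the docs.
soup train                     # 1. fine-tune per soup.yaml
soup chat                      # 2. interactive smoke test (assumed)
soup eval                      # 3. benchmark performance (assumed)
soup export --format gguf      # 4. export for Ollama / llama.cpp
soup serve                     # 5. serve locally (assumed)
soup push                      # 6. upload to HuggingFace Hub (assumed)
```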

From pip install to deployed model in under 5 minutes

Start Building Now
340k+ compatible models on HuggingFace Hub
11 training methods, SFT to full RLHF
30 ready-made recipes

Built for the ML Stack you already use

First-class integrations with the tools powering production ML. Deploy anywhere, track everything.

Deploy & Serve

Ollama: one-command local deploy
vLLM: 2-4x inference throughput
SGLang: RadixAttention backend
llama.cpp: GGUF export

Training & Infra

Unsloth: 2-5x faster training
DeepSpeed: ZeRO 2/3 multi-GPU
FSDP2: PyTorch-native sharding
FlashAttention: v2/v3 auto-detect

Ecosystem

HuggingFace: push models to Hub
OpenAI API: compatible server
W&B: cloud experiment tracking
TensorBoard: local metrics visualization

Export Formats

GGUF: Ollama & llama.cpp
ONNX: ONNX Runtime deploy
TensorRT: high-throughput GPU inference
AWQ/GPTQ: quantized deployment

Works with your favorite models

Fine-tune any Hugging Face-compatible model. Soup supports all major architectures out of the box.

Llama 3.1
Llama 3.2
Qwen 2.5
Qwen 3
Gemma 3
Phi-4
DeepSeek R1
DeepSeek V3
Mistral
Mixtral MoE
Qwen2-VL
Qwen2-Audio
StarCoder
TinyLlama
Yi

...and any model on Hugging Face Hub

Free forever. MIT Licensed.

Your competitors are already fine-tuning.

Every day without custom models is a day your product falls behind. Soup gets you from zero to fine-tuned in under 5 minutes.

11 training methods · 30 ready recipes · <60s setup time
No credit card · No sign-up · No vendor lock-in · Works offline