# Training Methods

Soup supports 11 training methods, selected via the `task` config key.

## Supervised Fine-Tuning (SFT)

The most common method: training on instruction-response pairs.

```yaml
base: meta-llama/Llama-3.1-8B-Instruct
task: sft

data:
  train: ./data/train.jsonl
  format: alpaca

training:
  epochs: 3
  lr: 2e-5
  batch_size: auto
  quantization: 4bit
  lora:
    r: 64
    alpha: 16
```
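
In alpaca format, each line of `train.jsonl` is one JSON object with instruction, input, and output fields. A minimal sketch of producing such a file (the field names follow the common alpaca convention; Soup's exact schema may differ):

```python
import json

# One record per line; alpaca-style field names are assumed here,
# based on the common convention rather than Soup's loader.
records = [
    {
        "instruction": "Summarize the following text.",
        "input": "Soup is a fine-tuning toolkit.",
        "output": "Soup fine-tunes language models.",
    }
]

with open("train.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```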

## Direct Preference Optimization (DPO)

Train with preference pairs (chosen vs rejected).

```yaml
base: meta-llama/Llama-3.1-8B-Instruct
task: dpo

data:
  train: ./data/preferences.jsonl
  format: dpo

training:
  dpo_beta: 0.1
  quantization: 4bit
  lora:
    r: 64
    alpha: 16
```
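
For intuition on `dpo_beta`: DPO minimizes a logistic loss on the gap between the policy's and a frozen reference model's log-probabilities for the chosen versus rejected response, with beta scaling that gap. A per-pair numeric sketch (illustrative only, not Soup's implementation):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Sequence log-probs under the policy (pi_*) and reference (ref_*).
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(logits)): small when the policy prefers `chosen`.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

While the policy still matches the reference, the loss starts at log 2 and falls as the policy shifts probability mass toward chosen responses.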

## Group Relative Policy Optimization (GRPO)

Reasoning-focused training (DeepSeek-R1 style) that scores sampled completions with reward functions instead of a learned reward model.

```yaml
base: meta-llama/Llama-3.1-8B-Instruct
task: grpo

data:
  train: ./data/reasoning_train.jsonl
  format: sharegpt
  max_length: 4096

training:
  grpo_beta: 0.1
  num_generations: 4
  reward_fn: accuracy     # or 'format', or path to custom .py
  quantization: 4bit
  lora:
    r: 64
    alpha: 16
```
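
What makes GRPO "group relative": it samples `num_generations` completions per prompt and replaces a learned value baseline with per-group reward normalization. A sketch of that advantage step (illustrative, not Soup's trainer code):

```python
def group_advantages(rewards, eps=1e-6):
    # Normalize each completion's reward against its own sampling group:
    # advantage_i = (r_i - mean(group)) / (std(group) + eps)
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four generations for one prompt, scored 1.0 (correct) or 0.0:
advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
```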

Built-in reward functions:

- `accuracy` — checks whether the final answer matches the expected one (supports `####` and `\boxed{}` answer formats)
- `format` — checks for structured `<think>...</think>` reasoning blocks
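
To make the `accuracy` behavior concrete, here is a rough sketch of answer extraction and comparison (a hypothetical stand-in, not Soup's built-in implementation):

```python
import re

def accuracy_reward(completion, expected):
    # Look for a GSM8K-style '#### answer' line, then a LaTeX \boxed{answer}.
    m = re.search(r"####\s*([^\n]+)", completion)
    if not m:
        m = re.search(r"\\boxed\{([^}]*)\}", completion)
    if not m:
        return 0.0
    return 1.0 if m.group(1).strip() == expected.strip() else 0.0
```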

Custom reward functions — point `reward_fn` at a Python file:

```python
# my_reward.py
def reward_fn(completions, **kwargs):
    # `completions` is a list of chat conversations; score the last
    # (assistant) message of each, returning one float per completion.
    return [1.0 if "correct" in c[-1]["content"] else 0.0 for c in completions]
```

## PPO / Full RLHF Pipeline

Three-step pipeline: SFT warmup -> Reward Model -> PPO alignment.

```yaml
# Step 3: PPO alignment
base: meta-llama/Llama-3.1-8B-Instruct
task: ppo

data:
  train: ./data/prompts.jsonl
  format: chatml

training:
  reward_model: ./output_rm   # From step 2
  ppo_epochs: 4
  ppo_clip_ratio: 0.2
  ppo_kl_penalty: 0.05
  quantization: 4bit
  lora:
    r: 64
    alpha: 16
```
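
`ppo_clip_ratio` and `ppo_kl_penalty` correspond to the standard PPO pieces: the policy's probability ratio is clipped to [1 - eps, 1 + eps], and a KL term penalizes drift from the reference policy. A per-sample numeric sketch (not Soup's trainer):

```python
import math

def ppo_loss(logp_new, logp_old, advantage, kl, clip_ratio=0.2, kl_penalty=0.05):
    # Clipped surrogate objective (negated so lower is better) plus KL penalty.
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + clip_ratio), 1 - clip_ratio)
    surrogate = min(ratio * advantage, clipped * advantage)
    return -surrogate + kl_penalty * kl
```

With a positive advantage, increasing the ratio beyond 1 + clip_ratio yields no further gain, which is what keeps PPO updates conservative.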

## All Training Tasks

| Task | Data Format | Use Case |
|------|-------------|----------|
| `sft` | alpaca/sharegpt/chatml/llava | Instruction tuning |
| `dpo` | prompt + chosen + rejected | Preference alignment |
| `grpo` | prompts + reward fns | Reasoning (DeepSeek-R1) |
| `kto` | prompt + completion + label | Unpaired preference |
| `orpo` | prompt + chosen + rejected | Reference-free alignment |
| `simpo` | prompt + chosen + rejected | Length-normalized preference |
| `ipo` | prompt + chosen + rejected | Regularized preference |
| `ppo` | prompts + reward model/fn | Full RLHF stage 3 |
| `pretrain` | plain text (raw text) | Continued pre-training |
| `embedding` | anchor + positive (+ negative) | Sentence embeddings |
| `reward_model` | prompt + chosen + rejected | RLHF stage 2 |

## Running Training

```bash
# Start training
soup train --config soup.yaml

# Resume from checkpoint
soup train --config soup.yaml --resume auto
soup train --config soup.yaml --resume ./output/checkpoint-500

# With W&B logging
soup train --config soup.yaml --wandb

# With TensorBoard
soup train --config soup.yaml --tensorboard

# With DeepSpeed (multi-GPU)
soup train --config soup.yaml --deepspeed zero2

# With FSDP2
soup train --config soup.yaml --fsdp full_shard
```