DPO training guide: align LLMs with human preferences

Direct Preference Optimization (DPO) aligns language models with human preferences without training a separate reward model, which makes it simpler and more stable than RLHF with PPO.

When to use DPO

  • You have a dataset of "chosen" vs "rejected" responses
  • You want to reduce hallucinations and off-topic answers
  • You want alignment without the complexity of PPO

Use DPO after SFT. A typical pipeline: Pretrain → SFT → DPO.

1. DPO dataset format

```json
[
  {
    "prompt": "Explain quantum entanglement.",
    "chosen": "Quantum entanglement is a physical phenomenon where...",
    "rejected": "Idk, something quantum."
  }
]
```

Save as preferences.json.

2. Config

```yaml
base:
  model: ./runs/my-sft-model/latest  # Start from SFT checkpoint

task: dpo

data:
  train: preferences.json
  format: dpo

training:
  backend: transformers
  epochs: 1
  learning_rate: 5.0e-7
  batch_size: 2
  gradient_accumulation_steps: 8
  beta: 0.1
  max_seq_length: 2048
  lora:
    enabled: true
    r: 16
    alpha: 32
```

Key DPO hyperparameters:

  • beta: 0.1 — strength of the implicit KL penalty; higher values keep the policy closer to the reference (SFT) model.
  • learning_rate: 5e-7 — DPO needs a much smaller LR than SFT.
  • epochs: 1 — DPO overfits quickly, rarely needs more than 1–2 epochs.
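To make beta concrete, here is a minimal sketch of the per-example DPO loss in plain Python (no framework; in practice the log-probabilities come from the policy and the frozen reference model):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen_ratio - rejected_ratio)).

    Each argument is the summed log-probability of a response under the
    policy or the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp      # log pi/pi_ref for chosen
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref for rejected
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log sigmoid(x) = log(1 + exp(-x)); fine for a sketch,
    # real implementations use a numerically stable logsigmoid.
    return math.log1p(math.exp(-logits))
```

Because beta multiplies the log-ratio margin, a larger beta penalizes deviations from the reference model more sharply, which is why training stays closer to the SFT checkpoint.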

3. Train

```bash
soup train --config dpo.yaml
```

4. Evaluate

Compare the DPO model against the SFT baseline:

```bash
soup eval compare \
    --base ./runs/my-sft-model/latest \
    --candidate ./runs/my-dpo-model/latest \
    --judge gpt-4
```
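A pairwise comparison like this typically reduces to a win rate over prompts. A minimal sketch of that aggregation (the verdict strings here are hypothetical, not the actual `soup eval` output format):

```python
def win_rate(verdicts: list[str]) -> float:
    """verdicts: one of 'candidate', 'base', or 'tie' per prompt.

    Returns the candidate's win rate, counting ties as half a win
    (a common convention in pairwise evaluation).
    """
    wins = sum(v == "candidate" for v in verdicts)
    ties = sum(v == "tie" for v in verdicts)
    return (wins + 0.5 * ties) / len(verdicts)
```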

DPO variants in Soup CLI

Soup supports several preference-optimization methods; set task: to choose the algorithm:

  • task: dpo — Direct Preference Optimization
  • task: orpo — ORPO (combines SFT + DPO in one step, no reference model)
  • task: simpo — SimPO (length-normalized, no reference model)
  • task: ipo — IPO (squared-loss objective, more robust than DPO to noisy preference labels)
  • task: kto — KTO (works with unpaired binary labels)
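For kto specifically, the data no longer needs to be paired: each record carries a single completion and a binary label. The exact field names below are an assumption for illustration (check the Soup data-format reference for the authoritative schema), but unpaired KTO-style data generally looks like:

```json
[
  {"prompt": "Explain quantum entanglement.", "completion": "Quantum entanglement is a physical phenomenon where...", "label": true},
  {"prompt": "Explain quantum entanglement.", "completion": "Idk, something quantum.", "label": false}
]
```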

Related

  • [Training methods reference](/docs/training)
  • [Fine-tune Llama 3.1 with LoRA](/docs/fine-tune-llama-3-1-lora)