# Configuration
Soup uses a single YAML config file for all settings. Run `soup init` to generate one.
## Config Structure
```yaml
base: meta-llama/Llama-3.1-8B-Instruct   # HuggingFace model ID (required)
task: sft                                # Training task
# backend: unsloth                       # 2-5x faster (pip install 'soup-cli[fast]')
# modality: text                         # text, vision, or audio

data:
  train: ./data/train.jsonl              # Path to training data
  format: alpaca                         # Data format (auto-detected if omitted)
  val_split: 0.1                         # Validation split ratio
  max_length: 2048                       # Max sequence length (64-1048576)
  # image_dir: ./data/images             # For vision modality
  # audio_dir: ./data/audio              # For audio modality

training:
  epochs: 3
  lr: 2e-5
  batch_size: auto                       # auto or integer
  quantization: 4bit                     # none, 4bit, 8bit
  # quantization_aware: false            # Enable QAT
  # optimizer: adamw_8bit
  # gradient_checkpointing: true

lora:
  r: 64
  alpha: 16
  dropout: 0.05
  # target_modules: auto                 # Auto-detected per model
  # use_dora: false                      # Weight-decomposed LoRA

output: ./output
```
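The constraints noted in the comments above (required `base`, the `max_length` range, the allowed `quantization` values, `batch_size` as `auto` or an integer) can be expressed as a small validation sketch. This is an illustrative helper, not part of soup's actual codebase; the function name and error messages are made up for the example.

```python
# Hypothetical validator for the config constraints documented above.
# Operates on the dict you would get from parsing the YAML file.

def validate_config(cfg: dict) -> list[str]:
    """Return a list of error strings; an empty list means the config looks valid."""
    errors = []
    if not cfg.get("base"):
        errors.append("base: HuggingFace model ID is required")
    data = cfg.get("data", {})
    max_length = data.get("max_length", 2048)
    if not 64 <= max_length <= 1048576:
        errors.append("data.max_length: must be within 64-1048576")
    val_split = data.get("val_split", 0.1)
    if not 0.0 <= val_split < 1.0:
        errors.append("data.val_split: must be a ratio in [0, 1)")
    training = cfg.get("training", {})
    if training.get("quantization", "none") not in {"none", "4bit", "8bit"}:
        errors.append("training.quantization: must be none, 4bit, or 8bit")
    batch = training.get("batch_size", "auto")
    if batch != "auto" and not isinstance(batch, int):
        errors.append("training.batch_size: must be 'auto' or an integer")
    return errors


cfg = {
    "base": "meta-llama/Llama-3.1-8B-Instruct",
    "task": "sft",
    "data": {"train": "./data/train.jsonl", "val_split": 0.1, "max_length": 2048},
    "training": {"epochs": 3, "lr": 2e-5, "batch_size": "auto", "quantization": "4bit"},
}
print(validate_config(cfg))  # []
```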
## Templates

Soup includes 15 built-in templates:
```bash
soup init --template chat         # Conversational fine-tune
soup init --template code         # Code generation
soup init --template medical      # Domain expert
soup init --template reasoning    # GRPO reasoning (DeepSeek-R1 style)
soup init --template vision      # Vision/multimodal fine-tune
soup init --template audio        # Audio/speech fine-tune
soup init --template kto          # KTO unpaired preference
soup init --template orpo         # ORPO (no reference model)
soup init --template simpo        # SimPO length-normalized preference
soup init --template ipo          # IPO regularized preference
soup init --template rlhf         # Full RLHF pipeline (SFT -> RM -> PPO)
soup init --template pretrain     # Continued pre-training on raw text
soup init --template moe          # MoE fine-tuning (ScatterMoE LoRA)
soup init --template longcontext  # 128k+ context fine-tuning
soup init --template embedding    # Sentence embedding fine-tuning
```

## Task-Specific Config Keys
| Key | Tasks | Description |
|---|---|---|
| `dpo_beta` | DPO | DPO beta parameter |
| `kto_beta` | KTO | KTO beta parameter |
| `orpo_beta` | ORPO | ORPO beta parameter |
| `simpo_gamma` | SimPO | SimPO gamma parameter |
| `cpo_alpha` | SimPO | CPO alpha parameter |
| `ipo_tau` | IPO | IPO tau parameter |
| `grpo_beta` | GRPO | GRPO beta parameter |
| `num_generations` | GRPO | Number of generations per prompt |
| `reward_fn` | GRPO, PPO | Reward function (`accuracy`/`format`/`path.py`) |
| `reward_model` | PPO | Path to reward model |
| `ppo_epochs` | PPO | PPO training epochs |
| `ppo_clip_ratio` | PPO | PPO clip ratio |
| `ppo_kl_penalty` | PPO | PPO KL penalty |
| `loraplus_lr_ratio` | All | LoRA+ learning rate ratio |
| `use_galore` | All | Enable GaLore optimizer |
| `moe_lora` | All | Target MoE expert layers |
| `moe_aux_loss_coeff` | All | Router load-balancing loss coefficient |
| `use_liger` | All | Liger Kernel fused ops |
| `use_flash_attn` | All | FlashAttention v2/v3 |
| `use_ring_attention` | All | Ring FlashAttention |
| `rope_scaling_type` | All | RoPE scaling (`linear`/`dynamic`/`yarn`/`longrope`) |
| `neftune_alpha` | All | NEFTune noisy embeddings (0-50) |
| `packing` | SFT | Sample packing for efficiency |
| `curriculum` | All | Enable curriculum learning |
| `curriculum_metric` | All | Sort metric (`length`) |
| `curriculum_buckets` | All | Number of difficulty buckets (1-20) |
| `loss_watchdog` | All | Enable loss watchdog |
| `loss_watchdog_threshold` | All | Loss spike threshold (≤100) |
| `loss_watchdog_patience` | All | Patience before stopping (≤1000) |
| `freeze_layers` | All | Freeze bottom N layers (≤1000) |
| `freeze_ratio` | All | Freeze ratio of layers |
| `embedding_loss` | Embedding | Loss function |
| `embedding_pooling` | Embedding | Pooling strategy |
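
For example, a DPO config combines the base structure with its task key. The file layout mirrors the Config Structure section above; note that placing `dpo_beta` under `training:`, the data path, and the specific hyperparameter values are illustrative assumptions, not documented defaults.

```yaml
base: meta-llama/Llama-3.1-8B-Instruct
task: dpo
data:
  train: ./data/preference_pairs.jsonl   # assumed path for illustration
training:
  epochs: 1
  lr: 5e-7
  dpo_beta: 0.1                          # DPO beta parameter (illustrative value)
```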