# Configuration
Soup uses a single YAML config file for all settings. Run `soup init` to generate one.
## Config Structure
```yaml
base: meta-llama/Llama-3.1-8B-Instruct   # HuggingFace model ID (required)
task: sft                                # Training task
# backend: unsloth                       # 2-5x faster (pip install 'soup-cli[fast]')
# modality: text                         # text, vision, or audio

data:
  train: ./data/train.jsonl              # Path to training data
  format: alpaca                         # Data format (auto-detected if omitted)
  val_split: 0.1                         # Validation split ratio
  max_length: 2048                       # Max sequence length (64-1048576)
  # image_dir: ./data/images             # For vision modality
  # audio_dir: ./data/audio              # For audio modality

training:
  epochs: 3
  lr: 2e-5
  batch_size: auto                       # auto or integer
  quantization: 4bit                     # none, 4bit, 8bit
  # quantization_aware: false            # Enable QAT
  # optimizer: adamw_8bit
  # gradient_checkpointing: true

lora:
  r: 64
  alpha: 16
  dropout: 0.05
  # target_modules: auto                 # Auto-detected per model
  # use_dora: false                      # Weight-decomposed LoRA

output: ./output
```
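The constraints noted in the comments above (required `base`, the `max_length` range, the allowed `quantization` values, `batch_size` as `auto` or an integer) can be expressed as a small validation sketch. This is an illustrative helper, not part of soup's actual codebase; the function name and error messages are made up for the example.

```python
# Hypothetical validator for the config constraints documented above.
# Operates on the dict you would get from parsing the YAML file.

def validate_config(cfg: dict) -> list[str]:
    """Return a list of error strings; an empty list means the config looks valid."""
    errors = []
    if not cfg.get("base"):
        errors.append("base: HuggingFace model ID is required")
    data = cfg.get("data", {})
    max_length = data.get("max_length", 2048)
    if not 64 <= max_length <= 1048576:
        errors.append("data.max_length: must be within 64-1048576")
    val_split = data.get("val_split", 0.1)
    if not 0.0 <= val_split < 1.0:
        errors.append("data.val_split: must be a ratio in [0, 1)")
    training = cfg.get("training", {})
    if training.get("quantization", "none") not in {"none", "4bit", "8bit"}:
        errors.append("training.quantization: must be none, 4bit, or 8bit")
    batch = training.get("batch_size", "auto")
    if batch != "auto" and not isinstance(batch, int):
        errors.append("training.batch_size: must be 'auto' or an integer")
    return errors


cfg = {
    "base": "meta-llama/Llama-3.1-8B-Instruct",
    "task": "sft",
    "data": {"train": "./data/train.jsonl", "val_split": 0.1, "max_length": 2048},
    "training": {"epochs": 3, "lr": 2e-5, "batch_size": "auto", "quantization": "4bit"},
}
print(validate_config(cfg))  # []
```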
## Templates

Soup includes 15 built-in templates:
```bash
soup init --template chat         # Conversational fine-tune
soup init --template code         # Code generation
soup init --template medical      # Domain expert
soup init --template reasoning    # GRPO reasoning (DeepSeek-R1 style)
soup init --template vision      # Vision/multimodal fine-tune
soup init --template audio        # Audio/speech fine-tune
soup init --template kto          # KTO unpaired preference
soup init --template orpo         # ORPO (no reference model)
soup init --template simpo        # SimPO length-normalized preference
soup init --template ipo          # IPO regularized preference
soup init --template rlhf         # Full RLHF pipeline (SFT -> RM -> PPO)
soup init --template pretrain     # Continued pre-training on raw text
soup init --template moe          # MoE fine-tuning (ScatterMoE LoRA)
soup init --template longcontext  # 128k+ context fine-tuning
soup init --template embedding    # Sentence embedding fine-tuning
```

## Task-Specific Config Keys
| Key | Tasks | Description |
|---|---|---|
| `dpo_beta` | DPO | DPO beta parameter |
| `kto_beta` | KTO | KTO beta parameter |
| `orpo_beta` | ORPO | ORPO beta parameter |
| `simpo_gamma` | SimPO | SimPO gamma parameter |
| `cpo_alpha` | SimPO | CPO alpha parameter |
| `ipo_tau` | IPO | IPO tau parameter |
| `grpo_beta` | GRPO | GRPO beta parameter |
| `num_generations` | GRPO | Number of generations per prompt |
| `reward_fn` | GRPO, PPO | Reward function (`accuracy`/`format`/`path.py`) |
| `reward_model` | PPO | Path to reward model |
| `ppo_epochs` | PPO | PPO training epochs |
| `ppo_clip_ratio` | PPO | PPO clip ratio |
| `ppo_kl_penalty` | PPO | PPO KL penalty |
| `loraplus_lr_ratio` | All | LoRA+ learning rate ratio |
| `use_galore` | All | Enable GaLore optimizer |
| `moe_lora` | All | Target MoE expert layers |
| `moe_aux_loss_coeff` | All | Router load-balancing loss coefficient |
| `use_liger` | All | Liger Kernel fused ops |
| `use_flash_attn` | All | FlashAttention v2/v3 |
| `use_ring_attention` | All | Ring FlashAttention |
| `rope_scaling_type` | All | RoPE scaling (`linear`/`dynamic`/`yarn`/`longrope`) |
| `neftune_alpha` | All | NEFTune noisy embeddings (0-50) |
| `packing` | SFT | Sample packing for efficiency |
| `curriculum` | All | Enable curriculum learning |
| `curriculum_metric` | All | Sort metric (`length`) |
| `curriculum_buckets` | All | Number of difficulty buckets (1-20) |
| `loss_watchdog` | All | Enable loss watchdog |
| `loss_watchdog_threshold` | All | Loss spike threshold (≤100) |
| `loss_watchdog_patience` | All | Patience before stopping (≤1000) |
| `freeze_layers` | All | Freeze bottom N layers (≤1000) |
| `freeze_ratio` | All | Freeze ratio of layers |
| `embedding_loss` | Embedding | Loss function |
| `embedding_pooling` | Embedding | Pooling strategy |
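
For example, a DPO config combines the base structure with its task key. The file layout mirrors the Config Structure section above; note that placing `dpo_beta` under `training:`, the data path, and the specific hyperparameter values are illustrative assumptions, not documented defaults.

```yaml
base: meta-llama/Llama-3.1-8B-Instruct
task: dpo
data:
  train: ./data/preference_pairs.jsonl   # assumed path for illustration
training:
  epochs: 1
  lr: 5e-7
  dpo_beta: 0.1                          # DPO beta parameter (illustrative value)
```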