Fine-tune Qwen 3 on a custom dataset

Qwen 3 is one of the strongest open-weight model families for reasoning and multilingual tasks. This guide shows how to fine-tune Qwen 3 8B on your own dataset with Soup CLI.

1. Install

```bash
pip install 'soup-cli[fast]'
```

2. Dataset (ShareGPT format)

Qwen 3 handles multi-turn conversations well. Use the ShareGPT format:

```json
[
  {
    "conversations": [
      {"from": "user", "value": "Explain gradient descent in one paragraph."},
      {"from": "assistant", "value": "Gradient descent is..."}
    ]
  }
]
```
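If your raw data is flat prompt/response pairs, a small script can emit the structure above. This is a sketch: the input field names ("prompt", "response") are placeholders for whatever your source dataset uses.

```python
import json

# Placeholder input records; rename "prompt"/"response" to match your data.
pairs = [
    {"prompt": "Explain gradient descent in one paragraph.",
     "response": "Gradient descent is..."},
]

# Wrap each pair as a two-turn conversation in the format shown above.
records = [
    {"conversations": [
        {"from": "user", "value": p["prompt"]},
        {"from": "assistant", "value": p["response"]},
    ]}
    for p in pairs
]

with open("conversations.json", "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```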

3. Config

```yaml
base:
  model: Qwen/Qwen3-8B

task: sft

data:
  train: conversations.json
  format: sharegpt

training:
  backend: unsloth
  epochs: 2
  learning_rate: 1.5e-4
  batch_size: 4
  gradient_accumulation_steps: 4
  max_seq_length: 4096
  lora:
    enabled: true
    r: 32
    alpha: 64
    target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]
```

Why target all projections? Applying LoRA to both the attention projections (q/k/v/o) and the MLP projections (gate/up/down) generally yields better instruction-following than adapting attention alone, at a modest cost in trainable parameters.
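Two quick sanity checks on this config: the effective batch size is batch_size × gradient_accumulation_steps, and each LoRA-adapted matrix adds r × (d_in + d_out) trainable parameters. The matrix shape below is an illustrative placeholder, not Qwen 3's actual dimensions.

```python
# Values from the config above.
batch_size = 4
gradient_accumulation_steps = 4
effective_batch = batch_size * gradient_accumulation_steps  # sequences per optimizer step

def lora_params(d_in: int, d_out: int, r: int = 32) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)

print(effective_batch)           # 16
print(lora_params(4096, 4096))   # 262144 for an illustrative 4096x4096 projection
```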

4. Train

```bash
soup train --config qwen3.yaml
```

5. Serve with vLLM

```bash
pip install 'soup-cli[serve-fast]'
soup serve --adapter ./runs/qwen3/latest --backend vllm --port 8000
```

The model is now available at http://localhost:8000/v1/chat/completions behind an OpenAI-compatible API.
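A minimal client sketch against that endpoint, using only the standard library. The model field is a placeholder: vLLM expects the served model or adapter name, which soup serve reports at startup.

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request for the local server.
payload = {
    "model": "qwen3-ft",  # placeholder; use the name `soup serve` reports
    "messages": [
        {"role": "user", "content": "Explain gradient descent in one paragraph."}
    ],
    "temperature": 0.7,
}

def chat(base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```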

Tips

- Qwen 3's tokenizer has a large (~151k-entry) vocabulary, so text compresses into fewer tokens than with smaller vocabularies; even so, keep max_seq_length at 4096 or higher so multi-turn conversations are not truncated.
- Set training.neftune_alpha: 5 to add NEFTune noise to the embeddings during training, which often improves generalization on small datasets.
- Set training.packing: true to pack multiple short examples into each sequence; this improves throughput whenever most conversations are shorter than max_seq_length, as is common in long-context reasoning setups.
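Assuming the config schema from step 3, the last two tips slot into the training block like this (a sketch; verify key placement against Soup's config reference):

```yaml
training:
  neftune_alpha: 5   # NEFTune embedding noise
  packing: true      # pack short examples into full-length sequences
```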

Related

- [DPO alignment guide](/docs/dpo-training-guide)
- [Data formats reference](/docs/data-formats)