Training

Fine-tuning

The process of further training a pre-trained language model on a smaller, task-specific dataset to specialize its behavior or domain knowledge.

Fine-tuning takes a pre-trained large language model and continues training it on a smaller, curated dataset to specialize it for a particular task, domain, or style. The model retains its general language ability while its weights adapt to the new objective.
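
As a concrete illustration, here is a minimal sketch of fine-tuning as continued training: a small pre-trained causal LM updated on a handful of domain examples with the standard language-modeling loss. The model name, examples, and hyperparameters are placeholders, not a recommended recipe.

```python
# Minimal sketch: continue training a pre-trained causal LM on a tiny,
# hypothetical task dataset. Model, examples, and hyperparameters are
# placeholders, not a tuned recipe.
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any pre-trained causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical curated examples; a real run would use hundreds or more.
examples = [
    "Q: What is a covenant clause? A: A contractual promise to act or refrain from acting.",
    "Q: What is force majeure? A: A clause excusing non-performance after extraordinary events.",
]

optimizer = AdamW(model.parameters(), lr=5e-5)  # small LR: adapt, don't overwrite
model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
        # Causal-LM objective: labels are the input ids; the model shifts
        # them internally so each position predicts the next token.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The only things that differ from pre-training are the data and the learning rate; the loop and the loss are the same, which is why fine-tuning is best understood as continued training.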

Common reasons to fine-tune:

  • Domain expertise — legal, medical, financial language
  • Brand voice — consistent tone for a product
  • Task specialization — function-calling reliability, structured output
  • Performance — a smaller fine-tuned model can outperform a larger general one on a narrow task

Modern practice favors parameter-efficient fine-tuning (PEFT) methods such as LoRA and QLoRA, which train only small adapter matrices on top of frozen base weights. Because the frozen parameters need no gradients or optimizer state, VRAM requirements fall sharply; the QLoRA paper, for example, reports fine-tuning a 65B-parameter model on a single 48 GB GPU. Full fine-tuning (updating all weights) is reserved for the largest-scale projects.
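
As a sketch of what PEFT looks like in code, the snippet below wraps a base model in a LoRA adapter using Hugging Face's peft library. The rank, scaling factor, and target modules are illustrative values, not tuned ones, and the target module names depend on the base architecture.

```python
# Minimal LoRA sketch with the peft library: base weights stay frozen,
# only the low-rank adapter matrices receive gradients.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

config = LoraConfig(
    r=8,                        # adapter rank: width of the low-rank update
    lora_alpha=16,              # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection to adapt (GPT-2 naming)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

The wrapped model drops into the same training loop as before; since only the adapter weights change, a saved checkpoint is megabytes rather than gigabytes.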

Fine-tuning is distinct from:

  • Pre-training: the initial large-scale training on a broad general corpus
  • RLHF / DPO: alignment from human preferences (often a stage of fine-tuning)
  • Prompt engineering: changing only the input, not the model
  • RAG: retrieving context at inference time, not modifying the model

For most product use cases in 2026, RAG and prompt engineering reach acceptable quality without fine-tuning. Fine-tuning becomes worth it when you have a narrow, repeatable task and at least a few hundred high-quality examples.
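
To make "a few hundred high-quality examples" concrete, one common on-disk shape is a JSONL file of chat-formatted records, one example per line. The task, file name, and messages below are all hypothetical:

```python
# Hypothetical fine-tuning dataset in the common chat JSONL shape:
# one JSON object per line, each holding a full example conversation.
import json

examples = [
    {
        "messages": [
            {"role": "system", "content": "Extract the parties from the contract text."},
            {"role": "user", "content": "This agreement is between Acme Corp and Bolt LLC."},
            {"role": "assistant", "content": '{"parties": ["Acme Corp", "Bolt LLC"]}'},
        ]
    },
    # ...a few hundred more, covering the task's real input variety
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```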
