RAG vs fine-tuning — when do you choose which?

Question

Randhir Jassal · Accepted Answer

Default: RAG. Fine-tuning has a specific narrow role most people apply too broadly. The single most common interview question on LLM productionization, and the most common mistake teams make in real life. Side-by-side | | RAG | Fine-tuning | |---|---|---| | What it changes | What the model SEES at query time | What the model has INTERNALIZED | | Adds knowledge | ✅ Yes — adds documents to the index | ❌ Not really — even fine-tuning to "remember" is unreliable | | Updates content | ✅ Add a chunk → queryable instantly | ❌ Re-train each time data changes | | Citation of sources | ✅ Naturally — you know which chunk fed the answer | ❌ Model has "absorbed" the data; you can't point to where | | Changes style / format | ⚠️ Possible via prompt | ✅ Native — change the model's output shape | | Changes language register | ⚠️ Via prompt | ✅ Native | | Inference cost | Same model + retrieval cost | Same | | One-time training cost | None | $thousands per fine-tune (depending on data + base model) | | Privacy of training data | Stays in your DB | Baked into model weights — can't un-bake | | Hallucination risk | Low when prompt forces context-only answers | Higher — the model still hallucinates plausibly | | Right for | Adding knowledge | Changing style / format / classification | When fine-tuning is actually the right tool 1. Specific output format / structure — "Always output JSON matching this schema." Doable via prompt but fine-tuning is more reliable for high-volume calls. 2. Domain-specific terminology / register — A medical-document model, a legal-brief model, a code-review model with your company's style. 3. Classification / extraction — "Given a customer email, return one of {complaint, query, praise, escalation}." Fine-tuning a small model is cheap and accurate. 4. Distillation — Train a small (cheap) model to mimic GPT-4o's behavior on a narrow task. In all four, fine-tuning shapes how the model responds, not what it knows. When fine-tuning is the wrong tool (but tempting) - "Let's fine-tune the model on our docs so it knows our company." → ❌ This is a RAG problem. - "Let's fine-tune it on our 50,000 support tickets to learn our products." → ❌ RAG. The fine-tune absorbs patterns but won't reliably recall facts. - "We need real-time prices." → ❌ Function-calling, not fine-tuning. A fine-tuned model still hallucinates with confidence — there's no "I don't know" mechanism baked into the weights. Hybrid: RAG + a small fine-tuned classifier Common real-world pattern: Fine-tune a small / cheap model (gpt-4o-mini or a fine-tunable open model) for the router — fast, cheap classification. Then RAG handles knowledge queries; function-calling handles real-time data. Best of both. Cost difference at scale For 1M queries/month at 1000 tokens each: | Pattern | Monthly cost (rough, Azure 2026) | |---|---| | Pure GPT-4o + RAG | ₹50,000 | | Fine-tuned GPT-4o + RAG | ₹50,000 + one-time fine-tune ₹50,000 | | GPT-4o-mini + RAG | ₹5,000 | | Fine-tuned GPT-4o-mini + RAG | ₹5,000 + one-time ₹15,000 | The biggest cost lever isn't fine-tuning. It's choosing a cheaper base model (mini) for tasks where the gap doesn't matter. Common interview trap "Our company wants to make an internal chatbot that knows our HR policies. Should we fine-tune the model on our handbook?" Answer: No. Use RAG. The handbook is knowledge, not style. Fine-tuning would: - Cost more - Bake the data into a model file that's awkward to update - Still hallucinate when the user asks about something not in the handbook - Make it impossible to add citations back to source Instead, chunk the handbook, embed it, store in Azure SQL VECTOR / Azure AI Search, retrieve the top-K relevant chunks per question, ask the LLM to answer using only those chunks with citations. Interview-grade summary "RAG adds knowledge at query time by retrieving relevant chunks and feeding them to the LLM. Fine-tuning changes how the model responds — its style, format, or classification behavior. For 95% of 'feed our d…

RAG vs fine-tuning — when do you choose which?

Side-by-side

When fine-tuning is actually the right tool

When fine-tuning is the wrong tool (but tempting)

Hybrid: RAG + a small fine-tuned classifier

Cost difference at scale

Common interview trap

Interview-grade summary

RAG vs fine-tuning — when do you choose which?

Side-by-side

When fine-tuning is actually the right tool

When fine-tuning is the wrong tool (but tempting)

Hybrid: RAG + a small fine-tuned classifier

Cost difference at scale

Common interview trap

Interview-grade summary

Why does Python dominate AI/ML development — what are the real reasons?

Tokens, context windows, and the O(n²) attention cost — what every dev should know

LLM sampling parameters — temperature, top-p, top-k — when to tune each

Why does Python dominate AI/ML development — what are the real reasons?

Tokens, context windows, and the O(n²) attention cost — what every dev should know

LLM sampling parameters — temperature, top-p, top-k — when to tune each

	RAG	Fine-tuning
What it changes	What the model SEES at query time	What the model has INTERNALIZED
Adds knowledge	✅ Yes — adds documents to the index	❌ Not really — even fine-tuning to "remember" is unreliable
Updates content	✅ Add a chunk → queryable instantly	❌ Re-train each time data changes
Citation of sources	✅ Naturally — you know which chunk fed the answer	❌ Model has "absorbed" the data; you can't point to where
Changes style / format	⚠️ Possible via prompt	✅ Native — change the model's output shape
Changes language register	⚠️ Via prompt	✅ Native
Inference cost	Same model + retrieval cost	Same
One-time training cost	None	$thousands per fine-tune (depending on data + base model)
Privacy of training data	Stays in your DB	Baked into model weights — can't un-bake
Hallucination risk	Low when prompt forces context-only answers	Higher — the model still hallucinates plausibly
Right for	Adding knowledge	Changing style / format / classification

Pattern	Monthly cost (rough, Azure 2026)
Pure GPT-4o + RAG	~₹50,000
Fine-tuned GPT-4o + RAG	~₹50,000 + one-time fine-tune ~₹50,000
GPT-4o-mini + RAG	~₹5,000
Fine-tuned GPT-4o-mini + RAG	~₹5,000 + one-time ~₹15,000

Related questions

Related questions