Author: Yassir Manaf
Fine-Tuning vs. RAG: How I Actually Choose
Most teams reach for fine-tuning when they need RAG, and RAG when they need fine-tuning. Here’s how I actually make the call.
LLM Agent vs Single Call: How I Decide Before Writing a Line of Code
Most teams reach for agents too early. Here’s the decision framework I use to choose between an LLM agent and a single call — before writing a line of code.
How I Audit an AI System Before It Goes Live
Most AI systems pass the demo but aren’t ready for production. Here’s the six-area audit I run as a fractional Tech Lead — and the gaps I find in every system.
Prompt Injection in Production: 4 Critical Attack Vectors and How to Defeat Them
Prompt injection is easy to miss in testing and dangerous in production. Here’s what it actually looks like in a live LLM system — and the layered defenses that catch it.
RAG in Production: Beyond the Demo
Every RAG demo works. Production is where things fall apart — quietly, in ways that are hard to debug. Here’s what I’ve learned building RAG systems that actually ship.
Structured Outputs with LLMs: Moving Beyond Raw Text
I spent two years writing regex to parse JSON from LLM responses. Structured outputs ended that. Real before/after from a production pipeline — metrics, tradeoffs, and the failure modes that don’t go away.
Multi-Tenant LLM Architecture: Isolation Patterns That Actually Work
The first time a tenant’s prompt leaked into another tenant’s context window, I found out from a support ticket. Here are the multi-tenant isolation patterns that held up in production — and the ones that didn’t.
LLM Caching in Production: The What, When, and How
Caching LLM responses isn’t like caching a REST API. The inputs are fuzzy, the outputs are non-deterministic, and most traditional strategies break. Here’s the caching hierarchy I built in production — and the traps I walked into along the way.