30-Minute Talk
Most production LLM pipelines are assembled from defaults and "best practices" that nobody actually measured. In this talk, we'll instrument a real retrieval pipeline end to end, isolate the cost impact of five specific levers — retrieval depth, conditional reranking, context window size, model routing, and retriever strategy — and show how combining them cuts cost by 86% on a full 300-query benchmark while improving answer quality by 1.4%. You'll leave with a reproducible methodology, concrete numbers, and a clearer picture of where your inference budget is actually going.
PyOhio 2026