Your AI Pipeline Is Burning Money and You're Calling It “Good Enough”

30-Minute Talk
Sunday at 2:00 PM in Ballroom B

Most production LLM pipelines are assembled from defaults and "best practices" that nobody actually measured. In this talk, we'll instrument a real retrieval pipeline end to end and isolate the cost impact of five specific levers: retrieval depth, conditional reranking, context window size, model routing, and retriever strategy. We'll then show how combining them cuts cost by 86% on a full 300-query benchmark while improving answer quality by 1.4%. You'll leave with a reproducible methodology, concrete numbers, and a clearer picture of where your inference budget is actually going.

Presented by