30-Minute Talk
Most production LLM pipelines are assembled from defaults and "best practices" that nobody actually measured. In this talk, we'll instrument a real retrieval pipeline end to end, isolate the cost impact of five specific levers — retrieval depth, conditional reranking, context window size, model routing, and retriever strategy — and show how combining them cuts cost by 86% on a full 300-query benchmark while improving answer quality by 1.4%. You'll leave with a reproducible methodology, concrete numbers, and a clearer picture of where your inference budget is actually going.
PyOhio 2026