Your VLM Is Hallucinating Your Genes

30-Minute Talk

Multimodal vision-language models have become the hammer that makes every document problem look like a nail. But VLMs can struggle with reading order on complex layouts, hallucinate critical character-level details, and fail to scale when you need sub-millisecond latency. When the documents are clinical genomic reports, a single wrong character in a variant string can denote a completely different mutation, turning hallucinations from annoying quirks into clinical risk. This talk walks through the strategic decision of when VLMs are the right tool, when classical OCR plus algorithmic pipelines beat them, and what a practical hybrid actually looks like when one wrong character can put a patient on the wrong line of therapy.
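As a flavor of the kind of algorithmic guardrail the talk contrasts with end-to-end VLMs, here is a minimal sketch: a syntax check for a simplified, hypothetical subset of HGVS-style coding-variant strings. The pattern and function names are illustrative assumptions, not the talk's actual pipeline.

```python
import re

# Simplified, hypothetical subset of HGVS coding-variant syntax
# (c.<position><ref>><alt>), not the full specification.
CODING_VARIANT = re.compile(r"^c\.\d+[ACGT]>[ACGT]$")

def looks_like_coding_variant(s: str) -> bool:
    """Return True if s matches the simplified c.<pos><ref>><alt> form."""
    return bool(CODING_VARIANT.match(s))

# BRAF V600E is commonly written c.1799T>A. Note that a single swapped
# base (A -> C) still parses as a syntactically valid variant, so a syntax
# check alone cannot catch that class of hallucination; it must be paired
# with reference-sequence lookups or verification against the source image.
print(looks_like_coding_variant("c.1799T>A"))   # True
print(looks_like_coding_variant("c.1799T>C"))   # True, but a different mutation
print(looks_like_coding_variant("c.l799T>A"))   # False: OCR confused '1' with 'l'
```

The third case shows where cheap algorithmic checks shine (character-class errors), while the second shows why the stakes described above demand more than syntax validation.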

Presented by