PERSPECTA

News from every angle

This researcher has a new way to measure AI performance. It's BS, literally.

BullshitBench, from Peter Gostev, AI capability lead at Arena, tests AI models with nonsensical questions to gauge how well they detect BS. Google Gemini 3.0 struggles with BullshitBench,…

25 Mar, 09:00