When models cannot get the simple stuff right, that’s a cause for serious concern.
I have been thinking about a question that nobody in enterprise software seems to want to sit with: why can the most advanced AI models in the world solve Olympiad-level mathematics but fail to reliably extract a total from an invoice?
