OpenAI’s GPT-5 launch marred by flawed charts sparking backlash
During its highly anticipated GPT-5 livestream on Thursday, OpenAI showcased several performance charts meant to highlight the model’s capabilities—only to have them quickly draw scrutiny for glaring inaccuracies.

One chart, intended to demonstrate GPT-5’s performance in “deception evals across models,” featured inconsistent scaling that undermined its credibility. In the “coding deception” segment, GPT-5 was shown with a 50.0% deception rate, while OpenAI’s own o3 model—described as smaller and less advanced—was displayed with a higher bar despite having a lower score of 47.4%. Another chart had a similar problem: GPT-5’s score was actually lower than o3’s in a given category, yet the bar representing GPT-5 was larger. In the same visualization, o3 and GPT-4o were shown with identical bar sizes despite having different scores.

The inconsistencies were so apparent that CEO Sam Altman acknowledged the error live, calling it a “mega chart screwup.” A marketing team member later apologized on social media, referring to the misrepresentation as an “unintentional chart crime.”

While it remains unclear whether GPT-5 itself was used to generate the visuals, the incident is a notable misstep for OpenAI on a day meant to underscore the model’s advancements—particularly its claimed “significant improvements in reducing hallucinations.” The errors contrast sharply with the company’s messaging around precision and reliability, casting a shadow over the launch. OpenAI has not yet issued a formal response to inquiries about the charts.