No AGI in Sight: What This Means for LLMs
GPT-5 has sealed the deal. It is one in a line of underachieving flagship models from major AI labs.

Llama 4 (April 2025): Meta’s “10M token context window” collapsed at 300K tokens. The model scored 16% on the aider polyglot coding benchmark, worse than older, smaller models. Meta also got caught using a different version, “optimized for conversationality,” for its marketing benchmarks than the one it actually released.

Grok 4 (July 2025): Despite xAI’s claims of frontier performance, the model was “benchmaxxed and overcooked.” When asked for its surname, it searched the internet and called itself “MechaHitler.” Musk, who calls AI “more dangerous than nukes,” released it without any safety reports, breaking from industry standards that even OpenAI follows.

...