AI Under the Hood: Part I: Understanding the Machine
The LLM is always the bottleneck. In every production system I've built, the same pattern emerges. Your backend sits idle. Your database purrs at 5% CPU. And your language model - the brilliant, expensive centerpiece - grinds through each request in 2 seconds while users expect 200 milliseconds. Costs spiral. Latency kills the user experience.

The problem isn't configuration or model selection. It's the architecture itself. Transformers were designed for parallel training but are forced into sequential generation. Stateless by design, yet they require state that grows with every token. Optimized for throughput, but deployed for latency. Every optimization exists to bridge these gaps. To make LLMs fast, you first need to understand why they're slow, and that starts with how they actually work under the hood. ...
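To make that sequential constraint concrete, here is a minimal, hypothetical decoding loop. The `forward_pass` stand-in and its cost model are assumptions for illustration only, not a real model or library API; the point is that each output token requires a fresh pass over every token produced so far, and token t+1 cannot start until token t exists.

```python
import random
import time


def forward_pass(tokens: list[int]) -> int:
    """Hypothetical stand-in for one transformer forward pass.

    Cost grows with sequence length because attention looks at every
    previous token (the growing state mentioned above). The sleep is
    only a toy cost model to make that visible.
    """
    time.sleep(0.001 * len(tokens))   # toy cost: proportional to context length
    return random.randrange(50_000)   # pretend we sampled a token id from logits


def generate(prompt_tokens: list[int], max_new_tokens: int = 32) -> list[int]:
    """Naive autoregressive decoding: one forward pass per output token.

    Training can process a whole sequence in parallel; generation cannot,
    because each step depends on the previous step's output.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_id = forward_pass(tokens)  # full pass over everything so far
        tokens.append(next_id)          # state grows by one token per step
    return tokens


if __name__ == "__main__":
    out = generate([101, 2009, 2003], max_new_tokens=16)
    print(f"generated {len(out)} tokens, one forward pass each")
```

Run it and the per-token latency climbs as the sequence grows: that loop, not your backend or database, is where the 2 seconds go, and it is the shape every optimization in the rest of this series is trying to work around.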