Scaling
I didn't expect The Scaling Era to feel this practical. Books about AI usually fall into two camps: the breathless and the grim. This one reads more like a field manual. Calm, unfussy, occasionally contrarian.
The thesis is simple enough to sound obvious and uncomfortable at the same time: keep pushing scale, and also remove the constraints we keep putting on our models. Most of the argument after that is bookkeeping.
The core claim is that intelligence improves predictably with scale. Not as a miracle, but as a curve. Double the compute and data, shave off a predictable chunk of error. This is underrated because predictable progress feels boring. But boring is powerful. If you can forecast returns, you can plan. You can commit to a roadmap without hoping for a breakthrough.
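The "predictable curve" claim can be sketched as a power law: loss falls toward an irreducible floor as compute grows, and each doubling removes the same fraction of the remaining reducible error. The coefficients below are made up for illustration, not fitted to any real model family.

```python
# Hypothetical power-law loss curve. Coefficients (a, alpha, irreducible)
# are illustrative placeholders, not values from any published fit.
def predicted_loss(compute, a=10.0, alpha=0.05, irreducible=1.7):
    """Reducible loss falls as a power law in training compute (FLOPs)."""
    return irreducible + a * compute ** -alpha

# Each doubling of compute removes the same *fraction* of reducible loss,
# which is what makes returns forecastable enough to plan a roadmap around.
for c in [1e21, 2e21, 4e21, 8e21]:
    print(f"{c:.0e} FLOPs -> predicted loss {predicted_loss(c):.4f}")
```

This is the sense in which "boring is powerful": with a curve like this, the question "what does the next order of magnitude buy us?" has a numerical answer before you spend the money.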
I think people underestimate how unusual this is. Most research areas don't have reliable scaling laws. You try things, some work, most don't, and progress is lumpy. AI right now is more like manufacturing: you can see the production function and optimize against it.
There's a biological sanity check the book leans on. Nature didn't make large brains because she loves elegance. She paid for them because more compute plus more training time works. Bigger brains with longer childhoods correlate with richer behavior. Scale plus experience compounds. That's not a license to waste money. It's a reminder that "try harder" has an unusually good ROI right now, if you know where the returns come from.
One clarification I found useful: pretraining is not statistical parroting. The better frame is representation learning. Predicting the next token forces the model to internalize structure. Syntax, causality, social patterns, small physics, all the stuff we call "common sense" when we can't be bothered to name the parts. The model doesn't memorize the world. It builds a compressed sketch of it.
Post-training, by contrast, doesn't create new knowledge. It shapes what's already there. It decides which internal patterns should surface when a human asks for help. If pretraining is reading the library, post-training is learning which books matter for the job.
The book's other big idea is what it calls "unhobbling." Models are surprisingly capable in the raw, but we keep asking them to work with their hands tied. No planning, no tools, tiny context. Unhobbling is simply letting the model act like a competent worker: plan before doing, break problems into subproblems, call tools when appropriate, check work with a separate pass.
You don't need elaborate agent architectures to get a big win here. A simple planner-solver-verifier scaffold converts a jittery oracle into something that behaves like it understands the stakes.
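A minimal version of that scaffold fits in a page. The `llm` function here is a stand-in for any text-completion API call; it is stubbed out so the control flow is runnable as-is, and nothing below reflects a real provider's interface.

```python
# Sketch of a planner-solver-verifier scaffold, assuming a generic
# text-in/text-out model call. `llm` is a stub, not a real API.
def llm(prompt):
    # Placeholder: a real implementation would call a model endpoint here.
    return f"[model response to: {prompt[:40]}...]"

def solve_with_scaffold(task):
    # 1. Plan: ask for subproblems before attempting an answer.
    plan = llm(f"Break this task into numbered steps:\n{task}")
    # 2. Solve: produce a draft with the plan in context.
    draft = llm(f"Task: {task}\nPlan:\n{plan}\nFollow the plan and answer.")
    # 3. Verify: a separate pass checks the draft against the original task.
    verdict = llm(f"Task: {task}\nAnswer:\n{draft}\nList any errors, or say OK.")
    return draft, verdict

draft, verdict = solve_with_scaffold("Summarize the tradeoffs of caching.")
```

The point is not the three string templates; it is that planning, solving, and checking are separate passes, so a weak moment in one does not silently contaminate the others.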
There's a third lever that most teams ignore: test-time compute. You can't always retrain. Budgets end, chips are busy. But you can give the model more room to think at inference. More tokens, more attempts, a small search before committing to an answer. For many tasks this alone is equivalent to having a larger model.
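The simplest form of test-time compute is best-of-n: sample several candidate answers and keep the one a verifier scores highest. The sketch below uses toy stand-ins for the model and the scorer (random draws and an identity score), purely to show the shape of the loop.

```python
import random

# Best-of-n sampling: spend inference compute instead of training compute.
# `generate` and `score` are toy stand-ins, not real model or verifier calls.
def generate(question, temperature=1.0):
    return random.gauss(0, temperature)  # toy proxy for "answer quality"

def score(answer):
    return answer  # toy verifier: higher is better

def best_of_n(question, n):
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=score)

# With the same seed, the n=16 run draws the n=1 run's candidate first,
# so adding attempts can only match or improve the selected answer.
random.seed(0)
one_try = best_of_n("hard question", 1)
random.seed(0)
many_tries = best_of_n("hard question", 16)
```

Swap the stubs for a real sampler and a real checker and this is the "equivalent to a larger model" trade the book describes: more attempts plus selection, no retraining.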
I find myself thinking about this as three levers compounding together. Training scale is the principal everyone sees: parameters, data, steps. Post-training is alignment: tuning the thing toward what humans actually value. Test-time compute is the leverage most budgets ignore: thinking tokens, multiple tries, verification. If any one is zeroed out, you're leaving returns on the table.
The uncomfortable part, at least for people who love novelty, is that none of this requires a breakthrough. The frontier has moved far enough that the limiting factor is operational. Do you have the discipline to collect good preference data? To wire a verifier? To pay for a few more tokens when the question is hard? It's less a moonshot and more a factory. One that turns compute into reliability.
To be clear, I'm not claiming this picture is complete. There may be walls we haven't hit yet. The scaling laws could bend or break at some point we can't currently see. And there are real questions about whether "more compute" remains economically or physically feasible at the scales being projected.
The book doesn't predict a particular date for AGI. It points out that forecasts without a cost model are fortune cookies. If you want a sober view of the future, ask who's paying for the next order of magnitude and where the electricity comes from.
But if the current trends hold for even a few more years, the implications are significant. And the operational discipline the book describes, the unsexy work of scaffolding and verification and token budgets, may matter more than any single architectural innovation.
I keep coming back to one idea: we routinely forbid models from drafting and revising, then act surprised at the error rate. No human writes a report without drafting. Why should we expect machines to work without one?