Using the tools I'm limited to at my day job (GitHub Copilot, yuck), I've had to find some workarounds for the built-in difficulties. Microsoft is doing a lot in that tool to limit its costs, which, sure. But it does make the tool much less powerful than the unfettered tools and models (and I'll point out, every time a developer has a bad experience with agentic AI tools, you risk losing a customer for life). They're doing things like artificially limiting the context window down to 100k tokens, adding instructions to the system prompt to make sure it doesn't run for too long, etc. They're obviously also using the "dumber" variants of models while calling them by their standard names; for instance, there is no way in hell that the GPT-5 model in GHC is anything but the medium version or worse, because compared to what I use on my own time, that thing is significantly nerfed.

I'm not 100% sure on this, but I also have a feeling they're doing some JIT routing between the different versions of GPT-5-Codex. I know OpenAI has said they're doing this with this model, and my experience using it on my ChatGPT sub for personal projects is that it never feels down-tuned, but the version in GHC sometimes does things that not even GPT-4.1 would have done.

The good news here is that you can explicitly state in your prompts or chatmode or whatever you're doing that you want it to THINK REALLY HARD, or INVESTIGATE DEEPLY, or something to that effect. When you do this, it seems to trigger something on the GPT-5-Codex side that causes it to take the "long route" on its work, really dig deep, and hopefully come up with a much better solution to the problem.
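For reference, here's a minimal sketch of what that could look like baked into a custom chat mode file (something like `.github/chatmodes/deep-dive.chatmode.md` in VS Code). The file name, front matter fields, and tool names here are illustrative assumptions on my part and may vary by Copilot/VS Code version, so treat it as a starting point rather than a spec:

```markdown
---
description: 'Force deeper investigation before any code is written'
tools: ['codebase', 'search', 'usages']
---
# Deep-dive mode instructions

Before proposing any change, THINK REALLY HARD about the problem.
INVESTIGATE DEEPLY: read every file that could plausibly be involved,
trace the relevant call paths, and list the assumptions you are making.
Only after that investigation should you explain your plan and implement it.
```

The same kind of instruction can just as easily be dropped inline at the top of a prompt if you don't want to maintain a separate chat mode.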

As nice of a trick as that is, it does bring up some philosophical questions for me. It's clear enough why telling a bunch of nodes in a neural network to "think" will send the problem down different paths versus not doing that, but this is emergent behaviour that comes from training on "human" data. If someone on some Body Building board from 2004 told another poster to "Stop being a dummy and think really hard", then the model internalized that, and here we are.

By philosophical, what I mostly mean is: what does this mean for the models of tomorrow? It sure feels like these models can "think", and we build some amount of this behaviour into the tools by having them "talk to themselves" in the background and then display the final decision to the user once they're finished. But when we're on GPT-9, and we all kind of agree that we've hit "AGI" (insofar as we'll ever agree on that, though I suspect the definition is the same as for other controversial topics in life: "you know it when you see it"), will we actually want that model to "think hard"? Do we actually want it talking to itself, coming up with its own goals and directions?

I recently watched a talk by Geoffrey Hinton that sent me down this path. He's rather alarmist about it, but I think I'm somewhere in the middle. What I don't feel great about is that humanity's fate might come down to "hoping some insanely powerful company does the right things" when it comes to value alignment. The good news is that, theoretically, the people running those companies are also human (for now), so maybe we'll be okay. But if the last 1000 years have taught us anything, it's that you almost never align values across economic differences.