John Schulman (Reasoning, RLHF)

2026-05-30 · Source
We probably use the same mental machinery thinking about one month from now, one year from now, or a hundred years from now. We're not actually doing some kind of reinforcement learning where we need to worry about a discount factor that covers that timescale and so forth.
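For readers unfamiliar with the term, a discount factor in reinforcement learning weights a reward received k steps in the future by gamma to the power k. The sketch below is only an illustration of that standard definition, not something taken from the quote; the function names `discounted_return` and `effective_horizon` are hypothetical, and the 1 / (1 - gamma) horizon is a common rule of thumb rather than an exact cutoff.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step k is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step folds in the discounted value of what follows.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb number of steps that matter for a given gamma (gamma < 1)."""
    return 1.0 / (1.0 - gamma)


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], 0.5))
# A gamma closer to 1 corresponds to a longer timescale.
print(effective_horizon(0.9), effective_horizon(0.99))
```

The point of the contrast in the quote is that a learner of this kind has a timescale baked in through gamma, whereas human planning does not appear to need a separate setting for each horizon.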
I would rather have a scenario where we're all continually releasing things that are a little better than what came before. We'd be doing this while making sure we're confident that each diff improves on safety and alignment in correspondence to the improvement in capability.
The way we train them with RLHF, that does feel very safe even though the models are very smart. The model is just trying to produce a message that is pleasing to a human. It has no concern about anything else in the world other than whether the text it produces is approved.
Obviously if you were doing something where the model has to carry out a long sequence of actions which involve tools, then it might have some incentive to do a lot of wacky things that wouldn't make sense to a human in the process of producing its final result.
The only way to do something that involves a lot of steps is to have learning and memory that gets updated during the task.
This was built on top of GPT-3.5, which was done training at the beginning of 2022. That model was quite good at language and code. We quickly realized that it was actually quite good at coding help. That was one of the things we were excited about.
We found a lot of gains through post-training. So I would expect us to keep pushing this methodology and probably increasing the amount of compute we put into it.
So I would say a bigger model has a bigger library of different computations, including lots of stuff that's dormant and only being used some of the time, but it has more space to look for circuits to do something useful.
I would expect AI to be used much more widely and for more technically sophisticated tasks. I gave the programming example earlier, doing longer projects, but also helping with various kinds of research. I hope that we can use AI to accelerate science in various ways, because you can potentially have the models understand all the literature in a given field and be able to sift through tons of data.
In pre-training you're basically training to imitate all of the content on the Internet or on the web, including websites and code and so forth. So you get a model that can generate content that looks like random web pages from the Internet. The model is also trained to maximize likelihood where it has to put a probability on everything.
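As a rough illustration of what "maximize likelihood" and "put a probability on everything" mean for next-token prediction, here is a minimal sketch using only the Python standard library. The toy logits and the helper names `softmax` and `next_token_nll` are assumptions made for this example; they are not drawn from any actual training code.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability for every token in the vocabulary."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_nll(logits, target_index):
    """Negative log-likelihood of the token that actually came next."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


# Hypothetical scores over a 4-token vocabulary; token 2 is the true next token.
confident = [0.1, 0.2, 3.0, 0.0]
unsure = [0.1, 0.2, 0.3, 0.0]

print(sum(softmax(confident)))       # probabilities cover the whole vocabulary
print(next_token_nll(confident, 2))  # lower loss: more mass on the true token
print(next_token_nll(unsure, 2))     # higher loss: mass spread across tokens
```

Maximizing likelihood is equivalent to minimizing this loss averaged over the training text, which is why the model must assign some probability to every possible continuation rather than commit to a single answer.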
Most of the training data is more like doing single steps at a time. I would expect us to do more for training the models to carry out these longer projects.
I'd be really excited to see more research on using base models to do simulated social science. These models have a probabilistic model of the whole world and you can set up a simulated questionnaire or conversation and look at how anything is correlated. Any traits that you might imagine, you can see how they might be correlated with other traits.
People do like bullet points. They like structured responses. People do often like the big info dumps that they get from the models. So it's not completely clear how much is just a quirk of the particular choices and design of the post-training processes, and how much is actually intrinsic to what people actually want.

โ† All readings