Discussion about this post

dribrats

Not a text chat

Not a five minute convo

But a real deep dive over an actual beverage

Daniel

"The OpenAI explanation of why their Codex tool has to be restrained from talking about goblins gives us a clue. The chatbot style isn’t a result of the transformer neural network picking up latent features of the entire training set; it’s a feature set at a much higher level, in the “system prompts” and the optimisation of the actual product in testing and manual tweaking, rather than anything happening at the level of vector spaces."

This is wrong, I'm afraid. There's a very large human-feedback reinforcement component in all these models: annotators have labelled massive response corpora as racist, helpful, cheerful, pornographic, difficult-to-understand, etc., and that signal is incorporated directly into the training process via, e.g., PPO (https://cameronrwolfe.substack.com/p/ppo-llm). Much of the "chatbot style" is actually baked into the network weights.
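For readers who haven't seen how that works, here is a toy Python sketch of the two pieces usually described for PPO-based RLHF. It assumes the standard InstructGPT-style setup, not anything OpenAI has published about Codex specifically. First, a reward model is fitted to annotator judgements with a pairwise (Bradley-Terry) loss. Second, a PPO clipped objective nudges the policy toward responses that the reward model scores highly, with a KL-style penalty toward a reference model. The function names, the 0.1 KL coefficient, the 0.2 clip range and the example scores are illustrative assumptions. Real systems compute these quantities from token log-probabilities produced by a full network, not from hand-picked scalars.

```python
import math

# Toy sketch of PPO-based RLHF pieces; names and constants are hypothetical.

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the annotator-preferred reply higher.
    """
    return math.log1p(math.exp(-(r_chosen - r_rejected)))

def shaped_reward(rm_score: float, logp_policy: float, logp_ref: float,
                  kl_coef: float = 0.1) -> float:
    """Reward-model score minus a KL-style penalty (single-sample log-ratio
    estimate) that discourages drifting far from the reference model."""
    return rm_score - kl_coef * (logp_policy - logp_ref)

def ppo_clipped_objective(logp_new, logp_old, advantages,
                          clip_eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximised) over a batch."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

if __name__ == "__main__":
    # Hypothetical reward-model scores for a preferred and a rejected reply.
    print(preference_loss(2.0, 0.0))
    # Hypothetical log-probs: the new policy doubled a reply's probability.
    print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

Because gradients of these objectives update the model's parameters, whatever the reward model learned from annotators ends up in the weights themselves rather than only in a system prompt.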

