Red Button, Blue Button

there was the red button/blue button twitter poll. It sparked a huge discussion. Rules were: if more >50% people press blue button everybody lives, if <50% only red pressers live (ie. red pressers always survive and blue if only >50% other people choose blue)

there are many different frames why to press one or the other. One would be game theory, where the red is the dominant and rational action

but there are a lot other see more

the decisions also hinges a lot on how you model the society, what other people would press. if you give it to ai:

One way to think about [LLMs] is that they contain the ability to simulate many different characters or personas.

See The persona selection model (Anthropic)

reasoning-trained models converge on red while helpful-assistant models lean blue. and this mirrors how training shapes behavior. RLHF gave us helpful assistants; RLVR (verifiable rewards on math/code) gave us reasoners. Neither produces robust social reasoning.

Softmax's actual pitch is a third training paradigm: RLSE — Reinforcement Learning from Social Evolution. The idea is that just as coding skills jumped when models got trained in environments with executable feedback (not just by reading GitHub), social/cooperative reasoning probably needs environments where multi-agent dynamics — bargaining, reputation, trust repair, coalition formation, public goods — are real and provide structural feedback.