A reinforcement learning agent beat Pokémon Red

I found this intriguing project by David Rubinstein and a team of engineers including Keelan Donovan, Daniel Addis, Kyoung Whan Choe, Joseph Suarez, and Peter Whidden, where they used reinforcement learning (RL) to beat Pokémon Red. According to the project site, as of February 2025 they managed to do it with a policy of fewer than 10 million parameters (60,500x smaller than DeepSeekV3) and with minimal simplifications.

What makes RL special is how you collect training data: the data is almost always fresh. There is no need to build complex data collection systems, manage large datasets, or worry whether a dataset has gone stale. If you can build a system that creates new data on the fly, you can start training.
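To make that loop concrete, here is a minimal sketch of the collect-then-update cycle, using a toy two-action environment as a stand-in (this is an illustrative assumption, not the team's actual Pokémon Red setup, which runs against a Game Boy emulator):

```python
import random

class ToyEnv:
    """Toy stand-in environment: action 1 is always rewarded, action 0 never is."""
    def step(self, action):
        return 1.0 if action == 1 else 0.0  # reward

def collect_rollout(env, prefs, n_steps=50):
    """Generate fresh training data on the fly by acting in the environment."""
    data = []
    for _ in range(n_steps):
        # Sample an action in proportion to the current preference weights.
        total = sum(prefs)
        action = 0 if random.random() < prefs[0] / total else 1
        reward = env.step(action)
        data.append((action, reward))
    return data

def update(prefs, data, lr=0.1):
    """Nudge preferences toward rewarded actions (a crude policy update)."""
    for action, reward in data:
        prefs[action] += lr * reward
    return prefs

random.seed(0)
env, prefs = ToyEnv(), [1.0, 1.0]
for _ in range(20):
    # Each iteration discards the old rollout and trains on brand-new data.
    prefs = update(prefs, collect_rollout(env, prefs))

assert prefs[1] > prefs[0]  # the policy learned to prefer the rewarding action
```

The key property is in the loop at the bottom: no dataset is ever stored or maintained; each training step consumes experience generated moments earlier by the current policy.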

With RL, the team built an agent around a tiny neural network with no pretraining (the agent starts by literally pressing random buttons!) and still achieved remarkable results.

The site explains the inner workings of reinforcement learning in the context of playing Pokémon, how the system was built, and some conclusions and future work. As is to be expected, a lot of proverbial hand-holding was needed to get it to completion, so one of the stated end goals is to let a model go from start to finish autonomously. For now, it's at least Pokémon world domination.
