Theia Vogel wrote about why LLMs freak out over the seahorse emoji
and why they keep claiming there is one whenever asked:
Maybe LLMs believe a seahorse emoji exists because so many humans in the training data do. Or maybe it’s a convergent belief – given how many other aquatic animals are in Unicode, it’s reasonable for both humans and LLMs to assume (generalize, even) that such a delightful animal is as well. A seahorse emoji was even formally proposed at one point, but was rejected in 2018.
Regardless of the root cause, many LLMs begin each new context window fresh with the mistaken latent belief that the seahorse emoji exists. But why does that produce such strange behavior? I mean, I used to believe a seahorse emoji existed myself, but if I had tried to send it to a friend, I would’ve simply looked for it on my keyboard and realized it wasn’t there, not sent the wrong emoji and then gone into an emoji spam doomloop. So what’s happening inside the LLM that causes it to act like this?
Theia then dug into the weeds and explored the tokens and training data to figure out what could be causing it. It should be obvious by now, but as always, don’t believe everything chatbots output.
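If you would rather check the claim yourself than ask a chatbot, one option is to query the Unicode character database directly. The sketch below is not from Theia's post. It uses Python's standard `unicodedata` module, and the helper name `find_characters` and the list of names to check are made up for illustration. The answer depends on the Unicode version bundled with your Python interpreter (see `unicodedata.unidata_version`).

```python
import unicodedata


def find_characters(names):
    """Map each Unicode character name to its character, or None if the name is unknown."""
    found = {}
    for name in names:
        try:
            # lookup() raises KeyError when no character has this name
            found[name] = unicodedata.lookup(name)
        except KeyError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    # Hypothetical sample of names: a few aquatic animals plus the one in question
    for name, char in find_characters(["DOLPHIN", "OCTOPUS", "SHARK", "SEAHORSE"]).items():
        print(name, char if char is not None else "(no character with this name)")
```

A lookup by name only tells you whether a character with that exact name exists, so it is a quick sanity check rather than a full search of the emoji list.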