DeepMind enters the language model arena

DeepMind has been hard at work in the realm of AI for many years. Now it has a language model that apparently rivals the likes of GPT-3 (OpenAI), BERT (Google), and GSLM (Facebook).

Until now, DeepMind has been conspicuous by its absence. But the UK-based company behind some of the most impressive achievements in AI, including AlphaZero and AlphaFold, has joined the discussion by dropping three new studies on large language models on the same day. DeepMind’s main result is an AI with a twist: it’s enhanced with an external memory in the form of a vast database containing passages of text, which it uses as a kind of cheat sheet when generating new sentences.

Called RETRO (for “Retrieval-Enhanced Transformer”), the AI matches the performance of neural networks 25 times its size, cutting the time and cost needed to train very large models. The researchers also claim that the database makes it easier to analyze what the AI has learned, which could help with filtering out bias and toxic language.  

It’s important to note that while RETRO could help with bias filtering, this hasn’t actually been tested yet:

According to DeepMind, RETRO could help address this issue because it is easier to see what the AI has learned by examining the database than by studying the neural network. In theory, this could allow examples of harmful language to be filtered out or balanced with non-harmful examples. But DeepMind has not yet tested this claim. “It’s not a fully resolved problem, and work is ongoing to address these challenges,” says Laura Weidinger, a research scientist at DeepMind.
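To make the idea a bit more concrete, here is a toy sketch in Python of what "look things up in a text database, and filter that database" could look like. This is my own illustration, not DeepMind's code: the function names, the word-overlap scoring, and the blocklist approach are all assumptions chosen for simplicity, and the real system is far more sophisticated.

```python
import re

def tokens(text):
    """Lowercase a passage and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def filter_database(database, blocklist):
    """Drop any passage that contains a blocklisted word (hypothetical filter)."""
    return [p for p in database if not (tokens(p) & blocklist)]

def retrieve(query, database, k=1):
    """Return the k passages sharing the most words with the query.

    Word overlap is a stand-in for whatever similarity measure a real
    retrieval system would use.
    """
    q = tokens(query)
    ranked = sorted(database, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]

# A tiny made-up "external memory" of text passages.
database = [
    "The cat sat on the mat.",
    "Cats are stupid animals.",
    "Dogs chase a ball around the garden.",
]

# Because the memory is plain text, it can be inspected and cleaned directly.
clean = filter_database(database, blocklist={"stupid"})

# The retrieved passage is what a model would consult as its "cheat sheet".
print(retrieve("Where is the cat?", clean, k=1))
```

The point of the sketch is only that an external database is something you can read and edit, which is much harder to do with the weights of a neural network. Whether that translates into less biased output is exactly what has not been tested.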

Bias management in language models is a crucial aspect of AI ethics, so I hope this model (and more like it) actually deals with that rather than deferring all the work to its users, who often don’t bother.

(via Technology Review)

