Can AI help to revive endangered languages?

Latin is best known as a dead language associated with Ancient Rome, scientific names, and rich white boys in private schools. But what about languages like Harappan, which remain undeciphered after centuries of obscurity? Al Williams looked at the ways AI is being used to uncover some of the mysteries of dead languages, albeit with limited results:

There are many languages that have been forgotten like Etruscan and Rongorongo. If you knew they had morphed into other languages we do know, this could help unlock them. Even languages we do understand fairly well today were not always known in modern times. Egyptian hieroglyphics, for example, were a mystery until the discovery of the Rosetta stone which had the same message in the hieroglyphs, Egyptian demotic script, and Greek. At least we knew how to read Greek.


If we had the true universal translator, it would be fun to see what the dolphins and the bees are saying to each other. It seems like if we could do that, reading Indus would be easy. The truth is, though, AI is a long way from being able to totally replace humans and human insight into creative problem solving and that’s important to remember as people look to us as people who understand technology to help them make decisions about AI.

For example, researchers think the people who wrote the Indus script seals wrote right to left because sometimes the left characters are scrunched up as sometimes happens when you get close to the edge of your paper. However, since the seals were probably meant to be used as stamps, that doesn’t necessarily imply the language itself is right to left. For instance, Chinese used to be written top to bottom and right to left, but in modern formats is written row-wise left to right. Both styles persist, but you know which way to read from context. Insights like that are still the purview of humans, at least for now.

The catalyst for the article was a blog post by Alizeh Kohari asking whether machine learning could decipher extinct languages:

For most of the 20th century, the Indus inscriptions were widely accepted as representations of an undeciphered language. Then, in 2004, a group of Harvard researchers — cultural neurobiologist and comparative historian Steve Farmer, computational theorist Richard Sproat, and philologist Michael Witzel — published a paper essentially rubbishing nearly all existing research on the matter. The Indus seals, they claimed, were nothing more than a collection of religious or political symbols — similar to, say, highway signs — and all attempts to decipher them as a language were a waste of time. To underscore their point, Farmer offered a $10,000 reward to anyone who could find an Indus inscription containing at least 50 symbols. 

So what is a dead language?

The term “dead language” is often used loosely to cover several different situations, and “endangered” might be a better umbrella term. There are three distinct types:

  • Dead languages are no longer native to any community but may still be in use (like Latin)
  • Extinct languages are no longer spoken by anyone (like Yahgan in Chile)
  • Dormant languages are dead languages but they form part of an ethnic identity (like Sawkna in Libya)

The Endangered Languages Project is a collaboration with a mission to map out all the various languages with dwindling numbers of speakers (or none at all). While we can’t assume AI can solve these kinds of problems, techniques such as natural language processing could lead the way in time.

See also: “Salvaging the Dormant: On Language”, “Machine Learning Is Now Translating ‘Dead’ Languages”, and “DeepMind’s new AI model helps decipher, date, and locate ancient inscriptions”
