Arabic and Artificial Intelligence
By Liam Nagle / Arab America Contributing Writer
Arabic is one of the most widely spoken languages in the world. The United Nations recognizes it as one of its six official languages, and Modern Standard Arabic is an official or co-official language in 24 countries.
The number of Arabic speakers is enormous: around 300 to 400 million people speak the language worldwide. However, Arabic encompasses a wide variety of local dialects. Although Modern Standard Arabic was developed to make communication between Arab states more accessible, many of these countries’ inhabitants are more comfortable using their local dialects when they can. And, given how widely the dialects differ, speakers of some dialects may have difficulty understanding one another. Even those who understand Modern Standard Arabic might be unable to understand every dialect!
Modern Standard Arabic vs. Local Dialects
The development of artificial intelligence raises another problem. Many large language models (LLMs), such as ChatGPT, rely primarily on standardized versions of a language. In Arabic’s case, that version is usually Modern Standard Arabic. However, because many Arabic speakers prefer their dialects, they may be unwilling or unable to use these tools. Many businesses, for example, operate in their local dialects because their customers use them more often than Modern Standard Arabic.
As a result, LLMs are often of limited use to these businesses. This problem fits into a larger one associated with globalization: the gradual assimilation of smaller subcultures into larger ones. If the current trend continues, we could see local Arabic dialects either sidelined within LLM systems or displaced as businesses increasingly adopt Modern Standard Arabic, which could gradually erode local Arabic dialects worldwide.
Proposed Solutions: Integrating Local Dialects into AI Systems
Given the benefits businesses gain from using LLMs, there have been attempts to address this issue. One proposed solution is to expose LLMs to Arabic movies and television shows from across the Arab world, giving them a broader understanding of Arabic dialects, accents, and everyday ways of speaking.
In doing so, we can loosen LLMs’ reliance on Modern Standard Arabic and make it easier for all Arabic speakers, not just those who know Modern Standard Arabic, to use LLMs for personal or commercial purposes. This could also help address another critical limitation of LLMs: they can only reiterate the information provided to them. An LLM that understands local Arabic dialects may more easily draw on material originally published in those dialects. That gives it a larger pool of information to draw from, perhaps helping to mitigate this broader limitation.
Benefits of Dialect Diversity in AI
This also means that people who use such LLMs may be able to draw from a larger pool as well! With multiple Arabic dialects represented, LLMs may also be able to translate such works for a wider audience. Educational and informational material, such as books, journals, and news, could be translated for both professional and personal use. Movies, documentaries, and other videos could be given translated audio or subtitles to reach a wider audience. This could also work the other way around: such media could be translated into local Arabic dialects rather than only into Modern Standard Arabic, reaching more Arabic speakers.
Artificial intelligence clearly offers plenty of benefits, but it also carries a variety of risks and consequences. Going forward, we must ensure that large language models incorporate a wider range of languages and dialects. In doing so, LLMs will be able to draw from a larger body of information while allowing more people to access forms of media they might not otherwise have had. At the very least, we certainly wouldn’t want to risk neglecting local Arabic dialects by prioritizing Modern Standard Arabic and other major languages at the expense of smaller ones!