The AI revolution is leaving Arabic speakers behind

The vast majority of the references and big data that AI tools such as ChatGPT scan to generate results are in English. This must be addressed if we are to avoid a digital language divide.

When I first read Homo Deus: A Brief History of Tomorrow, I simply couldn't agree with historian Yuval Noah Harari's prophecy that algorithms and data would become "the highest authority". I thought back then: "Rather, data in certain languages."

I was afraid that even in the battle of humanity against the intrusion of artificial intelligence, some linguistic groups would remain outside the battle… and outside the account.

When I started using ChatGPT-4, I realised that my fears had already come true. The battle is now raging, and the Arabic language has not been "invited to participate".

As a journalist and trainer interested in artificial intelligence, I cannot help but notice how today's artificial intelligence tools are revolutionising the content industry, research and communication channels.

However, I also regretfully note how these same tools of change reveal a deep linguistic chasm that leaves speakers of Arabic at a disadvantage compared to their English-speaking counterparts.

I've spent a lot of time experimenting with tools such as ChatGPT. I gave it prompts in both Arabic and English, and I saw how its results rose or fell in terms of quality by changing the language.

AI and the language gap

It's not just the wording or the syntax that makes ChatGPT's Arabic results poor in quality, but the information itself. I "had a bit of fun" experimenting with ChatGPT's "prompt engineering" in two different languages and watching it change like a naughty chameleon or a "schizophrenic".

In its English results, ChatGPT appeared to me as a researcher in his early career years: a diligent, hard worker who pays particular attention to details, structures and accuracy.

As for its Arabic results, they suggested that it embodied the stereotypical image of a lazy archive management employee who does not want to bother his head by searching for references on the correct shelves or paying attention to his linguistic structures while he carelessly tells me: "This is all I have. Just leave now."

A tool like ChatGPT has yet to be trained on a massive amount of high-quality, diverse and representative Arabic written data. Its lack of data makes the tool's results in Arabic unable to distinguish the nuances, accuracy and depth needed to generate quality content.

Many will say: "But the problem is that the Arabic language is complex, and it is difficult to comprehend all its unique grammar, syntax and vocabulary."

So I say: "Mankind has been able to send a man to the moon, but it cannot teach a chatbot Arabic grammar rules?"

Impact on Arabic-speaking users

Apart from frustration, Arabic-speaking users of AI tools face profound consequences of the language divide. At the top of the list is the limited access to information, as the vast majority of references and big data these tools scan to generate their results are mainly available in English.

This discrepancy hinders the ability of Arabic-speaking users to leverage AI for professional and personal growth and perpetuates a digital divide with long-lasting repercussions.

Now, there is a workaround: providing prompts in English, obtaining results in that language, and then translating them into Arabic. But even this solution raises another problem, which is translation.

The quality of English-to-Arabic translation by tools available on the internet, such as Google Translate, is also low. This may have serious consequences for those who have not mastered the source language and, therefore, cannot spot the defects in the translation.

Another consequence of the poor quality of the Arabic versions of AI tools is missed opportunities. Many journalists and content creators working in Arabic have the skills needed to produce high-quality content, but the inadequacy of a large share of artificial intelligence tools for the Arabic language poses a major obstacle for them.

Some AI-driven video-making tools, for example, are still unable to properly handle the direction of writing in Arabic. Imagine the extreme frustration felt by journalists or content creators when they want to start the title of an article with a number, only to find that the words overlap, their positions are reversed and the title has become incomprehensible.
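
For readers curious about what goes wrong with such titles, here is a minimal Python sketch of the standard remedy. It is an illustration, not a description of how any particular video tool works. It assumes the tool follows the Unicode Bidirectional Algorithm but treats every title as left-to-right by default. The function names `base_direction` and `with_rtl_mark` are hypothetical, and the direction check is a simplified version of the algorithm's "first strong character" rule.

```python
import unicodedata

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK, an invisible strong right-to-left character


def base_direction(text):
    """Guess the base direction from the first strongly directional character.

    Digits, spaces and punctuation are not "strong", so a title that begins
    with a number is classified by the first letter that follows it.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew-type or Arabic-type letter
            return "rtl"
        if bidi_class == "L":  # Latin-type letter
            return "ltr"
    return None  # no strong character found (e.g. digits only)


def with_rtl_mark(title):
    """Prefix a right-to-left mark when an RTL title starts with a non-RTL character.

    This is a hedged workaround: it only helps if the rendering tool honours
    Unicode directional marks at all.
    """
    if base_direction(title) == "rtl" and unicodedata.bidirectional(title[0]) not in ("R", "AL"):
        return RLM + title
    return title


# "5" followed by an Arabic word, written with escapes so that the source
# code itself is not visually reordered by the editor displaying it.
headline = "5 \u0623\u0633\u0628\u0627\u0628"
print(base_direction(headline))
print(repr(with_rtl_mark(headline)))
```

The point of the sketch is that the number itself carries no direction. Unless something tells the software that the whole title is right-to-left, the leading number can end up on the wrong side of the Arabic words.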

Missed opportunities mean less ability to use these tools to create high-quality content, less ability to achieve significant reach, and less ability to contribute to the growth and development of the Arabic content industry ecosystem.

These missed opportunities concern individuals as well as companies and institutions, as the poor quality of AI-driven tools in their Arabic versions means that professionals and entrepreneurs will have to continue struggling against the unfair competition that characterises the global market.

Cultural misunderstandings are another shortcoming of artificial intelligence tools in their Arabic versions, which struggle to understand cultural specificities and sensitivities. These bots, trained on a limited quantity of data compared to their counterparts in other languages, may tend to generate culturally inappropriate or aggressive content, which increases users' sense of alienation and reduces their confidence in the tools.

I am not talking here about the cultural specifics of Arabic-speaking groups versus others speaking other languages, but rather about the specific and different groups within the circle of Arabic-speaking communities themselves.

Bridging the gap

What does it mean to bridge the AI-language gap? It means the existence of a real will to invest in training these tools on massive, diverse and representative Arabic data. Only big, high-quality data can effectively improve AI performance.

Bridging the gap will also require the development of research tools supported by artificial intelligence capable of a greater and more accurate understanding of the Arabic language. If we acknowledge that Arabic is a difficult and complex language, the only solution lies in developing upgraded research tools that are capable of understanding its complexities and variables.

Finally, bridging the language gap in artificial intelligence can be achieved through joint efforts between AI developers, linguists and experts in the field. Only a concerted effort between academia, the tech industry and community stakeholders can ensure that Arabic-speaking users have access to the same high-quality AI-generated content as their English-speaking counterparts.

I will put myself in the shoes of AI tool developers such as ChatGPT and ask the following questions: "Why do I have to train my tool on larger and better data in Arabic? What's in it for me?"

Now there may be a ready answer: the number of Arabic speakers, which amounts to about 400 million people, makes Arabic a good market in which it is important to invest.

But I am confident that the intelligence of AI developers would only be satisfied with a much deeper answer.

"The technology shapes us, and we shape it," says Mira Murati, CTO of OpenAI, which developed ChatGPT. 

As far as I am concerned, the Arabic version of ChatGPT technology makes me worry that the society it will form will not only be incapable of entering the post-information society, but also unable to simply access information.

This is apart from the axiom that we, Arabic speakers, do not shape that technology.

I welcome the fact that Murati, the woman behind ChatGPT, is an advocate for the regulation and governance of artificial intelligence. I'm just not sure that the governance she means includes the principle of equal opportunity.

Equal opportunity means that all human beings around the world can benefit from artificial intelligence and put it at their service, so that Harari's prophecy will not be a "curse" for Arabic speakers alone.

The views expressed in this article belong to the author and do not necessarily reflect the editorial policy of Middle East Eye.

This article is available in French on Middle East Eye French edition.

Amal El Mekki is an award-winning Tunisian journalist and media trainer based in Switzerland. Her work focuses on human rights, migration, and artificial intelligence.