
Saudi-IBM project developing generative AI in several Arabic dialects

Arabic is used by millions worldwide but is one of the least represented languages when it comes to tech
Guests attend the Global AI (Artificial Intelligence) 2020 Summit in the Saudi capital Riyadh on 21 October 2020 (AFP)

A generative AI programme being developed by a Saudi government authority in collaboration with IBM is due to work with several Arabic dialects, according to the two organisations.

The Saudi Data and Artificial Intelligence Authority (SDAIA) has announced that its Arabic large language model (LLM) for text generation will be included in IBM's AI and data platform, watsonx.

Watsonx gives companies access to multiple LLMs, which they use to create editorial content, develop chatbots and write programming code.

Example uses include writing scripts for video games and building customer service chatbots.

SDAIA's LLM, known as ALLaM, is notable for its ability to retrieve and generate information in both audio and text formats in multiple Arabic dialects, something developers have struggled to achieve for years.


"This collaboration will serve as a catalyst for further technological advancements," said Esam Alwagait, director of SDAIA, speaking of his organisation's partnership with IBM, one of the world's oldest technology companies.

Further development of the language model could lead to a proliferation of Arabic-language text generators similar to Google's Gemini, xAI's Grok and OpenAI's ChatGPT.

According to its developers, ALLaM "has been trained on hundreds of millions of articles in both Arabic and English".

Challenges of working with Arabic

Handling Arabic dialects has long been a challenge for developers building Arabic-language software models.

Arabic has a standard form used in news broadcasts and formal correspondence, known as Modern Standard Arabic (MSA), which is understood by most Arabs from Morocco to Oman.

However, in everyday life, Arabic speakers use regional dialects that may differ considerably from each other and from MSA.


One difficulty when dealing with dialects is that a given word in one dialect might have a different meaning in another dialect or in formal Arabic.

That means an AI-based language model must recognise not just the meaning of an individual word but also how its use varies across regional contexts.

Another difficulty is that much colloquial Arabic online is written in Latin characters rather than Arabic script, meaning that the dataset developers can draw on is much smaller than it would be for other languages.

A 2019 Middle East Eye article highlighted some of these difficulties and ways they could be overcome.

According to Palestinian researcher Mustafa Jarrar of Birzeit University, one way to overcome such difficulties is to increase the amount of language data available to developers.

The more data developers can feed into their models, the more accurate the results are likely to be.

Saudi Arabia's development of ALLaM comes amid a global push to incorporate AI technologies into society.

The kingdom has invested billions in developing alternatives to its largely fossil fuel-based economy.

Investments have gone into sectors such as AI and tourism, and into construction projects such as Neom, the planned megacity.
