THERE are an estimated 137 languages spoken by various ethnic groups in Malaysia. According to a report published last year, 80% of these languages are used by minority groups.
While Malaysia is rich in cultural diversity, this linguistic wealth is often under-represented online. Taufik Rosman, treasurer of Wikimedia Community User Group Malaysia (WCUGM), shares that it has long been a personal goal of his to bring more Malaysian languages into the digital sphere. He is concerned that if some of the ethnic languages used by communities in Malaysia fail to establish a presence online, they may fade away permanently over time.
“Once these languages and more go online, we believe they have a chance to be around for a longer time,” he says in an interview with LifestyleTech.
Taufik proudly shares that on May 28, Wikipedia Kadazandusun graduated from the Wikimedia Incubator – a platform for developing and testing new language versions of Wikimedia projects – and launched with more than 900 articles.

Taufik explains that the project took a long time to launch because it needed more volunteers to contribute new articles consistently.
“Previously, we only had around three active contributors. Now, we have more contributors from Sabah, where some represent regions that don’t get much attention online,” he says.
Meet the contributors
In Tuaran, Sabah, a group of trainee teachers from the Institute of Teacher Education Kent Campus founded the Kent Wiki Club and began actively contributing Wiktionary entries in the various dialects spoken by the Kadazandusun community.
“We formed the club and started looking for members to make more contributions in the Kadazandusun dialect to Wiktionary. Eventually, we started translating articles from English and Malay to create entries for Wikipedia Kadazandusun,” says Kadazandusun Language teacher trainee Jurina Jonimin, 22.
Club chairperson Bluster Jainon, 21, says members of the club were driven by the motto “Okon nopo ko yati, isai po?” which translates to “If not us, then who else?”.
“The club was formed in 2022 to spread knowledge about our culture and language,” Bluster says, adding that the main dialect used for Wikipedia Kadazandusun is Dusun Bundu-Liwan.
After the club was officially launched with WCUGM, Bluster says its motto changed to “Boros nopo nga guas toilaan” (“Language is the root of knowledge”) to reflect the club’s aim of spreading knowledge by sharing languages on Wikipedia.

WCUGM project coordinator Farouk Azim Abd Rahman proudly reveals that Wikipedia Kadazandusun made history by publishing several articles before they appeared in any other Wikipedia language edition.
“For example, an article on a movie called Sinakagon started out in Wikipedia Kadazandusun first. It was regarded as the first movie truly in Kadazandusun,” he says.
In Kota Belud, Sabah, Bajau-sama Wikipedia project lead Mohd Syafiq Yahya says that during the pandemic he started uploading content related to Bajau-sama culture, such as the meanings of words, types of food and traditional clothing, to Wikikamus (the Malay Wiktionary) and Wikimedia Commons to help preserve the language.
“Based on how I see our lives now, I fear that our culture is being forgotten by the youth. By doing this project for Wikimedia, perhaps this is the only possible way for the language to survive,” he says.
After WCUGM learned about the possibility of the Mendriq language facing extinction, the group initiated an engagement project with the Orang Asli community in Gua Musang, Kelantan.
Taufik says they organised a two-day event last year to show community members how to add Mendriq words, together with their Malay translations, to Wiktionary.
“Members of the community managed to add more than 100 words for Mendriq. Before this, there were no such resources for the dialect online. Now it’s freely available for anyone to use,” he says.
Similar efforts were made for the Kensiu language, spoken by a Negrito community of about 300 people in Kampung Lubuk Legong, Kedah.
“There is an assumption that people in rural areas are not interested in technology, but you will be surprised as they all have their own smartphones,” Taufik says.
Meeting the mark
Farouk Azim explains that several requirements had to be met before a new language could be approved for launch as an official Wikipedia edition.
“They have targets to ensure the communities are focused on continuously improving the website, adding more articles, and maybe even growing the number of contributors,” he says.
The requirements include three months of regular editing, with each member making about 11 edits per month. Farouk Azim adds that the articles must also be high quality, with citations or references to reliable sources.
“They also need to be able to contact a language expert to verify the entries to ensure that the texts are based on actual terms spoken in the language and not just gibberish,” he says.
He adds that members often contact linguists and community experts to verify their entries.
“Translating an existing Wikipedia article is not just limited to the words on the subject, but they also have to ensure that technical terms on the website such as ‘login’ are accurate as well,” he says.
When WCUGM plans a community engagement, Farouk Azim says they have to make sure that the community members have Internet access and their own devices.
“We have to research whether they have Internet capabilities because, for now, we do not have capabilities to aid communities that have no Internet at all,” he says, adding that there are plans to seek help from other organisations.
Farouk Azim says more Wikipedia editions in local languages will be launched soon, including Bajau-sama (spoken by ethnic groups in Sabah), Iban (spoken, in various dialects, by Sarawak’s largest ethnic group) and Semai (used by Orang Asli communities in Pahang and Perak).

He also hopes that one day there will be enough Wikipedia content in local ethnic languages to train artificial intelligence (AI) models.

“Of course, our contributors can’t use AI tools for the translation work simply because there’s not enough data to train the existing models.
“This is why we’re committed to bringing more of these languages to Wikipedia, with the goal of eventually gathering enough data to train AI tools. We view our efforts as the beginning of a much larger journey, rather than the end,” he says.