Alibaba reveals progress with large language model research as Chinese Big Tech firms continue to push for ChatGPT rival


By Ann Cao


Alibaba Group Holding’s in-house research unit is making progress with its own large language models (LLMs), as Chinese Big Tech companies continue to pile into the artificial intelligence (AI) space in an attempt to come up with a rival to OpenAI’s ChatGPT.

A group of researchers from DAMO Academy unveiled a new audiovisual language model called Video-LLaMA, which helps the system to understand visual and auditory content in videos, in a research paper published last week on ArXiv, an online scientific paper repository.

The code has also been open-sourced by the researchers on online developer community GitHub. Alibaba owns the South China Morning Post.

LLMs, which are trained through machine learning, are the underpinning of AI-powered chatbots like ChatGPT. They allow chatbots to answer sophisticated queries and generate detailed writing, code and other content.

The new DAMO Academy model improves on previous vision-focused LLMs by tackling two challenges in video understanding: capturing the temporal changes in visual scenes and integrating audiovisual signals, according to the three researchers, Zhang Hang, Li Xin and Bing Lidong.

In a case demonstrated by the researchers, when given a video of a man playing the saxophone on stage, the model was able to describe in text both the background sound of applause and the visual content of the video. By comparison, previous models such as MiniGPT-4 and LLaVA mainly focus on static image comprehension, the researchers said.

Meanwhile, the researchers noted that the model is still "an early-stage prototype" with a few limitations, such as a limited ability to handle long videos like films and TV shows.

The move comes as a part of broader efforts by Alibaba, which is in the midst of its largest-ever corporate restructuring, to double down on its investment in the development and application of LLMs.

Alibaba’s cloud unit in April unveiled its own alternative to ChatGPT – Tongyi Qianwen – which is based on DAMO’s LLMs, making Alibaba one of the earliest Chinese companies to join the ChatGPT bandwagon, along with search engine giant Baidu, which launched its Ernie Bot in March. The service had received more than 200,000 beta-testing applications from corporate clients, Alibaba chairman and CEO Daniel Zhang Yong said in a conference call with analysts last month.

DAMO first introduced its LLM called AliceMind last September, when deputy head Zhou Jingren unveiled it at the World AI Conference in Shanghai. He described it as a multimodal pre-trained language model that is able to process different types of inputs including text, images, audio, and video.

Alibaba has started to work with partners to develop industry-specific AI models, Zhang said. For instance, it is planning to launch cloud products and enterprise solutions based on its AI model, and integrate AI capabilities into various products, including its workplace collaboration tool DingTalk. – South China Morning Post
