Why artificial intelligence often struggles with math


AI can write poetry, but it struggles with math; AI’s math problem reflects how much the new technology is a break with computing’s past. — The New York Times

In the tech world, one class of learners stands out as a seeming enigma. They are hardworking, improving and remarkably articulate. But curiously, these learners – artificially intelligent chatbots – often struggle with maths.

Chatbots such as OpenAI’s ChatGPT can write poetry, summarise books and answer questions, often with human-level fluency.

These systems can attempt maths, drawing on what they have learned, but the results are inconsistent and sometimes wrong.

They are fine-tuned for determining probabilities, not doing rules-based calculations. Likelihood is not accuracy, and language is more flexible, and forgiving, than maths.

“The AI chatbots have difficulty with maths because they were never designed to do it,” said Kristian Hammond, a computer science professor and AI researcher at Northwestern University.

The world’s smartest computer scientists, it seems, have created AI that is more liberal arts major than numbers whiz.

That, on the face of it, is a sharp break with computing’s past. Since the early computers appeared in the 1940s, a good summary definition of computing has been “maths on steroids”.

Computers have been tireless, fast, accurate calculating machines. Crunching numbers has long been what computers are really good at, far exceeding human performance.

Traditionally, computers have been programmed to follow step-by-step rules and retrieve information from structured databases. They were powerful but brittle, which is why past efforts at AI hit a wall.

Yet, more than a decade ago, a different approach broke through and began to deliver striking gains.

The underlying technology, called a neural network, is loosely modelled on the human brain.

This kind of AI is not programmed with rigid rules, but learns by analysing vast amounts of data. It generates language, based on all the information it has absorbed, by predicting what word or phrase is most likely to come next – much as humans do.
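
As a rough illustration of that next-word idea – not of how any real chatbot is built – here is a toy Python sketch in which the “model” is just a hand-written probability table with invented numbers. It shows why picking the likeliest continuation is not the same as performing a calculation.

```python
import random

# A toy stand-in for a language model: for a given prompt, it stores
# invented probabilities for what word might come next. A real model
# learns such patterns from vast amounts of text; these numbers are made up.
toy_next_word_probs = {
    "Two plus two equals": {"four": 0.90, "five": 0.06, "twenty-two": 0.04},
    "The capital of France is": {"Paris": 0.95, "Lyon": 0.03, "France": 0.02},
}

def predict_next_word(prompt: str) -> str:
    """Sample the next word in proportion to its (toy) probability."""
    options = toy_next_word_probs[prompt]
    words = list(options.keys())
    weights = list(options.values())
    return random.choices(words, weights=weights, k=1)[0]

if __name__ == "__main__":
    # Most of the time this prints "four", but occasionally it will not:
    # likelihood is not the same thing as a rule-based calculation.
    for _ in range(5):
        print("Two plus two equals", predict_next_word("Two plus two equals"))
```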

“This technology does brilliant things, but it doesn’t do everything,” Hammond said. “Everybody wants the answer to AI to be one thing. That’s foolish.”

At times, AI chatbots have stumbled with simple arithmetic and maths word problems that require multiple steps to reach a solution, something recently documented by some technology reviewers. The AI’s proficiency is getting better, but it has shortcomings.

Speaking at a recent symposium, Kristen DiCerbo, chief learning officer of Khan Academy, an education non-profit that is experimenting with an AI chatbot tutor and teaching assistant, introduced the subject of maths accuracy. “It is a problem, as many of you know,” DiCerbo told the educators.

A few months ago, Khan Academy made a significant change to its AI-powered tutor, called Khanmigo.

It sends many numerical problems to a calculator program instead of asking the AI to solve the maths. While waiting for the calculator program to finish, students see the words “doing maths” on their screens and a Khanmigo icon bobbing its head.

“We’re actually using tools that are meant to do maths,” said DiCerbo, who remains optimistic that conversational chatbots will play an important role in education.

For more than a year, ChatGPT has used a similar workaround for some maths problems. For tasks such as large-number division and multiplication, the chatbot summons help from a calculator program.
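
The article does not describe how Khanmigo or ChatGPT implement this routing internally. As a hypothetical sketch of the general idea, the Python snippet below sends anything that parses as plain arithmetic to an exact calculator and hands everything else to a stand-in chatbot function.

```python
import ast
import operator

# Hypothetical router: if the user's message looks like plain arithmetic,
# evaluate it exactly instead of asking the language model. This only
# illustrates the "send the maths to a calculator" idea.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def _calc(node):
    """Safely evaluate a parsed expression built from numbers and + - * / only."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_calc(node.left), _calc(node.right))
    raise ValueError("not plain arithmetic")

def answer(message: str, chatbot) -> str:
    """Route arithmetic to the calculator, everything else to the chatbot."""
    try:
        expr = ast.parse(message, mode="eval").body
        return str(_calc(expr))          # exact, rule-based result
    except (SyntaxError, ValueError):
        return chatbot(message)          # fall back to the language model

if __name__ == "__main__":
    fake_chatbot = lambda text: f"(chatbot reply to: {text!r})"
    print(answer("123456789 * 987654321", fake_chatbot))       # calculator path
    print(answer("Explain why the sky is blue", fake_chatbot)) # chatbot path
```

The design point is the same one DiCerbo makes: the deterministic path handles the part where “painful precision” matters, and the language model handles the conversation around it.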

Maths is an “important ongoing area of research”, OpenAI said in a statement, and a field where its scientists have made steady progress.

Its new version of GPT achieved nearly 64% accuracy on a public database of thousands of problems requiring visual perception and mathematical reasoning, the company said. That is up from 58% for the previous version.

The AI chatbots often excel when they have consumed vast quantities of relevant training data – textbooks, drills and standardised tests.

The effect is that the chatbots have seen and analysed very similar, if not the same, questions before. A recent version of the technology that underlies ChatGPT scored in the 89th percentile in the maths SAT test for high school students, the company said.

The technology’s erratic performance in maths adds grist to a spirited debate in the AI community about the best way forward in the field. Broadly, there are two camps.

On one side are those who believe that the advanced neural networks, known as large language models, that power AI chatbots are almost a singular path to steady progress and eventually to artificial general intelligence, or AGI, a computer that can do anything the human brain can do. That is the dominant view in much of Silicon Valley.

But there are sceptics who question whether adding more data and computing firepower to the large language models is enough. Prominent among them is Yann LeCun, chief AI scientist at Meta.

The large language models, LeCun has said, have little grasp of logic and lack common-sense reasoning. What’s needed, he insists, is a broader approach, which he calls “world modelling”, or systems that can learn how the world works much as humans do. And it may take a decade or so to achieve.

In the meantime, though, Meta is incorporating AI-powered smart assistant software into its social media services, including Facebook, Instagram and WhatsApp, based on its large language model, LLaMA. The current models may be flawed, but they still do a lot.

David Ferrucci led the team that built IBM’s famed Watson computer, which beat the best-ever human Jeopardy! players in 2011.

Like most computer scientists, Ferrucci finds the latest AI technology undeniably impressive – but mainly for its language skills, not for its accuracy.

His startup, Elemental Cognition, develops software to improve business decision-making in fields such as finance, travel and drug discovery. It uses large language models as one ingredient, but also more rules-based software.

That structured software, Ferrucci said, is the computing infrastructure that currently runs much of the world’s essential systems, such as banking, supply chains and air traffic control. “For a lot of things that really matter, painful precision is required,” he said.

Kirk Schneider, a high school maths teacher in New York, says he views the incursion of AI chatbots into education as inevitable. School administrators can try to ban them, but students are going to use them, he said.

Schneider still has some qualms. “They’re usually fine, but usually isn’t good enough in maths. It’s got to be accurate,” he said. “It’s got to be right.”

Those occasional slipups have turned out, though, to be a teaching opportunity. Schneider often divides his classes into small groups of students, and the chatbot answers can be a focal point of discussion. Compare your answer to the bot’s. Who’s right? How did each of you arrive at your solution?

“It teaches them to look at things with a critical eye and sharpens critical thinking,” he said. “It’s similar to asking another human – they might be right and they might be wrong.”

It seems like a life lesson for his students, one worth remembering long after they have forgotten the Pythagorean theorem: Don’t believe everything an AI program tells you. Don’t trust it too much. – The New York Times
