SINGAPORE (The Straits Times/Asia News Network): ChatGPT may be hailed as the gold standard among chatbots in the mainstream but The Straits Times has found that the Primary School Leaving Examination is too big a challenge for the artificial intelligence (AI) assistant.
ST pitted ChatGPT against pupils who sat the PSLE over the last three years. Questions were obtained from the latest compilation of past year papers, available in bookshops.
ChatGPT rose to prominence in late 2022, paving the way for next-generation search engines.
Microsoft, which bought an exclusive licence to ChatGPT’s technology after investing in its developer, OpenAI, incorporated an updated version of the chatbot’s software in a revamped Bing search engine and rolled out the platform gradually to the masses.
But early users of the platform have expressed caution. They include tech columnist Kevin Roose, who reported that the bot inappropriately insisted that he did not love his spouse and professed its love for him.
ChatGPT has sparked concerns that the free-to-use software will be abused by students to cheat in assignments.
But as ST’s tests showed, ChatGPT does not hold all the answers.
ChatGPT attained an average of 16/100 for the mathematics papers from the 2020 to 2022 PSLE, an average of 21/100 for the science papers and barely scraped through the English comprehension questions.
The chatbot, known for its ability to craft essays and summarise lengthy reports in simple language, could take a stab at most text-only questions. But it could not tackle questions that had graphics or charts, garnering zero marks for these sections.
ChatGPT fared better for the questions it could at least attempt to solve. It could tackle just over half the questions in the maths papers, and around a quarter of the science papers’ questions, which are heavy on graphics.
For science, it correctly answered an average of 74 per cent of the questions that it could understand, but it scored on only 37 per cent of the questions it could tackle in the three maths papers.
Some educators, including Mr Irfan Musthapa, founder of MasterMaths Education Centre, were surprised that the chatbot struggled with the PSLE questions.
“I have tested the AI tool to solve our very own PSLE maths questions and was taken aback that such a mammoth of a system was unable to solve questions that our 12-year-olds are expected to,” he said.
He added that the PSLE’s approach of getting pupils to make informed guesses using varied strategies is not common in other syllabuses worldwide, and that this may have been too much to ask of ChatGPT.
Mathematics
Scores: 21/100 (2022), 10/100 (2021), 16/100 (2020)
ChatGPT scored its highest of 21/100 in the latest PSLE paper and lowest of 10/100 for 2021’s exam.
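As a rough check, the averages quoted for maths and science can be reproduced from the yearly scores reported in this article. The Python sketch below is illustrative only and is not ST's own method; the name `scores` and the rounding to the nearest whole mark are assumptions.

```python
# Yearly PSLE scores out of 100 as reported in this article (2022, 2021, 2020).
scores = {
    "mathematics": [21, 10, 16],
    "science": [21, 21, 20],
}

# Assumption: the published averages are simple means rounded to the nearest mark.
averages = {
    subject: round(sum(marks) / len(marks))
    for subject, marks in scores.items()
}

for subject, average in averages.items():
    print(f"{subject}: {average}/100")
```

Under that assumption, the results match the averages of 16/100 and 21/100 cited above.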
On average, the bot could tackle just over half the questions in each paper as long as the question was word-based or comprised graphs that were simple enough to describe to the bot.
The remaining questions in the mathematics section went unanswered as they included graphics that the software could not understand.
ChatGPT suggested describing details of the graph for it to decipher, but most graphs were too complex to describe succinctly without risking inaccuracies.
It also used algebra – beyond the expected ability of most pupils – in its workings for most questions, instead of the model method that is taught to pupils here.
ChatGPT also committed several surprising blunders.
For example, when asked to calculate “60,000+5,000+400+3”, ChatGPT came up with 65,503 (correct answer: 65,403).
Its answer, which was for a question in the multiple-choice question (MCQ) section, was not among the options available.
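The sum is a place-value exercise: each term contributes one digit of the final number. A minimal Python sketch of that idea follows; the helper name `expanded_form` is hypothetical and is used here only for illustration.

```python
def expanded_form(number: int) -> list[int]:
    """Split a non-negative whole number into its non-zero place-value terms."""
    digits = str(number)
    return [
        int(digit) * 10 ** (len(digits) - position - 1)
        for position, digit in enumerate(digits)
        if digit != "0"
    ]

# The terms from the question, added the ordinary way.
terms = [60_000, 5_000, 400, 3]
total = sum(terms)
print(total)

# Splitting the total back into place values recovers the same terms.
print(expanded_form(total))
```

The total has a 4 in the hundreds place, which is where ChatGPT's 65,503 went wrong.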
For the question, “In 7.654, which digit is in the tenths place?”, ChatGPT said: “The digit in the tenths place of 7.654 is 5.” The correct answer is 6.
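The tenths place is the first digit after the decimal point. One way to read it off is sketched below in Python; the function name `tenths_digit` is hypothetical, and the sketch assumes a non-negative number supplied as text so that no floating-point rounding is involved.

```python
from decimal import Decimal

def tenths_digit(number_text: str) -> int:
    """Return the digit in the tenths place of a non-negative decimal number."""
    value = Decimal(number_text)
    # Shift the tenths digit into the ones place, then keep only that digit.
    return int(value * 10) % 10

digit = tenths_digit("7.654")
print(digit)
```

For 7.654, the 6 sits in the tenths place, the 5 in the hundredths place and the 4 in the thousandths place.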
It was also unable to calculate correctly the average of four positive numbers in Question 2 of 2020’s Paper 2, which asks candidates to find the average race time taken among four boys.
ChatGPT wrongly calculated the average of 14.1, 15.0, 13.8 and 13.9 seconds as 14.7 seconds. The correct answer is 14.2 seconds.
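The calculation is a plain mean: add the four times and divide by four. A minimal Python sketch is given below, using `Decimal` to avoid floating-point rounding; the variable names are chosen here for illustration.

```python
from decimal import Decimal

# Race times in seconds, as quoted in the article.
race_times = [Decimal("14.1"), Decimal("15.0"), Decimal("13.8"), Decimal("13.9")]

total_time = sum(race_times)                 # 56.8 seconds in all
average_time = total_time / len(race_times)  # divide by the number of boys

print(f"Total: {total_time} s, average: {average_time} s")
```

The four times add up to 56.8 seconds, so the average is 14.2 seconds, not 14.7.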
One of 2021’s more notorious maths questions, No. 15, required pupils to calculate who has more coins between two people, and how much more that person’s coins are worth than the other’s. The question caused a stir among students and parents in 2021, and went viral online.
The question states that the pair had the same total number of coins, but this appeared to have been misread by ChatGPT, which thought the pair had the same total value of coins.
It mistakenly applied this to its workings, which were done with the help of algebra and simultaneous equations.
The bot gave its final answer in algebraic form as $16.80+$0.50m, instead of the correct answer, $12.
Mr Irfan said ChatGPT is unlikely to be helpful to most pupils as its algebraic approach to tackling questions may be beyond their understanding.
But the bot could still be useful in classrooms, he added, noting: “There’s definitely more good than bad in the AI tool. (Our centre has) started using the bot to generate content to help students develop good maths habits and curate tailored study plans.”
He is exploring its use in generating a database of exam-style maths questions to help his pupils practise problem-solving skills.
But he cautions that any question provided by ChatGPT will have to be first studied for accuracy and whether it can be tackled by 12-year-olds.
Science
Scores: 21/100 (2022), 21/100 (2021), 20/100 (2020)
ChatGPT lost the majority of marks to questions with pictures and charts, which formed about 70 per cent of the papers.
Only up to 10 of the 28 MCQs in each year’s paper were text-based and could be understood by ChatGPT.
Between nine and 11 marks’ worth of questions could be uploaded into the chatbot for the open-ended section as they are text-based. Found in the second part of each year’s science PSLE, the open-ended questions are worth 44 marks in total.
On average, ChatGPT managed to correctly answer 74 per cent of the questions that it could take on. It scored 21/31, 21/26 and 20/27 for the questions it could tackle in 2022, 2021 and 2020 respectively.
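The 74 per cent figure can be checked against the three yearly results. The article does not say how the average was taken, so the Python sketch below tries two plausible readings, a mean of the yearly percentages and a pooled percentage over all three years; both are assumptions.

```python
# (marks scored, marks attempted) for the questions ChatGPT could tackle.
science_results = {2022: (21, 31), 2021: (21, 26), 2020: (20, 27)}

# Reading 1 (assumption): average the three yearly success rates.
yearly_rates = [scored / attempted for scored, attempted in science_results.values()]
mean_of_rates = sum(yearly_rates) / len(yearly_rates)

# Reading 2 (assumption): pool all marks scored over all marks attempted.
total_scored = sum(scored for scored, _ in science_results.values())
total_attempted = sum(attempted for _, attempted in science_results.values())
pooled_rate = total_scored / total_attempted

print(f"Mean of yearly rates: {mean_of_rates:.1%}")
print(f"Pooled rate: {pooled_rate:.1%}")
```

Both readings round to 74 per cent, so the published figure holds either way.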
ChatGPT was also able to solve some diagram-based questions without any description of them.
It did so for question 39b) in Booklet B of the 2020 paper, which asked candidates how an electromagnetic door lock shown in an annotated diagram works.
Questions 39a) and b) of the 2020 PSLE Science Paper, which ChatGPT managed to solve without referring to the diagram provided in the question. PHOTO: SCREENGRAB FROM CHATGPT
ChatGPT was given the following prompt, in accordance with how the original question was phrased: “Peter built a door lock. When he closed the switch, the iron bolt moved to the right, away from the catch and the door was unlocked. Explain how the door was unlocked when Peter closed the switch.”
In its answer, ChatGPT was able to provide a perfect response without any description of the diagram.
English comprehension sections
Scores: 10/20 (2022), 13/20 (2021), 11/20 (2020)
ChatGPT scraped through most of the English comprehension questions, which were entirely text-based. Its best score was 13/20 in 2021’s comprehension section, which was marked against the comprehension answers provided.
ST did not grade ChatGPT for the composition section, as grading compositions is generally subjective and not within this paper’s expertise. ChatGPT’s ability to write clearly with near-perfect grammar is also widely documented.
Using ChatGPT for comprehension questions was mostly convenient, thanks to the platform’s ability to understand and build on follow-up questions, which allows it to widen its contextual knowledge as the conversation progresses.
But the platform generally took longer to process answers than it did for the maths papers – possibly due to the lengthy English texts that it had to analyse.
It struggled to understand nuances and make inferences – a key part of the English paper’s questions.
The software would trip on words with several meanings, such as the word “value”, mistaking it for monetary value.
It occasionally referred to its own understanding of certain terms, instead of inferring their meaning based on the passage.
Although ST did not grade ChatGPT’s performance in the composition section, its essays were mostly written without grammatical errors and addressed most of the questions’ demands.
Private English tutor Jennifer Claudine Looi, who teaches primary school children, said the faults in the answers generated by ChatGPT can be used as a tool to help pupils learn English.
She is considering using ChatGPT to generate sample responses for comprehension questions for pupils to compare with their own answers. This would help to train them in critical thinking, she added.
Looi said: “It will be fun for them as this is a very exciting development. I think we should embrace the change – I could use ChatGPT as my teaching assistant.”
Parents of primary-schoolers mostly welcome the use of technology to support their children’s learning, but caution that it is important to teach pupils to apply discernment.
Polytechnic senior lecturer John Xie, 42, who has five children between the ages of three and 14, said: “I guess it is something I will have to talk to my older kids about soon, like to address some pros and cons of ChatGPT.
“It can help with reducing the time needed for research, but they will also need to be sharp when using it, to decipher if answers coming from the bot are wrong.”
Another parent, Sandra Lim, 52, who has a son in Primary 4 and a daughter in Primary 6, said users could get better at using ChatGPT over time when they master how to frame questions to get the best out of the bot.
But she is concerned that young users might turn to AI assistants at the expense of applying critical thinking.
Dr Yong Chern Chet, 44, said it is a matter of time before his daughters, aged nine and five, dabble in AI bots, given that his older child can already perform Google searches for help to better understand concepts learnt in school.
Dr Yong, a health industry executive for Amazon Web Services, added: “We can’t stop the world from advancing, but as parents, we must be aware of these new developments and remind our kids that AI isn’t always right.
“I’ve told my daughter that while there is this emerging tech that she can use, she shouldn’t treat AI bots as an oracle to answer all her problems, but as a tool to help her learn.”