AI detectors ‘biased’ against non-Anglophones, Stanford tests suggest


AI chatbots like ChatGPT have stoked fears of increased cheating and plagiarism in schools and academia, but tools designed to detect AI-generated content are also causing unease, as they tend to incorrectly label writing by non-native English speakers as AI-generated, according to researchers at Stanford University.

DUBLIN: The spread of generative artificial intelligence (AI) has stoked concerns about cheating and plagiarism in education and academia.

In response, a cottage industry of watchdog systems has sprung up, as teachers and publishers turn to so-called detector programmes to scan essays and articles for signs that they were generated by AI chatbots such as ChatGPT.

But these watchdogs, too, are causing unease, as they tend to incorrectly label writing by non-native English speakers as AI-generated, according to researchers at Stanford University.

They put seven of the most widely used detectors to the test, using 91 essays written by non-native English speakers for the benchmark Test of English as a Foreign Language (TOEFL).

More than half the essays were dismissed as AI-generated, with one detector turning out to be almost completely off the mark, labelling nearly 98% of the essays as written by AI.

When it came to native speakers, the detectors proved more accurate and were able to "correctly classify more than 90% of essays written by eighth-grade students from the US as human-generated," the researchers said.

At the same time, however, the tests suggested that essays using "complex and fancier words" – not typically a staple of eighth-grade writing – were "more likely to be classified as human written."

Either way, the researchers said their findings mean the detectors should not be seen as reliable.

"Our current recommendation is that we should be extremely careful about and maybe try to avoid using these detectors as much as possible," said Stanford's James Zou.

"It can have significant consequences if these detectors are used to review things like job applications, college entrance essays or high school assignments," Zou warned.

Part of the problem, it seems, is that the detectors are geared to red-flag "low perplexity" English – text that a language model finds highly predictable – as AI-generated.

In other words, if you use layman's terms or plain English – or "common words," as the Stanford team puts it – the detector bots might dismiss you as another bot. – dpa
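The predictability heuristic described above can be sketched with a deliberately tiny model: an add-one-smoothed unigram language model over an invented mini-corpus. Everything here – the corpus, the threshold, the function names – is an illustrative assumption; real detectors rely on large neural language models, but the failure mode is the same: plain, common wording scores as highly predictable (low perplexity) and gets flagged.

```python
import math
from collections import Counter

# Toy corpus standing in for the detector's reference data
# (an invented assumption for illustration only).
corpus = ("the cat sat on the mat the dog sat on the rug "
          "a cat and a dog sat on the mat").split()

counts = Counter(corpus)
total = sum(counts.values())

def unigram_perplexity(text, alpha=1.0):
    """Perplexity of `text` under an add-alpha smoothed unigram model.

    Lower perplexity = the model finds the text more predictable.
    """
    words = text.lower().split()
    vocab = len(counts) + 1  # +1 slot for unseen words
    log_prob = 0.0
    for w in words:
        p = (counts[w] + alpha) / (total + alpha * vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(words))

def naive_detector(text, threshold=15.0):
    """Flag low-perplexity (predictable) text as 'AI' -- the flawed
    heuristic the researchers criticise. Threshold is arbitrary."""
    return "AI" if unigram_perplexity(text) < threshold else "human"
```

Under this toy model, a sentence of common words such as "the cat sat on the mat" scores a low perplexity and is flagged as "AI", while rarer vocabulary scores high and passes as "human" – mirroring why plainer, more formulaic prose from non-native writers gets misflagged.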
