So-called generative artificial intelligence programs, like ChatGPT, are capable of creating texts in a matter of seconds. This has prompted some researchers to develop software capable of detecting writing produced by AI. But these new tools are not without their shortcomings, as a new study reveals.
Scientists at Stanford University in the US have been studying AI-generated text detection tools, putting seven of them to the test. They did so by submitting 91 essays written by non-native English speakers for the TOEFL (Test of English as a Foreign Language) English proficiency exam.
The research team found that these programs frequently misidentified the human-written texts as AI-generated. One detector even classified 98% of the essays as the product of artificial intelligence.
Surprisingly, these AI text detection tools had a much easier time recognizing the essays of US eighth-graders, identifying them correctly in 90% of cases, the researchers explain in their study, published in the journal Patterns.
These results suggest that AI text generators have a very particular style. ChatGPT and other generative AI tools tend to produce texts that verge on perfection: they don't contain the slightest spelling or grammatical error. But they're also fairly basic, avoiding complicated grammatical constructions and more unusual words (formal terms, slang, etc.).
AI-produced texts give "the illusion of correctness," according to Melissa Heikkilä, a journalist specializing in matters of artificial intelligence. "The sentences [these large language models] produce look right – they use the right kinds of words in the correct order. But the AI doesn’t know what any of it means. These models work by predicting the most likely next word in a sentence. They haven’t a clue whether something is correct or false, and they confidently present information as true even when it is not," she writes in the MIT Technology Review.
AI-generated text detection tools have been designed to recognize these stylistic features. They rely on algorithms that assess the complexity of a piece of writing by looking, for example, at the words and turns of phrase used. If they are rudimentary, the software will tend to assume that the text is the work of AI, not a person with a limited vocabulary.
"If you use common English words, the detectors will give a low perplexity score, meaning my essay is likely to be flagged as AI-generated. If you use complex and fancier words, then it's more likely to be classified as human-written by the algorithms," says senior study author James Zou of Stanford University in a statement.
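The perplexity idea Zou describes can be illustrated with a toy sketch. The snippet below is not the actual detectors' method (real tools score text against large language models); it uses a simple unigram word-frequency model over a tiny hypothetical reference corpus, just to show why text made of common words gets a lower perplexity score than text full of rare words, and how a threshold on that score could flag text as "AI-generated."

```python
import math
from collections import Counter

def perplexity(text, freq, total, vocab_size):
    """Unigram perplexity under a Laplace-smoothed word-frequency model.

    Lower values mean the text is made of more predictable (common) words.
    """
    words = text.lower().split()
    log_prob = 0.0
    for w in words:
        # Laplace smoothing so unseen words still get nonzero probability.
        p = (freq.get(w, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(words))

# Hypothetical reference corpus standing in for the detector's language model.
corpus = ("the quick brown fox jumps over the lazy dog "
          "the dog runs and the fox hides").split()
freq = Counter(corpus)
total = len(corpus)
vocab_size = len(freq) + 1  # +1 slot for unseen words

def classify(text, threshold=25.0):
    # A detector in this style flags low-perplexity (predictable) text as AI.
    score = perplexity(text, freq, total, vocab_size)
    return "AI-generated" if score < threshold else "human-written"

common_text = "the dog and the fox"          # common words: low perplexity
fancy_text = "quixotic zephyrs gambol idly"  # rare words: high perplexity
```

Under this model, `common_text` scores lower perplexity than `fancy_text`, which mirrors the problem the study identifies: writers with a limited English vocabulary naturally produce low-perplexity text and get misclassified.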
As a result, the researcher and his colleagues urge caution in the use of AI text detection software, particularly in educational or professional settings. These tools are not 100% reliable and, above all, can be easily fooled by changing a few words or turns of phrase. – AFP Relaxnews