How to spot an AI cheater

"Labyrinthian mazes". I don't know what exactly struck me about these two words, but they caused me to pause for a moment. As I read on, however, my alarm bells started to ring. I was judging a science-writing competition for 14-16 year-olds, but in this particular essay, there was a sophistication in the language that seemed unlikely from a teenager.

I ran the essay through AI detection software. Within seconds, Copyleaks displayed the result on my screen, and it was deeply disappointing: 95.9% of the text was likely AI-generated. I needed to be sure, so I ran it through another tool, Sapling, which flagged 96.1% of the text as non-human. A third tool confirmed the first two, though it scored slightly lower: 89% AI. So I ran it through yet another, Winston AI, which left no doubt: 1% human. Four separate AI detection tools all delivered one clear message: this is an AI cheater.

I had known for some time that AI-written content was posing serious challenges to many industries, including my own profession of journalism. Yet here I was, caught by surprise because a student thought it would be acceptable to submit an AI-drafted entry to a writing competition. Of course, students trying to cheat is nothing new. What struck me was the possibility that the intentional use of AI could be more widespread than I had realised. Staring at the fake student essay before me, I couldn't help but worry. As the mother of an eight-year-old with most of her education still ahead of her, I found the sight of AI used by a schoolchild deeply concerning for the integrity and value of the learning process in the future.

So, how might we spot the AI cheaters? Could there be cues and tells? Fortunately, new tools are emerging. However, as I would soon discover, the problem of AI fakery extends well beyond the world of education – and technology alone won't be enough to respond to it.

In the case of student cheating, the reassuring news is that teachers and educators already have tools and strategies to help them check essays. For example, Turnitin, a plagiarism-prevention software company whose products are used by educational institutions, released an AI-writing detection feature in April. Its CEO Chris Caren told me that the software's false-positive rate (how often it wrongly identifies human-written text as AI) stands at 1%.

There are also web tools like the ones I used to check my student essay – Copyleaks, Sapling and Winston AI – as well as others such as GPTZero and the "AI classifier" released by OpenAI, the creator of ChatGPT. Most are free to use: you simply paste text into their websites for a result.

How can AI detect another AI? The short answer is pattern recognition. The longer answer is that checkers look for statistical signatures that differentiate human writing from computer-generated text. "Perplexity" and "burstiness" are perhaps the two key metrics in AI text-sleuthing.

Perplexity measures how predictable a piece of text is to a language model – in short, how well the model can guess each next word. Human writing tends to score higher perplexity than AI writing, because our sentences are more unpredictable and diverse.
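To make that concrete, here is a minimal sketch of how a perplexity score might be computed. It uses the open-source Hugging Face transformers library with GPT-2 as the scoring model; the choice of model and the interpretation of the score are my illustration, not the internals of any particular detector.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load a small, openly available language model to do the scoring
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Ask the model to predict each token from the ones before it;
    # exponentiating the average loss (surprise) gives perplexity.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return float(torch.exp(outputs.loss))

# Lower scores mean the model found the text easy to predict,
# one (imperfect) hint that it may be machine-generated.
print(perplexity("The essay wound through labyrinthian mazes of argument."))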

Burstiness refers to the variation between sentences. AI-generated text tends to be more uniform across the board: its sentence structures and lengths are regular, and its word choice and phrasing are less creative. Repeated phrases and similar sentence shapes form clusters that lack the extended vocabulary and stylistic flourish a human-written text would normally display.
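Again as a rough illustration – this is a simple proxy of my own, not any detector's actual formula – burstiness can be approximated as the spread of sentence lengths in a text:

import re
import statistics

def burstiness(text: str) -> float:
    # Split on sentence-ending punctuation (a crude heuristic)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    # Low standard deviation = uniform, machine-like sentence lengths
    return statistics.stdev(lengths)

print(burstiness("Short one. Then a much longer, winding sentence follows it, full of clauses."))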

However, AI is getting ever better at sounding human, and it is already clear that these spotting tools are not foolproof. In a recent paper, researchers at Stanford University found that GPT detectors are biased against non-native English writers. They evaluated the performance of seven widely used GPT detectors on 91 TOEFL (Test of English as a Foreign Language) essays from a Chinese forum and 88 US eighth-grade essays from the Hewlett Foundation’s ASAP (Automated Student Assessment Prize) dataset. The detectors classified the US student essays accurately but falsely labelled more than half of the TOEFL essays as "AI-generated" (average false-positive rate: 61.3%).

To GPTZero's CEO Edward Tian, detection is only half the solution. The cure for irresponsible AI usage, he believes, lies not in detection but in new writing-verification tools, which would help restore transparency to the writing process. His vision is of students empowered to disclose AI involvement transparently and responsibly as they write. "We started building the first human verification tool for students to prove they are the writer," Tian says.

Human in the loop

Here is the real challenge for humans as AI-produced writing spreads: we probably cannot rely on tech to spot it. A sceptical, inquisitive attitude toward information, one that routinely stress-tests its veracity, is therefore important. After all, I only thought to run my student essay through an AI checker because I was suspicious in the first place.

The war on disinformation has already shown us that automated tools alone do not suffice: we need humans in the loop. One person who has seen this first-hand is Catherine Holmes, legal director of the Foreign, Commonwealth and Development Office at Whitehall, who has worked within the UK's national security departments for decades. When seeking to corroborate information that could be false, she says, people's judgement remains vital. "You are trying to figure out whether this bit of information is actually accurate based on a human being's actual insight."

It's the same in the world of fraud. At global accounting firm PricewaterhouseCoopers, where forensic services director Rachael Joyce helps clients investigate fraud and misconduct, human oversight and insight are a key part of the process: "The human element brings a layer of critique and expertise of context to investigations that AI is not very good at."

So, what AI-checking can you do yourself? Over the past few years, I've been researching and writing a book called The Truth Detective, which is about how to enhance your critical thinking. Here are some basic questions I've learnt that could help you get started with your AI detective work.

Your first task is to verify. Can you check the sources? Can you check the evidence, both written and visual? How? Cross-check. If you cannot cross-check or find supporting material from other reputable sources, your suspicions should be raised. "There is this hallucination problem with generative AI where it will make things up," says Caren from Turnitin. "Fact-checking is super important as a consumer of the content or as someone who's using AI to help them be more productive."

The next step is to take a closer look at the text. Some clues can be found in spelling, grammar and punctuation. For now, the default language for AI is still American English; if the spelling and grammar are not appropriate for the publication or the author, ask why. Does the text include quotes? If so, who are the quotes from, and do these people or institutions exist? Do the same for any references used, and check their dates: a tell is that AI is often still limited in which data sources it can access, and it can be unaware of recent news. Are there references to specific knowledge? A lack of it may indicate fraudulence.

Finally, check the tone, voice and style of the writing. AI-generated text still shows stilted linguistic patterns (at least for now). A particular giveaway is an abrupt change in tone and voice.

The following example is a stark reminder that AI can easily make things up that seem plausible and very real, but that absolutely need cross-checking.

In June 2023, in what the courts described as an "unprecedented" situation, Steven A Schwartz, a lawyer in New York, filed a motion that landed him in the hot seat with the judge. Why? The citations and judicial opinions he submitted simply did not exist. He had used ChatGPT, which had assured him the cases were real and could be found on legal research sites such as Westlaw and LexisNexis. When Schwartz asked it to "show [him]" evidence for a case, ChatGPT responded: "Certainly! Here’s a brief excerpt…" It then went on to provide extended hallucinated excerpts and favourable quotations. Schwartz said he was mortified; he had believed ChatGPT to be a search engine similar to Google.

Not all cases will be this blatantly obvious, however. So, as we all glide into an artificially drafted future, it's clear that a human questioning mindset will be needed. Indeed, our investigative skills and critical thinking techniques could be in more demand than ever before.

Alex O'Brien