A common misconception surrounding GenAI, highlighted by TEQSA in its GenAI toolkit, is the notion that AI-generated work is easily detectable. This belief leads markers and Unit Coordinators to rely on detection tools such as Turnitin AI detection.
However, a recent study at the University of Reading in the UK (Scarfe et al., 2024) showed that the vast majority of entirely AI-generated assessments went undetected. These AI submissions also scored 6% higher on average than real student work. This raises critical questions about the current role of AI in education and whether educators should continue to rely on Turnitin's checker as their primary safeguard against generated content.
The study, completed in 2024, submitted AI-generated responses to both short answer questions (SAQs) and essay-style questions through the same process any student would use. Surprisingly, only 6% of AI submissions were flagged as ‘suspicious’, and of those, only half were flagged specifically for suspected AI use. Although the essay submissions were run through a similarity/AI detection tool as part of the marking process, none of them were flagged as AI. Furthermore, any given AI submission had an 83% chance of outperforming a student submission, which calls into question the design of the assessment rather than the performance of the cohort, and raises the question of whether an ‘essay-style’ assessment really gauges student understanding.
AI detection tools are notoriously poor at detecting AI in texts under 300 words, so SAQ and multi-part exams are particularly at risk of AI slipping through undetected. The study concludes that “students could cheat undetected using AI and in doing so attain a better grade than those who did not cheat.”
While AI writing may have a recognisable voice, AI models are constantly advancing, as are the people writing the prompts. A ‘humaniser’ tool can rewrite GenAI text in a more ‘human’ style, intended to bypass AI detectors that look for text with low perplexity (highly predictable word choices) and low burstiness (little variation between sentences). Prompts can also be engineered so that the generated text varies more in tone and structure, for example: “Please use a wide variety of words and phrases to make the text more engaging and less predictable.”
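To make those two signals concrete, here is a minimal sketch of how they can be approximated. It assumes the Hugging Face transformers library and the public GPT-2 model; commercial detectors use their own models and many more features, so this is illustrative only, not how any particular product works.

```python
# Sketch of two signals AI detectors are commonly said to use:
# perplexity (how predictable the text is to a language model) and
# burstiness (how much sentence-to-sentence variation it shows).
# Assumes: pip install torch transformers
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity under GPT-2: lower means the text is more predictable."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Std. deviation of sentence length in words: a crude variation proxy."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return 0.0
    mean = sum(lengths) / len(lengths)
    return (sum((n - mean) ** 2 for n in lengths) / len(lengths)) ** 0.5

sample = ("The results were unexpected. Markers missed nearly every AI script, "
          "and the AI scripts scored higher on average than the real cohort.")
print(f"perplexity: {perplexity(sample):.1f}, burstiness: {burstiness(sample):.2f}")
```

Seen this way, a humaniser is simply pushing both numbers up: choosing less predictable words to raise perplexity, and mixing long and short sentences to raise burstiness.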
Consequently, faith in AI detection tools such as Turnitin AI Detection, and in the ability of human markers to correctly identify GenAI content, is becoming increasingly misplaced as LLMs, and the students using them, continue to improve.
Rather than relying solely on AI detection tools, we need to consider how AI may be used within an assessment context, and how to promote authentic learning. If AI can score higher marks than real students, that may point to a weakness in assessment design rather than a failure of education or learning. Simply put: do not assume that GenAI is always detectable, and do not base teaching, assessment, or education on the assumption that it is. We must stay up to date with emerging technologies and embrace the change, rather than try to fight it.
Read the study here: