In our previous post we looked at the impact of Large Language Models on the classroom and reached the pessimistic conclusion that whilst they aren’t currently good enough to provide educationally useful services, they are so good at cheating that schools are going to have to change what they do as a result.
Perhaps we are being too pessimistic - but if we’re not, and if LLM cheating is a genuine problem, what should schools do?
The really key question is what to do about unsupervised written tasks. Currently most schools will set tasks like this at some point – either everyday homework tasks, or more formal and high-stakes tasks that may contribute towards national qualifications. Previously, it would have been really hard for students to cheat on these tasks, but now it’s incredibly easy.
How should schools respond? Here are some possible options.
ONE: Keep unsupervised written tasks. Allow students to use LLMs, either completely or with some restrictions.
This is a popular suggestion, for a few reasons. Some will say that LLMs are the tool of the future, so it’s actively a good thing to get students to use them. Others will say that we should teach students to use them judiciously and to sign academic integrity statements stating that they have only used them in certain ways.
I think what these arguments miss is the fundamental reason we set assessments to begin with. The point of an assessment is not the product but the process. The value of the work students produce in an assessment is not in the work itself but in the understanding it represents and the thinking that went into creating it. This is obviously true of formative assessments, but it’s also, perhaps surprisingly, true of summative assessments as well. The point of a summative assessment is not the quality of the work that is produced but the inferences that work can support. LLMs completely alter those inferences.
TWO: Keep unsupervised written tasks, but change them so that LLMs are unable to help.
One suggestion I have heard a few times is to set assessments that require students to critique the response from an LLM, or to write an essay and also write a supporting statement explaining how they went about writing the essay.
The problem with this is that LLMs are pretty good at writing these critiques and supporting statements too.
THREE: Keep unsupervised written tasks, but back them up with a viva, where the student has to defend their written work to a panel of experts.
This is currently used for a lot of advanced degrees, but it is incredibly resource-intensive and completely unscalable. I suspect that even in small-scale formats there would also be huge issues with reliability.
FOUR: Get rid of unsupervised written tasks. Replace them with supervised written tasks or unsupervised non-written tasks that are harder to game.
A) Supervised written tasks in class
Instead of setting written tasks for homework, set them in class instead. This definitely solves the cheating problem, but it does mean losing a lot of class time.
B) Supervised written tasks to be completed in after school “prep” clubs
To solve the problem of losing precious learning time, have supervised after school homework or prep clubs where students complete extended writing tasks. A lot of boarding schools run prep sessions like this as standard. Perhaps it’s something that day schools need to think about. The problem is that this is also resource-intensive and not every student will be able to attend.
C) Unsupervised tasks on timed online apps
Keep setting unsupervised tasks, but not writing tasks. There are short activities on online apps which can be timed and are therefore much harder to cheat at.
D) Unsupervised non-written tasks - revision for in-class tests
If the homework task is to revise or prep for an in-class test, that reduces the incentives to use LLMs - but this also means using class time for written assessments.
What about A-levels / university assessments?
You’ll have noticed that all my solutions above basically involve getting rid of unsupervised writing assessments. You might be thinking, well maybe that can work at school, but what about A-level or university? Lengthy unsupervised writing assessments are a vital part of those courses and not one that can be so easily replaced by shorter supervised assessments.
I agree. Solving this problem for A-level and university assessment is going to be a lot harder. But I still think it is a problem that has to be solved, and it would be interesting to see some proposals from universities who are taking the threat seriously.
Which schools are coping the best?
In the UK, state schools can only sign up for certain regulated qualifications which have quite a limited amount of coursework. Independent schools have a wider choice, and many of them have chosen qualifications with a much larger element of coursework or non-examined assessment.
The nature of the final exam often determines the shape and structure of the curriculum in previous years. That isn’t always a good thing, and there’s a huge list of ways in which exam pressures contribute to poorer education outcomes. These are often referred to as ‘negative washback’ in the literature.
However, it is probably only fair to point out that there are also examples of positive washback, and this may well be one of them. I can see the difference with the subject I know best, English Language. LLMs are less of an issue for state schools, because they are already in the habit of doing lots of timed supervised writing tasks to prepare for exams. Even the writing tasks they set for homework are less at risk, because students know there is nothing to be gained by cheating at them if the final exam is going to be supervised.
By contrast, independent schools spend a lot of time teaching and preparing for coursework tasks, and if the coursework task contributes 40% to a high stakes qualification then there really is an incentive to cheat. It’ll be interesting to see how this one plays out. Will students at independent schools get less experience of practising independent writing?
Yes, the viva or oral assessment will be resource-intensive, but it is scalable if we transfer the amount of time dedicated to marking written pieces to the oral assessment. I discuss this here: https://paulgmoss.com/2023/09/26/doing-less-to-do-more/