Six things schools need to know about AI for the year ahead
Are schools ready for the impact of large language models?
This is the first September in which schools and universities will have to grapple with the impact of Large Language Models (LLMs) like ChatGPT.
Are they ready?
Based on the research we've carried out at No More Marking, here are six key aspects of LLMs we think education leaders should know about.
ONE: LLMs are out there in the wild
There's limited data on this for school students, but among university students LLMs do seem to have been widely adopted.
Even if students haven't specifically heard of ChatGPT or LLMs, they might still be using them. I spoke to a group of school students last year who had never heard of ChatGPT or LLMs. But they did tell me that if you asked Snapchat nicely it would do your homework for you. How is that possible? Snapchat has integrated an AI chatbot powered by ChatGPT.
TWO: LLMs are great at language
They can write really well. They barely make any technical errors, and they are excellent at mimicking the tone or style of particular writers or genres. We did some research last year showing that they could produce writing that aced a writing test designed for 8-year-olds and fooled most teachers.
THREE: LLMs make a lot of factual errors
There is an assumption that LLMs make the types of errors that you would get when Googling a topic - common misconceptions or perhaps deliberate misinformation.
But this is only part of the problem with LLMs. Yes, they will repeat the kinds of misconceptions and misinformation that are already out there on the internet. But in addition to this, they will make basic maths errors, and they will invent completely new, plausible-sounding "facts" that are totally incorrect. In the discussions I've had with school leaders, they are often really surprised when I explain this. As one head said to me: "I'd heard they made mistakes, but I didn't realise they weren't as good as a pocket calculator!"
FOUR: So they are not great at independent teaching, resource creation, or assessment
This is more controversial - but I think the error rate and error type of LLMs limit their educational applications. I don't think they can be used as independent personal tutors, because the potential for confusion and misunderstanding is huge.
What about helping teachers with resource creation and lesson planning? Again, I don't think LLMs can operate independently. Teachers will need to check and re-check their outputs. Depending on an individual teacher's workflow, an LLM may save them some time, but it is not a silver bullet for workload problems.
Their unreliability also means they are not well-equipped to assess students' work. We've researched this extensively - see here and here for a start.
FIVE: They are good at taking assessments
Although LLMs make a lot of factual errors, they are still pretty good at passing exams. They do really well on tests measuring writing proficiency, and on written assessments of other topics too. They can even pass professional exams with pretty high scores. (But note, as we've shown here, this does not mean they can practise those professions!)
Given this, LLMs are a threat to the traditional model of written assessments that are completed out of class in unsupervised conditions. It's now much harder to tell if this work is really being completed by the student or if they have got the LLM to do it for them.
There are some who would argue that this is not a big deal - why not let students use LLMs for help?
We disagree, and have explained why here. Briefly: the point of an assessment is not the quality of the final product, but the inferences you can make about a student's thinking based on that final product. Work completed by LLMs destroys those inferences.
SIX: There is no reliable way of spotting if students are using them to cheat
Can you use AI to spot AI cheaters? A lot of tools claim this is possible, but most of the emerging data shows they don't work reliably. Not only do they fail to identify AI writing, they also misclassify real human writing as AI. In practice, this means false accusations of cheating, which are incredibly serious and enormously corrosive to classroom relationships.
So what should schools & universities do?
Being good at language and bad at accuracy is a toxic combination. It basically means that LLMs are really good at educationally & socially harmful things like cheating, but really bad at educationally & socially useful things like aiding instruction and assessing accurately.
This is obviously the worst of both worlds.
Normally, if a new educational technology came along that didn't add much positive value, I'd just say we should ignore it. But because LLMs are so good at doing bad things, you can't ignore them. Schools and universities are going to have to respond in some way.
In our future posts we will look at possible ways for educational institutions to respond to this challenge.
Really enjoyed this Daisy, thank you. I wanted to ask about point 2 and point 6: every time I've asked it a question similar to the types of question I might ask my students, it's been extremely easy to spot a GPT answer due to length and grammatical/syntactical *accuracy*. Of course, I can spend a long time refining the prompt to get it to resemble student work more, but most students won't be doing this (presumably).
When you did your study with the 8 GPT essay-seeds, how much prompting did you have to do before you got an essay that you thought *could* fool a teacher, or were they all in "one take"?
Our school has adopted a structured assessment response to AI:
Red Tasks: No generative AI permissible
Yellow Tasks: Some generative AI is permitted
Green Tasks: Generative AI is expected
Focusing on the core assessment constructs of a task helps determine which category it should be in.
More explanation here:
https://adriancotterell.com/2023/06/05/focus-on-the-assessment-construct/