AI-Powered Essay Marking
This week we will release our AI-enhanced Comparative Judgement site for general use to our customers, a landmark that represents the end of a long journey.
It began, as many of these journeys do, with a dinnertime message from Daisy last December.
‘I think this thing really can mark. I’ve just given it four essays and the marks are credible.’
I did not know it, but that was the last civilised dinner I was to eat for very many strange and terrible days.
We’d been aware of GPT for some time, but after getting it to write various stories in the style of Samuel Beckett, we had dismissed it as an amusing novelty, a surprisingly good pastiche artist, but nothing more. There was nothing in the literature that suggested it could do anything more than predict the next word in a sequence. How on earth, then, could it produce marks?
I went to a box room at the top of the house and locked myself in, in order to be alone with my aching miseries.
So began three dark months of wrestling with GPT in my shed, sending Daisy despatches as she toured Australia. Here’s a flavour:
Chris: The problem with GPT is there is no clue to where the number comes from!
Daisy: I keep thinking that we are still only using GPT-3 for our website, and yet GPT-3.5 is a huge improvement. So we have to remember that with everything we do there is a huge upgrade that is waiting in the wings.
Daisy: GPT is a step up from spellcheckers in the same way that Hal9000 is a step up from a caveman with an axe!
It seems to me now almost incredibly wonderful that, with that swift fate hanging over us, men could go about their petty concerns as they did.
At the height of our GPT madness:
Daisy: It is now judging with no mark scheme- is that right?
Chris: Yup
Daisy: Chris this is TOTALLY INSANE. I can’t even!!
Chris: The beauty of this, is that it extracts the mark scheme from the work
No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely by intelligences greater than man.
And the depths:
Chris: Results are underwhelming to say the least. I’ve put the text of the pupils with the highest and lowest human score by AI average. I’m starting to think GPT is a mass hallucination.
With wine and food, the confidence of my own table, and the necessity of reassuring my wife, I grew by insensible degrees courageous and secure.
According to the trite example over-shared on Twitter, I was undergoing the usual WOW!-to-disillusionment journey that would eventually allow me to see the light, understand the purpose of GPT and embrace its full potential. So is Monday’s release that moment?
No.
I would love to say that I have gained a better understanding of GPT and LLMs in general over the last 3 months, but I am still left with all the questions I had at the start.
How does GPT produce a mark? Is there an underlying notion of language value, abstracted away inside some kind of model that OpenAI are training?
Does GPT genuinely use the mark scheme or is its feedback a pastiche of the mark scheme and essay?
If pressed, I would say my sympathies are with computational linguist Prof Emily M. Bender from the University of Washington:
These things are systems for haphazardly stitching together bits of the training data to come up with something that sounds plausible, but there’s no thought, communicative intent and there’s no reasoning or truth or anything like that.
Now, even the PE reports I wrote for my pupils back in the early 90s had some communicative intent, even if they were a little light on truth! Oh Emily, where were you back in January, before I locked myself in my shed?
So, you may ask, why are we releasing a GPT-powered feedback engine for our subscribers?
The answer is simple. We, and Emily, might just be wrong.
“Yet across the gulf of space, minds that are to our minds as ours are to those of the beasts that perish, intellects vast and cool and unsympathetic, regarded this earth with envious eyes, and slowly and surely drew their plans against us.”
Footnote
If you’re interested in learning more, you can take part in our info webinar on the 15th June at 4pm. You can also read some of the research we have published over this period.
I have really valued NMM’s skeptical stance and insights on Generative AI - but I am having trouble understanding this post! It - and much of your recent work - calls ChatGPT into question, and yet you are launching a product using the very same technology? Is this a change of heart or a new (but skeptical) test of its functionality? Please do keep posting your ongoing thought processes and testing as most of us don’t have the intellectual commitment to spend months in a shed doing this sort of hard thinking — and we very much appreciate its fruits!
Has there been an update to the privacy policy to accommodate these changes?