AI-Powered Essay Marking

This week we will release our AI-enhanced Comparative Judgement site for general use to our customers, a landmark that represents the end of a long journey.

Jun 03, 2023

It began, as many of these journeys do, with a dinnertime message from Daisy last December.

‘I think this thing really can mark. I’ve just given it four essays and the marks are credible.’

I did not know it, but that was the last civilised dinner I was to eat for very many strange and terrible days.

We’d been aware of GPT for some time, but after getting it to write various stories in the style of Samuel Beckett, we had dismissed it as an amusing novelty, a surprisingly good pastiche artist, but nothing more. There was nothing in the literature that suggested it could do anything more than predict the next word in a sequence. How on earth, then, could it produce marks?

I went to a box room at the top of the house and locked myself in, in order to be alone with my aching miseries.

There began three dark months of wrestling with GPT in my shed, while I sent Daisy despatches as she toured Australia. Here’s a flavour:

Chris: The problem with GPT is there is no clue to where the number comes from!

Daisy: I keep thinking that we are still only using GPT-3 for our website, and yet GPT-3.5 is a huge improvement. So we have to remember that with everything we do there is a huge upgrade that is waiting in the wings.

Daisy: GPT is a step up from spellcheckers in the same way that Hal9000 is a step up from a caveman with an axe!

It seems to me now almost incredibly wonderful that, with that swift fate hanging over us, men could go about their petty concerns as they did.

At the height of our GPT madness:

Daisy: It is now judging with no mark scheme- is that right?

Chris: Yup

Daisy: Chris this is TOTALLY INSANE. I can’t even!!

Chris: The beauty of this, is that it extracts the mark scheme from the work

No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely by intelligences greater than man

And the depths:

Chris: Results are underwhelming to say the least. I’ve put the text of the pupils with the highest and lowest human score by AI average. I’m starting to think GPT is a mass hallucination.

With wine and food, the confidence of my own table, and the necessity of reassuring my wife, I grew by insensible degrees courageous and secure.

According to the trite example over-shared on Twitter, I was undergoing the usual journey from WOW! to disillusionment that would eventually allow me to see the light, understand the purpose of GPT and embrace its full potential. So is Monday’s release that moment?

No.

I would love to say that I have gained a better understanding of GPT and LLMs in general over the last 3 months, but I am still left with all the questions I had at the start.

How does GPT produce a mark? Is there an underlying model of the value of language, abstracted away inside some kind of model that OpenAI are training?

Does GPT genuinely use the mark scheme or is its feedback a pastiche of the mark scheme and essay?

If pressed I would say my sympathies are with computational linguist Prof Emily M. Bender from the University of Washington:

These things are systems for haphazardly stitching together bits of the training data to come up with something that sounds plausible, but there’s no thought, communicative intent and there’s no reasoning or truth or anything like that.
https://www.bbc.co.uk/sounds/play/m001md54

Now, even the PE reports I wrote for my pupils back in the early 90s had some communicative intent, even if they were a little light on truth! Oh Emily, where were you back in January, before I locked myself in my shed?

So, you may ask, why are we releasing a GPT-powered feedback engine for our subscribers?

The answer is simple. We, and Emily, might just be wrong.

“Yet across the gulf of space, minds that are to our minds as ours are to those of the beasts that perish, intellects vast and cool and unsympathetic, regarded this earth with envious eyes, and slowly and surely drew their plans against us.”

Footnote

If you’re interested in learning more, you can take part in our info webinar on the 15th June at 4pm. You can also read some of the research we have published over this period.

No More Marking

Can ChatGPT provide feedback? Our latest research

Back in January we wrote two posts asking if ChatGPT could provide useful feedback to students. We could see that ChatGPT was capable of providing fluent prose paragraphs about the strengths and weaknesses of an essay. In the first post, we outlined our concern that this model of feedback itself is flawed…

2 years ago · 4 likes · Daisy Christodoulou

More GPT marking data - is it better than humans at predicting future grades?

Back in January we started experimenting to see if ChatGPT could mark writing. We were fairly optimistic, and we put quite a bit of work into integrating the GPT-3 API into our website. All this work means it’s now fairly straightforward for us to assess writing using either human judgement or AI judgement, and to compare the two. Here’s a screenshot of…

2 years ago · 4 likes · 1 comment · Chris Wheadon and Daisy Christodoulou

Can GPT-3 mark writing? The data is in...

Back in January we asked whether ChatGPT could reliably mark students’ writing. Since then we have integrated GPT-3 into our Comparative Judgement software and run a trial involving 8 schools. Here’s how the trial worked. We ran two tasks: one in Year 5 and one in Year 7. Each trial consisted of four schools who submitted writing from 10 students. All the s…

2 years ago · 7 likes · Daisy Christodoulou and Chris Wheadon

Can ChatGPT provide useful feedback?

In our last post, we looked at the way ChatGPT’s written feedback is superficially impressive but not that helpful pedagogically. In this post, we’ll look at ways it can provide something more useful. Part of the challenge here is that providing good feedback on writing is hard for humans to do, let alone AI. Ideally, you want something that a student ha…

2 years ago · Daisy Christodoulou

Can ChatGPT give feedback?

In our previous post, we looked at how ChatGPT can provide relatively plausible marks for pieces of writing. It can also provide written feedback, which has got everyone excited about the potential time savings. If you give ChatGPT the mark scheme and ask it to give the essay a grade and explain why, it can produce a comment that uses the language of the …

2 years ago · Daisy Christodoulou

Can ChatGPT mark writing?

Can artificial intelligence systems mark more accurately than humans? Definitely, and they have been able to since the 1960s. In 1968, Dr Ellis Page developed Project Essay Grade (PEG), an automated essay marking system. PEG was very reliable. If you gave it the same essay on two different days, it awarded it the same mark, which is definitely not always…

3 years ago · Daisy Christodoulou

Will Orr-Ewing

Jun 4, 2023

I have really valued NMM’s skeptical stance and insights on Generative AI - but I am having trouble understanding this post! It - and much of your recent work - calls ChatGPT into question, and yet you are launching a product using the very same technology? Is this a change of heart or a new (but skeptical) test of its functionality? Please do keep posting your ongoing thought processes and testing as most of us don’t have the intellectual commitment to spend months in a shed doing this sort of hard thinking — and we very much appreciate its fruits!
