No, and you wouldn't ask an undergraduate who had flunked English to mark them either.
What prompts did you give ChatGPT to mark with? Evaluating the prompt you used is fundamental to getting GPT to do anything for you, especially something as complex as marking. I'm a little surprised you saw worse performance with GPT-4, given that every piece of research I've seen indicates it performs better than 3.5 in pretty much every task.
AQA are researching whether it is possible to mark short-answer STEM questions with AI, but last I heard, without much success.
Great, insightful article. I wonder whether you've looked at, or have plans to look at, using GPT-4 to mark STEM subjects?
I just want to say thanks for this valuable research. Love you real-pupil, real-classroom guys.
Have I misunderstood the test? Wouldn't a better test of ChatGPT vs. No More Marking be to make it do comparative judgement with all the scripts and rank them accordingly, rather than use a mark scheme? Isn't the point of No More Marking that mark schemes are ambiguous?