4 Comments

Mark Aveyard

You're raising a super important point for teachers to consider in their use of AI.

But since I've spent no time in my life associating the number 4 with a particular shade of purple, the color game isn't a good analogy for marking with absolute standards.

And since I've spent many years reading and writing, I have strong internalized criteria and intuitions about someone's writing ability, without requiring comparisons on the same task.

For example, when I first read a Substack essay years ago, I didn't sit there in confusion until someone showed me another post on the same topic as a comparison point.

Comparison grading works better in high-stakes testing, where the entire point is to differentiate students, even if they all perform really well (or really poorly).

For many other situations, inside and outside of education, absolute criteria are critically important for evaluation.

Daisy Christodoulou

So if you were given 8 essays on a topic you know well - and a rubric that you were trained on - do you think you would grade all 8 correctly?

And do you think five other markers who were all as experienced and well qualified as you would agree with you, if they were independently asked to grade all 8 with the same rubric?

Benjamin Woods

Fascinating. I’ve just started experimenting with LLM-referenced marking and am finding it very useful. I’ll refer my head teacher to your material.

On a different note, I would like to suggest that many of the issues identified with absolute marking stem from some very shoddy, widely used rubrics. In the IB and Australian systems, the criteria descriptors for marking essays are written in vague, borderline esoteric language that is barely comprehensible to most teachers, let alone students. When I’m feeling snarky, I suspect this is so the respective systems can avoid accountability. If teachers and schools took, or were given, the time to craft clear, task-specific criteria, it would go a long way towards building confidence in the process of assessment. Even better, students would have comprehensible guidance about what they need to do and how they can improve.

Thank you.

Daisy Christodoulou

There is an interesting research literature on the different styles of rubric. The vague ones have the problems you mention, but the more specific ones arguably have bigger problems: they end up stereotyping responses in unintended ways, and they don’t improve consistency that much. So a lot of big assessment organisations have gone back to the vaguer style. We review the literature at the start of this paper: https://www.tandfonline.com/doi/abs/10.1080/0969594X.2019.1700212
