4 Comments

Mark Aveyard

You're raising a super important point for teachers to consider in their use of AI.

But since I've spent no time in my life associating the number 4 with a particular shade of purple, the color game isn't a good analogy for marking with absolute standards.

And since I've spent many years reading and writing, I have strong internalized criteria and intuitions about someone's writing ability, without requiring comparisons on the same task.

For example, when I first read a Substack essay years ago, I didn't sit there in confusion until someone showed me another post on the same topic as a comparison point.

Comparison grading works better in high-stakes testing, where the entire point is to differentiate students, even if they all perform really well (or really poorly).

For many other situations, inside and outside of education, absolute criteria are critically important for evaluation.

Daisy Christodoulou

So if you were given 8 essays on a topic you know well - and a rubric that you were trained on - do you think you would grade all 8 correctly?

And do you think five other markers who were all as experienced and well qualified as you would agree with you, if they were independently asked to grade all 8 with the same rubric?

Benjamin Woods

Fascinating. I’ve just started experimenting with LLM-referenced marking and am finding it very useful. I’ll refer my head teacher to your material.

On a different note, I would like to suggest that many of the issues identified with absolute marking stem from some very shoddy, widely used rubrics. In the IB and Australian systems, the criteria descriptors for marking essays are written in vague, borderline esoteric language that is barely comprehensible to most teachers, let alone students. When I’m feeling snarky, I suspect this is so the respective systems can avoid accountability. If teachers and schools took, or were given, the time to craft clear, task-specific criteria, it would go a long way towards building confidence in the process of assessment. Even better, students would have comprehensible guidance about what they need to do and how they can improve.

Thank you.

Daisy Christodoulou

There is an interesting research literature on the different styles of rubric. The vague ones have the problems you mention, but the more specific ones arguably have bigger problems: they end up stereotyping responses in unintended ways, and they don’t improve consistency that much. So a lot of big assessment organisations have gone back to the vaguer style. We review the literature at the start of this paper: https://www.tandfonline.com/doi/abs/10.1080/0969594X.2019.1700212
