Really insightful for my research. Up to now there is little evidence that LLMs can reliably grade structured response questions in A level chemistry exams. Human-in-the-loop examiners could comparatively judge thousands of scripts, but one of my bugbears as a teacher is deciphering handwriting. Has anyone got a tool that can do this?
Thank you, that was informative. But I don't understand why AI avoids some of the hallucination errors under CJ prompts/tasks. Is ChatGPT ordinarily doing some kind of CJ process on a massive scale, or does it behave differently within a CJ process? And if so, how does it "know" to behave differently?
LLMs continue to hallucinate, but the powerful statistical model built into CJ allows us to isolate the hallucinations and minimise their impact. In this respect LLMs are no different from humans: neither is perfectly reliable. The existence of LLMs doesn't remove the need for old-fashioned statistics to generate reliable, reproducible measurement scales.
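For readers curious what that statistical model looks like: CJ systems typically fit a Bradley–Terry model, which estimates a quality score for each script from many pairwise "which is better?" judgements. The sketch below is illustrative only (the item counts, error rate, and fitting loop are my assumptions, not details from this thread); it shows how the aggregate model can recover a sensible ranking even when individual judgements are wrong some of the time, which is the sense in which hallucinated judgements get "isolated".

```python
import math
import random

def fit_bradley_terry(comparisons, n_items, iters=2000, lr=0.5):
    """Fit Bradley-Terry log-strength scores by gradient ascent.

    comparisons: list of (winner, loser) index pairs.
    Returns mean-centred scores; higher means judged better overall.
    """
    theta = [0.0] * n_items
    for _ in range(iters):
        grad = [0.0] * n_items
        for w, l in comparisons:
            # P(w beats l) under the current scores (logistic in the score gap)
            p = 1.0 / (1.0 + math.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        # Average the gradient over comparisons for a stable step size
        theta = [t + lr * g / len(comparisons) for t, g in zip(theta, grad)]
        mean = sum(theta) / n_items
        theta = [t - mean for t in theta]  # anchor the scale
    return theta

# Hypothetical simulation: 5 scripts of true quality 0..4, judged by an
# unreliable judge that picks the genuinely better script only 80% of
# the time (the other 20% stands in for hallucinated judgements).
random.seed(0)
true_quality = [0, 1, 2, 3, 4]
comparisons = []
for _ in range(400):
    a, b = random.sample(range(5), 2)
    better, worse = (a, b) if true_quality[a] > true_quality[b] else (b, a)
    if random.random() < 0.8:
        comparisons.append((better, worse))
    else:
        comparisons.append((worse, better))  # a judging error

scores = fit_bradley_terry(comparisons, 5)
ranking = sorted(range(5), key=lambda i: scores[i])
print(ranking)  # weakest-to-strongest order implied by the fitted scores
```

Because each script takes part in many comparisons, the occasional wrong judgement is outvoted by the rest, so the fitted scores should still separate the strong scripts from the weak ones.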
I am continuously looking for examples of edge-case scenarios, and only by solving them will we eventually decide whether an activity can be fully automated and replace humans, or whether it will remain a human-in-the-loop activity for now.
Thanks, Chris, is that statistical model described somewhere?
Excellent post! Thank you for sharing.