The first behaviour seems explainable, given that they process the text sequentially. If you forced a human to read the two pieces sequentially without referring back to the first, I suspect you'd get a bias one way or the other! I don't know whether recent LLMs can skip around the text, but at least in their vanilla form they don't, which would mean less accuracy in assessing the first piece of text! (More recent LLMs try to do self-criticism and various other tricks.)
The second one seems particularly characteristic of trying to get feedback from an LLM. In my experience, they either agree with everything you say (even when you switch to the counter-argument, they still agree), or they steadfastly stick to some position even when presented with obvious contradictions. They're the ultimate yes-man.
Yes, it is uncanny how many similarities they have with fluent bullshitters!!
It all feels rather too close to Forster's The Machine Stops. It feels as if someone has invented something, AI, and now everyone is scurrying around trying to find a use for it. Possibly the best way to assess a child's or student's work is to read it and assess it. If you're the teacher, you presumably set the work and know the kids.
I have always liked that EM Forster short story. Prescient in lots of ways.
How opportune to be reading this after ploughing through the new AI action plan. My view has always been that to use this stuff well you need to understand its limitations and foibles, and how you keep teachers one step away from the kids.
I think we should all be wary of AI. We ignore history at our peril. Hope this rather long link works!
https://theconversation.com/artificial-intelligence-what-five-giants-of-the-past-can-teach-us-about-handling-the-risks-247125
Just reaching for my copy of 'I, Robot'. We're at the stage where we don't know why robots are making the decisions they do. Science fiction becomes reality, and I am fascinated!
To the point though - I don't see human marking being replaced by AI just yet. Not just because this study has highlighted shortcomings I wasn't aware of, but because I think for now, the human touch is key.
I am currently running a trial with my Year 10 class, where I have set them the homework task of using ChatGPT to get feedback on the end-of-autumn-term assessment essay question. I marked the essays manually before returning them, and my own feedback was very limited. I have given the students AI prompts including the mark scheme, and they have been sending me back links to their conversations so I can assess the quality of the feedback they have been getting. I have also tasked the students with rewriting their piece based on the AI feedback. I will then mark that version as well, so I can see whether it has added value.
ChatGPT has said that it will analyse each student's before-and-after versions for content and style, and will grade how likely it is that the student simply copied an AI answer out, by analysing how much their writing style has changed.
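For what it's worth, the underlying idea (comparing the "before" and "after" texts for a sudden shift in writing style) can be sketched very crudely with a few stylometric features. This is a toy illustration under my own assumptions, not what ChatGPT actually does internally, and the features here are deliberately simple ones I chose for the sketch:

```python
import math
import re

def style_features(text):
    """Extract a few crude stylometric features from a text:
    average sentence length, average word length, type-token
    ratio, and punctuation density."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        len(words) / max(len(sentences), 1),        # avg sentence length
        sum(len(w) for w in words) / n_words,       # avg word length
        len(set(words)) / n_words,                  # type-token ratio
        len(re.findall(r"[,;:]", text)) / n_words,  # punctuation density
    ]

def style_shift(before, after):
    """Euclidean distance between the two feature vectors;
    a larger value suggests a bigger change in style."""
    a, b = style_features(before), style_features(after)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

A genuinely AI-written rewrite would, on this naive measure, tend to score a larger shift than a student's own revision, though real stylometry uses far richer features and this crude version would be easy to fool.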
I am hoping that students will show greater improvement because the feedback is more detailed and personalised than any we as teachers could hope to provide in a DIRT session in lesson time.
But the thing I am most nervous about is parents' responses to this being set as homework. Why is the teacher delegating feedback and advice on their children's work to a robot?
As someone very new to this topic, I wonder whether parents will say the same thing about AI marking of work, which this study acknowledges human expert judges are, for now, better at doing.