What do you prefer - absolute error or comparative error? Some of the resistance that Comparative Judgement faces more generally seems to be based on a similar instinctive resistance to types of error that people are less familiar with.
On human error vs machine error, what does recourse to review look like in a machine-error world? People seem to have an instinct that some recourse to a second authority is part of what it means for something to be fair, and certainly the concept of remarking is highly embedded in our high-stakes exam system, but it would seem to become meaningless once you have already scaled the machine across the initial marking. I think tennis is an interesting example here: the technology has existed for a while such that all line calls could be automated, but stakeholders seem more comfortable with a system of human error with (some) potential for review, at least at the highest-stakes tournaments.
Yes, I think recourse to review is extremely important, and that’s another strength of our 90-10 validation model - we can review every human-AI disagreement and potentially have a mechanism for overturning AI errors on review.
Thanks for the response, Daisy. But isn't there an issue here with what it is that people would demand a review of? In a 90/10 judgement split, human-AI disagreement happens meaningfully only at the level of individual comparisons, not at the level of the overall rating for an item, as you don't have enough human-only judgements to form a human-only rating. But it is the rating, not individual comparisons, that the assessed are going to question.
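To make that concrete: in Comparative Judgement a rating is typically fitted from the whole pool of pairwise judgements at once, for example with a Bradley-Terry model. Here's a minimal sketch, assuming a plain Bradley-Terry fit; the function and data are illustrative, not No More Marking's actual pipeline:

```python
def fit_bradley_terry(comparisons, items, iterations=200):
    """Fit Bradley-Terry strengths via the standard MM update.

    comparisons: list of (winner, loser) item-id pairs.
    For the estimate to be well behaved, every item should win
    and lose at least once in a connected comparison graph.
    """
    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        new = {}
        for item in items:
            wins = sum(1 for w, _ in comparisons if w == item)
            denom = sum(
                1.0 / (strength[w] + strength[l])
                for w, l in comparisons
                if item in (w, l)
            )
            new[item] = wins / denom if denom else strength[item]
        # normalise so strengths keep a fixed overall scale
        total = sum(new.values())
        strength = {k: v * len(items) / total for k, v in new.items()}
    return strength

items = ["A", "B", "C"]
# Every comparison, human or AI, feeds the same fitted rating.
comparisons = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
print(fit_bradley_terry(comparisons, items))
```

The point is that with a 90/10 split, the human-only subset of comparisons is usually too sparse to connect every item, so there is no separate human-only rating to appeal to: the published rating is a blend by construction.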
Does length correlate with quality? I'm not sure it does. I think lack of length correlates with a lack of quality, which isn't the same thing. Lots of concise writing is produced which covers everything expected in a piece of work; but when writing falls outside "the Goldilocks zone", it will either lack breadth or depth or, as suggested, turn into gobbledygook.
See this analysis - https://www.cambridgeassessment.org.uk/Images/426173-how-much-do-i-need-to-write-to-get-top-marks-.pdf
Very interesting. I am basing my feelings on reading Year 13 stats reports written in New Zealand. We have word limits to try to encourage students to be concise: <2000 words is a green light, between 2000 and 2500 an orange light, >2500 a red light. We don't have any authority to penalise students if they go above 2500, but we're trying to discourage the waffle. Most exemplars we issue that go to the highest grade (Excellence in NZ) are written in about 1800 words. With all that in mind, and factoring in students' writing ability, we feel the "Goldilocks zone" is the orange zone for us.
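For what it's worth, the banding is simple enough to write down directly; a trivial sketch using the thresholds above (the function name is mine):

```python
def word_limit_flag(word_count: int) -> str:
    """Traffic-light flag for a Year 13 stats report, per the bands above."""
    if word_count < 2000:
        return "green"
    if word_count <= 2500:
        return "orange"  # warning band; no authority to penalise beyond it
    return "red"
```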
I think I'm still asking the question: does the marking justify the time spent? Of course teachers want to help the kids they teach improve and learn more, and the kids deserve useful feedback, but I remain unconvinced that most marking makes that much difference. Especially in the current educational climate for UK state schools, where exam success seems to be all that really matters.

I was reading an article the other day about ways to help with revision for exams. It was mostly focused on ways to help you remember. Sad to say, as someone who took their A levels over fifty years ago, I feel that so little has changed. If anything it's worse. The current GCSE and A level systems are designed to favour those who are good at formal exams. Are marking systems also designed to encourage and promote ways of getting good grades rather than developing secure learning and ways of applying that learning in different situations? I suspect this is a factor in why many students struggle once they start a degree course. I'm aware I've rambled beyond your original point, for I think this area is complex.