14 Comments
Jeremy Latham

Isn't part of the problem the way tests are marked? The drive to make assessment marking less subjective has created a situation where external exams are marked not for the intelligence of the answer but for the presence of discourse markers that imply evaluation, comparison, or another form of analysis, with no marks awarded for actual perception in response to the question being asked.

This is not a random assertion. I have been teaching English for 20 years. To gain a pass grade on a question a student must include the discourse markers, but they can obviously fail to understand the text or the question. I have been in an AQA meeting where the audience of teachers turned on the exam board representative because they presented us with two answers. The perceptive (but flawed) one got below half marks. One that had the discourse markers but showed no understanding passed. This is reflected across all exam boards and all the difficult questions. And this is what tutors do: they rehearse students in writing in a very particular way. They do not aid the students' understanding, which is what a test that could not be practised for would do.

I don't disagree with practice, but the way we award marks bears very little relation to student understanding, and as a result we are destroying not just English as a subject but all the humanities subjects.

Daisy Christodoulou

Yes, I totally agree with this, and this is exactly the problem Comparative Judgement solves.

The exam boards want to eliminate unreliability for a good reason. It's not just them being pedantic. If an essay is marked ten times and gets a different grade each time, that is a major problem!

However, as you point out, by introducing very strict and prescriptive mark schemes, they introduce another problem: it is possible to tick all the boxes on the mark scheme without producing a quality piece of writing. Similarly, a piece of writing can miss a few boxes and still be high quality.

It is much harder than anyone thinks to design a set of objective tick boxes that reward holistic quality.

In the worst case scenario, the tick boxes don't even increase reliability. You sacrifice marker discretion and don't gain any reliability in return.

Comparative Judgement eliminates the mark scheme and reinstates the discretion of the marker, but it also delivers much higher reliability. The best of all worlds!

Some further reading: https://blog.nomoremarking.com/validity-and-primary-writing-assessments-f301833f9262

https://substack.nomoremarking.com/p/how-to-write-a-good-rubric-for-humans

https://www.tandfonline.com/doi/abs/10.1080/0969594X.2019.1700212
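For readers unfamiliar with the mechanics: Comparative Judgement typically converts many pairwise "which of these two essays is better?" decisions into a single reliable ranking, commonly via the Bradley-Terry model. Here is a minimal sketch of that idea (this is not No More Marking's actual implementation, and the scripts and judgement data are invented toy values):

```python
# Toy sketch: turning pairwise comparative judgements into a ranking
# with a simple Bradley-Terry fit. Data and iteration count are invented.
from collections import defaultdict

def bradley_terry(judgements, n_iters=200):
    """judgements: list of (winner, loser) pairs from judges comparing scripts.
    Returns scripts ranked from strongest to weakest."""
    scripts = {s for pair in judgements for s in pair}
    strength = {s: 1.0 for s in scripts}
    wins = defaultdict(int)
    for winner, _ in judgements:
        wins[winner] += 1
    for _ in range(n_iters):
        new = {}
        for s in scripts:
            # Sum over every comparison this script appeared in
            denom = 0.0
            for a, b in judgements:
                if s in (a, b):
                    other = b if s == a else a
                    denom += 1.0 / (strength[s] + strength[other])
            new[s] = wins[s] / denom if denom else strength[s]
        # Normalise so strengths stay on a stable scale across iterations
        total = sum(new.values())
        strength = {s: v * len(new) / total for s, v in new.items()}
    return sorted(strength, key=strength.get, reverse=True)

# Toy data: essay A beats B and C; B beats C.
ranking = bradley_terry([("A", "B"), ("A", "C"), ("B", "C")])
print(ranking)  # -> ['A', 'B', 'C']
```

The point of the sketch is that no mark scheme appears anywhere: judges only make holistic better/worse calls, and the model aggregates those into scores whose reliability grows with the number of judgements.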

Jeremy Latham

Is there any sense that comparative marking will be adopted by exam boards? I have followed your work for a while and I am familiar with how good it is. It's intersubjective marking rather than objective, which is interesting to me as I also teach Philosophy and I know that objectivity is both an illusion and dangerous.

Questions could be much more straightforward if comparative marking were in place across the board. The combination would significantly change the way we teach if developing interesting critical discourse were rewarded.

John Nichols

As President of The Tutors' Association (in the UK) and a tutor/former teacher, a lot of this is highly relevant to me.

Tutors are not a magically different species from teachers, nor is what they do fundamentally different (there are important differences relating to the structure of tutorials versus classrooms or lecture halls, but the principles are the same).

If it is possible to prepare for a test, it is possible to ask an experienced and knowledgeable person to help you prepare and, therefore, it cannot be 'tutor-proof'.

You can have a test that is *impossible* to prepare for, by simply making what you assess (and, ideally, when you assess it) completely random - but this will not produce a test that most people would regard as 'fair'. It just makes things even more problematic if any knowledge of a test's contents is leaked.

Most people would likely agree that a test is fair if everyone has an equal chance, and that same test will be useful if it assesses the skills students will need in order to thrive at whatever institution (or career) they are trying to gain entry to. For this to work, you need to specify exactly what it is you will assess and let everyone use their best efforts to prepare. When they do prepare, they will also be making themselves more suitable for the institution they want to get into.

If your concern is that students from disadvantaged backgrounds cannot or will not prepare as effectively, the solution is to *help them prepare*, not to try to make it difficult or impossible for anyone to prepare.

Education is simply a way of helping people become good at things. In 19th-century Britain, Imperial China and many other places besides, great value was ascribed to passing exams whose structure and content were both known and regarded as valuable. In many cases, people came from nothing, worked hard, prepared and excelled in such a meritocratic framework. The use of tutors is not inherently bad - it is only a form of preparation.

Angus Russell

I had another thought. With children now required to be in full-time education until 18, GCSEs may be relevant for A levels, but hardly for the IB, technical courses, or apprenticeships. So why don't the organisations offering those courses run their own entry exams? They know what applicants to those courses need to know. Schools would then have to focus on outcomes, not exams.

Jan

Sad to say, the reforms introduced by Gove over a decade ago stipulated that five good passes at GCSE were the requirement for any meaningful course, whether HE or FE. Any apprenticeships on offer go to students with A levels. Not surprisingly, they've seen that earning while learning is a better option than a student loan.

Stan

So people are trying to design an assessment that won't be affected by carefully guided practice involving perhaps 25% additional time spent on the subject.

Imagine that with something like music, where I think everyone agrees that 25% additional hours over months, guided by an expert, will produce an improvement.

How would anyone tell the difference between someone with more natural musical talent and someone who has had more practice?

Laura Creighton

I'm working on a different problem. We need to find out exactly what it is that the foreign students do not know (but which we expect them to), so we can give them a remedial class if they need it. (Figuring out that some of them can skip an introductory course is another benefit.) It may be that this has promise for your problem. Keep advancing the problems until, for every student, they are clearly up against new things they haven't learned. You will need an AI for this. Then evaluate them on how well they learn the new material.
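The "keep advancing until they hit new things" loop described above can be sketched as a simple placement routine (the difficulty levels and simulated student below are invented for illustration; a real system would presumably use an item-response model or AI-generated problems rather than a fixed bank):

```python
# Toy sketch of adaptive placement: keep raising difficulty until the
# student starts failing, then treat that level as the frontier where
# new learning begins. The item bank and student are simulated.

def find_frontier(answer_correctly, levels):
    """Advance through difficulty levels in order; return the first level
    the student cannot handle (their learning frontier)."""
    for level in levels:
        if not answer_correctly(level):
            return level
    return None  # student handled every level in the bank

# Simulated student who can handle material up to difficulty 3.
student = lambda level: level <= 3
print(find_frontier(student, levels=range(1, 8)))  # -> 4
```

Everything below the frontier can then be skipped, everything at it becomes the remedial (or evaluation) material, which matches the two benefits described above.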

Shaun Brien

This would actually be so useful. I was working with an international student on Friday who's in their final year of school, and I was shocked when the word "pier" threw them completely off. That gap in vocabulary and background knowledge put them at a significant disadvantage.

Angus Russell

My understanding is that this type of situation was one of the factors behind GL's CAT4 assessment protocol: to provide a neutral approach based on cognitive potential rather than curriculum-driven assessment.

Theodore Whitfield

We already have "tutor-proof exams", at least in theory. That's what an IQ test is supposed to be! The argument is that these tests directly assess fundamental cognitive skills such as pattern matching, but do not depend on substantive knowledge.

We don't currently use IQ testing as a form of high-stakes assessment, and we can debate whether that is right or wrong. But there's no doubt that if we **did** go ahead and implement such a "tutor-proof test", it would be very similar to a standard IQ test.

Brian Huskie

Am I right that the difference between "tutor-proof tests" and the alternative is the difference between testing fluid IQ and crystallized IQ? And aren't the two highly correlated? So doesn't it mostly not matter?

Personally, I'd lean more towards tutor-able tests, particularly since Khan Academy et al. are available to most students, and learning a specific curriculum will capture (a little bit) more than just g.

I also doubt it will matter much. In the aggregate, the most intelligent students will score the highest, regardless.

Jan

I'm not convinced that any formal exams can assess cognitive potential. I often listen to The Life Scientific on BBC R4, and many of the eminent scientists interviewed on the programme were not academic high flyers at school. Many came to higher education later and by different routes.

I was part of the 11-plus generation. In my two-form-entry junior school, out of 60 4th years, now known as Year 6, 5 pupils passed. Decades later, when I was doing some research, I realised that in areas with few grammar schools the pass marks were manipulated because there weren't enough places. You were far more likely to get a grammar place the further north you went in England.

I wasn't one of those that passed. My parents tried to get me into a local private school. I had to sit an entrance exam which required knowledge of technical vocabulary such as adjectives, adverbs, verbs and nouns. Back in the dark days of the 20th century, state primary schools used the terms describing words, doing words and naming words. The letter rejecting me said that despite being a nice child I appeared to have no knowledge of the English language. As I'd be expected to study French, this would hold me back.

I ended up at one of the early comprehensive schools, did my A levels, including French, and gained a 1st in English from The University of Manchester. I rather wanted to go back to the school and replay the dress shop scene from Pretty Woman... "Big mistake. Big, huge". I guess I'm just raising the question, after decades in teaching across age groups from primary to higher education.

LJRodgers

This is widespread in Northern Ireland under the guise that it supports social mobility, and yet the evidence overwhelmingly disputes this. https://www.bbc.co.uk/news/uk-northern-ireland-63562654