Is it possible to develop a tutor-proof test?
Or should we focus on tests worth teaching to instead?
At No More Marking, most of the assessments we provide are fairly low-stakes. However, we do have experience with high-stakes tests, and we know how challenging they are to design.
If you are using a test as a selection mechanism for a prestigious institution, you will have armies of very smart parents and well-paid tutors trying to crack the code of the test.
Over the past decade or so, a couple of phrases have cropped up to describe the way these selection tests should work. First, people argue that we should have “tutor-proof tests” that cannot be cracked by the parents and tutors. Second, they argue that we should have “tests worth teaching to”, so that if students are being prepped for the test, the prep is worthwhile.
Do these two concepts hold water? In this post, we’ll examine the idea of tutor-proof tests.
Some historical background
Historically, many famous English public schools selected pupils at age 13 using the Common Entrance exam.
Common Entrance exams are linked to a defined curriculum. The advantage of this is great coherence and clarity for students and teachers at the prep schools and public schools. The disadvantage is that it probably restricts the pool of students who can apply to the public schools.
Not all independent schools operated on this model. I went to a selective secondary school, City of London School for Girls, which used a more curriculum-neutral test consisting of a reading comprehension, writing task and maths paper. I hadn’t attended a private prep school or had a private tutor, but the test resembled a lot of what I had done at my state primary school, so I was not at a massive disadvantage compared to others. Had CLSG run Common Entrance, it’s unlikely I would have even applied, let alone got in.
However, whilst the test I sat was more curriculum-neutral than Common Entrance, it was not completely curriculum-neutral, and nor was it immune to tutoring and preparation. In the last decade or so, even this kind of maths, reading and writing assessment has been criticised for excluding talented but disadvantaged students who don’t have access to good schools and tutors.
The tutor-proof test
Is it possible to design a test so content-free that it captures something like raw potential, or the underlying ability to flourish in an academic environment? Verbal reasoning tests reward vocabulary knowledge, which can be taught. Numerical reasoning tests reward maths knowledge, which can also be taught. But what about non-verbal reasoning tests? These are the kinds of tests where you are given four shapes and then asked: which shape continues the sequence?
You can see how these tests are less tied to curriculum knowledge, and there is serious research in this area suggesting that they might therefore be useful for identifying talented but disadvantaged students. David Card is a Nobel laureate who has done research showing that a non-verbal test administered at second grade in a district in Florida “led to large increases in the fractions of economically disadvantaged and minority students placed in gifted programs.” Jonathan Wai is another researcher who has done a lot of interesting work on these types of questions, and who has also been involved with talent identification programmes.
In large-scale government-run school systems with lots of disadvantaged students, non-verbal assessments can help identify students who are able but poorly served by their schooling.
But there are big differences between low-stakes talent-identification across a government school system and high-stakes entry to prestigious selective schools. When an expensive tutor hears the phrase “tutor-proof test”, he doesn’t interpret it as a warning but as a challenge.
Practice effects
There is a huge literature on “practice effects”, which essentially shows that if you practice a specific skill, you will get better at that specific skill. If you practice touch typing every day, you will get better at it. If you practice your multiplication tables every day, you’ll get better at them. If you practice tying your shoelaces every day, you’ll get better at it.
The practice effect is one of the most robust findings in cognitive psychology, and poses an enormous challenge to the idea of the tutor-proof test.
The response of test developers to this challenge is to say that they can create enough new question types that practice on past question types won’t deliver huge gains.
That is, they’ll say that you can practice tying your shoelaces, but then the test will be on a different kind of knot, so you won’t have any advantage. From a cognitive science point of view, this is a tricky one. It is true that the practice effect holds for practice of a specific skill. It is also true that transfer to different contexts is hard, and that so-called “far transfer” is exceptionally difficult. So yes, the test developers are right to say that the more novel the question type, the less valuable the practice of old question types is.
But “less valuable” is not the same as “not valuable at all”. And whilst far transfer is extremely difficult, near transfer is more possible. Even if practice of old question types gives you quite small gains, in a high-stakes environment those small gains can be the difference between success and failure.
Also, to make this system work, you require test developers to constantly create new types of question that are as different as possible from what has gone before. This poses a number of difficult technical challenges.
First, there are obvious constraints to just how many new types of short non-verbal test questions it is possible to create. If you are running 3 test sessions a year, after ten years you will need to come up with thirty different types of question. There are limits to how many ways you can vary the essential concept of looking at a 2D shape and moving it around in some way.
Second, if you really are creating very new questions for each round of tests, then you need to run a new validation process each time. Good validation processes take time: ideally you want to wait a few years and gather information on whether the students who passed that test are thriving at their new school. But if you are constantly having to create new question types, you don’t have the time for that.
Third, even if your system works for the first few years, there is no guarantee it will keep working over time as tutors learn more about it and optimise their teaching. This is a classic Goodhart’s Law problem: when a measure becomes a target, it loses value as a measure.
We see numerous examples of this in our work and research. A really famous one is that early AI essay markers delivered pretty good levels of agreement with human markers, and seemed to have solved the problem of AI marking. However, on closer investigation it turned out that they were largely just rewarding the length of the essay. In a low-stakes environment, it is possible that this wouldn’t cause too many problems. But in a high-stakes assessment where students, teachers and parents are all striving to do as well as they can, the system will break down, because students will realise that the way to succeed is to write the same sentence a couple of hundred times.
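To make that concrete, here is a purely illustrative sketch. It is not a reconstruction of any real marking engine: the `length_only_score` function and the example essays are invented for the purpose. It simply shows how a “marker” that rewards length alone can look plausible on ordinary essays, and then hand full marks to a gamed one.

```python
# Toy illustration only: a "marker" that scores essays purely on word count.
# On ordinary essays the proxy can look reasonable, because longer responses
# tend to be the more developed ones. Once students game it, it breaks.

def length_only_score(essay: str, max_words: int = 600) -> float:
    """Score from 0-10 based only on word count - a proxy, not a measure of quality."""
    words = len(essay.split())
    return round(min(words / max_words, 1.0) * 10, 1)

# Ordinary essays (invented): the longer one happens to be the better one.
ordinary = {
    "thin two-paragraph answer": "word " * 120,
    "solid developed answer": "word " * 420,
}

# A gamed essay: one sentence repeated a couple of hundred times.
gamed = "This point is very important and shows the theme clearly. " * 200

for label, essay in ordinary.items():
    print(label, "->", length_only_score(essay))
print("gamed repetition ->", length_only_score(gamed))
# The gamed essay gets full marks despite being a single repeated sentence:
# once the measure becomes a target, it stops measuring what it was meant to.
```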
Likewise, it is possible that tutors find ways of teaching tips and tricks that help students answer the non-verbal questions, but that systematically break the link between the question and what it is supposed to be measuring.
What is the impact on students?
The extensive literature on the practice effect shows it delivers substantial gains. But there is a chance that even the substantial gains reported in the literature underestimate its effect, because most of the research is lab-based, and may not properly account for the scale and effect of real-world intensive practice in some environments. Tutoring for entrance exams is taken very seriously by a lot of very smart people, and it is big business.
Many students will be preparing for their entrance exam 18 months or 2 years in advance, and will be doing several hours of practice every week. The question is, would you rather that prep was spent on shape rotation? Or would you rather students were reading interesting books and doing maths problems?
It’s also worth remembering that the original impulse for introducing tests like this was the social justice aspect – that schools wanted to find a way of identifying talented but disadvantaged students. But once a non-verbal test becomes a target, it is going to discriminate against those students too, as you are much less likely to get any practice of those tests in a typical state school – whereas you will be taught reading, writing and maths. The worst-case outcome is that the non-verbal test is as socially exclusionary as Common Entrance, just with none of its educational benefits.
When you stop and think about it, the concept of the tutor-proof test does not really hold water. Of course you get better at something if you practice it. That is a good thing, and that is why education works! The whole point of education is to practice valuable things and get better at the valuable things. A good assessment should promote practice of the valuable things. It shouldn’t remove the valuable things and replace them with less valuable things, on the grounds that some students will get more practice of the valuable things.
Which brings us to another popular concept: we should create “tests that are worth teaching to”. Is this a better guide to assessment design? We’ll discuss that in a future post.


I'm working on a different problem. We need to find out exactly what it is that the foreign students do not know (which we expect them to) so we can give them a remedial class if they need it. (Figuring out that some of them can skip an introductory course is another benefit.) It may be that this approach has promise for your problem: keep advancing the problems until, for every student, they are clearly up against new things they haven't learned. You will need an AI for this. Then evaluate them on how well they learn the new material.
Isn't part of the problem the way tests are marked? The drive to make assessment marking less subjective has created a situation where external exams are marked not for the intelligence of the answer but for the presence of discourse markers that imply evaluation, comparison or some other form of analysis, with no marks awarded for actual perception in the response to the question being asked.
This is not a random assertion. I have been teaching English for 20 years. To gain a pass grade for a question, a student must include the discourse markers, but they can obviously fail to understand the text or the question. I have been in an AQA meeting where the audience of teachers turned on the exam board representative because they presented us with two answers. The perceptive (but flawed) one got below half marks. The one that had the discourse markers but showed no understanding passed. This is reflected across all exam boards and all the difficult questions. And this is what tutors do: they rehearse students in writing in a very particular way. They do not aid the students' understanding, which is what a test that could not be practiced for would do.
I don't disagree with practice, but the way we award marks bears very little relation to student understanding, and as a result we are destroying not just English as a subject but all humanities subjects.