Last month, Nick Gibb, the former UK Schools Minister, published a memoir about his 10+ years in office. He was central to the major educational reforms England has introduced since 2010: phonics teaching, a knowledge-rich curriculum, school autonomy, and assessment reform.
I wrote the foreword to the book, and I’ve written before on here about the success of these reforms. So I’m hardly a neutral observer, but for what it’s worth I think the book is fantastic, and it has been getting rave reviews from others too (see here, here and here).
In my foreword, I emphasised the importance of content knowledge not just for education, but for politicians and policymakers. Nick Gibb really understood education. He’d done his research and visited hundreds of schools. He was given the time to develop his understanding too, because unlike most junior ministers, he wasn’t reshuffled to a completely different department after 15 months.
The educationalist Siegfried Engelmann liked to talk about the importance of the “picky, picky detail”. If you’re trying to teach a student what a verb is, it matters if every example you give happens to have the verb as the second word: students may infer that a verb is whatever comes second in a sentence, rather than learning what a verb actually does. Picky detail really matters for policy reform too. In this post, I will focus on two aspects of England’s assessment and accountability reforms where attention to detail made a huge difference, and where policymakers in other areas could learn a lot.
The Phonics Screening Check (PSC)
The Phonics Screening Check is a test for all Year 1 pupils to check they can decode words. It consists of 40 words, 20 of which are made-up “pseudowords” like charb, yot, and zob.
The inclusion of these words was controversial, and to many outsiders it seemed odd that a reading test would assess made-up words. But they were included for a good reason: to check pupils really could understand the phonetic code, and that they hadn’t just memorised very common words by sight.
The pseudoword controversy illustrates something important about the relationship between the curriculum and assessment. When you’re arguing about the curriculum, or even designing the curriculum, it’s easy to talk in vague terms, to promise everything to everyone and to assume that all disagreements are just “false dichotomies”.
Once you start designing assessments, however, you have to confront hard choices, real trade-offs and true dichotomies. As Dylan Wiliam says, “assessment operationalises curriculum.” It forces you to be specific about what you mean by 21st century skills, or critical thinking, or phonics.
I think including nonsense words was the right call. However, there is one aspect of the PSC I have always been less keen on: the pass mark, which is set at 32 out of 40. As I have written many times here, student performance is continuous: it doesn’t divide neatly into categories. Thresholds invite distortions, and the PSC results show a telltale clustering of scores at and just above the pass mark.
In an ideal world, I’d prefer an accountability metric based on an average, not a threshold. Still, I think the PSC has been a success, and is deservedly becoming a model for other countries to copy.
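The difference between a threshold and an average can be sketched in a few lines. This is a toy illustration, not real PSC data or methodology: the pass mark of 32 out of 40 is the real one, but the cohorts and scores below are invented.

```python
# Toy illustration (invented cohorts, real pass mark) of why a pass-mark
# metric invites distortion while an average does not.

PASS_MARK = 32  # the PSC pass mark, out of 40

def pass_rate(scores):
    """Share of pupils scoring at or above the threshold."""
    return sum(s >= PASS_MARK for s in scores) / len(scores)

def mean_score(scores):
    """Average score: every mark for every pupil counts."""
    return sum(scores) / len(scores)

cohort = [20, 25, 31, 33, 38]

# Two possible one-student improvements:
helped_weak   = [28, 25, 31, 33, 38]  # 20 -> 28: eight marks, below threshold
nudged_border = [20, 25, 32, 33, 38]  # 31 -> 32: one mark, crosses threshold

print(pass_rate(helped_weak), pass_rate(nudged_border))    # 0.4 0.6
print(mean_score(helped_weak), mean_score(nudged_border))  # 31.0 29.6
```

Both improvements raise the cohort’s mean, and the eight-mark gain raises it far more. But only the one-mark nudge across the threshold moves the pass rate, which is exactly the behaviour a threshold metric rewards.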
Progress 8
Progress 8 is a good example of how you can have a rigorous accountability metric that doesn’t depend on a threshold.
When I began teaching in 2007, the main accountability metric for secondary schools was the proportion of students getting 5 A*-C at GCSE, including English and Maths.
This had three big flaws:
1. It was a threshold measure. Schools got no credit for students who improved from an F to a D, or a C to an A*. But they got huge credit for a student moving up one mark from a D to a C. In turn, this incentivised creating intervention classes for those students just below the C - and ignoring everyone else.
2. It was very narrow. It only measured performance on 5 subjects. In practice, some qualifications like BTECs counted as several GCSEs, which meant a student could spend the final 2 years of secondary studying just English, Maths and BTEC Health & Social Care.
3. It was an attainment measure. A school with a high-attaining cohort on entry found the metric trivially easy to achieve, whereas a school with a low-attaining cohort could deliver brilliant teaching and still not do well.
In practice, therefore, the 5 A*-C metric incentivised the following behaviours amongst school leaders: intervention groups focussing on threshold students; curriculum narrowing & the constant search for subject “equivalences” that weren’t really equivalent; recruitment of higher-attaining students on entry. Focussing on these would deliver better results than focussing on improving teaching & learning.
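The equivalence problem can be made concrete with a toy sketch of the old headline measure. The “five good passes including English and Maths” rule comes from the measure itself; the four-GCSE equivalence value and the specific results below are invented for illustration, not real qualification tariffs.

```python
# Toy sketch of the old "5 A*-C including English and Maths" measure.
# The 4-GCSE equivalence below is illustrative, not a real tariff.

GOOD_GRADES = {"A*", "A", "B", "C"}

def meets_headline_measure(results):
    """results: list of (subject, grade, gcse_equivalents) tuples."""
    good_passes = sum(eq for _, grade, eq in results if grade in GOOD_GRADES)
    has_core = all(
        any(subj == core and grade in GOOD_GRADES
            for subj, grade, _ in results)
        for core in ("English", "Maths")
    )
    return good_passes >= 5 and has_core

# Three qualifications can clear the bar if one counts as four GCSEs:
narrow = [("English", "C", 1), ("Maths", "C", 1),
          ("BTEC Health & Social Care", "B", 4)]
print(meets_headline_measure(narrow))  # True
```

Once a single “equivalent” qualification can supply most of the five good passes, the rational move for a school is to hunt for such qualifications rather than to teach a broad curriculum well.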
I remember talking to a deputy head in about 2012 who had been given a couple of weeks to get the school’s entire Year 11 cohort to sit and pass the European Computer Driving Licence. He wasn’t happy about it - but everyone else was doing it, so if you didn’t, you’d be at a disadvantage. It was a race to the bottom, which is the marker of a bad metric.
Progress 8 addressed all three flaws. It is an average, not a threshold, so all students count towards it. It rewards performance on 8 subjects, not 5 (or, in practice, 3). It measures the value a school adds to a student, not raw attainment.
Is it perfect? No, no metric can be. But so far, it is incentivising far more productive behaviour than the search for the next European Computer Driving Licence.
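A value-added average in the spirit of Progress 8 is simple to sketch. The real measure compares each pupil’s Attainment 8 score with the national average for pupils with the same Key Stage 2 prior attainment; the bands and expected points below are invented for illustration, not the DfE’s real tables.

```python
# Simplified value-added sketch in the spirit of Progress 8. Bands and
# expected points are invented for illustration, not the DfE's real tables.

EXPECTED_POINTS = {  # national average points by prior-attainment band
    "low": 30.0,
    "middle": 45.0,
    "high": 60.0,
}

def progress_score(pupils):
    """pupils: list of (prior_band, actual_points). Returns mean value added."""
    gaps = [points - EXPECTED_POINTS[band] for band, points in pupils]
    return sum(gaps) / len(gaps)

# A school with a low-attaining intake can still post a strong score,
# because each pupil is compared with similar pupils nationally:
low_intake_school = [("low", 38.0), ("low", 35.0), ("middle", 47.0)]
print(progress_score(low_intake_school))  # 5.0
```

Because the score is an average of every pupil’s gap from expectation, there is no borderline group to target and no intake a school can recruit its way past: the only reliable way to raise it is to teach every pupil better.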
Metrics & public sector reform
The PSC and Progress 8 are important for thinking about wider public sector reform.
In the private sector, we rely on a metric that is so familiar and so simple to understand that we often don’t think of it as a metric: the price.
And we tend to think about prices as just being the sum that we pay for something. But prices are more than this: they are a decentralised system that aggregates knowledge from countless different sources. If a hurricane in Brazil wipes out one of the major producers of coffee, the price of your morning cup goes up. Maybe you switch to tea or to another brand of coffee; or, if you really value the Brazilian coffee, you pay more for it and cut back on something else in your life. You don’t have to know anything about the hurricane or Brazil to make these decisions. All you need to know is the price.
Public sector systems lack prices and have to find other sources of data and information to replicate their role. That’s not easy. The only Soviet economist to win a Nobel Prize got it for his work on such a system. In Chile in the 1970s, socialist economists designed a similar attempt called Project Cybersyn. More recently, there has been talk of how the enormous amounts of data generated in the modern world can be used to replicate prices - and criticism about how such systems could never work. One of the best books I have ever read is a novelisation of the Soviet attempt to live without prices.
I am dubious about the ability of such systems to work across an entire economy. However, in almost every developed country, health and education have significant state intervention, and the managers and policymakers responsible for them need some kind of information about how the systems are working. In the absence of prices, metrics and targets are vital. They might not be perfect - and maybe they will never be as flexible and sensitive as a price - but there is still a big difference between a well-designed metric and a poorly-designed one.
Policymakers in state systems want to know answers to the following questions: Is the system improving? Is it getting worse? Are our reforms working? What types of behaviour do we want to see more of? What do we want to see less of?
Targets and metrics give you answers to these questions - and it is worth obsessing over the picky detail to check you are getting the right answers.