How do you know your feedback is working?
Rapid and large-scale evaluation of writing feedback
One of the major problems with many classic education research papers is that they are based on very small numbers of students. This means that even when a paper does show that an intervention is effective, it is entirely possible that the result is down to chance rather than the intervention.
This problem is compounded when the outcome is measured with a writing assessment, because traditional writing assessment is quite unreliable, which adds yet more noise to the results.
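To make this concrete, here is a rough illustrative simulation in Python. It is not based on our data: it simply shows that when an intervention has no real effect, small samples combined with unreliable marking can still produce differences that look meaningful, while large samples with reliable marking stay close to zero.

```python
import random
import statistics

random.seed(1)

def apparent_difference(n_students, marking_noise_sd):
    """Simulate a no-effect study: both groups are drawn from the same
    ability distribution, with extra noise from unreliable marking."""
    control = [random.gauss(0, 1) + random.gauss(0, marking_noise_sd)
               for _ in range(n_students)]
    treated = [random.gauss(0, 1) + random.gauss(0, marking_noise_sd)
               for _ in range(n_students)]
    return statistics.mean(treated) - statistics.mean(control)

# Small groups, unreliable marking: the apparent "effect" swings widely,
# purely by chance.
small_noisy = [apparent_difference(15, 1.0) for _ in range(5)]
# Large groups, reliable marking: the apparent effect stays near zero.
large_reliable = [apparent_difference(1000, 0.2) for _ in range(5)]

print("Small, unreliable studies:", [round(d, 2) for d in small_noisy])
print("Large, reliable studies:  ", [round(d, 2) for d in large_reliable])
```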
We have a new assessment model which addresses both of these problems and makes it easy, quick, and reliable to evaluate the impact that feedback has on writing.
We trialled the new approach last year, and are running a bigger project in March this year for Year 6 students.
Here is how it works.
Students take part in our established Year 6 writing assessment in March. We expect about 30,000 students to take part.
Schools will receive extensive feedback reports containing a mix of AI and human feedback. They will share the reports with their students and can add their own feedback too.
Students will then redraft their original piece of writing.
Schools can then submit this redrafted piece of work to be assessed again as part of a national assessment window. The scores of both the original and redrafted pieces of work will be on the same scale, allowing us to measure the impact of the feedback.
Both the original and the redrafted writing will be assessed using our Comparative Judgement plus AI model, which is highly reliable and dramatically reduces teacher workload.
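Because the two scores sit on the same scale, measuring the impact of the feedback is straightforward. Here is a minimal sketch, using hypothetical student records and made-up scaled scores, of what that calculation looks like:

```python
from statistics import mean

# Hypothetical records: (student_id, original_scaled_score, redraft_scaled_score).
# The scores are invented for illustration; the key point is that both sit on
# the same scale, so the impact of the feedback is simply the change per student.
results = [
    ("A001", 480, 497),
    ("A002", 512, 520),
    ("A003", 455, 470),
]

gains = [redraft - original for _, original, redraft in results]
print(f"Mean gain after feedback: {mean(gains):.1f} scaled-score points")
print(f"Students who improved: {sum(g > 0 for g in gains)} of {len(gains)}")
```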
We ran a project like this last year, but gave schools very short notice about the redraft. As a result, whilst approximately 36,000 students from 900 schools took part in the original assessment, only 3,851 students from 73 schools took part in the follow-up. This year we have given schools more notice, so we hope more will take part in the redraft.
The project is not a gold-standard randomised controlled trial (RCT), but it will still provide schools with rapid and useful information about how students respond to feedback. It would also be possible to use the same Comparative Judgement plus AI write-feedback-redraft model as part of an RCT.
Improving the feedback
We’re also planning a couple of changes to the feedback that students get.
Last year, we gave every student a set of five multiple-choice questions that were written by us, not AI. We created three sets of questions and split students into three groups based on their scaled score. Students in the lowest-scoring group got questions on capital letters, students in the middle group got questions on run-on sentences, and students in the top-scoring group got questions on vocabulary.
This year, we will continue to allocate question sets by scaled score, but we are going to introduce a little bit of AI into the mix.
Students in the lowest-scoring group will continue to receive a set of questions on capital letters. These questions will still be written by us, but we will use AI to customise them slightly so that the content of each question matches the content of the individual student's story. For example, if a student has written about two children called Ilsa and Bob, their questions will mention Ilsa and Bob.
We’ll do something similar for the middle third of students. They’ll get a set of questions on run-on sentences, created by us but tweaked by AI to include the content of their story.
For students in the top third, we will make a more substantial change. These students will get a set of questions designed entirely by AI, focusing on more creative aspects of writing.
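Here is a rough sketch of that allocation logic in Python. The tertile cut-offs and field names are made up for illustration; in practice the boundaries come from the national distribution of scaled scores.

```python
# Hypothetical cut-offs dividing students into thirds by scaled score.
LOWER_TERTILE = 470
UPPER_TERTILE = 530

def question_set_for(scaled_score: int, story_characters: list[str]) -> dict:
    """Allocate a question set based on scaled score, as described above."""
    if scaled_score < LOWER_TERTILE:
        topic = "capital letters"       # written by us, lightly customised by AI
    elif scaled_score < UPPER_TERTILE:
        topic = "run-on sentences"      # written by us, lightly customised by AI
    else:
        # Top third: questions designed entirely by AI, on creative aspects.
        return {"topic": "creative aspects of writing", "source": "AI-designed"}
    # For the lower two groups, the AI only swaps in names and details
    # from the student's own story (e.g. characters called Ilsa and Bob).
    return {"topic": topic,
            "source": "human-written, AI-customised",
            "characters": story_characters}

print(question_set_for(455, ["Ilsa", "Bob"]))
print(question_set_for(560, ["Maya"]))
```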
We’re currently developing and trialling these new question types, and will shortly be emailing our participating schools to get their opinion on them.
If you are not currently a participating school but would like to be, you can join us! Read more about the project and how to take part here.
Could this model work at a smaller scale?
One of the big advantages of this model is its scale: thousands of participating students. However, we have had a lot of requests from schools who would like to try it out at a smaller scale, in their own school or class. A smaller assessment cannot support the same generalisations, but we agree that it would be incredibly valuable for an individual school or class teacher to get such rapid feedback on their interventions.

We can also place these bespoke assessments onto our national scale by including anchor scripts from previous assessments, which means that even small assessments can gain some of the benefits of scale. We are looking at ways to make this write-feedback-redraft cycle easy for an individual school or teacher to implement. Get in touch if this interests you.
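To give a flavour of how anchoring works, here is an illustrative sketch with made-up numbers. It is not our production method (which uses the Comparative Judgement plus AI model); it simply shows the idea that anchor scripts with known national scaled scores let you map a small local assessment onto the national scale, here via a simple mean-and-spread linear link.

```python
from statistics import mean, stdev

# Hypothetical scores from a small, class-level assessment.
local_anchor_scores = [-1.2, -0.1, 0.9]       # anchor scripts, local scale
anchor_national_scores = [455, 500, 540]      # the same anchors, national scale
local_pupil_scores = {"pupil_1": -0.6, "pupil_2": 0.4}

# Fit the linear transformation that matches the anchors' mean and spread,
# then apply it to every pupil in the local assessment.
slope = stdev(anchor_national_scores) / stdev(local_anchor_scores)
intercept = mean(anchor_national_scores) - slope * mean(local_anchor_scores)

for pupil, score in local_pupil_scores.items():
    print(pupil, round(slope * score + intercept))
```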
Further reading and information
We published a series of posts about last year’s project: the original intro post, our trial school results, the full set of results, a qualitative analysis of one school’s results.
A guide to all of our feedback reports
Our events page - we have two online introductory webinars scheduled in the next six weeks.

