Can LLMs be personal tutors?
Four big challenges
Think how amazing it would be to have a personal tutor who is an expert in every subject under the sun and available on-demand 24/7.
That is the incredibly exciting promise of Large Language Models - that they will be able to teach you anything you want, whenever you want.
However, I think the barriers to getting there are more significant than we imagine.
Here are four challenges that LLM tutors have to overcome.
1. LLMs are good at providing explanations, but explanations are over-rated
LLMs are good at providing explanations. The problem is that pedagogically, explanations are over-rated.
Thomas Kuhn, the famous philosopher of science, once asked why it was that a group of students could all read the same chapter of a physics textbook and say they had understood it – but then get the questions at the end of the chapter totally wrong.
Kuhn concluded that what these students really needed was not explanations but lots and lots of examples and questions.
He was right. Questions are important for two reasons: they force the student into mental activity, which is necessary for learning. And they tell the student and the teacher whether the student has actually understood what has been taught.
The research also shows that students often don’t like this. They prefer reading, rereading and highlighting explanations to answering questions. That’s probably because rereading an explanation is easy, but answering questions is hard. It’s also because reading an explanation feels like understanding. It gives you the illusion of understanding, whereas answering a set of questions exposes the reality that you don’t.
What we need are not LLMs that answer questions from students. We need LLMs that ask students questions.
But the problem with that is…
2. LLMs are not as good at creating precise questions
LLMs still hallucinate, and this is a real problem when you want to create banks of questions and answers where precision and accuracy really matter.
We have experience of this with the feedback we provide on our writing assessments. We provide LLM-generated written feedback for students and teachers. At that level of generality, the LLM does a good job.
But we also wanted something more precise – so we asked the LLM to generate a series of multiple-choice questions based on each student’s piece of writing. It found that task much harder, and a number of errors crept in. Errors like these can cause enormous confusion for novices. (We ended up creating our own questions and allocating them based on the students’ scaled score.)
When I talk about the error rate of LLMs, the inevitable response I get is “yes but humans aren’t perfect either”. That is absolutely true. In the great “algorithms vs humans” debate, here at No More Marking we are mostly on the side of the algorithms, because we know that humans make so many mistakes.
However, in this particular case – the creation of personalised questions – the correct comparison is not between error-prone LLMs and error-prone humans. The correct comparison is between error-prone LLMs and older technologies which have largely eliminated errors. Which brings me to my third point.
3. Pre-LLM technologies are very good at creating error-free, scalable and personalised resources
The original technology for creating error-free and scalable educational resources is about half a millennium old – it’s the printing press. Once you have a really good set of questions (or indeed an explanation) you can proofread it and get it checked over by multiple other humans and then get it printed as many times as you need.1
Of course, printed textbooks aren’t personalised or interactive. But personalised and interactive resources do exist already too – not for as long as the printing press, of course, but for several decades.
Many online learning platforms consist of enormous banks of accurate questions. Students can proceed through them at their own pace and receive personalised feedback and next steps based on their pattern of correct and incorrect answers. There are many platforms like this. They obviously vary in style and quality, but the best of them have decent track records.
So, one major question for me is this: how are LLMs going to improve on these pre-existing technologies? What can they offer that is better?
And this also brings me to my fourth point. These very effective pre-LLM digital tutors have been around for decades, and they have not made the human teacher or the physical classroom obsolete. Why?
4. There is a limit to what students will learn on their own and on a screen
The Covid pandemic provided us with a natural experiment in the effectiveness of online learning. Did everybody say at the end of it, fantastic, actually, it turns out that we don’t really need physical schools and human teachers any more?
No. Everybody said: we need to get the kids back into school. The global data shows that students learnt less when schools were closed, not more, even in countries where they had access to the internet and many brilliant online learning tools. And even before Covid, we knew that online learning courses had very high drop-out rates.
The structure and discipline of in-person classrooms are important, and online platforms lack this structure. So even if they are full of brilliant content and sound pedagogical principles, they may not be as effective as in-person teaching.
For LLM tutors to succeed where other online learning platforms have not, they have to overcome this problem. Either they have to find ways of incorporating the structure and discipline of an in-person class, or they have to be so much more engaging and compelling than existing online learning platforms that structure and discipline become unnecessary, because students prefer using them to doing anything else online.
The latter is going to be very hard and is largely beyond the control of any online learning platform, as it is competing against entertainment platforms that aren’t constrained by learning. Optimising for one parameter is easier than optimising for two.
Questions about questions
So, to sum up, here are the four questions you need to ask of any LLM tutor.
1. Does it rely solely on explanations?
2. If it does use questions, how does it ensure they are accurate?
3. In what ways is it better than pre-existing online learning systems that don’t use LLMs?
4. Is it integrated with a traditional classroom, or is it designed for students to use on their own? If the latter, how will it get high completion rates?
Some systems are engaging seriously with these questions and coming up with good answers, and I will profile a few in a future post. But many are not, and the risk is that LLMs just get added to the long line of technological innovations that promised and failed to improve education.
Some of the earliest printed books do have quite a few errors, and completely eliminating all errors in any format is not easy. Andrej Karpathy’s “march of nines” is as true of Gutenberg’s books as of Waymo’s self-driving cars. But a modern textbook that is in its 2nd edition is likely to have vanishingly few errors. E.g. this textbook is the one I know best, and neither I nor several colleagues and students have spotted any errors in it.

