10 Comments
Claudio Calligaris

For the vast majority of students, I think that this is exactly where AI tutoring breaks down…

“The structure and discipline of in-person classrooms are important, and online platforms lack this structure. So even if they are full of brilliant content and sound pedagogical principles, they may not be as effective as in-person teaching.”

dakkster

The most obvious reason why LLMs should never, ever be used as anything even remotely close to a source of facts is that hallucinations are mathematically impossible to avoid. The entire model is predicated on hallucination as a kind of release valve for garbage probability calculations. 25% of GPT-5's answers contain hallucinations. So if you have a person who is far from an expert on a subject, which is pretty much every student, then that student will have no way of knowing when the LLM is hallucinating, which makes the entire concept meaningless.

Jan

I definitely think that creating questions and, perhaps more importantly, listening to the questions that kids ask is an important part of any teacher's life. I'm talking from a primary teacher's perspective, but I think it's just as important whatever the age of the learner. I wouldn't be prepared to have that run by LLMs. When I was working with primary-aged kids, I regularly got them to work out questions for other kids in the class to investigate.

Ruth Poulsen

Great summary. I particularly like the last point, about getting students to persist when the learning is hard. Teachers who do this skillfully employ their relationship with the student!

Francesco Rocchi

A recent post by Carl Hendrick points to an AI tutor model that might practically work. The experiment has lots of limitations and there are many caveats, but it's quite impressive anyway. May I ask what your take on this is?

https://carlhendrick.substack.com/p/the-algorithmic-turn-the-emerging

MITCHELL WEISBURGH

I built AI questioners/coaches into a course on mindsets. It was a ton of work, because they go off on tangents, but I think we got to the point where they are helpful. Here’s a demo of a lesson that contains one of the interactions. https://mindshiftingwithmitch.blog/demo-preview/

I’d be interested in your reaction.

Stephen Caldwell

Currently working on a blended model at Impress Education for students at Hastings Academy. Your article crystallises the daily challenges with elegance and perspicacity. Thank you, Daisy. An absolute pleasure to read.

Stephen Caldwell

Hi Daisy, would you be up for a brief chat about the work we do at Impress? Becky Allen met me recently and was sufficiently impressed to want to work with me on a consultancy basis. Wondered if you might be similarly interested? It would be a dream team in my eyes!

Eva Keiffenheim MSc

Great post, as always, Daisy!

I’ve been reading a recent review of AI tutoring evidence ("AI Tutors for Durable Learning") that discusses a 2025 Harvard physics study (Kestin et al.), which explicitly addresses your second challenge regarding hallucinations and precision.

The researchers developed what might be called the "Harvard Model" of pedagogical engineering. To solve the accuracy problem, they fed the LLM pre-written, expert-verified solutions. The AI tutor then scaffolded the student toward that verified solution using active learning principles.
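
For anyone who wants to picture the mechanics, here is a minimal sketch of that grounding pattern in Python. To be clear, this is not the study's actual code: the incline problem, the prompt wording, and the `call_llm` stub are all illustrative assumptions standing in for whatever model API and curriculum you would actually use.

```python
# Hypothetical sketch of grounding an LLM tutor on an expert-verified solution,
# in the spirit of the design described above. `call_llm` is a placeholder for
# a real chat-completion API.

VERIFIED_SOLUTION = """
Step 1: Resolve forces along the incline: F_net = m*g*sin(theta) - mu*m*g*cos(theta).
Step 2: Apply Newton's second law: a = g*(sin(theta) - mu*cos(theta)).
Step 3: With theta = 30 deg and mu = 0.2: a ~= 9.8*(0.5 - 0.2*0.866) ~= 3.2 m/s^2.
"""

SYSTEM_PROMPT = f"""You are a physics tutor. A correct, expert-verified solution
is provided below. Never contradict it and never introduce facts beyond it.
Do NOT reveal the solution outright; ask one guiding question at a time and
scaffold the student toward the next step they are stuck on.

--- VERIFIED SOLUTION (hidden from the student) ---
{VERIFIED_SOLUTION}
"""

def call_llm(messages: list[dict]) -> str:
    """Placeholder for a real chat-completion call (e.g. an OpenAI-style API)."""
    raise NotImplementedError("wire this to your model provider")

def tutor_turn(history: list[dict], student_message: str) -> str:
    """One dialogue turn: the model sees the verified solution only via the
    system prompt, so its feedback stays anchored to vetted content."""
    history.append({"role": "user", "content": student_message})
    reply = call_llm([{"role": "system", "content": SYSTEM_PROMPT}] + history)
    history.append({"role": "assistant", "content": reply})
    return reply

# Example usage (once call_llm is implemented):
# history = []
# print(tutor_turn(history, "I don't know where to start."))
```

The key design choice is that the model never generates the physics itself; it only paraphrases and scaffolds around content a human expert has already verified.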

This approach effectively combined the reliability of "pre-LLM technologies" (verified question banks) with the interactivity of an LLM. The results seem to be promising:

- Effectiveness: It outperformed human active learning instruction (effect sizes of d = 0.73–1.3).

- Efficiency: 70% of students completed the material in under 60 minutes (vs. a full class period).

It was used as a flipped classroom tool—students did the "grunt work" of foundational acquisition with the AI tutor, freeing up class time for the human teacher to handle higher-order synthesis (addressing your 4th challenge on structure).

Yet this study, like almost all current AI research, failed to measure retention beyond the immediate term. So while the design solves the hallucination problem, we still don't know if the learning sticks! But it does suggest a path forward: using AI not as an "answer generator," but as a dynamic interface for verified curricula.

Mark Aveyard

It's a very well-designed experiment. I'm saving this for classroom use as an example of an experiment with strong internal validity controls.

But are we actually concerned about helping Harvard students learn more efficiently and enjoyably? The authors' arguments for generalizability aren't convincing.

When this works in community colleges, we've got something truly exciting.
