Discussion about this post

Francesco Rocchi

I read about this RCT in your previous post and was intrigued. A few remarks came to mind. In general terms, I think that what Google tested wasn't a model for AI tutoring, but for human, AI-assisted tutoring.

1) While the error rate was very low, 23.6% of the LLM's answers were changed by the tutors before being sent to students. Those answers weren't wrong altogether, but the tutors intervened to tweak them somehow. In effect, responses from the AI tutors benefited from a revision that the human tutors' responses didn't receive. The same happened when human tutors were assisted by Tutor CoPilot, only with the roles reversed. I think Google should have taken this into account when calculating the effectiveness of the AI tutors compared to the human ones.

2) Since every response from the AI tutors was reviewed by human tutors, the typical LLM drift effect was eliminated. Errors and subpar answers can't stack up thanks to the oversight, while an unsupervised AI tutor might eventually go off track. On the other hand, if sessions with AI tutors are brief enough to prevent errors from accumulating, the problem might be less significant, or it may resolve itself whenever the AI is rebooted.

I'm happy to see that the classroom setting remains central to learning. The social side of learning is still important, as you noted when discussing the mistakes human teachers occasionally make in class.

Furthermore, recent research seems to show that making mistakes can be so beneficial that one should consider making them deliberately. Without going that far, being in class and seeing someone else make a mistake can be quite useful (as can, more broadly, hearing the different perspectives of other students).

Neural Foundry

That 0.14% error rate is wild considering how LLMs usually drift. The bigger point about favoring targeted questions over open-ended explanations really lands tho. I remember spinning my wheels way more often when teachers just talked at me versus when they actually checked understanding with specific prompts.
