Really enjoyed this, Daisy, thank you. I wanted to ask about point 2 and point 6: every time I've asked it a question similar to the types of question I might ask my students, it's been extremely easy to spot a GPT answer due to its length and grammatical/syntactical *accuracy*. Of course, I can spend a long time refining the prompt to get it to resemble student work more closely, but most students won't be doing this (presumably).
When you did your study with the 8 GPT essay-seeds, how much prompting did you have to do before you got an essay that you thought *could* fool a teacher, or were they all done in one take?
Here is what we did: https://substack.nomoremarking.com/p/how-good-is-chatgpt-at-writing-essays-some-data-eda60de7aee5
It was only 8-year-olds, but we have some more data coming up soon with older students. And yes, they were all done in one take. Our study involved teachers trying to spot GPT essays from students they didn't know, which will of course make it harder.
I agree that, weirdly, the best way of spotting a GPT answer is that it will be too accurate! But even without refining the prompt there are ways students can (deliberately or not) get around this - e.g. write their own first paragraph and get GPT to write the rest, or scatter in a few sentences of their own.
Even without doing that, I still think it is harder for humans to spot GPT essays than you think, especially in real-life conditions when people have 30-60 scripts to get through in an hour - that's only one to two minutes per script.
Yeah, I hear that completely. It's just at odds with my own informal "research", where I've taken student responses to questions I've asked them, put the exact same question into an LLM (GPT1, GPT4, Bard, Bing), and the LLM answers stick out like the sorest of thumbs!! This includes when I've tried to simulate plausible prompts that a student might have used to wrap around the question. I think maybe a difference is that I am looking at relatively short-answer questions, where the longest responses might be a couple of sentences, as opposed to what you are looking at, which is an entire text.
Our school has adopted a structured assessment response to AI:
Red Tasks: no generative AI permitted
Yellow Tasks: some generative AI permitted
Green Tasks: generative AI expected
Focusing on the core assessment constructs of a task helps determine which category it should fall into (a toy sketch of the categories follows the link below).
More explanation here:
https://adriancotterell.com/2023/06/05/focus-on-the-assessment-construct/
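(To make the scheme concrete, here's a minimal sketch of how the traffic-light categories might be encoded when tagging assessment tasks. The enum, task names, and construct labels are entirely hypothetical illustrations, not taken from the school's actual system:)

```python
from enum import Enum

class AIPolicy(Enum):
    """Traffic-light categories for generative AI use on a task."""
    RED = "no generative AI permitted"
    YELLOW = "some generative AI permitted"
    GREEN = "generative AI expected"

# Hypothetical examples: the policy follows from the core construct
# each task is actually assessing.
tasks = {
    "in-class handwritten essay": AIPolicy.RED,     # construct: unaided writing
    "take-home research report": AIPolicy.YELLOW,   # construct: synthesis of sources
    "prompt-design exercise": AIPolicy.GREEN,       # construct: using AI itself
}

for name, policy in tasks.items():
    print(f"{name}: {policy.value}")
```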
> In our future posts we will look at possible ways for educational institutions to respond to this challenge.
The most surefire way for shorter essays is to give the student the topic and two hours at their desk with just a pen and paper.
For more research-based approaches with heavy works-cited requirements and longer essays, from what I've seen the LLM is far weaker.
Agree that the best approach for schools in short term is pen and paper tests!
I just don't think there's any substitute for being a well-rounded, knowledgeable individual. It makes you more interesting, and usually comes with a terrific sense of humour. Learning is laborious and it takes A LOT of time. Somewhere along the line we've forgotten that, and the example we're setting for our young learners with this instant gratification sets them up for failure along the way.
I disagree a bit with 4. I just used ChatGPT to explain a maths problem I was stuck on, and it did an amazing job - I needed to evaluate an infinite geometric series. I attempted it myself but got nowhere. ChatGPT explained the process and gave the answer. I still couldn't get my head around something (an alternating sign in the series), and it cleared that up instantly. The maths checks out: it's all correct and the explanation is solid. The alternative would be a tedious email dialogue with the professor, or waiting until his office hours next week. I also find LangAI an excellent Spanish tutor. After all, the thing I need correcting is the spelling and grammar (I don't care about the factual correctness of the content in our conversation). Perhaps it's a good maths and language tutor but struggles with other subjects.
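(For anyone stuck on the same point: the standard identity involved is below. The comment doesn't give the exact series, so $a$ and $q$ here are placeholder symbols - an alternating sign just means the common ratio is negative:)

```latex
% Geometric series: for any ratio r with |r| < 1,
\[
  \sum_{n=0}^{\infty} a r^{n} = \frac{a}{1-r}, \qquad |r| < 1.
\]
% An alternating series is the special case r = -q with 0 < q < 1,
% so the "1 - r" in the denominator becomes "1 + q":
\[
  \sum_{n=0}^{\infty} a(-1)^{n} q^{n}
  = \sum_{n=0}^{\infty} a(-q)^{n}
  = \frac{a}{1+q}.
\]
```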
Agree it does some maths tasks wonderfully - but you can never be sure a) that it will do every task accurately, and b) that it will always give the same answer! It's the lack of reliability that makes it so problematic to use without expert human oversight. E.g. take a look at this: https://shareg.pt/BlegI0o - how confusing would that be for a kid learning Pythagoras? Even for a teacher using it to create resources - you have to check everything so carefully.
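(To make "check everything so carefully" concrete, here is a minimal sketch of the kind of automated sanity check a teacher could run over AI-generated Pythagoras answers. The function name and tolerance are my own illustration, not anything from the linked chat:)

```python
import math

def check_hypotenuse(a: float, b: float, claimed: float) -> bool:
    """Return True if `claimed` really is the hypotenuse of a right
    triangle with legs a and b, within floating-point tolerance."""
    return math.isclose(math.hypot(a, b), claimed, rel_tol=1e-9)

# A generated answer claiming the hypotenuse of a 3-4 triangle is 5 passes;
# a slip like 6 gets flagged for human review.
print(check_hypotenuse(3, 4, 5))  # True
print(check_hypotenuse(3, 4, 6))  # False
```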