Testing Academic Dishonesty in Online Quizzes
Question
How can we design the future of education when we have so little understanding of how students can fake the learning that our current practices claim to measure?
Hypothesis
Capricious or naïve students using AI can likely render most online quizzes or tests built on basic recall questions obsolete. Yes, we already know this, but I wanted to test it for myself.
Materials
- Claude 3.5 Sonnet
- Concise Mode
Methodology
- I used Claude to generate the following prompt in less than 30 seconds.
You are a study helper AI designed to assist students with their course-related questions. You will be provided with course information and a question. Your task is to answer the question accurately and provide a brief explanation of why your answer is correct based on the given information.
First, carefully review the following course information.
When answering questions, follow these guidelines:
1. Thoroughly analyze the question in relation to the course information provided.
2. Formulate a clear and concise answer based on the relevant information from the course material.
3. Provide a brief explanation of why your answer is correct, citing specific details from the course information when applicable.
4. If the question cannot be answered based on the given information, state that clearly and explain why.
Format your response as follows:
Answer: [Your answer to the question.]
Explanation: [Your explanation of the correct answer, or why the question cannot be answered.]
Please provide your answer and explanation based on the course information and guidelines provided above.
- I spent about 3 minutes copying all of the lecture notes for the selected course module into a document I could later add directly to the LLM chat.
- I pasted in the instruction prompt to start the chat.
- I pasted in the text content of the selected module as a second message. Because of its length, Claude converted the document into a text file instead of displaying it as an inline message.
Here is the material I'd like to study:
- My next prompt was:
Let's start practicing!
- Then, one by one, I copied each question and its answer choices into the chat and pressed enter to receive a reply. Here's an example of what the result looked like:
Answer for Question 1: {the actual answer}
Explanation: {one or two bullet points explaining the answer based on course material}
- For the sake of scientific rigor, I didn’t bother reading the explanations.
- For the sake of further scientific rigor, I didn't bother reading the answers either. I skimmed the first few words of each suggested answer and matched them against the quiz's answer choices, without properly reading those either.
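To give a sense of how little friction is involved, here is a minimal sketch of the same two-message workflow scripted with the Anthropic Python SDK instead of the chat interface. The experiment itself was done entirely in the chat UI; the model identifier, file name and placeholder question below are illustrative assumptions, not what I actually ran.

```python
# pip install anthropic
# Minimal sketch of the chat workflow described above, scripted with the
# Anthropic Python SDK. Assumes ANTHROPIC_API_KEY is set in the environment
# and that the lecture notes were saved to "module_notes.txt" (hypothetical).
import anthropic

client = anthropic.Anthropic()

# The instruction prompt from the Methodology section, abbreviated here.
STUDY_HELPER_PROMPT = (
    "You are a study helper AI designed to assist students with their "
    "course-related questions. Answer accurately and provide a brief "
    "explanation of why your answer is correct based on the given "
    "course information."
)

with open("module_notes.txt", encoding="utf-8") as f:
    course_notes = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model identifier
    max_tokens=500,
    system=STUDY_HELPER_PROMPT,
    messages=[
        # Second message from the methodology: the pasted course material.
        {"role": "user",
         "content": f"Here is the material I'd like to study:\n\n{course_notes}"},
        # Placeholder acknowledgement so user/assistant roles alternate.
        {"role": "assistant", "content": "Got it. Let's start practicing!"},
        # One quiz question at a time, exactly as in the manual workflow.
        {"role": "user",
         "content": "Question 1: <question text>\nA) ...\nB) ...\nC) ...\nD) ..."},
    ],
)

print(response.content[0].text)  # "Answer: ... Explanation: ..."
```

Looping this over an exported question list would remove even the copy-pasting step, which is part of why the caveats below felt so trivial.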
Results
Claude scored a cumulative 98% across four online, late-high-school-level quizzes, where questions ranged from simple terminology recall to multiple-choice questions about brief case studies.
Caveats
In my rush to simulate expending zero intellectual effort, I sometimes forgot to distinguish between single-answer and multiple-answer questions. Because the prompt returns one correct answer, even when the quiz asked for several, I only earned partial marks on a few questions.
This was only an issue because the professor hadn’t added any “select all that apply” disclaimers to those questions, so I had to actually look at the answer options to tell the radio buttons (circles) apart from the checkboxes (squares).
Unfortunately, this slowed down my copy-pasting, since after catching my earlier slip-up I had to manually append “are there multiple correct answers?” to certain questions to make sure I earned full marks.
Quick thoughts
- If I coasted through the semester using AI to fake my learning, the results of any offline in-person evaluation would likely expose my fraud. I'd also likely be very anxious at the thought.
- Any platform with text that can be copied, screenshotted or otherwise captured is vulnerable in the same way, given a sufficiently motivated student or trainee.
- As long as the LLM's context window is long enough, students can paste in relevant sections of their course material to ground the model's answers before asking their questions.
- Napkin math (worked out below) says that most current models can hold at least an entire novel's worth of text in a single chat.
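To make that napkin math explicit: the word count, token ratio and context window below are rough, hedged assumptions rather than measurements.

```python
# Napkin math: does a full novel fit in one chat?
# Assumptions (rough, not measured): a typical novel runs ~90,000 words,
# English text averages ~1.3 tokens per word, and Claude 3.5 Sonnet's
# context window is 200,000 tokens.
NOVEL_WORDS = 90_000
TOKENS_PER_WORD = 1.3
CONTEXT_WINDOW = 200_000

novel_tokens = int(NOVEL_WORDS * TOKENS_PER_WORD)  # ~117,000 tokens

print(f"Estimated novel length: ~{novel_tokens:,} tokens")
print(f"Fits in a single context window: {novel_tokens < CONTEXT_WINDOW}")
```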
Further questions
- Will we see a shift towards higher-stakes, in-person evaluations as a result of widespread AI adoption?
- Will we let students decide on the worth of their learning (and deal with the associated consequences) since the option to easily outsource it has become so readily available?
- Moving forward, can we trust the results of online learning programs like professional certifications, workplace training and micro-credentialing?
Further explorations
For the sake of even further scientific rigor, we should test even lazier approaches. We should push the limits of potential student laziness so that educators can adapt and make sure students are still learning, and not just excelling on paper.
Here are a few that come to mind:
- Repeating this experiment with no custom prompt, but instead just pasting in the question and answers.
- Repeating this experiment with screenshots of questions instead of copying and pasting the text (a sketch of what this might look like follows this list).
- Repeating this experiment with a screen recording of the module and quiz instead of with any manual copying and pasting of text content.
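As a rough sense of how the screenshot variant might work, here is a hedged sketch using the Anthropic Python SDK's image support. I did not run this as part of the experiment; the file name, model identifier and prompt text are hypothetical.

```python
# Untested sketch of the screenshot variant: send a quiz screenshot instead
# of pasted text. Assumes the Anthropic Python SDK and a hypothetical local
# file "question_1.png".
import base64

import anthropic

client = anthropic.Anthropic()

with open("question_1.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model identifier
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": image_data}},
            {"type": "text",
             "text": "Answer the question in this screenshot and briefly explain why."},
        ],
    }],
)

print(response.content[0].text)
```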
Conclusions
- We, as learners, need to figure out responsible use of AI to make sure we’re not shying away from the discomfort required to learn, lest we be replaced by AI that can "think" better, faster and cheaper than us.
- We, as teachers, need to figure out how to evaluate knowledge and creativity at scale, while designing our classes with the productive struggle required of learning in mind.
Feel free to ask me anything or get in touch if you'd like to continue the conversation.