Make My Exam! #004

Effortlessly create model answers and marking schemes

Mar 27, 2024

Dear friends of Tomorrow’s Teaching,

Welcome back to another exploration of how Large Language Models can help us save time while teaching our classes. You will have noticed that these posts don’t always arrive on the same weekday. Unfortunately, this will be the case for a while still. My weekly schedule is so full at the moment with my full-time university job and the Daily Philosophy magazine, newsletter, YouTube channel and podcast that I find it difficult to commit to a particular day for this newsletter. As I get more used to writing this, it will eventually settle into some rhythm; until then, I have to ask for your patience. Take it as a surprise addition (hopefully a pleasant one) to your week.

This morning, my wife and I were discussing how badly history is taught in schools. There’s a saying among secondary school students that if you are good at mindless memorising, then you are just the right learning type for history. This is such a pity, given how fascinating the study of history can be and how many aspects it encompasses -- the history of ideas, the history of love and partnering practices, the history of science, the history of everyday life, including the history of hygiene and cooking, the history of religious beliefs, the history of technology and countless other aspects of our humanity that we can only really understand in their historical perspectives.

Mindless memorising is the very last thing that I would personally associate with the study of history, but there you have it. Education authorities together with curriculum and examination designers have managed to butcher one of the most important and fun subjects and to make students hate it. I myself hated it when I was at school, and it took me over a decade to look at a history book again. And now, my daughter hates it, just like I did.

From this thought, I came to the topic of today’s experiment. I wanted to see if we can use ChatGPT (or Copilot, which is easier to access from everywhere, even China) to create tutorial or examination questions in history that are more interesting than just asking about facts. Such questions would still test the students’ knowledge, but at the same time they would let the students be creative and actually require them to think. They would sit higher up on the scale of Bloom’s Taxonomy, asking students not only to remember or understand, but also to analyse and evaluate what they have learned.

Unfortunately, creating such questions is more difficult than just asking for facts, and then one still has to create a marking scheme for every question and give model answers for the various marks. Luckily, though, we have AI to help us!

So join me today as we look at how we can quickly create an exam plus marking scheme with the help of a Large Language Model!

Interesting questions

I began by asking ChatGPT (in this case, Microsoft’s Copilot) a particular question that had come to my mind while I was thinking about interesting history questions.

One of the strengths of ChatGPT is its ability to draw parallels between seemingly unrelated topics. For some reason, the LLM architecture is really good at that, much better than the human brain, which in most of us tends to work less creatively and to stay within one topic or area of association.

So let’s see:

Me: Try to find common aspects in the mindset of the times between the renaissance and the times of the moonlanding in the 1960s. Write your answer as if it was the model answer in an exam.
Copilot: Exam Model Answer: Common Aspects in the Mindset of the Renaissance and the Moonlanding Era
The Renaissance, a period of cultural, artistic, and scientific rebirth from the 14th to the 17th century, and the era of the moonlanding in the 1960s share several common aspects in their mindset:
Curiosity and Exploration: Both periods were marked by a profound curiosity and a desire to explore the unknown. The Renaissance saw the exploration of new lands and the rediscovery of ancient texts, while the 1960s space race culminated in humans landing on the moon.
Innovation and Invention: Each era witnessed significant technological advancements. The Renaissance brought about inventions like the printing press, while the 1960s saw the development of spacecraft capable of lunar missions.
Humanism and Individualism: The Renaissance emphasized human potential and individual achievement, which resonated with the individualistic spirit of the astronauts and teams responsible for the Apollo missions.
Reformation and Questioning: Both times experienced a challenge to established norms—religious and cultural during the Renaissance, and political and social during the 1960s.
Intellectualism and Education: The Renaissance was a time when knowledge became more accessible, similar to the 1960s, which saw an increase in public engagement with science and space.
Patronage and Support: Just as the Medici family supported artists and scientists during the Renaissance, governments and organizations funded space programs in the 1960s.
Secularism: A shift towards secular thinking was evident in both periods, with the Renaissance moving away from purely religious explanations of the world and the moonlanding symbolizing human achievement outside of religious context.
In conclusion, the mindset of the Renaissance and the moonlanding era were both characterized by a spirit of inquiry, a push for innovation, an emphasis on human capability, and a challenge to traditional beliefs, all of which propelled humanity to new heights of achievement. These commonalities reflect a timeless human drive to understand our world and expand the boundaries of what is possible¹²³⁴⁵.
Source: Conversation with Bing, 26/03/2024
(1) 11 Core Values of the Renaissance (Analyzed!) - Analyze History. https://analyzehistory.com/values-of-the-renaissance/.
(2) Rhetoric - Renaissance, Oratory, Argumentation | Britannica. https://www.britannica.com/topic/rhetoric/The-Renaissance-and-after.
(3) The Renaissance: Reviving art, science, and knowledge. https://historyofyesterday.com/the-renaissance-reviving-art-science-and-knowledge/.
(4) The Renaissance and the Rise of European Consciousness. https://link.springer.com/chapter/10.1007/978-3-319-95435-6_6.
(5) Middle Ages vs Renaissance (Explained) - Tag Vault. https://tagvault.org/blog/middle-ages-vs-renaissance-explained/.
(6) Getty Images. https://www.gettyimages.com/detail/news-photo/painting-by-artist-michel-angelo-pacetti-shows-a-parade-of-news-photo/51801099.

Now this was quite good already. It is not novel or surprising by any means, but it is a well-crafted, well-structured model answer that would have taken me at least 20 minutes to think through and write down in this way. Copilot did the same, equally well or maybe even better, in less than 20 seconds.

Marking scheme

As I wrote before in this newsletter, I believe that the best use of ChatGPT in teaching currently is to treat it as a somewhat boring but competent assistant, to whom one can outsource equally boring but necessary work. Creating model answers (where required by your institution) is one such job that nobody likes, and that can be given over to AI with excellent results.

Now let’s go one step further:

Me: Can you create a grading scheme in percent for this question and the model answer, with partial grades for partially correct answers?
Copilot: Certainly! Here's a grading scheme for the exam model answer on the common aspects in the mindset of the Renaissance and the Moonlanding era:
Grading Scheme:
Full Answer (100%): The answer includes all seven aspects with clear explanations and historical context, demonstrating a deep understanding of the similarities between the Renaissance and the Moonlanding era.
Good Answer (80-99%): The answer includes at least five to six aspects with good explanations and some historical context.
Satisfactory Answer (60-79%): The answer includes three to four aspects with satisfactory explanations.
Partial Answer (40-59%): The answer includes one to two aspects with basic explanations.
Incomplete Answer (20-39%): The answer mentions one aspect without a clear explanation or historical context.
Incorrect Answer (0-19%): The answer does not correctly address the question or is off-topic.
This scheme allows for partial credit for answers that demonstrate some understanding of the question but may lack full detail or context. It encourages students to provide as comprehensive an answer as possible while recognizing the effort put into partial answers.

Again, nothing that will get us a science medal, but a solid execution of a mindless, boring, but sometimes necessary task. One could submit this directly to one’s administration and one would have done a very good job.
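
For readers who keep their marks in a spreadsheet or script, the bands in Copilot's scheme above can be written down as a small lookup. This is only a sketch: the function name `band_for` and the `explained` flag are my own inventions, and Copilot's scheme does not say what to do with two aspects that lack explanations, so the sketch assumes they fall into the "Incomplete" band. For three or more aspects, the quality of the explanations is left to the marker's judgment within the band.

```python
def band_for(aspects: int, explained: bool = True) -> tuple:
    """Return (label, lowest %, highest %) for an answer to the Renaissance question.

    aspects   -- how many of the seven model-answer aspects the student covers
    explained -- hypothetical flag: does the answer give at least basic explanations?
                 (only used to separate "Partial" from "Incomplete")
    """
    if not 0 <= aspects <= 7:
        raise ValueError("the model answer lists seven aspects")
    if aspects == 7:
        return ("Full Answer", 100, 100)
    if aspects >= 5:
        return ("Good Answer", 80, 99)
    if aspects >= 3:
        return ("Satisfactory Answer", 60, 79)
    if aspects >= 1:
        # Assumption: one or two aspects without explanation count as "Incomplete".
        if explained:
            return ("Partial Answer", 40, 59)
        return ("Incomplete Answer", 20, 39)
    return ("Incorrect Answer", 0, 19)


if __name__ == "__main__":
    label, low, high = band_for(4)
    print(f"{label}: {low}-{high}%")
```

The function only returns the band; choosing the exact percentage inside it remains the marker's decision.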

More questions

Finally, let us try to get Copilot to do something a little more demanding: create more questions like that. Questions that challenge the students to give interesting, thought-out answers, rather than memorised data.

Me: I am making an exam for an undergraduate, first-year university class on ethics, discussing the main ethics theories: utilitarianism, deontological ethics, social contract theories and virtue ethics. Please give me three interesting exam questions that require the students to write short essays on these theories, but emphasise not memorisation, but the creative application of these theories. Particularly, it would be good to compare and contrast the approaches of the different theories in the questions (a maximum of 2 theories per question). Please give me not only the questions themselves, but also model answers and a grading scheme that allows for partial marks.

You see that I provide quite a detailed question. As I said before, the more detailed and precise your question is, the more relevant the results will be. The AI does not know anything about you, your style, or the age of your students, so you will have to provide whatever information will be useful to it in order to generate a good output for you. Now let’s look at the answer:

Copilot: Certainly! Here are three thought-provoking exam questions that encourage creative application and critical thinking about ethics theories. Each question involves a comparison of two theories, and I've included model answers and a grading scheme for each:
Question: The Trolley Dilemma Revisited
Scenario: Imagine you are the operator of a runaway trolley hurtling down a track. Ahead, there are five people tied up and unable to move. You have the power to switch the trolley to another track where only one person is tied up. What do you do?
Theories to Compare: Utilitarianism and Deontological Ethics
Model Answer:
Utilitarian Approach: The utilitarian would weigh the consequences. Switching the trolley to the track with one person minimizes overall harm (one death vs. five deaths). Therefore, the utilitarian would pull the lever.
Deontological Approach: The deontologist would focus on moral rules. Killing an innocent person violates the rule against murder, regardless of the consequences. Therefore, the deontologist would not pull the lever.
Grading Scheme:
Full Marks: Clear explanation of both theories and their contrasting stances.
Partial Marks: Adequate discussion of one theory but lacking depth in the other.
Minimal Marks: Superficial analysis or missing key points.
Question: The Social Contract and Individual Rights
Scenario: A government proposes mandatory vaccination for all citizens during a pandemic. Some citizens resist, claiming it violates their individual rights.
Theories to Compare: Social Contract Theory and Deontological Ethics
Model Answer:
Social Contract Approach: Citizens implicitly agree to follow laws for the greater good. Vaccination protects society, so it's a collective duty. Those who refuse may face consequences.
Deontological Approach: Individual rights are inviolable. Coercive vaccination infringes on autonomy and bodily integrity. Exceptions should be minimal.
Grading Scheme:
Full Marks: Comprehensive discussion of both theories, considering implications for public health and individual rights.
Partial Marks: Adequate exploration of one theory but less depth in the other.
Minimal Marks: Superficial analysis or overlooking key aspects.
Question: Virtue Ethics in Professional Life
Scenario: You're a software engineer working on a project. Your team leader asks you to manipulate data to make the product appear more successful than it is. What do you do?
Theories to Compare: Virtue Ethics and Utilitarianism
Model Answer:
Virtue Ethics Approach: Consider virtues like honesty, integrity, and courage. Uphold truthfulness even if it harms short-term success. Transparency builds trust.
Utilitarian Approach: Weigh consequences. Falsifying data may benefit the company and shareholders. However, long-term harm to trust and reputation must be considered.
Grading Scheme:
Full Marks: Nuanced discussion of virtues, balancing short-term gains and long-term consequences.
Partial Marks: Adequate exploration of one theory but less depth in the other.
Minimal Marks: Superficial analysis or neglecting key aspects.
Remember, creativity and depth matter more than rote memorization. Encourage students to engage with these theories beyond mere definitions. Partial marks should reward thoughtful insights even if complete mastery isn't achieved. Good luck with your exam!
Source: Conversation with Bing, 3/27/2024
(1) Ethics Exam #2- Utilitarianism and Deontology Flashcards. https://quizlet.com/280334131/ethics-exam-2-utilitarianism-and-deontology-flash-cards/.
(2) Guiding Questions to Ask for the Application of Utilitarianism .... https://viva.pressbooks.pub/phi220ethics/chapter/guiding-questions-to-ask-for-the-application-of-utilitarianism-deontology-virtue-ethics-to-real-life/.
(3) Ethical Theories Quiz Questions And Answers - Trivia & Questions - ProProfs. https://www.proprofs.com/quiz-school/story.php?title=ch2_34.
(4) Virtue Ethics, Utilitarianism and Deontology Test Flashcards. https://quizlet.com/99895556/virtue-ethics-utilitarianism-and-deontology-test-flash-cards/.

I like that it reminds us to avoid memorisation questions, when it was me who asked it to make more interesting questions in the first place.

Anyway, the questions are interesting, and I could well imagine them being used in an introductory ethics exam. One could, of course, come up with similar questions oneself, but sometimes one has to create questions for a quiz or two, a midterm, an essay and a final exam, and possibly also for make-up exams. In this way, I sometimes have to create dozens of questions on a limited number of topics, and it becomes difficult to keep coming up with new ideas. Here the AI can be really helpful in coming up with original questions that I have not asked in any previous assignments.
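
If you need the same kind of detailed prompt again for every quiz, midterm and make-up exam, it can help to keep it as a small template and only swap out the details. The following is a minimal sketch of that idea, not anything Copilot requires: the function `build_exam_prompt` and its parameter names are made up for illustration, and the wording simply paraphrases the prompt I used above. It only builds the text; you would still paste the result into the chat window yourself.

```python
def build_exam_prompt(course, theories, n_questions=3, max_theories=2):
    """Assemble a detailed exam-question prompt from a few course details.

    All parameter names are hypothetical; adapt the wording to your own style.
    """
    theory_list = ", ".join(theories)
    return (
        f"I am making an exam for {course}, discussing the main theories: "
        f"{theory_list}. "
        f"Please give me {n_questions} interesting exam questions that require "
        "the students to write short essays on these theories, emphasising "
        "creative application rather than memorisation. "
        "Compare and contrast the approaches of the different theories "
        f"(a maximum of {max_theories} theories per question). "
        "Please also give me model answers and a grading scheme that allows "
        "for partial marks."
    )


if __name__ == "__main__":
    prompt = build_exam_prompt(
        "an undergraduate, first-year university class on ethics",
        ["utilitarianism", "deontological ethics",
         "social contract theories", "virtue ethics"],
    )
    print(prompt)
```

Changing `n_questions` or the list of theories gives a fresh prompt for the next assignment without retyping the whole request.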

But I’m a little unhappy about the marking scheme. It looks very superficial and not really helpful or precise. Let’s try to fix that.

A better marking scheme

Me: Thank you, this is very helpful! But I find the marking schemes a bit too abstract and not sufficiently differentiated. Could you please give me a more detailed marking scheme for question 3, that has 5 different percentage levels and specifies precisely which content in the answer will give the student what grade percentage?
Copilot: Certainly! I appreciate your feedback, and I'll provide a more detailed grading scheme for Question 3 on virtue ethics in professional life. Let's break it down into five distinct levels, specifying the content that corresponds to each grade percentage:
Question 3: Virtue Ethics in Professional Life
Scenario: You're a software engineer working on a project. Your team leader asks you to manipulate data to make the product appear more successful than it is. What do you do?
Grading Scheme:
Minimal Marks (0-20%):
Content:
Superficial analysis without engaging with virtue ethics or utilitarianism.
Generic statements without application to the scenario.
Example:
"I would follow company orders because that's my job."
"Ethics is subjective, so I'll do whatever benefits me."
Basic Understanding (21-40%):
Content:
Briefly mentions virtue ethics or utilitarianism.
Provides a simplistic rationale without depth.
Example:
"I'd consider honesty but might still manipulate data if it helps the company."
"Balancing honesty and success is tricky."
Adequate Analysis (41-60%):
Content:
Discusses both virtue ethics and utilitarianism.
Applies theories to the scenario.
Identifies relevant virtues (e.g., honesty, integrity).
Considers short-term gains vs. long-term consequences.
Example:
"As a virtuous person, I'd prioritize honesty and transparency. Falsifying data violates integrity. However, I'd also weigh the overall consequences—damage to trust and reputation—before deciding."
"Utilitarianism suggests maximizing overall benefit. While manipulating data might benefit the company temporarily, the long-term harm to trust outweighs it."
Insightful Analysis (61-80%):
Content:
Thoroughly explores virtues (e.g., honesty, courage, trustworthiness).
Balances utilitarian considerations with individual character.
Considers alternative actions (e.g., whistleblowing, negotiation).
Acknowledges complexities.
Example:
"Drawing on Aristotle's virtue ethics, I'd prioritize honesty as a character trait. However, I'd also recognize the importance of courage to challenge unethical requests. Perhaps I'd propose an alternative solution, like presenting accurate data with context."
"Utilitarianism forces me to weigh the overall impact. While honesty is crucial, I'd consider the broader consequences for stakeholders. Maybe I'd negotiate with my team leader or seek guidance from an ethics committee."
Sophisticated Analysis (81-100%):
Content:
Integrates nuanced understanding of virtues, consequences, and context.
Considers organizational culture, legal implications, and stakeholder perspectives.
Proposes a well-reasoned course of action.
Example:
"Applying virtue ethics, I'd embody honesty, transparency, and courage. I'd also consider the company's culture—does it value integrity? If not, I might escalate the issue or seek legal advice. Utilitarianism reminds me of the bigger picture: long-term trust, societal impact, and legal repercussions. Perhaps I'd propose an ethical data reporting framework to balance both."
Remember, these percentages are guidelines, and thoughtful insights can earn partial credit even if complete mastery isn't achieved. Encourage students to engage deeply with the theories and apply them creatively. Good luck with your exam, and I hope your students enjoy grappling with these ethical dilemmas!

Look at how much work we saved here! I’d be curious to hear what you think about these rubrics, but to me they seem as good as any I’ve seen in real-world exams. If we are honest, nobody likes to do this job, and we all try to get it done as quickly as possible. Copilot’s answers don’t strike me as much different from what a human teaching assistant might come up with.

So there you have it! Copilot did a lot of work, and it all took only a few minutes to finish. This, for me, is some of the most boring and annoying work in teaching, because, despite often being required, no one will usually read our marking schemes, making this an exercise in Sisyphean futility.

With the help of AI, we can create such materials quickly and painlessly, while still ensuring a reasonably high quality that does the job.

Please tell me what you think of the materials that Copilot produced here, both the questions and the marking schemes. Would they work for you and your classes? Would such a use of ChatGPT actually save you time? Why or why not? I’d love to hear your opinion in the comments!

Many thanks and see you next time. Happy teaching!

— Andy

5 Comments

Brad Czepiel

Mar 27 · Liked by Dr Andreas Matthias

I used your prompt (in Gemini) to get things rolling. I then shifted the focus to study questions related to issues students might face. And I think these could work well in group discussion during the first few minutes of class to get ideas and energy flowing. Vestigial bones of the essay and the grading remain.

Study Questions - Applying Ethical Theories (with Specific Details)

Instructions: Address all three questions in group discussion. Focus on demonstrating your understanding of the relevant ethical theories (utilitarianism, deontological ethics, social contract theories, virtue ethics) by applying them to the scenario and comparing and contrasting their approaches.

Grading Scheme: (Each question is worth 20 points)

Application (10 points): Clearly explains the relevant ethical theory and applies it thoughtfully to the scenario.

Comparison/Contrast (5 points): Demonstrates understanding of the key differences between the applied theories or compares their approaches within the scenario.

Critical Thinking (5 points): Shows a clear and logical reasoning process and considers potential limitations of the theories.

Question 1: Group Project Pressure (Utilitarianism vs. Deontological Ethics)

You are part of a group project in your Introduction to Biology class. The project is a crucial component of your final grade, and your group has a reputation for struggling to meet deadlines. One member suggests copying a significant portion of a research paper found online, claiming it would save time and guarantee a good grade. You're hesitant because plagiarism is a serious academic offense.

From a utilitarian perspective, how would you weigh the benefits (good grade, saving time) against the potential harms (getting caught and penalized, damaging your reputation) of copying the research paper?

In contrast, a deontological ethicist might focus on following the rules. Explain how a deontological approach would guide your decision in this situation.

(Model Answer):

Copying a research paper presents a clear ethical dilemma. Let's explore it through two different lenses:

Utilitarianism emphasizes maximizing overall well-being. Copying the paper could ensure a good grade for the entire group, saving valuable time. However, the risk of getting caught is substantial. Plagiarism could lead to failing grades, expulsion from the course, and damage your academic reputation. These consequences could outweigh the potential benefit of a good grade.

Deontological ethics, on the other hand, focuses on following established rules (e.g., academic integrity). Plagiarism is a clear violation of these rules, regardless of the potential benefits. Even if it guarantees a good grade, deontological ethics would dictate upholding academic honesty and completing the project with original work.

Grading Rubric: (refer to breakdown above)

Question 2: Social Media Feud (Social Contract Theory vs. Virtue Ethics)

You recently witnessed a heated online argument between two of your close friends on social media. The disagreement centers around a sensitive social issue, and both friends are posting inflammatory comments. While you disagree with some of the views expressed, you don't want to take sides and potentially damage your friendships.

Social contract theory emphasizes the importance of respecting the rights of others in a shared online space. Analyze how this theory would guide your decision on whether to intervene in the argument.

Virtue ethics focuses on developing good character traits. Explain how a virtue ethicist might approach this situation, considering qualities like courage and fairness.

(Model Answer):

Social contract theory suggests that we participate in online spaces with shared responsibilities. Letting the argument fester could contribute to a toxic online environment. Social contract theory might prompt you to intervene by privately messaging both friends, encouraging respectful discussion and reminding them of their responsibility to use social media responsibly.

Virtue ethics encourages developing positive character traits. In this scenario, virtues like courage and fairness might be relevant. Courage would involve intervening to try and de-escalate the situation. Fairness would necessitate considering both perspectives and encouraging your friends to have a civil discussion.

Grading Rubric: (refer to breakdown above)

Question 3: Smartwatch Security (Utilitarianism vs. Social Contract Theory)

You're considering purchasing a new smartwatch with advanced health tracking features. However, you've read concerns about the watch collecting a vast amount of personal data, including heart rate, sleep patterns, and even location information. The company claims this data is anonymized and used for product improvement, but there are worries about potential data breaches and misuse.

From a utilitarian perspective, discuss the potential benefits and harms of purchasing the smartwatch despite the data collection concerns.

Social contract theory emphasizes balancing individual rights and societal interests. Explain how such an approach would influence your decision on using the app.

(Model Answer):

Utilitarianism would involve weighing the benefits of the smartwatch's health features against the potential harms of data collection. On the one hand, the watch could motivate you to live a healthier lifestyle. On the other hand, potential data breaches could expose sensitive information. Even with anonymized data, there are concerns about who could

Brad Czepiel

Mar 27 · Liked by Dr Andreas Matthias

Just taught a class on Utilitarianism. The conversations about assets and weaknesses were good, and the kids engaged with a dilemma I proposed. Roughly: how does Utilitarianism suggest we respond to killing a death row inmate early because six people need the inmate's organs and will die if they wait? BUT - if I could construct dilemmas that are closer to the students' worlds, their engagement & inspiration would be greater. Enter your post, and I think I have a way now to do that. And, when we move to Kant, I can play each philosopher off against the other in productive ways; thank you! Brad
