TLDR

I ran a Python laboratory at my former faculty.

Homework assignments were copy-pasted or AI-generated, and I did not want to look at the same code ten times in a row. So, using Python, Docker and the Discord API, I came up with my own automated solution, which waited for a student’s message (in private), evaluated the submitted solution and returned a score. Basically every online coding platform, but made with bugs & errors by me.

You can find the source code on my GitHub repo.


Intro

Ever tried teaching (anything, really) to a bunch of students who have all of human knowledge at their fingertips?

Probably yes.

But what if they also had a reliable friend, available 24/7, who knows all of that human knowledge (or at least pretends to, fuck knows what’s in its head)? A friend so good that it offers its time and resources to help our students come up with almost identical homework? And after a good day’s work, this friend, this saviour, does not even get an honest "Thank you!", only a CTRL + W or a Close tab. So tragic.

For those who did not get my poor joke of a description, I’m talking about ChatGPT.

Some time ago, I applied for a position assisting with a Python course at my former faculty. I’m talking about helping with the laboratories, which are the practical side of the course, where students apply what they’ve allegedly learned during lectures. These were my favorite classes as a student, not only because they added some meaning to all those hours of learning, but also because most teachers were super cool and friendly, trying to make us understand the concept behind an idea, not just a way to solve some problem. So, as a late thank you for these experiences, I tried my best to stick to the same attitude.

The first laboratory

The first laboratory was one of my proudest moments. I entered the classroom and just sat down at a desk, trying my hardest to look like a student who had to redo the course because he failed the exam. I started asking questions left and right:

“Do you know the teacher?”

“Is he any good?”

“Does he count attendance or just homework assignments?”

I tried to keep it up until most of the students had arrived. Before revealing myself, I asked the student next to me one last time:

“So, you don’t know the teacher, right?”

He replied:

“No, don’t think so.”

At which point I stood up, greeted the class, and the rest is history. I like to think that they enjoyed this moment and that it made working with me much easier, because I did not want to present myself as an unfriendly tutor. After all, I was once in their shoes too! My motto is that we are here to learn, not to shame each other or kiss ass. And this can be done while also having fun, which, from my point of view, just enhances the experience all around.

But enough with the silly stories, let’s get to the technical part!

The lurking problem

The first two laboratories were pretty easy to evaluate, as the students did not have to do much: just make sure they had installed Python correctly and figure out how to type

print("hello world")

which, by my reasonable estimate, a monkey would manage in about 12 hours.

However, as I mentioned in the introduction, the students had a very good friend with them: ChatGPT (or GitHub Copilot or Gemini in other cases), which made trivial questions meant for understanding the basics pretty much obsolete. So, I soon started to evaluate homework not based on the result, but by looking at similarities in the code: a for loop here, an if-else there, a wrong return case here because the AI could not understand the problem properly, etc.

I did not discourage AI, because that would do nothing. Even if I had my eyes on them 24/7, some problems were so trivial (and they needed to be) that it was just impossible to differentiate between an AI solution and a handwritten one. So, what could be done? If not for them, at least for greedy me.

The underpaid solution

During the pandemic, the faculty switched to using Discord as the communication hub between students and teachers. Because this worked extremely well, the faculty kept using it afterwards. Fast forward to my third laboratory. I was trying to find a way to automate the evaluation of homework, for several reasons:

  • I did not like the idea of spending my free time looking at almost copy-pasted code
  • I already had the exercise and, most of the time, I needed to create a simple example for the students to test at home (an input-output pair)
  • Most exercises also had some form of error-handling requirement, which was a pain to test manually
  • I wanted to have some fun for myself and learn something new

So, I thought about an automated solution. Have the students send the homework to a server, evaluate it based on secret tests and return a score to them. Easy, simple and most importantly, it can be done in Python!
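To make the idea concrete, here is a minimal sketch of scoring a submission against secret input-output pairs. It is not the code from the repo: the test data, the `score_submission` name and the percentage scoring are assumptions made for illustration, and it runs the solution directly on the host, with no isolation at all.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical secret tests: (text fed to stdin, expected stdout) pairs
# for an imaginary "add two numbers" exercise.
SECRET_TESTS = [
    ("2 3\n", "5"),
    ("10 -4\n", "6"),
]


def score_submission(source: str, tests=SECRET_TESTS, timeout: float = 5.0) -> int:
    """Return the percentage (0-100) of tests that the submitted source passes."""
    if not tests:
        return 0
    passed = 0
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "solution.py"
        script.write_text(source, encoding="utf-8")
        for stdin_text, expected in tests:
            try:
                result = subprocess.run(
                    [sys.executable, str(script)],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                continue  # a solution that hangs simply fails this test
            # A test passes only on a clean exit with the expected output.
            if result.returncode == 0 and result.stdout.strip() == expected:
                passed += 1
    return round(100 * passed / len(tests))
```

A correct solution would get 100 here, while one that crashes on every input would get 0; `result.stderr` is where hints about runtime errors could come from.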

I had dabbled with the Discord API in the past, writing a simple bot that controlled my Minecraft server so that I didn’t have to run ./start.sh and ./stop.sh every time I wanted to play. So the so-called server part was done: I would use a Discord bot. Students would send (in private) either a .py file or a quoted code block, which would be evaluated. After evaluation, the bot would return a score and custom messages about compile errors, runtime errors or hints as to why the tests did not pass.
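For the quoted-text case, the raw message text still carries the code-block delimiters, and possibly a py or python tag right after the opening one. Here is a minimal sketch of pulling the source out of such a message, assuming the bot receives the message as a plain string; the `extract_source` helper is hypothetical and not taken from the repo.

```python
import re

# Discord's code-block delimiter (three backticks), built from one character
# so that this snippet stays easy to embed and read.
FENCE = "`" * 3

# An opening fence, an optional "py"/"python" tag that must be followed by a
# line break, the code itself, then the closing fence.
_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"(?:(?:py|python)[ \t]*\n)?(.*?)" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def extract_source(message_text: str) -> str:
    """Return the code inside the first fenced block, or the whole text if none."""
    match = _BLOCK_RE.search(message_text)
    code = match.group(1) if match else message_text
    return code.strip("\n")
```

Requiring a line break after the tag is a deliberate choice: a first line such as `pygame_x = 1` starts with "py" but is real code, so it should not be trimmed.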

When it comes to executing other people’s code, you need to be very careful. Even though I did not suspect any student of malicious intent, mistakes sometimes happen. It’s better to be prepared and anticipate a bad outcome than to whine after it happens. So, I thought about using Docker to create isolated environments. At first I was a bit skeptical, because I did not know the impact on performance, but after some tests I saw that even 10 instances running code at the same time were nothing for a mid-tier laptop. And that load would only occur if half the class submitted their homework at the very same moment, as the average time for starting, executing and closing an instance was under 1 s.
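As an illustration of what such an isolated run could look like, here is a minimal sketch that builds a `docker run` command with a few standard restrictions and launches it through `subprocess`. The image tag, the resource limits, the `/work` mount point and the `solution.py` file name are all assumptions for the example, not the settings used in the repo.

```python
import subprocess
from pathlib import Path


def build_docker_command(workdir: Path, image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` command that executes workdir/solution.py in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                 # delete the container once it exits
        "-i",                   # keep stdin open so test input can be piped in
        "--network", "none",    # no network access for submitted code
        "--memory", "128m",     # cap memory (assumed limit)
        "--cpus", "0.5",        # cap CPU usage (assumed limit)
        "--pids-limit", "64",   # guard against fork bombs
        "--read-only",          # container filesystem is read-only
        "-v", f"{workdir.resolve()}:/work:ro",  # mount the submission read-only
        "-w", "/work",
        image,
        "python", "solution.py",
    ]


def run_in_container(workdir: Path, stdin_text: str = "", timeout: float = 10.0):
    """Run the submission and capture its output; requires a local Docker daemon."""
    # Note: the timeout stops the docker client process; a stuck container
    # may still need to be cleaned up separately.
    return subprocess.run(
        build_docker_command(workdir),
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

Keeping the command construction in its own function makes it easy to inspect or tweak the limits without having Docker running.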

That being said, development started. I won’t go into too much detail about it; you can find that in the commit history if you want, weirdo.

GitHub repo


But I will list some cool things that I’ve realized:

  1. Tagging a quoted code block with py or python included the tag text in the extracted solution, which led to errors.
  2. To counter the submission of the same homework by multiple students, I saved each copy they sent. That way, similar solutions can be penalized, but the only good enough approach that I’ve thought of was a Levenshtein distance, because, as I’ve said earlier, the exercises were not complex enough to allow for multiple creative solutions.
  3. Running the evaluator locally allowed me to have full control over when I allow submissions. That way, I did not have to worry that my storage would fill up after a night of endless evaluations.
  4. The bot does not need me to be there to evaluate homework. This made it much easier for me to answer questions from students, because I could physically go to their desk while somebody else’s homework was being evaluated.
  5. Using tests allowed for objectivity - which matters, a lot.
  6. It’s cool to press some buttons and make things work, even more so when there’s an audience involved!
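For the similarity check mentioned in the list, here is a minimal sketch of the Levenshtein distance between two submissions, plus a normalized similarity score. The `similarity` helper and the idea of comparing it against a threshold are assumptions for illustration; the repo may do this differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0, where 1.0 means the texts are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

Two submissions could then be flagged when `similarity` goes above some chosen threshold. With exercises this small, honest solutions will often look alike too, so a flag is a reason to take a closer look rather than a verdict.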