Some Thoughts on Teaching R to 50,000 Students

Two weeks ago I finished teaching my course Computing for Data Analysis through Coursera. Since then I’ve had some time to think about how it went, what I learned, and what I’d do differently.

First off, let me say that it was a lot of fun. Seeing thousands of people engaged in the material you’ve developed is an incredible experience, unlike any I’ve had before. I initially had a number of fears about teaching this course, the primary one being that it would be a lot of work. Managing the needs of 50,000 students seemed like it would be a nightmare, and making sure everything worked for every single person seemed impossible.

These fears were ultimately unfounded. The Coursera platform is quite nice and is well designed to scale to very large MOOCs. Everything runs on Amazon S3, so scalability is not an issue (although hurricanes are a different story!), and there are numerous tools provided to help with automatic grading. My quizzes were multiple choice, which gave instant feedback to students, but there are also options to grade via regular expressions. For programming assignments, grading was done via unit tests: students would feed pre-selected inputs into their R functions and the output would be checked on the Coursera server. Again, this allowed for automatic instant feedback without any intervention on my part. Designing programming assignments that would be graded by unit tests was a bit restrictive for me, but I think that was mostly because I wasn’t that used to it. On my end, I had to learn about video editing and screen capture, which wasn’t too bad. I mostly used Camtasia for Mac (highly recommended) for the lecture videos and occasionally used Final Cut Pro X.
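To make the unit-test style of grading concrete, here is a minimal sketch in R. It is not Coursera’s actual grader, whose internals aren’t described here; the names `column_mean`, `test_cases`, and `grade_submission` are hypothetical and exist only for illustration. The idea is simply that pre-selected inputs are fed to a student’s function and the outputs are compared against known answers.

```r
# Hypothetical student submission: mean of a vector, ignoring missing values.
column_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# Pre-selected inputs paired with the outputs the grader expects.
# (These cases are invented for this sketch.)
test_cases <- list(
  list(input = c(1, 2, 3),  expected = 2),
  list(input = c(4, NA, 6), expected = 5),
  list(input = c(10),       expected = 10)
)

# Run every case against the submitted function `f`.
# A case that throws an error counts as a failure rather than stopping the run.
grade_submission <- function(f, cases) {
  passed <- vapply(cases, function(case) {
    result <- tryCatch(f(case$input), error = function(e) NULL)
    # all.equal() tolerates tiny floating-point differences.
    isTRUE(all.equal(result, case$expected))
  }, logical(1))
  list(passed = passed, score = mean(passed))
}

report <- grade_submission(column_mean, test_cases)
print(report$passed)
print(report$score)
```

Because the comparison is purely mechanical, the assignment text has to pin down exactly what the function takes and returns, which is part of why writing for this kind of grader can feel restrictive.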

Coursera is working hard on their platform and so I imagine there will be many improvements in the near future (some of which were actually rolled out as the course was running). The system feels like it was designed and written by a bunch of Stanford CS grad students—and lo and behold it was! I think it’s a great platform for teaching computing, but I don’t know how well it’ll work for, say, Modern Poetry. But we’ll see, I guess.

Here is some of what I took away from this experience:

  • 50,000 students is in some ways easier than 50 students. When I teach my in-class version of this course, I try to make sure everyone’s keeping up and doing well. I learn everyone’s names. I read all their homework. With 50,000 students there’s no pretense of individual attention. Everyone’s either on their own or has to look to the community for help. I did my best to participate in the discussion forums, but the reality was that the class community was incredibly helpful, and participating in it was probably a better experience for some students than just having me to talk to.
  • Clarity and specificity are necessary. I’d never taught a course online before, so I was used to the way I create assignments for an in-person class: I just jot down some basic goals and problems and then clarify things in class if needed. But here, the programming assignments really had to be clear (akin to legal documents) because trying to clear up confusion afterwards often led to more confusion. The result is that it took a lot more time to write homework assignments for this class than for the same course in person (even if it was the same homework) because I was basically writing a software specification.
  • Modularity is key to overcoming heterogeneity. This was a lesson that I didn’t figure out until the middle of the course when it was basically too late. In any course, there’s heterogeneity in the backgrounds of the students. In programming classes, some students have programmed in other languages before while some have never programmed at all. Handling heterogeneity is a challenge in any course. Now, just multiply that by 10,000 and that’s what this course was. Breaking everything down into very small pieces is key to letting people across the skill spectrum move at their own pace. I thought I’d done this but in reality I hadn’t broken things down into small enough pieces. The result was that the first homework was a beast of a problem for those who had little programming experience. 
  • Time and content are more loosely connected. Preparing for this course exposed a feature of in-class courses that I’d not thought about. In-class courses for me are very driven by the clock and the calendar. I teach twice a week, each period is 1.5 hours, and there are 8 weeks in the term. So I need to figure out how to fit material into exact 1.5 hour blocks. If something only takes 1 hour to cover then I need to cover part of the next topic, find a topic that’s short, or just fill for half an hour. While preparing for this course, I found myself just thinking about what content I wanted to cover and just doing it. I tried to target about 2 hours of video per week, but there was obviously some flexibility. In class, there’s no flexibility because usually the next class is trampling over you as the period ends. Not having to think about exact time was very liberating.

I’m grateful for all the students I had in this first offering of the course, and I thank them for putting up with my own learning process as I taught it. I’m hoping to offer this course again on Coursera, but I’m not sure when that’ll be. If you missed the Coursera version of Computing for Data Analysis, I will be offering a version of this course through the blog very shortly. Please check back here for details.

Tags: R MOOC