Posts Tagged ‘MOOC’

Gender Balances: A look at the makeup of HarvardX registrants

Thursday, December 5th, 2013

Although the first semester of the 2013/14 academic year is coming to a close on campus and residential students are finishing up coursework and preparing for the break, the timelines are more asynchronous for students registered for 10 currently running online offerings. This batch of 10 consists of courses and modules launched by HarvardX at different times during the Fall of 2013.

While course development teams are working to create the most stimulating learning experiences and thinking about whether and how to give students a mini winter break in their courses or modules (or summer break for those in the southern hemisphere), the research team is busy studying the troves of data produced by past and current online offerings, working with course developers to set up learning experiments, and helping to facilitate research-based innovation at HarvardX.

As part of our work to inform course development and research, the research team generated course-specific and HarvardX-wide worldwide gender composition data.


The interactive visualization above shows self-reported gender composition data for all past and current online offerings, as well as for HarvardX overall. Choosing an item from the drop-down menu shows data on a particular course or module on the left, while the chart on the right displays overall gender composition to facilitate comparison. Hovering the mouse over the chart brings up the specific numbers used to calculate the percentages. As of November 17, 7-10% of students in different offerings had not specified their gender, which is reflected in the Missing category. Checking the box ‘Only male/female’ keeps only these two categories and calculates the percentages using the total number of reported males and females as the denominator. The data specification file including the source code and technical information can be accessed here.
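To make the two denominators concrete, here is a minimal sketch of how the percentages could be computed with and without the Missing category. The function name `gender_shares` and the counts are hypothetical and chosen only for illustration; this is not the code from the data specification file.

```python
def gender_shares(counts, only_male_female=False):
    """Return the percentage share of each category.

    counts: dict such as {"male": ..., "female": ..., "missing": ...}
    only_male_female: if True, drop every other category first, so the
    denominator is the number of reported males plus reported females.
    """
    if only_male_female:
        counts = {k: v for k, v in counts.items() if k in ("male", "female")}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no registrants to compute shares from")
    return {k: 100.0 * v / total for k, v in counts.items()}


# Hypothetical counts for one offering (not real HarvardX data).
example = {"male": 570, "female": 350, "missing": 80}

print(gender_shares(example))                         # all three categories
print(gender_shares(example, only_male_female=True))  # 'Only male/female' view
```

With the box checked, the male and female shares both rise, because the registrants who did not report a gender no longer count towards the denominator.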

The HarvardX student body is estimated to be mostly male (62% as of November 17, 2013), although there is considerable variation in gender balance from one offering to the next. For example, CS50x Introduction to Computer Science has decidedly more male students (estimated 79%), while both offerings on Poetry in America register mostly female students (estimated 57% and 61%). Some courses have almost equal percentages of female and male students. For example, GSE1x Unlocking the Immunity to Change, launching in Spring 2014, so far has registered an estimated 51% females and 49% males. We generally do not recommend interpreting the overall HarvardX average when overall enrollment is so heavily influenced by a small number of courses (e.g., Computer Science and the Science of Cooking).

To gain a better understanding of gender balance in HarvardX offerings, we made a world map showing the estimated gender composition of students enrolled from different countries.


The map above is an interactive visualization of the estimated gender composition of students enrolled in HarvardX offerings worldwide. Blue means that the balance is tilted towards male registrants, yellow means that it is tilted towards female registrants, and green means approximate parity. Hovering the mouse over a country brings up the estimated proportions of female and male registrants and the numbers the estimate is based on. Estimation was performed using the Missing At Random assumption for missing data. Countries with fewer than 100 detected students are not colored, as their estimated percentages can have a margin of error greater than ±5 percentage points. Choosing an item from the drop-down menu brings up the worldwide gender composition for a particular offering. The data specification file including the source code and technical information can be accessed here.
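The link between the 100-student cutoff and the ±5 percentage point figure can be illustrated with the usual normal approximation for a proportion. This is a sketch under stated assumptions: the post does not give the formula or the confidence level behind the cutoff, and the function name `proportion_margin` is made up for illustration. Under this approximation, with a proportion of 0.5 and 100 students, one standard error is 5 percentage points and a 95% margin is about 9.8 percentage points. Smaller samples give wider margins either way.

```python
import math


def proportion_margin(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion,
    returned in percentage points.

    p: estimated proportion (e.g. share of female registrants)
    n: number of students with a reported gender
    z: multiplier; 1.96 for roughly 95% coverage, 1.0 for one standard error
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)


# How the margin shrinks as the number of detected students grows.
for n in (50, 100, 400, 1000):
    one_se = proportion_margin(0.5, n, z=1.0)
    ci95 = proportion_margin(0.5, n)
    print(f"n={n:5d}  one s.e. = ±{one_se:.1f} pp   95% = ±{ci95:.1f} pp")
```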

In most countries of the world, estimated gender balance is tilted towards males, with the pattern being strongest in African and South Asian countries. Exceptions include the Philippines, Georgia, Armenia, Mongolia, and Uruguay, where the overall estimated HarvardX gender balance is either close to 50% or tilted towards females. Possible explanations for this finding include cultural trends, selective registration for courses that are more popular among females, Internet access, and economic factors.

Individual HarvardX offerings exhibit very different patterns in worldwide gender composition. For example, MCB80.1x Fundamentals of Neuroscience, Part I (launched at the end of October) shows gender parity in the US, Canada, and Australia, while we estimate more female students from the Philippines, Argentina, and Greece. Other countries, including China, India, Pakistan, France, and Sweden, are estimated to have more male students. At the same time, the recently launched PH201x Health and Society exhibits strong female enrollment from many countries around the world, while India, China, Pakistan, and some other countries are still estimated to have more male students in the course.

There could be many possible explanations for the observed picture of worldwide gender composition in HarvardX offerings. One aspect to consider is popularity of courses in certain fields among males or females depending on the context of a particular country. For example, gender balance in the US varies greatly from one course to the next. The ways in which online learning (edX, HarvardX, and beyond) is perceived and promoted in a particular country through advertising, word of mouth, and other means may also have some influence on who ends up enrolling for courses. There are also other country-specific factors such as cultural setting, Internet access, religion, and others, all of which may contribute to the gender balance patterns we are observing.

One parallel that I find interesting is comparing worldwide gender compositions in HarvardX offerings and residential education.

Gender parity in residential tertiary education from UNESCO’s World Atlas of Gender Equality in Education from 2012. In most countries around the world, more females register for residential tertiary education than males.


The picture above is taken from UNESCO’s World Atlas of Gender Equality in Education from 2012, and visualizes worldwide gender composition in tertiary education. Yellow means that there are more females enrolled in tertiary education than males, green means parity, and blue means that there are more males.

What’s interesting about UNESCO's gender parity map and the interactive visualization of worldwide gender composition for HarvardX offerings is that they should match if HarvardX exhibited the same gender enrollment patterns as residential tertiary education. However, while there are similarities, the two pictures don’t quite match. On average, across multiple countries, females make up a larger share of enrollment in tertiary (that is, residential) education than they do in HarvardX online courses.

Why is that?

It could be that at HarvardX, technical courses such as CS50x skew the enrollment demographic, since enrollment in technical/STEM subjects has been shown to be mostly male in all settings. It could also be that in some countries women, on average, see the initial MOOCs as less relevant to their lives and work than men do. It remains to be seen whether the patterns in these initial gender composition data show fundamental differences between the gender demographics of residential tertiary and online education, or whether the observed patterns are due to a limited number of initial online offerings and are specific to HarvardX.

Clearly, our analysis generates more research questions than it answers. Finding and polishing bits and pieces of the puzzle to answer these questions is what makes working at HarvardX research so stimulating.

HarvardX research: both foundational and immediately applicable

Wednesday, October 23rd, 2013

There is a difference between research and how innovation happens in industry. Research tends to be more foundational and forward-thinking, while innovation in industry is more agile and looks to generate value as soon as possible. Bret Victor, one of my favorite people in interaction design, summarizes it nicely in the diagram below.

Bret Victor's differences between industry and research innovation

HarvardX is a unique combination of industry and research by the classification above. The team I am part of (HarvardX research) works to generate research and help shape online learning now, as well as contribute to foundational knowledge. Course development teams, who create course content and define course structure, sit on the same floor as us. Course developers work together with the research team looking for ways to improve learning continuously and generalize findings beyond HarvardX to online and residential learning in general. Although the process still needs to be streamlined as we are scaling the effort, we are making progress. One example is the project on using assignment due dates to get a handle on student learning goals and inform course creation.

Here is how it got started.

As we were looking at the structure of past HarvardX courses, we discovered that there was a difference in how graded components were used across courses. Graded components include assignments, problem sets, or exams that contribute to the final grade of the course which determines whether a student gets a certificate of completion. Below is public information on when graded components occurred for 3 HarvardX courses.

The visualization above shows the publicly available graded component structure for three completed HarvardX courses: PH207x (Health in Numbers), ER22x (Justice), and CB22x (The Ancient Greek Hero). Hovering the mouse over different elements of the plot reveals detailed information; clicking on a course code removes the other courses from the display. For PH207x, each assignment had a due date preceding the release time of the next assignment (except the final exam). For the other two courses, students had the flexibility of completing their graded assignments at any time up until the end of the course.

When the due date passes on a particular graded component, students are no longer able to access and answer it for credit. The "word on the street" among course development teams so far has been that it's generally desirable to set generous due dates on the graded components, as this promotes alternative (formative) modes of learning by allowing students not interested in obtaining a grade to access the graded components. This way, students who register for a class late also have an opportunity to "catch up" by completing all the assignments that they "missed". However, so far it has been unclear what impact such a due date structure has on academic achievement (certificate attainment rates) versus other modes of learning (non-certificate track, i.e., leisurely browsing).

Indeed, one of the major metrics of online courses is certificate attainment - the proportion of students who register for the course and end up earning a certificate. It turns out that PH207x had an attainment rate of over 8.5%, which is the highest among all open HarvardX courses completed to date (the average rate is around 4.5%). Does this mean that setting meaningful due dates boosts academic achievement by helping students "stay on track" and not postpone working on the assignments until the work becomes overwhelming? While the hypothesis is plausible, it is too early to draw causal conclusions. It may be that the observation is specific to public health courses, or that PH207x happened to have more committed students to begin with.

While the effect on certificate attainment is certainly important, an equally important question is what impact due dates have on alternative modes of learning. That's why we are planning to start an A/B test (randomized controlled experiment) to study the effect of due dates, in close collaboration with course development teams. Sitting on the same floor and being immersed in the same everyday context of HarvardX allows for agile planning, so we are hoping to launch the experiment as early as November 15 or even October 31. The findings of the study have the potential to immediately inform course development for new courses as well as future iterations of current ones, with the aim of improving educational outcomes for learners around the world and on campus.
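As a rough illustration of what such an experiment involves, the sketch below assigns students to one of two due-date conditions at random and compares certificate rates with a two-proportion z-test. Everything here is an assumption made for illustration: the function names, the arm labels, the seed string, and the counts are hypothetical, and the post does not describe the actual experimental design or analysis.

```python
import math
import random


def assign_arm(student_id, seed="due-date-experiment"):
    """Assign a student to arm 'A' or 'B' with equal probability.

    Seeding on the student id keeps the assignment stable across sessions.
    """
    rng = random.Random(f"{seed}:{student_id}")
    return "A" if rng.random() < 0.5 else "B"


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical outcome: certificates earned out of students in each arm.
z, p_value = two_proportion_z(success_a=60, n_a=1000, success_b=45, n_b=1000)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Because assignment is random, a difference in certificate rates between the arms can be attributed to the due-date structure rather than to differences in the subject matter or in how committed the students were to begin with.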

HarvardX is a great example of a place where research is not only foundational but also immediately applicable. While the combination is certainly stimulating, I wonder to what extent this paradigm translates to other fields, and what benefits and risks it carries. With these questions in mind, I cannot wait to see what results our experimentation will bring and how we can use data to improve online learning.

Adaptive and social media in MOOCs: the data-driven and the people-driven

Thursday, May 23rd, 2013

In light of my new position as a HarvardX Research Fellow, I have been thinking about the role of data in improving online learning experiences (aka MOOCs) at edX. Can data tell us everything about the ideal learning experience of tomorrow? Can product developers at edX come up with the best version single-handedly? Or, maybe, the online students could also tell us what the ideal MOOC is?

First, let's think about what the "ideal MOOC" could be. There is a broad consensus that an ideal online learning experience would yield the best "educational outcomes" for the students. For now, let's think about the educational outcome as something that's well-approximated by the amount of learning. Specifically, this means that we want students to extract and internalize as much educational content from the interactive learning experience as possible. Finally, the educational content is information that is relevant to the substance of the class. For example, for a probability course, this would include information on how to use Bayes' rule or the change of variables. For a Python programming class, this would include information on language syntax and how to use Python modules. For a class on interactive visualization, this could include (of course!) information on how to use d3js.

This is an important point. Educational content is information relevant to the substance of the class. We want the students to internalize as much of it as possible, make it their knowledge. How can we do that?

Let's assume that the educational materials (lectures, homework, tests, examples) have already been prepared and we believe that they are good. How do we expose the materials to the students in the best possible way so that students learn the most, stay engaged, and more students complete the class?

Clearly, the setting of a MOOC is different from the setting of a standard classroom. One of the significant differences is the number of students - it's massive. Depending on the course, the number of enrolled students can exceed 150 thousand - CS50x by David Malan on HarvardX is a great example. Do we want to expose every single student, no matter what country he/she is from, no matter what talents and aspirations he/she has, no matter how many peers he/she will study with, all to the same sequence of the material? Maybe, yes. And maybe, no.

The setting of MOOCs can be a wonderful platform for adaptive media - an algorithmic way of sequentially presenting content and interacting with the user in order to maximize the informational content that the user "internalizes".

Adaptive media. It's the characterizing trait of a computer as a medium - the ability to simulate responses, interact, predict, "act like a living being". We can use it to model, predict, and synthesize the best way to serve content to users, algorithmically.

Adaptive media is used actively across the Web in conjunction with social media. Often, the inputs of adaptive media are the outputs of social media (and then the cycle repeats). When you share an article on Facebook, the system learns about your preferences and makes sure that the next time you see content it'll be more relevant to your interests. Much of the time, this custom-tailored content is advertising. The same goes for LinkedIn - ever noticed the "Ads you may be interested in" section to the right on your LinkedIn profile?

Can we use adaptive media in MOOCs? The benefits are obvious - with hundreds of thousands of enrollees, it is impossible to adequately staff the course with enough qualified facilitators. Adaptive media could be used together with the teachers' input and social media such as forums, social grading, and study groups. The purpose, instead of displaying personalized ads, would be to make sure each student learns as much as possible from the interactive learning experience, in his or her unique way. There could also be a multitude of positive extras - reduced dropout rate, higher engagement, higher enrollment for adaptive MOOCs.

Isn't this interesting?