Photo: Jose Ferreira and David Kuntz (courtesy of Knewton)
Each shopper gets a customized experience when he or she visits Amazon. A similar trend is taking hold in education. One key company developing technology to personalize learning is a start-up called Knewton. Universities like Arizona State teach mathematics using Knewton's Web-based platform, which mines performance and behavioral data to build a profile for each student and deliver recommendations about what learning activity he or she should do next. The publishing company Pearson is also augmenting its products with Knewton's recommendations.
The idea is that Knewton can serve up "the perfect sentence, or perfect clip, or perfect problem for you at any one time, based on what you're the weakest at, and what's most important, and how you learn it best," says Jose Ferreira, Knewton's founder and chief executive. The Chronicle discussed educational data-mining in phone and e-mail conversations with Mr. Ferreira and David Kuntz, the company's vice president for research. An edited version of those exchanges follows.
Q. How does education fit into Big Data's broader picture?
Mr. Ferreira: The first really big data plays were online. It's so easy to capture the data for Google or Facebook. You do a search on Google; Google gets about 10 data points. They get, by our standards, a very small amount of data compared to what we get per user per day. If they can produce that kind of personalization and that kind of business, based off the small amount of data they get, imagine what we can do in education.
Here's why education is different from search or social media. For one thing, the average student studies for more time than they spend on Google or Facebook. People spend way more time in Knewton than they spend on Google—they spend hours a day as opposed to minutes per day. So that's one big reason why we produce a few orders of magnitude more data per user than Google, just based on usage.
But then there's the more important reason even than that, which is that education is not like Web pages or social media. It's a different product. And it lends itself infinitely more to data-mining than does any other industry right now. The reason is that nobody has tagged all the world's Web pages for Google down to the sentence level, the way that we ask publishers to tag every sentence, every answer choice of every question. They say, Here's what this sentence is about, or this video clip. They're basically telling us every single thing about every single piece of their content. That's how we can slice and dice it so finely.
Q. What data do you collect?
Mr. Ferreira: Knewton's capturing in the hundreds of thousands of data points per user per day. We're capturing what you're getting right, what you're getting wrong, what answers you're falling for if you get something wrong, what concepts are in that answer choice that you're falling for. We're also capturing when you log into the system; how much you do; what tasks you do; what you don't do; what was recommended that you do that you didn't do, and vice versa. Your time on task for every little task, whether it's reading something or doing a practice question or watching something. Your click rate: how fast you're clicking on stuff. You can imagine one student accessing different material. If her click rate increases between math and verbal (maybe she's going through the verbal a little faster), maybe it's a little easier for her.
Q. Say you have two students in the same course. How does Knewton mine what it knows about them to customize their paths?
Mr. Kuntz: We scan through all of the content that we have in the system—all of the lessons that we have—and rank them for you based on all of the things that you've done and all of the things that we know, to identify which lesson is the next lesson you should do in order to most efficiently and effectively get you through that course. You're sitting there next to me taking the course, and you complete lesson seven. And I also at that point have just completed lesson seven. And what's on your screen now is a big button that says let's go to the next thing. And in your case it's lesson nine, and in my case it's lesson 11. The learning path that you take through the lessons will be very different in most cases from the learning path that I take through the lessons, because there may be areas in which you're stronger or weaker than I.
Maybe your lesson has to do with something in geometry—graphing linear equations, or solving a linear equation with a single variable—something like that might be the next lesson that you get, having mastered rates and proportions. But another student might get a different lesson having mastered rates and proportions, simply because their proficiency profile suggests that they might be more ready to tackle that other lesson.
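To make the idea of ranking lessons against a proficiency profile concrete, here is a minimal sketch in Python. It is an illustration only, not Knewton's actual method: the `Lesson` class, the `readiness` score (the student's weakest prerequisite), the concept names, and the proficiency numbers are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    name: str
    prerequisites: tuple  # hypothetical concept names the lesson builds on


def readiness(lesson, proficiency):
    """Score a lesson by the student's weakest prerequisite (0.0 to 1.0)."""
    if not lesson.prerequisites:
        return 1.0
    return min(proficiency.get(c, 0.0) for c in lesson.prerequisites)


def next_lesson(lessons, proficiency, completed):
    """Return the not-yet-completed lesson with the highest readiness."""
    candidates = [l for l in lessons if l.name not in completed]
    return max(candidates, key=lambda l: readiness(l, proficiency))


# Hypothetical course content and two hypothetical proficiency profiles.
lessons = [
    Lesson("rates and proportions", ()),
    Lesson("graphing linear equations",
           ("rates and proportions", "coordinate plane")),
    Lesson("solving one-variable linear equations",
           ("rates and proportions", "arithmetic")),
]
completed = {"rates and proportions"}
student_a = {"rates and proportions": 0.9, "coordinate plane": 0.8, "arithmetic": 0.5}
student_b = {"rates and proportions": 0.9, "coordinate plane": 0.3, "arithmetic": 0.85}

print(next_lesson(lessons, student_a, completed).name)
print(next_lesson(lessons, student_b, completed).name)
```

Both students have finished the same lesson, yet the sketch sends them to different next lessons because their profiles differ, which is the behavior Mr. Kuntz describes.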
Q. How does this relate to adaptive tests like the GRE?
Mr. Kuntz: Adaptive tests essentially say, in the simplest case: I'm going to give you a question, and I'm going to see how you do. Based on that, I'm going to make some kind of estimate of your overall ability with respect to the thing I'm testing, like knowledge of linear equations. And then I'm going to give you another question, based on that estimate, that is going to help me, the system, get the most accurate estimate of your ability or proficiency that I can, often in the fewest number of questions that I can. Adaptive learning is a little bit different, because the goal of adaptive testing is to make that testing experience as efficient as possible, but it assumes that your knowledge over the course of that testing experience remains unchanged—that the test isn't teaching you things. Adaptive learning's goal is much more about enabling students to achieve specific learning outcomes. And so the kinds of things that you do in adaptive testing, we also do in adaptive learning, but it's way more than just "What does a student know now? OK, let's pick something for them to check their strengths and weaknesses."
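The question-estimate-question loop of an adaptive test can be sketched as follows. This is a simplified textbook-style illustration using a one-parameter logistic (Rasch-style) model; it is an assumption of this example, not a description of how the GRE or Knewton works. The function names, the step size, and the item difficulties are hypothetical.

```python
import math


def p_correct(ability, difficulty):
    """Modeled chance of a correct answer under a one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pick_item(ability, difficulties, asked):
    """Choose the unasked item whose difficulty is closest to the estimate."""
    remaining = [i for i in range(len(difficulties)) if i not in asked]
    return min(remaining, key=lambda i: abs(difficulties[i] - ability))


def update_ability(ability, difficulty, correct, step=0.5):
    """Nudge the estimate by (observed result - expected result)."""
    observed = 1.0 if correct else 0.0
    return ability + step * (observed - p_correct(ability, difficulty))


def run_test(difficulties, answer_fn, n_questions):
    """Ask n_questions items adaptively and return the final ability estimate."""
    ability, asked = 0.0, set()
    for _ in range(n_questions):
        i = pick_item(ability, difficulties, asked)
        asked.add(i)
        ability = update_ability(ability, difficulties[i], answer_fn(difficulties[i]))
    return ability


# Hypothetical item bank and a simulated student who answers
# correctly whenever the item difficulty is at most 1.0.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
estimate = run_test(difficulties, lambda d: d <= 1.0, n_questions=4)
print(round(estimate, 2))
```

Note that the sketch treats the student's ability as fixed during the test, which is exactly the assumption Mr. Kuntz says adaptive learning has to drop.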
Q. Does adaptive learning apply to higher subjects, beyond basic math?
Mr. Kuntz: There are some things it probably won't work for at all. Like a philosophy class, for example. What it requires is that the domain has an articulable structure, in terms of the concepts or the learning objectives associated with that particular domain. So economics has a lot of structure in it, as do the sciences, as does mathematics. And psychology, English composition—those have strong and reasonable structures as determined by subject-matter experts in those domains. As long as that exists, then we can identify the concepts or constructs that are important for students to learn earlier in the process, and important for students to learn later in the process, and identify what kinds of knowledge or skills or abilities are precursors to other knowledge or skills or abilities that are required in that discipline.
A case like philosophy is more challenging. If I'm taking a course on existentialism, largely the interactions are about conversation, discussion, and papers that are largely subjective, in the sense that different faculty members may have very, very different views about what constitutes a good response to the question that they provided. It's challenging to articulate the set of concepts in that domain, and the relationships between them, in a way that affords good adaptivity right now.
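The "articulable structure" Mr. Kuntz describes, in which some knowledge is a precursor to other knowledge, can be pictured as a prerequisite graph. The sketch below is a hedged illustration using Python's standard `graphlib` module (Python 3.9 or later); the concept names and their dependencies are invented for the example and do not come from any real curriculum.

```python
from graphlib import TopologicalSorter

# Hypothetical concept map: each concept lists the concepts that must come first.
prerequisites = {
    "arithmetic": set(),
    "rates and proportions": {"arithmetic"},
    "linear equations": {"arithmetic"},
    "graphing linear equations": {"linear equations", "rates and proportions"},
}

# One valid teaching order in which every precursor precedes what depends on it.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
```

A subject such as a seminar on existentialism is hard to fit into this shape because, as the answer above notes, its concepts and the relationships between them are difficult to write down in the first place.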
Q. What will you be able to do with the behavioral data you collect—stuff like what students click on, how much time they spend on things, and what time of day they study?
Mr. Ferreira: We're launching a feature later this year called "learning modality adaptivity." It will figure out things like, you learn math best with a video clip, or you learn science best with games instead of text, or in addition to text—and we can figure out what the optimal ratio is for you. We can figure out things like, you learn math best in the morning, and verbal concepts best in the evening, on average. Maybe you learn math best between 8:32 and 9:14. If so, we'll know it. It means when you show up in the morning to do some practice, we're going to try to feed you math, and if you show up in the evening, we'll try to feed you more verbal, because that's when you're most receptive to those subject matters.
The learning modality is very important, because it may be that you actually learn a concept better if you see it a different way—maybe because something's boring to you, and you have a harder time paying attention to science unless it's visual. Maybe you're just more engaged if you see it in a certain format. So we can calculate these things. We can figure out things like—that 35-minute burst you seem to do every day at lunch, you don't retain it very well. You should actually just skip that, have fun, and spend more time in the evening, when your retention is highest.
We can tell you things like: your click rate consistently declines at the 24-minute mark on cell-division concepts or osmosis concepts. So we're going to pull those concepts from you at the 20-minute mark. Right before you're about to get bored, we can predict that boredom and move you on to something else.
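As a rough illustration of the time-of-day idea in this answer, the following sketch averages a retention score per subject and study window and picks the best window for each subject. Everything here is assumed for the example: the `best_window` function, the "morning"/"evening" buckets, and the retention numbers are hypothetical, and the feature Mr. Ferreira describes may work quite differently.

```python
from collections import defaultdict
from statistics import mean


def best_window(sessions):
    """Map each subject to the study window with the highest mean retention.

    sessions: iterable of (subject, window, retention_score) tuples.
    """
    scores = defaultdict(list)
    for subject, window, score in sessions:
        scores[(subject, window)].append(score)

    averages = defaultdict(dict)
    for (subject, window), values in scores.items():
        averages[subject][window] = mean(values)

    return {subject: max(by_window, key=by_window.get)
            for subject, by_window in averages.items()}


# Hypothetical study log for one student.
sessions = [
    ("math", "morning", 0.8),
    ("math", "morning", 0.9),
    ("math", "evening", 0.6),
    ("verbal", "morning", 0.5),
    ("verbal", "evening", 0.7),
    ("verbal", "evening", 0.9),
]
best = best_window(sessions)
print(best)
```

With this made-up log, the sketch would steer the student toward math in the morning and verbal material in the evening.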