Brian Wansink is nowhere to be found. He’s not in his office. Calls to his cellphone go to voicemail. He was supposed to meet me that morning at Cornell University’s Food and Brand Lab, which he created and directs, but he canceled the night before. Cornell’s media-relations staff is apologetic and accommodating: What about a meeting with the dean instead? Or a tour of campus? The architecture is amazing — and those gorges!
Normally, Wansink is only too happy to talk. The Sherlock Holmes of Food, as he’s sometimes called, made his name over the last two decades by persuading people to care about his research on eating habits. He’s popped up on Oprah, 60 Minutes, and Rachael Ray, among other shows. He’s the author of the best seller Mindless Eating: Why We Eat More Than We Think (Bantam, 2006). Wansink is affable, quotable, and good on TV. It doesn’t hurt that the clever studies he cranks out (more than 200 so far) are chock-full of practical factoids. Did you know that men eat more when women are around? Or that yogurt tastes better if it has an organic label? Or that people who hide their cereal boxes weigh 20 pounds less than people who keep their Cheerios in plain sight?
His insights have also attracted interest — and, more importantly, money — from policy makers. In 2014 the U.S. Department of Agriculture forked over a $5.5-million grant to support Wansink’s Smarter Lunchrooms program, which has since been put into practice in more than 30,000 schools across the country.
But in recent weeks, Wansink has become the subject of a less-flattering sort of attention. Four papers on which he is a co-author were found to contain statistical discrepancies. Not one or two, but roughly 150. That revelation led to further scrutiny of Wansink’s work and to the discovery of other eyebrow-raising results, questionable research practices, and apparent recycling of data in at least a dozen other papers. All of which has put the usually ebullient researcher and his influential lab on the defensive.
The slow-motion credibility crisis in social science has taken the shine off a slew of once-brilliant reputations and thrown years of research into doubt. It’s also led to an undercurrent of anxiety among scientists who fear that their labs and their publication records might come under attack from a feisty cadre of freelance critics. The specifics of these skirmishes can seem technical at times, with talk of p-values and sample sizes, but they go straight to the heart of how new knowledge is created and disseminated, and whether some of what we call science really deserves that label.
It began with a seemingly innocent anecdote.
Back in November, Wansink wrote a post on his blog, Healthier and Happier, titled "The Grad Student Who Never Said No." It was about a visiting graduate student who agreed to reanalyze a dataset from an old experiment. Wansink and his fellow researchers had spent a month gathering information about the feelings and behavior of diners at an Italian buffet restaurant. Unfortunately their results didn’t support the original hypothesis. "This cost us a lot of time and our own money to collect," Wansink recalled telling the graduate student. "There’s got to be something here we can salvage."
He had previously offered the dataset to a postdoc in his lab, but the postdoc declined, citing other priorities. The graduate student, however, eagerly took up the task, and her efforts led to a number of published papers. Aspiring academics should remember, Wansink wrote, that "time management is tough when there’s [sic] so many other shiny alternatives that are more inviting than writing the background section or doing the analyses for a paper."
Say yes, work hard, get published. Who could argue with that?
Quite a lot of people, as it turns out. What Wansink had described is more or less a recipe for p-hacking, a practice that has led to a lot of hand-wringing and soul-searching in recent years, particularly among social psychologists. The "p" in p-hacking is a reference to p-value, a calculation that can help establish whether the outcome of an experiment is statistically significant. P-values can be misleading, though, particularly when you try multiple hypotheses on the same dataset. One of these hypotheses may eventually appear to "work," but that doesn’t mean that you’ve arrived at a solid scientific result. Instead it might just mean that you tortured the data long enough to find a meaningless pattern amid the noise.
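The mechanics are easy to demonstrate with a simulation. The sketch below is purely illustrative (the fair-coin setup and all variable names are invented for this example, not drawn from any of Wansink's data): test twenty made-up hypotheses against nothing but random noise, and the odds are good that at least one clears the conventional p < .05 bar anyway.

```python
import math
import random

def two_sided_p(successes, n, p0=0.5):
    """Normal-approximation two-sided p-value for a proportion test against p0."""
    z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(42)
n = 100

# Twenty "hypotheses," each tested on pure noise: samples of fair coin flips.
p_values = [
    two_sided_p(sum(random.random() < 0.5 for _ in range(n)), n)
    for _ in range(20)
]

# With 20 tests of a true null at the .05 level, the chance of at least one
# spurious "significant" result is about 1 - 0.95**20, roughly 64 percent.
print(f"smallest p-value: {min(p_values):.3f}")
print("spurious 'finding'?", min(p_values) < 0.05)
```

The point is not that any single test is computed incorrectly; each p-value is perfectly valid on its own. The trouble comes from running many of them on the same noise and reporting only the one that "worked."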
Twitter caught wind of Wansink’s post in mid-December and pounced:
@BrianWansink is this supposed to be a satire on poor research practices? — Andrew & Sabrina (@PsychScientists), December 15, 2016
Try hard enough and you're guaranteed to find random noise. Unsure how? Just follow these easy steps outlined by a world-renowned scientist. pic.twitter.com/nPdyoOR9Ho — Tim van der Zee (@Research_Tim), December 15, 2016
The post was catnip for those who make it their business to ferret out research wrongdoing. They’ve been called destructo-critics, methodological terrorists, and worse. Their number includes tenured professors and motivated amateurs who tend to be exacting in their evaluations and brutal in their critiques.
One of those motivated amateurs was Jordan Anaya, who goes by the online handle Omnes Res — Latin for "All the Things," a name he chose because of his fondness for big data. Anaya left a graduate program in biochemistry and molecular genetics at the University of Virginia a couple of years back to pursue his own research interests, which he does from an apartment in Charlottesville, Va. One of those pursuits was writing the GRIM program.
What is GRIM? Here’s a fairly technical answer: GRIM is the acronym for Granularity-Related Inconsistency of Means, a mathematical method that determines whether an average reported in a scientific paper is consistent with the reported sample size and number of items. Here’s a less-technical answer: GRIM is a B.S. detector. The method is based on the simple insight that only certain averages are possible given certain sets of numbers. So if a researcher reports an average that isn’t possible, given the relevant data, then that researcher either (a) made a mistake or (b) is making things up.
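That insight fits in a few lines of code. The sketch below is not Brown and Heathers's published implementation or Anaya's program, just a minimal illustration of the granularity test, assuming integer-valued responses (single-item Likert-style scores) and a mean reported to two decimal places:

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Return True if a mean reported to `decimals` places could have come
    from n integer-valued responses; False means the mean is impossible."""
    target = round(reported_mean, decimals)
    # The underlying sum of n integer responses must itself be an integer.
    # Try the integer sums nearest reported_mean * n and see whether any of
    # them re-rounds to the reported value.
    estimate = round(reported_mean * n)
    return any(
        round(s / n, decimals) == target
        for s in (estimate - 1, estimate, estimate + 1)
    )

# With 9 integer scores, a mean of 3.44 is achievable (31 / 9 = 3.444...),
# but a reported mean of 3.45 cannot arise from any integer sum.
print(grim_consistent(3.44, 9))  # True
print(grim_consistent(3.45, 9))  # False
```

Run against a reported mean and sample size, the check either finds an integer sum that reproduces the figure or declares it mathematically impossible — option (a) or option (b).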
GRIM is the brainchild of Nick Brown and James Heathers, who published a paper last year in Social Psychological and Personality Science explaining the method. Using GRIM, they examined 260 psychology papers that appeared in well-regarded journals and found that, of those that provided enough data to check, half contained at least one mathematical inconsistency. One in five had multiple inconsistencies. The majority of those, Brown points out, are "honest errors or slightly sloppy reporting."
But not all.
Anaya read the Brown and Heathers paper and loved it. As a tribute, he whipped up a computer program to make it easier to check papers for problems, thereby weaponizing GRIM.
After spotting the Wansink post, Anaya took the numbers in the papers and — to coin a verb — GRIMMED them. The program found that the four papers based on the Italian buffet data were shot through with impossible math. If GRIM were an actual machine, rather than a humble piece of code, its alarms would have been blaring. "This lights up like a Christmas tree," Brown said after highlighting on his computer screen the errors Anaya had identified.
Brown is a graduate student at the University of Groningen, in the Netherlands, and a crusading troublemaker of sorts. His previous exploits include teaming up with Alan Sokal, of Sokal-hoax fame, to poke holes in the much-vaunted and now mostly debunked "positivity ratio," a formula that supposedly calculates whether someone is flourishing, and translating the memoir of the notorious data fabricator Diederik Stapel from Dutch into English.
Anaya, along with Brown and Tim van der Zee, a graduate student at Leiden University, also in the Netherlands, wrote a paper pointing out the 150 or so GRIM inconsistencies in those four Italian-restaurant papers that Wansink co-authored. They found discrepancies between the papers, even though they’re obviously drawn from the same dataset, and discrepancies within the individual papers. It didn’t look good. They drafted the paper using Twitter direct messages and titled it, memorably, "Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab."
It’s been viewed more than 10,000 times.
Their paper, published in January as a "preprint," meaning that it was not peer-reviewed, is written in a calm, collegial voice, a contrast to the often hectoring tone of social media. But the conclusion was damning. In the paper, the authors allow that while science cannot always be perfect, "it is expected to be done carefully and accurately."
Wansink’s work, they believed, had failed on both counts.
On his blog, Anaya was less restrained. "If you were to go into the lab and create someone that perfectly embodied all the problems science is currently facing you couldn’t do better than Brian Wansink," he wrote.
When I first spoke to Wansink on the phone last month, he seemed hopeful he could put all this behind him. He was willing, even eager, to defend his research and to show off his lab. He told me there were lots of cool things going on there. He likes that word (his CV has a section titled "COOL OR UNUSUAL HONORS"). For him, cool research piques public interest while offering insight into important issues. Cool papers are the kind of papers people talk about, not the ones that languish unread in obscure journals.
Because Wansink wasn’t around, having pulled out of our interview, I was given a tour of Cornell’s Food and Brand Lab by Adam Brumberg, the lab’s deputy director. One thing you notice right away: Wansink’s face is everywhere. There are photos of him on the wall, including his Got Milk? ad, complete with milk mustache. A large flat-screen plays one of Wansink’s television appearances on an endless loop. There’s a glass trophy case filled with items Wansink has made famous, like the bottomless soup bowl that he used to prove that people will eat 73 percent more if their bowl is constantly replenished via a hidden tube. A cool study, if there ever was one.
The lab has its own mock kitchen for use in experiments. It looks like a kitchen you’d find in any house — fridge, sink, stove. Though most kitchens don’t come equipped with two-way mirrors or six carefully placed cameras that allow researchers to observe what goes on from every possible angle.
Also in the mock kitchen, taped to the wall, are several big sheets of paper outlining how the lab should respond to the growing crisis in confidence. The handwriting is Wansink’s and includes items such as "Errata to Journals" and "New SOPs" — meaning, presumably, standard operating procedures.
This isn’t the first time Cornell has had to cope with a blow to its research reputation. In 2011, Daryl Bem, an emeritus professor of psychology, published a paper in which he showed, or seemed to show, that subjects could anticipate pornographic images before they appeared on a computer screen. If true, Bem’s finding would upend what we understand about the nature of time and causation. It would be a big deal. That paper, "Feeling the Future," was widely ridiculed and failed to replicate, though Bem himself has stood by his results.
Bem, however, could be dismissed as a quirky psychologist poking at conventional wisdom. He wasn’t in charge of a major lab. Cornell’s business school uses Wansink’s face to illustrate its faculty and research webpage. The university boasts about its "spirit of engagement" with the public, and no one has demonstrated that spirit more ably or consistently than Wansink.
Late that afternoon, I finally get Wansink on the phone. Some colleagues at Cornell wanted him to stay out of the spotlight, he told me. He had posted a heartfelt message on his blog a couple of days before, talking about his professional and personal struggles, including rumors about his sexuality and failure to get tenure early in his career, and apologizing profusely for the "negative attention" he had brought to the lab and the field. Maybe, his colleagues thought, it would be better if he didn’t say anything else publicly for a while.
After some cajoling, he agreed to meet anyway, suggesting a McDonald’s near his house. It might seem strange for one of the nation’s best-known food experts to arrange a rendezvous at the nation’s most famous fast-food joint, but Wansink is no culinary snob. He once tweeted: "Eat McDonald’s. Not too often. Bring laptop" — a Big Mac-friendly twist on Michael Pollan’s "eat food" dictum.
Wansink was dressed in a green hoodie, jeans, and tasseled loafers. He is 56 years old, but he’s held onto his blond hair and boyish manner. Ordinarily lively, Wansink looked understandably worn and frazzled. He was drinking a medium-sized Diet Coke, which he refilled twice during our interview (he’s acknowledged being a diet-cola-holic). He’d been awake since 3 that morning writing remarks he would deliver later that week in front of the U.S. House Committee on Agriculture regarding possible changes to the Supplemental Nutrition Assistance Program, more commonly known as food stamps. He’d jotted a few notes for his congressional testimony on the back of his McDonald’s receipt.
Wansink is no stranger to Washington: He served as executive director of the Agriculture Department’s Center for Nutrition Policy and Promotion under George W. Bush, where he was responsible for managing the food pyramid. He was also involved with Michelle Obama’s "Let’s Move" program.
While Cornell officials and the lab’s deputy director put on a brave face, Wansink doesn’t make much effort to hide his distress. He estimates he hasn’t slept more than a couple of hours a night in almost two weeks, with the exception of one long stretch when, exhausted, he stayed in bed past noon. (This is dramatically out of character for Wansink, who, according to an admiring 2015 profile in Mother Jones, gets up at exactly 4:46 a.m. every day.) His anxiety over the multitude of errors found in those papers, and the uneasy sense that everything he’s ever done is now in doubt, has taken a toll. "This sucks worse than anything I’ve experienced in my life," he says.
And it’s not just the insomnia. A postdoc who had been planning to spend a year at the Food and Brand Lab called to say that she wasn’t going to come after all, fearing that the lab’s troubles might reflect poorly on her budding career. Wansink also heard from frantic co-authors, some of whom he collaborated with a decade or more ago. They’re rerunning calculations, double-checking data, terrified that their papers, too, might be deemed suspect.
He’s spoken to Ozge Sigirci, the graduate student who never said no and who is now seeing her publications pilloried. "Every time I talk to her, she is in tears about things," Wansink says. (Sigirci did not respond to an email requesting comment, nor did Kevin M. Kniffin, a visiting assistant professor of organizational behavior and leadership at Cornell, who is a co-author on one of the papers).
Wansink says he was unaware of the so-called replication crisis in social science. He had never heard the term "p-hacking" until he was accused of it. Wansink hasn’t been able to explain how a set of results from his lab could turn out to be so fundamentally flawed, but he does argue that field studies, like the one conducted at the Italian restaurant, aren’t the same as laboratory studies. According to Wansink, such studies should be considered exploratory — that is, the results should be taken with a grain of salt considering the inherent difficulty in conducting studies outside a controlled setting. "Science is messy in a lot of ways," he says.
A note he posted after the concerns became public indicated a willingness to share the lab’s data, seemingly a gesture toward transparency. But before Anaya, Brown, and van der Zee published their paper, they sent the lab an email requesting the original dataset. Wansink says that, at first, he was going to share the data with them even though it meant jumping through some hoops because of rules regarding subject anonymity. But once he realized what they were doing, he changed his mind. "If they were interested in coming up with something that’s constructive, that would be one thing. But that wasn’t really the way they responded back," he says. "There was the hassle of bringing them on board when it seemed not intended toward helping move science forward."
Brown doesn’t buy Wansink’s logic. "Science advances by, among other things, verification," he says. He notes, too, that a journal that published one of the papers in question requires researchers to share data upon request.
Wansink says he’s planning to take steps to ensure such errors never happen again. His first move was to ask a postdoc in his lab to re-examine the disputed calculations using the original dataset, which they had to hunt for and finally found in a file labeled "pizza." Also, Wansink wrote on his blog that the lab will put in place new procedures to provide "guidance for collecting, analyzing, reporting, and storing data" that will be "very useful in the future in tightening up operations." He hopes his lab will one day serve as a paragon of proper scientific methods. "My dream would be that in a year we would be seen as the gold standard," he says.
For that to happen, those four flawed papers would need to be seen as an unfortunate aberration rather than evidence of an ingrained problem. Right now Wansink’s critics lean toward the latter conclusion. They’ve publicly flagged flaws in 10 other papers so far, and there are more in the pipeline. On a whim, I asked Brown about a 2016 paper by Sigirci and Wansink titled "How Traumatic Violence Permanently Changes Shopping Behavior." The paper found that combat veterans were more willing to try new products. Brown hadn’t looked at the paper closely before, but after an hour of analysis he identified "34 statistical or numerical reporting errors, plus several implausible properties of the sample."
He sent me the highlighted version. That paper, too, lit up like a Christmas tree.
Tim van der Zee keeps a running list of Wansink publications that appear to have either statistical discrepancies or some other issue, like recycled data. That number is now at 26. "Personally, I’m not going to believe any paper he published," says van der Zee, who calls his blog "The Skeptical Scientist."
Like Anaya and Brown, van der Zee says he’s not driven by any animus toward Wansink; none of them had heard of him before they started dissecting his research. Instead he’s motivated by frustration with a scientific establishment that too frequently rewards dubious work, that seems to prefer flashiness over rigor. "It’s a disgrace for the field that we are three nobodies and we are the ones that have to discover this when it’s been out there in the published research for years," he says.
Except for a generic statement supporting "open inquiry," Cornell officials have remained mostly silent. In an internal email sent last month, the business school’s deputy dean, Christopher B. Barrett, wrote that "in the absence of any clear allegation of scientific misconduct, there are no current plans for any formal inquiry or disciplinary actions." He went on to say that the university had "strongly encouraged Brian to sincerely and swiftly address the serious criticisms leveled against his work" and expressed "deep concern over the suggestion that any of our community’s research does not satisfy the high standards of excellence that we hold dear."
Wansink still hopes to prove that his research lives up to that high standard. But he’s well aware that academic credibility is a fragile commodity and, in his less-optimistic moments, worries that the damage to his reputation has already been done. "I still think that most of our stuff is really, really rigorous," Wansink says. "What I’m upset with is that it may not be seen as such anymore. That’s my disappointment: That all my amazing work, now people will say, ‘Yeah, but I wonder.’"
Tom Bartlett is a senior writer who covers science and other things. Follow him on Twitter @tebartl.