Thursday, August 10, 2023

The Data Fraud Case - Still More

We have been blogging about the Harvard data fraud case recently.* As blog readers will know, the individual at the heart of this brouhaha has filed a lawsuit against Harvard and against the researchers who identified issues suggesting data manipulation.

Although it is amusing that one of the papers identified involves honesty as its topic, an article in Vox argues that the lawsuit is not so amusing in its implications for honest research.

Is it defamation to point out scientific research fraud?

A Harvard professor accused of research fraud brings a multimillion-dollar lawsuit against the university and her accusers. What comes next?

By Kelsey Piper, August 9, 2023

A few weeks ago, I wrote about Francesca Gino, a researcher on dishonesty who last month was placed on administrative leave from Harvard Business School after allegations of systematic data manipulation in four papers she co-authored. The alleged data manipulation appeared, in a few cases, chillingly blatant. Looking at Microsoft Excel version control (which stores old versions of a current file), various rows in a spreadsheet of data seem manipulated. The data before the apparent manipulation failed to show evidence of the effect the researchers had hoped to find; the data after it did.

In total, three researchers — Joe Simmons, Leif Nelson [of UC-Berkeley], and Uri Simonsohn — published four blog posts to their blog Data Colada, pointing out places where the data in these papers shows signs of being manipulated. In 2021, they also privately reported their finding to Harvard, which conducted an investigation before placing Gino on leave and sending retraction notices for the papers in question.

Gino is now suing the three researchers who published the blog posts pointing out the alleged data manipulation, asking for “not less than $25 million.” (She is also suing Harvard.) Her argument is that because of the allegations of fraud, she lost her professional reputation and a lot of income. (Harvard Business School professors can make a lot of money through speaking appearances and book deals). I reached out to Gino for comment earlier this week but did not hear back before publication deadline.

Gino’s lawsuit argues that the researchers failed to consider other explanations for the “anomalies” in the data sets analyzed; that Harvard’s investigation into the allegations was “unfair and biased,” and that Harvard’s punishment was “overly harsh” — harsher than similar punishments when male professors were credibly accused of research misconduct.

Checking papers for data manipulation is good work

While Harvard, an institution with an endowment of more than $50 billion, has plenty of resources to defend itself, the lawsuit also targets Simmons, Nelson, and Simonsohn personally. They’re academics, and they’re not billionaires. Having to defend themselves in a defamation lawsuit is likely to be a substantial imposition for them.

Does Gino actually stand a chance of winning millions from them if the case goes to trial? Probably not.

If their statements are statements of opinion (“from comparing the data sets, I feel that Ms. Gino’s work was manipulated” would be an example of a statement of opinion), they are defensible. If they are statements of fact (like “the data in Table 3 has been manipulated in Excel to change the result”) and they are true, then they cannot be defamatory. If they’re false but the authors weren’t negligent in their publication (for instance, if they can show that they substantiated their claims with adequate sourcing and considered other perspectives), then they may be defensible.

The problem, though, is that it will take years — and be extraordinarily expensive — to settle the factual question in court of whether the statements are true. “Your goal [as a defendant in a defamation case] is generally not ‘I’m going to win this at trial.’ Your goal is ‘I’m going to knock this out at the pretrial stage,’” attorney Ken White told me. “What you want is to be able to get away from all the stress and extreme expense of going through a defamation case and discovery; you want to knock it out at a motion to dismiss the case.”

But at that stage, the courts won’t evaluate complex questions of fact, like whether data was in fact manipulated. You can get a case dismissed by arguing successfully that your statements were statements of opinion, but a debate that turns on whether they are true or false may well go to trial. White says, “It’s very rare you can win a motion to dismiss on the theory, ‘Actually this was true.’”

“The process is the punishment”

Having read her case and spoken to defamation experts, I think Gino is unlikely to win at trial.

Gino would have to demonstrate that the claims the bloggers made aren’t true and that the bloggers should have known that. But the details contained in her lawsuit provide far more evidence that shows it was more likely that their claims were true and/or that they were not negligent in reaching those conclusions. Included in the appendices to the lawsuit are the analyses of the study data by an independent forensic firm Harvard hired to examine the situation. While the Data Colada researchers had to rely on public data for their analysis, they hypothesized that Harvard would be able to get more data from Qualtrics and other sources and compare it to the public data to find more evidence about whether manipulation happened. It appears that that’s exactly what Harvard did.

“The analysis of files demonstrated an apparent series of manipulations to a dataset prior to its publication ... Both the earlier version and the latest version of the data available for review were created in 2012 by Dr. Gino, and last saved by Dr. Gino, according to their Excel properties,” the forensic review of the 2014 paper concludes.

“There appear to be multiple discrepancies in certain score sets between the original data source (“Qualtrics Data”) and public repository data associated with the 2020 JPSP Paper (“OSF data”). ... Utilizing the same analyses for the Qualtrics data demonstrates that outcomes a) appear contrary to reported study effects, and b) have lower (or no) statistical significance,” the review of the 2020 paper concludes.

At this point, multiple independent examinations of the data have concluded that it appeared manipulated, in many cases in ways that made the authors’ hypothesis come true. This occurred across multiple papers that Gino co-authored.

Truth is always a defense to block defamation claims. But the lawsuit can still substantially harm the defendants even if the courts eventually find that they were telling the truth. “The system is so broken ... that a case like this will cost hundreds of thousands of dollars and go on for years,” White told me. “Realistically, you could wind up going to trial, and even if you’re going to win at trial eventually you’re going to be ruined doing it.”

“The process,” he added, “is the punishment.”

Overall, it looks like Data Colada’s concerns were backed up by independent forensic analysis. New data was uncovered that supports their case for data manipulation. Harvard agreed with them. And yet they’re still being sued for defamation. “If Data Colada handled this case poorly enough for it to go to litigation,” Erik Hoel writes, “it’s hard to imagine what an ‘ideal case’ of exposure would look like. It’d have to be perfect in language, perfect in analysis, perfect in conclusions, with ‘smoking guns’ and all. That’s a pretty high bar for science.”

In practical terms, that bar is pretty much unattainable, which means that scientists who have a lot of evidence of a problem in research will be increasingly hesitant to come forward.

“You can basically bully people out of science”

Whether the patterns found in the data in these four papers point to deliberate manipulation and fraud or could be explained by an innocent mistake is fundamentally a scientific question. It’s best resolved with data analysis and open debate — with everyone trying to explain how the observations in the data might have come about, and then checking version control and other sources to see if their theories are correct.

But often, scientists whose theories are challenged are trying to resort instead to silencing their critics with the courts. “I’m representing at this moment three people who are scientists or science bloggers or science observers who have been threatened for what they write,” White told me. “People are increasingly prone to sue critics or threaten to sue them rather than publish a rebuttal.”

Critics are especially vulnerable to this because many of them are graduate students or early-career academics who don’t have job security or the resources to endure a protracted legal battle. Even those who have more secure jobs sometimes suffer career consequences. David Sanders, a biologist at Purdue University, was sued in 2017 for criticizing another scientist. He persuaded the university to cover his successful legal defense. But, he told the blog Retraction Watch, there were still career consequences: “It was conveyed to me that some of my investigatory endeavors, although they were not directed towards articles with authors from my University, were unwelcome. There was a concomitant withdrawal of resources from my laboratory and me.”

Faced with the possibility of careers being ruined and personal financial liability, many people might just choose to stay quiet when they see unmistakable signs of malfeasance. “A lot of important science gets done not by big institutions questioning things but by independent people like this,” White told me. “If you can file suits like this, you can basically bully and intimidate people like this out of science.”

We’d like to think scientific fraud is astoundingly uncommon. It isn’t.

One of the papers at the center of the present controversy — a 2012 paper on dishonesty — had, in fact, already been retracted before this scandal because, in an unrelated scandal, the data in a different one of the three experiments that made up the study had been manipulated or falsified by a different person.

It’s an unwelcome reminder that while we tend to assume that peer-reviewed academic work doesn’t contain falsified or manipulated data, the peer review process doesn’t have a consistent mechanism to catch data fakery. And these studies with falsified data don’t just inflate resumes; they have real-world consequences. In the early months of Covid-19, data fraud clouded our understanding of which Covid treatments actually worked.

In the Covid case, as in this one, the fraud was uncovered by a small team of researchers doing their anti-fraud work on top of their day jobs. They started calling hospitals and reviewing spreadsheets, finding obvious signs of malfeasance like a study that was just the same 11 data points repeated over and over and over, or a study reportedly conducted at a hospital that says it never participated in the study. It’s hardly rare for such major issues to be uncovered only by a stubborn individual or small team. In another high-profile case that took down the president of Stanford University, data problems were discovered by a determined student journalist.

“The entire scientific community operates on trust,” Gideon Meyerowitz-Katz, an epidemiologist involved in looking into Covid data fraud, told me at that time. “There is this assumption in research that if someone tells you they have done something, then they have done it.”

If other researchers didn’t occasionally dig into weird results and look for signs of manipulation, many cases of data falsification would never be noticed at all, and indeed, many skated by unnoticed for years. That’s what makes the lawsuit against the Data Colada authors such a problem. It is aimed directly at the strongest mechanism for identifying data manipulation in academia today: other researchers digging in and raising questions about studies. If it succeeds — or even if it just drags on expensively for a while — it will make future academics who notice something off in others’ work more reluctant to speak up about it. And that’s a serious disservice to science.

Source: https://www.vox.com/platform/amp/future-perfect/2023/8/9/23825966/francesca-gino-honesty-research-scientific-fraud-defamation-harvard-university.

===

*http://uclafacultyassociation.blogspot.com/2023/08/the-data-fraud-case-continued.html and https://uclafacultyassociation.blogspot.com/2023/06/the-irony-of-this-being-story-about.html.
