Few activities in human history have had more impact on our lives than science. And yet, the impact of new scientific research is constantly being measured, often in ways that seem to cast doubt on whether this research is even worth the expense. Why?
The answer has more to do with the performance of research than the idea of research. Research (not just in the natural sciences but in the social sciences and humanities as well) is massively more expensive than a generation ago—up at least 10-fold in constant dollar terms over the last 50 years—and the subject matter of research is also increasingly complex and siloed. Keeping track of who is spending what and for what reason is hard work. Doing so in a way that is actually helpful and meaningful—at least for researchers—is also difficult. What we find more often than not in today’s world of research impact assessment is that impact measurements aren’t really measuring research at all but factors that maybe, at best, are tangentially related to research. These impact measurements also often create impacts of their own.
Consider the case of research publishers. Different stakeholders in the research ecosystem focus on different dimensions of impact and use different metrics. Research publishers (mostly commercial scholarly journal publishers) generate volumes of statistics on the millions of articles published every year in order to better understand industry trends and customer needs, develop new titles as needed, and adjust pricing. Their best-known tool for assessing and labeling “impact” is the Journal Impact Factor (JIF), which measures the citation performance of journals over a two-year period: roughly stated, how often other researchers cite a journal’s recent articles in their own work. Publishers aren’t content simply to measure impact using this tool, however. They also actively promote the impact factors of their journals because they can make more money from journals with the highest JIFs. By making certain journals appear more prestigious and impactful, publishers can levy higher subscription fees and author publishing charges for these products.
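To make the two-year arithmetic concrete, here is a minimal sketch, assuming the commonly cited definition of the JIF: citations received in a given year to items a journal published in the previous two years, divided by the number of citable items it published in those two years. The function name and the figures are hypothetical, chosen only for illustration.

```python
def journal_impact_factor(citations_this_year: int, citable_items_prior_two_years: int) -> float:
    """Return citations received this year to items from the prior two years,
    divided by the number of citable items published in those two years.

    Hypothetical helper for illustration; not a publisher's actual tooling.
    """
    if citable_items_prior_two_years <= 0:
        raise ValueError("a journal needs at least one citable item")
    return citations_this_year / citable_items_prior_two_years


# Hypothetical journal: 200 citable items published over the two prior years,
# which together received 500 citations this year.
jif = journal_impact_factor(500, 200)
print(jif)  # 2.5
```

Because the result is an average across a whole journal, it says nothing about how often any single article in that journal is cited.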
Researchers are attracted to publisher claims about impact like moths to a flame because, as survey after survey has shown over the years, they care more about making an impact than just about anything. They want their work to be useful and to make a difference. Researchers also want visible credit for their work so they can advance their careers in terms of retention, promotion, tenure and grant funding. Publishers create an illusion of prestige and impact, and researchers believe it. So do universities, funders, and governments, which all value prestige publishing more than other kinds of publishing because they believe it’s a proxy for higher impact research work. Since journals are the most important way that researchers share information, researchers (especially those based at universities, who author around 80% of the articles published in research journals) are particularly vulnerable to prestige baiting and particularly reliant on using impact factors to judge the impact of research.
This isn’t the only impact measure researchers care about, of course. Researchers are also judged (by their colleagues, as well as by publishers, universities, and funders) by how many articles they author or coauthor, which fuels the well-known “publish or perish” syndrome, and by the dollar value of the research grants they receive, the number of patents awarded, books published, dissertations directed, the reputation of their institutions, opinions of colleagues, notoriety, tenure status, position, awards, and so on. Academics have plenty of status symbols, and as in the rest of society, there’s a certain presumption that collecting enough of these symbols means you’re important. There’s also a presumption (often an inaccurate one) that once you are so labeled, the work you do must be meaningful and impactful. And as in the rest of society, it can take time, barring a remarkable discovery, for an early career researcher to gather enough symbols for their work to be considered impactful enough to merit tenure or large grant awards.
Publishers employ other impact assessment techniques as well, such as desk evaluations by editors to determine whether a research paper will be of interest to their readers, whether it should be distributed for peer review, and whether it is worth the additional investment of time and money to transform into a published article. Most papers get rejected at least once during this process, but most eventually get published in some journal somewhere (there are tens of thousands—no one is sure exactly how many, in fact). Meanwhile, even though prestige journals reject around 90% of all submissions, and publish less than 1% of the world’s research, the attention the articles in these journals receive in research and media circles exerts an outsize influence on public perceptions and researcher incentives about which research is and is not “impactful” and worth funding.
For top-tier research universities (known as R1 universities), the tens of billions of dollars in research funding awarded annually by government agencies like the National Institutes of Health and the National Science Foundation, particularly grants in the life sciences sector, have an enormous impact on school employment, reputation, and budgets. This funding has accounted for over half of total R&D spending by US universities in recent years. In addition, school rankings like those published by US News & World Report are calculated in part using research-related inputs and outputs: the number of researchers employed, the amount of research funding received, the number of journal articles published, and so on. These rankings create additional impacts on school reputations, which can translate into larger endowments, higher enrollment demand, and better recognition, retention, and funding for researchers.
Nonprofit funders also play a role in impact evaluation. Most research funding comes from industry and government, but nonprofits like the Gates Foundation, Wellcome Trust, and Max Planck Institute wield a lot of influence when it comes to measuring impact. Funders of all stripes want to know if the money they spend on research makes a difference, but nonprofits have been global leaders in the push for open access and other mechanisms that help improve the impact of research through more robust transparency and reliability, and that help translate research into societal impact. Today, most of the research funding awarded by nonprofits comes with the string attached that published work and data must be made freely and immediately available, in order to improve the accessibility of this work to other researchers and the general public. More and more governments in recent years have been following this lead and instituting similar requirements.
Business funders take a different approach. Most of the world’s applied and experimental research (which dwarfs the basic research conducted primarily by universities) is funded and conducted by the business sector. In addition, most patents are awarded to business-based researchers and their companies. So rather than guessing which research is likely to have the most impact, or worrying about impact factors and citations, or making research free to read (it isn’t; industry research is often completely secret and not published in journals at all), businesses are ultimately concerned with whether research translates into patents or profits. This approach might seem incongruous with a university’s approach, but the two actually sit on a continuum. The basic research done by universities fuels the experimental and applied research done by businesses. All R1 universities have technology transfer centers that try to push research out of their settings and into the business sector via patents, licensing agreements, and spin-offs. University researchers also work closely with industry when it comes to publishing, and often make the leap to industry so they can focus more intently on bringing their high-impact ideas to fruition (especially in high-demand areas like artificial intelligence).
Government impact evaluation systems are a relatively new development in the history of science, starting in the post-World War II period as government spending ballooned and spending oversight ballooned along with it. But the complex government spending oversight mechanisms and rituals we see today really only took hold globally in a big way starting in the 1980s. As these evaluation needs evolved, governments developed metrics to try to measure impact and ensure that their increasingly large budgets remained accountable to taxpayers. For example, all major U.S. government research funding agencies today use expert review panels to assess why the research being proposed matters, how the research is new, what the return on investment will be, and much more. These agencies must also suffer the sometimes witheringly skeptical oversight of Congressional budget appropriators. To weigh the impact of completed research, governments also collect mountains of economic statistics on the R&D sector, which get fed back into decisions about which areas of research provide a better return on investment for society. Like businesses, government funders are also closely tied to universities because university researchers conduct most of the world’s basic research, and most of the funding for this work comes from governments.
These evaluation systems in the US and elsewhere around the world share three main traits. First, they aren’t necessarily objective, as hinted above. These processes are guided by, and sometimes even hijacked by, politicians who approve research budgets and manage oversight. In the 1970s and 80s, US Senator William Proxmire regularly scuttled research by handing out “Golden Fleece” awards for work that he thought lacked adequate public benefit. More recently, the Trump administration created a policy (later reversed) that would have made it harder for the Environmental Protection Agency to use science that, in the estimation of the agency’s politically appointed leadership, was too “secretive” to be trusted.
Second, most research funding and publishing happens in STEM (science, technology, engineering and math), so most evaluation systems and metrics are STEM-focused. These systems are often a bad fit for social science and humanities researchers, who might struggle to quantify in this evaluation framework why it’s important to study Civil War documents, or to learn more about why voter turnout is low (fields like political science routinely come under attack from government funders).
The third general truth is that we can only guess at the impact research will actually have, except for tallying up the outcomes that are quantifiable, like how much money is spent or how many people are directly employed. We especially have no idea how to gauge the impact of research across time—whether there might be a causal relationship between this work and future discovery or invention, and if so, how much. Einstein’s papers were not cited during the early years of his career, and he was widely mocked as being a misguided fool. If experts at the time in research, publishing, funding and government had assigned Einstein’s work an impact score, it would have been zero.
Still, impact evaluation is here to stay, for better and for worse. When billions of dollars are allocated from public funds every year, we obviously need systems of accountability to ensure this money is spent wisely. At the moment, though, researchers generally aren’t sold on whether the impact evaluation systems we have in place are fit for purpose. Grant funding is increasingly scarce and competitive, impact metrics aren’t actually measuring impact, and options might exist that haven’t been widely tested yet, like simply awarding grants through some kind of lottery system. In the meantime, many reform efforts are underway, often grounded in a philosophy that “open” and accountable science practices are the best way to protect the integrity of science and accelerate the progress of discovery.
Probably the most damning arguments against our current approach to impact evaluation fall into four categories:
What about science communication? Does it have a role to play in this debate? If science communication does become involved in the research impact evaluation world (and to date it has not been involved), then our goal should be to improve not only research impact evaluation but research impact itself. Why? Because one will empower the other: As we do a better job of understanding and communicating the real impact of research, as opposed to pretending that contrived metrics reflect this impact, research will have greater impact and at the same time will be able to separate itself from the negative effects of our current evaluation practices. The ultimate goal of research, after all, is to make an impact, not to be assigned a score. Somewhere along the way we’ve lost sight of this and today treat research funding more like an allowance than an investment. Finding the right balance between oversight and overreach is what’s needed now so science can be better protected from the negative feedback of our current impact evaluation systems, and in a real sense, be given more freedom to better serve society today and into the future.