They cannot all be ruled by one statistical analysis

A typical journal article contains only one analysis pipeline, run by one set of analysts. Even in the best-case scenario, there is reason to think that judicious alternative analyses could produce different results.

In 2020, for example, the UK Scientific Pandemic Influenza Group on Modelling asked nine teams of experts to calculate the reproduction number R for COVID-19. The teams were given a variety of data (deaths, hospital admissions, testing rates and so on) and chose from a range of modelling approaches. Although the question was clear, the estimates varied considerably between teams (see "Nine teams, nine estimates").

On 8 October 2020, the most optimistic estimate suggested that 100 people with COVID-19 would infect 115 others, but possibly as few as 96; the lower figure would imply that the pandemic was actually receding. The most pessimistic estimate was that 100 people with COVID-19 would infect 166 others, indicating rapid spread. Although there was consensus that the trajectory of the disease's spread was alarming, the uncertainty across the nine teams was much greater than the uncertainty within any one of them. It informed future work as the pandemic raged on.
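The arithmetic behind these figures can be sketched in a few lines. This is only an illustration of how the quoted numbers relate to R and to the two kinds of uncertainty; the function and variable names are hypothetical, and it uses only the three figures given in the text, not the teams' full estimates.

```python
def reproduction_number(secondary_infections, index_cases=100):
    """Convert 'index_cases people infect N others' into an R value."""
    return secondary_infections / index_cases


def trend(r):
    """R below 1 means each case causes, on average, fewer than one new case."""
    if r < 1:
        return "receding"
    if r > 1:
        return "spreading"
    return "stable"


# Figures quoted in the text for 8 October 2020.
optimistic_central = reproduction_number(115)   # most optimistic team
optimistic_lower = reproduction_number(96)      # that team's lower bound
pessimistic_central = reproduction_number(166)  # most pessimistic team

# Gap between the two most extreme teams' central estimates.
between_team_gap = pessimistic_central - optimistic_central
# Gap between one team's central estimate and its own lower bound.
within_team_gap = optimistic_central - optimistic_lower

print(f"optimistic lower bound: R = {optimistic_lower:.2f} ({trend(optimistic_lower)})")
print(f"pessimistic estimate:   R = {pessimistic_central:.2f} ({trend(pessimistic_central)})")
print(f"between-team gap: {between_team_gap:.2f}; within-team gap: {within_team_gap:.2f}")
```

On these figures alone, the distance between teams is larger than the distance between one team's central estimate and its own lower bound, which is the pattern the paragraph describes.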

These and other 'multi-analyst' projects show that independent statisticians rarely use the same method [2-6]. Yet in many fields, including psychology, ecology and materials science, a single analysis is considered sufficient evidence to publish a finding or make a strong claim.

Over the past ten years, researchers have become aware of how scientists can choose, from many valid statistical procedures, the one that leads to the most flattering conclusion. Less well understood is how limiting analyses to a single technique blinds researchers to an important aspect of uncertainty, making results appear more precise than they are.

To a statistician, uncertainty is the range of values that could reasonably be taken by, for example, the reproduction number of COVID-19, the correlation between religiosity and well-being [6], the correlation between cerebral cortical thickness and cognitive ability [7], or any other statistical estimate. The current scientific publishing model, which settles for a single analysis, entrenches 'model myopia': a limited consideration of statistical assumptions. This leads to overconfidence and poor predictions.

To assess the robustness of their conclusions, researchers should subject their data to multiple analyses. Ideally, these would be carried out by one or more independent teams. We recognize that this is a major shift in the way science is conducted, that the infrastructure and incentives to support it are not yet in place, and that many researchers will find the idea burdensome and impractical. Nonetheless, we believe that statistical inference would benefit substantially from broader, more diverse approaches.

Some 100 years ago, scholars such as Ronald Fisher pioneered formal methods for hypothesis testing that are now considered indispensable for drawing conclusions from numerical data. (The P value, often used to determine statistical significance, is the best known.) Since then, a wide variety of tests and methods have been developed to quantify inferential uncertainty. However, any single analysis draws on only a small number of these. As currently applied, uncertainty analyses reveal only the tip of the iceberg.

The dozen or so formal multi-analyst projects completed so far (see Supplementary Information) have shown that levels of uncertainty are far higher than any one team suggests. In the 2020 Neuroimaging Analysis Replication and Prediction Study [2], 70 teams used the same functional magnetic resonance imaging (fMRI) data to test nine hypotheses about brain activity during a risky decision task. One hypothesis, for example, explored how a brain region is activated when people consider the prospect of a large gain. About 20% of analyses constituted a 'minority report', with a qualitative conclusion that differed from that of the majority. For the three hypotheses with the most ambiguous results, around one-third of the teams reported a statistically significant result. Publishing work from any one of these teams would therefore have concealed considerable uncertainty and the spread of possible conclusions. The study's coordinators now recommend that multiple analyses be performed on the same data routinely.
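To make the idea of a 'minority report' concrete, the sketch below tallies the conclusions of several teams on one hypothesis and reports the share that disagrees with the majority. The team outcomes are invented purely for illustration and are not the study's data; `minority_share` is a hypothetical helper name.

```python
from collections import Counter


def minority_share(conclusions):
    """Return the majority conclusion and the share of teams that disagree with it."""
    counts = Counter(conclusions)
    majority, majority_count = counts.most_common(1)[0]
    return majority, 1 - majority_count / len(conclusions)


# Hypothetical outcomes for 70 teams on a single hypothesis (not real data).
conclusions = ["supported"] * 14 + ["not supported"] * 56

majority, share = minority_share(conclusions)
print(f"majority conclusion: {majority}; minority share: {share:.0%}")
```

A reader who saw only one of the minority teams' papers would have no way of knowing that most other analysts reached a different conclusion from the same data.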

Another multi-analyst project, in finance [3], involved 164 teams that tested six hypotheses. Again, the coordinators found that the differences in results were due not to errors but to the wide range of plausible analysis decisions and statistical models.

All of these projects have dispelled two myths about applied statistics. The first is that for every data set there is a single, uniquely appropriate analysis method. In reality, even when there are many teams and the data are relatively simple, analysts almost never follow exactly the same analytic process.

The second myth is that different plausible analyses will reliably yield similar results. We believe that whenever researchers report a single result from a single statistical analysis, a great deal of uncertainty is hidden from view. We support science reform efforts such as preregistration, large-scale replication studies and registered reports. However, these initiatives are not designed to reveal statistical fragility by exploring how plausible alternative analyses could alter conclusions. Formal methods, old or new, cannot cure model myopia, because they are grounded in the single-analysis framework.

We need another model. Model myopia can be treated by applying more than one statistical model to the data. Astronomy and high-energy physics have a long tradition of teams performing their own analyses of other teams' research once the data are made public. Climate modelers regularly perform "sensitivity analyses", adding and removing variables to determine how robust their conclusions are.
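As a rough illustration of a sensitivity analysis of this kind, the sketch below refits a linear regression once for every subset of optional control variables and records how the coefficient of interest moves. The data are synthetic, the names (`fit_effect`, `sensitivity_analysis`, `z1`, `z2`) are hypothetical, and NumPy is assumed to be available; it is not a description of how any particular field or team works.

```python
import itertools

import numpy as np


def fit_effect(y, x, controls):
    """Ordinary least squares with an intercept; returns the coefficient on x."""
    design = np.column_stack([np.ones_like(x), x] + list(controls))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def sensitivity_analysis(y, x, optional_controls):
    """Fit one model per subset of the optional control variables."""
    names = list(optional_controls)
    results = {}
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            controls = [optional_controls[name] for name in subset]
            results[subset] = fit_effect(y, x, controls)
    return results


# Synthetic, purely illustrative data: z1 influences both x and y,
# so leaving it out changes the estimated effect of x.
rng = np.random.default_rng(0)
n = 500
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
x = 0.8 * z1 + rng.normal(size=n)
y = 0.5 * x + 1.0 * z1 + 0.3 * z2 + rng.normal(size=n)

results = sensitivity_analysis(y, x, {"z1": z1, "z2": z2})
for subset, effect in results.items():
    label = ", ".join(subset) or "none"
    print(f"controls: {label:>6}  estimated effect of x: {effect:.2f}")

spread = max(results.values()) - min(results.values())
print(f"spread across specifications: {spread:.2f}")
```

A wide spread across specifications signals that the conclusion depends on modelling choices; a narrow spread suggests it is robust to them, at least within the set of models tried.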

For other fields to follow suit, journals, reviewers and researchers must change the way they approach statistical inference. Statistical inference should not be viewed as the reporting of the result of a single 'best' analysis. It should instead be viewed as a complex interplay between different plausible procedures and processing pathways [8]. Journals could encourage this practice in at least two ways. First, they could change their submission guidelines to encourage multiple analyses (possibly reported in an online supplement). This would motivate researchers either to conduct additional analyses themselves or to recruit more analysts as co-authors. Second, journals could invite teams to submit their own analyses as comments attached to a recently accepted article.

Large-scale changes in how science is done are possible. Expectations about data sharing are increasing. Medical journals now require clinical trials to be registered at launch before they will publish the results. However, proposals for change are inevitably met with skepticism. Here are five objections that we have encountered.

Won't readers get confused? There are currently no standards or conventions for how to present and interpret the results of multiple analyses. This could complicate reporting and make conclusions more ambiguous. We believe that the potential for ambiguity is a feature of multi-team analysis, not a bug. When conclusions are supported by only a subset of plausible models or analyses, readers should be made aware of it. It is better to face uncertainty than to sweep it under the carpet.

Aren't other problems more pressing? Empirical science has many problems, including selective reporting, poor transparency around analyses, hypotheses that are only loosely connected to the theories they are meant to support, and poor data sharing. These areas need to be improved. Indeed, the way data are collected and processed, and the way variables are defined, will have a significant impact on all subsequent analyses. But multi-analyst approaches can still provide insight. In fact, multi-analyst projects typically excel at data sharing, transparent reporting and theory-driven research. We see the solutions to these problems as mutually reinforcing, rather than as a zero-sum game.

Is it really worth the time and effort? Even those who see the benefit of multiple analyses may not see a need for them at the time of publication. They would rather encourage the original team to conduct multiple analyses, or let other researchers reanalyse shared data after publication. Both would be a significant improvement on the status quo, as sensitivity analysis is an underused practice. However, neither will provide the same benefits as multi-team analyses done at the time of publication.

Post-publication analyses are usually published only if they significantly undercut the original conclusion. They can lead to more squabbles than constructive discussion, and they appear only after authors and readers have already settled on the original analysis. Information about uncertainty is most useful at the time of analysis. We doubt, however, that a single team will be willing to expose the fragility of its own findings: it might be tempting to choose analyses that together tell a cohesive story. In addition, a single research team often has limited expertise in data analysis. Each of the nine teams that estimated R, for example, would probably feel uncomfortable coding and producing estimates with the other teams' models. Even for simple statistical scenarios, multiple teams can apply widely different statistical models and procedures.

Some skeptics doubt that multi-team analyses will consistently yield a wide enough range of results to justify the effort. We believe that the results of existing multi-analyst projects counter this argument, but it would be beneficial to have evidence from more projects. The more that multi-analyst approaches are used, the clearer it will become how and when they are valuable.

Won't journals baulk? One objection is that journals will be reluctant to accept our proposal because multi-analyst projects will require more work, be more difficult to present and evaluate, and may even require new article formats. We counter that the review and publication of a multi-analyst paper do not require a fundamentally new process. Multi-team projects have already been published by a number of journals, and most journals already publish comments attached to accepted manuscripts. We encourage journal editors to consider publishing multi-analyst projects. To test the waters, editors might organize a special issue of case studies. This should make clear whether the multi-analyst approach adds value.

Won't it be difficult to find analysts? One objection to our proposal is that most multi-team analyses published so far have been demonstration projects that combine many analyses into one paper, with lengthy author lists composed mainly of reform enthusiasts. Most other researchers would see little benefit in being minor contributors to a multi-analyst article, especially one outside their core research interests. We believe that enthusiasm has a broader base. For our multi-analyst projects, we have received more than 700 sign-ups in less than 2 weeks.

A range of incentives can also be used to attract analysts, including co-authorship, the chance to work on important questions, and the opportunity to collaborate with specialists. Other incentives and catalysts are easy to imagine. In a forthcoming special issue of Religion, Brain & Behavior, several teams will each publish their own conclusions and interpretations of the research question, so that each team is individually acknowledged. When a question is urgent, journals, governments and philanthropists should actively recruit or support multi-analyst teams.

Another approach is to include multiple analyses in training programs. This would be useful for researchers and eye-opening for statisticians. (At least one university already includes replication studies in its curriculum [11].) Participating in multi-analyst work should be considered part of being a good science citizen, and should be rewarded with better prospects for hiring and promotion.

Whatever the mix of incentives and formats, the more that multi-team analyses are discussed and implemented, the easier they will become. It will be important to study what makes multi-team efforts successful and how the practice can be improved. Acceptance and enthusiasm will grow as scientists learn how to conduct multi-team analyses and what can be learned from them.

We believe that rejecting the multi-analyst vision would be like Neo choosing the blue pill in The Matrix and continuing to dream of a comforting but false reality. Scientists and society would be better served by confronting the potential fragility of reported statistical results. Researchers and society need an indication of that fragility from the moment results are published, especially when the results have real-world implications. Recent many-analyst projects suggest that any single analysis can lead to conclusions that are overconfident and unrepresentative. The benefits of greater insight outweigh the extra effort.