Reproducibility in Science
Reproducibility vs Replication
Reproducibility
- Redo a scientific experiment & generate similar results
- Same sample, software, data, code - same result?
Replication
- Different data, same methods - conclusions consistent?
Reusability
- Will someone be able to use your pipeline in the future?
- Will you be able to use it?
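One rough way to test the "same data, same code, same result?" question is to compare checksums of the output from two runs. The sketch below is only illustrative: the file names and the tiny table are hypothetical, and a byte-for-byte match is a stricter test than "similar results", so outputs containing timestamps or floating-point noise would need a more tolerant comparison.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical outputs from an original run and a rerun of the same pipeline.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "counts_original.tsv")
    rerun = os.path.join(workdir, "counts_rerun.tsv")
    for path in (original, rerun):
        with open(path, "w") as handle:
            handle.write("gene\tcount\ngeneA\t10\n")

    # Identical bytes give identical digests.
    same_result = sha256_of_file(original) == sha256_of_file(rerun)

print(same_result)  # True
```

The same function can be applied to input datasets, so that a later run can confirm it starts from the same data.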
The Reproducibility Problem
- Where did you do the analysis? Laptop, server, lab computer, environment
- Are you using the most recent version (scripts, datasets, analyses)?
- "We just used the default settings!"
Studies in Reproducibility
Nature (2016)
- More than 70% of researchers surveyed had tried and failed to reproduce another researcher's results
- More than 50% had failed to reproduce their own
PLoS Biology (2024)
- 72% of biomedical researchers surveyed reported a "reproducibility crisis"
Genome Biol (2024)
- Reproducibility in the bioinformatics era
Challenges of Bioinformatics
So many tools, often with:
- Multiple versions & releases
- Complex dependencies, hidden parameters & random starting seeds
- Running tools locally vs on HPC
- Formatting conversions between software
- Scalability - how tools handle datasets increasing in size
- Keeping code organized!
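Several of these challenges (versions, hidden parameters, seeds, local vs HPC) can be reduced by saving a small run record next to each result. The following is a minimal sketch, not a standard: the function names, the parameter names, and the package list (`numpy`, `pandas`) are assumptions made for illustration.

```python
import json
import platform
import random
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def build_run_record(params, packages):
    """Gather where the analysis ran and which settings it used."""
    return {
        "hostname": platform.node(),  # laptop, server, or HPC node
        "os": platform.platform(),
        "python_version": platform.python_version(),
        "command": sys.argv,
        "package_versions": {name: installed_version(name) for name in packages},
        "parameters": dict(params),  # defaults written out, seed included
    }


def subsample(items, k, seed):
    """Draw k items with a private generator seeded explicitly."""
    rng = random.Random(seed)
    return rng.sample(items, k)


# Hypothetical settings, spelled out even where they match the defaults.
params = {"subsample_size": 5, "seed": 42}
record = build_run_record(params, packages=["numpy", "pandas"])

reads = [f"read_{i}" for i in range(1000)]  # hypothetical read IDs
first = subsample(reads, params["subsample_size"], params["seed"])
second = subsample(reads, params["subsample_size"], params["seed"])

record_json = json.dumps(record, indent=2)
print(record_json)
print(first == second)  # True: same seed, same subsample
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator keeps the seed local to the step that needs it, so other code that draws random numbers does not change the result.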