Contemporary science faces several interrelated crises. Competition for tenure-track jobs is getting stiffer every year, thanks to an ever-increasing supply of talented, young Ph.D. students; not enough is being done to prepare doctoral students for jobs outside of academe; candidates for junior faculty positions must submit so many research papers that journals, editors, and reviewers can’t keep up; and too many published results aren’t reproducible. All of that is inseparable from the decline in the mental health of graduate students, driven largely by feelings of loneliness and isolation.
To deal with those issues, we should look at the axis around which the whole academic enterprise spins — the publication process, specifically of papers, which are the gold standard of scientific productivity. We must unbind the sharing of scientific knowledge from the traditional journal format and explore radically creative new ways to communicate with our colleagues. Software is one obvious solution.
The reality is that the scientific paper is outdated. It first appeared in the 1600s with the rise of scientific journals, and its basic format — a static document with text, figures, and references — has not fundamentally evolved in the intervening centuries. That format was chosen because, given the technology of the time, it was the most efficient way to share scientific knowledge. In 2020, that is no longer the case.
Scientific discoveries are a mix of raw data and inferences made from that data. A major challenge is transforming the data into knowledge. As pointed out 30 years ago by the geophysicist Jon Claerbout and articulated by the researchers Jonathan B. Buckheit and David L. Donoho, “an article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship.” Their argument was that computer code, or software, is the scholarship itself. It generates the summaries and figures that our brains can digest.
That’s why, rather than maximizing their publication output, scientists should spend more time and effort on software development. This could mean contributing to existing open-source projects on which their research depends or developing new software related to their research. In both activities, it’s crucial to recognize that “software,” as opposed to just “code,” includes many additional components, such as documentation, examples, and tests, which provide rich context for the code itself.
For example, an oceanographer who develops a new way to analyze satellite data could share that knowledge with the community via a software package that includes not only the basic code to perform the calculation but also comprehensive online documentation about the theory and implementation; a gallery of examples, including real-world scientific analyses; and a test suite to verify correctness. That contribution would include all the information contained in a scientific paper, plus so much more. Packaging knowledge in this way renders it immediately usable and extensible by the entire scientific community.
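To make that concrete, here is a minimal sketch of what such a package might look like in miniature. Everything in it is hypothetical: the function name `sst_anomaly`, the choice of a sea-surface-temperature anomaly as the calculation, and the sample numbers are assumptions made for illustration, not a real oceanographic library. The point is only that the code, its documentation, a worked example, and a test can live side by side.

```python
"""Hypothetical mini-package: sea-surface temperature (SST) anomalies."""
from statistics import fmean


def sst_anomaly(temperatures):
    """Return each temperature minus the mean of the series.

    Theory: an anomaly is the departure of a measurement from a
    reference value; in this sketch the reference is the series mean.

    Example
    -------
    >>> sst_anomaly([10.0, 12.0, 14.0])
    [-2.0, 0.0, 2.0]
    """
    if not temperatures:
        raise ValueError("temperatures must not be empty")
    reference = fmean(temperatures)
    return [t - reference for t in temperatures]


def test_anomalies_sum_to_zero():
    """Test-suite entry: anomalies about the mean should sum to about zero."""
    result = sst_anomaly([18.5, 19.0, 20.5, 22.0])
    assert abs(sum(result)) < 1e-9


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # runs the example embedded in the docstring
    test_anomalies_sum_to_zero()
```

The docstring doubles as documentation and as a runnable example, so a reader can check the explanation against the code rather than taking it on trust.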
Beyond the actual code and documentation, the open-source software-development world offers many benefits, both technical and cultural, to the general scientific practice.
First and foremost, open-source software development is more collaborative and less competitive than mainstream science. That makes for a healthier work environment. Open-source, community-driven projects often involve dozens of contributors, most of whom have never met in person, working in close collaboration toward a shared goal. The work itself (discussing issues, debugging problems, and writing and reviewing code) involves fast-paced interactions with rapid feedback and progress on time scales of days or weeks.
Success is measured by the level of use and impact of the project as a whole. Scientists who become involved in open-source development often comment that it’s simply more gratifying, and less lonely, than writing scientific manuscripts. It becomes the part of their job they enjoy the most.
Second, software development offers a set of practices, broadly termed “continuous integration,” that automate many parts of the development process using cloud computing. That can help with reproducibility. For example, when a scientist proposes a new contribution, tests are automatically run that verify that it meets all of the project’s requirements in terms of correctness, documentation, code style, etc. That makes it easier to trust new contributors, expanding the community and enhancing the collaborative atmosphere.
Continuous-integration practices could be adapted to scientific research, permitting us to build scientific artifacts that are continuously updated by many different contributors to incorporate the latest data and methods, a stark contrast to the static nature of traditional publications. Rather than a few individuals publishing isolated nuggets of insight, continuous integration could allow us to collaborate on more-ambitious, large-scale projects. Using technologies such as Binder, an open-source tool that allows anyone to recreate a researcher’s technical environment, we could make reproducibility a built-in property of all our scholarly communications.
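One small, practical step toward a recreatable environment is recording exactly which package versions an analysis used. The sketch below assumes a workflow in which those versions are written to a `requirements.txt` file, one of the configuration files Binder can build an environment from; the function name `pinned_requirements` and the package names passed to it are hypothetical, chosen only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return (pins, missing) for the given package names.

    `pins` holds "name==version" strings for packages installed in the
    current environment; `missing` holds names that are not installed.
    """
    pins, missing = [], []
    for name in packages:
        try:
            pins.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pins, missing


if __name__ == "__main__":
    # Hypothetical dependency list for an analysis.
    pins, missing = pinned_requirements(["numpy", "xarray"])
    print("\n".join(pins))  # lines suitable for a requirements.txt file
    if missing:
        print("not installed:", ", ".join(missing))
```

Pinning versions this way does not guarantee identical results everywhere, but it removes one common source of "it worked on my machine" discrepancies.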
Beyond the potential to transform and modernize our scientific-research process, an increased emphasis on software over publication would also help science students pursue alternative careers. Students in many fields, for example astronomy, oceanography, and genomics, are often already proficient in analyzing large, complex data sets. That’s an attractive skill to many employers.
But writing scientific papers is a niche skill of limited value outside the academy. Equipping students with better software-engineering skills would help them transition more easily to industry roles in tech and engineering. Students who could show a track record of contributing to open-source projects while still in graduate school would have a great way of demonstrating their expertise.
There are only 24 hours in a day; if scientists spend more time on software, they will have to spend less time on something else (e.g., writing papers). To use a software-derived phrase, this is a feature, not a bug, of my proposal. However, reducing the value of publication as the currency of scientific productivity and placing a greater emphasis on software would not be an easy transition. We would have to figure out how to evaluate the impact of software. That’s a thorny problem, especially given that software isn’t created in a bubble — it depends strongly on other people’s technological contributions.
It’s also not clear that simply “citing software,” as we now do for publications, is the solution. Search committees would have to become more informed about the scientific-software landscape and culture in order to directly assess a candidate’s software contributions. Such radical changes in perspective may require generational shifts.
There will be challenges to overcome, but an increased focus on software could be the solution to many of modern science’s crises. Besides making our discoveries more robust, it could allow us to have more fun, feel less lonely, and broaden our career possibilities.