Research Data Management
What is research data, what is its lifecycle, what is research data management, why should we apply it to our research, and what barriers might we face?
3. Benefits and Barriers
1. Data Security
The most important benefit of RDM is that you can secure your data. By making an effective research data management plan and adhering to data storage and organization standards, you minimize the risk of data loss and unauthorized access. You also reduce the risk of compromising the integrity of your data through accident or negligence.
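One common way to notice accidental changes to stored data is to record a checksum for every file and compare it later. The following is a minimal sketch of that idea, not a prescribed RDM procedure: the function names (sha256_of, build_manifest, changed_files) are hypothetical and chosen only for illustration, and it uses nothing beyond the Python standard library.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """List files from the manifest that are missing or no longer match."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest built when the data is first stored can be kept alongside the dataset. Running changed_files against it later lists any file that was altered or deleted in the meantime. It does not detect newly added files, and it does not replace backups or access control.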
2. Efficient Collaboration
The second major benefit of RDM is collaboration, especially in an age where research is more complex, with more moving parts. This complexity can be an advantage: multi-author studies compare favourably with single-author ones (Lamberts, 2013). Making data accessible to everyone in the group, and even to those outside the team but in the same discipline, can open up major opportunities to further your own research.
Plus, good RDM routines also improve the efficiency of data access. An organized data directory structure, for example, can make contributing data or building upon the existing dataset much easier. Efficient data organization also makes keeping tabs on the progress of the project much more seamless and puts accountability front and center.
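As an illustration of an organized data directory structure, the sketch below creates a small project skeleton. The folder names (data/raw, data/processed, docs, scripts, results) and the function name create_project_skeleton are assumptions made for this example, not a standard; your discipline or institution may recommend a different layout.

```python
from pathlib import Path
from typing import List

# Hypothetical layout: the folder names are illustrative, not a standard.
PROJECT_LAYOUT = (
    "data/raw",        # original data, never edited by hand
    "data/processed",  # cleaned or derived data
    "docs",            # documentation and data dictionaries
    "scripts",         # analysis and processing code
    "results",         # figures and tables
)


def create_project_skeleton(root: Path) -> List[Path]:
    """Create the folders in PROJECT_LAYOUT under root and a README stub."""
    created = []
    for relative in PROJECT_LAYOUT:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(folder)
    readme = root / "README.txt"
    if not readme.exists():  # never overwrite an existing description
        readme.write_text(
            "Describe the project, the data sources and the folder layout here.\n"
        )
    return created
```

Keeping raw and processed data in separate folders makes it clear to collaborators which files are original and which can be regenerated.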
3. Reproducibility of Research
The benefit of RDM to the entire scientific community is that it enables you and others to replicate your research and validate your procedures and results. This helps prevent accusations of data manipulation and other ways of falsifying scientific results (case).
4. Higher citation rate
According to several studies (Piwowar and Vision, 2013; Fu et al., 2023; Colavizza, 2020), publications with openly available data tend to receive more citations, which increases the value and impact of the research beyond the end of the project.
5. Societal benefits
By publishing your data, you give people outside the field and outside science the opportunity to look at your results, perhaps from a completely different perspective, which can lead to new discoveries and innovations (case).
Barriers
1. Limited incentives to give evidence against yourself
Putting your code and data online can be very revealing and intimidating, and it is part of the human condition to be nervous of being judged by others. Although there is no law governing the communication of reproducible research - unless you commit explicit fraud in your work - sharing errors that you find in your work is heavily disincentivised.
Giving evidence against yourself, particularly if you find mistakes in published material, is difficult and stressful. But we need to balance that individual cost against the fact that releasing code lets other researchers provide feedback, learn from your work and build on it in their own research. In fact, you will almost certainly find that publishing your code and data documentation motivates you to conduct your analyses to a higher standard. Being careful about what you write down, and documenting your decisions, can also help generate new ideas for yourself and for others.
Most importantly, we need to move away from a culture where publishing nothing is safer than publishing something.
2. Publication bias towards novel findings
Scientific journals favour articles with significant conclusions and findings, putting pressure on scientists to process the data until they get a result that can be published. Studies that produce negative results, or that replicate earlier studies, are often rejected and occupy only a fraction of journal space compared to positive ones. Too many different researchers ask the same question, don't get the answer they expect or want, and then don't tell anyone what they found. This is one of the major cultural barriers to transparent communication. We need to discuss and advocate for systemic changes in academia that will replace the current publishing and academic reward system, which favours novelty over rigour.
3. Held to higher standards than others
A researcher who makes their work reproducible by sharing their code and data may be held to a higher standard than other researchers. If authors share nothing at all, then all readers of a manuscript or conference paper can do is trust (or not trust) the results. If code and data are available, peer reviewers may go looking for differences in the implementation. They may come back with new ideas on ways to analyse the data because they have been able to experiment with the work. There is a risk that they then require additional changes from the authors of the submitted manuscript before it is accepted for publication. As described above, the solution to this challenge is to align career incentives so that doing what is best for science also benefits the individuals involved.
4. Takes time
Making an analysis reproducible takes time and effort, particularly at the start of the project. This may include agreeing upon a testing framework, setting up version control (such as a GitHub repository) and continuous integration, and managing data. Throughout the project, time may be required to maintain the reproducible pipeline. However, this investment is often amply repaid by the time saved when finding the data, explaining its content and modifying it during, at the end of or after the project.
Take as a thought experiment a reviewer asking for “just one more analysis” when the publication has been submitted to a journal. In many cases, this request will come 6 to 12 months after the research team have worked with the raw data. It can be very hard to go back in time to find the one part of the pipeline that the reviewer has asked you to change. If the work is fully reproducible, including version-controlled data and figure-generating code, this analysis will be very fast to run and incorporate into the final research output. The analysis pipeline can be easily adapted as needed in response to co-author and reviewer requests. It can also be easily reused for future research projects.
5. Requires additional skills
You - or someone in your team - might need to develop expertise in data engineering, research software engineering, technical writing for documentation or project management on GitHub. That is a major barrier when the current incentive structures are not aligned with learning these skills. In this course, we hope to guide you and help you learn some of these valuable skills and, together with the entire movement around RDM, work towards a paradigm shift in the evaluation of scientific results.