What is research data management?
Research data management (RDM) refers to organising, documenting, formatting and storing your research data throughout your research project, in ways that support its discoverability, potential sharing and re-use, and preservation. These practices are informed by legal, statutory, ethical and funder requirements. This management is applied from the planning stages of a project through to day-to-day management and long-term preservation and sharing.
What we consider to be research data
The University of Surrey considers research data to be any material collected, observed, processed, or created for the purpose of analysis and on which research findings and outputs are based. This includes data and documentation which are commonly accepted in the scholarly community as necessary for validation or replication of research findings. Research data may be in digital or non-digital formats. This could include:
- Audio, video, and images or photographs
- Text documents and spreadsheets
- Code, scripts, algorithms, models, and software
- Protocols and methodologies
- Specimens and samples
- Collections of digital objects
- Lab notebooks, field notes, and diaries
- Questionnaires and codebooks
- Interview schedules and transcripts
- Test responses
- Slides and artefacts
Why manage your research data?
At its heart, good research is good research data management. Having a strategy for how you are going to manage your data and documentation during your project will make every stage of research easier and more secure, especially when it comes to sharing and preserving your data for verification and reuse.
Ways to manage research data
Consider including some of the practices below in your data management plan.
Keeping your files organised is often easier said than done. A simple file naming convention can help you quickly know what your files contain without having to open them. Common strategies for naming files:
- Use a consistent structure for sorting that makes sense for your project
- Names should be descriptive, but not too long
- Put important information in the file name itself rather than relying on folder names (e.g. sample01test04.txt, NOT sample01/test04.txt)
- Use numerical dates in YYYYMMDD or YYMMDD order
- Avoid spaces, full stops, or special characters in file names.
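The naming strategies above can be sketched as a small Python helper. This is only an illustration: the function name `build_filename`, the project/description/date ordering, and the use of underscores and hyphens as separators are assumptions, not a University convention. Adapt the structure to whatever makes sense for your project.

```python
import re
from datetime import date


def build_filename(project: str, description: str, day: date, extension: str) -> str:
    """Build a file name of the form project_description_YYYYMMDD.ext."""

    def clean(part: str) -> str:
        # Replace spaces, full stops, and special characters with hyphens.
        part = re.sub(r"[^A-Za-z0-9]+", "-", part.strip())
        return part.strip("-")

    stamp = day.strftime("%Y%m%d")  # numerical date in YYYYMMDD order
    return f"{clean(project)}_{clean(description)}_{stamp}.{extension.lstrip('.')}"


# Hypothetical example values.
print(build_filename("sample01", "test 04 (raw)", date(2024, 3, 5), "txt"))
# prints: sample01_test-04-raw_20240305.txt
```

Because the date is written as YYYYMMDD, files that share a prefix sort chronologically in an ordinary file listing.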
Managing research inevitably means managing versions of files. Software often has version control built in. If not, the simplest strategy is to use the file name to indicate the version. Whole numbers (1, 2, 3) can be used for major changes and decimals (1.1, 2.1, 3.2) for minor changes. For collaborative documents you can also include a log at the beginning of the file to record what changes have been made, by whom, and when. Take a look at an example of version control.
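As a rough sketch of version numbering in file names, the Python helper below works out the next version label for a file. The `_v1-2` tag format is an assumption made for this example: it writes version 1.2 with a hyphen in place of the decimal point so that the name contains no extra full stops. The function name `next_version` is likewise hypothetical.

```python
import re
from pathlib import Path

# Matches a trailing tag such as "_v1-2" (major 1, minor 2) at the end of the stem.
VERSION_TAG = re.compile(r"_v(\d+)-(\d+)$")


def next_version(filename: str, major: bool = False) -> str:
    """Return the file name with its version tag increased by one step."""
    path = Path(filename)
    match = VERSION_TAG.search(path.stem)
    if match is None:
        # No version tag yet: treat this as the first major version.
        return f"{path.stem}_v1-0{path.suffix}"
    major_no, minor_no = int(match.group(1)), int(match.group(2))
    if major:
        major_no, minor_no = major_no + 1, 0  # major change resets the minor number
    else:
        minor_no += 1
    base = path.stem[: match.start()]
    return f"{base}_v{major_no}-{minor_no}{path.suffix}"


print(next_version("protocol_v1-2.docx"))              # protocol_v1-3.docx
print(next_version("protocol_v1-2.docx", major=True))  # protocol_v2-0.docx
```

The helper only computes a name; saving the file under that name, and deciding whether a change counts as major or minor, remains up to you.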
For software and code development, GitHub is a popular free option aimed at open collaboration and sharing of code. Surrey also has its own GitLab if you need the same functionality with additional privacy or security. Learn a bit more about getting started with GitHub.
One of the best ways to keep track of versions is to delete the files you no longer need. If you want to keep some earlier versions of work, keep only those with major changes and delete the minor-change versions. Deleting unnecessary files is some of the best data management you can do.
For more information, check out Software Carpentry’s video on data management and version control and Edinburgh’s online module. Have evolving datasets with many versions? Datastorr for R might be able to help you develop a workflow for maintaining and distributing successive versions of datasets.
Before beginning it’s worth thinking about what file formats you will be creating and using during your research. You may want to consider:
- The quality, size, and compression needed
- Whether it is a widely used format in your field
- How easily the format can be shared, transformed, or exported
- The long-term viability of the format.
If possible, try to use open and non-proprietary formats or widely used formats to make your data more openly accessible and reduce their risk of becoming obsolete. York has good advice on how to future proof your file formats.
Data repositories often have recommended formats for deposit. If you have a repository in mind, check if they have any deposit requirements. For example, you can see the UK Data Service’s recommended formats for deposit.
Documentation is the foundation of good research and should be started early. It makes your research understandable, verifiable, and reusable – first for you and then for others. Imagining these future users can help you assemble the best documentation for your project.
Documentation can be embedded within research files, for example in code, scripts, headers, summaries, label descriptions or built-in program documentation. One of the best ways to ensure the quality of your data is to automate your data creation or analysis as much as possible, which in turn becomes indispensable documentation. Take a look at an example from biology.
Documentation exists at several levels:
- Project or study level: For example research questions, methods, instruments, and the context of data collection and analysis
- File or data level: For example what each file contains, how files relate to each other, the components, structure, and logic of data files
- Variable level: For example code books or data dictionaries with definitions of variables, ranges, etc.
- Metadata level: For example structured descriptions of a study or dataset consisting of defined elements to facilitate discovery and reuse, usually created as part of a data repository deposit; sometimes discipline specific. (All of Surrey’s shared and preserved data must have a metadata record in our repository.)
Other documentation practices to consider:
- Creating a README file for your project
- Using electronic lab notebooks (Cambridge’s guide and comparison table)
- Registered Reports
- Publishing your protocols.
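To make the variable level of documentation concrete, here is a minimal Python sketch that writes a data dictionary as a CSV file. The variable names, the column headings, and the function name `write_data_dictionary` are all hypothetical, invented for an imagined survey dataset. A real code book should follow whatever your discipline or repository expects.

```python
import csv
import io

# Hypothetical variables from an imagined survey dataset.
variables = [
    {"name": "participant_id", "type": "string",
     "description": "Anonymised participant code", "allowed_values": "P001-P999"},
    {"name": "age_years", "type": "integer",
     "description": "Age at time of interview", "allowed_values": "18-99"},
]


def write_data_dictionary(rows, handle):
    """Write one row per variable, with a header row, to an open text handle."""
    fields = ["name", "type", "description", "allowed_values"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


# Written to an in-memory buffer here; an open file works the same way.
buffer = io.StringIO()
write_data_dictionary(variables, buffer)
print(buffer.getvalue())
```

Keeping the data dictionary in a plain CSV file alongside the data means it stays readable without specialist software.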
Storage and collaboration
Two of the biggest risks to research data are accidental loss and unauthorised access. You can mitigate those risks by adopting a few simple practices for storing your data.
Use University storage
During active research the best place to house your data is on University storage, where it will be regularly backed up and subject to greater access controls. This includes the University’s SharePoint or OneDrive software.
If you have collaborators and are concerned about having stronger access controls to your files, then SharePoint can afford you more protection and control over files than OneDrive.
Surrey Drop-off can be used for temporary one-off transfers of large files between you and your collaborators.
Local hard drives, portable storage devices, laptops, and tablets are NOT recommended for research data storage. Don’t use third-party cloud storage (e.g. Dropbox or Google Drive), especially if you have sensitive data. These are not as secure or as protected as University storage.
If you don’t have reliable access to the internet during your research, then regularly sync your working copy with a master copy on University storage. Consider building in a schedule for syncing your data as part of your research workflow.
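A one-way sync of this kind could look something like the Python sketch below. It assumes the master copy on University storage is reachable as an ordinary folder on your machine. The function name `sync_to_master` and the folder paths in the usage note are hypothetical, and this is not a description of how any University service works. It copies new or more recently modified files from the working copy and never deletes anything from the master.

```python
import shutil
from pathlib import Path


def sync_to_master(working_dir: Path, master_dir: Path) -> list[Path]:
    """Copy new or more recently modified files from working_dir into master_dir."""
    copied = []
    for source in sorted(working_dir.rglob("*")):
        if not source.is_file():
            continue
        target = master_dir / source.relative_to(working_dir)
        if target.exists() and target.stat().st_mtime >= source.stat().st_mtime:
            continue  # the master copy is already up to date
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also copies the modification time
        copied.append(target)
    return copied
```

For example, `sync_to_master(Path("fieldwork"), Path("master/fieldwork"))` would return the list of files it copied, which you could note in a log each time you run your scheduled sync.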
If using an approved third-party online data collection tool (such as Qualtrics or Gorilla), be sure to move the data to University storage and delete it from the tool as soon as possible, i.e. at the close of collection or end of project.
For more, check out the data protection pages on privacy, security, and information management.
Further resources
- UK Data Service’s guide to managing and sharing data
- Qualitative Data Archive’s Managing Qualitative Data module
- JISC research data management toolkit
- Messy data? Try Open Refine
- Data and software carpentries curricula
- Software Sustainability Institute's top tips
- PLOS best practices in research reporting.