What is research data?
The University of Surrey considers research data to be any material collected, observed, processed, or created for the purpose of analysis and on which research findings and outputs are based. This includes data and documentation that are commonly accepted in the scholarly community as necessary for validation or replication of research findings. Research data may be in digital or non-digital formats. This could include:
- Audio, video, and images or photographs
- Text documents and spreadsheets
- Code, scripts, algorithms, models, and software
- Protocols and methodologies
- Specimens and samples
- Collections of digital objects
- Lab notebooks, field notes, and diaries
- Questionnaires and codebooks
- Interview schedules and transcripts
- Test responses
- Slides and artefacts
Why share your data?
Sharing data that underpins conclusions is at the heart of academic inquiry. Data sharing for verification and reuse can catch errors earlier, foster innovative uses of data, and push research forward faster and more transparently to the benefit of the field. Beyond academia, data can benefit policy makers, entrepreneurs, and the public. There’s also evidence that sharing data leads to more citations, greater visibility of your work, and potential collaborations and opportunities. For more, see these five selfish reasons to work reproducibly.
Of course, not all data is suitable to share openly. Instead, data can be shared with a range of appropriate restrictions. Be sure you have consent or permission from your participants, collaborators, partners, or supervisor before sharing any data. Once you have identified which data is shareable, you should apply appropriate safeguards. If your data cannot be shared, but has long-term value, then it should be preserved.
Data sharing should strive to be “as open as possible, as closed as necessary.” Ask yourself: what data is necessary for verifying your findings, and which data could be reused? Creating data that is easily verifiable or reusable will require some planning and preparation. It’s best to plan for data sharing and build it into your research project before you start. (A data management plan is a good way to do this.)
Make sure you have permission to share data from your project. Be sure to include data sharing in your consent forms. Check out the UK Data Service’s advice on consent forms for data sharing. If you have an industry partner or other collaborators, you should agree on any data sharing before the project begins.
In most cases, the best place to share data is through a data repository. These are online platforms designed to hold and disseminate research data. Some are discipline specific and others take all types of data. Repositories provide several advantages over trying to share data yourself. They can:
- Rank highly in search engine results
- Provide a DOI for your data for publications and citations
- Track view/download counts
- Allow versioning
- Facilitate access requests
- Provide long-term storage of your data.
Option 1: Identify a suitable external repository
- Does your funder require or recommend a particular repository? Some funders have their own platforms or recommend certain repositories, like Wellcome Open Research, Gates Open Research, and ESRC’s UK Data Service.
- Is there a repository typically used in your research discipline? Public platforms like Zenodo and Open Science Framework accept all types of data. Some publishers may recommend certain repositories.
Please note: When you share your data externally, you will need to create an official University record of where the data is held.
Option 2: Use Surrey’s Open Research repository
If an external repository is not recommended, use Surrey's Open Research repository, which accepts a wide variety of research outputs.
Whether you are creating a university record indicating the external location of the data or uploading your datasets in the University repository, follow the steps below:
- Visit the Open Research repository
- In the top right corner, select Surrey Researchers sign in (use your university username and password)
- Once logged in, select the 'add content' button (top right corner)
- Select 'asset type'. By 'asset', the system means the type of research output (for example, article, book, etc.). Select 'dataset'
- If you are creating a record of your dataset, go to 'Add links to files' to indicate where the dataset is held
- If you are uploading your dataset directly, drop or select the file to upload
- Remember to add the DOI if your dataset already has one, or reserve one in the repository if your data doesn’t have a DOI
- Please create a record of your data even in cases where the datasets cannot be shared
- If you have specific requirements for your data or would like more guidance, contact email@example.com.
Data can be shared at any time! Some disciplines share data almost immediately. Others tend to do it alongside a publication. Some funders suggest specific timelines for sharing data, usually tied to publications, project end dates, or norms within your discipline.
Your journal may stipulate a timeframe for data sharing as a condition for publication. Surrey’s own policy (PDF) requires sharing data that underpins a publication within 12 months (or sooner if required by funders).
If you don’t have a funder or your funder doesn’t specify a timeline, then follow Surrey’s policy. Exceptions to funder expectations and Surrey’s policy should be outlined and justified in the project’s data management plan.
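As a rough illustration of the timelines above, the sketch below works out the latest date by which data should be shared. It assumes the 12-month window runs from the publication date, which the policy text here does not spell out, and the function names and the `funder_months` parameter are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the length of the target month, so
    29 February plus 12 months becomes 28 February.
    """
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def sharing_deadline(publication_date: date,
                     funder_months: Optional[int] = None) -> date:
    """Latest sharing date: 12 months, or sooner if the funder requires it.

    Assumption: the window is counted from the publication date.
    """
    months = 12 if funder_months is None else min(12, funder_months)
    return add_months(publication_date, months)


print(sharing_deadline(date(2024, 3, 15)))                   # 2025-03-15
print(sharing_deadline(date(2024, 3, 15), funder_months=6))  # 2024-09-15
```

Any exception to these dates would still need to be justified in the project’s data management plan.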
We recommend the following best practices when sharing your data to make it easier to find and use. Of course, your data should be well organised, labelled, and accompanied by sufficient documentation. In addition:
- Create a README file for shared data
- Use an appropriate data repository or Surrey’s repository
- Get a DOI for your data (available from repositories as part of deposit)
- Include a data availability statement (with your data’s DOI) in your publications; see the guidance on data availability statements below
- Apply a licence to your data. Some funders recommend specific licences.
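A README can be as simple as a plain-text file that tells a reader what the dataset is, who created it, and what each file contains. The sketch below generates one from a few fields. The field names, layout, and example values are illustrative assumptions, not a required Surrey or repository template.

```python
def build_readme(title, creators, description, files, licence, doi=None):
    """Return README text for a dataset.

    `files` maps each file name to a one-line description.
    All field names here are illustrative, not a mandated template.
    """
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        f"DOI: {doi if doi else 'not yet assigned'}",
        f"Licence: {licence}",
        "",
        "Description",
        description,
        "",
        "Files",
    ]
    # List files alphabetically so the README is stable between versions.
    for name in sorted(files):
        lines.append(f"- {name}: {files[name]}")
    return "\n".join(lines) + "\n"


readme_text = build_readme(
    title="Example survey dataset",           # hypothetical dataset
    creators=["A. Researcher"],
    description="Responses to a hypothetical survey.",
    files={"responses.csv": "One row per participant",
           "codebook.txt": "Variable definitions"},
    licence="CC BY 4.0",
)
print(readme_text)
```

Saving the result as `README.txt` alongside the data files keeps the documentation with the dataset when it is deposited.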
One way to gauge your data sharing practices is to ask whether your data is “FAIR”: findable, accessible, interoperable, and reusable. The FAIR principles outline best practices for how to share data. The CARE Principles for Indigenous Data Governance provide a complementary set of people-focused best practices.
‘Open data’ encompasses a continuum of sharing practices, allowing researchers the flexibility to balance transparency and appropriate protections for their data. Data repositories have a range of access controls that can be applied to sensitive data. Some data repositories, like the UK Data Service, which accepts clinical trial data, can even handle very sensitive data. Depending on the sensitivities of your data, your open data practices might include:
- Sharing a mix of openly available and restricted data
- Transforming the data to make it more shareable, e.g. de-identification or aggregation
- Restricting access and setting terms of access, e.g. only bona fide researchers
- Creating synthetic data with the same characteristics as your data
- Sharing for verification purposes only, subject to a non-disclosure agreement
- Creating a publicly discoverable metadata record outlining what data is held and why it is not accessible.
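To make the de-identification and aggregation options above more concrete, here is a minimal sketch using only the Python standard library. The column names (`name`, `email`, `age`, `response`) and the ten-year age bands are assumptions chosen for illustration. Dropping direct identifiers and banding values does not by itself guarantee anonymity, so treat this as a starting point rather than a complete procedure.

```python
from collections import Counter

# Hypothetical columns that directly identify a participant.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age, width=10):
    """Map an exact age to a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(records):
    """Drop direct identifiers and replace exact age with an age band."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["age_band"] = age_band(row.pop("age"))
        cleaned.append(row)
    return cleaned


def aggregate(records, key):
    """Count how many records fall into each value of `key`."""
    return dict(Counter(record[key] for record in records))


participants = [
    {"name": "P1", "email": "p1@example.org", "age": 34, "response": "yes"},
    {"name": "P2", "email": "p2@example.org", "age": 37, "response": "no"},
    {"name": "P3", "email": "p3@example.org", "age": 52, "response": "yes"},
]

shareable = deidentify(participants)
print(shareable[0])                      # {'response': 'yes', 'age_band': '30-39'}
print(aggregate(shareable, "age_band"))  # {'30-39': 2, '50-59': 1}
```

The record-level output suits restricted sharing, while the aggregated counts may be suitable for open release, depending on the sensitivities of the data.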
Funders recognise that data may need to be restricted for commercial reasons. If commercial data can’t be transformed to make it more shareable, then consider making the data available only for verification purposes under a non-disclosure agreement. This meets open data expectations around transparency and verification of published findings.
If your data has commercial potential, please ensure that you have read and followed the University’s Intellectual Property Code, and contact the Technology Transfer Office (firstname.lastname@example.org).
While not all data may be suitable for sharing immediately, any data with long-term value should be preserved. To increase the likelihood of survival and reusability, data preservation should address the following:
- Preparing your data for preservation, including:
  - Considering the cost of preserving data
  - Identifying what can be discarded
  - Good documentation and file organisation
  - If your data isn’t in widely used formats, consider transforming it into open formats.
- Finding a home for your data:
  - Is your data already in a data repository? It may have a preservation policy
  - Surrey can accept data for long-term preservation through Surrey’s Open Research repository.
- Timelines for retention:
  - Your data may be subject to statutory or funder requirements for preservation
  - Surrey requires research data be retained for a minimum of ten years.
Please note: USB sticks, external storage, personal laptops, project websites, and local hard drives are not suitable for long-term preservation.
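When preparing data for preservation, it can help to list which files are not yet in an open format. The sketch below scans a folder and reports files whose extensions are not on a short allow-list. The list of extensions is an illustrative assumption, not an official list of accepted formats, so check your repository’s own guidance.

```python
from pathlib import Path

# Illustrative assumption: a small set of extensions treated as open formats.
OPEN_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".png", ".tiff"}


def files_needing_conversion(folder):
    """Return relative paths of files whose extension is not in OPEN_EXTENSIONS."""
    root = Path(folder)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() not in OPEN_EXTENSIONS
    )


if __name__ == "__main__":
    # Scan the current directory as an example.
    for name in files_needing_conversion("."):
        print(f"Consider converting: {name}")
```

The check looks only at file extensions, so it will not detect a proprietary file that has been renamed.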
Physical data with long-term value should also be preserved. If you can’t make a digital surrogate of the physical data, then you can create a metadata record in Surrey’s repository indicating what physical objects are held and how they can be accessed.
The Digital Curation Centre has a useful guide for preservation, “Five steps to decide what data to keep”, and Jisc’s Research Data Management Toolkit includes a section on preservation. The Software Sustainability Institute offers guidance on software preservation.
Data availability statements should include:
1. Terms of access (if any).
2. Persistent identifier (e.g. DOI) linking to data in a repository; or where the data can be found (e.g. a third party).
3. If the data is restricted, a statement justifying why.
4. If there is no data or all the data required to verify the findings appears within the publication, then the statement can simply say that there is no data or that the data appears within the publication.
- The data underlying this article are available in [repository name, e.g. the xxxx Repository], at https://dx.doi.org/[doi] (or give [URL]).
- The data underlying this article were derived from sources in the public domain: [list sources, including URLs].
- This publication is supported by multiple datasets that are openly available at locations referenced in this paper.
- If the data is already included in the paper: The data underlying this article are available in the article / in the online supplementary material.
- The data underlying this article are subject to an embargo of [period of embargo of X months from the publication date of the article] to allow for commercialisation of the results. Once the embargo expires the data will be available [give details of availability, e.g. in a repository plus embargoed link; upon reasonable request, etc.].
- The data underlying this article cannot be shared publicly due to [briefly describe why the data cannot be shared, e.g. for the privacy of individuals that participated in the study].
- The data underlying this article were provided by [third party] under licence / by permission. Data will be shared on request to the corresponding author with permission of [third party].
- No data were created, collected or analysed in this study.
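The example statements above can also be filled in programmatically, which may be handy when preparing several publications. The following sketch covers three of the cases: open data with a DOI, restricted data, and no data. The function name, parameters, and the DOI `10.1234/example` are hypothetical, and the wording simply reuses the templates above.

```python
def data_availability_statement(repository=None, doi=None,
                                terms=None, restriction_reason=None):
    """Compose a data availability statement from the parts listed above.

    All parameter names are illustrative; adapt the wording to your journal.
    """
    if restriction_reason:
        # Restricted data: the statement must justify the restriction.
        return ("The data underlying this article cannot be shared "
                f"publicly due to {restriction_reason}.")
    if repository and doi:
        statement = ("The data underlying this article are available in "
                     f"{repository}, at https://dx.doi.org/{doi}.")
        if terms:
            statement += f" Terms of access: {terms}."
        return statement
    # Neither a location nor a restriction was given.
    return "No data were created, collected or analysed in this study."


print(data_availability_statement(
    repository="the xxxx Repository",
    doi="10.1234/example",   # hypothetical DOI
    terms="CC BY 4.0",
))
```

Cases such as embargoes or third-party data would need further branches following the same pattern.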
- Data Sharing - a UKRN animated primer
- Top Ten Tips for Doing Open Science
- Qualitative Data Archive’s Sharing Qualitative Data module
- Opening up and Sharing Data from Qualitative Research: A Primer
- Making data meaningful: guidelines for good quality open data
- Data sharing practices and data availability upon request differ across scientific disciplines
- The Qualitative Transparency Deliberations: Insights and Implications
- Qualitative Data Sharing: Participant Understanding, Motivation, and Consent.