Sharing Data
- Does project funding require your data to be shared or publicly accessible?
- When and where do you intend to publish or distribute your data?
- How do I cite data?
- Are there issues with privacy or intellectual property?
Does project funding require your data to be shared or publicly accessible?
In order to promote open access to research data, many funding agencies now require that research data produced as part of a funded project be made publicly available.
- National Institutes of Health (NIH) Data Sharing Policy: Supports the sharing of research data and expects many researchers to include a data sharing plan in their grant proposals.
- National Science Foundation (NSF) Engineering Directorate, Data Management for NSF Engineering Directorate Proposals and Awards: "An appropriate Data Management Plan must be provided as a supplementary document (maximum of two pages) for all research proposals..."
- National Science Foundation (NSF) Division of Social and Economic Sciences Data Archiving Policy: "Grantees from all fields will develop and submit specific plans to share materials collected with NSF support, except where this is inappropriate or impossible."
- NSF Grant Proposal Guide: Includes an expectation that PIs share their research data.
Researchers can comply with these requirements by depositing their data into one of the many available data repositories. For tips on creating a data sharing plan, see the NIH examples of data sharing plans.
When and where do you intend to publish or distribute your data?
You can share your data easily by emailing it to requesters, or by posting it to a website or to a service run by Google, Amazon, or Microsoft. However, these methods of sharing make it difficult for people to find your data. Depositing your data in an archive will facilitate its discovery and preservation.
Publish Your Data in a Repository
Note: Not all of the repositories listed can ensure long-term preservation of your data; contact each one for more details. This list contains suggestions and is not necessarily complete. For a more complete list of data repositories, see these sites:
- Simmons University. Data Repositories
Data Created at UC (Any Discipline)
Merritt — a new cost-effective repository service from the University of California Curation Center (UC3) that lets the UC community manage, archive, and share its valuable digital content. Use Merritt to provide long-term preservation of digital assets, share your research with others or meet the data sharing and preservation requirements of a grant-funded project. For more information contact UC3.
eScholarship — an open access publishing platform that offers UC departments, centers, and research units direct control over the creation and dissemination of the full range of their scholarship, including working papers, peer-reviewed journals, monographic series, paper/seminar series, postprints, and conference proceedings. Contact the CDL Publishing Group for more information.
Science and Engineering
- Archaeology
- Astronomy
- Atmospheric Science
- National Center for Atmospheric Research — Scientific datasets of historical value are curated and catalogued at the Mass Storage System
- Life Sciences
- Dryad — Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences.
- Protein Data Bank — An information portal to biological macromolecular structures.
- UniProt — Submit a new protein sequence to UniProtKB using SPIN, a web-based tool for submitting directly sequenced protein sequences to the Universal Protein Resource (for new nucleotide submissions, use EMBL's WEBIN instead).
- Chemistry
- PubChem — provides information on the biological activities of small molecules. It includes substance information, compound structures, and bioactivity data in three primary databases, PCSubstance, PCCompound, and PCBioAssay, respectively.
- Computer Science
- Cooperative Association for Internet Data Analysis (CAIDA) — provides tools and analyses promoting the engineering and maintenance of a robust, scalable global Internet infrastructure.
- Earth Science
- GEON — Portal for sharing, publishing, and integrating data.
- Oceanography
- National Oceanographic Data Center — NODC is an organization made up of the Oceanographic Data Center, National Coastal Data Development Center, World Data Center for Oceanography, and the NOAA Central Library, integrated to provide access to the world's most comprehensive sources of marine environmental data and information.
- Snow and Ice
- National Snow and Ice Data Center — NSIDC archives cryospheric data. NSIDC acknowledges all data providers in documentation, metadata (Directory Interchange Format (DIF)), and references, and is also willing to hold or restrict data distribution until providers publish.
- Space Science
- National Space Science Data Center — NSSDC accepts data from: active archives in the space sciences funded through the Science Missions Directorate, missions in that same directorate, and individual scientists (mission or instrument principal investigators).
Social Sciences
- Inter-university Consortium for Political and Social Research (ICPSR) — The world's largest archive of digital social science data. ICPSR staff can guide you in preparing your data for archiving and distribution. See their Guide to Social Science Data Preparation and Archiving and their page on Depositing Data.
Arts and Humanities
- Cultural Policy and the Arts National Data Archive — the world's first interactive digital archive of policy-relevant data on the arts and cultural policy in the United States. It is a collaborative effort of Princeton University's Firestone Library and the Princeton Center for Arts and Cultural Policy Studies, with support from the Pew Charitable Trusts.
How do I cite data?
See the DataCite Metadata Schema Repository for recommendations on what information to include and how to format it.
A quick summary: when formatting a data citation for human readers, a
complete citation should include this information in the following form:
[Creator] ([PublicationYear]): [Title]. [Publisher]. [PidInURLform]
where [Publisher] is the data archive that holds the data and
[PidInURLform] represents a persistent identifier in actionable form
(i.e., embedded in a URL). Here are some examples:
- Denhard, Michael (2009): dphase_mpeps: MicroPEPS LAF-Ensemble run by DWD for the MAP D-PHASE project. World Data Center for Climate. http://dx.doi.org/10.1594/WDCC/dphase_mpeps
- Manoug, J L (1882): Useful data on the rise of the Nile. Alexandria : Printing-Office V Penasson. http://n2t.net/ark:/13960/t44q88124
Creator names in non-Roman scripts should be transliterated using the ALA-LC Romanization Tables.
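As a minimal sketch of the citation pattern above, the following Python function assembles the five elements into one human-readable string. The function name and parameter names are hypothetical and chosen only for illustration; they are not part of the DataCite schema. The example values come from the first citation listed above.

```python
def format_data_citation(creator, publication_year, title, publisher, pid_url):
    """Build '[Creator] ([PublicationYear]): [Title]. [Publisher]. [PidInURLform]'.

    All parameter names are illustrative assumptions, not DataCite field names.
    """
    # Drop a trailing period so the separators added below are not doubled.
    title = title.rstrip(".")
    publisher = publisher.rstrip(".")
    return f"{creator} ({publication_year}): {title}. {publisher}. {pid_url}"


citation = format_data_citation(
    creator="Denhard, Michael",
    publication_year=2009,
    title="dphase_mpeps: MicroPEPS LAF-Ensemble run by DWD for the MAP D-PHASE project",
    publisher="World Data Center for Climate",
    pid_url="http://dx.doi.org/10.1594/WDCC/dphase_mpeps",
)
print(citation)
```

The persistent identifier is passed in already embedded in a URL, so the resulting citation stays actionable when copied into a document.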
Privacy and Intellectual Property
When publishing data, it is vital to consider the rights and responsibilities you have with regard to issues of confidentiality and intellectual property.
Confidentiality
It is vital to maintain the confidentiality of research subjects, both for ethical reasons and to ensure continued participation in research.
- Evaluate the sensitivity of your data: Researchers should consider whether their data contain direct or indirect identifiers that could be combined with other public information to identify research participants.
- Obtain a confidentiality review: A benefit to depositing your data with ICPSR is that their staff will review your data for the presence of confidential information.
- Comply with UC regulations: UC researchers concerned about confidentiality issues with their data should consult the appropriate UC Requirements and Guidance for Conducting Research Involving Human or Animal Subjects.
- Comply with regulations for health research: See the HIPAA Privacy Rule, Information for Researchers.
- Enable restricted use of your data: Do you want to make your data available in a more restricted, limited-access manner? The ICPSR DSDR program has resources for data producers, including a tool for Designing a Restricted Data Use Contract.
- Learn about guidelines from the National Academy of Engineering: Their Online Ethics Center includes a discussion of Ethical Issues in Data Management.
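As a rough illustration of the sensitivity evaluation described in the first item above, the sketch below counts how many records share each combination of indirect identifiers and reports the combinations that occur fewer than k times. This is a simple k-anonymity-style screen, chosen here as one possible approach; it is not a method prescribed by this guide, by ICPSR, or by UC. The function name, field names, threshold, and sample records are all hypothetical.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return value combinations shared by fewer than k records.

    records: list of dicts, one per participant (hypothetical layout).
    quasi_identifiers: field names treated as indirect identifiers (an assumption).
    k: minimum acceptable group size (an arbitrary illustrative threshold).
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Invented sample data, for illustration only.
records = [
    {"zip": "94720", "birth_year": 1980, "sex": "F"},
    {"zip": "94720", "birth_year": 1980, "sex": "F"},
    {"zip": "94720", "birth_year": 1975, "sex": "M"},
]

risky = small_groups(records, ["zip", "birth_year", "sex"], k=2)
print(risky)
```

Combinations that appear in the result describe very few participants and may deserve a closer look, for example coarsening a field or restricting access. A screen like this does not replace a formal confidentiality review.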
Intellectual Property Issues
Sharing data that you produced/collected yourself:
- Data are not copyrightable (yet a particular expression of data can be, such as a chart or table in a book).
- Data can be licensed; some data providers apply licenses that limit how the data can be used, such as to protect the privacy of participants in a study or guide downstream uses of the data (e.g., forbidding for-profit use).
- If you want to promote sharing and unlimited use of your data, you can make your data available under a Creative Commons CC0 Declaration to make this explicit.
Sharing data that you have collected from other sources:
- You may or may not have the right to do so, depending on whether the data were accessed under a license with terms of use.
- Most databases to which the UC Libraries subscribe are licensed and prohibit redistribution of data outside of UC. For more information on terms of use for databases licensed by the Libraries, contact UC3.
UC researchers who are uncertain about their rights to disseminate data can consult their campus Office of General Counsel. Note: Laws about data vary outside the U.S.
For a general discussion about publishing your data, applicable to many disciplines, see the ICPSR Guide to Social Science Data Preparation and Archiving (pdf).
Credit to MIT Libraries for permission to use and adapt their
pages and to members of the UC3 community.
Please send us any comments about these guidelines.

