Why Document Data?

Data can be understood, interpreted, and used only if they are clearly and thoroughly documented. Sharing data for long-lasting usability would be impossible without documentation (also known as metadata).

It is important to begin documenting your data at the very beginning of your research project and to continue throughout the project. Doing so will make data documentation easier and reduce the likelihood that you will forget aspects of your data later in the research project. Don't wait until the end to start documenting your research project and its data!

What to Document?

Research Project Documentation
  • Context of data collection
  • Data collection methods
  • Structure, organization of data files
  • Data sources used (see citing data)
  • Data validation, quality assurance
  • Transformations of data from the raw data through analysis
  • Information on confidentiality, access & use conditions
Dataset Documentation
  • Variable names and descriptions
  • Explanation of codes and classification schemes used
  • Algorithms used to transform data
  • File format and software (including version) used
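
Variable names, descriptions, and codes can be kept together in a small data dictionary stored next to the data. The sketch below is illustrative only and is not part of these guidelines: the variable names (`site_id`, `temp_c`), the column layout, and the helpers `write_data_dictionary` and `undocumented_columns` are hypothetical, and 999 as a missing-value code is assumed for the example.

```python
import csv

# Hypothetical variables for illustration; replace with your own.
DATA_DICTIONARY = [
    {"variable": "site_id", "description": "Identifier of the sampling site",
     "units": "", "codes": ""},
    {"variable": "temp_c", "description": "Water temperature",
     "units": "degrees Celsius", "codes": "999 = missing value"},
]


def write_data_dictionary(path, rows):
    """Write one row per variable to a CSV file."""
    fieldnames = ["variable", "description", "units", "codes"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def undocumented_columns(data_columns, rows):
    """Return the data-file columns that have no entry in the dictionary."""
    documented = {row["variable"] for row in rows}
    return [name for name in data_columns if name not in documented]
```

Running a check such as `undocumented_columns` against the column headers of each data file, whenever the files change, is one way to notice variables that were added but never described.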

How will you document your data?

In order for your data to be used properly by you, your colleagues, and other researchers in the future, the data must be documented. Data documentation (which includes metadata) enables you to describe the content, formats, and internal relationships of your data in detail and will enable other researchers to find, use and properly cite your data.

It is critical to start documenting your data at the very beginning of your research project, before data collection begins. Doing so will make documentation easier and reduce the likelihood that you will forget aspects of your data later in the research project.

Researchers can choose among various metadata standards. Some metadata standards are designed for the purpose of documenting the contents of files, others for documenting the technical characteristics of files, and yet others for expressing relationships between files within a set of data. It is important to establish a metadata strategy that is capable of describing your data and satisfying your data management needs. For assistance in defining an adequate metadata strategy, please contact uc3@ucop.edu.

Below are some general aspects of your data that you should document, regardless of your discipline. At minimum, store this documentation in a "readme.txt" file, or the equivalent, with the data itself. You can also reference a published article that may contain some of this information.


General overview
  • Title: Name of the dataset or research project that produced it
  • Creator: Names and addresses of the organizations or people who created the data; preferred format for personal names is surname first (e.g., Smith, Jane).
  • Identifier: Unique number used to identify the data, even if it is just an internal project reference number
  • Date: Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, such as maintenance cycle, update schedule; preferred format is yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range
  • Method: How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook
  • Processing: How the data have been altered or processed (e.g., normalized)
  • Source: Citations to data derived from other sources, including details of where the source data is held and how it was accessed
  • Funder: Organizations or agencies who funded the research
Content description
  • Subject: Keywords or phrases describing the subject or content of the data
  • Place: All applicable physical locations
  • Language: All languages used in the dataset
  • Variable list: All variables in the data files, where applicable
  • Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files (e.g. '999 indicates a missing value in the data')
Technical description
  • File inventory: All files associated with the project, including extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')
  • File formats: Formats of the data, e.g., FITS, SPSS, HTML, JPEG, etc.
  • File structure: Organization of the data file(s) and layout of the variables, where applicable
  • Version: Unique date/time stamp and identifier for each version
  • Checksum: A digest value computed for each file that can be used to detect changes; if a recomputed digest differs from the stored digest, the file must have changed
  • Necessary software: Names of any special-purpose software packages required to create, view, analyze, or otherwise use the data
Access
  • Rights: Any known intellectual property rights, statutory rights, licenses, or restrictions on use of the data
  • Access information: Where and how your data can be accessed by other researchers
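
The "readme.txt" file and the checksum element described above can be produced with a short script. What follows is a minimal sketch under stated assumptions, not a prescribed method: SHA-256 is chosen here only as an example digest (these guidelines do not name an algorithm), and the function names `sha256_of`, `build_readme`, and `has_changed`, as well as the readme layout, are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_readme(fields, data_files):
    """Return readme text: one 'Element: value' line per field,
    followed by a file inventory with a checksum for each file."""
    lines = [f"{name}: {value}" for name, value in fields.items()]
    lines.append("")
    lines.append("File inventory (name, SHA-256 checksum):")
    for path in data_files:
        lines.append(f"  {Path(path).name}  {sha256_of(path)}")
    return "\n".join(lines) + "\n"


def has_changed(path, stored_digest):
    """Return True if the file's current digest differs from the stored one."""
    return sha256_of(path) != stored_digest
```

To use it, pass a dictionary of the elements you have filled in (for example `{"Title": ..., "Creator": ..., "Date": ...}`) and the list of data files to `build_readme`, then write the returned text to "readme.txt" alongside the data. Later, `has_changed` can compare each file against the digest recorded in the inventory.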

Credit to MIT Libraries for permission to use and adapt their pages and to members of the UC3 community.
Please send us any comments about these guidelines.

Creative Commons License

Last updated: April 30, 2013
Document owner: Perry Willett