Try the DMPTool to create a data management plan.
What kinds of data are we talking about?
Data come from many different sources, but for management purposes they can typically be sorted into four main categories. The category your data fall into will shape the choices you make throughout the rest of your data management plan.
Observational
- Captured in real time
- Usually irreplaceable
- Examples: Sensor readings, telemetry, survey results, images
Experimental
- Data from lab equipment
- Often reproducible, but can be expensive
- Examples: gene sequences, chromatograms, magnetic field readings
Simulation
- Data generated from test models
- Models and metadata, where the input is more important than the output data
- Examples: climate models, economic models
Derived or compiled
- Reproducible (but very expensive)
- Examples: text and data mining, compiled database, 3D models
What's the general form and stability of the data?
Data can come in many forms: text, numeric, image, multimedia, models, software, discipline-specific (e.g., FITS in astronomy, CIF in chemistry), and instrument-specific. Data can also be fixed or changing over the course of the project (and perhaps beyond the project's end). Do the data ever change? Do they grow? Is previously recorded data subject to correction? Will you need to keep track of data versions?
The form and stability of your data will inform important subsequent decisions regarding:
- data file formats
- data file naming
- external persistent identifiers
- data sharing and long-term archiving
How much data will the project produce?
Image data, for instance, typically require a lot of storage space, so you'll want to decide which images (if not all of them) to retain and where such large files can be housed. Be sure you know your archiving organization's capacity for storage and backups.
To avoid being under- or over-prepared, it is wise to estimate the growth rate of your data. Are you manually collecting and recording data? Are you using observational instruments and computers to collect data? Is data collection highly iterative? From the start of the project to its conclusion, by how much do you expect your data store to grow at regular intervals, say every month or every 90 days? How much data do you anticipate collecting and generating by the end of your project?
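If it helps to put numbers on these questions, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not part of the original guidance: the function name `project_storage` and its parameters are hypothetical, and the figures you feed it are your own planning estimates. It assumes data are added once per interval (for example, monthly) and that the amount added per interval either stays constant or grows by a fixed fraction.

```python
# Hypothetical helper for estimating how a project's data store grows.
# Inputs are your own planning estimates; nothing here measures real storage.

def project_storage(initial_gb: float, added_per_interval_gb: float,
                    growth_rate: float, intervals: int) -> list[float]:
    """Return the cumulative data volume (GB) at the end of each interval.

    initial_gb            -- data already on hand when the project starts
    added_per_interval_gb -- data collected during the first interval
    growth_rate           -- fractional change in the amount collected from one
                             interval to the next (0.0 = steady collection)
    intervals             -- number of intervals (e.g. months) in the project
    """
    totals = []
    total = float(initial_gb)
    added = float(added_per_interval_gb)
    for _ in range(intervals):
        total += added
        totals.append(round(total, 2))
        added *= 1 + growth_rate  # collection may speed up as the project matures
    return totals


if __name__ == "__main__":
    # Example: 50 GB at the start, 20 GB collected every month, 36-month project.
    monthly = project_storage(50, 20, 0.0, 36)
    print(monthly[-1])        # 770.0 (GB by the end of the project)
    # Assuming two extra backup copies, you would need three times that space.
    print(monthly[-1] * 3)    # 2310.0
```

The backup multiplier is only an illustrative assumption; check your archiving organization's actual storage and backup policy before settling on a figure.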
How often will the data change or be updated?
The answer to this question affects how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge, so it is imperative that you begin with a plan to carry you through the entire data management process.
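One lightweight way to notice when a dataset has changed between versions is to record a checksum for every file at each milestone and compare the records. The sketch below illustrates that idea under stated assumptions; it is not a method prescribed by these guidelines. The function names `file_digest`, `snapshot`, and `diff_snapshots` are hypothetical, and the code uses only Python's standard library (`hashlib`, `pathlib`, `tempfile`).

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks so large files are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir: Path) -> dict[str, str]:
    """Map every file under data_dir (by relative path) to its digest."""
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List files added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp)
        (data / "readings.csv").write_text("t,value\n0,1.5\n")
        v1 = snapshot(data)
        (data / "readings.csv").write_text("t,value\n0,1.6\n")  # a correction
        (data / "notes.txt").write_text("sensor recalibrated\n")
        v2 = snapshot(data)
        print(diff_snapshots(v1, v2))
        # {'added': ['notes.txt'], 'removed': [], 'changed': ['readings.csv']}
```

A checksum tells you that a file changed, not how it changed. Storing each snapshot next to a version label or date gives you a simple change log that you can pair with whatever versioning scheme your plan adopts.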
Credit to the University of Virginia's Scientific Data Consulting Group and the MIT Libraries for permission to use and adapt their data management planning pages, and to members of the UC3 community. Please send us any comments about these guidelines.