What are the issues around file formats?

An irony of file format standards is that there are so many to choose from, so it is important to think carefully about which format will be best for managing, sharing, and preserving your data. How you choose to represent your data is a primary factor in someone else's ability to use it in the future.

As technology continually changes, all contemporary hardware and software are expected to become obsolete. How will your data be read if the software used to produce it becomes unavailable?

Your data can also be at risk for non-technical reasons. Data may be effectively lost if it was encrypted with a key that has since been lost (e.g., a forgotten password). Data that is legally encumbered may also be considered lost. The same holds for data with ambiguous or unknown access rights, archiving rights, or licensing, because the cost of clarifying the rights situation is often prohibitive.

Formats more likely to be accessible in the future are:

  • Non-proprietary
  • Open, documented standards
  • In common usage by the research community
  • Using standard character encodings (ASCII, UTF-8)
  • Uncompressed (desirable, space permitting)
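
As a rough illustration of the character-encoding criterion above, the following Python sketch checks whether a file's bytes decode cleanly as ASCII or UTF-8. The function name detect_text_encoding is hypothetical and not part of any library; the check is only a first-pass screen, not a full format validation.

```python
def detect_text_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'unknown'.

    Hypothetical helper for illustration: it only tests whether the
    bytes decode without error, which is a necessary but not
    sufficient sign that the file really is text in that encoding.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Could be Latin-1, a Windows code page, or binary data.
        return "unknown"
```

To screen a file on disk, read it in binary mode and pass the bytes to the function, for example detect_text_encoding(open("notes.txt", "rb").read()), where notes.txt is a placeholder file name. A result of "unknown" suggests converting the file to UTF-8 before depositing it.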

Strongly discouraged (absent strong counter argument) are:

Examples of preferred format choices:

  • Image: JPEG, JPEG 2000, PNG, TIFF
  • Text: HTML, XML, PDF/A, UTF-8, ASCII
  • Audio: AIFF, WAVE
  • Containers: TAR, GZIP, ZIP
  • Databases: prefer XML or CSV to native binary formats
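
The database recommendation above can be sketched in Python using only the standard library. This example assumes the data lives in SQLite; the function name export_table_to_csv and the table used in the usage note are hypothetical, and other database systems would need their own driver.

```python
import csv
import sqlite3
from typing import TextIO


def export_table_to_csv(conn: sqlite3.Connection, table: str, out: TextIO) -> None:
    """Write every row of `table` to `out` as CSV, with a header row.

    Hypothetical helper for illustration. The table name is inserted
    into the SQL text directly, so it must come from a trusted source.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    # cursor.description holds one tuple per column; item 0 is the name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
```

A typical call opens the output file with newline="" and encoding="utf-8", as the csv module documentation recommends, and passes it in together with an open connection, for example export_table_to_csv(conn, "samples", f). Note that CSV does not record column types or relationships between tables, so a short description of the schema should accompany the exported files.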

Examples of discouraged format choices:

  • Word (prefer PDF)
  • QuickTime (prefer MPEG-4)
  • GIF (historically encumbered by patents on its LZW compression, which have since expired; prefer PNG)

For more information on recommended formats, see the CDL Digital File Format Recommendations.

Credit to the University of Virginia's Scientific Data Consulting Group and the MIT Libraries for permission to use and adapt their data management planning pages, and to members of the UC3 community. Please send us any comments about these guidelines.

Commons License

Last updated: April 30, 2013
Document owner: Perry Willett