Organizing your data
- What file formats will be produced for your project and what kinds of data management risks do they present?
- How will you organize your files into directories and what naming conventions will you apply to both?
File Formats for Long-Term Access
The file format in which you keep your data is a primary factor in whether you, or anyone else, will be able to use the data in the future.
As technology continually changes, researchers should plan for both hardware and software obsolescence. How will your data be read if the software used to produce them becomes unavailable?
Formats more likely to be accessible in the future are:
- Based on open, documented standards
- In common use by the research community
- Encoded with a standard character encoding (ASCII, UTF-8)
Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.
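For text-based data, a migration of this kind can be as small as writing a UTF-8 copy next to the original. The Python below is a minimal sketch, not a prescribed tool. It assumes you know which encoding the original software used; `cp1252` (a common Windows default) is only a placeholder, and the function names are hypothetical.

```python
from pathlib import Path


def to_utf8_bytes(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_encoding,
    so a wrong guess fails loudly instead of silently corrupting the text.
    """
    return data.decode(source_encoding).encode("utf-8")


def migrate_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> Path:
    """Write a UTF-8 copy of src to dst; the original file is never modified.

    source_encoding is an assumption you must verify for your own files.
    """
    if src.resolve() == dst.resolve():
        raise ValueError("dst must differ from src so the original copy is kept")
    # Work on raw bytes so line endings are copied exactly as they are.
    dst.write_bytes(to_utf8_bytes(src.read_bytes(), source_encoding))
    return dst
```

For example, `migrate_to_utf8(Path("survey.csv"), Path("survey_utf8.csv"))` would leave `survey.csv` in place as the original-format copy (file names hypothetical).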
Examples of preferred format choices:
- Images: JPEG, JPEG 2000, PNG, TIFF
- Texts: HTML, XML, PDF/A, plain text (UTF-8 or ASCII)
- Audio: AIFF, WAVE
- Containers: GZIP, ZIP
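One way to act on this list is to audit a project folder for files that are not yet in a preferred format. The sketch below assumes a mapping from the formats above to typical file extensions (for instance `.jp2` for JPEG 2000 and `.txt` for plain UTF-8 or ASCII text). Extensions are only a hint: a `.pdf` file, for example, may or may not be PDF/A.

```python
from pathlib import Path
from typing import List, Optional

# Assumed mapping from the preferred formats listed above to common extensions.
# ".pdf" is included, but an extension alone cannot confirm PDF/A compliance.
PREFERRED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".jp2", ".png", ".tif", ".tiff"},
    "text": {".html", ".htm", ".xml", ".pdf", ".txt"},
    "audio": {".aif", ".aiff", ".wav"},
    "container": {".gz", ".zip"},
}


def preferred_category(path: Path) -> Optional[str]:
    """Return the preferred-format category for path, or None if not listed."""
    suffix = path.suffix.lower()  # case-insensitive: .TIF and .tif match alike
    for category, extensions in PREFERRED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    return None


def files_to_review(root: Path) -> List[Path]:
    """List files under root whose extension is not among the preferred formats."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and preferred_category(p) is None
    )
```

Files returned by `files_to_review` are candidates for migration, not proof that a file is at risk.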
Directories, Files and Version Naming Conventions
Directory Structure Naming Conventions
When organizing files, the name of the top-level directory (folder) should include the project title, a unique identifier, and the date (yyyy or yyyy.mm.dd).
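A small helper can apply this convention consistently. This is a sketch under assumed choices: the words of the title are joined in CamelCase, the parts are separated by underscores, and the identifier is passed through unchanged, so it should already be safe to use in a file name.

```python
import re
from datetime import date


def top_level_dir_name(title: str, identifier: str, when: date, precision: str = "day") -> str:
    """Build <Title>_<identifier>_<date>, with the date as yyyy or yyyy.mm.dd."""
    # Keep only letters and digits from the title, capitalizing each word.
    words = re.findall(r"[A-Za-z0-9]+", title)
    compact_title = "".join(w[0].upper() + w[1:] for w in words)
    if precision == "year":
        stamp = when.strftime("%Y")
    elif precision == "day":
        stamp = when.strftime("%Y.%m.%d")
    else:
        raise ValueError("precision must be 'year' or 'day'")
    return f"{compact_title}_{identifier}_{stamp}"
```

With a hypothetical title and identifier, `top_level_dir_name("soil carbon survey", "NSF1234567", date(2024, 3, 5))` returns `"SoilCarbonSurvey_NSF1234567_2024.03.05"`.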
The sub-directory structure should follow clear, documented naming conventions. For example, you might create a separate file or directory for each run of an experiment, each version of a dataset, and/or each person in the group.
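For the per-run case, a sketch like the following creates zero-padded run directories and writes a short README that documents the convention alongside the data. The `run_` prefix, the three-digit padding, and the README wording are illustrative assumptions; zero-padding is used so that the directories sort in run order.

```python
from pathlib import Path


def run_dir_name(run_number: int, width: int = 3) -> str:
    """Zero-padded name such as run_001, so directories sort in run order."""
    return f"run_{run_number:0{width}d}"


def create_run_dirs(project_root: Path, n_runs: int) -> list:
    """Create one sub-directory per experimental run and document the convention."""
    run_dirs = []
    for i in range(1, n_runs + 1):
        run_dir = project_root / run_dir_name(i)
        run_dir.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        run_dirs.append(run_dir)
    readme = project_root / "README.txt"
    if not readme.exists():  # never overwrite existing documentation
        readme.write_text(
            "Naming convention: each run_NNN directory holds the files for one "
            "experimental run, numbered in the order the runs were performed.\n",
            encoding="utf-8",
        )
    return run_dirs
```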
File Naming Conventions
- Reserve the three-letter file extension for the application-specific format code, for example .wrl, .mov, or .tif.
- Identify the activity or project in the file name.
- Identify separate versions of files and datasets using file or directory naming conventions. Record all changes to a file, no matter how small. Discard obsolete versions only after making backups.
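As one possible convention, not a standard, a file name can combine a project code, a short description, and a zero-padded version number, such as `SOILC_moisture_v03.csv` (project code and description hypothetical). The sketch below builds such names and derives the name of the next version, so each change is saved as a new file instead of overwriting the previous one.

```python
import re
from pathlib import Path

# Hypothetical pattern: <PROJECT>_<description>_v<NN>.<ext>
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)$")


def versioned_name(project: str, description: str, version: int, extension: str) -> str:
    """Build a file name that identifies the project, the content, and the version."""
    return f"{project}_{description}_v{version:02d}.{extension.lstrip('.')}"


def next_version_name(existing: Path) -> str:
    """Return the name for the next version; the existing file is left untouched."""
    match = VERSION_PATTERN.match(existing.stem)
    if match is None:
        raise ValueError(f"{existing.name} does not follow the _vNN convention")
    width = len(match["version"])  # keep the same zero-padding width
    new_version = int(match["version"]) + 1
    return f"{match['stem']}_v{new_version:0{width}d}{existing.suffix}"
```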
Tools to help you rename files in bulk:
- http://www.bulkrenameutility.co.uk (free)
- http://renamer.com (free trial)
- http://www.powersurgepub.com/products/psrenamer.html (free)
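If you prefer a script to a dedicated tool, a bulk rename can be planned and reviewed before anything changes on disk. This Python sketch assumes a convention of replacing spaces with underscores and prefixing each file with a project code; it does not reproduce the behavior of any particular renaming tool.

```python
from pathlib import Path


def conventional_name(name: str, project: str) -> str:
    """Replace spaces with underscores and add the project prefix if missing."""
    new_name = name.replace(" ", "_")
    if not new_name.startswith(f"{project}_"):
        new_name = f"{project}_{new_name}"
    return new_name


def plan_renames(directory: Path, project: str) -> list:
    """Return (old, new) path pairs for files that need renaming; nothing is renamed."""
    plan = []
    targets = set()
    for old in sorted(p for p in directory.iterdir() if p.is_file()):
        new = old.with_name(conventional_name(old.name, project))
        if new in targets:
            raise ValueError(f"two files would both be renamed to {new.name}")
        targets.add(new)
        if new != old:
            plan.append((old, new))
    return plan


def apply_renames(plan: list) -> None:
    """Carry out a reviewed plan, refusing to overwrite any existing file."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

A typical workflow is to print `plan_renames(Path("raw_data"), "SOILC")` (folder and project code hypothetical), check the proposed names, and only then pass the same list to `apply_renames`.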
File Naming Conventions for Specific Disciplines
Many disciplines have recommendations, for example: