Purdue University, Purdue Libraries

Research Data Management Overview

Includes best practices, resources, and tools for managing and sharing research data.

Data Preservation and Archiving

Archiving research data refers to the long-term storage and preservation of data. Preservation ensures that data remain accessible while retaining their integrity and authenticity over time. Data management planning should address where and how data will be archived, including how to maintain data integrity, preserve data provenance, allow data sharing and reuse, and comply with institutional and funder regulations. Many of these preservation actions overlap with the guidance in the Publishing and Sharing section, and a sharing platform such as a repository may also offer preservation and archiving services. Here are a few key components of data preservation:

  • Plan for which data you will need to preserve 
  • Ensure you have adequate, descriptive metadata, so future users can correctly interpret the data
  • Use an open, non-proprietary, and commonly used file format
  • Document a file retention plan including who will be responsible for the data over time
  • Select an appropriate location for storage and archiving (see the next section on repositories)
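As a minimal sketch of two of the components above (an open file format plus descriptive metadata), the following Python example writes tabular data as a comma-separated file with a JSON metadata file alongside it. The function name, the `.metadata.json` naming, and the metadata fields are hypothetical choices for illustration, not a standard required by any repository.

```python
import csv
import json
from pathlib import Path


def write_dataset_with_metadata(rows, fieldnames, metadata, out_dir, stem):
    """Write rows to <stem>.csv and metadata to <stem>.metadata.json.

    The sidecar naming scheme is an assumption made for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Comma-separated text is an open, non-proprietary format.
    data_path = out_dir / f"{stem}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Descriptive metadata travels with the data as plain JSON.
    meta_path = out_dir / f"{stem}.metadata.json"
    with meta_path.open("w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)

    return data_path, meta_path
```

A call might pass metadata such as a title, a description of each column and its units, and the person responsible for the data over time, so that the retention plan is recorded next to the files it covers.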

 

Figure: Comic strip of two men discussing digital data preservation as each of the four panels progressively degrades in quality.

Shared through the LTER Network DataBits Stories, by John Porter & An Nguyen with input from the LTER IM Committee

File Formats and Standards

The Purdue University Research Repository (PURR) is Purdue's institutional data repository that supports long-term preservation and access. PURR's digital preservation policies give researchers a glimpse into how this process works and what to expect when preparing data for preservation. PURR will provide preservation support for as many formats as possible, but the system distinguishes three levels of support for archiving data:

  • Sustainable
    • Recognized and fully supported file formats (highest probability of long-term stability) 
    • Formats are openly documented, supported by a wide range of software platforms, widely adopted, have no data compression (or lossless data compression), and are widely accepted within the archival community
    • Sustainable file formats will receive Full Preservation
  • Supported
    • Recognized file formats
    • Do not meet minimum requirements for a Sustainable ranking but come close and may be necessary for long-term care
    • Formats more likely to require migration in order to remain viewable
    • Formats are proprietary, widely adopted, publicly and commercially important, have lossy data compression, or may be a format which has been deprecated in favor of a newer version
    • Supported file formats will receive Limited Preservation
  • Unsustainable
    • Unrecognized file formats
    • Formats are not viable for long-term storage or accessibility
    • Formats are proprietary, have little publicly documented information, are not widely adopted, have lossy data compression, and are only supported by a single or very few software platforms
    • Unsustainable file formats will receive Bit-level Preservation

 

Here are a few examples of sustainable, supported, and unsustainable formats:

| File Type | Sustainable | Supported | Unsustainable |
|---|---|---|---|
| Word Processing | PDF/A, OpenDocument Text | PDF/B, Microsoft Word, Microsoft Open XML, Rich Text Format | Corel WordPerfect, Lotus WordPro, PDF |
| Plain Text | Plain Text, Comma-separated file, Tab-delimited file | | |
| Structural Markup | SGML w/DTD, XML w/DTD | | SGML w/o DTD, XML w/o DTD |
| Spreadsheets | Comma-separated file, Tab-delimited file, PDF/A | Microsoft Excel, Microsoft Excel Open XML | |
| Databases | Delimited Flat File w/DDL | Microsoft Access, dBase Format | |
| Audio | WAVE | AIFF (uncompressed), Standard MIDI, MPEG, MP2AAC | Audio CD, DVD-Audio, RealAudio, Shorten, RIFF-RMID, Extended MIDI |
| Video | | AVI, MPEG-1, MPEG-2, MPEG-4, QuickTime | Windows Media Video |
| Images | TIFF, JPEG 2000 | JPEG, PNG, PDF/A, GIF | RAW, Adobe Photoshop, PDF |
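To illustrate how the three levels and the example formats fit together, here is a rough Python sketch that looks up a file's extension and reports the level and the kind of preservation it would receive. The extension-to-format mapping is an assumption made for illustration (the policy describes formats, not extensions), it covers only a handful of the formats above, and it is not PURR's actual implementation. PDF is left out on purpose, because an extension alone cannot distinguish PDF/A from other PDF variants.

```python
from enum import Enum
from pathlib import Path


class Tier(Enum):
    """Support levels, each paired with the preservation it receives."""
    SUSTAINABLE = "Full Preservation"
    SUPPORTED = "Limited Preservation"
    UNSUSTAINABLE = "Bit-level Preservation"


# Hypothetical mapping from file extension to level, loosely based on
# the example formats listed above. Extensions are an assumption here.
EXTENSION_TIERS = {
    ".txt": Tier.SUSTAINABLE,    # Plain Text
    ".csv": Tier.SUSTAINABLE,    # Comma-separated file
    ".tsv": Tier.SUSTAINABLE,    # Tab-delimited file
    ".odt": Tier.SUSTAINABLE,    # OpenDocument Text
    ".wav": Tier.SUSTAINABLE,    # WAVE
    ".tif": Tier.SUSTAINABLE,    # TIFF
    ".tiff": Tier.SUSTAINABLE,
    ".docx": Tier.SUPPORTED,     # Microsoft Word
    ".xlsx": Tier.SUPPORTED,     # Microsoft Excel
    ".jpg": Tier.SUPPORTED,      # JPEG
    ".jpeg": Tier.SUPPORTED,
    ".png": Tier.SUPPORTED,
    ".gif": Tier.SUPPORTED,
    ".avi": Tier.SUPPORTED,
    ".psd": Tier.UNSUSTAINABLE,  # Adobe Photoshop
    ".wmv": Tier.UNSUSTAINABLE,  # Windows Media Video
}


def classify(path):
    """Return the Tier for a file name.

    Extensions missing from the mapping are treated as unrecognized
    formats, which fall under the Unsustainable level.
    """
    suffix = Path(path).suffix.lower()
    return EXTENSION_TIERS.get(suffix, Tier.UNSUSTAINABLE)


if __name__ == "__main__":
    for name in ["results.csv", "figure.psd", "notes.docx"]:
        tier = classify(name)
        print(f"{name}: {tier.name.title()} -> {tier.value}")
```

A check like this can be run over a project folder before deposit to flag files worth converting to a more sustainable format.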

Preservation Support

When choosing a location to archive your research data, it is important to find out what level of preservation support the location's policies provide. As an example, PURR offers a preservation support policy following the sustainable, supported, and unsustainable specifications described above. The lists below show which preservation actions PURR supports at each level:

Bit-level preservation

  • Digital Object Identifier (DOI)
  • Preservation metadata
  • Secure storage and backup
  • Regular virus checks
  • Regular fixity checks
  • Bitstream maintenance
  • Transformation/Normalization
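A fixity check, one of the bit-level actions listed above, confirms that a file's bits have not changed since it was deposited. The Python sketch below shows the general idea by recording a checksum for every file in a folder and comparing against it later. The choice of SHA-256 and the function names are assumptions for illustration, since the algorithm and tooling PURR uses are not described here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's relative path to its checksum."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(directory, manifest):
    """Return the relative paths that are missing or whose bits changed."""
    root = Path(directory)
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return sorted(failures)
```

In practice the manifest would be stored separately from the data and `verify_manifest` run on a schedule. An empty result means every recorded file is still present and unchanged.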

Limited Preservation

  • Migrate to more preservable format
  • Strategically monitor format for changes
  • Everything included in bit-level preservation

Full Preservation

  • Migrate to successive format
  • Strategically monitor format for changes
  • Everything included in bit-level preservation
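Because Limited and Full Preservation both build on bit-level preservation, the three lists above can be read as cumulative. The short Python sketch below encodes that reading. The data structure and function name are hypothetical, and the action names are copied from the lists.

```python
# Actions every deposit receives, copied from the bit-level list above.
BIT_LEVEL_ACTIONS = (
    "Digital Object Identifier (DOI)",
    "Preservation metadata",
    "Secure storage and backup",
    "Regular virus checks",
    "Regular fixity checks",
    "Bitstream maintenance",
    "Transformation/Normalization",
)

# Actions each level adds on top of bit-level preservation.
EXTRA_ACTIONS = {
    "Bit-level Preservation": (),
    "Limited Preservation": (
        "Migrate to more preservable format",
        "Strategically monitor format for changes",
    ),
    "Full Preservation": (
        "Migrate to successive format",
        "Strategically monitor format for changes",
    ),
}


def actions_for(level):
    """Return all actions for a level, including the bit-level ones."""
    if level not in EXTRA_ACTIONS:
        raise ValueError(f"Unknown preservation level: {level!r}")
    return EXTRA_ACTIONS[level] + BIT_LEVEL_ACTIONS


if __name__ == "__main__":
    for level in EXTRA_ACTIONS:
        print(f"{level}: {len(actions_for(level))} actions")
```

Written this way, the only difference between Limited and Full Preservation is the migration target: a more preservable format in one case, and a successive version of the format in the other.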