Purdue University, Purdue Libraries

Data Management for Health and Human Sciences

Research Data Management overview and resources for students and faculty focusing on health and human sciences

What is Documentation?

Effective documentation is essential for managing research data. Thoroughly describing your dataset and recording your processes not only aids you and your research team but also assists others who may want to reuse your data in the future. For someone to interpret and reuse research data, they need to understand the context of the research—when, why, and how the data was collected or generated, the meaning of the variables, the processing or transformations applied, and the creation of the final dataset.

Numerous elements of your research project and its associated datasets require thorough documentation. Consider the following categories:

  • Context of data collection
  • Data collection methods
  • Information about variables used
  • File organization and naming schemes
  • How data has been transformed or processed for analysis
  • Software used for data processing and analysis
  • Outside data sources used
  • The roles and responsibilities of project personnel

What is Metadata?

A key factor in promoting data sharing and reuse is interoperability, which refers to the ability to integrate your dataset with others to improve discovery. For this to happen, similar data should be described with similar metadata and, if possible, adhere to common data standards. While not all fields have standardized data formats, it’s important to align your data and metadata with established standards whenever possible.

Common types of metadata include:

  • Descriptive Metadata: title, author, abstract, and keywords describing the content of the data
  • Structural Metadata: file formats and data organization
  • Administrative Metadata: date of creation, versions, and access rights
  • Technical Metadata: tools or software used to collect or analyze the data, and details of the instruments used
  • Experimental Metadata: information about the experimental conditions (e.g., assay type, time points), the experimental protocol, and the equipment used to generate the data
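
As an illustration of how the metadata types above might be kept together in one machine-readable record, here is a minimal Python sketch. The field names and every value are hypothetical placeholders, not part of any metadata standard; a real project should use the fields its chosen standard or repository requires.

```python
import json


def build_metadata_record():
    """Return an example metadata record grouped by the five types above.

    Every field name and value here is a hypothetical placeholder.
    """
    return {
        "descriptive": {
            "title": "Example sleep and activity survey",
            "author": "Example Researcher",
            "abstract": "Placeholder abstract describing the dataset.",
            "keywords": ["sleep", "physical activity"],
        },
        "structural": {
            "file_formats": ["csv"],
            "organization": "one row per participant visit",
        },
        "administrative": {
            "date_created": "2024-01-15",
            "version": "1.0",
            "access_rights": "restricted",
        },
        "technical": {
            "software": ["example statistics package"],
            "instrument": "example wrist-worn activity monitor",
        },
        "experimental": {
            "assay_type": "not applicable",
            "time_points": ["baseline", "week 6"],
            "protocol": "protocol_v1.pdf",
            "equipment": "example wrist-worn activity monitor",
        },
    }


record = build_metadata_record()
# Serialize to JSON so the record can be stored alongside the data files.
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

Saving a record like this next to the data files keeps the description and the data together, and a structured format such as JSON is easier to map onto a formal standard later than free text.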

There are also many resources for various medical metadata standards:

Ways to Document Data

Here are a few examples of different measures you can take to document your data:

README File: A README file provides information about a data file and helps ensure that the data can be interpreted correctly, whether by you at a later date or by others when you share or publish the data. Cornell Data Services offers a Guide to writing "readme" style metadata with a downloadable template to get started.
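
To make the idea concrete, the following minimal Python sketch assembles a plain-text README from a handful of fields. The function name, section headings, and sample values are assumptions chosen for illustration; this is not the Cornell template, which covers considerably more ground.

```python
def render_readme(title, contact, date_collected, file_descriptions, methods):
    """Build a short plain-text README from a few hypothetical fields.

    file_descriptions maps each file name to a one-line description.
    """
    lines = [
        f"Title: {title}",
        f"Contact: {contact}",
        f"Date of data collection: {date_collected}",
        "",
        "FILES",
    ]
    for name, description in file_descriptions.items():
        lines.append(f"  {name}: {description}")
    lines += ["", "METHODS", methods]
    return "\n".join(lines) + "\n"


# Hypothetical example values.
readme_text = render_readme(
    title="Example sleep and activity survey",
    contact="researcher@example.edu",
    date_collected="2024-01-15 to 2024-03-01",
    file_descriptions={
        "survey_raw.csv": "unprocessed survey responses",
        "survey_clean.csv": "responses after exclusions and recoding",
    },
    methods="Placeholder description of how the data were collected.",
)
print(readme_text)
```

Writing the result to a file named README.txt in the same folder as the data keeps the description with the files it describes.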

Codebook: A codebook describes the contents, structure, and layout of a data collection. In What is a Codebook?, ICPSR shares example codebooks and related resources for creating them.

Data Dictionary: A data dictionary provides an overall description of the data along with more detailed descriptions of each variable. The Open Science Framework (OSF) provides more information in an article on How to Make a Data Dictionary.
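
One way to start a data dictionary is to draft its skeleton from the data file itself and then fill in the descriptions by hand. The sketch below does this with Python's standard csv module. The column names and values in the sample are hypothetical, and the type inference is a rough guess that should be checked by someone who knows the data.

```python
import csv
import io


def infer_type(values):
    """Guess 'integer', 'float', or 'text' from the non-empty values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_text):
    """Return one entry per column: name, guessed type, and missing count.

    The description is left blank on purpose; only a person who knows
    the study can say what each variable means.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [(row[name] or "").strip() for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(values),
            "missing": sum(1 for v in values if v == ""),
            "description": "",
        })
    return entries


# Hypothetical sample data.
sample = """participant_id,age,bmi,group
P01,34,22.5,control
P02,,27.1,treatment
P03,41,,control
"""

entries = draft_data_dictionary(sample)
for entry in entries:
    print(entry)
```

The draft records only what can be read from the file. Units, allowed values, and the meaning of codes such as group labels still need to be written in by the research team.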

Protocol: A research protocol is a document that describes the background, rationale, objectives, design, methodology, statistical considerations, and organization of a clinical research project. Nature's Five Keys to Writing a Reproducible Lab Protocol provides an abundance of tools to get started.

Resources for Documenting Your Data