Research Guides: Sensitive Research Data Management: Overview

Introduction

Sharing research data is a common funding agency expectation and a part of research practice in many disciplines. However, when the data includes information that could be used to identify research participants or contains sensitive information, researchers are legally and ethically obligated to ensure that confidentiality is maintained.

Navigating between the desire or expectation to share research data and the legal and ethical concerns of protecting research participants and preventing disclosure of sensitive information can be difficult. When appropriate, research data containing sensitive or confidential information can be shared, but doing so takes planning and an understanding of the requirements. These pages are designed to inform researchers of some of the issues in managing and sharing these types of data and to offer guidance on handling research data in ways that address legal and ethical concerns.

Confidentiality and Disclosure

One of the central issues in sharing sensitive data is the prevention of accidental disclosure of the identity of a research participant.

The responsibilities of the researcher and the steps that will be taken to safeguard a participant's identity should be detailed in the consent form that is reviewed and signed by the research participant. The consent form acts as a contract between the researcher and the research participant, informing the participant how the data will be released (if at all) and what steps will be taken to prevent the disclosure of their identity. If you, as the researcher, intend to release the data in any form, state this clearly in the consent form. The content of the consent form must be reviewed and approved by an Institutional Review Board (IRB) or Ethics Review Board (ERB).

Disclosure Risks - Disclosure could stem from the release of direct identifiers or indirect identifiers in the data set.

Direct Identifiers - Variables that contain information that could readily be used to discern an individual's identity, such as a name, address, phone number, or membership number. Direct identifiers are generally unique to an individual or a small group of individuals. It is generally understood that direct identifiers need to be removed from the data set before its release.
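
As a minimal sketch of removing direct identifiers before release, the Python below drops a fixed set of fields from each record. The field names and sample records are hypothetical and chosen only for illustration; a real data set needs its own list of direct identifiers, reviewed against the consent form and IRB protocol.

```python
# Hypothetical field names treated as direct identifiers (an assumption for
# this sketch; every data set needs its own reviewed list).
DIRECT_IDENTIFIERS = {"name", "address", "phone", "member_id"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the listed direct-identifier fields."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


# Made-up sample records.
records = [
    {"name": "A. Example", "phone": "555-0100", "zip": "47907", "education": "BA"},
    {"name": "B. Example", "phone": "555-0101", "zip": "47906", "education": "PhD"},
]

released = drop_direct_identifiers(records)
```

The original records are left unchanged, so the identified version can be kept under restricted access while only the stripped copy is prepared for release. Removing direct identifiers alone does not address indirect identifiers.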

Indirect Identifiers - Variables that could be used in conjunction with other variables in the study or with external data to discern an individual's identity. Indirect identifiers could include zip code, education level, medical diagnosis, race/ethnicity, or occupation. The variables that could serve as indirect identifiers may not be immediately obvious, so consider how the variables could be combined before releasing the data set.
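
One way to see how variables combine is to count how many records share each combination of indirect identifiers; combinations held by very few records are the ones most likely to single someone out. The sketch below does this in Python. This guide does not prescribe a method: the approach is a simple group-size check of the kind usually discussed under the name k-anonymity, and the function name, threshold `k`, field names, and sample records are all assumptions made for illustration.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return combinations of indirect-identifier values shared by fewer than k records.

    `quasi_identifiers` is the list of fields to combine; `k` is an
    illustrative threshold, not a regulatory standard.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up sample records with direct identifiers already removed.
records = [
    {"zip": "47906", "education": "PhD", "occupation": "nurse"},
    {"zip": "47906", "education": "BA", "occupation": "teacher"},
    {"zip": "47906", "education": "BA", "occupation": "teacher"},
    {"zip": "47907", "education": "BA", "occupation": "teacher"},
]

risky = small_groups(records, ["zip", "education", "occupation"], k=2)
```

With these sample records, two combinations are each held by a single record and would be flagged for attention, for example by coarsening zip code or grouping occupations. A check like this only covers the variables in the data set itself; it cannot account for external data someone might link to it.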

Putting together identifiers may lead to someone being able to "re-identify" a participating subject. There are two broad types of re-identification:

Identification Disclosure - Occurs when someone is able to discern the identity of a subject from a particular record within a data set.

Attribute Disclosure - Occurs when someone learns the value of a sensitive variable that could then be used in conjunction with other variables to discover the identity of the individual.

General Resources

European Union General Data Protection Regulation (GDPR)
GDPR aims to strengthen data protections for individuals in the European Union and has been enforceable since May 25, 2018.
Export Controls and Research Information Assurance (EVPRP)
Maintained by the Export Control and Information Assurance team within the Office of Research and Partnerships.
Federal Policy for the Protection of Human Subjects ('Common Rule')
Information about the Federal Policy for the Protection of Human Subjects, a.k.a. the "Common Rule," and links to information produced by federal agencies.
Health Information Privacy
The Office for Civil Rights enforces:
- the HIPAA Privacy Rule, which protects the privacy of individually identifiable health information
- the HIPAA Security Rule, which sets national standards for the security of electronic protected health information
- the confidentiality provisions of the Patient Safety Rule, which protect identifiable information being used to analyze patient safety events and improve patient safety
HIPAA research guidelines (Purdue Office of Legal Counsel)
The Health Insurance Portability and Accountability Act of 1996 (HIPAA) rules create a framework to protect the privacy and security of patient and health plan member health information. Purdue University supports the goals of HIPAA and documents its commitment to comply with these laws in its Compliance with HIPAA Privacy Regulations policy.
NEH Office of Digital Humanities Data Management Plans
Data Management Plan Requirements for NEH Office of Digital Humanities Grants (updated yearly)
NIH Policy for Data Management and Sharing - January 25, 2023
The final NIH data management and sharing policy, in effect since January 2023. Supplemental information about specific topics is included.

All extramural researchers submitting a grant proposal generating scientific research data are required to provide a data management and sharing plan as part of their proposal.
NSF - Dissemination and Sharing of Research Results
Links to resources about the NSF's policy on data sharing, data management plan requirements, and FAQs.
Purdue University Human Research Protection Program (and IRB)
The Purdue University Institutional Review Board (IRB) is responsible for overseeing the rights and welfare of human subjects. The terms and conditions under which sensitive data are released will need to be reviewed and approved by the IRB as a part of the research protocol review process.
Secure Purdue
The goal of Secure Purdue is to further Purdue's mission by protecting the confidentiality, integrity, and availability of University information and technology assets. See the Data Handling tab for updated information.