Sharing research data is a common funding agency expectation and a part of research practice in many disciplines. However, when the data includes information that could be used to identify research participants or contains sensitive information, researchers are legally and ethically obligated to ensure that confidentiality is maintained.
Balancing the desire or expectation to share research data against the legal and ethical obligation to protect research participants and prevent disclosure of sensitive information can be difficult. When appropriate, research data containing sensitive or confidential information can be shared, but doing so takes planning and an understanding of the requirements. These pages are designed to inform researchers of some of the issues in managing and sharing these types of data and to offer guidance on handling research data in ways that address legal and ethical concerns.
One of the central issues in sharing sensitive data is the prevention of accidental disclosure of the identity of a research participant.
The responsibilities of the researcher and the steps that will be taken to safeguard a participant's identity should be detailed in the consent form that is reviewed and signed by the research participant. The consent form acts as a contract between the researcher and the research participant, informing the participant how the data will be released (if at all) and what steps will be taken to prevent the disclosure of their identity. If you as the researcher intend to release the data in some fashion, this should be made clear in the consent form. The content of the consent form must be reviewed and approved by an Institutional Review Board (IRB) or Ethics Review Board (ERB).
Disclosure Risks - Disclosure could stem from the release of direct identifiers or indirect identifiers in the data set.
Direct Identifiers - variables that contain information that could readily be used to discern an individual's identity, such as a name, address, phone number, or membership number. Direct identifiers are generally unique to an individual or a small group of individuals. It is generally understood that direct identifiers need to be removed from the data set before its release.
Indirect Identifiers - variables that could be used in conjunction with other variables in the study or with external data to discern an individual's identity. Indirect identifiers could include zip code, education level, medical diagnosis, race/ethnicity, and occupation. The variables that could serve as indirect identifiers may not be immediately obvious, so some consideration should be given to how the variables could be combined before releasing the data set.
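As a minimal sketch of removing direct identifiers before release, the Python snippet below drops a set of named fields from each record. The field names (name, address, phone, member_id) and the sample records are hypothetical and chosen only for illustration; the actual list of direct identifiers depends on the data set and on what the consent form and IRB/ERB approval require.

```python
# Hypothetical field names; replace with the direct identifiers in your own data set.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "member_id"}


def strip_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records with the direct-identifier fields left out."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


# Made-up sample records for illustration only.
records = [
    {"name": "A. Example", "phone": "555-0100", "zip": "12345", "education": "BA"},
    {"name": "B. Example", "phone": "555-0101", "zip": "12345", "education": "PhD"},
]

released = strip_direct_identifiers(records)
```

The function builds new records rather than editing the originals, so the identified master copy stays intact and can be stored separately under tighter access controls. Dropping these fields does not address indirect identifiers, which remain in the released records.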
Combining identifiers may allow someone to "re-identify" a participating subject. There are two broad types of re-identification:
Identification Disclosure - occurs when someone is able to discern the identity of a subject from a particular record within a data set.
Attribute Disclosure - occurs when someone learns the value of a sensitive variable that could then be used in conjunction with other variables to discover the identity of the individual.
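One rough way to consider how indirect identifiers could be put together is to count how many records share each combination of their values. The sketch below is a simple k-anonymity-style check, a technique not named in the guidance above and offered here only as an illustration. The function name, field names, sample records, and the threshold k=2 are all assumptions, and the check cannot account for external data that someone might link to the release.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=2):
    """Return value combinations shared by fewer than k records.

    quasi_identifiers is the list of indirect-identifier fields to combine;
    k is an assumed threshold, not a value taken from any regulation.
    """
    counts = Counter(
        tuple(record.get(field) for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up sample records with direct identifiers already removed.
records = [
    {"zip": "12345", "education": "BA", "occupation": "teacher"},
    {"zip": "12345", "education": "BA", "occupation": "teacher"},
    {"zip": "12345", "education": "PhD", "occupation": "surgeon"},
]

# Combinations that appear only once are the ones most at risk of re-identification.
risky = small_groups(records, ["zip", "education", "occupation"])
```

In this sample, zip code alone singles out no one, but the combination of zip code, education level, and occupation isolates one record. Records flagged this way are candidates for coarsening (for example, broader geographic areas or grouped categories) or suppression before release.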