Sharing research data is becoming an expectation from funding agencies and a part of research practice in many disciplines. However, when the data includes information that could be used to identify individual human subjects or contains sensitive information, researchers are legally and ethically obligated to ensure that confidentiality is maintained.
Navigating between the desire or expectation to share research data and the legal and ethical obligations to protect human subjects and prevent disclosure of sensitive information can be difficult. Research data containing sensitive or confidential information can be shared; it just takes planning and an understanding of the requirements. These pages are designed to inform researchers of the issues in managing and sharing these types of data and to offer guidance on handling research data in ways that address legal and ethical concerns.
The central issue in sharing sensitive data is the prevention of accidental disclosure of the identity of a participating subject.
The responsibilities of the researcher and the steps that will be taken to safeguard a subject's identity should be detailed in the consent form that is reviewed and signed by the subject. The consent form acts as a contract between the researcher and the subject, informing the subject how the data will be released (if at all) and what steps will be taken to prevent the disclosure of their identity. If you as the researcher intend to release the data in some fashion, this should be made clear in the consent form. The content of the consent form must be reviewed and approved by an Institutional Review Board (IRB).
Disclosure could stem from the release of direct identifiers or indirect identifiers in the data set.
Direct Identifiers - variables that contain information that could readily be used to discern an individual's identity, such as a name, address, phone number, or membership number. Direct identifiers are generally unique to an individual or a small group of individuals. It is generally understood that direct identifiers need to be removed from the data set before its release.
Indirect Identifiers - variables that could be used in conjunction with other variables in the study or with external data to discern an individual's identity. Indirect identifiers could include zip code, education level, medical diagnosis, race/ethnicity, or occupation. The variables that could serve as indirect identifiers may not be immediately obvious, so some consideration should be given to how the variables could be combined before releasing the data set.
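As a rough illustration of the two kinds of identifiers, the sketch below removes direct identifiers and then counts how many records share each combination of indirect identifiers. Combinations held by very few records are the ones most likely to point to a single person. The column names, the sample records, and the threshold of 2 are all assumptions made for this example, not recommendations; an appropriate threshold depends on the data and on the requirements of your IRB.

```python
from collections import Counter

# Hypothetical column names; adjust these to the actual data set.
DIRECT_IDENTIFIERS = {"name", "phone"}
INDIRECT_IDENTIFIERS = ("zip_code", "education", "occupation")


def drop_direct_identifiers(records, direct=DIRECT_IDENTIFIERS):
    """Return copies of the records without the direct-identifier fields."""
    return [{k: v for k, v in rec.items() if k not in direct} for rec in records]


def rare_combinations(records, indirect=INDIRECT_IDENTIFIERS, threshold=2):
    """Return indirect-identifier combinations shared by fewer than `threshold` records."""
    counts = Counter(tuple(rec.get(col) for col in indirect) for rec in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Invented sample records, for illustration only.
records = [
    {"name": "A. Example", "phone": "555-0100", "zip_code": "12345",
     "education": "PhD", "occupation": "nurse"},
    {"name": "B. Example", "phone": "555-0101", "zip_code": "12345",
     "education": "BA", "occupation": "teacher"},
    {"name": "C. Example", "phone": "555-0102", "zip_code": "12345",
     "education": "BA", "occupation": "teacher"},
]

released = drop_direct_identifiers(records)
at_risk = rare_combinations(released)
print(at_risk)
```

Here the only nurse with a PhD in the zip code stands out even though the name and phone number are gone. A check like this covers only the variables within the file; it cannot account for external data that someone might link to the release.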
Combining identifiers may allow someone to "re-identify" a participating subject. There are two broad types of disclosure that can result:
Identification Disclosure - occurs when someone is able to discern the identity of a subject from a particular record within a data set.
Attribute Disclosure - occurs when someone learns the value of a sensitive variable for a subject. That value may itself be confidential, and it could also be used in conjunction with other variables to discover the identity of the individual.
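One way attribute disclosure can arise is when every record that shares a combination of indirect identifiers also has the same value for a sensitive variable: anyone known to fall in that group is then known to have that value, even if their exact record cannot be singled out. The sketch below flags such groups. The function name, column names, and sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def homogeneous_groups(records, indirect, sensitive):
    """Return groups, keyed by indirect-identifier values, whose records
    all share a single value of the sensitive variable."""
    values = defaultdict(set)
    for rec in records:
        key = tuple(rec[col] for col in indirect)
        values[key].add(rec[sensitive])
    return {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}


# Invented sample records, for illustration only.
records = [
    {"zip_code": "12345", "occupation": "teacher", "diagnosis": "asthma"},
    {"zip_code": "12345", "occupation": "teacher", "diagnosis": "asthma"},
    {"zip_code": "12346", "occupation": "nurse", "diagnosis": "asthma"},
    {"zip_code": "12346", "occupation": "nurse", "diagnosis": "diabetes"},
]

exposed = homogeneous_groups(records, ("zip_code", "occupation"), "diagnosis")
print(exposed)
```

In this sample, both teachers in zip code 12345 have the same diagnosis, so the diagnosis of any teacher known to be in the study from that zip code is effectively revealed. The nurses are not flagged because their diagnoses differ.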