Before submitting data to an archive, researchers need to remove any information that could allow subjects to be identified. This includes direct (identification) identifiers as well as indirect (attribute) identifiers, which could be used to identify a person when combined with other information in the dataset.
This can be done in two ways: de-identification or anonymization.
De-identification is the process of removing direct and indirect identifiers from a dataset while maintaining enough information for the data to be usable by future researchers. In de-identification, a key is generated that explains the steps taken to de-identify the data and that could be used to reverse the process and reassociate the data with individuals.
Anonymization is similar to de-identification in the types of information masked in the original dataset. However, the process is irreversible: no key is generated, and there is no way in the future to reconnect individual subjects with the data they supplied for the project.
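The difference between the two approaches can be illustrated with a minimal Python sketch. It is not a prescribed procedure: the function names, the `name` field, and the `P`-prefixed codes are all hypothetical, and a real key would also document the other de-identification steps taken. The point is only that de-identification returns a key that can reverse the process, while anonymization produces no key at all.

```python
import secrets

def deidentify(records, id_field="name"):
    """Replace a direct identifier with a random code and return the key.

    Hypothetical helper: the key maps each code back to the original value
    and must be stored separately from the shared data.
    """
    key = {}        # code -> original identifier
    code_for = {}   # original identifier -> code (same subject, same code)
    cleaned = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code = "P" + secrets.token_hex(4)
            while code in key:  # regenerate in the unlikely case of a clash
                code = "P" + secrets.token_hex(4)
            code_for[original] = code
            key[code] = original
        cleaned.append({**record, id_field: code_for[original]})
    return cleaned, key

def reidentify(cleaned, key, id_field="name"):
    """Reverse de-identification using the key."""
    return [{**record, id_field: key[record[id_field]]} for record in cleaned]

def anonymize(records, id_field="name"):
    """Drop the identifier entirely; no key exists, so this cannot be undone."""
    return [{k: v for k, v in record.items() if k != id_field}
            for record in records]

records = [
    {"name": "Ana", "age": 34},
    {"name": "Ben", "age": 51},
    {"name": "Ana", "age": 34},
]
cleaned, key = deidentify(records)
restored = reidentify(cleaned, key)
anonymous = anonymize(records)
```

In this sketch, rows belonging to the same subject stay linked after de-identification because they share a code, whereas the anonymized rows can no longer be linked to each other or to a person through the removed field.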
Some common methods of handling indirect identifiers include:
Some of these techniques may affect the analyses that can be performed on the dataset. Carefully consider the possible effect of each technique on the data before applying it.
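As an illustration of how such techniques can change what analyses remain possible, the sketch below applies two widely used approaches to indirect identifiers: generalizing exact ages into bands, and top-coding extreme values. These two are chosen here as assumptions for the example and are not necessarily the methods this guide has in mind; the function names, band width, and cap are hypothetical.

```python
from statistics import mean

def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def top_code(value, cap):
    """Replace any value above the cap with the cap itself."""
    return min(value, cap)

ages = [23, 37, 41, 88]
bands = [age_band(a) for a in ages]
# bands == ['20-29', '30-39', '40-49', '80-89']

incomes = [30000, 52000, 250000]
capped = [top_code(v, cap=100000) for v in incomes]
# capped == [30000, 52000, 100000]

# The protected values no longer support the same statistics:
# exact ages are gone, and the mean income shifts after top-coding.
mean_before = mean(incomes)
mean_after = mean(capped)
```

Here the mean of the top-coded incomes is lower than the mean of the original values, which is the kind of side effect that should be weighed before a technique is applied.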
Qualitative data can be challenging to de-identify because it is typically less structured than quantitative data. De-identification or anonymization may also distort or otherwise affect the value of the data, particularly where that value comes from capturing personal experiences or stories. Researchers may want to consider de-identification in conjunction with other strategies, such as restricting access (see below) or securing permission from respondents through the informed consent process to share some or all of the personal data collected.
Techniques that may be applied to qualitative data include:
Whatever technique is used, consideration should be given to how it will affect the utility of the dataset.
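One technique often applied to interview transcripts is replacing names and places with consistent pseudonyms or bracketed placeholders. The sketch below is a minimal illustration under that assumption; the function name, the sample transcript, and the replacement table are all hypothetical. It only catches the terms listed in the table, so a manual read-through of the text would still be needed.

```python
import re

def pseudonymize_text(text, replacements):
    """Replace each listed name with its placeholder in a single pass.

    Longer names are tried first so that 'Maria Lopez' is matched
    before the shorter 'Maria'.
    """
    if not replacements:
        return text
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b" + re.escape(n) + r"\b" for n in names))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

transcript = "Maria Lopez said that Maria's clinic in Springfield closed."
replacements = {
    "Maria Lopez": "[Participant 1]",
    "Maria": "[Participant 1]",
    "Springfield": "[City A]",
}
redacted = pseudonymize_text(transcript, replacements)
# redacted == "[Participant 1] said that [Participant 1]'s clinic in [City A] closed."
```

Using the same placeholder for every mention of a person keeps the narrative readable, which helps preserve some of the utility of the text.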
The Council of European Social Science Data Archives (CESSDA) provides the following guidance on working with sensitive qualitative data:
If the anonymisation is being carried out after transcription:
(Source: CESSDA, http://www.cessda.org/sharing/rights/3/)
Some data sets will not remain usable if all identifiers are removed. An example is medical information where gender, age, race and medical history may be required for accurate analysis. In this case, restricting access to the data is required.