Data De-identification

Data submitted to ARCHIMEDES must comply with applicable privacy regulations and ethical approvals. In many cases, this means de-identifying or coding the data, with appropriate consent, before submission.

The tools and resources below are provided for educational purposes only, and researchers are responsible for ensuring their data is prepared appropriately.

What is data de-identification?

De-identification is the process of removing or transforming personal identifiers in data to reduce the risk that an individual can be identified.

Health data may contain direct identifiers (e.g., names, health card numbers, exact addresses), indirect identifiers (e.g., date of birth, postal code, rare diagnoses), or embedded identifiers (e.g., metadata in imaging files or text within images). These elements must be carefully reviewed and either removed or modified to protect privacy.
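As a minimal sketch of how these categories might be handled in a structured record, the Python example below drops direct identifiers and coarsens indirect ones. The field names (`name`, `health_card_number`, `address`, `date_of_birth`, `postal_code`) and the generalization rules (year of birth only, first three postal code characters) are illustrative assumptions, not ARCHIMEDES requirements. The appropriate transformations depend on the dataset and its regulatory context.

```python
from datetime import date

# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "health_card_number", "address"}


def deidentify_record(record: dict) -> dict:
    """Return a copy with direct identifiers removed and indirect ones generalized."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Generalize date of birth to year of birth only (assumed rule).
    dob = out.pop("date_of_birth", None)
    if isinstance(dob, date):
        out["birth_year"] = dob.year

    # Keep only the first three characters of the postal code (assumed rule).
    postal = out.pop("postal_code", None)
    if isinstance(postal, str):
        out["postal_prefix"] = postal.replace(" ", "")[:3].upper()

    return out


# Made-up sample record; none of these values refer to a real person.
record = {
    "name": "Jane Example",
    "health_card_number": "0000-000-000",
    "address": "1 Example Street",
    "date_of_birth": date(1980, 5, 17),
    "postal_code": "k1a 0b1",
    "diagnosis": "asthma",
}
print(deidentify_record(record))
```

This sketch covers only structured fields. Embedded identifiers, such as metadata in imaging files or text within images, need modality-specific handling that it does not attempt.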

Why is data de-identification important?

De-identification enables responsible data sharing while protecting patient privacy. It supports research, collaboration, and innovation, and helps organizations comply with privacy legislation, ethics requirements, and institutional policies. Done carefully, it preserves the usefulness of the data for these purposes.

What to consider before beginning data de-identification:

Effective de-identification depends on:

Data type

Different data modalities (e.g., structured datasets, clinical notes, imaging, genomics) require different approaches.

Intended use

The level of de-identification required depends on whether data will be used internally, shared across institutions, or made publicly available.

Regulatory context

Applicable legislation and ethics requirements determine necessary safeguards.

Re-identification risk

Risk must be assessed based on the data elements and broader context.
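One common way to quantify part of this risk in structured data is k-anonymity: the size of the smallest group of records that share the same combination of indirect identifiers. The sketch below is an illustration under stated assumptions, not a prescribed assessment method. The function name, column names, and sample rows are hypothetical.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    if not rows:
        return 0
    # Count how many rows share each combination of quasi-identifier values.
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Made-up rows for illustration only.
rows = [
    {"birth_year": 1980, "postal_prefix": "K1A", "diagnosis": "asthma"},
    {"birth_year": 1980, "postal_prefix": "K1A", "diagnosis": "diabetes"},
    {"birth_year": 1975, "postal_prefix": "M5V", "diagnosis": "asthma"},
]
k = smallest_group_size(rows, ["birth_year", "postal_prefix"])
print(k)  # 1
```

A result of 1 means at least one record is unique on the chosen fields. A higher k does not by itself establish that data is safe: it says nothing about the broader context, such as rare diagnoses or linkage with external datasets.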

Ongoing Work

De-identification of Health Data for Open Sharing - A Scoping Review

As part of this effort, we are conducting a scoping review titled “De-identification of Health Data for Open Sharing - A Scoping Review”. The review aims to map the current landscape of de-identification techniques across various forms of health data, identify which methods are applied to which data types, and highlight existing gaps in the field. The results will inform a clear, structured overview of existing methods to help researchers make informed decisions when preparing their data for sharing.

Coming Soon

The Resource Hub will soon expand to include:

  • Step-by-step de-identification guides for each data type
  • Re-identification risk guidance and assessment tools
  • Quality control checklists and resources for de-identified datasets
  • Additional technical tutorials and implementation support

Check back soon for updates and newly released resources.