Data De-identification
Data submitted to ARCHIMEDES must comply with applicable privacy regulations and ethical approvals. In many cases, this involves de-identifying or coding data, with consent, before submission.
The tools and resources below are provided for educational purposes only, and researchers are responsible for ensuring their data is prepared appropriately.
What is data de-identification?
De-identification is the process of removing or transforming personal identifiers in data in order to reduce the risk that an individual can be identified.
Health data may contain direct identifiers (e.g., names, health card numbers, exact addresses), indirect identifiers (e.g., date of birth, postal code, rare diagnoses), or embedded identifiers (e.g., metadata in imaging files or text within images). These elements must be carefully reviewed and either removed or modified to protect privacy.
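As a rough illustration of removing direct identifiers and modifying indirect ones in a structured record, consider the minimal Python sketch below. The field names (`name`, `health_card_number`, `address`, `date_of_birth`, `postal_code`) and the chosen generalizations (year of birth, three-character postal prefix) are assumptions made for this example only. They are not an ARCHIMEDES requirement or a complete de-identification method.

```python
from datetime import date

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "health_card_number", "address"}


def deidentify_record(record: dict) -> dict:
    """Return a copy with direct identifiers removed and indirect ones coarsened."""
    # Remove direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Generalize date of birth to year of birth.
    dob = out.pop("date_of_birth", None)
    if dob is not None:
        out["birth_year"] = dob.year

    # Generalize postal code to its first three characters.
    postal = out.pop("postal_code", None)
    if postal is not None:
        out["postal_prefix"] = postal.replace(" ", "").upper()[:3]

    return out


# Made-up example record.
example = {
    "name": "Jane Doe",
    "health_card_number": "0000-000-000",
    "address": "1 Example Street",
    "date_of_birth": date(1980, 5, 17),
    "postal_code": "m5v 2t6",
    "diagnosis": "asthma",
}
cleaned = deidentify_record(example)
```

This sketch does not address embedded identifiers (such as imaging metadata) or rare diagnoses, and coarsening a few fields does not by itself make a dataset safe to share.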
Why is data de-identification important?
De-identification enables responsible data sharing while protecting patient privacy. It supports research, collaboration, and innovation, and helps organizations comply with privacy legislation, ethics requirements, and institutional policies. Done carefully, it preserves the usefulness of the data for these purposes.
What to consider before beginning data de-identification:
Effective de-identification depends on:
Data type
Different data modalities (e.g., structured datasets, clinical notes, imaging, genomics) require different approaches.
Intended use
The level of de-identification required depends on whether data will be used internally, shared across institutions, or made publicly available.
Regulatory context
Applicable legislation and ethics requirements determine necessary safeguards.
Re-identification risk
Risk must be assessed based on the data elements and the broader context, such as who will have access to the data and what other data it could be linked with.
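One common way to get a first, rough sense of re-identification risk in structured data is to count how many records share the same combination of indirect identifiers, in the spirit of k-anonymity. The Python sketch below assumes hypothetical column names (`birth_year`, `postal_prefix`) and made-up records. It is a simplified illustration, not a risk assessment method endorsed by this page.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(r.get(q) for q in quasi_identifiers) for r in records
    )
    return min(groups.values()) if groups else 0


# Made-up records with hypothetical column names.
records = [
    {"birth_year": 1980, "postal_prefix": "M5V", "diagnosis": "asthma"},
    {"birth_year": 1980, "postal_prefix": "M5V", "diagnosis": "diabetes"},
    {"birth_year": 1975, "postal_prefix": "K1A", "diagnosis": "asthma"},
]

# The third record is the only one with (1975, "K1A"), so the smallest group has size 1.
k = smallest_group_size(records, ["birth_year", "postal_prefix"])
```

A smallest group of size 1 means at least one record is unique on those fields and may warrant further generalization or suppression. A count like this looks only at the data elements; it says nothing about the broader context in which the data will be used.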
De-identification Resource Hub
The De-identification Resource Hub provides practical guidance, tools, and educational materials to support responsible data preparation and sharing. Our goal is to make de-identification more accessible and transparent while promoting research quality, reproducibility, and privacy compliance across institutions.
Getting started with de-identification
Core concepts, introductory video content, and guidance to help you understand the fundamentals of de-identification.
Step-by-step de-identification guide
Practical step-by-step guidance tailored by data type, outlining key considerations, common identifiers, and recommended approaches for structured data, text, imaging, and more.
Tutorials & Workshops
Workshops, code-based tutorials, and implementation resources for applying de-identification methods in practice.
Link library
A curated collection of external guidance, regulatory frameworks, best practice documents, and trusted de-identification resources.
Ongoing Work
De-identification of Health Data for Open Sharing - A Scoping Review
As part of this effort, we are conducting a scoping review entitled “De-identification of Health Data for Open Sharing - A Scoping Review”. This review aims to map the current landscape of de-identification techniques across various forms of health data, identify which methods are applied to which data types, and highlight existing gaps in the field. The results will inform a clear, structured overview of existing methods to help researchers make informed decisions when preparing their data for sharing.
Coming Soon
The Resource Hub will soon expand to include:
- Step-by-step de-identification guides for each data type
- Re-identification risk guidance and assessment tools
- Quality control checklists and resources for de-identified datasets
- Additional technical tutorials and implementation support
Check back soon for updates and newly released resources.