Thank you for your interest in ARCHIMEDES. If you would like to contribute data to the ARCHIMEDES platform, contact our Community and Partnership Liaison at ARCHIMEDES@ottawaheart.ca to initiate your institution’s Data Contribution Agreement.

De-identification Tools

At ARCHIMEDES, we are committed to supporting responsible health data sharing. All researchers depositing data into the ARCHIMEDES platform are responsible for ensuring their data is appropriately de-identified prior to upload.

To support this process, the ARCHIMEDES team is developing a range of educational resources and tools to help researchers navigate de-identification best practices based on the type and format of their data.

Ongoing Work

De-identification of Health Data for Open Sharing - A Scoping Review

As part of this effort, we are conducting a scoping review entitled “De-identification of Health Data for Open Sharing - A Scoping Review”. This review aims to map the current landscape of de-identification techniques across various forms of health data, identify which methods are applied to which data types, and highlight existing gaps in the field. The results will inform the creation of a clear, structured overview of existing methods to help researchers make informed decisions when preparing their data for sharing.

Coming Soon:
De-identification Resource Hub

These tools are designed to make the de-identification process more accessible, transparent, and consistent, while promoting reproducibility and data privacy compliance. Our goal is to empower researchers to share high-quality, privacy-compliant data through the ARCHIMEDES platform. The hub will include:

A list of open-source de-identification toolkits with guidance on their applications to specific data types (e.g., imaging, structured data, free text)

Infographics that break down complex concepts into digestible visuals

Interactive coding tutorials (e.g., Jupyter notebooks) to support users working with larger or more complex datasets

De-Identification Fundamentals

Transcript

Welcome to ARCHIMEDES, the Advanced Research Collaboration for Health Integration, Medical Exploration, and Data Synthesis – a platform designed for seamless and secure medical data sharing.

Preparing data for sharing on ARCHIMEDES involves several steps to ensure privacy, security, and compliance with legal and ethical frameworks. One crucial component of this process is data de-identification. Data must be fully de-identified by the uploader before it is submitted to ARCHIMEDES.

De-identification is the process of removing or modifying personal information in data. For medical data, it protects patient privacy, minimizes the risk of breaches, and allows data to be shared for collaboration, all while staying compliant with privacy laws.

However, de-identifying data isn’t always simple. It requires balancing privacy with data usability – and compliance with a range of regulatory frameworks. De-identification must account for potential risks of re-identification, especially with advances in data analytics and machine learning. Proper de-identification is essential for fostering trust in data sharing among stakeholders while preserving the value of the data for research and clinical use.

The terms “de-identification” and “anonymization” are often used interchangeably, but terminology can vary. Both processes remove personal health information (PHI) to protect privacy. Anonymization irreversibly removes PHI, which minimizes the risk of re-identification. De-identification (sometimes also called “pseudonymization”), on the other hand, removes most PHI but may retain low-risk identifiers or use coding or encryption to preserve data utility over time. Both methods aim to protect privacy and support compliance with privacy regulations, but de-identification often allows researchers to link data across time or datasets, and so offers greater data utility, whereas anonymization eliminates this possibility for greater privacy protection.

To achieve this, a variety of techniques can be used to effectively remove or alter sensitive information. Let’s explore some of the most commonly used de-identification methods.

First, data masking. This involves the removal or modification of direct identifiers—things like names, phone numbers, and medical record numbers. Masking is often the first and most straightforward step in the de-identification process.
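As a rough illustration of masking, the sketch below drops direct-identifier fields from a record and redacts phone numbers in free text. The field names (`name`, `phone`, `mrn`), the sample record, and the phone-number pattern are all hypothetical; a real dataset needs its own identifier list and far more thorough pattern coverage.

```python
import re

# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "phone", "mrn"}

# Simplified pattern for North American phone numbers such as 613-555-0100.
# Real free text contains many more formats than this pattern covers.
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_record(record: dict) -> dict:
    """Return a copy of the record without the direct-identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def mask_phone_numbers(text: str) -> str:
    """Replace anything that looks like a phone number with a placeholder."""
    return PHONE_PATTERN.sub("[PHONE]", text)


# Made-up sample data, for illustration only.
record = {
    "name": "Jane Doe",
    "phone": "613-555-0100",
    "mrn": "A123456",
    "age": 54,
    "diagnosis": "I21.9",
}
masked = mask_record(record)
note = mask_phone_numbers("Call patient at 613-555-0100 to confirm.")
```

Removing whole fields is the simplest form of masking. Free-text notes usually need more than one pattern, and often a dedicated de-identification tool.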

Next, data perturbation. This method slightly modifies the values of sensitive data to protect identity. For example, an age or date might be adjusted by a small, random amount. While the overall dataset stays statistically meaningful, individual-level precision is blurred to protect privacy.
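A minimal sketch of perturbation, assuming an age is jittered by a few years and all dates for one patient are shifted by the same random offset so that the intervals between events are preserved. The ranges (plus or minus 2 years, plus or minus 30 days) and the function names are illustrative assumptions, not recommended settings.

```python
import random
from datetime import date, timedelta


def perturb_age(age: int, rng: random.Random, max_shift: int = 2) -> int:
    """Adjust an age by a random whole number of years within +/- max_shift."""
    return max(0, age + rng.randint(-max_shift, max_shift))


def shift_date(d: date, offset_days: int) -> date:
    """Move a date by a fixed number of days."""
    return d + timedelta(days=offset_days)


# A fixed seed is used only to make this example repeatable.
# It should not be used when perturbing real data.
rng = random.Random(42)

# One offset per patient keeps the time between that patient's events intact.
patient_offset = rng.randint(-30, 30)

admission = date(2023, 3, 14)
discharge = date(2023, 3, 20)
shifted_admission = shift_date(admission, patient_offset)
shifted_discharge = shift_date(discharge, patient_offset)

noisy_age = perturb_age(54, rng)
```

Using a single offset per patient is a design choice: the absolute dates are blurred, while the length of stay (6 days here) is unchanged.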

Finally, tokenization. This replaces identifiable data with unique codes or pseudonyms that cannot be linked back to an individual without a secure key. Tokenization is especially helpful when researchers need to track records across time or across datasets, without compromising identity.
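One possible way to sketch tokenization is a keyed hash (HMAC-SHA256) from the Python standard library. This is an assumption for illustration: the description above does not name a specific technique, and a randomly generated token stored in a secure lookup table is another common approach. The `tokenize` function, the key, and the sample identifiers are hypothetical.

```python
import hashlib
import hmac


def tokenize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated for readability; a shorter token raises the chance of collisions.
    return digest.hexdigest()[:16]


# Example key only. A real key must be generated securely and stored
# separately from the shared data.
key = b"example-key-do-not-use"

token_visit_1 = tokenize("MRN-0012345", key)
token_visit_2 = tokenize("MRN-0012345", key)
token_other = tokenize("MRN-0067890", key)
```

Because the same identifier and key always give the same token, records for one patient can still be linked across time or datasets. Without the key, a token cannot be recomputed from a known identifier.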

Together, these tools form the foundation of most de-identification strategies—removing identifiers, adding uncertainty, and preserving utility where possible.

In Canada, the Personal Information Protection and Electronic Documents Act (PIPEDA) outlines legal requirements for de-identifying medical data. In the U.S., the HIPAA De-identification Standard outlines similar rules. These frameworks define how data must be treated. While PIPEDA outlines the legal requirements, the Office of the Privacy Commissioner of Canada provides guidance on how to adequately de-identify data. Some provinces also have their own regulations, and many other resources are available to learn more about de-identification regulations.

To explore resources, templates, and tools for data de-identification, visit the ARCHIMEDES platform and learn how to get started.