Name: Medical AI Research Foundations: A repository of medical foundation models
Published: April 25, 2023
License: https://physionet.org/about/duas/medical-ai-foundations/

Model Credentialed Access

Shekoofeh Azizi, Jan Freyberg, Laura Culp, Patricia MacWilliams, Sara Mahdavi, Vivek Natarajan, Alan Karthikesalingam

Published: April 25, 2023. Version: 1.0.0

When using this resource, please cite:
Azizi, S., Freyberg, J., Culp, L., MacWilliams, P., Mahdavi, S., Natarajan, V., & Karthikesalingam, A. (2023). Medical AI Research Foundations: A repository of medical foundation models (version 1.0.0). PhysioNet. https://doi.org/10.13026/grp0-z205.

MLA	Azizi, Shekoofeh, et al. "Medical AI Research Foundations: A repository of medical foundation models" (version 1.0.0). PhysioNet (2023), https://doi.org/10.13026/grp0-z205.
Chicago	Azizi, Shekoofeh, Freyberg, Jan, Culp, Laura, MacWilliams, Patricia, Mahdavi, Sara, Natarajan, Vivek, and Alan Karthikesalingam. "Medical AI Research Foundations: A repository of medical foundation models" (version 1.0.0). PhysioNet (2023). https://doi.org/10.13026/grp0-z205.
Harvard	Azizi, S., Freyberg, J., Culp, L., MacWilliams, P., Mahdavi, S., Natarajan, V., and Karthikesalingam, A. (2023) 'Medical AI Research Foundations: A repository of medical foundation models' (version 1.0.0), PhysioNet. Available at: https://doi.org/10.13026/grp0-z205.
Vancouver	Azizi S, Freyberg J, Culp L, MacWilliams P, Mahdavi S, Natarajan V, Karthikesalingam A. Medical AI Research Foundations: A repository of medical foundation models (version 1.0.0). PhysioNet. 2023. Available from: https://doi.org/10.13026/grp0-z205.

Additionally, please cite the original publication:

Azizi, S., Culp, L., Freyberg, J., Mustafa, B., Baur, S., Kornblith, S., ... & Natarajan, V. (2022). Robust and efficient medical imaging with self-supervision. arXiv preprint: https://arxiv.org/abs/2205.09723

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

MLA	Goldberger, A., et al. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
CHICAGO	Goldberger, A., L. Amaral, L. Glass, J. Hausdorff, P. C. Ivanov, R. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley. "PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220." (2000).
HARVARD	Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P.C., Mark, R., Mietus, J.E., Moody, G.B., Peng, C.K. and Stanley, H.E., 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.
VANCOUVER	Goldberger A, Amaral L, Glass L, Hausdorff J, Ivanov PC, Mark R, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

Medical AI Research Foundations is a repository of open-source medical foundation models. With this collection of non-diagnostic models, APIs, and resources such as code and data, researchers and developers can accelerate their medical AI research. This addresses a clear unmet need: there is currently no central resource that developers and researchers can leverage to build medical AI, and this has slowed down both research and translation efforts. Our goal is to democratize access to foundational medical AI models and to help researchers and medical AI developers rapidly build new solutions. To this end, we have open-sourced the REMEDIS codebase and are currently hosting REMEDIS models for chest X-ray and pathology. We expect to add more models and resources for training medical foundation models, such as datasets and benchmarks, in the future. We also welcome contributions from the medical AI research community.

Background

Despite recent progress in the field of medical artificial intelligence (AI), most existing models are narrow, single-task systems that require large quantities of labeled data to train. Moreover, these models cannot be easily reused in new clinical contexts, as they often require the collection, de-identification (i.e., the process of anonymizing data) and annotation of site-specific data for every new deployment environment, which is both laborious and expensive. This problem of data-efficient generalization (a model's ability to generalize to new settings using minimal new data) continues to be a key translational challenge for medical AI and has, in turn, prevented the broad uptake of such models in real-world healthcare settings.

The emergence of foundation AI models offers a significant opportunity to rethink the development of medical AI and make it more performant, safer, and more equitable. These models are trained on data at scale, often using self-supervised learning. This process results in generalist models that can be rapidly adapted to new tasks and environments with little need for supervised data. With foundation models, it may be possible to deploy models safely and efficiently across various clinical contexts and environments.

In REMEDIS [1], we introduce a unified large-scale self-supervised learning framework for building foundation medical imaging models. This strategy combines large-scale supervised transfer learning with self-supervised learning and requires minimal task-specific customization. REMEDIS shows significant improvement in data-efficient generalization across medical imaging tasks and modalities, with a 3-100x reduction in the site-specific data needed to adapt models to new clinical contexts and environments. Building on this, we are also pleased to announce Medical AI Research Foundations, a collection of open-source non-diagnostic models (starting with REMEDIS models), APIs and resources to help researchers and developers accelerate medical AI research.

Please refer to our paper for experimental results [1]. Overall, the models we trained greatly reduce the need for labeled training data and can thus serve as strong foundations for researchers building medical imaging models for chest X-ray and pathology. We look forward to feedback from the community to help improve our models. In particular, we hope the release of these models will spur research into open questions such as the impact of large-scale supervision on fairness, bias, safety and privacy.

Model Description

Overview

All models are convolutional neural networks from the ResNet family, pre-trained with Big Transfer (BiT) representation learning and then contrastively trained with SimCLR self-supervision.

We provide ResNet 50x1 and ResNet 152x2 models for both modalities (chest X-ray and pathology). The models were pretrained at a resolution of 224x224 using TensorFlow and are available as TF Hub weights. The suffixes -m and -s refer to models pretrained using BiT-M and BiT-S, respectively, as the starting point [2].

Files

There are multiple models provided. Each model file has the following format: {DATA_TYPE}-{ARCHITECTURE}-remedis-{PRETRAINING_DATA_SIZE}.

DATA_TYPE: either cxr (for Chest X-Ray) or path (for Pathology)
ARCHITECTURE: either 50x1 (ResNet 50x1) or 152x2 (ResNet 152x2)
PRETRAINING_DATA_SIZE: either s or m, indicating whether BiT-S or BiT-M was used as the starting point

The models are provided as TensorFlow 2 saved models, and are compatible with versions of TF above 2.11.
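Because the file names follow a fixed pattern, they can be parsed mechanically, for example to pick the right input size or to record which BiT variant a result came from. The Python sketch below is illustrative only: `parse_model_name` and `build_model_name` are hypothetical helpers, not part of the release. It maps the suffixes s/m to BiT-S/BiT-M as described above and accepts any "<depth>x<width>" architecture string rather than a fixed list.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    data_type: str     # "cxr" or "path"
    architecture: str  # e.g. "50x1"
    bit_variant: str   # "BiT-S" or "BiT-M"


# {DATA_TYPE}-{ARCHITECTURE}-remedis-{PRETRAINING_DATA_SIZE}
# The architecture is matched as "<depth>x<width>", not from a fixed list.
_NAME_RE = re.compile(r"^(cxr|path)-(\d+x\d+)-remedis-([sm])$")


def parse_model_name(name: str) -> ModelName:
    """Split a model file name into its documented components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised model name: {name!r}")
    data_type, architecture, size = match.groups()
    bit_variant = "BiT-S" if size == "s" else "BiT-M"
    return ModelName(data_type, architecture, bit_variant)


def build_model_name(data_type: str, architecture: str, size: str) -> str:
    """Inverse of parse_model_name; validates the result by parsing it."""
    name = f"{data_type}-{architecture}-remedis-{size}"
    parse_model_name(name)  # raises ValueError if any component is invalid
    return name
```

For example, `parse_model_name("cxr-50x1-remedis-m")` returns `ModelName(data_type='cxr', architecture='50x1', bit_variant='BiT-M')`.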

Example Usage

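A model in the TensorFlow 2 SavedModel format [3] can be loaded in Python 3 as follows:

import tensorflow as tf
import tensorflow_hub as hub

# Path to the downloaded model directory (the folder containing saved_model.pb).
module = hub.load('TOP_LEVEL_HUB_PATH')

# Pathology: the image batch has shape (BATCH_SIZE, 224, 224, 3)
# Chest X-Ray: the image batch has shape (BATCH_SIZE, 448, 448, 3)
# Placeholder input: replace with a batch of real, preprocessed images
# (float32 is assumed here; see the code on GitHub [6] for preprocessing).
image = tf.zeros((1, 224, 224, 3), dtype=tf.float32)

embedding_of_image = module(image)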

For further information on loading models with TensorFlow Hub, see [4].
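Real images first have to be stacked into a batch of the shapes given in the comments of the example above. The NumPy sketch below shows one way to do that; `prepare_batch` is a hypothetical helper, and its assumptions are ours, not the release's: images are already resized, 2-D grayscale images (typical for chest X-rays) are turned into three channels by repeating the single channel, and no intensity scaling is applied.

```python
import numpy as np

# Input sizes taken from the shape comments in the usage example.
INPUT_SIZE = {"path": 224, "cxr": 448}


def prepare_batch(images, data_type):
    """Stack images into a float32 array of shape (BATCH_SIZE, S, S, 3).

    Assumptions (not from the release): every image is already S x S,
    grayscale (H, W) images are repeated to 3 channels, and no intensity
    scaling is done here.
    """
    size = INPUT_SIZE[data_type]
    batch = []
    for img in images:
        arr = np.asarray(img, dtype=np.float32)
        if arr.ndim == 2:  # grayscale (H, W) -> (H, W, 3)
            arr = np.repeat(arr[..., np.newaxis], 3, axis=-1)
        if arr.shape != (size, size, 3):
            raise ValueError(f"expected ({size}, {size}, 3), got {arr.shape}")
        batch.append(arr)
    return np.stack(batch, axis=0)
```

The result can then be passed to the loaded module, e.g. `module(tf.convert_to_tensor(prepare_batch(images, "cxr")))`.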

TensorFlow SavedModel Format

The SavedModel format includes several files. The format is described in the TensorFlow documentation [3]; we summarize it here.

The assets subfolder is empty in our case. It can contain files such as a vocabulary file for a language model, but the models here have no such need.
fingerprint.pb is a fingerprint of the model: several 64-bit hashes that uniquely identify the contents of the SavedModel.
saved_model.pb is the SavedModel protocol buffer, which includes the graph definition.
The variables subfolder contains a standard training checkpoint: variables.data-00000-of-00001 (a single shard holding the variable values) and a variables.index file. These can be used to load variables into a matching graph structure (see the TensorFlow documentation on checkpoints [5]).
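Before loading a downloaded model, it can help to check that the directory has the layout summarized above. The standard-library sketch below is tailored to these models only; `missing_savedmodel_entries` is a hypothetical helper, and other SavedModels may differ (for example, variables can be split across several shards, which changes the data file names).

```python
from pathlib import Path

# Entries of the SavedModel directory as summarized above.
EXPECTED_ENTRIES = [
    "assets",
    "fingerprint.pb",
    "saved_model.pb",
    "variables/variables.index",
    "variables/variables.data-00000-of-00001",
]


def missing_savedmodel_entries(model_dir):
    """Return the expected files/folders that are absent from model_dir."""
    root = Path(model_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

An empty list means the directory matches the expected layout and can be passed to hub.load.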

Technical Implementation

Model Overview

Overall, our approach comprises the following steps:

Supervised representation learning on a large-scale dataset of labeled natural images
Self-supervised contrastive representation learning on an unlabeled dataset of in-distribution medical images
Supervised fine-tuning on labeled in-distribution medical images

We open-source the models that result from step 2. These models provide strong starting points for researchers developing diagnostic machine learning models. The models use ResNet as the backbone architecture.

A brief description of the pretraining procedure follows. For a full account, please read our paper [1], which provides detailed descriptions of the preprocessing, pretraining, fine-tuning and hyperparameters for each of the tasks and models. Code examples for fine-tuning are available on GitHub [6].

1. Supervised representation learning

We begin with ResNet backbone models initialized with weights from Big Transfer (BiT) [2] pretrained models. In addition to the model architecture, BiT models vary by pretraining dataset: BiT-S, BiT-M and BiT-L, where S(mall), M(edium) and L(arge) indicate whether pretraining was done on ILSVRC-2012 (ImageNet-1K), ImageNet-21K or JFT, respectively. We open-source models based on BiT-M only.

2. Self-supervised contrastive representation learning on an unlabeled dataset of in-distribution medical images

For contrastive pretraining, we build on SimCLR [7], a simple framework for contrastive learning of visual representations. We performed a disjoint hyperparameter tuning procedure to select the factors influencing the quality of the learned representation, which we measured by model performance on the downstream tasks using the validation set of the in-distribution data.
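SimCLR [7] trains the encoder with a normalized temperature-scaled cross-entropy (NT-Xent) loss: two augmented views of the same image form a positive pair, and every other view in the batch acts as a negative. The NumPy sketch below computes that loss for one batch of already-projected views. It is an illustration, not the REMEDIS training code: it omits the encoder, projection head, augmentations and gradients, and the temperature of 0.5 is an illustrative default rather than the value used in [1].

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for two batches of projected views.

    z1, z2: arrays of shape (N, D); row i of z1 and row i of z2 are two
    augmented views of the same image (the positive pair). All other rows
    of the combined 2N batch act as negatives.
    """
    n = len(z1)
    z = np.concatenate([z1, z2], axis=0).astype(np.float64)
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # unit norm -> cosine similarity
    sim = z @ z.T / temperature                     # (2N, 2N) logits
    np.fill_diagonal(sim, -np.inf)                  # a view is not its own negative
    # Row i's positive is its partner view: i + N for the first half, i - N after.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), positives] - log_denom
    return float(-log_prob_pos.mean())
```

With orthogonal embeddings whose two views coincide, each row's loss is log(1 + 2·exp(-1/τ)); pairing the wrong views raises the loss, which is the signal that pulls matching views together.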

In our default contrastive pretraining setting, we used random cropping (C), random color distortion (D), rotation (R), and random Gaussian blur (G) as the data augmentation strategy. Because radiology images are grayscale, we opted for stronger data augmentation for these images to reduce the chance of overfitting. We further improved the final performance by incorporating histogram equalization and elastic deformation in addition to our default data augmentation strategy.
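Three of the named augmentations can be sketched in a few lines of NumPy: random cropping, rotation and histogram equalization. The helpers below are hypothetical and simplified. Rotation is restricted here to random multiples of 90 degrees (an assumption of ours), color distortion, Gaussian blur and elastic deformation are omitted, and none of the parameters reflect the REMEDIS settings described in [1].

```python
import numpy as np


def random_crop(img, crop_size, rng):
    """Cut a random crop_size x crop_size window from an (H, W, ...) image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return img[top:top + crop_size, left:left + crop_size]


def random_rot90(img, rng):
    """Rotate by a random multiple of 90 degrees (a simplified rotation)."""
    return np.rot90(img, k=int(rng.integers(0, 4)), axes=(0, 1))


def histogram_equalize(img):
    """Histogram-equalize a 2-D uint8 grayscale image using its CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest occurring intensity
    total = img.size
    if total == cdf_min:           # constant image: nothing to equalize
        return img.copy()
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                # map every pixel through the lookup table
```

Equalization stretches the occupied intensity range to the full 0 to 255 range, which is one way to add contrast variation to grayscale radiographs.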

Training Data

We open-source only models trained on public medical data; these are available for chest X-ray and pathology. The data used for each model are as follows:

Chest X-Ray
- MIMIC-CXR-JPG [8, 9]: a large, publicly available dataset of chest radiographs in JPG format. It is wholly derived from MIMIC-CXR [10], with the JPG files derived from the DICOM images and the structured labels derived from the free-text reports.
- CheXpert [11, 12]: a large open-source dataset of 224,316 de-identified CXRs from 65,240 unique patients. We specifically use the five most prevalent pathologies, as specified by [13]: atelectasis, consolidation, pulmonary edema, pleural effusion, and cardiomegaly.
Pathology
- The Cancer Genome Atlas (TCGA) [14]: we use a random sample of 50M patches from 10,705 cases (29,018 slides) spanning 32 “studies” (cancer types) from TCGA.
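For CheXpert, the five pathologies above define a multi-label target: each image gets one 0/1 entry per condition. The sketch below shows a hypothetical `multi_hot` encoder. The task order is arbitrary, the names follow the wording above (the dataset's own column names may differ), and the handling of uncertain labels is not covered here: only findings the caller passes in count as positive.

```python
# The five CheXpert pathologies listed above, in a fixed (arbitrary) order.
CHEXPERT_TASKS = (
    "Atelectasis",
    "Consolidation",
    "Pulmonary Edema",
    "Pleural Effusion",
    "Cardiomegaly",
)


def multi_hot(positive_findings):
    """Encode positive findings as a 0/1 list aligned with CHEXPERT_TASKS.

    Matching is case-insensitive; findings outside the five tasks are
    ignored, and anything not passed in (including uncertain labels) is 0.
    """
    positives = {finding.lower() for finding in positive_findings}
    return [1 if task.lower() in positives else 0 for task in CHEXPERT_TASKS]
```

For example, `multi_hot(["cardiomegaly", "Atelectasis"])` gives `[1, 0, 0, 0, 1]`.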

Installation and Requirements

TensorFlow

Using the models requires TensorFlow above version 2.11. To install TensorFlow in your Python environment, please see the TensorFlow documentation [15].

Inference Requirements

These models can be used as fixed embedding models that produce image representations on which other models are then trained. Running inference alone requires no specialized hardware: load the model as shown in Example Usage and call it on a batch of images.
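When embedding a whole dataset on modest hardware, it is usually enough to feed the images in small batches so that memory stays bounded. The sketch below is a hypothetical `embed_in_batches` helper that works with any callable mapping a (B, H, W, 3) array to (B, D) embeddings; wrapping the loaded module so it returns a NumPy array (for example `lambda b: module(b).numpy()`) assumes the module returns a single tensor.

```python
import numpy as np


def embed_in_batches(model_fn, images, batch_size=32):
    """Run a fixed embedding model over images in mini-batches.

    model_fn: callable mapping a (B, H, W, 3) float32 array to (B, D)
        embeddings (e.g. the loaded module wrapped to return NumPy).
    images: indexable of N images with shape (N, H, W, 3).
    Returns an (N, D) array of embeddings.
    """
    outputs = []
    for start in range(0, len(images), batch_size):
        batch = np.asarray(images[start:start + batch_size], dtype=np.float32)
        outputs.append(np.asarray(model_fn(batch)))
    return np.concatenate(outputs, axis=0)
```

The resulting embeddings can be stored once and reused to train downstream models without running the backbone again.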

Training or Fine-Tuning Requirements

These models can be used for full end-to-end fine-tuning on radiology or pathology data. Although fine-tuning could be done on any hardware, it will be slow without an accelerator. On some hardware, even loading the data may be slow or impossible (the patch_camelyon dataset provided in TensorFlow Datasets is 7.48 GiB in size, with ~330 thousand images). A GPU or TPU is therefore recommended for fine-tuning.
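A quick calculation shows why such data has to be streamed rather than held in memory. The sketch below estimates the size of the ~330 thousand patch_camelyon images once decoded; resizing them to the 224x224 pathology input and storing them as float32 are assumptions made for illustration.

```python
def decoded_bytes(num_images, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold num_images decoded images in memory at once."""
    return num_images * height * width * channels * bytes_per_value


GIB = 1024 ** 3

# ~330 thousand images, assumed resized to 224x224x3 and stored as float32.
total = decoded_bytes(330_000, 224, 224)
print(f"{total / GIB:.1f} GiB")
```

Under these assumptions the decoded set needs roughly 185 GiB, far more than the 7.48 GiB download, so batched input pipelines and accelerator memory matter for fine-tuning.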

Usage Notes

We believe these models are best used either for full end-to-end fine-tuning on radiology or pathology data, or as fixed embedding models that produce image representations on which other models are then trained.
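When the models are used as fixed embedding models, the downstream model can be as simple as a linear classifier trained on the precomputed embeddings. The NumPy sketch below fits a hypothetical binary logistic-regression probe with full-batch gradient descent. It is one common choice, not the evaluation protocol of [1]; in practice a library implementation with regularization and a held-out validation set would typically be preferable.

```python
import numpy as np


def train_linear_probe(embeddings, labels, lr=0.1, steps=500):
    """Fit a binary logistic-regression probe on frozen embeddings.

    embeddings: (N, D) array; labels: (N,) array of 0/1.
    Uses plain full-batch gradient descent and returns (weights, bias).
    """
    x = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
        grad = p - y                             # d(log loss) / d(logit)
        w -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(w, b, embeddings):
    """Return 0/1 predictions from the fitted probe."""
    return (np.asarray(embeddings) @ w + b > 0).astype(int)
```

Only w and b are trained, so the backbone stays frozen and the probe trains quickly even on a CPU.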

For examples of fine-tuning on CheXpert and Camelyon, please see the included notebook, or an updated notebook and our code on GitHub [6].

Fine-Tuning Suggestions

These models could be fine-tuned on many datasets. We suggest radiology data for the chest X-ray models and pathology data for the pathology models. Many datasets require approval before use. Three such datasets, which were used in REMEDIS [1], are as follows:

ChestX-ray14: an expansion of ChestX-ray8 [16, 17], collected at the National Institutes of Health Clinical Center, MD, USA
Camelyon-16 [18, 19]: breast lymph node slides from the CAMELYON16 challenge
Camelyon-17: we used lymph node slides from 5,161 stage II and III colorectal cancer cases (36,520 slides) collected between 1984 and 2007 from the Institute of Pathology and the BioBank at the Medical University of Graz [20]

Unseen Environments

While we have attempted to rigorously evaluate our models on diverse tasks and settings, they may still fail when encountering data from unseen environments. Further, the impact of large-scale self-supervised learning on fairness and safety is an open research topic. We hope the release of these models will spur further research here.

Release Notes

Version 1.0.0: First public release. The release accompanies our paper on Robust and efficient medical imaging with self-supervision [1].

Ethics

The models provided are trained on publicly available clinical datasets; no new data were collected. We share only embedding models, which can be used to generate a representation of a given medical image but do not produce any diagnostic outputs (e.g., classifications). The requirement for IRB approval was waived by Advarra IRB Services because the research focuses on retrospective, de-identified data.

Data used for pretraining the chest X-ray condition classification models, including MIMIC-CXR and CheXpert, are publicly available. Data used for in-distribution fine-tuning of pathology metastasis detection are publicly available on the CAMELYON challenge website. For the natural-image datasets, ImageNet-1K (ILSVRC2012) was used for pretraining the baseline supervised models and ImageNet-21K for pretraining the BiT-M models. Both are publicly available on the ImageNet website.

Acknowledgements

This work involved extensive collaborative efforts from a multidisciplinary team of researchers, software engineers, clinicians, and cross-functional contributors across Google Health AI and Google Brain. We thank our co-authors: Laura Culp, Jan Freyberg, Basil Mustafa, Sebastien Baur, Simon Kornblith, Ting Chen, Patricia MacWilliams, S. Sara Mahdavi, Ellery Wulczyn, Boris Babenko, Megan Wilson, Aaron Loh, Po-Hsuan Cameron Chen, Yuan Liu, Pinal Bavishi, Scott Mayer McKinney, Jim Winkens, Abhijit Guha Roy, Zach Beaver, Fiona Ryan, Justin Krogue, Mozziyar Etemadi, Umesh Telang, Yun Liu, Lily Peng, Greg S. Corrado, Dale R. Webster, David Fleet, Geoffrey Hinton, Neil Houlsby, Alan Karthikesalingam, Mohammad Norouzi and Vivek Natarajan.

We would like to thank Zoubin Ghahramani for his valuable feedback and continuous support through the course of the project. We thank Maithra Raghu, Nenad Tomašev, Jonathan Krause, Douglas Eck, and Michael Howell for their valuable feedback in improving the quality of the work. Additionally, we would like to thank Jakob Uszkoreit, Jon Deaton, Varun Godbole, Marcin Sieniek, Shruthi Prabhakara, Daniel Golden, Dave Steiner, Xiaohua Zhai, Andrei Giurgiu, Tom Duerig, Christopher Semturs, Peggy Bui, Jay Hartford, Sunny Jansen, Shravya Shetty, Terry Spitz, Dustin Tran, Jieying Luo, Olga Wichrowska, and Abbi Ward for their support throughout this project. We also thank our partners for access to the datasets used in the research.

Conflicts of Interest

This study was funded by Google LLC and/or a subsidiary thereof (‘Google’). J.F., L.C., S.A., V.N., N.H., A.K., M.N., B.M., S.B., P.M., S.S.M., S.K., T.C., B.B., P.B., E.W., C.C., Yuan Liu, Yun Liu, S.M., A.L., J.W., M.W., Z.B., A.G.R., U.T., D.W., D.F., L.P., G.C., J.K., and G.H. are employees of Google and may own stock as part of the standard compensation package. M.E. received funding from Google to support the research collaboration.

References

[1] Azizi, S., Culp, L., Freyberg, J., Mustafa, B., Baur, S., Kornblith, S., ... & Natarajan, V. (2022). Robust and efficient medical imaging with self-supervision. arXiv preprint: https://arxiv.org/abs/2205.09723
[2] Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., & Houlsby, N. (2020). Big Transfer (BiT): General visual representation learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16 (pp. 491-507). Springer International Publishing.
[3] TensorFlow documentation on saved models. https://www.tensorflow.org/guide/saved_model [Accessed: 25 April 2023]
[4] TensorFlow Hub documentation. https://www.tensorflow.org/hub/api_docs/python/hub/Module [Accessed: 25 April 2023]
[5] TensorFlow documentation on model checkpoints. https://www.tensorflow.org/guide/checkpoint#loading_mechanics [Accessed: 25 April 2023]
[6] Medical AI Research Foundations code on GitHub. https://github.com/google-research/medical-ai-research-foundations [Accessed: 25 April 2023]
[7] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020, November). A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning (pp. 1597-1607). PMLR.
[8] Johnson, A., Lungren, M., Peng, Y., Lu, Z., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR-JPG - chest radiographs with structured labels (version 2.0.0). PhysioNet. https://doi.org/10.13026/8360-t248.
[9] Johnson, A. E., Pollard, T. J., Berkowitz, S. J., Greenbaum, N. R., Lungren, M. P., Deng, C.-y., Mark, R. G. & Horng, S. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data 6, 1–8 (2019).
[10] Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng, S. (2019). MIMIC-CXR Database (version 2.0.0). PhysioNet. https://doi.org/10.13026/C2JT1Q.
[11] CheXpert, a large dataset of chest X-rays. https://stanfordmlgroup.github.io/competitions/chexpert/ [Accessed: 25 April 2023]
[12] Irvin, Jeremy, et al. "CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019.
[13] Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., et al. CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison in Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019), 590–597.
[14] The Cancer Genome Atlas Program (TCGA) website. https://www.cancer.gov/ccg/research/genome-sequencing/tcga [Accessed: 25 April 2023]
[15] TensorFlow documentation. https://www.tensorflow.org/ [Accessed: 25 April 2023]
[16] Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M. & Summers, R. M. ChestX-ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), 2097–2106.
[17] ChestX-ray14 Dataset from the National Institutes of Health - Clinical Center. https://nihcc.app.box.com/v/ChestXray-NIHCC
[18] Bejnordi, B. E., Veta, M., Van Diest, P. J., Van Ginneken, B., Karssemeijer, N., Litjens, G., Van Der Laak, J. A., Hermsen, M., Manson, Q. F., Balkenhol, M., et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
[19] Litjens, G., Bandi, P., Ehteshami Bejnordi, B., Geessink, O., Balkenhol, M., Bult, P., ... & van der Laak, J. (2018). 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience, 7(6), giy065.
[20] Wulczyn, E., Steiner, D., Moran, M., Plass, M., Reihs, R., Tan, F., Flament-Auvigne, I., Brown, T., Regitnig, P., Chen, P.-H. C., Hegde, N., Sadhwani, A., MacDonald, R., Ayalew, B., Corrado, G. S., Peng, L. H., Tse, D., Müller, H., Xu, Z., Liu, Y., Stumpe, M. C., Zatloukal, K. & Mermel, C. H. Interpretable survival prediction for colorectal cancer using deep learning. npj Digital Medicine (2021).