Use of MIMIC Data with Large Language Models and Online Services

Sept. 24, 2025

We have received inquiries about the use of credentialed and restricted data on PhysioNet, including MIMIC-III, MIMIC-IV, MIMIC-CXR, and their derivatives, with large language models (LLMs) and online services. The PhysioNet Credentialed Data Use Agreement explicitly prohibits sharing access to the data with third parties, including sending it through APIs or using it on online platforms.

Key Requirements:

  • Zero Data Retention: MIMIC data must not be stored or retained by third-party LLM services.

  • User Responsibility: Researchers are responsible for ensuring compliance with the Data Use Agreement.

Recommendations:

  • Strongly Recommended: Use locally deployed LLMs to maintain full control over the data.

  • If Using Cloud Services or APIs: Verify that the service’s settings ensure zero data retention, no use of data for model training, and no human review. Many services retain data by default. Even when a service claims "zero data retention," its safeguards may be insufficient because of internal processing, logging, or caching practices. Review platform policies regularly, as they may change without notice. If a service’s data handling practices are unclear or cannot be fully verified, do not use the service.
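
The cloud-service checks above could be encoded as a simple pre-flight gate in a research pipeline. The following is a minimal sketch only: the names ServicePolicy, unmet_conditions, and may_send_data are hypothetical and not part of any PhysioNet tooling. The values must come from the researcher's own review of the service's current terms, and passing this check does not by itself establish compliance with the Data Use Agreement.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class ServicePolicy:
    """Researcher's findings about one external service (hypothetical structure).

    Each field is True only if the condition was positively verified.
    None means "unclear or could not be verified".
    """
    zero_data_retention: Optional[bool]
    no_training_on_data: Optional[bool]
    no_human_review: Optional[bool]


def unmet_conditions(policy: ServicePolicy) -> List[str]:
    """Return the names of conditions that are not positively verified."""
    return [f.name for f in fields(policy) if getattr(policy, f.name) is not True]


def may_send_data(policy: ServicePolicy) -> bool:
    """Fail closed: False or None on any condition blocks use of the service."""
    return not unmet_conditions(policy)


if __name__ == "__main__":
    # Example: training use could not be verified, so the service is rejected.
    candidate = ServicePolicy(
        zero_data_retention=True,
        no_training_on_data=None,
        no_human_review=True,
    )
    if not may_send_data(candidate):
        print("Do not use this service; unverified:", unmet_conditions(candidate))
```

Treating an unverified condition the same as a failed one mirrors the guidance that a service with unclear data handling practices should not be used. Because platform policies may change without notice, the recorded findings would need to be re-checked periodically rather than set once.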

Important Disclaimer: PhysioNet cannot verify the data practices of external services and does not endorse or recommend specific platforms.