Sharing on PhysioNet

We invite you to share your resources with the research community. Incentives to share include:

Improving the discoverability of your work through increased citations.
Creating opportunities for collaboration.
Facilitating reuse of your data and software, maximizing impact.
Encouraging higher-quality, transparent research.

Please read the author guidelines below carefully before submitting your work.

Submission Overview

Submitting data and software to PhysioNet is similar to submitting an article to a journal. Briefly, the process for submitting your work for publication is as follows:

If you are the submitting author, create a new project using the project management system.
- We recommend validating datasets prior to upload using the PhysioNet "preflight" validation tool. Install with pip install physionet and run the validator from the command line with physionet validate PATH_TO_DATA. Please report bugs or suggestions for improvement at: https://github.com/MIT-LCP/physionet/issues
- We also recommend generating Croissant metadata for datasets prior to submission using the Croissant Baker tool. Install with pip install croissant-baker and generate metadata from the command line with croissant-baker --input PATH_TO_DATASET.
Add your project details. These will include contact information for co-authors, descriptive text, and data files.
Once you are satisfied, submit the project to the editorial team for review.
You will be notified of an editorial decision when the review process is complete.
Once any issues have been addressed, your data or software will be published on PhysioNet.
When a project is published, its content is fixed and cannot be changed.

Before submitting your project for review, please make sure that you have:

Added all co-authors to the system. Co-authors will receive a notification message upon submission and must approve of the submission prior to publication.
Read the Author Guidelines section for a detailed description on how to generate project files and related metadata.
Selected an appropriate license for releasing your data files.

In addition, please ensure:

Your project content and files are clearly described.
Details of the data collection process are provided in the Methods section. Supporting information and diagrams should be provided for less well-known data capture devices.
Relevant publications are included in the Reference section and are cited in the main text, in the style [1], [3-4]. While we encourage you to link to a project website in the "Discovery" metadata, the documentation on PhysioNet must be self-contained and sufficient for others to understand and use your resource.
If this is an updated version of a previously published resource, please clearly specify what the major changes are from the previous version.

Editorial process

Once your project is submitted, it will be reviewed by one or more content specialists. Based on this review, you will receive an initial editorial decision within four weeks of submission. The possible decisions are:

Accept: The editor is mostly satisfied with the content. The submission continues into the copyedit stage.
Resubmit: The content is suitable for publication, but changes are required to make it more clear or reusable. Changes may include adding required information, restructuring files, and rewording content. Once the submitting author has made these changes, they may resubmit the project for review.
Reject: The content is not suitable for publication.

Author Guidelines

Choosing a project type

When submitting a project to PhysioNet you will be asked to select one of the following project types:

Database: Research data with significant potential for reuse by the research community. This may include data that enables published studies to be reproduced, data for benchmarking algorithms, and data that supports novel investigations.
Software: Software that has been developed for research applications.
Challenge: Description of a challenge for the research community. Files such as datasets and software may be included as part of the challenge.
Model: An implementation of a statistical or machine learning model with potential for reuse by the research community. Typically models will be created by a training process and may have dependencies on specific computational frameworks.

Creating the project metadata

To help the community to reuse your shared resources, we require a detailed description. The information that you provide should focus on the resource and how it might be reused. During the submission process you are asked to provide information such as a title, an abstract for distribution to search indexes, and context describing the manner in which the resource was created. Further details are outlined below:

Title: Your title should be no longer than 200 characters. Avoid acronyms and abbreviations where possible. Also avoid leading with "The". Only letters, numbers, spaces, underscores, and hyphens are allowed.
- If your dataset is derived from MIMIC and you would like to use the MIMIC acronym, please include the letters "Ext" (for example, "MIMIC-IV-Ext-YOUR-DATASET"). "Ext" may indicate either "extracted" (e.g. a derived subset) or "extended" (e.g. annotations), depending on your use case.
- If the dataset is derived from another dataset, the title must make this clear.
Abstract: Your abstract must be no longer than 250 words. The focus should be on the resource being shared. If the resource was generated as part of a scientific investigation, relevant information may be provided to facilitate reuse. References should not be included. The abstract should also include a high-level description of the data as well as an overview of the key aims of the project. The abstract may appear in search indexes independently of the full project metadata, so providing detailed information about the content is important.
Background: Your background should provide the reader with an introduction to the resource. The section should offer context in which the resource was created and outline your motivations for sharing.
Methods & Technical Implementation: The "Methods" and "Technical Implementation" sections provide details of the procedures used to create your resource including, but not limited to, how the data was collected, any measurement devices, etc. For software, the section may cover aspects such as development process, software design, and description of algorithms. For data, the section may include details such as experimental design, data acquisition, and data processing.
Content description: Your content (data, software, model) description should describe the resource in detail, outlining how files are structured, file formats, and a description of what the files contain. We also suggest including summary statistics where appropriate (e.g. total number of distinct patients, number of files, types of signals, over what time span was the data collected, etc.).
Usage notes: This section should provide the reader with information relevant to reuse. Why is this data useful for the community?
- In particular we suggest discussing: (1) how the data has already been used (citing relevant papers); (2) the reuse potential of the dataset; (3) known limitations that users should be aware of when using the resource; and (4) any complementary code or datasets that might be of interest to the user community.
Ethics: Please provide a statement on the ethics of your work. Think about the project impact and briefly highlight both benefits and risks. Please also add relevant institutional review details here, for example:
- Data collected from human subjects: Please provide a statement that the study protocol was approved by relevant Institutional Review Boards (IRBs) or ethics committees. If human participants gave written informed consent, then please state this.
- Clinical trial data: Please specify trial registration number and registry name.
- Data collected from animals: Please specify the animal care guidelines used in collecting your data. See, for example, this project and this official NIH manual.
Acknowledgments: In this section, acknowledge the people who helped with the research, but who were not included as co-authors. In addition, provide funding information.
Conflicts of interest: A statement on potential conflicts of interest is required. If the authors have no conflicts of interest, the section should say "The author(s) have no conflicts of interest to declare".
Version: The version number of the resource. Semantic versioning is encouraged (major version, minor version, patch version). If unsure, put "1.0.0".
References: Please use the Vancouver reference style. All citations should be numbered sequentially in the text in square brackets. For example, the first citation [1], the second citation [2], and the third and fourth citations [3,4]. Entries in the reference list should be in the following style: 1. Xu YZ, Geng DC, Mao HQ, Zhu XS, Yang HL (2010). "A comparison of the proximal femoral nail antirotation device and dynamic hip screw in the treatment of unstable pertrochanteric fracture". J Int Med Res. 38: 1266–1275. PMID 20925999.
Weblinks: Please do not include URLs/weblinks/hyperlinks in the main text. All external resources (including websites, publications, datasets, and GitHub repositories) should be added to the References section and cited in the main text in the style [1], [2-4].
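Several of the metadata rules above are mechanical and can be checked before submission. The following is a minimal sketch, not an official PhysioNet tool: the function name check_metadata is hypothetical, "letters" is assumed to mean ASCII letters, and the version check reflects the encouraged (not required) semantic versioning format.

```python
import re

TITLE_MAX_CHARS = 200
ABSTRACT_MAX_WORDS = 250
# Letters, numbers, spaces, underscores, and hyphens only.
# Assumption: "letters" means ASCII letters.
TITLE_ALLOWED = re.compile(r"[A-Za-z0-9 _-]+")
# Three dot-separated non-negative integers, e.g. "1.0.0".
SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")
URL = re.compile(r"https?://\S+")


def check_metadata(title, abstract, version, main_text=""):
    """Return a list of problems found (empty if none)."""
    problems = []
    if len(title) > TITLE_MAX_CHARS:
        problems.append("title is longer than 200 characters")
    if not TITLE_ALLOWED.fullmatch(title):
        problems.append("title contains disallowed characters")
    if title.lower().startswith("the "):
        problems.append('title starts with "The"')
    if len(abstract.split()) > ABSTRACT_MAX_WORDS:
        problems.append("abstract is longer than 250 words")
    if not SEMVER.fullmatch(version):
        problems.append("version is not MAJOR.MINOR.PATCH (encouraged)")
    if URL.search(main_text):
        problems.append("main text contains a URL; cite it instead")
    return problems
```

An empty list means none of these particular checks failed; it does not replace editorial review of the content itself.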

Preparing your project files

PhysioNet publishes content such as data and software for reuse by the research community. We typically do not review and publish content that reports on scientific findings. Scientific findings should be published elsewhere (for example, in a journal or conference). Our goals are to ensure that the content is safe to share and that it is sufficiently well structured and described for it to be a valuable resource for the research community. When submitting a project, you will be asked to upload relevant data and software files. Please review the following guidelines when preparing your files for submission:

All projects:
- README file: A README file should be included alongside the files. At minimum, the readme should include a title and a brief description of the package content.
- Protected Health Information (PHI): All protected health information must be removed. All dates (except year), including data collection dates, must be date-shifted or removed. The comprehensive guide for de-identifying data can be seen here.
- File naming: All files should be clearly named and must not include spaces (use underscores instead for readability) or special characters (e.g. "/", "\", "."). Filenames should generally be lowercase (exceptions are "special files", such as the RECORDS and ANNOTATORS files used for waveforms). Keep filenames brief (e.g. 1001abp.dat is better than subject_1001_ABP_wave_100Hz.dat).
- File types: All files must be in an open, machine-readable format. Files in proprietary formats, such as MATLAB files, Excel spreadsheets, or Microsoft Word documents, are not supported and must be converted to an open format. For example, MATLAB data, Excel spreadsheets, or Microsoft Word documents can be converted to CSV format. Some suggested formats for data based on its usage can be seen here.
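The file naming rules can be sketched as a small check. This is an illustrative helper, not part of any PhysioNet tool: the function name filename_problems is hypothetical, and the pattern assumes that lowercase letters, digits, underscores, and hyphens followed by dot-separated extensions (as in 1001abp.dat) are acceptable.

```python
import re
from pathlib import PurePosixPath

# Special files named in the guidelines, exempt from the lowercase rule.
SPECIAL_FILES = {"RECORDS", "ANNOTATORS"}
# Assumption: lowercase letters, digits, underscores, and hyphens,
# followed by optional dot-separated extensions, form a safe name.
SAFE_NAME = re.compile(r"[a-z0-9_-]+(\.[a-z0-9]+)*")


def filename_problems(path):
    """Return a list of naming problems for one file path."""
    name = PurePosixPath(path).name
    if name in SPECIAL_FILES:
        return []
    problems = []
    if " " in name:
        problems.append("contains a space (use an underscore instead)")
    if name != name.lower():
        problems.append("contains uppercase letters")
    # Test for special characters after normalizing case and spaces,
    # so each problem is reported only once.
    if not SAFE_NAME.fullmatch(name.lower().replace(" ", "_")):
        problems.append("contains special characters")
    return problems
```

Because the guidelines say filenames should "generally" be lowercase, the uppercase result is best treated as a warning rather than a hard failure.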
Data (general):
- Small datasets: Comma-Separated Value (CSV) is a good format for small datasets. CSVs should be formatted according to the RFC 4180 specification.
- Tidy data: Information needed for reuse of the data must be provided. In most cases, tabulated datasets should be structured following the principles of "tidy data". For example, each variable should be in a column and each observation (or case) in a row.
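As a sketch of these two points, the example below reshapes a hypothetical "wide" table (one column per visit) into tidy form and writes it with Python's csv module, whose default quoting rules, combined with CRLF line endings, follow RFC 4180. The column names and values are invented for illustration.

```python
import csv
import io

# Hypothetical "wide" table: one row per subject, one column per visit.
wide_rows = [
    {"subject_id": "s01", "hr_visit1": 72, "hr_visit2": 75},
    {"subject_id": "s02", "hr_visit1": 64, "hr_visit2": 66},
]
PREFIX = "hr_visit"


def to_tidy(rows):
    """Reshape so that each row holds exactly one observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(PREFIX):
                tidy.append({
                    "subject_id": row["subject_id"],
                    "visit": int(column[len(PREFIX):]),
                    "heart_rate": value,
                })
    return tidy


def to_csv_text(rows, fieldnames):
    """Serialize rows as CSV with CRLF line endings and minimal quoting."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        lineterminator="\r\n",
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(
    to_tidy(wide_rows), ["subject_id", "visit", "heart_rate"]
)
```

In the tidy result, each variable (subject, visit, heart rate) has its own column, so adding a third visit adds rows rather than a new column.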
Data (waveforms):
- WFDB compatibility: In general, high time-resolution data such as ECG and EEG recordings should be stored in a WFDB compatible (or other open-source) format. Details such as gain and baseline should be included in the file headers. For detailed guidance on creating MIT format signal files, see this tutorial, and for EDF format, see this tutorial.
- Build an index file (RECORDS) of your waveform records: Provide a file named RECORDS at the top-level directory of your submission. The RECORDS file must list every waveform record in your contribution, one per row: each row is the name of a WFDB record (without the .hea or .dat extension) or of an EDF file (with the .edf extension). Example files can be seen here and here. (Note that for EDF files, you need to specify the .edf file extension as part of your file name as seen here.)
- Additional subject data: Information about the subjects can be included either at the bottom of the signal header files or in a separate text file. Preferred information includes: age, gender, height, weight, medications, and diagnoses. If relevant: gestational age.
- Visualize and check your waveforms using LightWAVE: If you upload the RECORDS file, you will see a link to view your project's waveforms in LightWAVE. Note that the RECORDS file must be at the top-level directory in your submission, and must be named exactly RECORDS for LightWAVE to locate the file. If your WFDB (or EDF) files are organized in a sub-directory in your project, the relative path of the file location must be specified in the RECORDS file. For example, if you have a WFDB file named "subject1_ecg.dat" under a subdirectory "ECG", then a row in the RECORDS file should read: "ECG/subject1_ecg".
- Check for valid signal types (WFDB format): One way to check WFDB-formatted files for valid signal types is the wfdbcheck auxiliary function in the WFDB software package. This program looks for errors that have previously been encountered in signal and annotation files. Because the check is not comprehensive, please report any other errors you find, along with a description of any new signal type you would like added to the signal type dictionary.
- Signal and waveform channel naming conventions: We use standardized signal names and units for all waveform records for consistency across databases. Please name your signals using the names in the PhysioNet standard signal name list. Details of the format are at the top of wfdbcal, and also in wfdbcal(5). For signal names, upper case should be used where it improves readability (e.g. ABP_Sys is better than abp_sys).
  - If your signals are not already in our standard signal name list, please specify the following information so that PhysioNet tools like LightWAVE can display signals with a reasonable plotting scale by default.
  - Please define the vertical scale for each signal not already defined in wfdbcal. Specifically, for any signals of types not listed in wfdbcal, please supply additional one-line entries to be added to that calibration file, in a plain text file named "CALIBRATION". If you are not sure how to construct such a line, let us know the typical ranges (in physical units) for each of these signal types.
- Annotations or event locations: Annotations and event locations should be provided as WFDB annotation files. You can use the MATLAB Toolbox or the Python Toolbox to create annotations. See the command wrann.
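The RECORDS file described above can be generated rather than written by hand. The sketch below uses only the Python standard library and rests on one assumption: every WFDB record has a header file ending in .hea, so listing the headers lists the records. Multi-segment records or other layouts may need extra handling, and the function names are hypothetical.

```python
from pathlib import Path


def build_records(project_dir):
    """Return RECORDS entries for all .hea and .edf files, sorted."""
    root = Path(project_dir)
    entries = set()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if path.suffix == ".hea":
            # WFDB records are listed without the .hea/.dat extension.
            entries.add(relative.with_suffix("").as_posix())
        elif path.suffix == ".edf":
            # EDF files are listed with their .edf extension.
            entries.add(relative.as_posix())
    return sorted(entries)


def write_records(project_dir):
    """Write the top-level RECORDS file and return its entries."""
    lines = build_records(project_dir)
    # The file must be named exactly RECORDS and sit at the top level.
    target = Path(project_dir) / "RECORDS"
    target.write_text("".join(line + "\n" for line in lines))
    return lines
```

For a project containing ECG/subject1_ecg.hea and ECG/subject1_ecg.dat, the generated file would include the row ECG/subject1_ecg, matching the relative-path convention that LightWAVE expects.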
Software:
- Instructions for installation and usage should be clearly documented.
- Dependencies should be indicated in a requirements file or similar.
- Unit tests should be used to demonstrate correct functioning of major features of the software.
- Standard style guidelines should be followed where appropriate (for example, PEP8 for Python).
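As an illustration of the unit testing point, a test for a major feature can be very small. The function below is a hypothetical example (it is not taken from any PhysioNet package), tested with Python's built-in unittest module.

```python
import unittest


def mean_heart_rate(rr_intervals_s):
    """Mean heart rate in beats per minute.

    rr_intervals_s: RR intervals in seconds (hypothetical example input).
    """
    if not rr_intervals_s:
        raise ValueError("at least one RR interval is required")
    return 60.0 * len(rr_intervals_s) / sum(rr_intervals_s)


class MeanHeartRateTest(unittest.TestCase):
    def test_constant_rhythm(self):
        # One beat per second is 60 beats per minute.
        self.assertAlmostEqual(mean_heart_rate([1.0, 1.0, 1.0]), 60.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_heart_rate([])
```

Tests like these would typically live in their own file and be run with python -m unittest, so that reviewers and users can confirm the software behaves as documented.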
Model/Software:
- We now require that most machine learning based projects be published in a reputable, peer-reviewed journal or conference before they can be considered for full review and publication on PhysioNet. The associated publication should provide a clear description of the methodology, rigorous validation, and appropriate comparisons with relevant baselines.
- When sharing model weights or other model related files, authors must use safe, non-executable file formats. For example, safetensors is preferred for neural network model weights. PhysioNet does not accept .pkl/pickle files for model sharing because they can execute arbitrary code when loaded and therefore pose a security risk.
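The security concern with pickle files can be demonstrated with a harmless sketch. The class name Payload is invented for illustration; the point is that the author of a pickle file, not the person loading it, decides which callable runs during loading.

```python
import pickle


class Payload:
    """Stand-in for a malicious object embedded in a .pkl file."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # A real attack would name a harmful callable instead.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading evaluates the expression chosen by whoever created the file
# and returns its result instead of a Payload instance.
result = pickle.loads(untrusted_bytes)
```

Formats such as safetensors avoid this class of problem because they store tensor data and metadata only, with no mechanism for naming code to run at load time.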

Licenses

When sharing data and software, it is important to be clear about how you intend the content to be reused. To maximize the reuse potential of the content, we encourage permissive, open licenses. Currently, authors submitting content to PhysioNet can select from the following licenses.

Database

Open Data Commons Attribution License v1.0
Creative Commons Attribution 4.0 International Public License
Creative Commons Attribution-ShareAlike 4.0 International Public License
Open Data Commons Open Database License v1.0
Creative Commons Zero 1.0 Universal Public Domain Dedication
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
PhysioNet Restricted Health Data License 1.5.0
PhysioNet Credentialed Health Data License 1.5.0

Software

GNU General Public License version 3
MIT License
The 3-Clause BSD License

Challenge

Open Data Commons Attribution License v1.0
Creative Commons Attribution 4.0 International Public License
Creative Commons Attribution-ShareAlike 4.0 International Public License
GNU General Public License version 3
Open Data Commons Open Database License v1.0
Creative Commons Zero 1.0 Universal Public Domain Dedication
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
MIT License
The 3-Clause BSD License
PhysioNet Restricted Health Data License 1.5.0
PhysioNet Credentialed Health Data License 1.5.0

Model

GNU General Public License version 3
MIT License
The 3-Clause BSD License
PhysioNet Restricted Health Data License 1.5.0
PhysioNet Credentialed Health Data License 1.5.0

Submit your Project

Click here to create your project.
