The MIMIC II Waveform Database

Mirror users: Please use the PhysioNet master server as needed if files of interest within this database are inaccessible from a mirror.

This is version 3.2 of the MIMIC II Waveform Database (August 2011, updated August 2017; 67,830 records). For new studies, we recommend using the MIMIC-III Waveform Database instead. This database will remain available to support ongoing work.

This database is described in

M. Saeed, M. Villarroel, A.T. Reisner, G. Clifford, L. Lehman, G.B. Moody, T. Heldt, T.H. Kyaw, B.E. Moody, R.G. Mark. Multiparameter intelligent monitoring in intensive care II (MIMIC-II): A public-access ICU database. Critical Care Medicine 39(5):952-960 (2011 May); doi: 10.1097/CCM.0b013e31820a92c6.

Please cite this publication when referencing this material, and also include the standard citation for PhysioNet:

Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCh, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 [Circulation Electronic Pages; http://circ.ahajournals.org/cgi/content/full/101/23/e215]; 2000 (June 13).

The MIMIC II Waveform Database contains thousands of recordings of multiple physiologic signals ("waveforms") and time series of vital signs ("numerics") collected from bedside patient monitors in adult and neonatal intensive care units (ICUs). It is a companion to the MIMIC II Clinical Database, which contains detailed clinical information for many of the patients represented in the Waveform Database. The MIMIC II Waveform Database Matched Subset contains 4,897 waveform records and 5,266 numerics records from the MIMIC II Waveform Database, which have been matched and time-aligned with 2,809 MIMIC II Clinical Database records. See the MIMIC II home page for information about the project in which these databases were created.

Recorded waveforms and numerics vary depending on choices made by the ICU staff. Waveforms almost always include one or more ECG signals, and often include continuous arterial blood pressure (ABP) waveforms, fingertip photoplethysmogram (PPG) signals, and respiration, with additional waveforms (up to 8 simultaneously) as available. Numerics typically include heart and respiration rates, SpO2, and systolic, mean, and diastolic blood pressure, together with others as available. Recording lengths also vary; most are a few days in duration, but some are shorter and others are several weeks long.

Use the PhysioBank ATM to view any desired record in this database, to export it in a variety of formats, or to perform a variety of other operations on it.

The MIMIC II Waveform Database does not currently include annotations. There is, however, a PhysioNetWorks community project that is developing a set of reference annotations for the MIMIC II Waveform Database. You are invited to join this project if you wish to contribute and share annotations of this database with the community.

What's New (Changes from version 2):

Version 3 of the MIMIC II Waveform Database is about 7 TB in all, more than nine times the size of version 2. Using enhanced data recovery techniques, the original raw data dumps have been reprocessed to extract many records that were previously unreadable, and additional raw data dumps collected since the previous release were also processed to obtain many more records.

Records in versions 1 and 2 that contained gaps of an hour or more have been split into multiple records that do not contain such gaps. (See Gaps and patient identification below.)

Versions 1 and 2 contained surrogate dates (since real dates are PHI that cannot be shared freely). As MIMIC II Waveform Database records were matched with MIMIC II Clinical Database records, the surrogate dates were changed to match those in version 2.5 of the MIMIC II Clinical Database. To avoid confusion associated with changing the surrogate dates, the public release of version 3 contains no dates.

In versions 1 and 2, record names were 5-digit numbers with a prefix of a or n. In version 3, the records are identified by 7-digit numbers (3000000 through 3999999; most numbers in this range are unassigned). The assigned names have been chosen randomly, so that there is no relationship between the record names and the dates of the original recordings.

As of March 2012, the MIMIC II Waveform Database includes 23180 records (see RECORDS for a list of record names). These include 17468 records of adult ICU patients (RECORDS-adults) and 5712 of neonates (RECORDS-neonates). We estimate that the database currently contains records from roughly 13500 distinct patients, or about 40% of the patients represented in the MIMIC II Clinical Database.

Version 2 of the MIMIC II Waveform Database is still available here, for those who are using it in ongoing studies. New studies should make use of the most recent version, which includes all of the 4164 previously released records and many more collected before, during, and after the interval when the version 1 and 2 records were obtained.

Version History

Version   Released      Records
                        New      Revised   Total
3.0       August 2011                      21,422
3.1       March 2012    1,758              23,180
3.2       August 2017   44,650   51*       67,830

In version 3.2, significant errors were found in 51 previously-published records, which have now been updated. These errors included incorrect record starting times as well as corrupted signal files.

Furthermore, the layout headers for all waveform records have been revised in version 3.2, to reduce loss of precision and avoid incorrect clipping or wrapping when reading the record using WFDB, or converting it into other formats. The underlying signal data have not been altered.

Organization of the Database

Each recording comprises two records (a waveform record and a matching numerics record) in a single record directory ("folder") with the name of the record. To reduce access time, the record directories have been distributed among ten intermediate-level directories (listed below). The names of these intermediate directories (30, 31, ..., 39) match the first two digits of the record directories they contain.
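
The directory rule above can be sketched in a few lines of Python. This is only an illustration: the function name record_directory is hypothetical and is not part of the WFDB Software Package. It assumes version 3 record names (7-digit numbers from 3000000 through 3999999, optionally followed by n for a numerics record).

```python
import re

def record_directory(record_name: str) -> str:
    """Return the relative record directory, e.g. '31/3141595'."""
    # A numerics record such as '3141595n' lives in the same directory
    # as its waveform record, so drop the trailing 'n' if present.
    base = record_name[:-1] if record_name.endswith("n") else record_name
    # Version 3 names are 7-digit numbers from 3000000 through 3999999.
    if not re.fullmatch(r"3\d{6}", base):
        raise ValueError(f"not a version 3 record name: {record_name!r}")
    # The intermediate directory (30, 31, ..., 39) is the first two digits.
    return f"{base[:2]}/{base}"

print(record_directory("3141595"))   # 31/3141595
print(record_directory("3141595n"))  # 31/3141595
```

Names from versions 1 and 2 (5-digit numbers with a prefix of a or n) are rejected by this sketch, since they do not follow the version 3 layout.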

In almost all cases, the waveform records comprise multiple segments, each of which can be read as a separate record. Each segment contains an uninterrupted recording of a set of simultaneously observed signals, and the signal gains do not change at any time during the segment. Whenever the ICU staff changed the signals being monitored or adjusted the amplitude of a signal being monitored, this event was recorded in the raw data dump, and a new segment begins at that time.

Each composite waveform record includes a list of the segments that comprise it in its master header file. The list begins on the second line of the master header with a layout header file that specifies all of the signals that are observed in any segment belonging to the record. Each segment has its own header file and (except for the layout header) a matching (binary) signal (.dat) file. Occasionally, the monitor may be disconnected entirely for a short time; these intervals are recorded as gaps in the master header file, but there are no header or signal files corresponding to gaps.
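
As a rough illustration of this structure, the sketch below parses a small master header in Python. Several things here are assumptions rather than facts taken from this page: the field order of the first line (record name/segment count, number of signals, sampling frequency, total length) and the use of ~ as the name of a gap entry follow our reading of header(5) and should be checked against it; the record 3000001 and all of its segment lengths are invented for the example; and parse_master_header is a hypothetical helper, not a WFDB function.

```python
# Hypothetical master header for an invented record (not real data).
SAMPLE_HEADER = """\
3000001/5 3 125 7750
3000001_layout 0
3000001_0001 2500
~ 1250
3000001_0002 3000
3000001_0003 1000
# Location: nicu
"""

def parse_master_header(text):
    """Return (info, layout_name, entries) for a multi-segment master header."""
    # Ignore blank lines and '#' comment lines such as '# Location: nicu'.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    first = lines[0].split()
    name, n_segments = first[0].split("/")
    info = {
        "record": name,
        "n_segments": int(n_segments),
        "n_signals": int(first[1]),
        "fs": float(first[2]),
        "n_samples": int(first[3]),
    }
    layout, entries = None, []
    for ln in lines[1:]:
        seg_name, seg_len = ln.split()[:2]
        if layout is None and int(seg_len) == 0:
            # The layout header comes first and has no samples of its own.
            layout = seg_name
        else:
            # Assumption: gaps are listed under the name '~'.
            kind = "gap" if seg_name == "~" else "segment"
            entries.append((kind, seg_name, int(seg_len)))
    return info, layout, entries

info, layout, entries = parse_master_header(SAMPLE_HEADER)
print(layout)                              # 3000001_layout
print([e for e in entries if e[0] == "gap"])  # [('gap', '~', 1250)]
```

In this toy header the segment and gap lengths add up to the total length given on the first line, which is a useful consistency check when reading real headers.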

The numerics records (designated by the letter n appended to the record name) are not divided into segments, since the storage savings that would be achieved by doing so would be relatively little.

Physiologic waveform records in this database contain up to eight simultaneously recorded signals digitized at 125 Hz with 8-, 10-, or (occasionally) 12-bit resolution. Numerics records typically contain 10 or more time series of vital signs sampled once per second or once per minute.

An example will make this arrangement clear:

  • Intermediate directory 31 contains all records with names that begin with 31.
  • Record directory 3141595 is contained within intermediate directory 31.
  • All files associated with physiologic waveform record 3141595 and its companion numerics record 3141595n are contained within record directory 31/3141595.
    • The first line of the master header file for waveform record 3141595 (31/3141595/3141595.hea) indicates that the record is 242353557 sample intervals (about 22 days at 125 samples per second) in duration, and that it contains 427 segments and gaps. (See header(5) in the WFDB Applications Guide for details on the format of this text file.) The first segment is named 3141595_0001, and it is 2888500 sample intervals (6 hours, 25 minutes, and 8 seconds, at 125 samples per second) in duration. At the end of the master header file, a comment (# Location: nicu) specifies the ICU in which the recording was made (the neonatal ICU in this case).
    • The layout header file for this record (31/3141595/3141595_layout.hea) indicates that five ECG signals (I, II, III, AVR, and "V"), a respiration signal, and a PPG signal are available during portions of the record. (The five ECG signals are not all available simultaneously.)
    • The header file for the first segment of this record (31/3141595/3141595_0001.hea) shows that a PPG signal ("PLETH"), a respiration signal, and ECG leads II and AVR are available throughout this initial segment.
  • The matching numerics record is named 3141595n, and its header file (31/3141595/3141595n.hea) shows that it is 1938730 sample intervals (about 22 days at 1 sample per second) in duration, and that it contains heart rate (HR, from ECG, as well as PULSE, from one or more pulsatile signals), noninvasive blood pressure (raw as well as systolic, diastolic, and mean), respiration rate, and SpO2.
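
The durations quoted in this example follow from dividing a length in sample intervals by the sampling frequency. A minimal Python sketch of that arithmetic, using the lengths and frequencies given above (the helper name duration is ours):

```python
from datetime import timedelta

def duration(sample_intervals: int, fs: float) -> timedelta:
    """Convert a length in sample intervals to elapsed time at fs samples/s."""
    return timedelta(seconds=sample_intervals / fs)

ONE_DAY = timedelta(days=1)

waveform = duration(242353557, 125)   # whole waveform record
first_segment = duration(2888500, 125)  # segment 3141595_0001
numerics = duration(1938730, 1)       # numerics record 3141595n

print(round(waveform / ONE_DAY, 2))   # 22.44
print(first_segment)                  # 6:25:08
print(round(numerics / ONE_DAY, 2))   # 22.44
```

The waveform and numerics records come out at nearly the same length in days, as expected for two views of the same monitoring session.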

Any WFDB application can read any waveform record from this database directly from the PhysioNet web server (i.e., without downloading the record first) using a record name of the form mimic2wdb/3x/3xyyyyy/. Numerics records can be read using the longer form mimic2wdb/3x/3xyyyyy/3xyyyyyn (note that the final 3xyyyyy must be repeated and followed by n to specify the numerics record).

For example, if you have installed the WFDB Software Package, you can read the first 10 seconds of waveform record 3141595 using this rdsamp command:

rdsamp -r mimic2wdb/31/3141595/ -p -v -t 10

To read the first 10 seconds of the matching numerics record 3141595n, use this command instead:

rdsamp -r mimic2wdb/31/3141595/3141595n -p -v -t 10

Notice that the first command produces 1250 samples of each waveform (125 samples per second, for 10 seconds), but the second command produces only 10 samples of each vital sign (1 sample per second, for 10 seconds). See How to obtain PhysioBank data in text form for details about using rdsamp.
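
For scripted use, the two record-name forms and the expected sample counts can be generated programmatically. The Python sketch below only builds the command line shown above and does not run it; the helper names are hypothetical, and the options are exactly those used in the examples on this page (-r, -p, -v, -t).

```python
import shlex

def wfdb_record_name(record: str, numerics: bool = False) -> str:
    """Build the mimic2wdb/3x/3xyyyyy/ form, or the longer numerics form."""
    base = f"mimic2wdb/{record[:2]}/{record}/"
    # For numerics, the record name is repeated and followed by 'n'.
    return f"{base}{record}n" if numerics else base

def rdsamp_command(record: str, seconds: int, numerics: bool = False) -> list:
    """Return the rdsamp argument list for the first `seconds` of a record."""
    name = wfdb_record_name(record, numerics)
    return ["rdsamp", "-r", name, "-p", "-v", "-t", str(seconds)]

def expected_samples(fs: float, seconds: float) -> int:
    """Number of samples per signal in `seconds` at `fs` samples per second."""
    return int(fs * seconds)

print(shlex.join(rdsamp_command("3141595", 10)))
# rdsamp -r mimic2wdb/31/3141595/ -p -v -t 10
print(shlex.join(rdsamp_command("3141595", 10, numerics=True)))
# rdsamp -r mimic2wdb/31/3141595/3141595n -p -v -t 10
print(expected_samples(125, 10), expected_samples(1, 10))  # 1250 10
```

The argument list can be passed to subprocess.run on a machine where the WFDB Software Package is installed. Note that some numerics records are sampled once per minute rather than once per second, in which case the expected count changes accordingly.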

Clinical Correlates

The MIMIC II Clinical Database contains detailed clinical information about most of the subjects represented in the MIMIC II Waveform Database. Since the contents of each database were collected independently, in partially deidentified form, matching the clinical data with the waveform data is a non-trivial task, and only about 25% of MIMIC II Waveform Database records (see the MIMIC II Waveform Database Matched Subset) have been matched with MIMIC II Clinical Database records as of January 2012.

In these cases, the matches provide additional information about the subjects, including age, gender, and detailed clinical information collected during (and in some cases before and after) the periods that have been recorded in the Waveform Database records. MAP-CW is a (plain text) map linking the Clinical and Waveform Database records that have been matched to date; it includes age and gender information for almost all matched records. For additional clinical correlates, apply for access to the MIMIC II Clinical Database (a data use agreement is required).

Multiple recordings of a given patient, which may exist (for example) if that patient was admitted more than once to any of the study ICUs during the study period, do not have related MIMIC II Waveform Database record names; it will be necessary to use the MAP-CW file (described above) to discover any such cases.

Technical Limitations

Waveforms or numerics missing: Occasionally, technical limitations of the data acquisition system make it possible to create a physiologic waveform record but not a numerics record, or vice versa. In 534 of 23180 record sets (2.30%), the physiologic waveforms are unavailable, and in 1957 records (8.44%), the numerics are unavailable. In 20689 cases (89.25%), both waveforms and numerics are available.

A given signal may not be available throughout an entire record. Records in the MIMIC II Waveform Database vary in length; some are several weeks in duration. It is common for the physiologic signals to be interrupted or changed occasionally during recordings of such long duration. When using a viewer such as the PhysioBank ATM, all signals available at any time during a record are listed, although in most cases only a subset is visible at any given time.

Gaps and patient identification. The waveform and numerics records have been extracted from raw data dumps collected from the bedside monitors using a facility provided by the monitor manufacturer. The raw data dumps contain files of data collected from a single patient monitor during a single monitoring session (which may last days or weeks). Usually the monitoring session ends when the patient is discharged, so that the data in a single file come from a single patient. Occasionally, however, the monitor is not reset when the patient is discharged, and the session continues after a new patient has been admitted; in this case the raw data file contains data from two (or more) patients, with a gap (an interval during which no waveforms or numerics are recorded) that is typically an hour or more in duration. Such gaps may also appear if the monitor is temporarily disconnected (for example, for a laboratory test) and then reconnected to the same patient. Since the raw data files do not usually contain patient identifiers, it is not trivial to determine with certainty if the data before and after a gap were collected from the same patient.

Ideally, each MIMIC II Waveform Database record should contain data from only one patient. In versions 1 and 2 of the database, the task of identifying raw data files containing data from two or more patients was handled manually, and errors detected in version 1 were corrected in version 2. For version 3, all raw data files containing gaps of an hour or more have been split into separate records in order to decrease the likelihood that any record contains data from multiple patients. As a result, nearly half (48.4%) of the version 1 and 2 records have been split in version 3. An ongoing project is to examine the sets of records created this way, matching them with MIMIC II Clinical Database records when possible, to determine if and how they should be reassembled.
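
The splitting rule can be illustrated with a short Python sketch. This is not the procedure actually used to build version 3; it simply applies the stated threshold (gaps of an hour or more) to a list of (name, length) entries, with lengths in sample intervals at 125 samples per second. The entry names, the lengths, and the use of ~ to mark a gap are assumptions made for the example.

```python
FS = 125                # waveform sampling frequency (samples per second)
MIN_GAP = 3600 * FS     # one hour, in sample intervals

def trim_gaps(group):
    """Drop gap entries at the start and end of a group."""
    while group and group[0][0] == "~":
        group = group[1:]
    while group and group[-1][0] == "~":
        group = group[:-1]
    return group

def split_at_long_gaps(entries, min_gap=MIN_GAP):
    """Split (name, length) entries into records at gaps >= min_gap."""
    records, current = [], []
    for name, length in entries:
        if name == "~" and length >= min_gap:
            # A long gap closes the current record; the gap is discarded.
            current = trim_gaps(current)
            if current:
                records.append(current)
            current = []
        else:
            # Segments and short gaps stay in the current record.
            current.append((name, length))
    current = trim_gaps(current)
    if current:
        records.append(current)
    return records

# Invented session: a short gap (8 s) and a long gap (80 min).
session = [("seg_0001", 500000), ("~", 1000), ("seg_0002", 250000),
           ("~", 600000), ("seg_0003", 125000)]
for record in split_at_long_gaps(session):
    print([name for name, _ in record])
# ['seg_0001', '~', 'seg_0002']
# ['seg_0003']
```

Short gaps remain inside a record, which matches the description of gaps being listed in the master header file; only gaps at or above the threshold produce a new record.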

ECG limitations: The ECG signals in the waveform records were originally sampled with 12-bit precision at a high sampling rate, and were then scaled and decimated to 500 samples per second (per signal). The scaling reduced the effective amplitude resolution from 12 bits to 9 or 10 bits in typical cases, and as little as 7 bits in some cases. From each set of 4 consecutive decimated samples of the same ECG signal, one was recorded (chosen using a turning-point compressor, a technique sometimes called "peak-picking"). The result is an ECG signal sampled 125 times per second, but at intervals that vary between 2 and 14 ms (averaging 8 ms). Since the interval between any given pair of samples was not available to us, the reconstructions of the ECG signals assume uniform 8 ms intervals. These signals with reduced time and amplitude resolution, and sampling jitter introduced by the "peak-picking", were the only ECG signals that were possible to capture from the ICU monitors. Although ECGs reconstructed in this way can be readily interpreted visually, they are unsuitable as input for certain algorithms for ECG analysis, particularly those that are sensitive to frequency-domain features of the signal. Note that these limitations apply only to the ECG signals, not to the other signals, which were originally sampled at uniform 8 ms intervals (125 samples per second) and were not scaled prior to capture.
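
The 2 to 14 ms range quoted above can be checked by enumeration. The Python sketch below does not implement a turning-point compressor; it only assumes that one sample is kept from each group of 4 consecutive samples of the 500 Hz signal, and lists the spacing between the samples kept from two adjacent groups for every combination of positions. Weighting all positions equally when averaging is an assumption of the sketch; the long-run mean of 8 ms holds regardless, since exactly one sample is kept per 8 ms group.

```python
from itertools import product
from statistics import mean

DECIMATED_FS = 500               # samples per second before peak-picking
GROUP = 4                        # one sample is kept from each group of 4
STEP_MS = 1000 / DECIMATED_FS    # 2 ms between decimated samples

# i is the position kept in one group, j the position kept in the next;
# the two kept samples are (GROUP + j - i) decimated samples apart.
intervals = [(GROUP + j - i) * STEP_MS
             for i, j in product(range(GROUP), repeat=2)]

print(min(intervals), max(intervals), mean(intervals))  # 2.0 14.0 8.0
```

Because the reconstructed signals assume a uniform 8 ms spacing, the true position of any sample may differ from its nominal position by up to 6 ms in either direction, which is the sampling jitter described above.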

Name               Last modified      Size   Description
30/                2017-08-07 13:05   -      MIMIC II Waveform Database version 3, part 0
31/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 1
32/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 2
33/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 3
34/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 4
35/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 5
36/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 6
37/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 7
38/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 8
39/                2017-08-05 10:45   -      MIMIC II Waveform Database version 3, part 9
matched/           2017-08-05 11:12   -      MIMIC II Waveform Database Matched Subset, version 3.1
signal-quality/    2012-03-15 13:04   -
versions/          2017-08-05 10:42   -
MAP-CW             2011-08-23 12:54   60K    age, sex, clinical ID of matched waveform records
RECORDS-neonates   2017-08-04 18:11   99K    waveform records of neonates
RECORDS-adults     2017-08-04 18:11   695K   waveform records of adult subjects
RECORDS-waveforms  2017-08-04 18:24   779K   list of waveform record names
RECORDS            2017-08-04 17:28   795K   list of waveform record names
RECORDS-numerics   2017-08-04 18:24   1.2M   list of numerics record names

Updated Friday, 28 October 2016 at 16:58 EDT

PhysioNet is supported by the National Institute of General Medical Sciences (NIGMS) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number 2R01GM104987-09.