MIMIC II: Record Matching and Surrogate Dates

Because the MIMIC II Waveform and Clinical Databases were collected from different sources, it was not initially known which waveform and clinical records belong to the same patient. Record matching is the process used to identify these associations, and it is described below.

The MIMIC II Waveform Database Matched Subset contains all MIMIC II Waveform Database records that have been associated with MIMIC II Clinical Database records. The record matching process is ongoing, and more records may be added to the matched subset in the future.

Each patient admitted to the hospital wears a wristband containing his or her medical record number (MRN) in barcode form. These bands are scanned whenever patients are transferred between units in the hospital.

The source for MIMIC II Clinical Database records is the hospital-wide medical information system. Each record is tagged with a unique identifier (the ICU stay ID), and also includes the patient’s MRN (captured by scanning the wristband on admission to the ICU) and the patient’s name (provided by the hospital information system based on the MRN). The ICU stay ID and the patient’s name and MRN are highly reliable identifiers, since no manual transcription is used to record them.

The source for MIMIC II Waveform Database records also includes the patients' names and MRNs, as well as a unique identifier (the case ID, assigned automatically by the monitoring equipment for each monitoring session). The monitoring equipment does not include barcode readers, however, and it is not integrated with the hospital-wide information system, so the name and MRN fields are usually typed into the ICU’s central monitoring station when a patient is admitted. Because nurses and physicians working in the ICU use more reliable means of identifying their patients, errors and omissions in the manually entered name and MRN fields are not uncommon, and they may go unnoticed for lengthy periods.

Waveform and clinical records with matching patient names and MRNs and with overlapping dates can be tentatively matched, as can those with near matches (variant spellings of names, similar MRNs, etc.). For tentative matches, the hourly vital-signs measurements recorded by the ICU nurses in the clinical records are compared with the higher-resolution vital-signs data recorded by the monitoring equipment in the waveform records, and if they are sufficiently similar, the match is accepted.
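The tentative-matching step can be illustrated with a minimal sketch. It is not the procedure actually used to build the matched subset: the Record type, the similarity function, and the 0.8 thresholds are all hypothetical, and Python's difflib is used only as a stand-in for whatever near-match test was applied to names and MRNs. The sketch covers only the tentative step; the vital-signs comparison that decides whether a match is accepted is not shown.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher


@dataclass
class Record:
    """Hypothetical container for the identifying fields of one record."""
    record_id: str   # ICU stay ID (clinical) or case ID (waveform)
    name: str
    mrn: str
    start: datetime
    end: datetime


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def intervals_overlap(a: Record, b: Record) -> bool:
    """True if the two records cover at least one common instant."""
    return a.start <= b.end and b.start <= a.end


def is_tentative_match(clinical: Record, waveform: Record,
                       name_threshold: float = 0.8,
                       mrn_threshold: float = 0.8) -> bool:
    """Exact or near match on name and MRN, plus overlapping dates.

    Both thresholds are illustrative assumptions, not documented values.
    """
    if not intervals_overlap(clinical, waveform):
        return False
    return (similarity(clinical.name, waveform.name) >= name_threshold
            and similarity(clinical.mrn, waveform.mrn) >= mrn_threshold)
```

A pair that passes this test is only a candidate; as described above, it is accepted only if the nurse-recorded and monitor-recorded vital signs are sufficiently similar.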

Briefly, assessing trend similarity includes four stages:

The record matching process is biased against questionable matches, so that accepted matches are almost certainly correct. Despite the considerable care and effort used in this process, however, it is possible that some of the matches may be incorrect. For further details, please see the MIMIC II User’s Guide.

The identifiers used in the record matching process are protected health information (PHI), so they have been removed or replaced by surrogate identifiers in the MIMIC II Databases. This step has significant implications with respect to dates, discussed in the next section.


All real dates in medical records are protected health information, although intervals between dates (except for the ages of patients over 89) are not PHI. Since the time intervals between events in a patient’s record are very important elements of the MIMIC II Database, we have replaced real dates with surrogate dates in the MIMIC II Clinical Database, and in the matched records of the MIMIC II Waveform Database. Unmatched records in the MIMIC II Waveform Database are undated (although the time of day is preserved in these records), and the remainder of the discussion below does not apply to these records.

To avoid altering the time intervals between events, the surrogate dates are derived from the real dates by adding a random number of days, N, to the real dates. N has several significant properties:

The last of these properties of N has two significant implications for the use of the MIMIC II Databases. First, it is not possible from the surrogate dates to identify groups of patients who were in the ICU at the same time; and second, it is not possible from the surrogate dates to separate groups of patients who were in the ICU before or after a given date.
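The effect of date shifting can be illustrated with a short sketch. It assumes that N is a whole number of days, that the same N is applied to every date in one patient's records, and that N differs between patients; the offsets, the patient labels, and the dates below are invented for the example and say nothing about the values used in the databases.

```python
from datetime import datetime, timedelta


def to_surrogate(real: datetime, offset_days: int) -> datetime:
    """Shift a real timestamp by a whole number of days."""
    return real + timedelta(days=offset_days)


# Hypothetical per-patient offsets (assumption: constant within a patient).
offsets = {"patient_a": 31000, "patient_b": 42500}

# Invented events for patient A.
admit = datetime(2005, 3, 1, 8, 30)
discharge = datetime(2005, 3, 4, 17, 0)

s_admit = to_surrogate(admit, offsets["patient_a"])
s_discharge = to_surrogate(discharge, offsets["patient_a"])

# Intervals within one patient's record are unchanged, and so is the
# time of day, because the shift is a whole number of days.
assert s_discharge - s_admit == discharge - admit
assert s_admit.time() == admit.time()

# Patient B is admitted at the same real time as patient A, but the
# surrogate dates no longer coincide, so the overlap cannot be recovered.
s_admit_b = to_surrogate(admit, offsets["patient_b"])
assert s_admit_b != s_admit
```

For the same reason, comparing surrogate dates across patients (for example, to select admissions before or after some calendar date) does not give meaningful results; only intervals within a single patient's records should be used.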

Ages of patients who are more than 89 years old are also PHI (except that all such patients may be described as a single group aged "90 or more"). For any patient who was 90 or older at any time during his or her associated MIMIC II record, the surrogate birth date has been adjusted so that the patient’s age during the record appears to be approximately 200.
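When ages are computed from surrogate dates, the adjusted birth dates need to be handled explicitly so that they do not distort summary statistics. The sketch below shows one way to do this; the cutoff of 150 years and the function names are assumptions chosen for illustration, not part of the databases.

```python
from datetime import date
from typing import Union

# Hypothetical cutoff: any computed age at or above this value is taken
# to come from an adjusted surrogate birth date (real age 90 or older).
AGE_SENTINEL_CUTOFF = 150


def age_in_years(birth: date, at: date) -> int:
    """Whole years elapsed between a birth date and a later date."""
    years = at.year - birth.year
    if (at.month, at.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the final year
    return years


def reportable_age(birth: date, at: date) -> Union[int, str]:
    """Return the computed age, or the group label for adjusted records."""
    age = age_in_years(birth, at)
    return "90 or more" if age >= AGE_SENTINEL_CUTOFF else age
```

For example, reportable_age(date(2900, 1, 1), date(3100, 6, 1)) yields the group label, while a patient whose computed age is 64 is reported as 64.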

