Database Open Access

Haaglanden Medisch Centrum sleep staging database

Diego Alvarez-Estevez, Roselyne Rijsman

Published: March 18, 2022. Version: 1.1

When using this resource, please cite:
Alvarez-Estevez, D., & Rijsman, R. (2022). Haaglanden Medisch Centrum sleep staging database (version 1.1). PhysioNet.

Additionally, please cite the original publication:

Alvarez-Estevez D, Rijsman RM (2021) Inter-database validation of a deep learning approach for automatic sleep scoring. PLoS ONE 16(8): e0256111.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.


Abstract

A collection of 151 whole-night polysomnographic (PSG) sleep recordings (85 male, 66 female; mean age 53.9 ± 15.4 years) collected during 2018 at the Haaglanden Medisch Centrum (HMC, The Netherlands) sleep center. Patient recordings were randomly selected and represent a heterogeneous population referred for PSG examination in the context of different sleep disorders. The dataset contains electroencephalographic (EEG), electrooculographic (EOG), chin electromyographic (EMG), and electrocardiographic (ECG) activity, as well as event annotations corresponding to the scoring of sleep patterns (hypnogram) performed by sleep technicians at HMC. The dataset was collected as part of a study evaluating the generalization performance of an automatic sleep scoring algorithm across multiple heterogeneous datasets.


Background

Polysomnographic sleep recordings (PSGs) capture key biomedical signals of a patient in the context of sleep medicine studies and are a central tool for diagnosing many sleep disorders. Current guidelines for sleep scoring segment neurophysiological activity into discrete 30 s epochs. Each epoch is classified as one of five possible states according to signal activity: wakefulness, or stages N1, N2, N3, and R. For sleep staging, the neurophysiological activity of interest comprises different traces of electroencephalographic (EEG), electromyographic (EMG), and electrooculographic (EOG) activity [1].
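The epoch-based segmentation described above can be illustrated with a minimal sketch. The function name `split_into_epochs` is hypothetical, and dropping an incomplete trailing epoch is an illustrative choice rather than something the scoring guidelines or this database prescribe. The example uses 256 Hz, the sampling rate reported for this database.

```python
from typing import List, Sequence

EPOCH_SECONDS = 30  # epoch length used for sleep scoring


def split_into_epochs(signal: Sequence[float], fs: int,
                      epoch_seconds: int = EPOCH_SECONDS) -> List[Sequence[float]]:
    """Cut a 1-D signal into consecutive, non-overlapping epochs.

    Trailing samples that do not fill a whole epoch are dropped
    (an assumption made for this sketch).
    """
    samples_per_epoch = fs * epoch_seconds
    n_epochs = len(signal) // samples_per_epoch
    return [signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
            for i in range(n_epochs)]


# Example: 95 s of a dummy signal sampled at 256 Hz
fs = 256
dummy = [0.0] * (95 * fs)
epochs = split_into_epochs(dummy, fs)
print(len(epochs), len(epochs[0]))  # 3 7680
```

Each 30 s epoch at 256 Hz therefore holds 7680 samples per channel, and each epoch corresponds to one hypnogram label.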

The HMC sleep staging dataset was collected as part of a study evaluating the generalization performance of an automatic sleep scoring algorithm across multiple heterogeneous datasets. Developing sleep staging algorithms that generalize across databases is challenging because of the variability between datasets and the problems associated with sharing patient data. Detailed information about the motivation, the study design, the proposed computer algorithm, and its validation can be found in our associated paper [2].


Methods

The dataset includes a total of 151 PSG recordings gathered retrospectively from the sleep center database of the Haaglanden Medisch Centrum (HMC, The Hague, The Netherlands). Patient recordings were randomly selected from a heterogeneous group of patients who were referred for PSG examination in 2018 due to different sleep disorders. No additional selection criteria were applied because the study sought to assess performance and reliability on the most general and heterogeneous patient phenotype possible. The collection includes a mix of both in-hospital and ambulatory recordings. Ambulatory recordings were started at the hospital following setup and biological calibration controlled by expert sleep technicians.

The recordings were acquired in the course of common clinical practice. Patients were not subjected to any additional treatment or interventions outside of the standard clinical workflow. Data were fully anonymized to prevent identification of individual patients. Ethics approval for reuse of this dataset was granted by the Zuid-West Holland institutional review board (METC-19-065).

The PSG data consist of four EEG (F4/M1, C4/M1, O2/M1, and C3/M2), two EOG (E1/M2 and E2/M2), one bipolar chin EMG, and one ECG (single modified lead II) derivations. This montage meets the minimal recommended technical specifications for visual scoring of sleep stages according to version 2.4 of the AASM guidelines [1].

All signals were sampled at 256 Hz. Signals were recorded using SOMNOscreen PSG, PSG+, and EEG 10-20 recorders (SOMNOmedics, Germany) with Ag/AgCl electrodes. Raw PSG signals were then digitized and stored in EDF format [3]. No additional digital filtering was applied besides the default analog pre-filter settings. Specific LP/HP cut-off values are available per recording in the corresponding EDF pre-filter header; in every case they are no more restrictive than the AASM technical recommendations. Scoring of sleep stages was carried out manually by well-trained sleep technicians according to version 2.4 of the AASM guidelines [1].
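As an illustration of where the per-recording pre-filter settings live, the following sketch reads the signal labels and pre-filter strings directly from an EDF header using only the Python standard library. It relies on the fixed-width ASCII header layout defined by the EDF specification [3]. The function name `read_edf_prefilters` is hypothetical, and the file name in the usage comment is only an example following the naming scheme of this database.

```python
from typing import BinaryIO, List, Tuple


def read_edf_prefilters(f: BinaryIO) -> List[Tuple[str, str]]:
    """Return (signal label, prefiltering text) pairs from an EDF/EDF+ header."""
    fixed = f.read(256)  # fixed-size part of the header
    if len(fixed) < 256:
        raise ValueError("file too short to contain an EDF header")
    # The last 4 bytes of the fixed part hold the number of signals (ns).
    ns = int(fixed[252:256].decode("ascii").strip())

    def read_fields(width: int) -> List[str]:
        # Each per-signal field is stored as ns consecutive fixed-width strings.
        raw = f.read(width * ns)
        return [raw[i * width:(i + 1) * width].decode("ascii", errors="replace").strip()
                for i in range(ns)]

    labels = read_fields(16)   # signal labels
    read_fields(80)            # transducer type (skipped)
    for _ in range(5):         # physical dimension, physical min/max,
        read_fields(8)         # digital min/max (all skipped)
    prefilters = read_fields(80)
    return list(zip(labels, prefilters))


# Usage (path is an example; adjust to your local copy):
# with open("SN001.edf", "rb") as f:
#     for label, prefilter in read_edf_prefilters(f):
#         print(label, "->", prefilter)
```

The returned strings are reported as stored in the file; no attempt is made here to parse the cut-off values into numbers, since their exact textual form may vary between recordings.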

Data Description

Night recordings are identified by a sequence number SNXXX (e.g. SN001 identifies recording 001). Three different files are available for each recording using the following notation:

  • SNXXX.edf: Contains the PSG signals (e.g. EEG, EMG, EOG and ECG derivations as described above) in EDF format [3]
  • SNXXX_sleepscoring.edf: Contains the corresponding hypnogram annotations and lights-off/on markers* in EDF+ format [4]
  • SNXXX_sleepscoring.txt: Contains the corresponding hypnogram annotations and lights-off/on markers* in comma-separated text format.

*Hypnogram sleep stages and lights-off/on text markers are coded following the EDF+ standard texts and polarity rules [5].
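The naming scheme and annotation coding above can be sketched as follows. The helper names `record_files` and `stages_from_csv` are hypothetical. The "Sleep stage" prefix and the "Lights off"/"Lights on" markers follow the EDF+ standard texts [5], but the column name "Annotation" and the sample rows are assumptions made for illustration; check the header of an actual SNXXX_sleepscoring.txt file before relying on them.

```python
import csv
import io
from typing import Dict, List

STAGE_PREFIX = "Sleep stage "  # prefix used by the EDF+ standard texts


def record_files(number: int) -> Dict[str, str]:
    """Build the three file names that belong to one recording."""
    record = f"SN{number:03d}"
    return {
        "signals": f"{record}.edf",
        "scoring_edf": f"{record}_sleepscoring.edf",
        "scoring_txt": f"{record}_sleepscoring.txt",
    }


def stages_from_csv(text: str, column: str = "Annotation") -> List[str]:
    """Extract stage labels (W, N1, N2, N3, R) from comma-separated text.

    `column` is an assumed header name; non-stage rows such as
    lights-off/on markers are skipped.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    stages = []
    for row in reader:
        label = (row.get(column) or "").strip()
        if label.startswith(STAGE_PREFIX):
            stages.append(label[len(STAGE_PREFIX):])
    return stages


# Made-up sample content, not taken from the database
sample = (
    "Onset, Duration, Annotation\n"
    "0, 0, Lights off\n"
    "0, 30, Sleep stage W\n"
    "30, 30, Sleep stage N1\n"
    "60, 30, Sleep stage N2\n"
    "90, 0, Lights on\n"
)
print(record_files(1)["scoring_txt"])  # SN001_sleepscoring.txt
print(stages_from_csv(sample))         # ['W', 'N1', 'N2']
```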

Recording periods made available in the files were clipped to contain valid scoring intervals only, i.e. time between 'lights off' and 'lights on' markers. Aggregated data regarding age, gender, and general PSG descriptors are available in Subjects_info_aggregated.txt as plain text.

Usage Notes

The HMC sleep staging database may be of interest to researchers studying physiological changes related to the sleep process and the relationship to different sleep stages according to clinical sleep scoring standards [1]. In addition, the data may be used as a benchmark dataset for the development and validation of automatic sleep scoring algorithms. For reference, see our associated papers on automatic sleep scoring and addressing database variability [2,6].

Both EDF and EDF+ formats are open and can be viewed using free software such as:

  • Polyman (for MS-Windows only) [7]. An easy way to view these recordings using Polyman is to select File → Open → Template → Browse... to open HMCdatabase_quickcheck.xml and then select a target SNXXX.edf file.
  • EDFbrowser (for Linux, Mac OS X, and MS-Windows) [8]
  • LightWAVE platform-independent web application from PhysioNet [9]
  • WAVE and other applications for Linux, Mac OS X, and MS-Windows in the WFDB Software Package, also from PhysioNet [10]. Applications using WFDB library version 10.4.5 (February 2008) or later can read EDF files directly with no conversion required. Note, however, that the WFDB library does not decode annotations in EDF+ files. More recently, the WFDB-Python package also provides methods for viewing EDF/EDF+ formatted files [11].
  • In general, an updated list of EDF(+) compatible software is usually available on the EDF Plus website [12]

The automatic algorithm described in [2] used as input two EEG derivations (C4/M1, C3/M2), the chin EMG, and one horizontal EOG channel. Horizontal EOG (E1-E2) can be derived by subtracting the E2/M2 channel from the E1/M2 channel.
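Because both EOG channels share the M2 reference, the reference cancels in the subtraction: (E1 - M2) - (E2 - M2) = E1 - E2. A minimal sketch of this derivation on plain Python sequences follows; the function name `horizontal_eog` and the sample values are made up for illustration.

```python
from typing import List, Sequence


def horizontal_eog(e1_m2: Sequence[float], e2_m2: Sequence[float]) -> List[float]:
    """Derive E1-E2 sample by sample from the E1/M2 and E2/M2 channels."""
    if len(e1_m2) != len(e2_m2):
        raise ValueError("both channels must have the same number of samples")
    return [a - b for a, b in zip(e1_m2, e2_m2)]


# Made-up sample values in microvolts
print(horizontal_eog([10.0, 12.5, -3.0], [4.0, 12.5, 1.0]))  # [6.0, 0.0, -4.0]
```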

Release Notes

Version 1.0.0: initial release.

Version 1.0.1: reference information updated.

Version 1.1: recordings SN014, SN064, and SN135 were removed after it was detected that they contained erroneous (and unfixable) signal data. If you were using an earlier version of this database, consider excluding these recordings from your analyses.
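For users who still hold a copy of an earlier version, one simple way to apply this exclusion is to filter the list of record identifiers before any analysis. The sketch below is illustrative only; the function name `drop_removed` and the example identifier list are hypothetical.

```python
from typing import Iterable, List

# Recordings removed in version 1.1, as listed in the release notes
REMOVED_IN_V1_1 = {"SN014", "SN064", "SN135"}


def drop_removed(record_ids: Iterable[str]) -> List[str]:
    """Keep only the recordings that are still part of version 1.1."""
    return [r for r in record_ids if r not in REMOVED_IN_V1_1]


# Example with a short, made-up list of identifiers
ids = ["SN013", "SN014", "SN015", "SN064", "SN135", "SN136"]
print(drop_removed(ids))  # ['SN013', 'SN015', 'SN136']
```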


Ethics

The authors declare no ethics concerns.

Conflicts of Interest

The authors have no conflicts of interest to declare.


References

  1. Berry RB, Brooks R, Gamaldo CE, et al. for the American Academy of Sleep Medicine. The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications. Darien, IL: American Academy of Sleep Medicine; 2017. Version 2.4.
  2. Alvarez-Estevez D, Rijsman RM (2021) Inter-database validation of a deep learning approach for automatic sleep scoring. PLoS ONE 16(8): e0256111.
  3. EDF format specification:
  4. EDF+ format specification:
  5. EDF+ standard texts and polarity rules:
  6. Alvarez-Estevez, D., & Fernández-Varela, I. (2020). Addressing database variability in learning from medical data: An ensemble-based approach using convolutional neural networks and a case of study applied to automatic sleep scoring. Computers in Biology and Medicine, 119, 103697. doi:10.1016/j.compbiomed.2020.103697
  7. Polyman viewer:
  8. EDFbrowser:
  9. Lightwave viewer:
  10. F. Renna, J. H. Oliveira, and M. T. Coimbra, “Deep convolutional neural networks for heart sound segmentation,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 6, pp. 2435–2445, 2019. [Online]. Available:
  11. Xie, C., McCullum, L., Johnson, A., Pollard, T., Gow, B., & Moody, B. (2021). Waveform Database Software Package (WFDB) for Python (version 3.3.0). PhysioNet.
  12. List of EDF(+) compatible software:


Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution 4.0 International Public License

  • 1.0.0 - July 1, 2021
  • 1.0.1 - Sept. 27, 2021
  • 1.1 - March 18, 2022


Total uncompressed size: 15.7 GB.

Access the files

Name                           Size      Modified
HMCdatabase_quickcheck.xml     8.5 KB    2021-04-01
LICENSE.txt                    14.5 KB   2022-03-17
RECORDS                        3.1 KB    2022-02-18
SHA256SUMS.txt                 42.2 KB   2022-03-22
Subjects_info_aggregated.txt   163 B     2022-02-18