Database Open Access

PSG-IPA: A PolySomnoGraphic Inter-scorer Performance Assessment database

Diego Alvarez-Estevez

Published: Jan. 8, 2026. Version: 1.0.0


When using this resource, please cite:
Alvarez-Estevez, D. (2026). PSG-IPA: A PolySomnoGraphic Inter-scorer Performance Assessment database (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/esx0-nw71

Additionally, please cite the original publication:

Alvarez-Estevez, D., & Rijsman, R. M. (2022). Computer-assisted analysis of polysomnographic recordings improves inter-scorer associated agreement and scoring times. PLOS ONE, 17(9), e0275530.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

Abstract

This dataset, named PolySomnoGraphic Inter-scorer Performance Assessment (PSG-IPA) database, comprises 20 polysomnography (PSG) recordings and related multi-expert scoring annotations. It was utilized in a study to evaluate inter-scorer variability, specifically assessing reliability and scoring times associated with various PSG review tasks. The study also aimed to compare the impact of computer-assisted (semi-automatic) scoring versus unassisted visual analysis on these factors.

The dataset includes recordings of common raw PSG signal activity, such as electroencephalography (EEG), electrooculography (EOG), electromyography (EMG) of both the chin and lower limbs, electrocardiography (ECG), and various respiratory derivations. Each recording is accompanied by both manual and computer-assisted scorings, performed by 12 independent expert sleep technologists. This comprehensive resource offers significant value for further research into PSG inter-scorer variability, as well as for the development and performance assessment of automated sleep stage and event detection algorithms.


Background

While polysomnography (PSG) remains the gold standard for diagnosing sleep disorders, it relies on skilled sleep technologists to manually score vast amounts of physiological data. This process is inherently time-consuming and prone to inter-scorer variability, which can affect diagnostic accuracy and treatment decisions. Computer-assisted analysis tools have emerged as a potential solution to mitigate these challenges, but their impact on scoring efficiency and reliability needs thorough evaluation.

This dataset comprises a standardized set of 20 PSG recordings that contain raw signals together with sleep staging and event annotations from multiple independent expert scorers, obtained under both manual and computer-assisted scoring settings. The resulting data are meant to be a valuable resource that enables researchers to analyze inter-scorer effects, to assess the limits of human scoring, and to benchmark automated sleep analysis methods.


Methods

Study database

PSG data for this study were gathered by retrospective inspection of the Haaglanden Medisch Centrum (HMC, The Hague, The Netherlands) Sleep Center patient database. The pre-sample dataset comprised 2801 recordings, corresponding to the most recent full-year data in the HMC database at the time (2019). Using this pre-sample dataset as reference, 5 PSG recordings were independently selected for each of the following PSG scoring sub-tasks: (i) sleep staging, and the detection of (ii) EEG arousals, (iii) leg movements, and (iv) respiratory-related events, amounting to a total of 20 PSG recordings. For each group, recordings were selected by an automatic procedure designed to minimize the risk of selection bias and to ensure a balanced representation of scoring difficulty for each task. The procedure is described in more detail in [1]. Beyond that, no exclusion criteria were applied to filter out recordings because of specific patient conditions or poor signal quality. It was sufficient that the recording had been accepted for manual scoring during the regular clinical workflow.

All data were originally acquired in the course of common clinical practice. PSG data consisted of raw biomedical signals following the standard acquisition procedures described in the American Academy of Sleep Medicine (AASM) guidelines [2]. SOMNOscreen™ plus devices (SOMNOmedics, Germany) were used as the acquisition hardware. Scoring procedures were carried out by HMC expert sleep technicians following the AASM protocols for the analysis of sleep stages, EEG arousals, and the detection of respiratory-related events [2]. The scoring of leg movements followed the methods of the 2016 World Association of Sleep Medicine (WASM2016) manual [3]. Both the raw signal data and the resulting clinical scoring annotations were digitally stored using the free and open European Data Format plus (EDF+) [4].

Rescoring task

A group of 12 expert scorers were asked to review each of the 5 PSGs involved in each of the targeted scoring sub-tasks. All scorers were experienced sleep technicians from the same center (HMC) who hold a completed training certification and who regularly and autonomously participate in the daily scoring routine of the sleep department. Sleep technicians who had not completed their training or who were still under supervision were excluded from this study.

Scorings were repeated, separately for each task, first using purely manual visual analysis and then using computer-assisted semi-automatic scoring, resulting in a total of 40 different scoring exercises per scorer (5 PSGs per task, 4 tasks, 2 scoring settings). Each participating scorer was tasked with reviewing the exact same recordings. In all cases, scoring was performed blind to both the patient identity and the results of any previous scorings (e.g. those performed during the regular clinical workflow, by other scorers, or during a previous self-rescoring subtask). Scorers were thus not informed that manual and semi-automatic scorings would involve the exact same recordings. To avoid learning effects, at least 4 months of separation were scheduled between the manual and the semi-automatic scoring moments. For reference, each sleep technician scores an average of 70 PSG recordings as part of normal sleep lab activity during such a period.
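The count of scoring exercises given above can be reproduced with a short sketch. The task identifiers are taken from the file naming schema of this dataset; the setting labels are illustrative names chosen for this example only.

```python
from itertools import product

# Task identifiers as used in the dataset file names.
TASKS = ("SleepStages", "EEGarousals", "Respiration", "LimbMovements")
# Illustrative labels for the two expert scoring settings (assumption).
SETTINGS = ("manual", "semi-automatic")
RECORDINGS_PER_TASK = 5
N_SCORERS = 12


def scoring_exercises():
    """Return every (task, recording number, setting) combination for one scorer."""
    recordings = range(1, RECORDINGS_PER_TASK + 1)
    return list(product(TASKS, recordings, SETTINGS))


exercises = scoring_exercises()
print(len(exercises))              # 5 PSGs x 4 tasks x 2 settings = 40 per scorer
print(len(exercises) * N_SCORERS)  # 480 expert scorings over the 12 scorers
```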

For all tasks, scoring took place within Time In Bed (TIB) periods only, i.e. between the “lights off” and “lights on” markers, which were provided as pre-filled annotations for each case. For the scoring of EEG arousals, leg movements, and respiratory events, the pre-filled clinical hypnogram was also provided as an additional source for contextual interpretation, and to avoid divergence due to initial conditions. Scorers were instructed to restrict themselves to scoring the events relevant to the specific target task, and were not allowed to change any pre-filled contextual information, when supplied. All scoring tasks were carried out using the Polyman software as supporting tool [5].

To implement the semi-automatic scoring process, scorers were provided with pre-filled scored events produced by a first pass of task-specific automatic analysis algorithms. Scorers were then instructed to review these scorings by adding or deleting events, or by editing the corresponding event's onset and offset times where applicable, according to their own expertise. Details regarding the development and performance assessment of the automatic scoring algorithms used for this purpose have been reported in past works. The reader is referred to the corresponding references regarding the automatic scoring of sleep stages [6], EEG arousals [7], leg movements [8,9], and respiratory events [10].


Data Description

The dataset includes raw PSG signal files and corresponding annotation files from each of the scoring tasks described in the previous section.

Each PSG recording contains multiple channels following an AASM-based setup. More specifically:

  • EEG: F4-M1, C4-M1 (or Cz-M1 as alternative), and O2-M1 - 256 Hz
  • EOG: E1-M2, E2-M2 - 256 Hz
  • EMG: Submental (EMG chin - 256 Hz) and lower limbs anterior tibialis (EMG LAT and EMG RAT - 128-256 Hz)
  • ECG: Modified Lead II - 256 Hz
  • Respiratory Flow: Nasal pressure - 256 Hz
  • Respiratory Effort: Thoracic and abdominal effort belts - 32 Hz
  • Oxygen Saturation: SpO2 - 4 Hz

Ag/AgCl electrodes were used for all ExG derivations, which in the case of EEG signals were commonly referenced to Cz. Signals were afterwards combined from this common source, resulting in the montages described above. Other than that, signals are provided in raw format, i.e. with no additional pre-processing after the A/D conversion performed by the hardware device itself (SOMNOscreen™ plus, SOMNOmedics, Germany). That includes the absence of additional digital filtering besides the default analog pre-filter settings, whose specific LP/HP cut-off values are available in the corresponding EDF+ header section of each recording [4]. These values are, in all instances, no more restrictive than the AASM technical recommendations.
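As a rough illustration of where the pre-filter settings live, the following sketch reads the per-signal "prefiltering" field directly from an EDF(+) header using only the standard library. The byte offsets follow the published EDF header layout (a 256-byte fixed header followed by 256 bytes per signal). The function names are chosen for this example, and the exact wording of the pre-filter text in these particular files is not assumed.

```python
def parse_edf_prefilters(header: bytes):
    """Return [(signal label, prefiltering text), ...] from raw EDF(+) header bytes."""
    # The last 4 bytes of the 256-byte fixed header hold the number of signals.
    n_signals = int(header[252:256].decode("ascii"))
    per_signal = header[256:256 + 256 * n_signals]

    def field(offset_per_signal, width):
        # Each field is stored for all signals consecutively.
        start = offset_per_signal * n_signals
        return [
            per_signal[start + i * width:start + (i + 1) * width]
            .decode("ascii", errors="replace")
            .strip()
            for i in range(n_signals)
        ]

    labels = field(0, 16)        # 16-byte labels come first
    prefilters = field(136, 80)  # 80-byte prefiltering field, after 136 bytes of other fields
    return list(zip(labels, prefilters))


def read_edf_prefilters(path):
    """Read only the header of the EDF(+) file at `path` and parse its pre-filter fields."""
    with open(path, "rb") as f:
        fixed = f.read(256)
        n_signals = int(fixed[252:256].decode("ascii"))
        return parse_edf_prefilters(fixed + f.read(256 * n_signals))
```

For instance, `read_edf_prefilters("SN3_EEGarousals.edf")` would list one (label, prefiltering) pair per channel, provided the file is available in the working directory.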

Night recordings are identified using the following coding schema: SNX_taskID, where SNX references the sequence number (e.g. SN1 identifies recording 1) within the corresponding scoring task identified by taskID = {SleepStages, EEGarousals, Respiration, LimbMovements}.

For example, the file named SN3_EEGarousals.edf contains the digitalized signals from the third recording related to the EEG arousals scoring task.
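A minimal sketch of this naming schema follows, assuming recording file names have exactly the form SNX_taskID.edf with the four task identifiers listed above. The helper names are hypothetical and chosen for this example.

```python
import re

TASK_IDS = ("SleepStages", "EEGarousals", "Respiration", "LimbMovements")

# SN<sequence number>_<taskID>.edf
RECORDING_RE = re.compile(
    r"^SN(?P<number>\d+)_(?P<task>" + "|".join(TASK_IDS) + r")\.edf$"
)


def parse_recording_name(filename):
    """Return (sequence number, task ID), or None if the name does not match."""
    match = RECORDING_RE.match(filename)
    if match is None:
        return None
    return int(match.group("number")), match.group("task")


def recording_names(n_per_task=5):
    """List the recording file names implied by the schema (5 per task, 20 in total)."""
    return [f"SN{n}_{task}.edf" for task in TASK_IDS for n in range(1, n_per_task + 1)]


print(parse_recording_name("SN3_EEGarousals.edf"))  # (3, 'EEGarousals')
print(len(recording_names()))                       # 20
```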

Recording periods are trimmed to one minute before and after the corresponding lights-off and lights-on markers.

Basic summary statistics for the cohort composition are provided in the following table. PSG descriptors correspond to values resulting from retrospective examination in the clinical database, i.e. prior to the multi-expert rescoring procedures carried out in the study. Distributions are characterized using the median and the corresponding interquartile ranges. For more details see Table 1 in [1]:

Parameter (by scoring task) | Sleep staging | EEG arousals | Respiration | Leg movements | All
--- | --- | --- | --- | --- | ---
n | 5 | 5 | 5 | 5 | 20
Age (years) | 52.0 [47.0, 57.0] | 55.0 [52.0, 63.0] | 59.0 [57.0, 61.0] | 57.0 [51.0, 68.0] | 57.0 [51.8, 61.5]
Male (n, %) | 5 (100%) | 3 (60%) | 3 (60%) | 1 (20%) | 12 (60%)
Time In Bed (TIB, hours) | 7.5 [7.4, 8.0] | 8.1 [7.2, 8.2] | 6.5 [6.4, 7.4] | 7.3 [7.0, 7.3] | 7.3 [7.0, 8.0]

Accompanying each recording, scoring annotations are provided in EDF+ annotation-only files named SNX_taskID_scoringSetting_scorerID. The same annotations are also provided as separate comma-separated value (CSV) text files. For example, the files named SN2_Respiration_manual_scorer4.edf and SN2_Respiration_manual_scorer4.txt contain the scoring annotations for the second recording of the respiratory-related event task, annotated under the manual visual scoring setting by the fourth scorer (out of twelve available). Files under the automatic setting contain the pre-filled annotations (contextual information plus the automatic scorings for the specific task) that represent the starting point for semi-automatic scoring.
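The annotation file naming can be split into its parts as sketched below. Only the manual example above is confirmed by this description. Treating the scorer suffix as optional (for files under the automatic setting) and allowing a hyphen in the setting label are assumptions of this example, as are the helper names.

```python
import re
from typing import NamedTuple, Optional


class AnnotationFile(NamedTuple):
    number: int            # sequence number X in SNX
    task: str              # taskID, e.g. "Respiration"
    setting: str           # scoring setting, e.g. "manual"
    scorer: Optional[int]  # scorer number, None when no scorer suffix is present
    extension: str         # "edf" (EDF+ annotations) or "txt" (CSV text)


# SN<number>_<taskID>_<scoringSetting>[_scorer<number>].<edf|txt>
ANNOTATION_RE = re.compile(
    r"^SN(?P<number>\d+)_(?P<task>[A-Za-z]+)_(?P<setting>[A-Za-z-]+)"
    r"(?:_scorer(?P<scorer>\d+))?\.(?P<ext>edf|txt)$"
)


def parse_annotation_name(filename):
    """Split an annotation file name into its components, or return None."""
    match = ANNOTATION_RE.match(filename)
    if match is None:
        return None
    scorer = match.group("scorer")
    return AnnotationFile(
        number=int(match.group("number")),
        task=match.group("task"),
        setting=match.group("setting"),
        scorer=int(scorer) if scorer is not None else None,
        extension=match.group("ext"),
    )


print(parse_annotation_name("SN2_Respiration_manual_scorer4.edf"))
```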

The specific set of annotations included on each case unfolds according to the following schema:

Task | Scoring setting | Pre-filled as context¹,² | Imported from automatic method¹,³ | Rescored by experts¹
--- | --- | --- | --- | ---
Sleep staging | Manual | Lights on - off markers | | Sleep stages
 | Semi-automatic | Lights on - off markers | Sleep stages | Sleep stages
 | Automatic | Lights on - off markers | Sleep stages |
EEG arousals | Manual | Lights on - off markers, sleep stages | | EEG arousals
 | Semi-automatic | Lights on - off markers, sleep stages | EEG arousals | EEG arousals
 | Automatic | Lights on - off markers, sleep stages | EEG arousals |
Respiration | Manual | Lights on - off markers, sleep stages, EEG arousals, leg movements | | Respiratory events
 | Semi-automatic | Lights on - off markers, sleep stages, EEG arousals, leg movements | Respiratory events | Respiratory events
 | Automatic | Lights on - off markers, sleep stages, EEG arousals, leg movements | Respiratory events |
Leg movements | Manual | Lights on - off markers, sleep stages, EEG arousals, respiratory events | | Leg movements
 | Semi-automatic | Lights on - off markers, sleep stages, EEG arousals, respiratory events | Leg movements | Leg movements
 | Automatic | Lights on - off markers, sleep stages, EEG arousals, respiratory events | Leg movements |

¹ The specific event labels follow the EDF+ standard texts and polarity rules [11].
² This set of annotations is provided as baseline context information. Neither the automatic algorithm nor the human expert is able to change them.
³ See Methods for more information and references on the specific algorithm used.

The following table describes file counts for each of the task folders:

Task | # PSG EDF files | Scoring setting | # EDF+ annotation files | # CSV/TXT annotation files
--- | --- | --- | --- | ---
Sleep staging | 5 | Manual | 12 × 5 = 60 | 12 × 5 = 60
 | | Semi-automatic | 12 × 5 = 60 | 12 × 5 = 60
 | | Automatic | 5 | 5
EEG arousals | 5 | Manual | 12 × 5 = 60 | 12 × 5 = 60
 | | Semi-automatic | 12 × 5 = 60 | 12 × 5 = 60
 | | Automatic | 5 | 5
Respiration | 5 | Manual | 12 × 5 = 60 | 12 × 5 = 60
 | | Semi-automatic | 12 × 5 = 60 | 12 × 5 = 60
 | | Automatic | 5 | 5
Leg movements | 5 | Manual | 12 × 5 = 60 | 12 × 5 = 60
 | | Semi-automatic | 12 × 5 = 60 | 12 × 5 = 60
 | | Automatic | 5 | 5
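The totals implied by this table can be derived with a few lines, which may help as a sanity check after downloading. This is only a sketch of the arithmetic; it does not inspect the actual folders, and the dictionary keys are names chosen for this example.

```python
N_SCORERS = 12
RECORDINGS_PER_TASK = 5
N_TASKS = 4


def expected_counts_per_task():
    """Expected number of files of each kind inside one task folder."""
    per_expert_setting = N_SCORERS * RECORDINGS_PER_TASK  # 60 (manual or semi-automatic)
    # Manual + semi-automatic + one automatic file per recording.
    annotations = 2 * per_expert_setting + RECORDINGS_PER_TASK
    return {
        "psg_edf": RECORDINGS_PER_TASK,
        "edf_annotations": annotations,
        "csv_annotations": annotations,
    }


per_task = expected_counts_per_task()
totals = {kind: count * N_TASKS for kind, count in per_task.items()}
print(per_task)              # 5 PSG files and 125 annotation files of each type per task
print(totals)                # 20 PSG files and 500 annotation files of each type overall
print(sum(totals.values()))  # 1020 files across the four task folders
```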

Usage Notes

This dataset is suitable and intended for:

  • Developing and validating automated sleep stage and event detection algorithms
  • Studying inter-scorer variability in manual and computer-assisted PSG analysis
  • Benchmarking novel signal processing and machine learning approaches for sleep research
  • Training and evaluating sleep technologists in scoring consistency
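As an illustration of the inter-scorer variability use case, the sketch below computes Cohen's kappa between two scorers' epoch-by-epoch labels. Kappa is one common agreement measure and is shown here only as an example, not necessarily as the metric used in [1]. The stage labels and the two short sequences are made up for the demonstration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies of each scorer.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both scorers used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical hypnogram fragments from two scorers (illustrative labels).
scorer_1 = ["W", "N1", "N2", "N2", "N3", "R"]
scorer_2 = ["W", "N2", "N2", "N2", "N3", "R"]
print(round(cohen_kappa(scorer_1, scorer_2), 3))  # 0.778
```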

Possible limitations might involve:

  • Data and performance scores are associated with one specific sleep lab. Results might not generalize to other centers
  • While a high number of human experts was involved (12), the PSG sample size (20 in total, 5 per task) was limited, constrained by the costs associated with human (re-)scoring
  • The gold standard implicitly assumes that the outcome of all human scorers is equally valid, which is a potentially risky assumption since there is no clear formula to determine the "best reference" among experts
  • Semi-automatic scoring data are modulated by the reliability of the specific automatic analysis algorithms used as the first pass. The use of alternative automatic scoring methods might lead to different results

The European Data Format (EDF) is a simple and flexible format for the exchange and storage of multichannel biological and physical signals. It is free and open, and has become the de facto standard for EEG and PSG recordings in commercial equipment and multicenter research projects. An extension of EDF, named EDF+, was later developed. It is largely compatible with EDF and in addition supports interrupted recordings and the storage of scoring annotations. The full specification of both formats, together with the related original publications, is available on the dedicated edfplus.info website [4].
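For readers who want to see how EDF+ stores scoring annotations, the sketch below parses time-stamped annotation lists (TALs) as defined in the EDF+ specification [4]: each TAL holds an onset, an optional duration introduced by byte 21, and one or more texts each terminated by byte 20, with a zero byte closing the list. The function name and the sample bytes are illustrative. Extracting the raw annotation bytes from a file is left out.

```python
def parse_tals(raw: bytes):
    """Parse EDF+ TAL bytes into a list of (onset_s, duration_s or None, text)."""
    events = []
    for tal in raw.split(b"\x00"):  # a zero byte closes each TAL; padding is also zeros
        if not tal:
            continue
        parts = tal.split(b"\x14")  # byte 20 ends the timing part and every text
        timing = parts[0]
        if not timing:
            continue
        onset_raw, _, duration_raw = timing.partition(b"\x15")  # byte 21 precedes the duration
        onset = float(onset_raw.decode("ascii"))
        duration = float(duration_raw.decode("ascii")) if duration_raw else None
        for text in parts[1:]:
            if text:  # empty texts are time-keeping entries, skipped here
                events.append((onset, duration, text.decode("utf-8")))
    return events


# Synthetic example: a time-keeping TAL followed by one 30 s sleep stage annotation.
sample = b"+0\x14\x14\x00+30\x1530\x14Sleep stage N2\x14\x00\x00\x00"
print(parse_tals(sample))  # [(30.0, 30.0, 'Sleep stage N2')]
```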

Both EDF and EDF+ formats can be opened and/or viewed using free software such as:

  • Polyman (for MS-Windows only) [12]. An easy way to view these recordings using Polyman is to select File → Open → Template → Browse... to open PSG_IPA_db_quickcheck.xml and then select one target SNX_taskID.edf file.
  • EDFbrowser (for Linux, Mac OS X, and MS-Windows) [13].
  • LightWAVE platform-independent web application from PhysioNet [14]. Select the PSG-IPA as the input database and the desired recording number to show the corresponding signals on the screen.
  • WAVE and other applications for Linux, Mac OS X, and MS-Windows in the WFDB Software Package, also from PhysioNet [15]. Applications using WFDB library version 10.4.5 (February 2008) or later can read EDF files directly with no conversion required. Note that the WFDB library does not decode annotations in EDF+ files. The more recent WFDB-Python package, however, also provides methods for viewing EDF/EDF+ formatted files. Detailed instructions on getting started with reading EDF(+) files in WFDB-Python are provided in [16].

The website in [17] is also a good starting point and lists several additional EDF(+) compatible software tools and programming libraries.


Release Notes

Version 1.0.0: Initial release of the dataset


Ethics

All recordings were de-identified, and surrogate study numbers were assigned to each patient prior to their inclusion in the study, thus avoiding any possibility of individual patient identification. Dates have also been shifted to protect the privacy of the participants. The study obtained the approval of the local Medical Ethics Committee (Medisch Ethische Toetsingscommissie Zuidwest Holland) under code MTEC-19-065, which considered that the protocol did not fall under the scope of the Medical Scientific Research Involving Human Subjects Act (WMO). No informed consent was required for this retrospective examination of the de-identified recordings. Permission was obtained for public sharing of the anonymized EDF and related scoring annotation data.


Acknowledgements

The author acknowledges the support received from projects 2019-073 at Haaglanden Medisch Centrum and ED431H 2020/10 by Xunta de Galicia, enabling original data collection and study publication. The author also acknowledges the support received at the time of publication of this dataset by projects RYC2022-038121-I, funded by MCIN/AEI/10.13039/501100011033 and European Social Fund Plus (ESF+), PID2023-147422OB-I00 funded by MCIU/AEI/10.13039/501100011033 and by European FEDER program, and ED431F 2025/35 funded by Xunta de Galicia.


Conflicts of Interest

The author declares no competing interests related to the dataset itself or the study described in the associated publication.


References

  1. Alvarez-Estevez D, Rijsman RM. (2022). Computer-assisted analysis of polysomnographic recordings improves inter-scorer associated agreement and scoring times. PLoS One. 17(9):e0275530
  2. Berry R, Brooks R, Gamaldo C, Harding S, Lloyd R, Quan S, et al. (2017). The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications, Version 2.4. Darien (IL): American Academy of Sleep Medicine
  3. Ferri R, Fulda S, Allen R, Zucconi M, Bruni O, Chokroverty S, et al. (2016). World Association of Sleep Medicine (WASM) 2016 standards for recording and scoring leg movements in polysomnograms developed by a joint task force from the International and the European Restless Legs Syndrome Study Group (IRLSSG and EURLSSG). Sleep Med. 26:86–95
  4. EDF+ format specification: https://www.edfplus.info/specs/edfplus.html
  5. Kemp B, Roessen M. (2007). Polyman: a free(ing) viewer for standard EDF(+) recordings and scorings. In: Sleep and Wake Research in The Netherlands. Dutch Society for Wake-Sleep Research
  6. Alvarez-Estevez D, Rijsman R. (2021). Inter-database validation of a deep learning approach for automatic sleep scoring. PLoS One. 16(8):e0256111
  7. Alvarez-Estevez D, Fernández-Varela I. (2019). Large-scale validation of an automatic EEG arousal detection algorithm using different heterogeneous databases. Sleep Med. 57:6–14
  8. Alvarez-Estevez D, Wahid D, Rijsman R. (2017). Validation of an automatic scoring algorithm for the analysis of periodic limb movements according to the WASM2016 guidelines. Sleep Med. 40:e13–e14
  9. Alvarez-Estevez D. (2016). A new automatic method for the detection of limb movements and the analysis of their periodicity. Biomed Signal Process Control. 26:117–125
  10. Moret-Bonillo V, Alvarez-Estevez D, Fernández-Leal A, Hernández-Pereira E. (2014). Intelligent approach for analysis of respiratory signals and oxygen saturation in the Sleep Apnea/Hypopnea Syndrome. Open Med Inform J. 8:1–19
  11. EDF+ standard texts and polarity rules: https://www.edfplus.info/specs/edftexts.html
  12. Polyman viewer: https://www.edfplus.info/downloads/index.html
  13. EDFbrowser: https://www.teuniz.net/edfbrowser/
  14. Lightwave viewer: https://physionet.org/lightwave/
  15. Moody G, Pollard T, Moody B. (2022). WFDB Software Package [Internet]. Version 10.7.0. PhysioNet. Available from: https://doi.org/10.13026/gjvw-1m31
  16. Xie C, McCullum L, Johnson A, Pollard T, Gow B, Moody B. (2023). Waveform Database Software Package (WFDB) for Python. Version 4.1.0. PhysioNet. RRID:SCR_007345. Available from: https://doi.org/10.13026/9njx-6322
  17. List of EDF(+) compatible software: https://www.edfplus.info/downloads/index.htm

Access

Access Policy:
Anyone can access the files, as long as they conform to the terms of the specified license.

License (for files):
Creative Commons Attribution 4.0 International Public License

Discovery

DOI (version 1.0.0):
https://doi.org/10.13026/esx0-nw71

DOI (latest version):
https://doi.org/10.13026/7kvm-qe34


Files

Total uncompressed size: 2.6 GB.


Top-level folder contents:

Name | Size | Modified
--- | --- | ---
EEG_arousals (folder) | |
Limb_movements (folder) | |
Resp_events (folder) | |
Sleep_stages (folder) | |
LICENSE.txt | 14.5 KB | 2026-01-01
PSG_IPA_db_quickcheck.xml | 11.5 KB | 2025-06-13
PSG_IPA_db_quickcheck_with_scoring.xml | 16.9 KB | 2025-06-13
RECORDS | 754 B | 2025-06-11
SHA256SUMS.txt | 133.7 KB | 2026-01-08
subject-info.txt | 790 B | 2025-11-18