Database Credentialed Access

EchoGraph-annotated ECHO-NOTE2NUM examples

Chieh-Ju Chao, Mohammad Asadi

Published: Dec. 3, 2025. Version: 1.0.0


When using this resource, please cite:
Chao, C., & Asadi, M. (2025). EchoGraph-annotated ECHO-NOTE2NUM examples (version 1.0.0). PhysioNet. RRID:SCR_007345. https://doi.org/10.13026/hb5q-9532

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.

Abstract

This repository releases the EchoGraph-annotated ECHO-NOTE2NUM dataset, containing 45,794 echocardiography reports with comprehensive entity and relation annotations. Each report from the ECHO-NOTE2NUM dataset has been automatically annotated using EchoGraph, a BERT-based information extraction model specifically designed for echocardiography reports. The annotations employ a tailored schema capturing clinical observations (definitely present, definitely absent, uncertain), anatomical structures, measurements, and four types of relations (Modify, Gauge, Located at, Suggestive of). The dataset contains a total of 1,709,074 entities and 671,512 relations across all 45,794 reports. EchoGraph was developed using 600 densely annotated Mayo Clinic reports (48,256 entities, 29,731 relations) and validated on 60 MIMIC-EchoNote reports, demonstrating strong performance (entity F1 0.85 internal, 0.80 external; relation F1 0.70 internal, 0.52 external).

This annotated dataset enables research in clinical NLP, automated report evaluation, and development of AI systems for echocardiography.


Background

Echocardiography reports are the primary communication tool for echocardiography studies, containing detailed quantitative measurements and cardiologists' interpretations [1, 2]. Although these reports hold substantial clinically relevant information, their free-text format presents significant challenges for large-scale use. The unstructured nature of this data, combined with complex medical language, hampers automated extraction and analysis, limiting their utility in artificial intelligence (AI) research and clinical practice [3]. With the rise of generative AI, particularly language models [4, 5], there is an increasing need for automatic, generic metrics to evaluate the factual correctness of generated clinical text to accelerate progress in developing vision-language models that produce accurate reports [3, 6, 7]. Studies have leveraged information extraction or language model techniques for the development of automatic metrics or frameworks to evaluate the reliability and accuracy of AI-generated clinical narratives, enhancing their utility in medical practice and research [3, 6, 8-10].

However, studies specifically on echo reports remain limited, and earlier works have not been developed using generic schemas for information extraction [11-13]. Specifically, these studies often focus on a relatively narrow component, such as the comparison statement of reports [13], or use predefined important elements that do not capture the full report content [11, 12]. The scarcity of densely annotated reports likely led to reliance on narrow extraction schemas, limiting information scope and hindering the development of comprehensive metrics for assessing factual correctness in clinical echo reports.

RadGraph and RadGraph XL are knowledge graph models addressing similar challenges in radiology report analysis [3, 6]. These models use a generic schema with entity types (anatomy, observation, uncertainty) and relation types (modify, located_at, suggestive_of) to extract structured information from chest X-ray reports. However, despite robust out-of-distribution performance, their direct application to echo reports has significant limitations [3, 6, 14-16]. EchoGraph [17] extends this approach with a refined clinical entity and relation extraction schema and reward function specifically tailored to echocardiography reports. The EchoGraph schema differs from RadGraph by: (1) adding a dedicated “Measurements (MEAS)” entity category to capture the quantitative nature of echocardiography; and (2) introducing a “Gauge” relation type to represent measurement-assessment connections common in cardiac evaluations.

This repository releases the EchoGraph-annotated ECHO-NOTE2NUM dataset to facilitate future research in automated echocardiography report analysis and evaluation of AI-generated clinical text.


Methods

Dataset Selection and Preprocessing

All 45,794 examples from the ECHO-NOTE2NUM dataset [18] were included in this study. Each echocardiography report underwent preprocessing to isolate the clinically relevant content, specifically retaining only the "Interpretation" section while removing out-of-scope or repeating text elements. Capitalized anatomical subheadings (e.g., "LEFT VENTRICLE," "VALVES") were systematically excluded to focus on the narrative clinical content rather than structural formatting elements. No additional text normalization or tokenization was applied beyond this section extraction. The key identifier fields from the ECHO-NOTE2NUM dataset (subject_id, hadm_id, row_id, category) were retained to maintain compatibility with the parent dataset.
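The preprocessing code itself is not part of this release; the following is a minimal sketch of the section extraction described above, assuming the interpretation section is introduced by a header line beginning with "Interpretation" (the helper name and header pattern are illustrative assumptions):

import re

def extract_interpretation(report_text):
    """Sketch of the preprocessing described above (not the released code)."""
    # Keep only the text after an "Interpretation" header, if one exists.
    match = re.search(r"Interpretation:?\s*(.*)", report_text,
                      flags=re.IGNORECASE | re.DOTALL)
    body = match.group(1) if match else report_text

    # Drop fully capitalized subheadings such as "LEFT VENTRICLE:" or
    # "VALVES", keeping only the narrative clinical sentences.
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped and stripped.rstrip(":").isupper():
            continue  # structural subheading, not clinical content
        kept.append(line)
    return "\n".join(kept).strip()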

EchoGraph Annotation Generation

The preprocessed text was subsequently annotated using the EchoGraph model, which employs a comprehensive annotation schema encompassing both named entities and relational structures. All annotations were generated automatically by the trained EchoGraph model without human expert review or correction. The model was trained on 600 densely annotated echocardiography reports from the Mayo Clinic (2017) and validated on 60 MIMIC-EchoNote reports before being applied to the full ECHO-NOTE2NUM dataset.

Named Entity Categories:

  • Observation-Definitely Present (OBS-DP): Clinical entities confirmed to be present.
  • Observation-Definitely Absent (OBS-DA): Clinical entities explicitly ruled out or absent.
  • Observation-Uncertain (OBS-UC): Clinical entities with ambiguous or uncertain status.
  • Anatomy (ANAT): Anatomical structures and cardiac regions.
  • Measurements (MEAS): Quantitative values and dimensional assessments.

Relational Categories:

  • Modify: Relationships that qualify or modify entity characteristics.
  • Gauge: Relationships indicating measurement or assessment connections.
  • Located at: Spatial relationships between findings and anatomical locations.
  • Suggestive of: Relationships indicating diagnostic implications or associations.

Data Description

The dataset consists of a single JSON file (MIMIC_EchoNotes-echograph_annotations.json, approximately 470 MB) containing 45,794 annotated echocardiography reports. The file has been verified to parse correctly.

JSON Structure

Each record in the JSON array contains the following keys:

  • subject_id (Integer): Patient identifier (matches ECHO-NOTE2NUM). Example: 12622
  • hadm_id (Float): Hospital admission identifier (matches ECHO-NOTE2NUM). Example: 105857.0
  • row_id (Integer): Record identifier (matches ECHO-NOTE2NUM). Example: 70623
  • category (String): Report category classification (matches ECHO-NOTE2NUM). Example: "Echo"
  • interpretation (String): Extracted interpretation section text after preprocessing. Example: "The left atrium is normal..."
  • radgraph_annotations (Object): EchoGraph-generated structured annotations. See below.

Annotation Schema Structure

The radgraph_annotations object contains:

  • A numeric key (typically '0') indexing the annotation set.
  • text: The tokenized interpretation text.
  • entities: Object containing numbered entity entries.
  • data_source: Null (reserved for future use).
  • data_split: "inference" (indicating model-generated annotations).

Each entity within the entities object contains:

  • tokens: The text span of the entity.
  • label: Entity type (ANAT, OBS-DP, OBS-DA, OBS-UC, MEAS).
  • start_ix: Starting token index.
  • end_ix: Ending token index.
  • relations: Array of [relation_type, target_entity_id] pairs.
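Because each relation stores only the identifier of its target entity, turning annotations into human-readable triples requires a second lookup into the entities object. A minimal sketch, using the field names listed above (the function name is illustrative):

def extract_triples(entities):
    """Resolve [relation_type, target_entity_id] pairs into
    (head tokens, relation type, tail tokens) triples."""
    triples = []
    for entity in entities.values():
        for relation_type, target_id in entity['relations']:
            target = entities.get(target_id)
            if target is not None:  # guard against dangling ids
                triples.append((entity['tokens'], relation_type,
                                target['tokens']))
    return triples

Applied to the representative example below, this yields [('left atrium', 'Modify', 'size'), ('normal', 'Modify', 'size')].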

Representative Example

{
  "subject_id": 12622,
  "hadm_id": 105857.0,
  "row_id": 70623,
  "category": "Echo",
  "interpretation": "The left atrium is normal in size...",
  "radgraph_annotations": {
    "0": {
      "text": "The left atrium is normal in size...",
      "entities": {
        "1": {
          "tokens": "left atrium",
          "label": "ANAT",
          "start_ix": 1,
          "end_ix": 2,
          "relations": [["Modify", "3"]]
        },
        "2": {
          "tokens": "normal",
          "label": "OBS-DP",
          "start_ix": 4,
          "end_ix": 4,
          "relations": [["Modify", "3"]]
        },
        "3": {
          "tokens": "size",
          "label": "MEAS",
          "start_ix": 6,
          "end_ix": 6,
          "relations": []
        }
      },
      "data_source": null,
      "data_split": "inference"
    }
  }
}

Dataset Statistics

  • Total reports: 45,794.
  • Total entities: 1,709,074.
  • Total relations: 671,512.
  • Average entities per report: 37.3.
  • Average relations per report: 14.7.
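The aggregate counts above can be recomputed directly from the released file; a short check, assuming every record stores its annotations under the key '0' as described above:

import json

with open('MIMIC_EchoNotes-echograph_annotations.json') as f:
    data = json.load(f)

n_entities = n_relations = 0
for report in data:
    entities = report['radgraph_annotations']['0']['entities']
    n_entities += len(entities)
    n_relations += sum(len(e['relations']) for e in entities.values())

print(len(data), n_entities, n_relations)   # expected: 45794 1709074 671512
print(round(n_entities / len(data), 1))     # expected: 37.3
print(round(n_relations / len(data), 1))    # expected: 14.7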

Identifier Consistency

All identifiers (subject_id, hadm_id, row_id) exactly match those in the parent ECHO-NOTE2NUM dataset. No new identifiers were introduced during the annotation process.
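Because the identifiers are preserved, records can be joined back to the parent dataset, for example with pandas. In this sketch, the parent file name is a hypothetical local export of ECHO-NOTE2NUM:

import json
import pandas as pd

with open('MIMIC_EchoNotes-echograph_annotations.json') as f:
    annotations = pd.DataFrame(json.load(f))

# Hypothetical local copy of the parent ECHO-NOTE2NUM table.
parent = pd.read_csv('echo_note2num.csv')

# Join on the shared identifier columns; cast hadm_id to a common
# dtype first if the two files disagree (it is stored as a float here).
merged = annotations.merge(parent, on=['subject_id', 'hadm_id', 'row_id'])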

Clarification on Training vs. Released Dataset

The Abstract mentions 48,256 entities and 29,731 relations from the training corpus, which refers to the 600 Mayo Clinic reports used to train the EchoGraph model. The released dataset in this repository contains 45,794 ECHO-NOTE2NUM reports with 1,709,074 entities and 671,512 relations generated by applying the trained model to these reports.


Usage Notes

The following Python code demonstrates how to load and parse the JSON file:


import json

# Load the dataset
with open('MIMIC_EchoNotes-echograph_annotations.json', 'r') as f:
    data = json.load(f)

# Access a single report
report = data[0]
print(f"Subject ID: {report['subject_id']}")
print(f"Interpretation: {report['interpretation'][:100]}...")

# Access entities and relations
annotations = report['radgraph_annotations']['0']
entities = annotations['entities']

# Iterate through entities
for entity_id, entity in entities.items():
    print(f"Entity {entity_id}: {entity['tokens']} ({entity['label']})")
    if entity['relations']:
        print(f"  Relations: {entity['relations']}")

Intended Applications

This annotated dataset is designed to support research and development in clinical natural language processing, specifically for echocardiogram report analysis. The structured EchoGraph annotations enable multiple use cases including:

  • Named Entity Recognition (NER) model training: Fine-tune models to identify clinical observations, anatomical structures, and measurements in cardiology reports (see the sketch after this list).
  • Relation extraction research: Develop systems to automatically detect diagnostic relationships and spatial associations within echocardiogram findings.
  • Clinical decision support systems: Build tools that can parse and structure unstructured echocardiogram interpretations for downstream clinical applications.
  • Medical text mining and knowledge extraction: Extract structured clinical knowledge from narrative cardiology reports at scale.
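For the NER use case above, the token-level indices make it straightforward to derive BIO tags. A minimal sketch, assuming start_ix and end_ix are inclusive indices into the whitespace-tokenized text field (consistent with the representative example earlier):

def to_bio_tags(annotation):
    """Convert one report's annotation object (the value under key '0')
    into whitespace tokens and BIO tags for NER training."""
    tokens = annotation['text'].split()
    tags = ['O'] * len(tokens)
    for entity in annotation['entities'].values():
        start, end = entity['start_ix'], entity['end_ix']
        tags[start] = 'B-' + entity['label']
        for i in range(start + 1, end + 1):
            tags[i] = 'I-' + entity['label']
    return tokens, tags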

Unique Dataset Characteristics

This dataset provides several distinctive advantages for the research community:

  • Domain-specific clinical annotations: Unlike general medical NLP datasets, these annotations are specifically tailored to echocardiography terminology and diagnostic patterns.
  • Comprehensive entity-relation framework: The EchoGraph schema captures both clinical findings and their complex interrelationships, providing richer semantic structure than entity-only annotations.
  • Preprocessed clinical focus: By isolating interpretation sections and removing structural elements, the dataset emphasizes clinically meaningful content over administrative formatting.
  • MIMIC-III compatibility: Direct mapping to the established ECHO-NOTE2NUM/MIMIC-III ecosystem facilitates integration with existing clinical research workflows and enables longitudinal patient data analysis.

Limitations

Users should be aware of the following limitations:

  • This dataset is constructed based only on preprocessed interpretation sections extracted from complete echocardiography reports. Researchers requiring full document structure, including technical parameters, comparison sections, or header information should refer to the original ECHO-NOTE2NUM dataset.
  • All annotations were generated automatically by the EchoGraph model without human expert review. While the model demonstrated strong performance on validation sets (F1: 0.85 for entities, F1: 0.70 for relations on internal validation), automated annotations may contain errors.
  • The dataset may not generalize well to text generation tasks, as it is optimized for information extraction from existing clinical text rather than for training generative models.

Release Notes

Version 1.0.0: Initial public release of the dataset.


Ethics

This dataset uses only de-identified data from the MIMIC-III database and the ECHO-NOTE2NUM dataset. All data inherit IRB approval and consent waivers from the original MIMIC-III study. No new identifiable patient information was introduced through the annotation process or metadata generation. The EchoGraph annotation model was trained on data from a separate Mayo Clinic cohort under appropriate institutional approval. All patient identifiers in this released dataset remain consistent with the de-identified identifiers from the parent ECHO-NOTE2NUM dataset.


Acknowledgements

Chieh-Ju Chao, MD, is supported by research funding from the CV Prospective award from the Mayo Clinic Department of Cardiovascular Medicine and the AI/ML Enablement award from the Center for Digital Health at the Mayo Clinic. The other authors declare no relevant funding for this work.


Conflicts of Interest

The authors declare no competing interests.


References

  1. Gardin JM, Adams DB, Douglas PS, et al. Recommendations for a standardized report for adult transthoracic echocardiography: a report from the American Society of Echocardiography’s Nomenclature and Standards Committee and Task Force for a Standardized Echocardiography Report. J Am Soc Echocardiogr. 2002;15(3):275–90.
  2. Douglas PS, Carabello BA, Lang RM, et al. 2019 ACC/AHA/ASE key data elements and definitions for transthoracic echocardiography: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Data Standards and the American Society of Echocardiography. Circ Cardiovasc Imaging. 2019;12(7):e000027.
  3. Jain S, Agrawal A, Saporta A, et al. RadGraph: extracting clinical entities and relations from radiology reports. arXiv preprint arXiv:2106.14463 (2021).
  4. Clusmann J, Kolbinger FR, Muti HS, et al. The future landscape of large language models in medicine. Commun Med 2023;3(1):141.
  5. Busch F, Hoffmann L, Rueger C, et al. Current applications and challenges in large language models for patient care: a systematic review. Commun Med 2025;5(1):26.
  6. Delbrouck J-B, Chambon P, Chen Z, et al. RadGraph-XL: a large-scale expert-annotated dataset for entity and relation extraction from radiology reports. Findings Assoc Comput Linguistics: ACL 2024. 2024:12902–15.
  7. Ostmeier S, Xu J, Chen Z, et al. GREEN: generative radiology report evaluation and error notation. arXiv preprint arXiv:2403.05276 (2024).
  8. Liu L, Yang X, Li F, et al. Towards automatic evaluation for LLMs’ clinical capabilities: metric, data, and algorithm. arXiv preprint arXiv:2403.08795 (2024).
  9. Wu K, Wu E, Wei K, et al. An automated framework for assessing how well LLMs cite relevant medical references. Nat Commun 2025;16(1):3615.
  10. Abbasian M, Khatibi E, Azimi I, et al. Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI. Npj Digit Med 2024;7(1):82.
  11. Szekér S, Fogarassy G, Vathy-Fogarassy Á. A general text mining method to extract echocardiography measurement results from echocardiography documents. Artif Intell Med 2023;143:102584.
  12. Toepfer M, Corovic H, Fette G, Klügl P, Störk S, Puppe F. Fine-grained information extraction from German transthoracic echocardiography reports. BMC Med Inform Decis Mak 2015;15(1):91.
  13. Sun D, Oliveira LM, Spencer KT. Extracting key findings compared in an echocardiogram report. In: Proceedings of the 2018 IEEE International Conference on Healthcare Informatics (ICHI). 2018:382–3.
  14. Yan B, Liu R, Kuo DE, et al. Style-aware radiology report generation with RadGraph and few-shot prompting. arXiv preprint arXiv:2308.12617 (2023).
  15. Yu F, Endo M, Krishnan R, et al. Evaluating progress in automatic chest X-ray radiology report generation. Patterns 2023;4(9):100802.
  16. Chao C-J, Banerjee I, Arsanjani R, et al. Evaluating large language models in echocardiography reporting: opportunities and challenges. Eur Heart J Digit Health 2025:ztae086.
  17. Chao C-J, Delbrouck J-B, Asadi M, et al. EchoGraph: a specialized solution for automatic echocardiography report quality evaluation. medRxiv [Preprint]. 2025 May 7:2025.05.07.25327158.
  18. Kwak GH, Moukheiber D, Moukheiber M, Moukheiber L, Moukheiber S, Butala N, et al. EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM) (version 1.0.0). PhysioNet. 2024. RRID:SCR_007345. Available from: https://doi.org/10.13026/xhrz-ht59

Parent Projects
EchoGraph-annotated ECHO-NOTE2NUM examples was derived from the EchoNotes Structured Database derived from MIMIC-III (ECHO-NOTE2NUM) [18]. Please cite it when using this project.

Access

Access Policy:
Only credentialed users who sign the DUA can access the files.

License (for files):
PhysioNet Credentialed Health Data License 1.5.0

Data Use Agreement:
PhysioNet Credentialed Health Data Use Agreement 1.5.0

Required training:
CITI Data or Specimens Only Research

Discovery

DOI (version 1.0.0):
https://doi.org/10.13026/hb5q-9532

DOI (latest version):
https://doi.org/10.13026/2h1h-9n57
