Cardiology
EchoNext: A Dataset for Detecting Echocardiogram-Confirmed Structural Heart Disease from ECGs
This dataset contains 100,000 de-identified 12-lead electrocardiograms (ECGs) collected at Columbia University Irving Medical Center, each paired with a structural heart disease (SHD) label derived from echocardiography. Each ECG includes raw waveform data sampled at 250 Hz across all 12 leads, along with demographic and ECG-specific tabular metadata, including age, sex, heart rate, PR interval, QRS duration, and corrected QT interval. The label is binary and indicates the presence or absence of SHD based on echocardiographic findings. The dataset was developed as part of the creation of the Columbia Mini-Model, a lightweight deep learning model for SHD detection from ECGs.
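To make the record layout described above concrete, the following is a minimal Python sketch of how a single entry could be represented and sanity-checked. Only the sampling rate (250 Hz), the lead count (12), the listed metadata fields, and the binary label come from the description. The class name `ECGRecord`, the field names, the lead-major waveform layout, and the 2,500-sample placeholder length are hypothetical choices made for illustration and do not reflect the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Optional

SAMPLING_RATE_HZ = 250  # stated in the dataset description
N_LEADS = 12            # stated in the dataset description


@dataclass
class ECGRecord:
    """Hypothetical in-memory view of one ECG; field names are assumptions."""
    waveform: List[List[float]]    # assumed layout: one list of samples per lead
    age: Optional[float]
    sex: Optional[str]
    heart_rate: Optional[float]
    pr_interval: Optional[float]
    qrs_duration: Optional[float]
    qtc: Optional[float]           # corrected QT interval
    shd_label: int                 # 1 = SHD present, 0 = SHD absent

    def validate(self) -> None:
        """Check the properties the description guarantees."""
        if len(self.waveform) != N_LEADS:
            raise ValueError(f"expected {N_LEADS} leads, got {len(self.waveform)}")
        if len({len(lead) for lead in self.waveform}) != 1:
            raise ValueError("all leads must have the same number of samples")
        if self.shd_label not in (0, 1):
            raise ValueError("shd_label must be binary (0 or 1)")

    def duration_seconds(self) -> float:
        """Recording length implied by the sample count and sampling rate."""
        return len(self.waveform[0]) / SAMPLING_RATE_HZ


# Placeholder record: flat-line waveform, metadata left unset.
# The 2,500-sample length is an assumption, not a documented property.
example = ECGRecord(
    waveform=[[0.0] * 2500 for _ in range(N_LEADS)],
    age=None, sex=None, heart_rate=None,
    pr_interval=None, qrs_duration=None, qtc=None,
    shd_label=0,
)
example.validate()
```

Metadata fields are typed as optional here because the description does not say whether every ECG has every measurement; a real loader should follow the dataset's own documentation for file names, units, and missing-value handling.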