WAPrFX: A protocol to derive simple signals from complex heart-rate data and its application to train an AI model to predict whether the signal contains anomalies.
Note. This work involved human subjects. All human-subject procedures and protocols are exempt from review board approval.
Abstract
We present a pipeline that projects heartbeat signals into simplified heart-rate representations (WAPrFX) and trains a deep neural network to classify healthy vs. unhealthy cases. The projection preserves key variability and periodic structure while enabling a compact 10-feature input. On a 100-record manual hold-out, the model achieves AUC = 0.954 (95% CI [0.902, 0.993]); at the validation-selected operating point, F1 = 0.93 and accuracy = 0.93. These results support effective anomaly detection using simplified signals.
Keywords: Heart Signal, ECG, Cardiac Anomalies, Heart Disease, Heart, Deep Learning
1. Introduction
Cardiovascular disease remains the leading global cause of death and disability [1]. Post–COVID-19 survivors—symptomatic or not—may show biomarkers of lasting cardiac injury, risking higher morbidity and mortality without proactive measures [2, 15]. Time is critical: each minute without defibrillation cuts survival by 7%–10% [16], and recognition delays—especially among older adults and women—worsen outcomes [17, 18].
Current monitoring is limited by expert dependence, slow interpretation, and costly acquisition. While deep learning on full-spectrum ECG attains strong accuracy [7–13], single-modality reliance can constrain generalizability. We instead simplify signals while preserving discriminative content, focusing on heart rate to maximize compatibility and access; recent systematic reviews on rPPG/PPG and ECG-based deep learning contextualize this choice [11, 33–35].
2. Method
2.1 Problem Statement
Learn a compact embedding from cardiac signals that preserves class-discriminative information for binary classification (healthy vs. non-healthy), yielding a low-dimensional feature vector suitable for deep learning.
2.2 Data Repositories
We unify three public, clinically validated sources—MIT–BIH Arrhythmia, PTB-Diagnostic ECG, and Autonomic Aging [3, 4, 6].
- MIT–BIH: 48 half-hour ECG records from 47 subjects (1975–1979)
- PTB-Diagnostic: 290 subjects (17–87 years)
- Autonomic Aging: 1,121 healthy volunteers with resting ECG and continuous non-invasive blood pressure
After schema harmonization and quality control, we obtain an automatic train/validation set of 2,224 simplified heart-rate (rHR) sequences, balanced 1:1 (Class 0 = 1,112; Class 1 = 1,112), plus a 100-record manual hold-out (50/50) for independent verification. No missing values were detected. The ETL process for the three databases is documented in a flowchart diagram [38].
2.3 Waveform Projection and Feature Extraction (WAPrFX)
We project the heartbeat time series to a 1 Hz waveform with heart rate as amplitude, preserving short/long-term variability and periodic structure while compressing the signal. From this projection, we derive a compact set of statistical and spectral features sufficient to classify signals as healthy or non-healthy. We term the pipeline Waveform Projection and Feature Extraction (WAPrFX); its features exhibit distinct distributions for normal vs. pathological cases, enabling effective discrimination.
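A minimal sketch of how such a projection could be computed, assuming the input is a sequence of beat (R-peak) timestamps in seconds. The function name `project_to_1hz`, the choice to stamp each rate at the later beat, and the use of linear interpolation are illustrative assumptions, not the method stated in the text.

```python
import numpy as np

def project_to_1hz(beat_times_s):
    """Resample beat-to-beat heart rate onto a 1 Hz grid (bpm as amplitude).

    Assumption: beat_times_s holds strictly increasing beat timestamps in seconds.
    """
    t = np.asarray(beat_times_s, dtype=float)
    rr = np.diff(t)                      # inter-beat intervals (s)
    hr = 60.0 / rr                       # instantaneous heart rate (bpm)
    t_hr = t[1:]                         # each rate is stamped at the later beat
    grid = np.arange(np.ceil(t_hr[0]), np.floor(t_hr[-1]) + 1.0)  # 1 s steps
    return grid, np.interp(grid, t_hr, hr)

# Toy input: a steady rhythm with one beat per second for 10 s (60 bpm)
grid, hr_1hz = project_to_1hz(np.arange(0.0, 11.0, 1.0))
```

The output is one heart-rate value per second, which is the form the later feature definitions operate on.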
2.4 Feature Set and Scaling
We first designed 24 candidate features as the basis for analysis. After extraction from the projected waveform, each feature is standardized with scikit-learn’s StandardScaler [19]:
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ and $\sigma$ are the feature's mean and standard deviation, estimated only on the training set. We persist $\mu$ and $\sigma$ and reuse them unchanged for validation, test, and deployment to avoid data leakage and ensure reproducibility. Standardization equalizes feature scales, improves numerical conditioning, and stabilizes downstream optimization.
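The fit-on-train, reuse-everywhere pattern can be sketched with plain NumPy. This mirrors StandardScaler's default behaviour (per-feature mean and population standard deviation); the helper names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_standardizer(x_train):
    """Estimate per-feature mean and standard deviation on training data only."""
    mu = x_train.mean(axis=0)
    sigma = x_train.std(axis=0)                 # population SD (ddof=0)
    sigma = np.where(sigma == 0.0, 1.0, sigma)  # guard against constant features
    return mu, sigma

def standardize(x, mu, sigma):
    """Apply stored training statistics to any split without refitting."""
    return (x - mu) / sigma

rng = np.random.default_rng(0)
x_train = rng.normal(loc=70.0, scale=8.0, size=(200, 10))  # synthetic features
x_val = rng.normal(loc=70.0, scale=8.0, size=(50, 10))

mu, sigma = fit_standardizer(x_train)
z_train = standardize(x_train, mu, sigma)
z_val = standardize(x_val, mu, sigma)           # reuses the training statistics
```

Only `z_train` is guaranteed to have zero mean and unit variance; validation and hold-out data are shifted and scaled by the same stored values, which is what prevents leakage.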
Our feature design aligns with recent reviews in rPPG/PPG and ECG deep learning [11, 33, 36, 34].
2.5 Feature Screening and Selection
From 24 candidates, we applied two complementary screens to remove redundancy and quantify contribution:
- PCA (unsupervised). On standardized features, the first three PCs explain 69.1% of variance (PC1 = 41.9%, PC1–2 = 55.9%); loadings show PC1 dominated by LF/HF/VLF powers and PC2 by lowest HR, HRV SD, and max HR.
- Supervised importance. Using PyCaret [20] and a model-agnostic mutual-information ranking, spectral powers rank highest, followed by HRV dispersion and minimum rate.
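For the unsupervised screen, the per-component and cumulative explained variance can be obtained from a singular value decomposition. The sketch below runs on synthetic data standing in for the 24 candidates; it illustrates the computation only and does not reproduce the reported percentages.

```python
import numpy as np

def explained_variance_ratio(z):
    """Fraction of total variance carried by each principal component."""
    centered = z - z.mean(axis=0)
    # Squared singular values of the centered data are proportional to PC variances.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 24))            # synthetic stand-in for 24 features
z[:, 1] = z[:, 0] + 0.1 * z[:, 1]         # inject redundancy between two features
ratio = explained_variance_ratio(z)
cumulative = np.cumsum(ratio)             # analogous to the PC1, PC1-2, PC1-3 totals
```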
We retained 10 concordant, low-collinearity features (see the pairwise visualization, important-features.png) and define them below.
1) Minimum Heart Rate Variation
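The smallest non-zero absolute difference between consecutive heart-rate values:
$$\Delta HR_{\min} = \min_{i \,:\, HR_{i+1} \neq HR_i} \left| HR_{i+1} - HR_i \right|$$
where $HR_i$ is the $i$-th sample of the projected heart-rate series.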
2) Approximate Entropy (ApEn)
Quantifies time-series regularity (lower ApEn = more predictable; higher = more irregular).
We set the embedding dimension $m$ and tolerance $r$ per Pincus for short, noisy physiologic series and use Euclidean distance [23].
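A compact sketch of the ApEn computation follows. The values `m=2` and `r_factor=0.2` (tolerance as a fraction of the series standard deviation) are common defaults used here as assumptions, since the text does not list the values used. Euclidean distance follows the text's stated choice; Pincus's original definition uses the maximum coordinate difference instead.

```python
import numpy as np

def approximate_entropy(series, m=2, r_factor=0.2):
    """ApEn(m, r) with r = r_factor * SD(series).

    Assumptions: m and r_factor are illustrative defaults, not values from the text.
    """
    u = np.asarray(series, dtype=float)
    r = r_factor * u.std()

    def phi(dim):
        n_vec = len(u) - dim + 1
        # Rows are overlapping template vectors of length `dim`.
        templates = np.array([u[i:i + dim] for i in range(n_vec)])
        diffs = templates[:, None, :] - templates[None, :, :]
        dist = np.sqrt((diffs ** 2).sum(axis=2))
        # Self-matches are counted, so every fraction is positive and the log is finite.
        c = (dist <= r).mean(axis=1)
        return np.log(c).mean()

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
regular = np.tile([60.0, 62.0, 64.0, 62.0], 50)        # strictly periodic series
irregular = rng.normal(loc=62.0, scale=2.0, size=200)  # unpredictable series
apen_regular = approximate_entropy(regular)
apen_irregular = approximate_entropy(irregular)
```

The periodic series yields a value close to zero, while the random series scores clearly higher, matching the interpretation that lower ApEn means a more predictable signal.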
3) LF Power (FFT)
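Energy of HRV within 0.04–0.15 Hz (sympathetic + parasympathetic) [22]:
$$P_{LF} = \sum_{0.04 \,\le\, f_k \,<\, 0.15} \left| X(f_k) \right|^2$$
where $X(f_k)$ is the FFT of the projected heart-rate series at frequency $f_k$ (Hz).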
4) Variation Slope
Average rate of change in heart rate over time:
5) HF Power (FFT)
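Energy in 0.15–0.40 Hz (primarily parasympathetic/respiratory) [21]:
$$P_{HF} = \sum_{0.15 \,\le\, f_k \,<\, 0.40} \left| X(f_k) \right|^2$$
where $X(f_k)$ is the FFT of the projected heart-rate series at frequency $f_k$ (Hz).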
6) Lowest Heart Rate
7) Average Trend Deviation
Smooth the series with a moving average, then fit a linear trend:
8) Standard Deviation of HR Variability
9) Maximum Heart Rate Variation
10) VLF Power (FFT)
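Energy in 0.003–0.04 Hz (long-term regulatory mechanisms) [22]:
$$P_{VLF} = \sum_{0.003 \,\le\, f_k \,<\, 0.04} \left| X(f_k) \right|^2$$
where $X(f_k)$ is the FFT of the projected heart-rate series at frequency $f_k$ (Hz).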
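Several of the ten features can be sketched directly from a 1 Hz heart-rate series. The band edges come from the definitions above; everything else is an assumption made for illustration: band power as an unnormalized sum of squared FFT magnitudes, maximum variation as the largest absolute consecutive difference, and the standard deviation taken over consecutive differences.

```python
import numpy as np

def band_power(hr_1hz, f_lo, f_hi, fs=1.0):
    """Sum of squared FFT magnitudes for frequencies in [f_lo, f_hi)."""
    x = np.asarray(hr_1hz, dtype=float)
    x = x - x.mean()                          # drop the DC component
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return float((np.abs(spectrum[in_band]) ** 2).sum())

def time_domain_features(hr_1hz):
    """Difference-based features computed from consecutive samples."""
    x = np.asarray(hr_1hz, dtype=float)
    step = np.abs(np.diff(x))                 # |HR[i+1] - HR[i]|
    nonzero = step[step > 0]
    return {
        "min_hr_variation": float(nonzero.min()) if nonzero.size else 0.0,
        "max_hr_variation": float(step.max()),
        "lowest_hr": float(x.min()),
        "std_hr_variation": float(np.diff(x).std()),
    }

# Synthetic 10-minute series: 70 bpm baseline with a 0.1 Hz (LF-band) oscillation
t = np.arange(600)
hr = 70.0 + 3.0 * np.sin(2 * np.pi * 0.1 * t)
powers = {
    "vlf": band_power(hr, 0.003, 0.04),
    "lf": band_power(hr, 0.04, 0.15),
    "hf": band_power(hr, 0.15, 0.40),
}

# Small hand-checkable series for the time-domain features
features = time_domain_features([70, 70, 72, 71, 75, 75, 74])
```

In the synthetic example essentially all spectral energy falls in the LF band, as expected for a 0.1 Hz oscillation. Real recordings would usually call for windowing or a Welch-style estimate, which this sketch omits.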
Top-10 feature importances (percentage):
Feature | Importance |
---|---|
Min. HR variation | 20.80% |
Approx. entropy | 17.75% |
LF power (FFT) | 10.02% |
Variation slope | 9.93% |
HF power (FFT) | 9.65% |
Lowest heart rate | 7.96% |
Std. dev. of HR variation | 6.10% |
Avg. trend deviation | 6.07% |
Max. HR variation | 6.02% |
VLF power (FFT) | 5.70% |
3. Deep Learning Inference
3.1 Classification Model
We train a supervised deep neural network to classify the 10-D WAPrFX vectors as healthy vs. unhealthy. The operating threshold is selected on validation (Youden’s $J$) and applied unchanged to the hold-out. Integrated into WAPrFX, this supports scalable, near-real-time monitoring.
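Youden's statistic is J = sensitivity + specificity − 1, and the operating point is the threshold that maximizes it on validation data. A minimal sketch, assuming a sample is called positive when its score is at or above the threshold; the labels and scores are hypothetical.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):             # every observed score is a candidate
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:                        # keep the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical validation labels and positive-class probabilities
val_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
val_scores = [0.10, 0.20, 0.35, 0.60, 0.55, 0.70, 0.80, 0.90, 0.95]
threshold, j_stat = youden_threshold(val_labels, val_scores)
```

The selected threshold is then frozen and applied to the hold-out scores without re-tuning.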
3.2 Neural Network Architecture
The model is a fully connected MLP with a 10-D input. Hidden widths start at 8,192 ($2^{13}$) and halve sequentially to 32 ($2^{5}$) with ReLU activations. The output layer has 2 units (healthy, unhealthy) with softmax.
3.3 Optimization and Learning
Training uses Adam to optimize sparse categorical cross-entropy; we monitor validation accuracy and apply early stopping and checkpointing.
Rationale for a wide first hidden layer. We adopt a wide first layer (8,192) to ease optimization and generalization under explicit/implicit regularization; it also serves as a reusable feature bank and supports knowledge preservation (e.g., adapters/EWC). Capacity is controlled by stratified CV, early stopping, checkpointing, and weight penalties; the layer adds only 90,112 parameters [26–32].
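The 90,112 figure can be checked by counting weights and biases per dense layer. The sketch assumes that "halve sequentially" means every power of two from 8,192 down to 32 is a hidden layer (nine layers in total); the helper names are hypothetical.

```python
def hidden_widths(first=8192, last=32):
    """Widths that start at `first` and halve until reaching `last`."""
    widths = [first]
    while widths[-1] > last:
        widths.append(widths[-1] // 2)
    return widths

def dense_param_counts(n_inputs, widths, n_outputs):
    """Weights plus biases for each fully connected layer, in order."""
    sizes = [n_inputs] + widths + [n_outputs]
    return [fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:])]

widths = hidden_widths()                     # [8192, 4096, ..., 32]
counts = dense_param_counts(10, widths, 2)
first_layer_params = counts[0]               # 10 * 8192 + 8192
total_params = sum(counts)
```

Under that assumption the first layer contributes 90,112 parameters, while the 8,192-to-4,096 connection alone contributes 33,558,528, so the input layer is a small share of the total.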
3.4 Validation
We use stratified 5-fold cross-validation. Fold 1 starts from random weights; subsequent folds warm-start from the prior fold’s best model for speed only (no data leakage). Out-of-fold predictions yield ROC/AUC and operating-point metrics (sensitivity, specificity, PPV, NPV, accuracy, $F_1$). The operating threshold is selected on validation via Youden’s $J$ and held fixed on the hold-out; learning curves are inspected for overfitting. Reporting follows fair-evaluation guidance for ECG arrhythmia classification [37].
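Stratified fold assignment can be sketched without external libraries by shuffling each class separately and dealing its indices round-robin into the folds. The label vector below is a small hypothetical example, and the model-fitting step is left as a comment.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, preserving class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)          # deal indices round-robin
    return folds

# Hypothetical balanced label vector: 50 samples per class
labels = [0] * 50 + [1] * 50
folds = stratified_folds(labels)

for held_out in range(len(folds)):
    val_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    # Fit on train_idx, predict on val_idx, and collect the out-of-fold scores here.
```

Each sample appears in exactly one validation fold, so concatenating the per-fold predictions gives one out-of-fold score per sample for the ROC analysis.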
4. Results and Discussion
Last split (Fold 5), class-wise:
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 (Negative) | 0.990 | 0.990 | 0.990 | 155 |
1 (Positive) | 0.990 | 0.990 | 0.990 | 156 |
Fold 5 (summary):
Fold | Accuracy | Macro Avg (P/R/F1) | Weighted Avg (P/R/F1) | Support |
---|---|---|---|---|
5 | 0.99 | 0.99 / 0.99 / 0.99 | 0.99 / 0.99 / 0.99 | 311 |
To obtain an unbiased estimate, we evaluated a stratified, random hold-out of 100 ECGs never used for training or model selection: 50 anomalous and 50 non-anomalous.
Manual hold-out, class-wise (Youden’s operating point):
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 (Negative) | 0.939 | 0.920 | 0.929 | 50 |
1 (Positive) | 0.922 | 0.940 | 0.931 | 50 |
Manual hold-out (Youden’s threshold = 0.856):
Metric | Value |
---|---|
Accuracy | 0.930 |
Sensitivity (Recall) | 0.940 |
Specificity | 0.920 |
Precision (PPV) | 0.922 |
Negative Predictive Value | 0.939 |
F1-score | 0.931 |
Balanced Accuracy | 0.930 |
Confusion matrix (hold-out):
Actual class | Pred. Negative | Pred. Positive |
---|---|---|
Real Negative | 46 | 4 |
Real Positive | 3 | 47 |
As summarized above, precision was 0.939 for class 0 and 0.922 for class 1, with F1 ≈ 0.93 for both classes.
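The hold-out operating-point metrics follow directly from the four confusion-matrix counts. A short arithmetic check, using the counts reported above:

```python
tn, fp, fn, tp = 46, 4, 3, 47                # hold-out confusion-matrix counts

sensitivity = tp / (tp + fn)                 # recall for the positive class
specificity = tn / (tn + fp)                 # recall for the negative class
ppv = tp / (tp + fp)                         # precision for the positive class
npv = tn / (tn + fn)                         # precision for the negative class
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)
balanced_accuracy = (sensitivity + specificity) / 2

metrics = {
    "accuracy": round(accuracy, 3),
    "sensitivity": round(sensitivity, 3),
    "specificity": round(specificity, 3),
    "ppv": round(ppv, 3),
    "npv": round(npv, 3),
    "f1": round(f1, 3),
    "balanced_accuracy": round(balanced_accuracy, 3),
}
```

The rounded values agree with the hold-out metric table.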
ROC/AUC (hold-out, threshold chosen on validation via Youden’s $J$ and applied unchanged):
AUC = 0.954 (95% CI [0.902, 0.993]); Sensitivity = 0.94 (95% CI [0.923, 1.000]); Specificity = 0.92 (95% CI [0.809, 0.980]); PPV = 0.922; NPV = 0.939.
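The text does not state how the confidence intervals were obtained. One common option for AUC is a percentile bootstrap, sketched below under the assumption that resampling is done within each class so the 50/50 balance is preserved. The scores are synthetic and the function names are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling with replacement within each class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    stats = []
    for _ in range(n_boot):
        idx = rng.choices(pos, k=len(pos)) + rng.choices(neg, k=len(neg))
        stats.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic hold-out of 50 positives and 50 negatives
rng = random.Random(1)
labels = [1] * 50 + [0] * 50
scores = [rng.gauss(0.7, 0.2) for _ in range(50)] + [rng.gauss(0.4, 0.2) for _ in range(50)]
point = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_boot=300)
```

With only 100 records the resulting interval is typically wide, which is consistent with the hold-out-size limitation discussed later.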
5. Limitations and Future Work
Dataset vintage and heterogeneity. Two sources (MIT–BIH, PTB–Diagnostic) were collected decades ago; domain shift with modern devices is possible. The rHR projection may attenuate information present in raw ECG.
External validity and real-time evaluation. We report internal validation and a 100-record manual hold-out only—no external, multicenter, or streaming/real-time testing.
Prevalence mismatch and thresholding. Balanced datasets (1:1) aid training but do not reflect clinical prevalence; PPV/NPV and utility depend on prevalence and threshold. We used Youden’s $J$; other goals (screening vs. rule-in) may require different thresholds.
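The prevalence dependence can be made concrete with Bayes' rule, holding the hold-out sensitivity (0.94) and specificity (0.92) fixed. The 5% prevalence is a hypothetical value chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV at a given prevalence via Bayes' rule."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)

ppv_balanced, npv_balanced = predictive_values(0.94, 0.92, 0.50)
ppv_low, npv_low = predictive_values(0.94, 0.92, 0.05)   # hypothetical 5% prevalence
```

At 50% prevalence this reproduces the reported PPV and NPV (0.922 and 0.939). At the hypothetical 5% prevalence, PPV falls to roughly 0.38 while NPV rises above 0.99, so most positive calls would be false alarms even though sensitivity and specificity are unchanged.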
Population bias and subgroups. No stratification by age, sex, or comorbidities; residual biases may persist.
Calibration and interpretability. We assessed discrimination and operating-point metrics, but not probability calibration (e.g., reliability/Brier) or post-hoc explanations.
Manual hold-out size. The 50/50 hold-out is informative but small; CIs are wide and real-world performance may differ.
Future Work. External, multicenter prospective validation; on-device, real-time evaluation (latency/memory/energy); probability calibration and decision-curve analysis; subgroup/fairness reporting; domain adaptation and OOD detection; extension to additional cardiac conditions via transfer learning—aligned with standardization/feasibility guidance [37].
Disclaimer — This research prototype is not intended for clinical use or to guide diagnosis or treatment.
6. Conclusion
Future Applications
The ability to operate on a simplified signal enables integration with rPPG, oximeters, or wearable heart-rate monitors, and can slot into more complex systems seeking enhanced detection.
Our findings are consistent with broader patterns highlighted in recent surveys on ECG-DL and AI in ECG analysis [11, 34, 35].
Final Thoughts
This work delivers a pipeline for early cardiac-anomaly detection by simplifying cardiac signals and training deep models. A key contribution is WAPrFX, which projects ECG-derived sequences into a compact, heart-rate–based representation and extracts discriminative features. Unlike prior AI systems that rely on full ECG waveforms, our approach operates on simplified signals while retaining clinically relevant information.
Acknowledgment
We thank Cenfotec for the opportunity to carry out this project as part of Gino Marín’s graduation project for the Master’s degree in Applied Artificial Intelligence, and the AILab for support and resources.
References
- World Health Organization, “WHO reveals leading causes of death and disability worldwide: 2000–2019,” who.int, Dec. 9, 2020. Available: https://www.who.int/news/item/09-12-2020-who-reveals-leading-causes-of-death-and-disability-worldwide-2000-2019
- I. Vosko, A. Zirlik, and H. Bugger, “Impact of COVID-19 on cardiovascular disease,” Viruses, 15(2):508, 2023. doi:10.3390/v15020508
- A. L. Goldberger et al., “PhysioBank, PhysioToolkit, and PhysioNet…,” Circulation, 101(23):e215–e220, 2000. RRID: SCR_007345.
- R. Bousseljot, D. Kreiseler, A. Schnabel, “Nutzung der EKG-Signaldatenbank CARDIODAT…,” Biomed. Tech., 40(S1):317, 1995.
- G. B. Moody, R. G. Mark, “The impact of the MIT-BIH Arrhythmia Database,” IEEE Eng. Med. Biol. Mag., 20(3):45–50, 2001. PMID: 11446209.
- A. Schumann, K. J. Bär, “Autonomic aging—dataset…,” Sci. Data, 9:95, 2022. doi:10.1038/s41597-022-01202-y
7–13. (Deep-learning ECG reviews and applications) [7]–[13].
14–18. (Clinical context & recognition/response) [14]–[18].
- F. Pedregosa et al., “Scikit-learn: ML in Python,” JMLR, 12:2825–2830, 2011.
- M. Ali, “PyCaret…,” 2020. https://www.pycaret.org
- M. V. Højgaard et al., “Dynamics of spectral components of HRV…,” AJP Heart Circ. Physiol., 275(1):H213–H219, 1998.
- L. A. Fleisher et al., “Thermoregulation and HRV,” Clin. Sci., 90(2):97–103, 1996.
- S. M. Pincus, A. L. Goldberger, “Physiological time-series analysis…,” AJP Heart Circ. Physiol., 266(4 Pt 2):H1643–H1656, 1994.
24–25. (Optimizers) [24], [25].
26–32. (Generalization, double descent, NTK, transfer, EWC) [26]–[32].
- A. Debnath, S. Kim, “rPPG + DL review,” Biomed. Eng. OnLine, 24:60, 2025.
- M. R. Silva et al., “DL and ECG: systematic review,” Biomed. Eng. OnLine, 24:36, 2025.
- N. Bouassida et al., “AI/ML/DL in ECG analysis,” Comput. Methods Programs Biomed., 250:107848, 2024.
- P. Singh, A. Kumar, J. Lee, “DL for PPG data,” arXiv:2401.12783, 2024.
- T. T. Nguyen, L. X. Tran, H. T. Le, “Systematic review… fair evaluation & embedded feasibility,” arXiv:2503.07276, 2025.
- G. A. Marín León, “Database ETL Diagram,” Zenodo, Aug. 21, 2025. doi:10.5281/zenodo.16923294