646 - Deep Learning for Risk Stratification of Severe Intraventricular Hemorrhage in Extremely Preterm Infants at Birth: A Temporally Validated Study with Uncertainty Quantification in the Neocosur Network
Saturday, April 25, 2026
3:30pm - 5:45pm ET
Publication Number: 2630.646
Cecilia Cocucci, Hospital Universitario Austral, Buenos Aires, Buenos Aires, Argentina; Sebastian Camerlingo, Austral, Buenos Aires, Buenos Aires, Argentina; Gabriel Musante, Hospital Universitario Austral, Pilar, Buenos Aires, Argentina
Neonatologist - Assistant Professor Biostatistics Hospital Universitario Austral Buenos Aires, Buenos Aires, Argentina
Background: Severe intraventricular hemorrhage (SIVH; Papile grades III-IV) affects ~20% of extremely low gestational age neonates (ELGANs). Prophylactic indomethacin may benefit high-risk infants but carries significant toxicity, demanding precise individual risk stratification. Additionally, observed-to-expected (O/E) indices enable neonatal networks to monitor center-level performance for quality improvement initiatives. Existing logistic and machine learning models emphasize discrimination yet rarely report calibration, uncertainty, or validation, limiting their clinical applicability and transparency.
Objective: To develop and temporally validate a deep-learning (DL) model to predict SIVH in ELGANs, emphasizing calibration, uncertainty, and usability over discrimination, and to derive an O/E index for center benchmarking.
Design/Methods: In this retrospective multicenter cohort study, we predicted the composite outcome of SIVH or death ≤7 days in ELGANs born 2013-2024 in Neocosur Network centers (Argentina, Chile, Paraguay, Peru, Uruguay). Fourteen perinatal and postnatal variables available within 12 h of birth were used as predictors. After dimensionality reduction, five models (single- and multilayer neural networks, with Random Forest, XGBoost, and LASSO logistic regression as benchmarks) were trained/tested on the 2013-2022 subset with a 70/30 random split and temporally validated on the 2023-2024 subset (Fig 1). Performance was assessed through AUC, Brier score, calibration-in-the-large, calibration slope, expected calibration error, entropy, 90% conformal intervals, decision-curve analysis (DCA) for net benefit, and an O/E index calculated as observed cases divided by the sum of predicted probabilities per center.
Results: We analyzed 7,616 ELGANs; outcome prevalence was 39% in the train/test set and 36% in the validation set. The DL single-layer neural network outperformed other models, achieving comparable discrimination but superior calibration, lower entropy, and narrower conformal intervals. DCA showed net benefit across thresholds of 10-50%, reducing unnecessary prophylaxis while preserving benefit in high-risk infants. The O/E index revealed marked inter-center variability. Temporal validation confirmed robust model transportability (Figs 2-3).
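As an illustration of two quantities named in the Methods, the following minimal Python sketch computes a per-center O/E index (observed cases divided by the sum of predicted probabilities, as defined above) and the net benefit at a single decision threshold. The net-benefit expression is the standard decision-curve formula and is assumed here, since the abstract does not state which formula was used. The function names and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def oe_index_by_center(centers, outcomes, probs):
    """O/E per center: observed events / sum of predicted probabilities."""
    observed = defaultdict(int)
    expected = defaultdict(float)
    for center, y, p in zip(centers, outcomes, probs):
        observed[center] += y
        expected[center] += p
    # Centers with zero expected events are skipped to avoid division by zero.
    return {c: observed[c] / expected[c] for c in observed if expected[c] > 0}


def net_benefit(outcomes, probs, threshold):
    """Standard decision-curve net benefit (assumed formula):
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(outcomes)
    treated = [(y, p) for y, p in zip(outcomes, probs) if p >= threshold]
    tp = sum(1 for y, _ in treated if y == 1)
    fp = sum(1 for y, _ in treated if y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data for illustration only (not study data).
centers = ["A", "A", "A", "B", "B", "B"]
outcomes = [1, 0, 1, 0, 0, 1]
probs = [0.5, 0.25, 0.25, 0.5, 0.25, 0.75]

oe = oe_index_by_center(centers, outcomes, probs)
nb = net_benefit(outcomes, probs, threshold=0.5)
print(oe, nb)
```

An O/E above 1 means a center recorded more events than the model predicted, and a value below 1 means fewer. In practice the index would be reported with an interval (for example, on a funnel plot) because small centers produce unstable ratios.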
Conclusion(s): This temporally validated DL model delivered better-calibrated and more confident predictions than benchmark approaches, supporting individualized indomethacin decisions within 12 h of birth and enabling network-level benchmarking through O/E metrics. Emphasizing calibration, uncertainty quantification, and clinical utility bridges precision medicine and quality improvement in neonatal care.
Figure 1. Study flowchart. Predictor variables
Figure 2. Model Performance. Best Model Selection
Figure 3. Uncertainty Quantification, Clinical Utility, Temporal Validation of Best Model and O/E index funnel plot by center