Session: Neo-Perinatal Health Care Delivery: Epidemiology/Health Services Research 1
77 - Unsupervised, Population-Based Phenotyping of Preterm Infants to Identify Risk Profiles for Mortality and Severe Preterm Morbidities
Sunday, April 26, 2026
9:30am - 11:30am ET
Publication Number: 3073.77
Kee Thai Yeo, KK Women's & Children's Hospital, Singapore, N/A, Singapore; Kok Joo Chan, University Malaya Medical Centre, KUALA LUMPUR, Kuala Lumpur, Malaysia; Sithum Munasinghe, Western Sydney University, Penrith, New South Wales, Australia; Mithilesh Dronavalli, Western Sydney University, Sydney, New South Wales, Australia; Meredith C. Ward, University of New South Wales, Sydney, New South Wales, Australia; Josef Neu, University of Florida, Gainesville, FL, United States; Ju Lee Oei, University of Queensland, Brisbane, Queensland, Australia
Senior Consultant, KK Women's & Children's Hospital, Singapore, Singapore
Background: Early life outcomes of preterm infants are shaped by individual clinical trajectories, which are influenced by diverse clinical exposures and etiological factors. Identifying distinct preterm phenotypes, each characterized by a specific combination of clinical features, can provide opportunities to understand risk factors for short- and long-term preterm outcomes.
Objective: To discern phenotypes of preterm infants that are associated with death and major morbidities before NICU discharge.
Design/Methods: Data from 19,352 preterm infants (<37 weeks gestation) born in New South Wales, Australia from 2007-2016 were analysed. A total of 84 demographic and clinical features underwent unsupervised clustering by the k-prototypes algorithm. Cluster mapping and visualization were performed using t-distributed stochastic neighbour embedding (t-SNE). The incidence of death before discharge and of major morbidities was compared between clusters after clustering. A random forest model was applied for phenotypic interpretation of the machine-generated clusters, and the relative importance of features within each cluster was quantified using the Gini impurity index.
Results: Six distinct clusters were identified with defined phenotypes (Table 1). Figure 1a illustrates the distribution of the major clusters, with the gestational age of the clusters decreasing from bottom left to top right (from Cluster 2, median gestational age 36 weeks, to Cluster 3, median 25 weeks). The top 10 features that provided the clearest separation between the six clusters are shown in Figure 1b. A total of 25 different features were used by the models to distinguish between the clusters. Of note, Cluster 2 consisted of infants with hypoxic-ischemic encephalopathy who received cooling, with a high mortality rate (11.3%).
Of the two extremely preterm clusters (3 and 5), infants in Cluster 3 were differentiated by respiratory support (high-frequency oscillatory ventilation (HFOV), ventilator support and oxygen) and number of blood transfusions, and had the highest rates of pre-discharge mortality (12.3%) and severe preterm morbidities (severe IVH, severe ROP, BPD, NEC) (Figure 2). Of the two clusters (1 and 6) of infants with median gestation of 31 and 32 weeks, those in Cluster 6 had distinctly lower Apgar scores at one and five minutes after birth and a higher pre-discharge death rate, but a comparable incidence of severe morbidities.
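The k-prototypes step named in the Design/Methods can be illustrated with a minimal sketch of its mixed-type dissimilarity, which adds the squared Euclidean distance over numeric features to a weighted count of mismatches over categorical features. The feature values, the weight `gamma`, and the function names below are hypothetical and are not taken from the study; this shows one common formulation of the distance, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    num_a: Sequence[float],
    cat_a: Sequence[str],
    num_b: Sequence[float],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """k-prototypes style distance between two records (hypothetical helper)."""
    # Numeric part: squared Euclidean distance.
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    # Categorical part: simple matching, i.e. the number of mismatched categories.
    categorical = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    # gamma balances the categorical part against the numeric part.
    return numeric + gamma * categorical


def assign_to_prototype(
    num: Sequence[float],
    cat: Sequence[str],
    prototypes: List[Record],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype closest to the record."""
    distances = [
        mixed_dissimilarity(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


# Made-up prototypes: (standardized gestational age, standardized birth weight)
# and (HFOV received, cooling received). These are NOT the study's clusters.
prototypes: List[Record] = [
    ((-1.5, -1.4), ("yes", "no")),
    ((1.0, 0.9), ("no", "no")),
]
infant_num = (-1.2, -1.0)
infant_cat = ("yes", "no")
cluster_index = assign_to_prototype(infant_num, infant_cat, prototypes, gamma=0.5)
```

In a full k-prototypes run, assignment alternates with updating each prototype (the mean of numeric features and the mode of categorical features) until the assignments stop changing.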
Conclusion(s): Unsupervised clustering of large administrative and clinical datasets allows identification of specific preterm phenotypes that are associated with distinctly different outcomes.
Table 1. Demographic and clinical characteristics stratified by designated clusters (arranged by gestational maturity)
Figure 1. (a) Cluster visualization of 19,352 preterm infants using t-distributed stochastic neighbour embedding (t-SNE) and (b) the top 10 prioritized features per cluster according to the Gini impurity index
Figure 2. Major pre-discharge outcomes of preterm infants by designated clusters
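The Gini impurity index used to rank features in Figure 1b can be illustrated with a small worked sketch. The impurity of a set of cluster labels is 1 minus the sum of squared class proportions, and a feature is credited with the weighted impurity decrease produced by the splits made on it. The labels, the split, and the function names below are invented for illustration; the abstract does not report the random forest settings, so this is an assumption-laden example rather than the study's procedure.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum over classes of (class proportion squared)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def impurity_decrease(
    parent: Sequence[Hashable],
    left: Sequence[Hashable],
    right: Sequence[Hashable],
) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Made-up cluster labels for eight infants, split on a hypothetical
# binary feature (for example, whether HFOV was received).
parent_labels = [3, 3, 3, 3, 5, 5, 5, 5]
left_labels = [3, 3, 3, 5]   # feature present
right_labels = [3, 5, 5, 5]  # feature absent

parent_gini = gini_impurity(parent_labels)
decrease = impurity_decrease(parent_labels, left_labels, right_labels)
```

A random forest typically averages such decreases over all splits on a feature and over all trees, so features that separate the clusters more cleanly receive higher importance.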