Neonatal Fetal Nutrition & Metabolism

Session: Neonatal Fetal Nutrition & Metabolism 3

321 - Neonatal Gut Microbiota Stratification and Identification of SCFA-Associated Microbial Subgroups Using Unsupervised Clustering and Machine Learning Classification

Sunday, April 26, 2026

9:30am - 11:30am ET

Publication Number: 3310.321

Kee Hyun Cho, Kangwon National University, Chuncheon, Kangwon-do, Republic of Korea; Payam Hosseinzadeh Kasani, Kangwon National University, Chuncheon, Kangwon-do, Republic of Korea

Poster Presenting Author(s)

Kee Hyun Cho, MD PhD (she/her/hers)

Assistant Professor of Pediatrics
Kangwon National University
Chuncheon, Kangwon-do, Republic of Korea

Background: The neonatal gut microbiome plays a crucial role in early-life health through the production of short-chain fatty acids (SCFAs), yet the structure and metabolic organization of SCFA-producing communities in newborns remain poorly characterized because of high interindividual variability.

Objective: To use unsupervised clustering and machine learning approaches to classify neonatal microbial subgroups linked to SCFA production and to characterize their composition and function.

Design/Methods: This study recruited 71 mother-infant pairs from Kangwon National University Hospital and Bundang CHA Hospital and collected meconium samples within five days postpartum. Microbial diversity was analyzed at the genus level by 16S rRNA gene sequencing (V3-V4 region), and SCFA concentrations were measured in neonatal stool samples. To identify functionally distinct microbial subgroups, K-Means, Agglomerative, Spectral, and Gaussian Mixture Model clustering were applied. Clustering validity was assessed using the Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index, and Prediction Strength Validation, with t-distributed Stochastic Neighbor Embedding (t-SNE) visualization used to evaluate cluster separation. SCFA distributions were compared across clusters, and random forest and logistic regression models were used to classify SCFA-associated microbial clusters, with performance evaluated using receiver operating characteristic (ROC) curves.
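As an illustration of one of the validity indices named above, the following is a minimal sketch of the Silhouette Score using only the Python standard library. It is not the authors' pipeline: the function name `silhouette_score`, the Euclidean distance, and the two-dimensional `toy_profiles` (standing in for genus-level abundance vectors) are all assumptions made for the example.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance from p to the other members of its own cluster
        # (dist(p, p) == 0, so summing over the whole cluster is harmless)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data: two well-separated groups of 2-D "abundance" profiles.
toy_profiles = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
                (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
good_labels = [0, 0, 0, 1, 1, 1]    # labels that match the true grouping
mixed_labels = [0, 1, 0, 1, 0, 1]   # labels that cut across the grouping

good = silhouette_score(toy_profiles, good_labels)
mixed = silhouette_score(toy_profiles, mixed_labels)
print(f"well-separated labels: {good:.3f}, mixed labels: {mixed:.3f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate overlapping or misassigned samples, which is why the index can be used to compare the partitions produced by different clustering algorithms.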

Results: The clustering analysis identified distinct microbial subgroups linked to SCFA production, with Agglomerative clustering outperforming K-Means in capturing functionally relevant structures. Cluster 1 had higher SCFA levels and was enriched in Bacteroides, Prevotella, and Enterococcus, while Cluster 2 exhibited lower SCFA concentrations with a more heterogeneous composition. The introduction of a third cluster in multi-class analysis revealed an intermediate metabolic profile, suggesting a continuum in microbial metabolic function. Classification analysis confirmed the superiority of the random forest (RF) model, which achieved receiver operating characteristic (ROC) curve scores of 91.05% (Agglomerative) and 87.74% (K-Means) in binary classification and 92.98% (Agglomerative) and 89.84% (K-Means) in multi-class classification, demonstrating RF’s strong predictive ability for SCFA-based clusters.
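The abstract does not state how the ROC curve score was computed. Assuming it refers to the area under the ROC curve (AUC) for a binary task, the sketch below shows one standard way to obtain that number: the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as half. The function name `roc_auc`, the labels, and the scores are hypothetical and are not the study's data.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count pairwise wins for the positive class; a tie contributes 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = high-SCFA cluster, 0 = low-SCFA cluster.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # made-up classifier probabilities

auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.4f}")
```

Under this reading, a score of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two clusters. A multi-class score would need an additional averaging scheme, such as one-vs-rest, which the abstract does not specify.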

Conclusion(s): Unsupervised clustering combined with classification analysis effectively predicts SCFA-associated subgroups, paving the way for future research on longitudinal tracking and functional genomic integration in early-life metabolic health.