321 - Neonatal Gut Microbiota Stratification and Identification of SCFA-Associated Microbial Subgroups Using Unsupervised Clustering and Machine Learning Classification
Sunday, April 26, 2026
9:30am - 11:30am ET
Publication Number: 3310.321
Kee Hyun Cho, Kangwon National University, Chuncheon, Kangwon-do, Republic of Korea; Payam Hosseinzadeh Kasani, Kangwon National University, Chuncheon, Kangwon-do, Republic of Korea
Assistant Professor of Pediatrics Kangwon National University Chuncheon, Kangwon-do, Republic of Korea
Background: The neonatal gut microbiome plays a crucial role in early-life health through the production of short-chain fatty acids (SCFAs), yet the structure and metabolic organization of SCFA-producing communities in newborns remain poorly characterized due to high interindividual variability.
Objective: To use unsupervised clustering and machine learning approaches to classify neonatal microbial subgroups linked to SCFA production and reveal their compositional and functional characteristics.
Design/Methods: This study recruited 71 mother-infant pairs from Kangwon National University Hospital and Bundang CHA Hospital, collecting meconium samples within five days postpartum. Microbial diversity was analyzed at the genus level through 16S rRNA gene sequencing (V3-V4 region), alongside SCFA concentration measurements in neonatal stool samples. To identify functionally distinct microbial subgroups, K-Means, Agglomerative, Spectral, and Gaussian Mixture Model clustering were applied. Clustering validity was assessed using Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index, and Prediction Strength Validation, with t-distributed Stochastic Neighbor Embedding (t-SNE) visualization to evaluate cluster separation. SCFA distributions were compared across clusters, and random forest (RF) and logistic regression models were used to classify SCFA-associated microbial clusters, with performance evaluated through receiver operating characteristic (ROC) curves.
Results: The clustering analysis identified distinct microbial subgroups linked to SCFA production, with Agglomerative clustering outperforming K-Means in capturing functionally relevant structures. Cluster 1 had higher SCFA levels and was enriched in Bacteroides, Prevotella, and Enterococcus, while Cluster 2 exhibited lower SCFA concentrations with a more heterogeneous composition. The introduction of a third cluster in multi-class analysis revealed an intermediate metabolic profile, suggesting a continuum in microbial metabolic function.
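As an illustration of how a cluster-validity index such as the Silhouette Score separates a well-structured partition from a poor one, the following is a minimal sketch in plain Python. It is not the study's pipeline: the six two-dimensional points are hypothetical relative abundances for two genera, invented for this example, and Euclidean distance is an assumption (the abstract does not state the distance metric used).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples.

    For sample i, a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    and s(i) = (b - a) / max(a, b).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: s(i) is taken as 0 by convention
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data (NOT from the study): two genus abundances per sample.
points = [
    (0.60, 0.10), (0.55, 0.15), (0.65, 0.12),  # samples dominated by genus A
    (0.10, 0.50), (0.15, 0.55), (0.12, 0.45),  # samples dominated by genus B
]
labels_separated = [0, 0, 0, 1, 1, 1]  # partition that matches the structure
labels_mixed = [0, 1, 0, 1, 0, 1]      # partition that ignores the structure

score_separated = silhouette_score(points, labels_separated)
score_mixed = silhouette_score(points, labels_mixed)
print(f"separated: {score_separated:.3f}, mixed: {score_mixed:.3f}")
```

A higher value indicates tighter, better-separated clusters, which is the sense in which competing partitions of the same samples can be ranked against each other.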
Classification analysis showed the random forest (RF) model performing best, with receiver operating characteristic (ROC) curve scores of 91.05% (Agglomerative) and 87.74% (K-Means) in binary classification, and 92.98% (Agglomerative) and 89.84% (K-Means) in multi-class classification, demonstrating RF’s strong predictive ability for SCFA-based clusters.
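The abstract does not state how the ROC curve scores were computed. Assuming they are areas under the ROC curve (AUC), the sketch below shows one standard way to obtain such a value from classifier scores: the AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. The multi-class helper uses macro-averaged one-vs-rest AUC, which is one common convention and an assumption here. All labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Binary AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, score_rows, classes):
    """Macro-averaged one-vs-rest AUC; score_rows[i][k] scores class classes[k]."""
    aucs = []
    for k, c in enumerate(classes):
        binary_truth = [1 if y == c else 0 for y in y_true]
        class_scores = [row[k] for row in score_rows]
        aucs.append(roc_auc(binary_truth, class_scores))
    return sum(aucs) / len(aucs)


# Hypothetical binary example: cluster membership (1 = high-SCFA cluster)
# and a classifier's predicted probability for that cluster.
y_binary = [1, 1, 1, 0, 0, 0]
p_binary = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc_binary = roc_auc(y_binary, p_binary)  # 8 of 9 pairs ranked correctly

# Hypothetical three-cluster example with per-class predicted probabilities.
y_multi = [0, 0, 1, 1, 2, 2]
p_multi = [
    (0.7, 0.2, 0.1),
    (0.5, 0.3, 0.2),
    (0.2, 0.6, 0.2),
    (0.4, 0.4, 0.2),
    (0.1, 0.2, 0.7),
    (0.2, 0.5, 0.3),
]
auc_multi = macro_ovr_auc(y_multi, p_multi, classes=[0, 1, 2])
print(f"binary AUC: {auc_binary:.4f}, macro OvR AUC: {auc_multi:.4f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so under this assumption the reported percentages would read as AUC values multiplied by 100.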
Conclusion(s): Unsupervised clustering combined with classification analysis effectively predicted SCFA-associated microbial subgroups, paving the way for future research on longitudinal tracking and functional genomic integration in early-life metabolic health.