Session: Health Services Research Trainee Ongoing Projects
TOP 46 - Predictive Modeling using Machine Learning for In-Hospital Length of Stay Among Preterm Infants in the United States
Monday, April 27, 2026
8:00am - 10:00am ET
Publication Number: 4749.TOP 46
Parth Bhatt, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Tarang Parekh, University of Delaware, Newark, DE, United States; Matthew Molloy, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Fredrick Dapaah-Siakwan, Valley Children's Healthcare, Madera, CA, United States; Vivek Narendran, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Judith Dexheimer, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
Fellow, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, United States
Background: Preterm birth rates are rising in the US, increasing 12% between 2014 and 2022. With increased survival, studying Neonatal Intensive Care Unit (NICU) Length of Stay (LOS) is vital. Prolonged NICU LOS (pLOS) increases morbidity and readmissions and adds substantial healthcare costs, making accurate prediction necessary for quality improvement and resource allocation. Traditional prediction methods are limited by bias, limited geographic representation, or poor generalizability from single-center data. The utility of Machine Learning (ML) to predict LOS across the full spectrum of preterm neonates using a large, national dataset remains unexplored.
Objective: To compare the performance of multiple ML models (including Logistic Regression, Random Forest, Naive Bayes, and XGBoost) in predicting pLOS (LOS > mean/median) among preterm neonates in the US. Secondarily, we aim to identify key drivers of pLOS across major Gestational Age (GA) categories: < 28 weeks, 28-32 weeks, and 33-36 weeks.
Design/Methods: This is a retrospective cohort study using the 2019 Kids' Inpatient Database (KID). We will include all liveborn hospitalizations with GA < 37 weeks after excluding transfers and deaths. Clinically relevant diagnoses and procedures will be identified using ICD-10-CM/PCS diagnosis and procedure codes. pLOS will be defined as NICU LOS greater than the mean or median, depending on the data distribution. Baseline characteristics will be reported by GA category, and chi-square and t-tests will be used to assess differences. Data will be split into 80% training and 20% testing sets. ML models will undergo 10-fold cross-validation and, if needed, the Synthetic Minority Over-sampling Technique (SMOTE) to address outcome imbalance. Model performance will be reported using accuracy, Area Under the Curve (AUC), precision, and recall. Key drivers of LOS will be determined using feature importance from the best-performing model (feature weight > 10%). Subgroup analyses will be performed for all three GA categories.
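The evaluation steps described in Design/Methods can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' pipeline. It assumes the median is chosen as the pLOS cutoff, that feature importances are normalized to sum to 1 before the 10% threshold is applied, and that model scores are predicted probabilities thresholded at 0.5. All function names and feature names are hypothetical, and the models themselves and SMOTE are left out.

```python
import random
from statistics import median


def label_plos(los_days):
    """Label a stay as prolonged (1) when LOS exceeds the cohort median (assumed cutoff)."""
    cutoff = median(los_days)
    return [1 if d > cutoff else 0 for d in los_days], cutoff


def train_test_split_idx(n, test_frac=0.2, seed=0):
    """Shuffle row indices and return (train, test) lists for an 80/20 split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def kfold_indices(idx, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def classification_metrics(y_true, y_score, threshold=0.5):
    """Compute accuracy, precision, recall, and AUC from predicted probabilities."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # AUC as the probability that a random positive outranks a random negative
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        "auc": wins / (len(pos) * len(neg)) if pos and neg else float("nan"),
    }


def key_drivers(importances, threshold=0.10):
    """Keep features whose normalized importance exceeds the threshold (assumed 10%)."""
    total = sum(importances.values())
    return {f: w / total for f, w in importances.items() if w / total > threshold}
```

In this sketch the cutoff is computed on whatever list is passed in. In practice it would likely be derived from the training split only, so that the test set does not influence the outcome definition.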