657 - Comparison of Cause of Death Determination by Large Language Models to Expert Panel Determinations in 7 Low- and Middle-Income Countries
Sunday, April 26, 2026
9:30am - 11:30am ET
Publication Number: 3636.657
Chris A. Rees, Emory University School of Medicine and Children's Healthcare of Atlanta, Atlanta, GA, United States; Yuting Guo, Emory University School of Medicine, Atlanta, GA, United States; Soter Ameh, The Africa Research Collaborative (The ARC), Child Health and Mortality Prevention Surveillance, Freetown, Western Area, Sierra Leone; Rajib Biswas, International Centre for Diarrhoeal Disease Research, Bangladesh, Mohakhali, Dhaka, Bangladesh; Siobhan L. Johnston, Wits VIDA, Johannesburg, Gauteng, South Africa; Lucy Liu, The Task Force for Global Health, Decatur, GA, United States; Abeed Sarker, Emory University, Atlanta, GA, United States; Lola Madrid, Haramaya University, Harar, Hareri Hizb, Ethiopia; Cynthia G. Whitney, The Task Force for Global Health, Atlanta, GA, United States; Ziyaad Dangor, University of the Witwatersrand, Johannesburg, Gauteng, South Africa
Assistant Professor, Emory University School of Medicine, Atlanta, Georgia, United States
Background: Fewer than 4% of deaths in sub-Saharan Africa and South Asia have definitive causes of death determined, largely because postmortem testing and accurate determination of cause of death are infrequent. Thus, public health interventions targeting mortality prevention are often informed by low-quality data.
Objective: To determine the accuracy of open-source large language models in determining cause of death compared with expert panel consensus-determined causes of death.
Design/Methods: We conducted a cross-sectional analysis using data collected prospectively in a 7-country childhood mortality surveillance program (Child Health and Mortality Prevention Surveillance [CHAMPS]). CHAMPS conducts active surveillance in healthcare facilities and community settings at 6 sites in sub-Saharan Africa and 1 in Bangladesh. CHAMPS collects clinical, verbal autopsy, and postmortem histopathologic and microbiologic testing data for enrolled cases of death among children aged <5 years. These comprehensive data are then reviewed by Determination of Cause of Death (DeCoDe) panels at each site to identify causes of death for each case. For this analysis, we used the 10 most common DeCoDe-determined causes of death among infants and children aged 1-59 months as the reference standard at two levels. First, we used diagnostic categories (e.g., sepsis). Second, we used specific ICD-10-level diagnostic codes (e.g., sepsis due to Klebsiella pneumoniae). We assessed the performance of several open-source large language models (e.g., RoBERTa-base, GPT-OSS, Llama 3, and Gemma) and a machine learning algorithm (i.e., Support Vector Machine [SVM]) by training on a randomly selected 80% of cases and testing on the remaining 20%. We assessed the performance of each model by reporting mean classification accuracy.
Results: There were 1,672 cases included. Of these, 54.4% were male and the median age at the time of death was 338 days (interquartile range 140, 713 days; Table 1). For broad diagnostic categories, the mean accuracy of the models ranged from 0.63 to 0.78. However, for specific ICD-10 codes, the models achieved a much higher mean accuracy of 0.98 (Table 2). All assessed large language models performed suboptimally in identifying lower respiratory tract infections as the broad cause of death. Conversely, all large language models were highly accurate across ICD-10 causes of death.
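The evaluation protocol described under Design/Methods (a random 80%/20% split of cases, then classification accuracy against the DeCoDe reference standard) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the fixed random seed, the per-cause breakdown, and the toy labels are all assumptions made for illustration, and a majority-class baseline stands in for the actual models.

```python
import random
from collections import Counter


def split_cases(cases, test_fraction=0.2, seed=0):
    """Randomly split cases into (train, test); seed is an illustrative assumption."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(reference, predicted):
    """Fraction of cases whose predicted cause matches the reference cause."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one case is required")
    correct = sum(r == p for r, p in zip(reference, predicted))
    return correct / len(reference)


def per_cause_accuracy(reference, predicted):
    """Accuracy within each reference cause (hypothetical breakdown, e.g. to
    spot a category such as lower respiratory tract infection that is
    identified less often than others)."""
    totals = Counter(reference)
    hits = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {cause: hits[cause] / totals[cause] for cause in totals}


if __name__ == "__main__":
    # Toy (case id, reference cause) pairs; not CHAMPS data.
    cases = [(f"case-{i}", "sepsis" if i % 3 else "malaria") for i in range(10)]
    train, test = split_cases(cases)

    # Stand-in "model": always predict the most common training label.
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    reference = [label for _, label in test]
    predicted = [majority] * len(test)

    print("test accuracy:", accuracy(reference, predicted))
    print("per-cause accuracy:", per_cause_accuracy(reference, predicted))
```

With the 1,672 included cases, a split of this kind would hold out 334 cases for testing. Averaging `accuracy` over several seeds is one possible reading of "mean classification accuracy"; the abstract does not specify how the mean was taken.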
Conclusion(s): Compared to expert panel determinations, large language models achieved mean accuracy of 0.63 to 0.78 for broad diagnostic categories and 0.98 for specific ICD-10 codes in identifying cause of death when given extensive clinical, verbal autopsy, and histopathologic data.
Table 1. Demographics of included cases of infants and children aged 1-59 months at the time of death
Table 2. Accuracy of large language models to determine cause of death among infants and children aged 1-59 months at the time of death