236 - External Validation of a Natural Language Screener for High-Risk Injuries in Infant Emergency Department Encounters

Saturday, April 25, 2026

3:30pm - 5:45pm ET

Publication Number: 2227.236

Gunjan Tiyyagura, Yale School of Medicine, Cheshire, CT, United States; Rachel Berger, JAG Intelligence, Pittsburgh, PA, United States; Toan Ong, University of Colorado, Aurora, CO, United States; Mengli Xiao, University of Colorado Anschutz Medical Campus, Aurora, CO, United States; Foster R. Goss, University of Colorado School of Medicine, Aurora, CO, United States; Daniel M. Lindberg, University of Colorado Anschutz Medical Campus, Denver, CO, United States

Poster Presenting Author(s)

Gunjan Tiyyagura, MD, MHS (she/her/hers)

Associate Professor of Pediatrics and Emergency Medicine
Yale School of Medicine
Cheshire, Connecticut, United States

Background: Minor injuries in infants may be opportunities for early physical abuse recognition, but abuse evaluations are often omitted. Automated screening using natural language processing (NLP) of emergency department (ED) notes within the electronic health record accurately identified infants with high-risk injuries in one health system (1 pediatric ED, 8 general EDs).

Objective: To externally validate a previously developed NLP screener in a different state and health system.

Design/Methods: We conducted a retrospective, cross-sectional study in 24 general EDs. The NLP screener analyzed clinician documentation for infants (<12 months old) who presented for care between January 2021 and December 2023. Ground truth was determined by manual chart review. High-risk injuries were defined a priori as: any bruise or oral injury in an infant <5 months old; or, in any infant <12 months old, a fracture, intracranial injury, bruising or injury to the torso, ears, neck, frena, angle of the jaw, cheek, or eyelid, or subconjunctival hemorrhage (TEN-FACES regions), patterned bruising, burns, or abdominal injury. To identify potential false negatives (high-risk injuries not identified by the NLP screener), manual chart review was conducted for 838 infant encounters that had a skeletal survey or carried diagnosis codes (e.g., rib fractures, bruising) or consultation codes (e.g., trauma surgery, child protection) often associated with abuse, and for 431 other, randomly selected infant encounters.
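The a priori definition above can be read as a simple age-dependent rule. The following is a minimal sketch of that rule only, not the NLP screener itself; the function name and the finding labels (for example "ten-faces injury") are hypothetical and assume findings have already been abstracted from the chart.

```python
# Hypothetical finding labels; the abstract does not specify a coding scheme.
HIGH_RISK_UNDER_12_MONTHS = {
    "fracture",
    "intracranial injury",
    "ten-faces injury",   # bruising/injury in a TEN-FACES region
    "patterned bruising",
    "burn",
    "abdominal injury",
}
HIGH_RISK_UNDER_5_MONTHS = {"bruise", "oral injury"}


def is_high_risk(age_months, findings):
    """Apply the abstract's a priori high-risk injury definition."""
    found = set(findings)
    if age_months >= 12:
        return False  # the study population is infants <12 months old
    if found & HIGH_RISK_UNDER_12_MONTHS:
        return True
    # Any bruise or oral injury counts only in infants <5 months old.
    return age_months < 5 and bool(found & HIGH_RISK_UNDER_5_MONTHS)


print(is_high_risk(3, {"bruise"}))  # bruise at 3 months
print(is_high_risk(8, {"bruise"}))  # bruise at 8 months, no other finding
```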

Results: Among 18,344 ED encounters, 234 (1.3%) had a true high-risk injury. The NLP screener flagged 278 encounters, of which 209 had a true high-risk injury. The NLP screener had the following test characteristics: Sensitivity/recall – 89% (95% CI 85-93%), Specificity – 93% (95% CI 92-94%), Precision/PPV – 75% (95% CI 70-80%), F1 score – 0.81 (95% CI 0.76-0.85). Among infants with high-risk injuries, 36% were evaluated for physical abuse by the clinical team. The NLP screener would have identified 88% of the children with high-risk injuries who were not evaluated for abuse. In this setting, the NLP screener had lower sensitivity but higher specificity than at the institution where it was developed and refined.
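As a worked check, sensitivity and PPV can be recomputed from the counts reported above (209 true positives among 278 flagged encounters and 234 true high-risk injuries). This is a sketch under stated assumptions: the abstract does not say which confidence interval method was used, so the Wilson score interval here is an assumption, and values recomputed from raw counts may differ in the last digit from those reported. Specificity is omitted because the abstract does not state the number of verified negatives used as its denominator.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, ~95% for z=1.96)."""
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Counts taken from the Results text.
true_high_risk = 234
flagged = 278
true_positives = 209

false_positives = flagged - true_positives         # flagged without a high-risk injury
false_negatives = true_high_risk - true_positives  # high-risk injuries not flagged

sensitivity = true_positives / true_high_risk
ppv = true_positives / flagged
f1 = 2 * true_positives / (2 * true_positives + false_positives + false_negatives)

sens_ci = wilson_interval(true_positives, true_high_risk)
ppv_ci = wilson_interval(true_positives, flagged)

print(f"Sensitivity: {sensitivity:.3f}, CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}")
print(f"PPV:         {ppv:.3f}, CI {ppv_ci[0]:.3f}-{ppv_ci[1]:.3f}")
print(f"F1:          {f1:.3f}")
```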

Conclusion(s): This NLP screener identified infants with high-risk injuries with high accuracy outside the development setting. The next step is to use child welfare data to determine whether children with high-risk injuries who were flagged by the NLP screener, but not evaluated for abuse by the clinical team, were at risk for subsequent abuse.