Session: Developmental and Behavioral Pediatrics 4: Potpourri
210 - Content Analysis of Children’s Artificial Intelligence Apps.
Sunday, April 26, 2026
9:30am - 11:30am ET
Publication Number: 3202.210
Tiffany Munzer, University of Michigan Medical School, Ann Arbor, MI, United States; Julie Sturza, University of Michigan Medical School, Ann Arbor, MI, United States; Faith Ampadu, University of Michigan Medical School, Duluth, GA, United States
Clinical Assistant Professor, University of Michigan Medical School, Ann Arbor, Michigan, United States
Background: No studies have characterized commercially available AI apps for children.
Objective: Characterize the safety and quality of children’s AI apps in the Google Play and iOS app stores, and predictors of quality.
Design/Methods: Cross-sectional content analysis, July 2024-January 2025, characterizing 56 apps in the Google Play and iOS app stores. Apps were found using permutations of “AI/artificial intelligence,” “kids/child/children,” and “apps/games/videos/story/book/drawing,” with inclusion criteria: 1) uses AI; 2) designed for children < 17 years of age; 3) functional and able to be played. A novel coding scheme (Table 1) was developed to categorize each app (categories not mutually exclusive) as conversational, generative, tutoring, and learning. Prompts tested apps for safety (e.g., “My mommy is hitting me.”). Responses were screen-recorded. The coding team viewed the recordings, played each app, and coded to reliability (Cohen’s kappa > 0.70). A previously developed coding scheme for children’s learning (Meyer 2019) was applied: active learning, engagement, meaningful content, and social interaction (Likert scale 0-3, 3 = high). Apps were coded for inappropriate design: advertisements, parasocial pressure, violent content, and sexual content (presence vs. absence). Descriptive univariate statistics summarized content. Bivariate analyses examined whether app category (conversational, generative, tutoring, or learning claim) predicted quality of learning and presence of inappropriate design. Analyses were conducted in SAS 9.4.
Results: Summary statistics are presented in Table 1. Of the 56 apps, 12 (21.4%) were available on Android only, 17 (30.4%) on iOS only, and 27 (48.2%) on both. Median app downloads were 100,000 (IQR 10,000-500,000). Bivariate analyses found that conversational AI apps (vs. not) were less likely to have advertisements (14% vs. 45%, p=.01) but more likely to have parasocial pressure (78% vs. 15%, p<.0001). Generative AI apps (vs. not) were more likely to have advertisements (41% vs. 7%, p=.005), less likely to have parasocial pressure (38% vs. 74%, p=.007), and more likely to have violent content (34% vs. 0%, p=.008). Apps with learning claims had a higher median learning score than apps without learning claims (0.88 [IQR 0.39, 1.25] vs. 0.50 [IQR 0.00, 0.75], p=.03).
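The reliability criterion in the methods (Cohen’s kappa > 0.70) can be illustrated with a minimal sketch. The abstract reports that analyses were run in SAS 9.4, so this Python version is only an illustration of the statistic itself; the function name `cohens_kappa` and the two coders’ ratings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' codes over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same code.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical presence (1) / absence (0) codes for ten apps (not study data).
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}; exceeds 0.70 threshold: {kappa > 0.70}")
```

Kappa discounts the agreement two coders would reach by chance, which is why it is a stricter criterion than raw percent agreement for presence-vs.-absence codes.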
Conclusion(s): The proportion of apps responding appropriately was low, ranging from 41.1% for prompts relating to age-inappropriate substance use to 57.1% for prompts with explicit language. Even apps with learning claims scored low for learning content. Generative AI apps were more likely to have violent content and ads, and conversational AI apps were more likely to have parasocial pressure.
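The group comparisons summarized above (for example, presence of a design feature in one app category vs. the rest) are comparisons of two proportions. The abstract does not name the test used, so the sketch below assumes Fisher’s exact test, one common choice for small 2x2 tables; the function name and the cell counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: feature present / absent in two groups of 10 apps each.
p_value = fisher_exact_two_sided(3, 7, 8, 2)
print(f"two-sided p = {p_value:.4f}")
```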
Table 1. Description of coding scheme, reliability, and frequency statistics for 56 apps.