387 - Decoding the Human Experience: Feasibility and Ethical Considerations of Using Artificial Intelligence to Augment Qualitative Research
Monday, April 27, 2026
8:00am - 10:00am ET
Publication Number: 4379.387
Habeebah Muhammad-Kamal, Harvard Medical School, London, England, United Kingdom; Dinesh Rai, Boston Children's Hospital, Westfield, NJ, United States; Anne Sullivan, Boston Children's Hospital, Boston, MA, United States; Donna Luff, Boston Children's Hospital, Boston, MA, United States; Christy L. Cummings, Boston Children's Hospital, Boston, MA, United States; David N. Williams, Boston Children's Hospital, Boston, MA, United States
Associate Professor of Pediatrics Boston Children's Hospital Boston, Massachusetts, United States
Background: Qualitative research in medicine is essential for understanding patient experiences that quantitative data alone cannot capture. Qualitative analysis, especially with large data sets, can be complex and time-consuming. Artificial intelligence (AI), specifically large language models (LLMs) such as GPT-4o (a model that powers ChatGPT), may help augment qualitative analysis. However, studies of their feasibility remain limited, particularly in coding for critical human elements such as empathy.
Objective: To investigate whether LLMs such as GPT-4o can address the nuanced aspects of qualitative research, including the interpretation of human sentiment, values, and tone underlying parental perspectives, and to examine the ethical challenges involved.
Design/Methods: GPT-4o was accessed through a HIPAA-compliant enterprise application programming interface (API). Using refined prompts, we asked Boston Children’s Hospital’s secure, compliant large language model to apply established qualitative methods to generate codes and themes from a previously published qualitative dataset on counseling parents of infants with extreme prematurity. Six researchers independently compared these codes and themes with those from traditional thematic analysis to determine agreement. The researchers' comparisons were then compared with AI-generated comparisons.
Results: GPT-4o-generated codes and themes largely aligned with those from traditional thematic analysis, though some differences were noted. GPT-4o was adept at reproducing descriptive themes but missed subtle interpretive nuances that human analyses identified. This study highlighted the researchers’ responsibility as the primary agents of interpretation. Using LLMs to gather empirical evidence raised ethical concerns related to researcher conduct, potential human and AI bias, loss of analytical skills, and overreliance on models that lack transparency.
Conclusion(s): This study highlights the feasibility of integrating human and LLM-assisted qualitative analysis, subject to several caveats. LLM-assisted qualitative analysis should be approached from a critical perspective, with AI serving as a supplemental research tool. With its potential limitations recognized, this hybrid AI-human approach could augment qualitative analysis while ensuring that crucial human interpretation is retained.