WCET - Abstract

Presented by: Daniel R. Hanna, MD
University of Kansas Medical Center

Introduction

As multiple artificial intelligence (AI) chat models become available for public use, this technology will inevitably enter the healthcare space. Patients are likely to use AI chatbots to help answer medical questions. Bing AI (Microsoft) is a new chatbot with three question response modes: More Creative, Balanced, and Precise. It is unclear whether Bing AI can accurately answer medical questions and whether the different modes alter its responses. In this study, we aim to evaluate how each Bing AI response mode answers questions about the surgical management of nephrolithiasis.

Materials

We created 20 questions based on the AUA Surgical Management of Stones guideline. Two physicians evaluated Bing AI’s responses in each of its beta chat modes using the validated Brief DISCERN score, originally developed to evaluate the quality of online healthcare information. The score ranges from 6 to 30 and addresses aims, relevance, sources, and bias. We also evaluated whether each response adhered to the guideline, expressed empathy, recommended physician consultation, or stated that it could not answer the inquiry. We used descriptive statistics to evaluate responses and an ANOVA to compare the results of the three chat modes.

Results

Brief DISCERN scores in the More Creative, Balanced, and Precise response modes were 22.6 (IQR 21.5-23.75), 22.95 (IQR 22.375-24.125), and 22.25 (IQR 20.375-24.5), respectively. There was no difference in Brief DISCERN scores between the three chat modes (p=0.52). However, when evaluating response appropriateness, More Creative mode was superior (85% vs. 75% for Balanced vs. 45% for Precise; p=0.017). No statistically significant difference between modes was found in guideline adherence, empathy, recommending consulting a doctor, or inability to answer the prompt. Nevertheless, More Creative mode scored highest in each of these categories (Figure 1).
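As an illustration of the kind of three-group comparison described above, the following minimal sketch computes a one-way ANOVA F statistic for response appropriateness. It rests on assumptions not stated in the abstract: that each mode answered all 20 questions, that each response was coded 1 (appropriate) or 0 (inappropriate), and that the reported percentages therefore correspond to 17, 15, and 9 appropriate responses. The function name and data layout are hypothetical, and this is not the authors' analysis code.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group means versus the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: observations versus their own group mean.
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Assumed counts: 85%, 75%, and 45% of 20 questions per mode.
N_QUESTIONS = 20
appropriate = {"More Creative": 17, "Balanced": 15, "Precise": 9}

# Binary coding (assumption): 1 = appropriate, 0 = inappropriate.
groups = [[1] * k + [0] * (N_QUESTIONS - k) for k in appropriate.values()]

f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A library routine such as `scipy.stats.f_oneway` would return the corresponding p-value in addition to the F statistic.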

Conclusion

All three Bing AI modes scored well on questions about surgical stone management. Our data show that in this small cohort, More Creative mode gave the most appropriate responses to stone management questions. Importantly, it also had the highest incidence of recommending consultation or stating it could not answer the question. For the best results, patients should use Bing’s More Creative mode for stone questions. However, additional studies are needed to better understand this AI chatbot. Patients should use caution when utilizing this technology, considering there is a 15% inappropriate response rate, even in the best-performing chat mode.

Funding

N/A

Co-Authors

Willian Ito, MD
University of Kansas Medical Center

Russell S. Terry, MD
University of Florida

Wilson R. Molina, MD
University of Kansas Medical Center

Bristol B. Whiles, MD
University of Kansas Medical Center

Utilization of Bing AI Chatbot for Stone Management Questions: A Comparison of Chat Response Modes and the AUA Guidelines

Description

MP28: 08
Session Name: Moderated Poster Session 28: Stones: Instrumentation and New Technology 2