ChatGPT may be used to generate clinically accurate responses to inquiries regarding insomnia, according to a new study published in the Journal of Clinical Sleep Medicine.
Researchers used 4 sets of identical insomnia-related queries to evaluate the accuracy and precision of ChatGPT’s responses. The sets differed only in the context in which ChatGPT was prompted: no prompt, patient-centered, physician-centered, and with references and statistics. Researchers then presented the responses to 2 academic sleep surgeons, 1 academic sleep medicine physician, and 2 sleep medicine fellows, and asked them to evaluate each response for clinical accuracy, prompt adherence, referencing, and statistical precision. The reviewers used a binary grading system, and the researchers then calculated Flesch-Kincaid grade level scores. Fleiss’ kappa was used to calculate interrater reliability.
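As a rough illustration of the interrater statistic named above, the following minimal Python sketch computes Fleiss’ kappa for binary grades from 5 raters. The function name `fleiss_kappa` and the `ratings` table are hypothetical and invented for this example; they are not the study’s data or code.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement for each item, then averaged over items.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance from the category shares.
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        return float("nan")  # undefined when every rating is identical
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical grades: 6 responses, 5 raters each,
# columns are [graded accurate, graded not accurate].
ratings = [[5, 0], [4, 1], [5, 0], [4, 1], [3, 2], [5, 0]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 4))
```

In this made-up table, 26 of the 30 grades are "accurate", yet kappa comes out close to zero, because kappa measures agreement beyond what chance would produce given how lopsided the grades are. This is a general property of the statistic and not a claim about the study’s own results.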
Researchers found that the Fleiss’ kappa scores for clinical accuracy and relevance were poor across the various prompts. Despite this, the evaluators indicated that ChatGPT’s responses across the first 3 prompt types still had a high level of clinical accuracy. A total of 80% of the cited references were found to be both real and pertinent to the given responses; however, only 25% of the referenced statistics were directly corroborated by the articles ChatGPT cited. There was a statistically significant difference in the mean Flesch-Kincaid grade level scores between the groups.
Researchers concluded that ChatGPT generates clinically relevant and accurate responses, and that these responses were influenced by the specificity of the provided prompts. They found ChatGPT’s ability to cite reputable references satisfactory. However, all the references were dated to 2015 or earlier, which they found to be in alignment with the training data sets available until 2021. According to researchers, the lack of later training data limits ChatGPT’s ability to provide the most current information and reliable statistical data.
Researchers acknowledged that ChatGPT is not officially approved for medical applications and that its dependence on the reliability of outdated training data limited this study. They also noted that the ability of ChatGPT to access current literature is contingent upon the developer’s indexing, which limits its ability to stay up to date. Their study was further limited by the lack of a verified objective grading scale.
Researchers concluded that “ChatGPT possesses the capacity to generate clinically pertinent and accurate information in response to prevalent insomnia-related inquiries. However, the ability of the language model to draw upon contemporary resources and accurately extrapolate statistics from the sources remains a considerable challenge.”
This article originally appeared on Sleep Wake Advisor
References:
Alapati R, Campbell D, Molin N, et al. Evaluating insomnia queries from an artificial intelligence chatbot for patient education. J Clin Sleep Med. 2024;20(4):583-594. doi:10.5664/jcsm.10948