Assessment of patient information guides generated by LLMs for common cardiological procedures
DOI: https://doi.org/10.21542/gcsp.2025.26

Abstract
Introduction: The use of artificial intelligence (AI) has advanced rapidly in the field of cardiology owing to its ability to process complex data and analyze electrocardiograms, echocardiograms, and other cardiac tests. AI tools, such as ChatGPT and Google Gemini, can provide evidence-based treatment recommendations in concise language, which can help in the early diagnosis of disease.
Methodology: In this cross-sectional study, patient information brochures for three cardiological procedures (ECG, 2D echocardiography, and exercise stress testing) were generated using ChatGPT and Google Gemini. The total word count, sentence count, average words per sentence, and average syllables per word were assessed using the Flesch-Kincaid Calculator. The similarity of the text was determined using the QuillBot plagiarism tool. The reliability of the generated responses was analyzed and graded using the Modified DISCERN Score, a 5-point rating system that applies a set of uniform standards to assess the accuracy and dependability of consumer health information. Statistical analysis was performed using RStudio v4.3.2. The relationship between the simplicity and reliability scores was assessed using Pearson's correlation coefficient, and the unpaired t-test was used to compare responses between the two models.
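The readability and comparison workflow described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the vowel-group syllable heuristic, the `flesch_metrics` and `unpaired_t` helpers, and the sample ease scores are all assumptions introduced for demonstration; only the Flesch Reading Ease and Flesch-Kincaid Grade Level formulas are standard.

```python
import math
import re
from statistics import mean, variance

def count_syllables(word):
    """Crude vowel-group heuristic; dedicated calculators use dictionaries."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # drop a typically silent trailing 'e'
    return max(n, 1)

def flesch_metrics(text):
    """Word/sentence counts plus Flesch Reading Ease and F-K Grade Level."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    return {
        "words": len(words),
        "sentences": len(sentences),
        "words_per_sentence": wps,
        "syllables_per_word": spw,
        "ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "grade": 0.39 * wps + 11.8 * spw - 15.59,
    }

def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical ease scores for the three brochures from each model
chatgpt_ease = [45.2, 50.1, 47.8]
gemini_ease = [62.3, 65.0, 60.9]
t_stat = unpaired_t(chatgpt_ease, gemini_ease)
```

In practice a statistics package such as `scipy.stats.ttest_ind` would also return the p-values reported in the abstract; the hand-rolled version above only shows the computation the test performs.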
Results: No significant differences were observed between ChatGPT- and Google Gemini-generated responses in word count (P = 0.59), sentence count (P = 0.74), average words per sentence (P = 0.79), grade level (P = 0.06), similarity (P = 0.45), or reliability scores (P = 0.38). However, the ease score was significantly better for Google Gemini-generated responses than for ChatGPT (P = 0.0044), indicating that the responses generated by Google Gemini are more easily readable and understandable.
Conclusions: The study found statistically significant differences in the average syllables per word and the ease score. No significant differences were observed in the number of words, sentences, average words per sentence, grade level, similarity, or reliability scores. Future studies should evaluate a broader range of AI technologies and cover a wider range of illnesses.
License
Copyright (c) 2025 Suppraja Soundarrajan, Karine Vartanian, Rahul Bhakle, Thanuja Katakam, Kinnera Dhanwada, Karansher Singh Randhawa, Nikhitha Puvvala

This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an open access article distributed under the terms of the Creative Commons Attribution license CC BY 4.0, which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.