Assessment of patient information guides generated by LLMs for common cardiological procedures

Authors

  • Suppraja Soundarrajan, Government Medical College, Omandurar Government Estate
  • Karine Vartanian, Southern California Hospital Heart Institute
  • Rahul Bhakle
  • Thanuja Katakam
  • Kinnera Dhanwada
  • Karansher Singh Randhawa
  • Nikhitha Puvvala

DOI:

https://doi.org/10.21542/gcsp.2025.26

Abstract

Introduction: The use of artificial intelligence (AI) has advanced rapidly in cardiology owing to its ability to process complex data and to analyze electrocardiograms, echocardiograms, and other cardiac tests. AI tools such as ChatGPT and Google Gemini can provide evidence-based treatment recommendations in concise language, which can support the early diagnosis of disease.

Methodology: In this cross-sectional study, patient information brochures for three cardiological procedures (ECG, 2D echocardiography, and exercise stress testing) were generated using ChatGPT and Google Gemini. The total word count, sentence count, average words per sentence, and average syllables per word were assessed using the Flesch-Kincaid Calculator. The similarity of the text was determined using the QuillBot plagiarism tool. The reliability of the generated responses was analyzed and graded using the Modified DISCERN Score, a 5-point rating system that applies a set of uniform standards to assess the accuracy and dependability of consumer health information. Statistical analysis was performed using RStudio v4.3.2. The simplicity and reliability scores were correlated using Pearson's correlation coefficient, and the responses from the two chatbots were compared using the unpaired t-test.
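For reference, the following is a minimal sketch in R (the environment named above for the statistical analysis) of how the readability metrics and the comparisons described here could be reproduced. The formulas are the standard Flesch Reading Ease and Flesch-Kincaid Grade Level equations; all variable names and numeric values are illustrative placeholders, not the study's data.

# Standard Flesch readability formulas, computed from basic text counts
flesch_ease  <- function(words, sentences, syllables) {
  206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
}
flesch_grade <- function(words, sentences, syllables) {
  0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
}

# Hypothetical ease scores for the three brochures from each chatbot
chatgpt_ease <- c(45.2, 50.1, 48.7)   # placeholder values
gemini_ease  <- c(62.3, 58.9, 60.4)   # placeholder values

# Unpaired t-test comparing the two sets of responses
t.test(chatgpt_ease, gemini_ease, paired = FALSE)

# Pearson correlation between simplicity (ease) and reliability (DISCERN) scores
ease    <- c(chatgpt_ease, gemini_ease)
discern <- c(3, 4, 3, 4, 4, 3)        # placeholder Modified DISCERN ratings
cor.test(ease, discern, method = "pearson")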

Results: Responses generated by ChatGPT and Google Gemini showed no significant differences in word count (P = 0.59), sentence count (P = 0.74), average words per sentence (P = 0.79), grade level (P = 0.06), similarity (P = 0.45), or reliability scores (P = 0.38). However, the ease score was significantly higher for Google Gemini-generated responses than for ChatGPT responses (P = 0.0044), indicating that Google Gemini's responses are easier to read and understand.

Conclusions: The study found statistically significant differences between the two chatbots in average syllables per word and ease score, but no significant differences in word count, sentence count, average words per sentence, grade level, similarity, or reliability scores. Future studies should evaluate additional AI technologies and cover a wider range of illnesses.

Published

2025-07-28

Section

Research articles