Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review: Social discussions and analytics

Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review.

Suhana Bedi, Yutong Liu, Lucy Orr-Ewing, Dev Dash, Sanmi Koyejo, Alison Callahan, Jason A Fries, Michael Wornow, Akshay Swaminathan, Lisa Soleymani Lehmann, Hyo Jung Hong, Mehr Kashyap, Akash R Chaurasia, Nirav R Shah, Karandeep Singh

January 2025 JAMA

Synopsis of Social media discussions

Many participants noted that only 5% of the studies reviewed used real patient data, highlighting the disconnect between research and clinical application. For instance, one post remarked on the need for evaluations that consider fairness and bias, while another mentioned the inadequacies in current methodologies. The tone of urgency and phrases like 'need broader evaluations' suggest a strong community drive towards improving AI applications in healthcare.

Agreement

Moderate agreement

Most posts express a general agreement with the article's findings regarding the need for improved evaluation in healthcare LLMs.

Interest

High level of interest

The discussion shows strong interest in the implications of the research, highlighting its relevance to ongoing debates in healthcare technology.

Engagement

High engagement

Many participants engage deeply, referencing specific data points from the study and suggesting areas for improvement.

Impact

High level of impact

Contributors view the study as having a significant impact on the future of LLM applications in healthcare, emphasizing potential changes in evaluation standards.

Social Mentions

YouTube

2 Videos

Facebook

2 Posts

Twitter

117 Posts

Blogs

5 Articles

News

9 Articles

Metrics

Video Views

219

Total Likes

259

Extended Reach

1,994,700

Social Features

135

Timeline: Posts about article

Evaluating Large Language Models in Healthcare: Insights and Tools

This panel discussion focuses on evaluating large language models (LLMs) with frameworks and tools in healthcare. Key topics include a systematic review highlighting evaluation shortcomings and recommendations from expert panelists, aiming to augment the assessment of LLM applications in medical settings.

BrainX Community

January 31, 2025

119 views

Evaluating Large Language Models in Health Care Applications

This video discusses the influence of large language models (LLMs) in health care productivity and their potential applications. We analyze a systematic review highlighting key components such as data type and evaluation metrics, revealing challenges in addressing fairness and bias in current methodologies.

Artificial Intelligence Crocodile Project

May 1, 2025

101 views

Teresa Hartman
@thartman2u (Twitter)

RT @AMAEdHub: New from JN Learning: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/N8L2Dw4bVA

February 23, 2025

1
AMA Ed Hub™
@AMAEdHub (Twitter)

New from JN Learning: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/N8L2Dw4bVA

February 23, 2025

2
1
Teresa Hartman
@thartman2u (Twitter)

RT @AMAEdHub: New today: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/T3DSTvzU4x #CME

February 23, 2025

2
AMA Ed Hub™
@AMAEdHub (Twitter)

New today: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/T3DSTvzU4x #CME

February 23, 2025

2
2
Marco.Care
@Marco_Care_AI (Twitter)

RT @fedelosco:

February 9, 2025

4
MJGonzapelt
@jgonzalezapelt (Twitter)

RT @fedelosco:

February 7, 2025

4
Cinthia
@cinthiavgauna (Twitter)

RT @fedelosco:

February 7, 2025

4
Martín Angel
@Martin_AngelMD (Twitter)

RT @fedelosco:

February 7, 2025

4
FLoscoMD
@fedelosco (Twitter)


February 7, 2025

13
4
STITCHES Medicine - the Best of Medical Research
@STITCHESMed (Twitter)

Only 5% of studies evaluated large language models in healthcare using real patient care data, mostly focusing on medical knowledge assessments. by Bedi S, Liu Y (...) Shah NH et 16 al. in JAMA https://t.co/qGnsCScNuI #MedX #MedResearch

February 6, 2025
T.kimura
@jjcrazydiamond (Twitter)

RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…

February 2, 2025

5
Manoj Mayogi Mishra
@mayogisense (Twitter)

RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…

February 2, 2025

5
Fiatopichan
@Fiatopichan (Twitter)

RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…

February 2, 2025

5
JAMA
@JAMA_current (Twitter)

This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of standardized frameworks and limited use of real patient data. https://t.co/3CRQ4Cb5jd

February 2, 2025

12
5
A.R. García
@air_garcia (Twitter)

This systematic review characterizes the current performance of LLM in evaluating clinical health care settings, including uniformity, thoroughness, and robustness and proposes a framework for their testing and evaluation across health care applications. https://t.co/OMQn79N0Fi]

January 29, 2025
Salvador Pedraza
@salvasapedraza (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/Aj9wsQ4tyN https://t.co/4ptRvoxsPp

January 28, 2025
ForensicPsyMD
@ForensicPsyMD (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review | Digital Health | JAMA | JAMA Network https://t.co/cRqAU7aReh

January 28, 2025
Un1v3rs0 Z3r0
@Un1v3rs0Z3r0 (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/5vqXJSF6dD

January 28, 2025
Dr. Xs （Fuu）Artificial Life Intelligence The I
@_x_ai_i (Twitter)

RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…

December 17, 2024

4
Westyn Branch-Elliman, M.D., MMSc., FSHEA
@wbranchelliman (Twitter)

RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…

December 17, 2024

4
Dan Morgan
@dr_dmorgan (Twitter)

RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…

December 17, 2024

4
Josh Mandel, MD
@JoshCMandel (Twitter)

RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…

December 17, 2024

4
Adam Rodman
@AdamRodmanMD (Twitter)

But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and require multidisciplinary expertise. But they're still the right thing to do. Only 5% of LLM in medicine studies even use real data (https://t.co/gc2sl3VFYn)!!

December 17, 2024

23
4
Dr M. Mahesh (ಮಹೇಶ್) (he/him/his)
@mmahesh1 (Twitter)

Interesting: "Existing evaluations of LLMs mostly focus on accuracy of ques answering for medical exams, without consideration of real patient care data. Dimensions such as fairness, bias, toxicity & deployment considerations received limited attention" https://t.co/b6TebwSlKE

November 21, 2024
Xosé M Fernández
@xosegb (Twitter)

Testing and Evaluation of Health Care Applications of #LLM : A Systematic Review ⁦@JAMANetwork⁩ https://t.co/lcEb5XYaWj

November 21, 2024
Stanford Department of Medicine
@StanfordDeptMed (Twitter)

From diagnostics to patient communication, large language models are transforming healthcare. This @JAMA_current review by #StanDOM's @drnigam, @niravrshah, Arnold Milstein & Michael Pfeffer, sheds light on their diverse applications & effectiveness. https://t.co/HD2KEwkvvp

November 1, 2024

2
Tony Shanks
@alshanks (Twitter)

The pace of AI in medical education is rapidly advancing. I appreciate summaries like this that show the gaps and where we can focus. https://t.co/H8dwGRp4om

October 29, 2024

1
Woojin Kim
@woojinrad (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models

October 27, 2024

2
Dr. Xs （Fuu）Artificial Life Intelligence The I
@_x_ai_i (Twitter)

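RT @CeoImed: "Evaluation of health care applications of large language models" JAMA · Systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data for evaluation · Evaluation focused mainly on accuracy; fairness…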

October 23, 2024

6
Yaron Einhorn
@yaronoox (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review | Artificial Intelligence | JAMA | JAMA Network https://t.co/pxth5SMzu2

October 23, 2024
...
@ppoHeisenberg (Twitter)

RT @juanelosag: quantify biases, cover a broader range of tasks and specialties, and report standardized performance metrics…

October 23, 2024

1
Juan E Losa. Infectólogo. HUFA. URJC. Sandoval Sur
@juanelosag (Twitter)

quantify biases, cover a broader range of tasks and specialties, and report standardized performance metrics to enable large-scale implementation.” https://t.co/1bRFPXxnGN

October 23, 2024

1
1
Supriyo SB Chatterjee
@sbc111 (Twitter)

Testing and Evaluation of #HealthCare Applications of Large Language Models @JAMAplusAI @JAMANetworkOpen #AI #HealthAI #LLM #TechHartford https://t.co/PHImDKf3Fx

October 21, 2024
Srinivas Karri
@xsrinikar (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models

October 21, 2024
Srinivas Karri
@xsrinikar (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models

October 21, 2024
葉隠れ
@osanpochuudayo (Twitter)

(2/2) Critical administrative tasks (prescribing, billing) were neglected (<1%), and bias assessment protocols were inadequate (15.8%). Bedi et al. Stanford, UCSF, et al. report in JAMA Oct 15, 2024 on doi:10.1001/jama.2024.21700 https://t.co/F4uAmqtbJo

October 21, 2024
葉隠れ
@osanpochuudayo (Twitter)

(1/2)A systematic review (n=519) revealed substantial methodological flaws in LLM healthcare evaluations. Authentic clinical data were scarce (5%), with question-answering dominating (84.2%). https://t.co/JJoXDwo0O7

October 21, 2024
Kazu＠精神科医 MD&PhD
@peacewaffle (Twitter)

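RT @CeoImed: "Evaluation of health care applications of large language models" JAMA · Systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data for evaluation · Evaluation focused mainly on accuracy; fairness…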

October 20, 2024

6
うさきち@冬コミは型月
@usakichiusa (Twitter)

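RT @CeoImed: "Evaluation of health care applications of large language models" JAMA · Systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data for evaluation · Evaluation focused mainly on accuracy; fairness…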

October 20, 2024

6
ただ/だた (pinmarch)
@pinmarch_t (Twitter)

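RT @CeoImed: "Evaluation of health care applications of large language models" JAMA · Systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data for evaluation · Evaluation focused mainly on accuracy; fairness…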

October 20, 2024

6
at_ayeaye
@at_ayeaye (Twitter)

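RT @CeoImed: "Evaluation of health care applications of large language models" JAMA · Systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data for evaluation · Evaluation focused mainly on accuracy; fairness…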

October 20, 2024

6
河野健一 Kenichi Kono ｜脳外科医 CEO｜AI 医療 MBA｜脳血管内手術支援AI
@CeoImed (Twitter)

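"Evaluation of health care applications of large language models" JAMA · Systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data for evaluation · Evaluation focused mainly on accuracy; fairness, bias, and toxicity were rarely studied https://t.co/5wuW7A7dEP https://t.co/fRd9TgVDMR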

October 20, 2024

15
6
Grupo Investigación Multidisciplinar Extremeño
@GRIMEX_ (Twitter)

https://t.co/jwSwM6swCg

October 19, 2024
EXTREMADURA SALUDABLE
@EXTREMADURASAL1 (Twitter)

https://t.co/S7GP9HZ269

October 19, 2024
Medical Research Library of Brooklyn
@DMCLibraryBKLYN (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 18, 2024

56
Dr. Gennadi Glinsky, MD, Ph.D.
@gglinskii (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models. A Systematic Review. https://t.co/c0wYR1dN5G

October 17, 2024
Josh Davis
@joshp_davis (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Jesse Burk-Rafel
@jbrafel (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Venkat C
@chalamalasetti (Twitter)

https://t.co/xnv4q7QTO2

October 17, 2024
Oscar Camara
@oscarcamararey (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Milton Tan
@mtanichthys (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Arun Umesh Mahtani
@ArunUMahtani (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Dmitrii (Dima) Smirnov
@SmirnovDDD (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
EileenD6☮️
@eileen_d6 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Adam Dunn
@adamgdunn (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Leslie Vargas-Ramírez
@lilo1278 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Leslie Vargas-Ramírez
@lilo1278 (Twitter)

RT @daforerog: Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/438Kp6WosP

October 17, 2024

1
Diego Forero MD, PhD
@daforerog (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/438Kp6WosP

October 17, 2024

2
1
Nicholas Tatonetti
@proftatonetti (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Abstream
@abstreamme (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models #science #publication #research #publications https://t.co/AsAvv0eBpa

October 17, 2024
Jason H. Moore, PhD
@moorejh (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Nick_Zen
@Nick_Zen (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
HAS-veille
@HAS_veille (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Research Data MGMT & LIVING
@fdmincoop (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Carlos KH Wong
@CarlosWongHKU (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
SF
@SofiaVi185 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Carlos Cmx
@rulpogt (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Ryan Cello
@ryan_c_cello (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Manish Sharma
@msharmas (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Daisy Davis
@daisy_davis2010 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Nicole Miller
@veeh_2011 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
THEE Gregory Stewart
@gstewtwo (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 17, 2024

56
Suhana Bedi
@BediSuhana42170 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Sergei Polevikov
@AIHealthUncut (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Flappest
@Flappest (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
ong beng hooi
@ongbenghooi1 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Dr. Shashank Joshi
@AskDrShashank (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Samantha_4JD
@Samantha_July01 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Dr. Robert Glatter
@DrRobertGlatter (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Lindvall Lab
@lindvalllab (Twitter)

Systematic review finds only 5% of #LLM studies in healthcare use real patient data. We need broader evaluations that address bias, fairness, and more diverse tasks. #AI #HealthCare #MedTwitter https://t.co/mFRB0BKnOD

October 16, 2024

2
JC Stanford
@JCDarnestown (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Martin Michaelis
@MartMichaelis (Twitter)

#Evidence that more scrutiny, care, and caution is needed regarding the use of #AI in #healthcare. I assume, this is also true for most other areas... https://t.co/jNngY6nAkM

October 16, 2024
Luis Eduardo Pino V
@docpinoAI (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Josep M Garcia-Alamino, PhD (Oxonian)
@JosepMGarcia75 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Edouard Lhomme
@Edouard_Lhomme (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Ryan Flinn
@RS_Flinn (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
strmdev
@strmdev1 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Supriyo SB Chatterjee
@sbc111 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Temerty Centre for AI in Medicine (T-CAIREM)
@UofT_TCAIREM (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
George E. Dafoulas MD, MBA in HSM
@GeorgeEDafoulas (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Biomedical Informatics Research
@StanfordBMIR (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/KfwEUCsqsK

October 16, 2024
Brad Wouters
@bradwouters (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
priya joseph
@ayirpelle (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Eldon Edwards
@eldonredwards (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Manuel Ramos-Casals
@ramos_casals (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Fco. Rojas
@RadBark (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Dwan Turner ⛴️
@DwanTurner (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Soo-Yong Shin
@likesky3 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Shivam Vedak, MD MBA
@ShivamVedakMD (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Takefumi Kimura
@VulletForMy (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Colorful MD Phd
@FanYang38636272 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
syawal™ シ
@syawal (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
어옌
@sapiens202 (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Ram Sesha
@OncoAI (Twitter)

RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…

October 16, 2024

56
Eric Topol
@EricTopol (Twitter)

Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam https://t.co/EXl5HYzd0w

October 16, 2024

154
56
thetranscendedman
@atranscendedman (Twitter)

A review of 519 studies found only 5% used real patient data to evaluate Large Language Models (LLMs) in healthcare. Most focused on medical exams, few on admin tasks like writing prescriptions. Real data and broader evaluations needed. https://t.co/Ui46zI9dLD

October 16, 2024
Xema Pérez
@Xemadeyaka14 (Twitter)

RT @emiliomonteb: Testing and Evaluation of Health Care Applications of Large Language ModelsA Systematic Review https://t.co/b7NBd05m1c…

October 16, 2024

1
Emilio Monte
@emiliomonteb (Twitter)

Testing and Evaluation of Health Care Applications of Large Language ModelsA Systematic Review https://t.co/b7NBd05m1c #AI #IA #LLM https://t.co/IevhiBcVvV

October 16, 2024

2
1
Amadeo Wals
@AmadeoWals (Twitter)

https://t.co/qSFwuEJS5G #RADONC #LLMs #AI

October 16, 2024
Dr. Suchitra Kataria
@Suchitrk (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/Bi0xVukb7m?

October 16, 2024
Ryan Nipp, MD, MPH, MBA, FASCO
@RyanNipp (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review. https://t.co/E4reSupeee @JAMA_current @JAMANetwork #ArtificialIntelligence #MedEd #MEDTECH #DigitalHealth @StanfordMed https://t.co/6XN7M8Qzhf

October 15, 2024

5
Yonemoto N
@nyonenyone (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/APxR4xVXct

October 15, 2024
Supriyo SB Chatterjee
@sbc111 (Twitter)

RT @JAMAplusAI: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of s…

October 15, 2024

1
JAMA+ AI
@JAMAplusAI (Twitter)

This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of standardized frameworks and limited use of real patient data. https://t.co/XlRwt5Uweq https://t.co/xZAsuEnxIW

October 15, 2024

5
1
Epic Plain
@EpicPlain (Twitter)

Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/ULthWLsDZW #LLMs

October 15, 2024
Temerty Centre for AI in Medicine (T-CAIREM)
@UofT_TCAIREM (Twitter)

RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…

October 15, 2024

2
JAMA
@JAMA_current (Twitter)

This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of standardized frameworks and limited use of real patient data. https://t.co/O7NbeErlL3 https://t.co/zqkUcvCfkg

October 15, 2024

9
2

Abstract Synopsis

Large language models (LLMs) show potential in health care but existing evaluation methods may not effectively highlight their best applications.
A systematic review analyzed studies published from January 2022 to February 2024, categorizing each along five components: data type, health care task, NLP/NLU tasks, evaluation dimensions, and medical specialty.
Out of 519 studies reviewed, only 5% used real patient data; the majority focused on assessing medical knowledge and answering questions, with accuracy being the main evaluation metric, while other factors like fairness and bias were rarely addressed.
