Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review.
Suhana Bedi, Yutong Liu, Lucy Orr-Ewing, Dev Dash, Sanmi Koyejo, Alison Callahan, Jason A Fries, Michael Wornow, Akshay Swaminathan, Lisa Soleymani Lehmann, Hyo Jung Hong, Mehr Kashyap, Akash R Chaurasia, Nirav R Shah, Karandeep Singh
January 2025, JAMA
Synopsis of Social Media Discussions
Many participants noted that only 5% of the studies reviewed used real patient data, highlighting the disconnect between research and clinical application. For instance, one post remarked on the need for evaluations that consider fairness and bias, while another mentioned the inadequacies in current methodologies. The tone of urgency and phrases like 'need broader evaluations' suggest a strong community drive towards improving AI applications in healthcare.
Agreement
Moderate agreement
Most posts express general agreement with the article's findings on the need for improved evaluation of health care LLMs.
Interest
High level of interest
The discussion shows strong interest in the implications of the research, highlighting its relevance to ongoing debates in healthcare technology.
Engagement
High engagement
Many participants engage deeply, referencing specific data points from the study and suggesting areas for improvement.
Impact
High level of impact
Contributors view the study as having a significant impact on the future of LLM applications in healthcare, emphasizing potential changes in evaluation standards.
Social Mentions
YouTube
2 Videos
2 Posts
117 Posts
Blogs
5 Articles
News
9 Articles
Metrics
Video Views
219
Total Likes
259
Extended Reach
1,994,700
Social Features
135
Timeline: Posts about article
Top Social Media Posts
Posts referencing the article
Evaluating Large Language Models in Healthcare: Insights and Tools
This panel discussion focuses on evaluating large language models (LLMs) in healthcare with frameworks and tools. Key topics include a systematic review highlighting evaluation shortcomings and recommendations from expert panelists, with the aim of improving the assessment of LLM applications in medical settings.
Evaluating Large Language Models in Health Care Applications
This video discusses the influence of large language models (LLMs) on health care productivity and their potential applications. We analyze a systematic review highlighting key components such as data type and evaluation metrics, revealing challenges in addressing fairness and bias in current methodologies.
-
RT @AMAEdHub: New from JN Learning: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/N8L2Dw4bVA
February 23, 2025
1
-
AMA Ed Hub™
@AMAEdHub (Twitter)
New from JN Learning: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/N8L2Dw4bVA
February 23, 2025
2
1
-
Teresa Hartman
@thartman2u (Twitter)
RT @AMAEdHub: New today: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/T3DSTvzU4x #CME
February 23, 2025
2
-
AMA Ed Hub™
@AMAEdHub (Twitter)
New today: Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/T3DSTvzU4x #CME
February 23, 2025
2
2
-
Marco.Care
@Marco_Care_AI (Twitter)
RT @fedelosco:
February 9, 2025
4
-
MJGonzapelt
@jgonzalezapelt (Twitter)
RT @fedelosco:
February 7, 2025
4
-
Cinthia
@cinthiavgauna (Twitter)
RT @fedelosco:
February 7, 2025
4
-
Martín Angel
@Martin_AngelMD (Twitter)
RT @fedelosco:
February 7, 2025
4
-
FLoscoMD
@fedelosco (Twitter)
February 7, 2025
13
4
-
STITCHES Medicine - the Best of Medical Research
@STITCHESMed (Twitter)
Only 5% of studies evaluated large language models in healthcare using real patient care data, mostly focusing on medical knowledge assessments. by Bedi S, Liu Y (...) Shah NH et 16 al. in JAMA https://t.co/qGnsCScNuI #MedX #MedResearch
February 6, 2025
-
T.kimura
@jjcrazydiamond (Twitter)
RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…
February 2, 2025
5
-
Manoj Mayogi Mishra
@mayogisense (Twitter)
RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…
February 2, 2025
5
-
Fiatopichan
@Fiatopichan (Twitter)
RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…
February 2, 2025
5
-
JAMA
@JAMA_current (Twitter)
This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of standardized frameworks and limited use of real patient data. https://t.co/3CRQ4Cb5jd
February 2, 2025
12
5
-
A.R. García
@air_garcia (Twitter)
This systematic review characterizes the current performance of LLM in evaluating clinical health care settings, including uniformity, thoroughness, and robustness and proposes a framework for their testing and evaluation across health care applications. https://t.co/OMQn79N0Fi
January 29, 2025
-
Salvador Pedraza
@salvasapedraza (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/Aj9wsQ4tyN https://t.co/4ptRvoxsPp
January 28, 2025
-
ForensicPsyMD
@ForensicPsyMD (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review | Digital Health | JAMA | JAMA Network https://t.co/cRqAU7aReh
January 28, 2025
-
Un1v3rs0 Z3r0
@Un1v3rs0Z3r0 (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/5vqXJSF6dD
January 28, 2025
-
Dr. Xs (Fuu)Artificial Life Intelligence The I
@_x_ai_i (Twitter)
RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…
December 17, 2024
4
-
Westyn Branch-Elliman, M.D., MMSc., FSHEA
@wbranchelliman (Twitter)
RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…
December 17, 2024
4
-
Dan Morgan
@dr_dmorgan (Twitter)
RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…
December 17, 2024
4
-
Josh Mandel, MD
@JoshCMandel (Twitter)
RT @AdamRodmanMD: But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and requi…
December 17, 2024
4
-
Adam Rodman
@AdamRodmanMD (Twitter)
But we know how to test efficacy in medicine. Clinical trials are messy, more expensive than in silico studies, and require multidisciplinary expertise. But they're still the right thing to do. Only 5% of LLM in medicine studies even use real data (https://t.co/gc2sl3VFYn)!!
December 17, 2024
23
4
-
Dr M. Mahesh (ಮಹೇಶ್) (he/him/his)
@mmahesh1 (Twitter)
Interesting: "Existing evaluations of LLMs mostly focus on accuracy of ques answering for medical exams, without consideration of real patient care data. Dimensions such as fairness, bias, toxicity & deployment considerations received limited attention" https://t.co/b6TebwSlKE
November 21, 2024
-
Xosé M Fernández
@xosegb (Twitter)
Testing and Evaluation of Health Care Applications of #LLM : A Systematic Review @JAMANetwork https://t.co/lcEb5XYaWj
November 21, 2024
-
Stanford Department of Medicine
@StanfordDeptMed (Twitter)
From diagnostics to patient communication, large language models are transforming healthcare. This @JAMA_current review by #StanDOM's @drnigam, @niravrshah, Arnold Milstein & Michael Pfeffer, sheds light on their diverse applications & effectiveness. https://t.co/HD2KEwkvvp
November 1, 2024
2
-
Tony Shanks
@alshanks (Twitter)
The pace of AI in medical education is rapidly advancing. I appreciate summaries like this that show the gaps and where we can focus. https://t.co/H8dwGRp4om
October 29, 2024
1
-
Woojin Kim
@woojinrad (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models
October 27, 2024
2
-
Dr. Xs (Fuu)Artificial Life Intelligence The I
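@_x_ai_i (Twitter)
RT @CeoImed: "Evaluation of Health Care Applications of Large Language Models," JAMA · A systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data in their evaluations · Evaluations focused mainly on accuracy, and fairness…
October 23, 2024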
6
-
Yaron Einhorn
@yaronoox (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review | Artificial Intelligence | JAMA | JAMA Network https://t.co/pxth5SMzu2
October 23, 2024
-
...
@ppoHeisenberg (Twitter)
RT @juanelosag: quantify biases, cover a broader range of tasks and specialties, and report standardized performance metrics…
October 23, 2024
1
-
Juan E Losa. Infectólogo. HUFA. URJC. Sandoval Sur
@juanelosag (Twitter)
quantify biases, cover a broader range of tasks and specialties, and report standardized performance metrics to enable large-scale implementation.” https://t.co/1bRFPXxnGN
October 23, 2024
1
1
-
Supriyo SB Chatterjee
@sbc111 (Twitter)
Testing and Evaluation of #HealthCare Applications of Large Language Models @JAMAplusAI @JAMANetworkOpen #AI #HealthAI #LLM #TechHartford https://t.co/PHImDKf3Fx
October 21, 2024
-
Srinivas Karri
@xsrinikar (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models
October 21, 2024
-
葉隠れ
@osanpochuudayo (Twitter)
(2/2) Critical administrative tasks (prescribing, billing) were neglected (<1%), and bias assessment protocols were inadequate (15.8%). Bedi et al. Stanford, UCSF, et al. report in JAMA Oct 15, 2024 on doi:10.1001/jama.2024.21700 https://t.co/F4uAmqtbJo
October 21, 2024
-
葉隠れ
@osanpochuudayo (Twitter)
(1/2) A systematic review (n=519) revealed substantial methodological flaws in LLM healthcare evaluations. Authentic clinical data were scarce (5%), with question-answering dominating (84.2%). https://t.co/JJoXDwo0O7
October 21, 2024
-
Kazu@精神科医 MD&PhD
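@peacewaffle (Twitter)
RT @CeoImed: "Evaluation of Health Care Applications of Large Language Models," JAMA · A systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data in their evaluations · Evaluations focused mainly on accuracy, and fairness…
October 20, 2024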
6
-
うさきち@冬コミは型月
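@usakichiusa (Twitter)
RT @CeoImed: "Evaluation of Health Care Applications of Large Language Models," JAMA · A systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data in their evaluations · Evaluations focused mainly on accuracy, and fairness…
October 20, 2024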
6
-
ただ/だた (pinmarch)
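@pinmarch_t (Twitter)
RT @CeoImed: "Evaluation of Health Care Applications of Large Language Models," JAMA · A systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data in their evaluations · Evaluations focused mainly on accuracy, and fairness…
October 20, 2024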
6
-
at_ayeaye
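@at_ayeaye (Twitter)
RT @CeoImed: "Evaluation of Health Care Applications of Large Language Models," JAMA · A systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data in their evaluations · Evaluations focused mainly on accuracy, and fairness…
October 20, 2024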
6
-
河野健一 Kenichi Kono | 脳外科医 CEO|AI 医療 MBA|脳血管内手術支援AI
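@CeoImed (Twitter)
"Evaluation of Health Care Applications of Large Language Models," JAMA · A systematic review of 519 studies published from 2022 through February 2024 · Only 5% used real patient data in their evaluations · Evaluations focused mainly on accuracy; fairness, bias, and toxicity were rarely studied https://t.co/5wuW7A7dEP https://t.co/fRd9TgVDMR
October 20, 2024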
15
6
-
Grupo Investigación Multidisciplinar Extremeño
@GRIMEX_ (Twitter)
https://t.co/jwSwM6swCg
October 19, 2024
-
EXTREMADURA SALUDABLE
@EXTREMADURASAL1 (Twitter)
https://t.co/S7GP9HZ269
October 19, 2024
-
Medical Research Library of Brooklyn
@DMCLibraryBKLYN (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 18, 2024
56
-
Dr. Gennadi Glinsky, MD, Ph.D.
@gglinskii (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models. A Systematic Review. https://t.co/c0wYR1dN5G
October 17, 2024
-
Josh Davis
@joshp_davis (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Jesse Burk-Rafel
@jbrafel (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Venkat C
@chalamalasetti (Twitter)
https://t.co/xnv4q7QTO2
October 17, 2024
-
Oscar Camara
@oscarcamararey (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Milton Tan
@mtanichthys (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Arun Umesh Mahtani
@ArunUMahtani (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Dmitrii (Dima) Smirnov
@SmirnovDDD (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
EileenD6☮️
@eileen_d6 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Adam Dunn
@adamgdunn (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Leslie Vargas-Ramírez
@lilo1278 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Leslie Vargas-Ramírez
@lilo1278 (Twitter)
RT @daforerog: Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/438Kp6WosP
October 17, 2024
1
-
Diego Forero MD, PhD
@daforerog (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/438Kp6WosP
October 17, 2024
2
1
-
Nicholas Tatonetti
@proftatonetti (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Abstream
@abstreamme (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models #science #publication #research #publications https://t.co/AsAvv0eBpa
October 17, 2024
-
Jason H. Moore, PhD
@moorejh (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Nick_Zen
@Nick_Zen (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
HAS-veille
@HAS_veille (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Research Data MGMT & LIVING
@fdmincoop (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Carlos KH Wong
@CarlosWongHKU (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
SF
@SofiaVi185 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Carlos Cmx
@rulpogt (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Ryan Cello
@ryan_c_cello (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Manish Sharma
@msharmas (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Daisy Davis
@daisy_davis2010 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Nicole Miller
@veeh_2011 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
THEE Gregory Stewart
@gstewtwo (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 17, 2024
56
-
Suhana Bedi
@BediSuhana42170 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Sergei Polevikov
@AIHealthUncut (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Flappest
@Flappest (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
ong beng hooi
@ongbenghooi1 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Dr. Shashank Joshi
@AskDrShashank (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Samantha_4JD
@Samantha_July01 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Dr. Robert Glatter
@DrRobertGlatter (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Lindvall Lab
@lindvalllab (Twitter)
Systematic review finds only 5% of #LLM studies in healthcare use real patient data. We need broader evaluations that address bias, fairness, and more diverse tasks. #AI #HealthCare #MedTwitter https://t.co/mFRB0BKnOD
October 16, 2024
2
-
JC Stanford
@JCDarnestown (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Martin Michaelis
@MartMichaelis (Twitter)
#Evidence that more scrutiny, care, and caution is needed regarding the use of #AI in #healthcare. I assume, this is also true for most other areas... https://t.co/jNngY6nAkM
October 16, 2024
-
Luis Eduardo Pino V
@docpinoAI (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Josep M Garcia-Alamino, PhD (Oxonian)
@JosepMGarcia75 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Edouard Lhomme
@Edouard_Lhomme (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Ryan Flinn
@RS_Flinn (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
strmdev
@strmdev1 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Supriyo SB Chatterjee
@sbc111 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Temerty Centre for AI in Medicine (T-CAIREM)
@UofT_TCAIREM (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
George E. Dafoulas MD, MBA in HSM
@GeorgeEDafoulas (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Biomedical Informatics Research
@StanfordBMIR (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/KfwEUCsqsK
October 16, 2024
-
Brad Wouters
@bradwouters (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
priya joseph
@ayirpelle (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Eldon Edwards
@eldonredwards (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Manuel Ramos-Casals
@ramos_casals (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Fco. Rojas
@RadBark (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Dwan Turner ⛴️
@DwanTurner (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Soo-Yong Shin
@likesky3 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Shivam Vedak, MD MBA
@ShivamVedakMD (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Takefumi Kimura
@VulletForMy (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Colorful MD Phd
@FanYang38636272 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
syawal™ シ
@syawal (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
어옌
@sapiens202 (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Ram Sesha
@OncoAI (Twitter)
RT @EricTopol: Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam ht…
October 16, 2024
56
-
Eric Topol
@EricTopol (Twitter)
Of over 500 LLM #AI reports in healthcare, only 5% used real patient data. https://t.co/SAhfJxcdhz @JAMA_current @drnigam https://t.co/EXl5HYzd0w
October 16, 2024
154
56
-
thetranscendedman
@atranscendedman (Twitter)
A review of 519 studies found only 5% used real patient data to evaluate Large Language Models (LLMs) in healthcare. Most focused on medical exams, few on admin tasks like writing prescriptions. Real data and broader evaluations needed. https://t.co/Ui46zI9dLD
October 16, 2024
-
Xema Pérez
@Xemadeyaka14 (Twitter)
RT @emiliomonteb: Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/b7NBd05m1c…
October 16, 2024
1
-
Emilio Monte
@emiliomonteb (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/b7NBd05m1c #AI #IA #LLM https://t.co/IevhiBcVvV
October 16, 2024
2
1
-
Amadeo Wals
@AmadeoWals (Twitter)
https://t.co/qSFwuEJS5G #RADONC #LLMs #AI
October 16, 2024
-
Dr. Suchitra Kataria
@Suchitrk (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/Bi0xVukb7m?
October 16, 2024
-
Ryan Nipp, MD, MPH, MBA, FASCO
@RyanNipp (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review. https://t.co/E4reSupeee @JAMA_current @JAMANetwork #ArtificialIntelligence #MedEd #MEDTECH #DigitalHealth @StanfordMed https://t.co/6XN7M8Qzhf
October 15, 2024
5
-
Yonemoto N
@nyonenyone (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models https://t.co/APxR4xVXct
October 15, 2024
-
Supriyo SB Chatterjee
@sbc111 (Twitter)
RT @JAMAplusAI: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of s…
October 15, 2024
1
-
JAMA+ AI
@JAMAplusAI (Twitter)
This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of standardized frameworks and limited use of real patient data. https://t.co/XlRwt5Uweq https://t.co/xZAsuEnxIW
October 15, 2024
5
1
-
Epic Plain
@EpicPlain (Twitter)
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review https://t.co/ULthWLsDZW #LLMs
October 15, 2024
-
Temerty Centre for AI in Medicine (T-CAIREM)
@UofT_TCAIREM (Twitter)
RT @JAMA_current: This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of…
October 15, 2024
2
-
JAMA
@JAMA_current (Twitter)
This study identifies inconsistent evaluation practices of large language models (LLMs) in health care, finding a lack of standardized frameworks and limited use of real patient data. https://t.co/O7NbeErlL3 https://t.co/zqkUcvCfkg
October 15, 2024
9
2
Abstract Synopsis
- Large language models (LLMs) show potential in health care, but existing evaluation methods may not effectively identify their most useful applications.
- A systematic review analyzed studies published from January 2022 to February 2024, categorizing them by five components: data type, health care task, natural language processing (NLP) and natural language understanding (NLU) tasks, evaluation dimensions, and medical specialty.
- Of the 519 studies reviewed, only 5% used real patient data; most focused on assessing medical knowledge and answering questions, with accuracy as the main evaluation metric, while other dimensions such as fairness and bias were rarely addressed.