
BrainX Community Live! December 2025: Reasons to evaluate LLMs in healthcare

Featuring Suhana Bedi & Raghav Awasthi

16 December, 2025

Watch the video of this event on our YouTube channel.

Speakers

Suhana Bedi

B.S.

Suhana Bedi is a PhD student in Biomedical Data Science at Stanford University, where she develops systematic benchmarks for healthcare foundation models. Her recent research, published in JAMA, revealed gaps between LLM evaluations and clinical needs. She focuses on improving model performance through data-driven techniques and alignment methods, with the aim of enabling safe deployment in clinical settings.

Raghav Awasthi

PhD

Raghav is the co-lead for BrainXAI ReSearch and a postdoctoral scholar at Case Western Reserve University, USA. He completed his Ph.D. at IIIT-D, working on AI in healthcare. Before joining IIIT-D, he completed a Master's in Mathematics at IIT Kanpur, where he was the top-ranked student in his batch. He also performed strongly in the CSIR-JRF examination, which secured funding for his Ph.D. His Ph.D. thesis focused on causal machine learning and reinforcement learning in healthcare. His goal is to build data-driven decision support that is explainable and effectively deployable. He is highly motivated to explore the applications of mathematics in healthcare.

Program Description:

Introduction

“Evaluating LLMs for Reasoning and Real-World Use in Medicine”: Suhana Bedi, BS, Stanford University

“Evaluation tools for LLMs in healthcare”: Raghav Awasthi, PhD, BrainXAI ReSearch

Q&A

 

Key related publications:

  1. Bedi S, Liu Y, Orr-Ewing L, et al. Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review. JAMA. 2025;333(4):319–328. doi:10.1001/jama.2024.21700
  2. Bedi S, Jiang Y, Chung P, Koyejo S, Shah N. Fidelity of Medical Reasoning in Large Language Models. JAMA Netw Open. 2025;8(8):e2526021. doi:10.1001/jamanetworkopen.2025.26021
  3. https://crfm.stanford.edu/helm/medhelm/latest/
  4. Awasthi, R., Bhattad, A., Ramachandran, S.P. et al. Human evaluation of large language models in healthcare: gaps, challenges, and the need for standardization. npj Health Syst. 2, 40 (2025). https://doi.org/10.1038/s44401-025-00043-2
  5. Raghav Awasthi, Nishant Singh, Shreya Mishra, Piyush Mathur, et al. 2025. GamELY: Human-in-the loop Framework for Scaling Human Evaluation of LLMs in Healthcare. Proceedings of the 16th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, New York, NY, USA, Article 48, 1–6. https://doi.org/10.1145/3765612.3767224
  6. Raghav Awasthi, Shreya Mishra, Piyush Mathur, et al. 2025. Theory of Mind Imitation by LLMs for Physician-Like Human Evaluation. https://www.medrxiv.org/content/10.1101/2025.03.01.25323142v2