Dr. Amulya Yadav (Penn State University) joins host Jeffrey to discuss a new Penn State study that tested large language models (ChatGPT, Google Gemini, Meta LLaMA) on real patient queries. Nine Penn State physicians evaluated the responses and judged them
medically valid about 76% of the time. In this interview, Dr. Yadav explains where LLMs perform well (primary care, differential diagnosis), where they struggle (dermatology, mental health, cases requiring tests or images), and
how these tools should be used as complements to clinicians rather than replacements. They also discuss ethical concerns, existing guardrails, and the need for evolving regulation and user education.
00:00 — Intro & guest welcome
00:18 — Study overview: LLMs tested (ChatGPT, Gemini, LLaMA)
01:05 — Method: Penn State patient queries judged by nine physicians
01:40 — Key result: 76% of LLM answers judged valid/accurate
02:30 — Comparison to human doctors (misdiagnosis ~10–11%)
03:10 — Where LLMs do well: general primary-care queries & differential diagnosis
04:00 — Where LLMs struggle: dermatology (image-dependent) & mental health
05:00 — Risks: sycophancy, rare harmful responses, and limitations without diagnostics
06:00 — Use case: complementary tool for patients with limited access to care and an aid for physicians
07:00 — Ethics & regulation: need for guardrails and evolving frameworks
08:10 — User responsibility: treat LLM outputs cautiously; not a replacement for doctors
09:00 — Closing remarks
Key takeaways
LLMs provided medically valid answers for ~76% of patient queries in the Penn State study.
Strongest performance: general primary-care concerns and differential diagnosis.
Weaknesses: dermatology (requires images) and mental-health responses (tendency to be overly agreeable).
Role: useful complementary tool—especially where access to care is limited—but not a replacement for human physicians; regulation and user awareness are essential.
Tags:
#AIHealth #ChatGPT #MedicalAI #LLM #HealthcareTech #Telemedicine #DigitalHealth #AIResearch #EthicalAI
