Clinical Education

Medical AI
Competence

A practical framework for safe, effective clinical AI use. Evidence-based. Clinician-designed.

Professional Responsibility

AI literacy improves
Patient Safety.

Clinicians who use AI without understanding how it works introduce a new category of medical error — not through negligence, but through unfamiliarity with a tool that does not behave like any tool that came before it.

AI Competencies for Clinicians
Why This Is a Professional Obligation

The rapid adoption of AI tools in clinical settings has outpaced training. Surveys consistently show that the majority of clinicians are now using AI regularly — for documentation, differential diagnosis, patient communication, and clinical decision support — while fewer than a third have received any formal instruction on AI limitations, failure modes, or safe use practices.

This gap is not abstract. It creates real clinical risk: fabricated citations informing management decisions, outdated guideline summaries presented with current-day confidence, and patient-identifying information entering non-HIPAA-compliant systems. These are not hypothetical — they are documented, recurrent, and preventable.

The Patient Safety Case

AI systems, like humans, are fallible. They can miss dangerous drug interactions, give incorrect dosing information, or answer confidently when the correct answer is genuinely uncertain.1 Clinicians need to verify AI-generated answers before acting on them, and AI can in turn serve as a second check on a clinician's own reasoning.

Across medicine, the introduction of new technologies — electronic health records, laparoscopic surgery, point-of-care ultrasound — has consistently required formal competency frameworks before widespread clinical adoption. AI is no different, and arguably demands more systematic training because its failure modes are less visible and more linguistically persuasive than any prior clinical tool.

What the Evidence Shows

Large language models have passed USMLE Step examinations, outperformed physicians on certain diagnostic tasks, and demonstrated remarkable facility with medical language.2,3 This capability is real and clinically significant. But capability in a benchmark environment does not equal reliability in a clinical one. LLMs also:

The same model that is remarkably useful when used correctly is a source of harm when used by someone who does not know its limits. AI literacy is the bridge between the two.

The Professional Standard Is Shifting

The American Medical Association, ACOG, and multiple specialty societies have begun issuing guidance on AI in clinical practice. As AI becomes embedded in workflows — through EHR documentation tools, diagnostic aids, and patient-facing chatbots — AI literacy will become part of the expected standard of care. Clinicians who develop this competency now are ahead of a requirement that is coming.

About This Course

This course presents 10 foundational competencies for the safe, effective clinical use of AI — developed at Medical AI Competence (MedicalAICompetence.com). Each module covers one competency in depth: the underlying concept, its clinical implications, recognizable failure patterns, and practical application. The course concludes with a 10-question assessed examination; a score of 70% or above earns a certificate of completion.

The framework is designed for physicians, residents, nurses, certified nurse-midwives, and advanced practice providers. No prior technical background in AI is assumed. The goal is not to produce AI engineers — it is to produce clinicians who know how to use AI safely, recognize when it is failing them, and protect their patients in the process.

10
Competency Modules
70%
Passing Score
🎍
Certificate of Completion
~60
Minutes to Complete
Ready to begin?
Start with Module 1: Understanding What LLMs Are and Are Not
1 Ayers JW, et al. Comparing physician and artificial intelligence chatbot responses to patient questions. JAMA Intern Med. 2023;183(6):589–596.
2 Kung TH, et al. Performance of ChatGPT on USMLE. PLOS Digit Health. 2023;2(2):e0000198.
3 Singhal K, et al. Large language models encode clinical knowledge. Nature. 2023;620(7972):172–180.
4 Obermeyer Z, et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–453.
1

Understanding What LLMs Are and Are Not

Know prediction, non-determinism, fluency limits, and why confident text is not truth.
LLM architecture Knowledge cutoffs Fluency vs. accuracy
What an LLM Actually Is

A Large Language Model is a probabilistic text predictor, not a medical knowledge database or a clinical expert system. It was trained, across enormous volumes of human-generated text, to predict the statistically most likely next token (roughly, a word or word fragment) given the text that came before.

This architecture has remarkable properties: it can summarize, synthesize, translate, and reason through complex problems. But it has fundamental limitations that every clinician must internalize.

What LLMs ARE
Text predictors
Trained to produce fluent, coherent, contextually appropriate text
What LLMs ARE NOT
Knowledge retrieval
They do not look up facts in a database — they generate text that resembles what they were trained on
What LLMs ARE NOT
Real-time systems
Training has a cutoff date; guideline updates after that cutoff are unknown to the model
What LLMs ARE NOT
Self-verifying
LLMs cannot confirm whether their own output is accurate — they have no external ground truth
Non-Determinism and Variability

LLMs use probabilistic sampling, meaning the same question can produce different answers on different occasions. This is a design choice that keeps responses varied and natural, but it means LLM output is not reproducible in the scientific sense.

Clinical Implication

Ask an LLM the same drug dosing question three times and you may get three slightly different answers. This is not a malfunction. It is the expected behavior of a probabilistic system. Treat every clinical output as a draft requiring verification, not a fixed reference.
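
A toy sketch of why this happens, using only Python's standard library. The three candidate continuations and their probabilities are invented for illustration and do not come from any real model; real systems sample over far larger vocabularies, one token at a time.

```python
import random

# Hypothetical next-token probabilities for a single blank in a sentence.
# The labels and numbers are illustrative assumptions only.
NEXT_TOKEN_PROBS = {"option A": 0.6, "option B": 0.3, "option C": 0.1}


def sample_answer(rng: random.Random) -> str:
    """Draw one continuation in proportion to its probability."""
    tokens = list(NEXT_TOKEN_PROBS)
    weights = list(NEXT_TOKEN_PROBS.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def greedy_answer() -> str:
    """Always pick the single most likely continuation (no sampling)."""
    return max(NEXT_TOKEN_PROBS, key=NEXT_TOKEN_PROBS.get)


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so results can differ between runs
    print([sample_answer(rng) for _ in range(3)])
    print(greedy_answer())
```

Even in this tiny example, the less likely options are chosen some of the time. That is the same reason a repeated clinical question can come back worded or answered differently.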

Fluency Is Not Accuracy

This is the single most dangerous misconception in clinical AI use. LLMs produce grammatically correct, medically plausible, authoritative-sounding text even when the content is factually wrong, outdated, or fabricated.

The confident, clinical prose style of AI output activates the same cognitive heuristics humans use to assess expert communication, which means we are inclined to trust it more than we should.

Key Insight

Fluency is a measure of text quality, not medical accuracy. A beautifully written paragraph citing a nonexistent study is still wrong. Train yourself to read AI output with the same skeptical eye you'd apply to a medical student's undocumented clinical statement.

Training Cutoffs and Guideline Currency

Every LLM has a training data cutoff date. Medical guidelines, drug approvals, and clinical evidence evolve continuously. When you ask an LLM about ACOG recommendations, it answers based on what was published before its training ended — and may have no awareness that a practice bulletin was revised six months ago.

Setting Up Claude for Clinical Use

Claude (claude.ai) is one of the most capable AI tools available to clinicians right now. But when you first open it, it knows nothing about you — your specialty, your patients, how you like to communicate, or what level of clinical depth you need. A few one-time setup steps change this completely. Here is exactly what to configure, where to find each setting, and why it matters.

First: Free vs. Paid — This Matters

The free version of Claude gives you basic conversations with a smaller AI model. It has no memory between sessions, limited context (it "forgets" earlier parts of long conversations), and no ability to search the web or run calculations. For serious clinical use — long documents, literature searching, memory of your preferences, better reasoning — Claude Pro ($20/month) is the practical minimum. Everything in this setup guide applies to Claude Pro unless otherwise noted.

⚙️ STEP 1 — Find Settings

On desktop: look at the bottom-left corner of the screen — click your name or profile icon, then click Settings.
On mobile (iPhone/Android app): tap the menu icon (three lines, top-left), then tap your name or Settings.

Important note: Claude's interface is updated regularly. If something looks slightly different from this description, look for the same words — the features exist, they may just be in a slightly different location. When in doubt, type "where do I find settings?" directly into Claude and it will tell you.

📝 STEP 2 — Custom Instructions (the most important setup step)

Custom Instructions are a block of text you write once that Claude reads automatically at the start of every conversation. Think of it as briefing a new colleague on who you are before they start working with you, except that you only have to do it once and it applies to every conversation until you change it.

Where to find it: Settings → “Profile” or “User Preferences”

Write your instructions in plain English. There is no special format required. Here is a practical example for an ObGyn clinician:

I am an attending physician in Obstetrics and Gynecology, specializing in Maternal-Fetal Medicine.
Use clinical depth appropriate for a physician, not a layperson.
Do not add unnecessary disclaimers to clinical discussions — I am a licensed clinician.
Always use Vancouver citation format. Never make up references.
When writing patient-facing materials, use 7th–8th grade reading level.
When I ask for a differential diagnosis, include rare diagnoses I might miss, not just the obvious ones.
Always flag when a guideline may have changed recently or when evidence is genuinely contested.

The result: Claude will automatically apply this clinical context to every conversation without you having to repeat yourself. This alone eliminates the most common frustration with AI — generic, overly cautious responses that treat you like a layperson.

🧠 STEP 3 — Memory

Memory is different from Custom Instructions. Custom Instructions are rules you write once and they stay fixed. Memory is a living record of facts that Claude learns about you over time, automatically, from your conversations.

For example: if you mention in a conversation that you work at a community hospital, Claude stores that and uses it in future conversations without you repeating it. Over time, Claude builds up a useful profile of your work context, preferences, and common tasks.

How to manage memory:

  • To turn on: Settings → find the Memory toggle → switch ON
  • To see what Claude remembers about you: In any conversation, type: "What do you remember about me?" — Claude will list everything it has stored
  • To add something: Just tell Claude: "Remember that I prefer bullet-point summaries" or "Remember I am based in New York"
  • To delete a memory: Settings → Memory → you can view and delete individual items
  • Never store patient information in memory. Memory is for your professional profile and preferences, not clinical data about specific patients.
🌐 STEP 4 — Tools in the Chat Input (Web Search, File Upload, Code)

When you are inside a conversation, look at the bottom of the text input box. You will see a row of small icons. These are your tools — they give Claude additional capabilities beyond just answering questions from its training data.

  • Globe / Web Search icon: Click this to let Claude search the internet in real time. Use this when you need current guidelines, recent publications, or news. Without it, Claude only knows what it was trained on (which has a cutoff date). When to use it: any time currency matters — "what is the latest ACOG guidance on X?"
  • Paperclip / Attachment icon: Upload a PDF, image, or document. Claude will read it and can summarize, extract data, answer questions about it, or rewrite it. Clinical use: upload a journal article and ask for a plain-language summary.
  • Code / Analysis icon (sometimes labeled "Analysis"): Allows Claude to actually run calculations and write executable code. Clinical use: "calculate the z-scores for these gestational age measurements" or "build me a simple BMI calculator."
Important

The icons may look slightly different or be labeled differently depending on when you are reading this — Claude's interface is updated regularly. The features are always there; look at the bottom toolbar of the text input area and hover over each icon to see what it does.

📁 STEP 5 — Projects (organized workspaces)

Projects are separate workspaces where you can group related conversations and give Claude a specific set of instructions and reference documents for that context. Think of each Project as a specialized assistant for a specific part of your work.

How to create one: In the left sidebar, look for “New Project” or a “+” button near Projects. Give it a name. Inside the project you can add Project Instructions (instructions that only apply within this project) and upload documents that Claude will use as reference material.

Practical clinical examples:

  • Patient Education project: Upload your practice's standard handouts. Ask Claude to create new ones that match your existing style and terminology.
  • Research project: Upload your manuscript draft. Ask Claude to check consistency, suggest references, or help with a specific section.
  • Lectures project: Upload your slide outline. Ask Claude to expand each point into speaker notes at the right level for your audience.
📌 STEP 6 — Save your best prompts

When you write a prompt that produces exactly the result you wanted, save it somewhere — a note on your phone, a document on your computer, or inside a Project in Claude. Over time this becomes your personal clinical prompt library.

The goal is to never have to re-engineer a good prompt from scratch. If "summarize this article in 3 sections at 7th-grade level, flag any claims that need verification" worked perfectly once, paste it every time. Prompts are reusable tools.

The Three Most Common Setup Mistakes
  • Skipping Custom Instructions entirely — means re-explaining your specialty and preferences in every single conversation. Spend 10 minutes on this once and save hours over the following months.
  • Confusing Memory and Custom Instructions — Instructions are rules you write that always apply, exactly as written. Memory is facts Claude picks up and stores automatically over time. Both are useful. They are not the same thing and they do not replace each other.
  • Using the free tier for clinical work — the free tier uses a smaller model, has no memory, and loses context in longer conversations. For the tasks described in this course, Claude Pro ($20/month) is what you need.
2

Defining the Clinical Question

Match AI use to the task: counseling, documentation, differential diagnosis, or workflow.
Task taxonomy Use-case matching High vs. low risk
Not All Clinical Questions Are Equal

AI performs extraordinarily well for some clinical tasks and unreliably for others. The difference is not the AI — it's the nature of the task. A question requiring synthesis of general knowledge plays to AI's strengths. A question requiring precise retrieval of a specific current guideline exposes its weaknesses.

Competency 2 is the skill of knowing which kind of question you are asking before you ask it.

Task Taxonomy: High vs. Low Risk
Strong AI Use
Drafting & Summarizing
Patient letters, discharge summaries, consult notes — AI drafts, you verify and edit
Strong AI Use
Plain Language Translation
Converting clinical text to patient-accessible language for counseling
Strong AI Use
Differential Generation
Brainstorming a broad differential — AI as a cognitive checklist, not a diagnostic oracle
Moderate — Verify
Evidence Summaries
Summarizing published literature — content may be accurate but citations require verification
High Risk — Verify All
Drug Doses & Calculations
AI-generated doses must be verified in official prescribing references or pharmacopoeia
High Risk — Verify All
Specific Guideline Retrieval
Ask AI to summarize ACOG PB 203 — you may get an accurate summary, or a plausible fabrication
Before You Prompt: Three Questions
  1. What category is this task? Synthesis/drafting (AI strength) or precise retrieval (AI weakness)?
  2. What is the consequence of a wrong answer? Inconvenience vs. patient harm?
  3. Can I verify the output? If you cannot check it against a primary source, do not act on it.
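
The taxonomy and the three questions above can be sketched as a small lookup plus a decision function. This is an illustrative sketch, not a validated triage tool: the dictionary keys, the `Triage` class, and the advice strings are hypothetical names chosen here, and the tiers are transcribed from the taxonomy in this module.

```python
from dataclasses import dataclass

# Risk tier per task type, transcribed from the taxonomy above.
# The keys are hypothetical labels chosen for this sketch.
TASK_TIER = {
    "drafting_summarizing": "strong",
    "plain_language_translation": "strong",
    "differential_generation": "strong",
    "evidence_summary": "moderate",
    "drug_dose_or_calculation": "high",
    "specific_guideline_retrieval": "high",
}


@dataclass(frozen=True)
class Triage:
    tier: str        # "strong", "moderate", or "high"
    act_on_it: bool  # False means: do not act on the output
    advice: str


def triage(task: str, patient_harm_possible: bool, can_verify: bool) -> Triage:
    """Apply the three pre-prompt questions to one task."""
    tier = TASK_TIER[task]                       # Question 1: category
    if not can_verify:                           # Question 3: verifiable?
        return Triage(tier, False, "No primary source to check against: do not act on it.")
    if tier == "high" or patient_harm_possible:  # Question 2: consequence
        return Triage(tier, True, "Verify every claim against a primary source first.")
    if tier == "moderate":
        return Triage(tier, True, "Content may be accurate; verify each citation.")
    return Triage(tier, True, "AI drafts; you review and edit.")
```

For example, `triage("specific_guideline_retrieval", True, True)` lands in the high tier and returns the "verify every claim" advice, which matches the GBS case below.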
Case Example

An OB resident asks an LLM: "What is the ACOG recommendation for GBS prophylaxis in a penicillin-allergic patient with low anaphylaxis risk?" This is a specific guideline retrieval task. The AI may answer correctly — or may generate a plausible-sounding but incorrect regimen. This answer must be verified against the current ACOG/CDC GBS guideline before acting. This is non-negotiable.

3

Choosing the Right AI Tool

Select the model or platform that fits the clinical job, privacy needs, and setting.
Platform selection HIPAA compliance Consumer vs. enterprise
The Platform Landscape

Clinicians currently access AI through three broad categories of platforms, each with different capabilities, privacy guarantees, and appropriate use contexts. Choosing incorrectly exposes patients to privacy risk and clinicians to liability.

Tier 1 — Free
Free Consumer AI
Claude.ai (free), ChatGPT (free), Gemini (free). Limited context window, no memory, basic models. Not HIPAA-compliant. Use for de-identified learning only. Never enter PHI.
Tier 2 — Paid
Paid Consumer AI
Claude Pro ($20/mo), ChatGPT Plus ($20/mo), Gemini Advanced. Larger context windows, memory, web search, file uploads, better models. Individual accounts are still not HIPAA-compliant; the same PHI rules apply.
Tier 3
Enterprise AI
Claude for Enterprise, Azure OpenAI, Google Workspace AI. BAA available. Appropriate for institutional use with proper agreements.
Tier 4
EHR-Integrated
Epic Copilot (Nuance DAX), Oracle Health AI. Designed for PHI. Operated within your system's privacy framework. Lowest risk for clinical documentation.
Tier 5
Medical-Specific
Glass AI, Doximity AI, specialty tools. Purpose-built for clinical workflows with varying evidence bases.
The HIPAA Question

HIPAA applies whenever Protected Health Information (PHI) is involved. PHI includes names, dates, geographic data, contact information, MRNs, and any information that could identify a specific patient, either alone or in combination.

Non-Negotiable Rule

Never enter PHI into a consumer AI platform. This applies even if you intend to "de-identify" it yourself in real time. The practical definition: if a determined person with access to your inputs could re-identify the patient, it is PHI. Use synthetic or fully de-identified information, or use a platform with a signed Business Associate Agreement (BAA).

Decision Framework
  1. Does this task involve patient data? If yes → go to step 2. If no → any platform is acceptable.
  2. Is the data PHI? If yes → only EHR-integrated or enterprise tools with BAA. If no → enterprise or consumer tools.
  3. Does your institution have an AI policy? If yes → follow it. If no → advocate for one and default to conservative settings.
  4. Does the task require real-time data or clinical calculations? Use a purpose-built tool with validated algorithms, not a general LLM.
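
The four steps above can be written out as a short function, which makes the branching explicit. This is a minimal sketch under stated assumptions: `choose_platform`, `PlatformAdvice`, and the platform labels are hypothetical names, and the logic only restates the framework. It does not replace your institution's policy or legal advice.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformAdvice:
    allowed: list[str]
    notes: list[str] = field(default_factory=list)


def choose_platform(involves_patient_data: bool, is_phi: bool,
                    institution_has_policy: bool,
                    needs_validated_calculation: bool) -> PlatformAdvice:
    """Walk the four-step decision framework and collect the outcome."""
    # Steps 1 and 2: patient data and PHI status determine the platform tier.
    if not involves_patient_data:
        allowed = ["any platform"]
    elif is_phi:
        allowed = ["ehr_integrated", "enterprise_with_baa"]
    else:
        allowed = ["enterprise", "consumer"]
    advice = PlatformAdvice(allowed)
    # Step 3: institutional policy takes priority where it exists.
    if institution_has_policy:
        advice.notes.append("Follow your institution's AI policy.")
    else:
        advice.notes.append("No policy: advocate for one and default to conservative settings.")
    # Step 4: calculations and real-time data call for purpose-built tools.
    if needs_validated_calculation:
        advice.notes.append("Use a purpose-built tool with validated algorithms, not a general LLM.")
    return advice
```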
4

Verification and Source Validation

Check facts, guidelines, references, doses, and calculations before clinical use.
Verification workflow Reference validation Primary sources
The Verification Imperative

No AI output should be used in clinical practice without verification of the specific factual claims it contains. This is not a sign that AI is unreliable — it is the appropriate professional standard, analogous to verifying a trainee's management plan against primary references before signing off on it.

The practical question is not whether to verify, but how much and how quickly — calibrated to the potential consequence of an error.

What to Verify (and How)
Always Verify
Drug Doses
Check against FDA prescribing information, clinical pharmacopoeia, or institutional formulary. Never act on AI-generated doses alone.
Always Verify
Cited References
Look up every citation in PubMed or the journal directly. Confirm title, authors, journal, year, volume, pages, and DOI.
Always Verify
Guideline Recommendations
Check the actual ACOG/SMFM/FIGO source document. AI summarizes — it does not reproduce guidelines exactly.
Risk-Calibrated
Clinical Calculations
EDD, gestational age, Bishop score — verify with validated tools or calculators, not LLM math.
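
To illustrate why calculations belong in deterministic tools, here is a minimal sketch of date arithmetic of the kind a calculator performs. It assumes dating by the first day of the last menstrual period (LMP) with the conventional 280-day (40-week) estimate, and it ignores ultrasound redating and cycle-length adjustments. The function names are hypothetical, and this is not a validated clinical calculator.

```python
from datetime import date, timedelta

PREGNANCY_DAYS = 280  # conventional 40 weeks from the first day of the LMP


def estimated_due_date(lmp: date) -> date:
    """EDD as LMP plus 280 days (an assumption of this sketch)."""
    return lmp + timedelta(days=PREGNANCY_DAYS)


def gestational_age(lmp: date, on: date) -> tuple[int, int]:
    """Gestational age on a given date as (completed weeks, extra days)."""
    days = (on - lmp).days
    if days < 0:
        raise ValueError("assessment date is before the LMP")
    return divmod(days, 7)
```

Unlike an LLM, this returns the same result for the same inputs every time, and each step can be checked by hand.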
Reference Hallucination

LLMs frequently generate citations that appear real but do not exist — fabricated author combinations, invented journal volumes, or real journal names attached to fictional articles. This is not intentional deception; it is the predictable output of a system that generates plausible-sounding text without ground truth access.

Published Evidence

Multiple studies have documented high rates of AI-generated citation fabrication in medical contexts. A 2023 study found that 30–47% of AI-generated references in legal and academic contexts were partially or fully fabricated. Rates in clinical contexts appear to be similar. Every AI-generated citation must be verified in PubMed before use in any professional document.

Verification Rule

Search PubMed by PMID if provided, or by first author + keywords + year. Confirm: authors match, title matches, journal matches, volume/issue/pages match, DOI resolves. If any element fails, the reference should be flagged and not used.
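
The field-by-field comparison can be sketched as below. It assumes you have already looked the reference up yourself and typed the PubMed record into a dictionary; nothing here queries PubMed or checks that the DOI resolves. The function names and the placeholder citation data are hypothetical.

```python
REQUIRED_FIELDS = ("authors", "title", "journal", "year", "volume", "pages", "doi")


def _normalize(value: object) -> str:
    """Lowercase, unify dashes, and collapse whitespace before comparing."""
    text = str(value).lower().replace("\u2013", "-")
    return " ".join(text.split())


def failed_fields(ai_citation: dict, looked_up: dict) -> list[str]:
    """Return fields that are missing on either side or do not match."""
    failed = []
    for name in REQUIRED_FIELDS:
        a = _normalize(ai_citation.get(name, ""))
        b = _normalize(looked_up.get(name, ""))
        if not a or not b or a != b:
            failed.append(name)
    return failed


def usable(ai_citation: dict, looked_up: dict) -> bool:
    """A reference is usable only if every element matches."""
    return not failed_fields(ai_citation, looked_up)


# Placeholder data for illustration; not a real publication.
record = {"authors": "Doe J, et al.", "title": "An example title",
          "journal": "Example J Med", "year": 2023, "volume": "12",
          "pages": "100\u2013110", "doi": "10.0000/example"}
ai_version = dict(record, pages="100-110", volume="21")
```

Here `failed_fields(ai_version, record)` reports only the volume: the dash difference in the page range is normalized away, but the transposed volume number is caught.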

5

Clinical Context Engineering and Agentic Workflows

Frame the patient context, task, and decision point. Use better prompts, stepwise agentic workflows, and vibe coding when useful.
PRECISE framework Agentic workflows Multi-step AI Vibe coding
Why It Matters

AI output quality depends heavily on input quality. An LLM cannot compensate for vague or incomplete clinical framing: it will confidently fill gaps with plausible-sounding but potentially incorrect assumptions. In clinical practice, those assumptions can lead to harm.

The clinician's most important AI skill is not knowing which model to use — it is knowing how to structure what you give it.

Core Principle

GIGO applies with extra clinical weight: Garbage In, Garbage Out. In consumer tasks, a bad AI answer wastes time. In clinical tasks, it may influence management, documentation, or counseling.

The PRECISE Framework

Use this structure to build effective clinical prompts. Each element reduces the probability of a clinically misleading response.

  • P (Patient): Age, gestational age if applicable, parity, key diagnoses, relevant medications, allergies
  • R (Role): State your clinical role: "I am an MFM specialist," "I am a resident on L&D"
  • E (End Goal): What decision or output do you need? Differential? Counseling language? Documentation?
  • C (Constraints): What limitations apply? Specific guidelines, hospital policy, patient preferences, formulary
  • I (Instructions): Format requirements: bullet list, lay language, 150 words, SOAP format, Vancouver citations
  • S (Safety Rails): State what NOT to include: "Do not suggest treatments not on our formulary," "Flag any dose that exceeds guidelines"
  • E (Example, optional): Provide a sample output or style you want to match. One example can calibrate the format precisely.
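
One way to make the framework habitual is a small template that refuses to produce a prompt until the six required elements are filled in. This is a sketch, not part of any AI product: the `PrecisePrompt` class and its field names are hypothetical, and the demo values are synthetic, de-identified text.

```python
from dataclasses import dataclass


@dataclass
class PrecisePrompt:
    patient: str
    role: str
    end_goal: str
    constraints: str
    instructions: str
    safety_rails: str
    example: str = ""  # the only optional element

    def render(self) -> str:
        """Assemble the prompt, one labeled line per PRECISE element."""
        sections = [
            ("Patient", self.patient),
            ("Role", self.role),
            ("End goal", self.end_goal),
            ("Constraints", self.constraints),
            ("Instructions", self.instructions),
            ("Safety rails", self.safety_rails),
            ("Example", self.example),
        ]
        # Every element except the optional example must be present.
        missing = [label for label, text in sections[:-1] if not text.strip()]
        if missing:
            raise ValueError(f"Missing PRECISE elements: {', '.join(missing)}")
        return "\n".join(f"{label}: {text.strip()}"
                         for label, text in sections if text.strip())


# Synthetic, de-identified demo values.
demo = PrecisePrompt(
    patient="34-year-old G2P1 at 36+4 weeks, new-onset BP 152/98 and 148/96, proteinuria 2+",
    role="I am an OB attending",
    end_goal="Summarize diagnostic criteria and management decision points",
    constraints="Per current ACOG guidance",
    instructions="Bullet list",
    safety_rails="Flag any areas of clinical controversy",
)
```

Calling `demo.render()` yields six labeled lines that can be pasted into a chat; leaving any required element blank raises an error instead of producing a weak prompt.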
Clinical Scenarios
Weak Prompt → Risk

"What should I do with this patient who has hypertension?"
AI cannot distinguish gestational hypertension from chronic hypertension from preeclampsia. Any management suggestion could be wrong and dangerous.

Strong Prompt → Safer

"I am an OB attending. My patient is a 34-year-old G2P1 at 36+4 weeks with new-onset BP readings of 152/98 and 148/96, 6 hours apart, proteinuria 2+, no other symptoms. Summarize the ACOG diagnostic criteria for preeclampsia without severe features and list the management decision points per current ACOG guidance. Flag any areas of clinical controversy."

Never Include

Patient name, date of birth, MRN, or any identifying information in a consumer AI platform (Claude.ai, ChatGPT, Gemini). Use synthetic or de-identified information only.

Practical Tips
Agentic AI: What It Is and Where to Use It

The word "agentic" simply means that the AI performs a sequence of steps toward a goal, rather than answering a single question. Instead of one prompt → one answer, you give the AI a goal and it figures out and executes multiple steps to reach it — searching the web, reading documents, running calculations, writing code — before delivering the final result.

The Key Point Most Guides Miss

Agentic AI is not a separate platform or product you need to find and sign up for. It is already built into the same Claude Pro, ChatGPT Plus, and other paid AI tools you may already have. The difference is not which app you use — it is whether you have turned on the right tools and given the AI a multi-step goal rather than a single question.

Where Agentic AI Lives Right Now — Concretely
Claude Pro (claude.ai) — What you already have

When you turn on Web Search and give Claude a multi-step goal, it becomes agentic. Example: "Search for the three most recent RCTs on low-dose aspirin for preeclampsia prevention, summarize each one, then give me a paragraph synthesis with implications for my practice."

Claude searches the web multiple times, reads the results, decides what to keep, summarizes each paper, then writes the synthesis — all in one go. You gave one goal; it executed many steps. That is agentic AI. You did not install anything new.

Deep Research mode — The most explicitly agentic feature

Available in Claude Pro and ChatGPT Plus. You give a research question; the AI runs 10–30 sequential web searches over several minutes, reads and evaluates sources, resolves contradictions, and produces a structured report with citations. To use it in Claude: type your question, then look for a “Deep Research” button or option before hitting send. In ChatGPT Plus: look for the research option in the same area. This is a fully autonomous multi-step workflow — you provide the question, the AI does everything else.

Epic Copilot / Nuance DAX — Agentic AI already in your workflow

If your hospital uses Epic with AI documentation tools, you are already using agentic AI. The system listens to your patient encounter, transcribes it in real time, extracts the HPI, exam, assessment, and plan, structures them into a SOAP note, and drafts it for your review — all automatically from a single trigger (starting the recording). You set the goal (document this encounter); the AI executes all the steps. No prompting required.

Sequential prompting — Agentic behavior without special tools

Even in a basic conversation, you can create an agentic-style workflow by writing a prompt that tells Claude to complete a series of steps in order before delivering the final answer. Example: "First summarize this patient context. Then identify the three most relevant risk factors. Then draft a patient counseling letter addressing those specific risk factors at 7th-grade reading level. Do all three steps before giving me the letter." No special features required — just a well-structured multi-step prompt.

Practical Summary
  • You already have agentic AI if you have Claude Pro or ChatGPT Plus — turn on web search and give it a multi-step goal
  • Deep Research mode is the most powerful built-in agentic feature — look for it in the interface before sending a complex research question
  • No separate platform needed for most clinical agentic workflows — the distinction is how you prompt, not where you are
  • EHR-integrated tools (Epic Copilot, DAX) are purpose-built agentic systems — if your hospital has them, they are the safest option for PHI-containing workflows
Safety Rules for Agentic AI

Agentic workflows amplify both efficiency and error propagation. An error in step 1 (wrong patient context, wrong retrieved guideline) may not be visible until the step 4 output has already been shaped by it. The longer the chain, the more important early verification.

Agentic Safety Rules
  • Verify the foundation: If step 1 context or retrieved information is wrong, every downstream step is potentially wrong. Verify early-stage outputs before approving continuation.
  • Define explicit human checkpoints for high-stakes clinical workflows — do not allow fully autonomous execution on patient-affecting decisions
  • Maintain full-chain accountability: You are responsible for the final output including errors introduced in intermediate steps you did not review
  • PHI applies throughout: Every step of an agentic workflow is subject to the same HIPAA rules as a single-turn interaction
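
The checkpoint idea can be sketched as a loop that pauses after every step and continues only if a reviewer approves that step's output. This is an illustrative sketch: `run_with_checkpoints`, `CheckpointRejected`, and the step and reviewer callables are hypothetical, and in practice the reviewer would be a clinician reading the intermediate output rather than a function.

```python
from typing import Callable

Step = Callable[[str], str]
Reviewer = Callable[[str, str], bool]  # (step name, step output) -> approved?


class CheckpointRejected(Exception):
    """Raised when the reviewer rejects a step's output."""


def run_with_checkpoints(steps: list[tuple[str, Step]], context: str,
                         reviewer: Reviewer) -> str:
    """Run steps in order, stopping at the first rejected output."""
    current = context
    for name, step in steps:
        current = step(current)
        # Human checkpoint: later steps never see unapproved output.
        if not reviewer(name, current):
            raise CheckpointRejected(f"Stopped after step '{name}'")
    return current
```

Because a rejected early step halts the chain, an error in retrieved context cannot silently shape the later drafting steps.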
The Supervision Principle

As AI workflows become more autonomous, the clinician's role shifts from executor to supervisor. This is a higher-order skill. Effective supervision requires understanding each step's purpose, the errors each step might introduce, and the verification required before any output is used clinically.

Vibe Coding: Building Clinical Tools Without Writing Code

Vibe coding is the practice of creating functional software — calculators, forms, trackers, decision-support tools — by describing what you want in plain language and iterating conversationally with an AI until the result works. The name reflects the approach: you describe the feel, function, and purpose of the tool you want; the AI writes the code. You never touch the code directly.

This is not a theoretical future capability. It is available now in Claude Pro, ChatGPT Plus, and similar paid tiers. The tools at ObGyn Intelligence (tools.obmd.com) — risk calculators, informed consent generators, screening tools, and interactive educational modules including this course — are built entirely this way. No programming background required.

What Vibe Coding Changes for Clinicians

Until recently, building a custom clinical tool required hiring a developer ($5,000–$50,000+), waiting months, and depending on someone else to maintain it. Vibe coding compresses this to hours, at near-zero cost. A clinician who identifies a workflow gap on Monday can have a working prototype by Tuesday. This democratizes clinical tool development in a way nothing has before.

How to Vibe Code a Clinical Tool: Step by Step
Step 1
Describe the Tool
Tell Claude exactly what you want: "Build an HTML single-page tool that calculates VBAC success probability based on age, prior vaginal birth, BMI, and indication for prior cesarean." Be specific about inputs and outputs.
Step 2
Specify the Design
Describe visual requirements: "Use a navy and cream color scheme, professional medical appearance, mobile-friendly layout. Add a disclaimer that this is for educational purposes." Include branding if relevant.
Step 3
Review the Output
Claude produces the working HTML/code. Download it and open in a browser. Test every input combination. Does it behave as intended? Are the results clinically correct?
Step 4
Iterate in Plain Language
"The BMI field accepts negative numbers — add a validation. The results section should also show a confidence interval. Change the button color to gold." Each refinement is a plain-language instruction.
Step 5
Verify Clinical Content
The most important step. Verify that every calculation, threshold, and recommendation the tool produces is accurate and matches current guidelines. The code may be correct; the clinical logic must be validated by you.
Step 6
Deploy
Tools built as single HTML files can be deployed to Netlify (netlify.com) by drag-and-drop — free hosting in under 60 seconds. Share via URL with colleagues or patients.
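
Steps 3 to 5 amount to testing the tool against answers you worked out independently. A minimal sketch of that idea, written in Python for illustration (a vibe-coded tool would more often be HTML and JavaScript): a BMI function with the input validation described in Step 4, plus a check against hand-calculated reference cases. The function names and reference cases are assumptions of this sketch.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2, rejecting non-positive inputs."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Reference cases worked out by hand: (weight kg, height m) -> expected BMI.
REFERENCE_CASES = {(70.0, 1.75): 22.86, (90.0, 1.60): 35.16}


def check_against_reference() -> list[tuple[float, float]]:
    """Return the inputs where the tool disagrees with the hand calculation."""
    return [inputs for inputs, expected in REFERENCE_CASES.items()
            if not math.isclose(bmi(*inputs), expected, abs_tol=0.005)]
```

An empty list from `check_against_reference()` means the code agrees with your reference cases. It says nothing about whether the clinical thresholds built on top of the number are current, which is still yours to verify.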
Clinical Tools You Can Build in One Session
Patient-Facing Tools
  • Gestational diabetes diet and monitoring guide personalized by trimester and glucose targets
  • Postpartum warning signs checklist customized by delivery type and comorbidities
  • Informed consent summary document for specific procedures at 7th-grade reading level
  • Fetal kick count tracker with trend analysis
Clinician-Facing Tools
  • Modified Bishop score calculator with delivery recommendation
  • GBS prophylaxis decision guide by penicillin allergy status and antibiotic susceptibility results
  • Preeclampsia risk stratification tool based on SMFM criteria
  • Personalized CME tracker with specialty-specific logging
Vibe Coding Safety Rules
  • You own the clinical content: AI writes the code; you verify the medicine. Never deploy a clinical tool without validating every calculation, threshold, and recommendation against the primary evidence base.
  • Disclaimer required: Every patient-facing tool must include a clear disclaimer that it is for educational or informational purposes and does not replace clinical judgment or a consultation with a licensed provider.
  • PHI still applies: Tools designed to collect or process patient data must comply with HIPAA. Do not vibe code a PHI-collecting tool using consumer AI; use enterprise tools with appropriate data agreements.
  • Version control matters: When you update a deployed tool, ensure the previous version is archived. Clinical tools used in practice should have a documented version history.
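One way to keep the documented version history that the last rule calls for is a small log that fingerprints each deployed file. This is a minimal sketch using only the Python standard library; the `ToolVersion` fields, the sample file contents, and the reviewer name are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToolVersion:
    version: str
    released: date
    sha256: str        # fingerprint of the exact file that was deployed
    change_note: str
    verified_by: str   # clinician who validated the clinical content


def record_version(html: bytes, version: str, released: date,
                   change_note: str, verified_by: str) -> ToolVersion:
    """Create a log entry tied to the exact bytes of the deployed file."""
    digest = hashlib.sha256(html).hexdigest()
    return ToolVersion(version, released, digest, change_note, verified_by)


# Hypothetical history for a single-file tool.
history: list[ToolVersion] = []
history.append(record_version(b"<html>v1</html>", "1.0", date(2025, 1, 6),
                              "Initial release", "Dr. A"))
history.append(record_version(b"<html>v2</html>", "1.1", date(2025, 2, 3),
                              "Added BMI input validation", "Dr. A"))

for entry in history:
    print(entry.version, entry.released, entry.sha256[:12], entry.change_note)
```

Because the hash changes whenever the file changes, the log can later show which version was live on a given date, provided the archived files are kept alongside it.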
💡 Advanced Tip: The Step-by-Step Questioning Prompt

One of the most powerful but underused prompting techniques is instructing the AI to gather information through sequential questions before generating its output — rather than attempting a response with incomplete data. This mirrors how expert clinicians actually reason: structured data collection first, assessment second.

This technique is particularly valuable for complex clinical assessments where missing one piece of information changes the entire interpretation. The AI becomes an interactive clinical framework, not a static text generator.

The Pattern

Instruct the AI explicitly: "Do not give me an assessment yet. Instead, ask me one question at a time about [clinical topic]. After I answer each question, ask the next one. When you have gathered all the information you need, give me a structured assessment."

🎯 WORKED EXAMPLE — CTG Interpretation Prompt

Copy and paste this prompt to have Claude guide you through a systematic CTG interpretation:

I want to practice systematic CTG (cardiotocogram) interpretation. Do NOT give me an assessment yet.

Instead, ask me questions ONE AT A TIME in this exact order, waiting for my answer before proceeding:

1. First ask: What is the baseline fetal heart rate (in bpm)?
2. Then ask: How would you describe the baseline variability? (absent/minimal/moderate/marked)
3. Then ask: Are there accelerations present? If yes, are they spontaneous or provoked?
4. Then ask: Are there decelerations? If yes — what type: early, late, variable, or prolonged?
5. Then ask: If variable or late decelerations are present — describe their depth, duration, and recovery.
6. Then ask: What are the uterine contractions like — frequency (per 10 min), duration, and resting tone?
7. Then ask: What is the gestational age and clinical context (labor, antepartum, post-dates, IOL)?
8. Then ask: Any relevant maternal or fetal risk factors (e.g., GDM, IUGR, epidural, oxytocin)?

After I have answered all 8 questions, give me:
• A structured NICHD classification (Category I / II / III)
• The clinical reasoning behind the classification
• Recommended immediate management steps
• Any features that would trigger escalation or expedited delivery
• One teaching point about the most significant finding in this tracing

This prompt template is reusable. Modify the clinical domain to create structured assessment prompts for any complex scenario: shoulder dystocia management, postpartum hemorrhage staging, eclampsia protocols, or neonatal resuscitation decision trees. The principle of sequential structured questioning before assessment tends to produce more complete and clinically useful AI outputs than presenting all information in a single unstructured block.
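For clinicians who reuse the pattern often, the template can be assembled from its three parts: topic, ordered questions, and requested deliverables. The sketch below is one possible way to do that; the function name `build_questioning_prompt` is hypothetical, and the example passes only the first two CTG questions to keep it short.

```python
def build_questioning_prompt(topic: str, questions: list[str],
                             deliverables: list[str]) -> str:
    """Assemble a sequential-questioning prompt from its three parts."""
    lines = [
        f"I want to practice systematic {topic}. Do NOT give me an assessment yet.",
        "",
        "Instead, ask me questions ONE AT A TIME in this exact order, "
        "waiting for my answer before proceeding:",
        "",
    ]
    for number, question in enumerate(questions, start=1):
        opener = "First ask" if number == 1 else "Then ask"
        lines.append(f"{number}. {opener}: {question}")
    lines.append("")
    lines.append(f"After I have answered all {len(questions)} questions, give me:")
    lines.extend(f"• {item}" for item in deliverables)
    return "\n".join(lines)


prompt = build_questioning_prompt(
    topic="CTG (cardiotocogram) interpretation",
    questions=[
        "What is the baseline fetal heart rate (in bpm)?",
        "How would you describe the baseline variability? "
        "(absent/minimal/moderate/marked)",
    ],
    deliverables=[
        "A structured NICHD classification (Category I / II / III)",
        "The clinical reasoning behind the classification",
    ],
)
print(prompt)
```

Swapping the topic, question list, and deliverables yields the same structure for another scenario; the clinical content of each list still has to come from, and be checked by, the clinician.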

Why This Works Better

Structured sequential input ensures that each clinical variable is stated explicitly before the AI synthesizes an assessment, so nothing is silently omitted or assumed. It also forces you to observe each feature systematically, which is itself the educational value of the exercise.

Other Clinical Applications
  • Systematic pelvic exam assessment
  • Ultrasound biophysical profile scoring
  • Preeclampsia severity classification
  • Postpartum hemorrhage staging and response
6

Recognizing AI Failure Modes

Detect hallucinations, outdated guidance, bias, sycophancy, and overconfidence.
Hallucination · Sycophancy · Bias detection
Taxonomy of AI Failures

AI failures are not random — they follow recognizable patterns. Training yourself to anticipate these patterns is the clinical equivalent of knowing the common complications of a procedure: you cannot prevent all of them, but you can watch for them and catch them early.

Failure Mode 1
Hallucination
Confident generation of false information — fabricated citations, nonexistent drugs, invented studies. Most dangerous because the text is indistinguishable from accurate output.
Failure Mode 2
Sycophancy
AI agrees with the user's framing even when wrong. If you state an incorrect premise in your prompt, the AI is likely to validate it rather than correct it.
Failure Mode 3
Outdated Guidance
Confidently recommending superseded management strategies based on training data that predates a guideline revision.
Failure Mode 4
Demographic Bias
Systematic differences in output quality or recommendations by race, sex, or insurance status, inherited from biased training data.
Failure Mode 5
Overconfidence
AI states uncertain or contested information with the same confident tone as established facts, giving no signal about epistemic status.
Sycophancy in Clinical Practice

Sycophancy is among the most clinically dangerous failure modes because it is invisible unless you test for it. An LLM trained to be helpful and agreeable will tend to validate your framing, including wrong framings. The effect resembles premature closure, the diagnostic error of accepting a conclusion before it has been fully verified, except that here the AI reinforces the early conclusion and will rarely push back on its own.

Demonstration

"My patient at 37+2 weeks is full term and I'm planning induction. What are the benefits?"
An AI is likely to answer this question supportively, describing induction benefits, rather than correcting the terminology (per ACOG, 37+2 weeks is early term, not full term). This is sycophancy in practice.

Counter-Strategy

Deliberately test AI outputs with adversarial prompts: "What would be the strongest argument against this plan?" or "Under what circumstances would this recommendation be wrong?" This forces the model out of validation mode.
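The counter-strategy can be turned into a routine by keeping the challenge questions in one place and pairing them with each plan. This is a minimal sketch; the helper name `adversarial_prompts` is hypothetical, and the third challenge is an added assumption aimed at the wrong-premise problem shown in the demonstration.

```python
CHALLENGES = [
    "What would be the strongest argument against this plan?",
    "Under what circumstances would this recommendation be wrong?",
    # Assumption of this sketch: an extra check aimed at wrong premises.
    "Is any term or premise in my question incorrect? If so, correct it first.",
]


def adversarial_prompts(plan: str) -> list[str]:
    """Pair a stated plan with each challenge question."""
    return [f"My plan: {plan}\n{challenge}" for challenge in CHALLENGES]


for prompt in adversarial_prompts("Induction of labor at 37+2 weeks"):
    print(prompt, end="\n\n")
```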

Red Flags in AI Output
7

AI-Supported Clinical Reasoning, Not Memorization

Use AI to handle retrieval and memorization so you can focus on higher-level judgment, interpretation, and patient-centered care.
Cognitive augmentation · Accountability · Judgment calibration
The Partnership Model

The most effective clinical AI use is not replacement of clinical reasoning — it is augmentation of it. Think of AI as a highly read, tireless, linguistically fluent colleague who has never examined a patient, cannot observe clinical context, and carries no legal or professional accountability for what they suggest.

You bring the clinical examination, the patient relationship, the institutional context, the professional accountability, and the judgment. AI brings breadth of text, speed of synthesis, and tireless availability.

Fundamental Principle

Physician accountability is non-delegable. No level of AI capability transfers professional or legal responsibility for a clinical decision. The clinician who acts on AI output owns the outcome of that action.

Cognitive Offloading Risk

A well-documented cognitive science phenomenon: when we rely on external tools for mental tasks, we gradually lose the internal capacity to perform those tasks without the tool. This is acceptable for calculator arithmetic — it may not be acceptable for clinical reasoning skills that must remain available in an emergency, at 3 AM, without internet access.

When to Trust, Override, or Escalate
Use AI Output Directly (after verification)

Documentation drafts, patient education materials, coding suggestions, literature search starting points, administrative templates

Use AI as Input, Apply Clinical Judgment

Differential diagnosis generation, management option lists, guideline summaries — AI provides the draft, your clinical assessment determines the plan

Override AI and Verify Independently

Drug doses, specific guideline thresholds, time-sensitive management decisions, any situation where your clinical assessment contradicts the AI output. Your clinical judgment is more reliable here than the text predictor.
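The three tiers above can be written down as a simple lookup, which is useful when drafting a departmental policy. The sketch below is illustrative only: the task labels are taken from the lists above, while the names `Handling` and `handling_for` and the cautious default for unlisted tasks are assumptions.

```python
from enum import Enum


class Handling(Enum):
    USE_AFTER_VERIFICATION = "Use AI output directly (after verification)"
    INPUT_TO_JUDGMENT = "Use AI as input, apply clinical judgment"
    VERIFY_INDEPENDENTLY = "Override AI and verify independently"


TASK_HANDLING = {
    "documentation draft": Handling.USE_AFTER_VERIFICATION,
    "patient education material": Handling.USE_AFTER_VERIFICATION,
    "coding suggestion": Handling.USE_AFTER_VERIFICATION,
    "differential diagnosis": Handling.INPUT_TO_JUDGMENT,
    "management option list": Handling.INPUT_TO_JUDGMENT,
    "guideline summary": Handling.INPUT_TO_JUDGMENT,
    "drug dose": Handling.VERIFY_INDEPENDENTLY,
    "guideline threshold": Handling.VERIFY_INDEPENDENTLY,
}


def handling_for(task: str, contradicts_clinical_assessment: bool = False) -> Handling:
    """Look up the tier for a task; a contradiction always forces verification."""
    if contradicts_clinical_assessment:
        return Handling.VERIFY_INDEPENDENTLY
    # Unlisted tasks default to the most cautious tier (an assumption of this sketch).
    return TASK_HANDLING.get(task.lower(), Handling.VERIFY_INDEPENDENTLY)


print(handling_for("Drug dose").value)
print(handling_for("guideline summary", contradicts_clinical_assessment=True).value)
```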

8

AI in Patient Counseling and Communication

Improve readability, plain-language explanation, informed consent, and correction of misinformation.
Health literacy · Informed consent · Misinformation
AI as a Communication Bridge

The average US adult reads at approximately an 8th-grade level, yet most standard informed consent documents are written at a 12th-grade level or above. AI can translate complex clinical information into accessible language quickly and consistently, which makes this one of its most clinically valuable and underutilized applications.

High-Yield Application

AI-generated plain-language summaries of complex procedures, diagnoses, and treatment options can reduce the comprehension gap that underlies many informed consent failures. The translation must be verified by the clinician for accuracy before use.
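A reading-level target can be checked rather than assumed. The sketch below estimates the Flesch-Kincaid grade level, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The syllable counter is a rough vowel-group heuristic, so the result is an approximation for comparing drafts and not a validated readability score; both sample sentences are invented for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate US school grade level needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


plain = "We will make a small cut. You will be asleep. It will not hurt."
technical = ("The procedure necessitates an abdominal incision performed under "
             "general anesthesia following comprehensive preoperative evaluation.")

print(round(flesch_kincaid_grade(plain), 1))
print(round(flesch_kincaid_grade(technical), 1))
```

The score only measures sentence and word length. A low grade level says nothing about whether the risks and percentages in the text are correct, which remains the clinician's check.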

Supporting Informed Consent

Informed consent requires that patients receive information in a form they can understand and process. AI can help generate:

Critical Constraint

AI-generated patient materials require clinician review before delivery. Errors in patient-facing materials — particularly regarding risk percentages, procedure outcomes, or medication instructions — can directly harm patients and create significant liability exposure.

Correcting Misinformation

Patients increasingly arrive with AI-generated information — from Claude.ai, ChatGPT, Google Gemini, or health-specific chatbots. This information may be accurate, outdated, exaggerated, or simply wrong.

A productive clinical response is not to dismiss AI-sourced information but to engage with it specifically: acknowledge what is accurate, correct what is not, and explain the source of the discrepancy in plain language. AI can help you draft these explanations.

Example Response Framework

"What you read is partially accurate — [correct part]. The concern is that [specific correction]. Here's why your care plan is based on something different: [brief explanation]. The guideline I'm working from is [source]."

9

Privacy, Ethics, Documentation, and Liability

Protect PHI, document responsibly, respect consent, and understand medicolegal risk.
PHI protection · Documentation · Medicolegal risk
PHI and the AI Privacy Boundary

HIPAA's Privacy and Security Rules apply to the use of AI tools when PHI is involved. Consumer AI platforms (Claude.ai, ChatGPT, Gemini) are not HIPAA-covered entities and do not sign Business Associate Agreements with individual clinicians. Inputting PHI into these platforms constitutes a potential HIPAA violation, regardless of intent.

Specific Risk Areas
  • Pasting a clinical note (even one you wrote) containing patient name, age, and diagnosis
  • Uploading pathology reports or imaging results to summarize
  • Using AI to draft referral letters with patient-identifying information
  • Asking AI to help you think through a specific patient case with identifying details
Safe Practice

De-identify completely before using consumer AI: replace the name with "Ms. A," remove specific dates (use relative timing), remove geographic details, and use an age range rather than an exact age. When in doubt, use your EHR-integrated AI tool or an enterprise platform with a signed BAA.
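Two of these steps, age ranges and relative timing, are mechanical enough to sketch in code. The helpers below are hypothetical and cover only those two steps; they do not find or remove names, locations, or other identifiers, so they are not a de-identification tool. Grouping ages of 90 and above into a single band follows the HIPAA Safe Harbor convention.

```python
from datetime import date


def age_range(age: int, width: int = 10) -> str:
    """Replace an exact age with a band, e.g. 34 -> '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= 90:
        # Safe Harbor groups all ages of 90 and above into one category.
        return "90+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def relative_timing(event: date, reference: date) -> str:
    """Express a date relative to a reference point instead of stating it."""
    days = (event - reference).days
    if days == 0:
        return "day 0"
    direction = "after" if days > 0 else "before"
    unit = "day" if abs(days) == 1 else "days"
    return f"{abs(days)} {unit} {direction} day 0"


print(age_range(34))                                            # 30-39
print(relative_timing(date(2025, 3, 14), date(2025, 3, 10)))    # 4 days after day 0
```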

Documentation Standards

Emerging professional standards increasingly require clinicians to document when AI was used in clinical care. The rationale: transparency supports accountability, allows audit of AI influence on outcomes, and protects the clinician from liability by demonstrating that human judgment was applied.

Liability Framework

Current legal analysis places liability for AI-assisted clinical decisions squarely on the clinician who acted on the AI output. There is no viable "the AI told me to" defense in medical malpractice. The professional standard of care is not lowered because an AI was involved; it may even be argued to rise where AI use was careless or not appropriately supervised.

Ethical Consideration

Informed consent for AI involvement in care is an evolving ethical obligation. Patients have a reasonable expectation of knowing whether AI systems are participating in their diagnostic or treatment planning process. As AI becomes more integrated into clinical workflows, disclosure standards will develop. Adopt a posture of transparency now.

10

Workflow Integration, Governance, and Standard of Care

Build safe implementation, team oversight, policy, and readiness for changing standards.
Governance · Implementation · Standard of care
Why Governance Matters

Individual AI competency is necessary but not sufficient for safe clinical AI adoption. When AI tools operate without institutional governance, individual variation in use creates unequal care, unmanaged liability, and undetected errors. A single hallucinated drug dose that reaches a patient is a governance failure as much as an individual failure.

The Governance Framework
Step 1
Inventory
Identify all AI tools currently in use by clinical staff, formally and informally. What is being used, for what tasks, by whom?
Step 2
Policy
Define approved tools, approved use cases, prohibited uses, and documentation standards. Get legal and compliance review.
Step 3
Training
All clinical staff using AI must complete AI literacy training — including these 10 competencies or equivalent content.
Step 4
Oversight
Establish a mechanism for reporting AI-related errors or near-misses. Review quarterly. Update policy as tools and evidence evolve.
Step 5
Monitoring
Track patient outcomes associated with AI-assisted care. Audit documentation quality. Watch for emerging liability patterns.
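The oversight and monitoring steps depend on a register that can be summarized each quarter. The sketch below shows one minimal shape such a register could take; the `AIEvent` fields, the tool names, and the sample events are hypothetical, and a real register would live in the institution's incident reporting system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AIEvent:
    reported: date
    tool: str
    failure_mode: str      # e.g. "hallucination", "outdated guidance"
    reached_patient: bool  # False means a near-miss


def quarter_of(d: date) -> str:
    """Label a date with its calendar quarter, e.g. 2025-Q1."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def quarterly_summary(events: list[AIEvent], quarter: str) -> dict:
    """Count events, patient-reaching events, and failure modes for one quarter."""
    in_quarter = [e for e in events if quarter_of(e.reported) == quarter]
    return {
        "total": len(in_quarter),
        "reached_patient": sum(e.reached_patient for e in in_quarter),
        "by_failure_mode": dict(Counter(e.failure_mode for e in in_quarter)),
    }


# Hypothetical sample data.
events = [
    AIEvent(date(2025, 1, 14), "Tool A", "hallucination", False),
    AIEvent(date(2025, 2, 11), "Tool A", "outdated guidance", True),
    AIEvent(date(2025, 3, 30), "Tool B", "hallucination", False),
    AIEvent(date(2025, 4, 2), "Tool B", "sycophancy", False),
]

print(quarterly_summary(events, "2025-Q1"))
```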
Standard of Care: A Shifting Landscape

The standard of care in medicine is defined by what a reasonably prudent practitioner would do under similar circumstances. As AI becomes embedded in clinical workflows, two opposing risks emerge:

  1. Early adoption risk: Using AI tools that are not validated for clinical use, without appropriate oversight, creating liability exposure
  2. Non-adoption risk: As AI-assisted care improves outcomes, the reasonable standard may evolve to expect AI use — creating liability for those who do not use available tools appropriately
Bottom Line

Clinicians who develop AI literacy now — and document it — are better positioned for both the current liability environment and the evolving standard of care. AI literacy is not optional; it is a professional competency of the 21st-century clinician.

Preparing Your Team

Evidence Base

This course synthesizes published evidence on AI literacy, clinical AI safety, and LLM performance in medical contexts. Citations follow Vancouver format; entries that could not be verified are flagged ⚠️ UNVERIFIED.

  1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. doi:10.1038/s41591-018-0300-7. PMID 30617339.
    Foundational framing of AI-human clinical partnership; defines augmentation vs. replacement paradigm.
  2. Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589–596. doi:10.1001/jamainternmed.2023.1838. PMID 37115527.
    Demonstrates AI capacity for patient communication; also illustrates importance of clinical verification.
  3. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–453. doi:10.1126/science.aax2342. PMID 31649194.
    Landmark study on demographic bias in clinical AI; essential reading for Competency 6.
  4. Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28(1):31–38. doi:10.1038/s41591-021-01614-0. PMID 35058619.
    Comprehensive review of AI applications across clinical domains; contextualizes task taxonomy in Competency 3.
  5. Ji Z, Lee N, Frieske R, Yu T, Su D, Xu Y, et al. Survey of hallucination in natural language generation. ACM Comput Surv. 2023;55(12):1–38. doi:10.1145/3571730.
    Definitive technical review of hallucination mechanisms; underpins Competencies 5 and 6.
  6. Bickmore TW, Trinh H, Olafsson S, Barrett TG, Galen R, Monuteaux MC, et al. Patient and consumer safety risks when using conversational AI for medical information: a study of both user and content safety risks. JAMA Intern Med. 2018;178(8):1115–1116. doi:10.1001/jamainternmed.2018.1815. ⚠️ UNVERIFIED — verify authors, pages, and DOI.
    Early documentation of patient-facing AI safety risks; relevant to Competencies 4, 8, and 9.
  7. American College of Obstetricians and Gynecologists. Ethical Considerations for the Integration of Artificial Intelligence Assisted Technologies in Obstetric and Gynecologic Practice. Committee Opinion No. 904. Washington, DC: ACOG; 2022. Available from: https://www.acog.org
    ACOG's primary ethical framework for AI in ObGyn; essential reading for Competency 9. ⚠️ Verify current Committee Opinion number and year on ACOG.org
  8. American Medical Association. Augmented intelligence in medicine: policy and principles. Chicago: AMA; 2023. Available from: https://www.ama-assn.org
    AMA policy framework for AI governance; informs Competency 10.
  9. Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, et al. Large language models encode clinical knowledge. Nature. 2023;620(7972):172–180. doi:10.1038/s41586-023-06291-2. PMID 37438534.
    MedPaLM study demonstrating LLM clinical knowledge encoding; contextualizes both capabilities and limits discussed in Competency 2.
  10. Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on medical challenge problems. arXiv:2303.13375. 2023. Available from: https://arxiv.org/abs/2303.13375
    Documents GPT-4 performance on USMLE-style questions; illustrates fluency vs. clinical judgment distinction in Competency 2.
  11. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198. doi:10.1371/journal.pdig.0000198. PMID 36812645.
    Demonstrates LLM medical knowledge performance; relevant to understanding appropriate use limits in Competency 3.
  12. Rodger D, Porsdam Mann S, Earp B, Savulescu J, Bobier C, Blackshaw BP. Generative AI in healthcare education: How AI literacy gaps could compromise learning and patient safety. Nurse Educ Pract. 2025;87:104461. doi:10.1016/j.nepr.2025.104461. PMID 40633198.
    Directly addresses AI literacy gaps and patient safety consequences; supports the professional responsibility framing of this course.
  13. Labkoff S, Solomonides A, Raths A, Starren J, Bhavnani SK, Kavuluru R, et al. Toward a responsible future: recommendations for AI-enabled clinical decision support. J Am Med Inform Assoc. 2024;31(11):2730–2739. doi:10.1093/jamia/ocae209. PMID 39325508.
    Governance and responsible implementation framework; informs Competency 10.
  14. Cao W, Zhang Q, Liu J, Liu S. From agents to governance: essential AI skills for clinicians in the large language model era. J Med Internet Res. 2026;28:e86550. doi:10.2196/86550.
    Maps AI skills to clinical governance; supports Competencies 5 (agentic workflows) and 10.
  15. Garvey KV, Thomas Craig KJ, Russell R, Novak LL, Moore D, Miller BM. Considering clinician competencies for the implementation of artificial intelligence-based tools in health care: findings from a scoping review. JMIR Med Inform. 2022;10(11):e37478. doi:10.2196/37478. PMID 36318697.
    Scoping review of clinical AI competency frameworks; foundational evidence base for the 10-competency structure.
  16. Moëll B, Sand Aronsson F. Harm reduction strategies for thoughtful use of large language models in the medical domain: perspectives for patients and clinicians. J Med Internet Res. 2025;27:e75849. doi:10.2196/75849. PMID 40712151.
    Harm reduction framework for LLM clinical use; supports Competencies 4 (verification), 6 (failure modes), and 9 (ethics).
Citation Integrity Notice: ⚠️ UNVERIFIED citations require PubMed confirmation before professional use. All other citations have been checked against PubMed at the time of course development. DOIs should be tested directly. Per Vancouver format, all cited literature should be accessible at the time of reference.
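Testing DOIs directly starts with extracting them and building resolver links. The sketch below pulls DOIs out of a reference entry and prefixes them with the doi.org resolver; the regular expression is a common approximation of DOI syntax and may miss unusual DOIs. It only builds the links: opening each one and comparing the landing page against the citation is still a manual step.

```python
import re

# Approximate DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s]+")


def doi_urls(reference_text: str) -> list[str]:
    """Return a doi.org resolver URL for each DOI found in the text."""
    urls = []
    for match in DOI_PATTERN.findall(reference_text):
        doi = match.rstrip(".,;")  # drop sentence punctuation after the DOI
        urls.append(f"https://doi.org/{doi}")
    return urls


entry = ("Topol EJ. High-performance medicine: the convergence of human and "
         "artificial intelligence. Nat Med. 2019;25(1):44–56. "
         "doi:10.1038/s41591-018-0300-7. PMID 30617339.")

print(doi_urls(entry))
```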

Clinical AI Literacy Assessment

10 multiple-choice questions covering the full competency framework. Score ≥70% (7/10) to earn your certificate of completion.

Estimated time: 8–10 minutes

MedicalAICompetence.com
Certificate of Completion
Medical AI Competence — The 10 Clinical AI Competencies
A Practical Framework for Safe, Effective Clinical AI Use
Amos Grünebaum, MD
Professor of Obstetrics & Gynecology and Maternal-Fetal Medicine
MedicalAICompetence.com · obmd.com