A practical framework for safe, effective clinical AI use. Evidence-based. Clinician-designed.
Clinicians who use AI without understanding how it works introduce a new category of medical error — not through negligence, but through unfamiliarity with a tool that does not behave like any tool that came before it.
The rapid adoption of AI tools in clinical settings has outpaced training. Surveys consistently show that the majority of clinicians are now using AI regularly — for documentation, differential diagnosis, patient communication, and clinical decision support — while fewer than a third have received any formal instruction on AI limitations, failure modes, or safe use practices.
This gap is not abstract. It creates real clinical risk: fabricated citations informing management decisions, outdated guideline summaries presented with current-day confidence, and patient-identifying information entering non-HIPAA-compliant systems. These are not hypothetical — they are documented, recurrent, and preventable.
AI systems, like humans, are imperfect: they can miss dangerous drug interactions, provide incorrect dosing information, or generate confident responses to clinical questions where the correct answer is genuinely uncertain (1). Clinicians need to verify AI-generated answers, and AI can in turn serve as a second check on a clinician's own work.
Across medicine, the introduction of new technologies — electronic health records, laparoscopic surgery, point-of-care ultrasound — has consistently required formal competency frameworks before widespread clinical adoption. AI is no different, and arguably demands more systematic training because its failure modes are less visible and more linguistically persuasive than any prior clinical tool.
Large language models have passed USMLE Step examinations, outperformed physicians on certain diagnostic tasks, and demonstrated remarkable facility with medical language (2,3). This capability is real and clinically significant. But capability in a benchmark environment does not equal reliability in a clinical one. LLMs also:
The same model that is remarkably useful when used correctly is a source of harm when used by someone who does not know its limits. AI literacy is the bridge between the two.
The American Medical Association, ACOG, and multiple specialty societies have begun issuing guidance on AI in clinical practice. As AI becomes embedded in workflows — through EHR documentation tools, diagnostic aids, and patient-facing chatbots — AI literacy will become part of the expected standard of care. Clinicians who develop this competency now are ahead of a requirement that is coming.
This course presents 10 foundational competencies for the safe, effective clinical use of AI — developed at Medical AI Competence (MedicalAICompetence.com). Each module covers one competency in depth: the underlying concept, its clinical implications, recognizable failure patterns, and practical application. The course concludes with a 10-question assessed examination; a score of 70% or above earns a certificate of completion.
The framework is designed for physicians, residents, nurses, certified nurse-midwives, and advanced practice providers. No prior technical background in AI is assumed. The goal is not to produce AI engineers — it is to produce clinicians who know how to use AI safely, recognize when it is failing them, and protect their patients in the process.
A Large Language Model is a probabilistic text predictor, not a medical knowledge database or a clinical expert system. It was trained to predict the most statistically likely next word (more precisely, the next token) given the text that came before, across enormous volumes of human-generated text.
This architecture has remarkable properties: it can summarize, synthesize, translate, and reason through complex problems. But it has fundamental limitations that every clinician must internalize.
LLMs use probabilistic sampling, meaning the same question can produce different answers on different occasions. This is intentional — it prevents every response from being identical — but it means LLM output is not reproducible in the scientific sense.
Ask an LLM the same drug dosing question three times and you may get three slightly different answers. This is not a malfunction. It is the expected behavior of a probabilistic system. Treat every clinical output as a draft requiring verification, not a fixed reference.
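The effect of probabilistic sampling can be seen in a toy sketch. The scores below are made up for illustration and do not come from any real model; the point is only that sampling from a distribution can return different answers on different runs, while always picking the top-scoring option cannot.

```python
import math
import random

# Hypothetical scores for three candidate continuations (illustrative only).
TOY_SCORES = {"answer A": 2.0, "answer B": 1.6, "answer C": 0.5}


def sample_next(scores, temperature, rng):
    """Pick one continuation from a toy score table.

    temperature == 0 means "always take the highest score" (deterministic).
    temperature > 0 converts scores to probabilities and samples from them.
    """
    if temperature == 0:
        return max(scores, key=scores.get)
    scaled = {tok: s / temperature for tok, s in scores.items()}
    top = max(scaled.values())
    # Softmax weights; subtracting the max keeps exp() numerically stable.
    weights = {tok: math.exp(s - top) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


rng = random.Random(0)  # fixed seed so this demonstration is repeatable
greedy = [sample_next(TOY_SCORES, 0, rng) for _ in range(5)]
sampled = [sample_next(TOY_SCORES, 1.0, rng) for _ in range(200)]

print("greedy picks: ", set(greedy))
print("sampled picks:", set(sampled))
```

Real chat tools generally do not let you fix the seed, which is why the same question can come back with a different answer each time.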
Mistaking fluency for accuracy is the single most dangerous misconception in clinical AI use. LLMs produce grammatically correct, medically plausible, authoritative-sounding text even when the content is factually wrong, outdated, or fabricated.
The confident, clinical prose style of AI output activates the same cognitive heuristics humans use to assess expert communication, which means we are inclined to trust it more than we should.
Fluency is a measure of text quality, not medical accuracy. A beautifully written paragraph citing a nonexistent study is still wrong. Train yourself to read AI output with the same skeptical eye you'd apply to a medical student's undocumented clinical statement.
Every LLM has a training data cutoff date. Medical guidelines, drug approvals, and clinical evidence evolve continuously. When you ask an LLM about ACOG recommendations, it answers based on what was published before its training ended — and may have no awareness that a practice bulletin was revised six months ago.
Claude (claude.ai) is one of the most capable AI tools available to clinicians right now. But when you first open it, it knows nothing about you — your specialty, your patients, how you like to communicate, or what level of clinical depth you need. A few one-time setup steps change this completely. Here is exactly what to configure, where to find each setting, and why it matters.
The free version of Claude gives you basic conversations with a smaller AI model. It has no memory between sessions, limited context (it "forgets" earlier parts of long conversations), and no ability to search the web or run calculations. For serious clinical use — long documents, literature searching, memory of your preferences, better reasoning — Claude Pro ($20/month) is the practical minimum. Everything in this setup guide applies to Claude Pro unless otherwise noted.
On desktop: look at the bottom-left corner of the screen — click your name or profile icon, then click Settings.
On mobile (iPhone/Android app): tap the menu icon (three lines, top-left), then tap your name or Settings.
Important note: Claude's interface is updated regularly. If something looks slightly different from this description, look for the same words — the features exist, they may just be in a slightly different location. When in doubt, type "where do I find settings?" directly into Claude and it will tell you.
Custom Instructions are a block of text you write once that Claude reads automatically at the start of every conversation. Think of it as briefing a new colleague on who you are before they start working with you — except you only have to do it once and it applies forever.
Where to find it: Settings → “Profile” or “User Preferences”
Write your instructions in plain English. There is no special format required. Here is a practical example for an ObGyn clinician:
The result: Claude will automatically apply this clinical context to every conversation without you having to repeat yourself. This alone eliminates the most common frustration with AI — generic, overly cautious responses that treat you like a layperson.
Memory is different from Custom Instructions. Custom Instructions are rules you write once and they stay fixed. Memory is a living record of facts that Claude learns about you over time, automatically, from your conversations.
For example: if you mention in a conversation that you work at a community hospital, Claude stores that and uses it in future conversations without you repeating it. Over time, Claude builds up a useful profile of your work context, preferences, and common tasks.
How to manage memory:
When you are inside a conversation, look at the bottom of the text input box. You will see a row of small icons. These are your tools — they give Claude additional capabilities beyond just answering questions from its training data.
The icons may look slightly different or be labeled differently depending on when you are reading this — Claude's interface is updated regularly. The features are always there; look at the bottom toolbar of the text input area and hover over each icon to see what it does.
Projects are separate workspaces where you can group related conversations and give Claude a specific set of instructions and reference documents for that context. Think of each Project as a specialized assistant for a specific part of your work.
How to create one: In the left sidebar, look for “New Project” or a “+” button near Projects. Give it a name. Inside the project you can add Project Instructions (instructions that only apply within this project) and upload documents that Claude will use as reference material.
Practical clinical examples:
When you write a prompt that produces exactly the result you wanted, save it somewhere — a note on your phone, a document on your computer, or inside a Project in Claude. Over time this becomes your personal clinical prompt library.
The goal is to never have to re-engineer a good prompt from scratch. If "summarize this article in 3 sections at 7th-grade level, flag any claims that need verification" worked perfectly once, paste it every time. Prompts are reusable tools.
AI performs extraordinarily well for some clinical tasks and unreliably for others. The difference is not the AI — it's the nature of the task. A question requiring synthesis of general knowledge plays to AI's strengths. A question requiring precise retrieval of a specific current guideline exposes its weaknesses.
Competency 3 is the skill of knowing which kind of question you are asking before you ask it.
An OB resident asks an LLM: "What is the ACOG recommendation for GBS prophylaxis in a penicillin-allergic patient with low anaphylaxis risk?" This is a specific guideline retrieval task. The AI may answer correctly — or may generate a plausible-sounding but incorrect regimen. This answer must be verified against the current ACOG/CDC GBS guideline before acting. This is non-negotiable.
Clinicians currently access AI through three broad categories of platforms, each with different capabilities, privacy guarantees, and appropriate use contexts. Choosing incorrectly exposes patients to privacy risk and clinicians to liability.
HIPAA applies whenever Protected Health Information (PHI) is involved. PHI includes names, dates, geographic data, contact information, MRNs, and any information that could identify a specific patient, either alone or in combination.
Never enter PHI into a consumer AI platform. This applies even if you intend to "de-identify" it yourself in real time. The practical definition: if a determined person with access to your inputs could re-identify the patient, it is PHI. Use synthetic or fully de-identified information, or use a platform with a signed Business Associate Agreement (BAA).
No AI output should be used in clinical practice without verification of the specific factual claims it contains. This is not a sign that AI is unreliable — it is the appropriate professional standard, analogous to verifying a trainee's management plan against primary references before signing off on it.
The practical question is not whether to verify, but how much and how quickly — calibrated to the potential consequence of an error.
LLMs frequently generate citations that appear real but do not exist — fabricated author combinations, invented journal volumes, or real journal names attached to fictional articles. This is not intentional deception; it is the predictable output of a system that generates plausible-sounding text without ground truth access.
Multiple studies have documented high rates of AI-generated citation fabrication in medical contexts. A 2023 study found that 30–47% of AI-generated references in legal and academic contexts were partially or fully fabricated. Clinical rates are similar. Every AI-generated citation must be verified in PubMed before use in any professional document.
Search PubMed by PMID if provided, or by first author + keywords + year. Confirm: authors match, title matches, journal matches, volume/issue/pages match, DOI resolves. If any element fails, the reference should be flagged and not used.
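The field-by-field check above can be sketched as a small comparison routine. This is a minimal illustration, not a PubMed client: it assumes you have already looked the article up yourself and typed the PubMed record into a dictionary. The field names and the placeholder records are hypothetical, and the "DOI resolves" step still has to be done by hand.

```python
FIELDS = ("authors", "title", "journal", "year", "volume", "pages", "doi")


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences do not count."""
    return " ".join(str(value).lower().split())


def citation_mismatches(ai_citation, pubmed_record):
    """Return the fields where the AI-generated citation disagrees with PubMed."""
    return [
        f for f in FIELDS
        if normalize(ai_citation.get(f, "")) != normalize(pubmed_record.get(f, ""))
    ]


def is_usable(ai_citation, pubmed_record):
    """A citation is usable only if a PubMed record exists and every field matches."""
    if pubmed_record is None:  # nothing found in PubMed: treat as fabricated
        return False
    return not citation_mismatches(ai_citation, pubmed_record)


# Placeholder data only; these are not real references.
pubmed_record = {
    "authors": "Author A, Author B", "title": "Placeholder title",
    "journal": "Placeholder Journal", "year": "2020",
    "volume": "1", "pages": "1-10", "doi": "10.0000/placeholder",
}
ai_citation = dict(pubmed_record, volume="12")  # the AI got the volume wrong

print(citation_mismatches(ai_citation, pubmed_record))
print(is_usable(ai_citation, pubmed_record))
```

A single mismatched field is enough to flag the reference, which matches the rule that any failed element means the citation is not used.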
AI output quality is directly proportional to input quality. An LLM cannot compensate for vague or incomplete clinical framing — it will confidently fill gaps with plausible-sounding but potentially incorrect assumptions. In clinical practice, those assumptions can lead to harm.
The clinician's most important AI skill is not knowing which model to use — it is knowing how to structure what you give it.
GIGO applies with extra clinical weight: Garbage In, Garbage Out. In consumer tasks, a bad AI answer wastes time. In clinical tasks, it may influence management, documentation, or counseling.
Use this structure to build effective clinical prompts. Each element reduces the probability of a clinically misleading response.
"What should I do with this patient who has hypertension?"
From a prompt this vague, the AI cannot tell whether it is dealing with gestational hypertension, chronic hypertension, or preeclampsia. Any management suggestion could be wrong and dangerous.
"I am an OB attending. My patient is a 34-year-old G2P1 at 36+4 weeks with new-onset BP readings of 152/98 and 148/96, 6 hours apart, proteinuria 2+, no other symptoms. Summarize the ACOG diagnostic criteria for preeclampsia without severe features and list the management decision points per current ACOG guidance. Flag any areas of clinical controversy."
Never include a patient name, date of birth, MRN, or any other identifying information in a prompt to a consumer AI platform (Claude.ai, ChatGPT, Gemini). Use synthetic or de-identified information only.
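One way to make the structure habitual is to treat a prompt as a small form with required parts. The sketch below is an assumption about how the elements could be split (role, patient context, task, extra requests); the field names are hypothetical and simply mirror the stronger preeclampsia prompt quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class ClinicalPrompt:
    """Hypothetical container for the parts of a structured clinical prompt."""
    role: str      # who is asking
    context: str   # de-identified clinical details
    task: str      # what the AI should produce, and from which source
    requests: list = field(default_factory=list)  # e.g. "flag controversy"

    def render(self):
        """Join the parts into one prompt, refusing to build an incomplete one."""
        missing = [
            name for name in ("role", "context", "task")
            if not getattr(self, name).strip()
        ]
        if missing:
            raise ValueError("Prompt is missing: " + ", ".join(missing))
        parts = [self.role, self.context, self.task, *self.requests]
        return " ".join(p.strip() for p in parts)


prompt = ClinicalPrompt(
    role="I am an OB attending.",
    context=(
        "My patient is a 34-year-old G2P1 at 36+4 weeks with new-onset BP "
        "readings of 152/98 and 148/96, 6 hours apart, proteinuria 2+, "
        "no other symptoms."
    ),
    task=(
        "Summarize the ACOG diagnostic criteria for preeclampsia without "
        "severe features and list the management decision points per "
        "current ACOG guidance."
    ),
    requests=["Flag any areas of clinical controversy."],
)
print(prompt.render())
```

The vague hypertension question fails this form immediately, because it has a task but no role and no usable clinical context.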
The word "agentic" simply means that the AI performs a sequence of steps toward a goal, rather than answering a single question. Instead of one prompt → one answer, you give the AI a goal and it figures out and executes multiple steps to reach it — searching the web, reading documents, running calculations, writing code — before delivering the final result.
Agentic AI is not a separate platform or product you need to find and sign up for. It is already built into the same Claude Pro, ChatGPT Plus, and other paid AI tools you may already have. The difference is not which app you use — it is whether you have turned on the right tools and given the AI a multi-step goal rather than a single question.
When you turn on Web Search and give Claude a multi-step goal, it becomes agentic. Example: "Search for the three most recent RCTs on low-dose aspirin for preeclampsia prevention, summarize each one, then give me a paragraph synthesis with implications for my practice."
Claude searches the web multiple times, reads the results, decides what to keep, summarizes each paper, then writes the synthesis — all in one go. You gave one goal; it executed many steps. That is agentic AI. You did not install anything new.
Deep Research is available in Claude Pro and ChatGPT Plus. You give a research question; the AI runs 10–30 sequential web searches over several minutes, reads and evaluates sources, resolves contradictions, and produces a structured report with citations. To use it in Claude: type your question, then look for a “Deep Research” button or option before hitting send. In ChatGPT Plus: look for the research option in the same area. This is a fully autonomous multi-step workflow: you provide the question, and the AI does everything else.
If your hospital uses Epic with AI documentation tools, you are already using agentic AI. The system listens to your patient encounter, transcribes it in real time, extracts the HPI, exam, assessment, and plan, structures them into a SOAP note, and drafts it for your review — all automatically from a single trigger (starting the recording). You set the goal (document this encounter); the AI executes all the steps. No prompting required.
Even in a basic conversation, you can create an agentic-style workflow by writing a prompt that tells Claude to complete a series of steps in order before delivering the final answer. Example: "First summarize this patient context. Then identify the three most relevant risk factors. Then draft a patient counseling letter addressing those specific risk factors at 7th-grade reading level. Do all three steps before giving me the letter." No special features required — just a well-structured multi-step prompt.
Agentic workflows amplify both efficiency and error propagation. An error in step 1 (wrong patient context, wrong retrieved guideline) may not be visible until the step 4 output has already been shaped by it. The longer the chain, the more important early verification.
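A back-of-the-envelope calculation shows why chain length matters. The sketch assumes, purely for illustration, that each step is correct with the same probability and that steps fail independently. Real workflows will not behave this neatly, and the 95% figure is a made-up input, not a measured accuracy.

```python
def chain_reliability(per_step_accuracy, steps):
    """Probability that every step in a chain is correct.

    Simplifying assumptions: every step has the same accuracy and
    errors are independent of one another.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return per_step_accuracy ** steps


for n in (1, 4, 10):
    print(f"{n:>2} steps at 95% each: {chain_reliability(0.95, n):.2f}")
```

Under these assumptions a four-step chain is fully correct only about 81% of the time, even though each individual step looks reliable.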
As AI workflows become more autonomous, the clinician's role shifts from executor to supervisor. This is a higher-order skill. Effective supervision requires understanding each step's purpose, the errors each step might introduce, and the verification required before any output is used clinically.
Vibe coding is the practice of creating functional software — calculators, forms, trackers, decision-support tools — by describing what you want in plain language and iterating conversationally with an AI until the result works. The name reflects the approach: you describe the feel, function, and purpose of the tool you want; the AI writes the code. You never touch the code directly.
This is not a theoretical future capability. It is available now in Claude Pro, ChatGPT Plus, and similar paid tiers. The tools at ObGyn Intelligence (tools.obmd.com) — risk calculators, informed consent generators, screening tools, and interactive educational modules including this course — are built entirely this way. No programming background required.
Until recently, building a custom clinical tool required hiring a developer ($5,000–$50,000+), waiting months, and depending on someone else to maintain it. Vibe coding compresses this to hours, at near-zero cost. A clinician who identifies a workflow gap on Monday can have a working prototype by Tuesday. This democratizes clinical tool development in a way nothing has before.
One of the most powerful but underused prompting techniques is instructing the AI to gather information through sequential questions before generating its output — rather than attempting a response with incomplete data. This mirrors how expert clinicians actually reason: structured data collection first, assessment second.
This technique is particularly valuable for complex clinical assessments where missing one piece of information changes the entire interpretation. The AI becomes an interactive clinical framework, not a static text generator.
Instruct the AI explicitly: "Do not give me an assessment yet. Instead, ask me one question at a time about [clinical topic]. After I answer each question, ask the next one. When you have gathered all the information you need, give me a structured assessment."
Copy and paste this prompt to have Claude guide you through a systematic CTG interpretation:
This prompt template is reusable. Modify the clinical domain to create structured assessment prompts for any complex scenario: shoulder dystocia management, postpartum hemorrhage staging, eclampsia protocols, or neonatal resuscitation decision trees. The principle — sequential structured questioning before assessment — produces dramatically more accurate and clinically useful AI outputs than presenting all information in a single unstructured block.
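If you keep your prompts in a script or notes file, the reusable part can be separated from the clinical domain. The helper below is a hypothetical convenience, not part of any AI product; it only fills the bracketed topic into the instruction wording quoted earlier in this section.

```python
SEQUENTIAL_TEMPLATE = (
    "Do not give me an assessment yet. Instead, ask me one question at a time "
    "about {topic}. After I answer each question, ask the next one. When you "
    "have gathered all the information you need, give me a structured assessment."
)


def build_sequential_prompt(topic):
    """Fill the clinical topic into the sequential-questioning template."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    return SEQUENTIAL_TEMPLATE.format(topic=topic)


for topic in ("postpartum hemorrhage staging", "shoulder dystocia management"):
    print(build_sequential_prompt(topic))
    print()
```

The output is plain text that can be pasted into any chat window; nothing here calls an AI service.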
Structured sequential input forces the AI to weight each clinical variable appropriately before synthesizing. It also forces you to observe each feature systematically — which is itself the educational value of the exercise.
AI failures are not random — they follow recognizable patterns. Training yourself to anticipate these patterns is the clinical equivalent of knowing the common complications of a procedure: you cannot prevent all of them, but you can watch for them and catch them early.
Sycophancy is among the most clinically dangerous failure modes because it is invisible unless you test for it. An LLM trained to be helpful and agreeable will tend to validate your framing, including wrong framings. This can reinforce "premature closure," the diagnostic error of settling on a conclusion before it has been verified, because the AI will rarely push back on its own.
"My patient at 37+2 weeks is full term and I'm planning induction. What are the benefits?"
An AI is likely to answer this question supportively, describing the benefits of induction, rather than correcting the terminology: per ACOG, 37+2 weeks is early term, not full term. This is sycophancy in practice.
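The terminology check the AI skipped is simple enough to write down. The cutoffs in this sketch are the commonly cited ACOG term definitions (early term 37 0/7 to 38 6/7, full term 39 0/7 to 40 6/7, late term 41 0/7 to 41 6/7, postterm 42 0/7 and beyond). They are supplied here as an assumption and, in keeping with this course, should be confirmed against the current ACOG statement before being relied on.

```python
def classify_term(weeks, days):
    """Classify gestational age given as completed weeks plus days (0-6).

    Cutoffs are assumed from the commonly cited ACOG definitions;
    verify them against the current guideline.
    """
    if weeks < 0 or not 0 <= days <= 6:
        raise ValueError("expected weeks >= 0 and days between 0 and 6")
    total_days = weeks * 7 + days
    if total_days < 37 * 7:
        return "preterm"
    if total_days < 39 * 7:
        return "early term"
    if total_days < 41 * 7:
        return "full term"
    if total_days < 42 * 7:
        return "late term"
    return "postterm"


print(classify_term(37, 2))  # the case from the example prompt
```

A tool that ran this check would have corrected the framing before answering; a sycophantic model accepts "full term" and moves on.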
Deliberately test AI outputs with adversarial prompts: "What would be the strongest argument against this plan?" or "Under what circumstances would this recommendation be wrong?" This forces the model out of validation mode.
The most effective clinical AI use is not replacement of clinical reasoning — it is augmentation of it. Think of AI as a highly read, tireless, linguistically fluent colleague who has never examined a patient, cannot observe clinical context, and carries no legal or professional accountability for what they suggest.
You bring the clinical examination, the patient relationship, the institutional context, the professional accountability, and the judgment. AI brings breadth of text, speed of synthesis, and tireless availability.
Physician accountability is non-delegable. No level of AI capability transfers professional or legal responsibility for a clinical decision. The clinician who acts on AI output owns the outcome of that action.
A well-documented cognitive science phenomenon: when we rely on external tools for mental tasks, we gradually lose the internal capacity to perform those tasks without the tool. This is acceptable for calculator arithmetic — it may not be acceptable for clinical reasoning skills that must remain available in an emergency, at 3 AM, without internet access.
Documentation drafts, patient education materials, coding suggestions, literature search starting points, administrative templates
Differential diagnosis generation, management option lists, guideline summaries — AI provides the draft, your clinical assessment determines the plan
Drug doses, specific guideline thresholds, time-sensitive management decisions, any situation where your clinical assessment contradicts the AI output. Your clinical judgment is more reliable here than the text predictor.
The average reading level of US adults is approximately 8th grade, yet most standard informed consent documents are written at a 12th-grade level or above. AI can translate complex clinical information into accessible language with precision and consistency, which makes this one of its most clinically valuable and underutilized applications.
AI-generated plain-language summaries of complex procedures, diagnoses, and treatment options can reduce the comprehension gap that underlies many informed consent failures. The translation must be verified by the clinician for accuracy before use.
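Reading level can be estimated rather than guessed. The sketch below uses the published Flesch-Kincaid grade-level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59, with a deliberately crude syllable counter. The syllable heuristic and the two sample texts are assumptions for illustration, so treat the result as a rough screen and not a validated readability score.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups and drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Approximate US school grade level of a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


plain = ("You will get medicine to help you sleep. The doctor will make a "
         "small cut. You may feel sore after.")
dense = ("Postoperative complications include hemorrhage, infection, and "
         "thromboembolic events necessitating anticoagulation.")

print(f"plain version: grade {flesch_kincaid_grade(plain):.1f}")
print(f"dense version: grade {flesch_kincaid_grade(dense):.1f}")
```

A low grade score says nothing about whether the content is correct, so the clinician accuracy review still applies to every translated document.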
Informed consent requires that patients receive information in a form they can understand and process. AI can help generate:
AI-generated patient materials require clinician review before delivery. Errors in patient-facing materials — particularly regarding risk percentages, procedure outcomes, or medication instructions — can directly harm patients and create significant liability exposure.
Patients increasingly arrive with AI-generated information — from Claude.ai, ChatGPT, Google Gemini, or health-specific chatbots. This information may be accurate, outdated, exaggerated, or simply wrong.
A productive clinical response is not to dismiss AI-sourced information but to engage with it specifically: acknowledge what is accurate, correct what is not, and explain the source of the discrepancy in plain language. AI can help you draft these explanations.
"What you read is partially accurate — [correct part]. The concern is that [specific correction]. Here's why your care plan is based on something different: [brief explanation]. The guideline I'm working from is [source]."
HIPAA's Privacy and Security Rules apply to the use of AI tools when PHI is involved. Consumer AI platforms (Claude.ai, ChatGPT, Gemini) are not HIPAA-covered entities and do not sign Business Associate Agreements with individual clinicians. Inputting PHI into these platforms constitutes a potential HIPAA violation, regardless of intent.
De-identify completely before using consumer AI: replace name with "Ms. A," remove specific dates (use relative timing), remove geographic details, use age range rather than exact age. When in doubt, use your EHR-integrated AI tool or an enterprise platform with BAA.
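A quick pattern check can catch some obvious slips before text is pasted into a consumer tool. This is a hypothetical pre-flight screen and not a de-identification method: the patterns are simple assumptions, it cannot recognize names or free-text identifiers, and a clean result does not mean the text is free of PHI.

```python
import re

# Illustrative patterns only; they catch common formats, not all identifiers.
PHI_PATTERNS = {
    "date": re.compile(
        r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b|\b\d{4}-\d{2}-\d{2}\b"
    ),
    "long number (possible MRN)": re.compile(r"\b\d{6,}\b"),
    "phone number": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def flag_possible_phi(text):
    """Return (label, matched text) pairs for anything that looks like an identifier."""
    findings = []
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group()))
    return findings


risky = "Seen on 03/14/2024, MRN 00123456, BP 152/98."
safer = "Ms. A, in her 30s, presented two days after delivery with BP 152/98."

print(flag_possible_phi(risky))
print(flag_possible_phi(safer))
```

Note that the blood pressure reading is left alone while the date and record number are flagged; anything flagged should be removed or replaced with relative wording before the text leaves your system.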
Emerging professional standards increasingly require clinicians to document when AI was used in clinical care. The rationale: transparency supports accountability, allows audit of AI influence on outcomes, and protects the clinician from liability by demonstrating that human judgment was applied.
Current legal analysis places liability for AI-assisted clinical decisions squarely on the clinician who acted on those decisions. There is no viable "the AI told me to" defense in medical malpractice. The professional standard of care is not lowered because an AI was involved; if anything, liability exposure may increase when AI use was careless or not appropriately supervised.
Informed consent for AI involvement in care is an evolving ethical obligation. Patients have a reasonable expectation of knowing whether AI systems are participating in their diagnostic or treatment planning process. As AI becomes more integrated into clinical workflows, disclosure standards will develop. Adopt a posture of transparency now.
Individual AI competency is necessary but not sufficient for safe clinical AI adoption. When AI tools operate without institutional governance, individual variation in use creates unequal care, unmanaged liability, and undetected errors. A single hallucinated drug dose that reaches a patient is a governance failure as much as an individual failure.
The standard of care in medicine is defined by what a reasonably prudent practitioner would do under similar circumstances. As AI becomes embedded in clinical workflows, two opposing risks emerge:
Clinicians who develop AI literacy now — and document it — are better positioned for both the current liability environment and the evolving standard of care. AI literacy is not optional; it is a professional competency of the 21st-century clinician.
This course synthesizes published evidence on AI literacy, clinical AI safety, and LLM performance in medical contexts. Vancouver format. All citations verified where noted.
10 multiple-choice questions covering the full competency framework. Score ≥70% (7/10) to earn your certificate of completion.
Estimated time: 8–10 minutes