When Enterprise Health AI Makes a Mistake, Who’s Liable?
Medicine has a well-established model for distributing liability across physicians and tools. AI vendors have quietly opted out of it.
This piece focuses on liability in enterprise and clinical AI. For the consumer side of the same question, see When Consumer Health AI Makes a Mistake, Who’s Liable?.
The Usual Model
Physicians have always used tools, and those tools have always had a predictable liability structure: whoever caused the failure bears the liability. When a pathologist misreads a correctly performed biopsy stain, the pathologist bears the liability. When the stain reagent is defective — contaminated, mislabeled, degraded — the manufacturer bears it. Liability follows the actual source of the error. This is how responsibility has traditionally been distributed across medicine.
The same logic applies to lab tests. If a physician prescribes the wrong medication because a lab test was faulty, the liability lies with the lab. If a physician prescribes the wrong medication despite an accurate lab result, the liability lies with the physician. The responsible party is the one whose failure caused the harm.
How AI Departs from This Model
Most AI tools used in clinical settings — general-purpose large language models, purpose-built clinical decision support software, FDA-cleared imaging AI, ambient documentation systems — involve a physician who reviews the output and makes the final clinical decision. In that sense they superficially resemble other physician tools. But their contract terms do something the stain manufacturer and the lab do not: they disclaim responsibility for the output itself.
The traditional liability model would suggest: if the physician uses the AI’s information incorrectly, it is the physician’s fault — but if the AI provides incorrect information (a hallucination: fabricated or incorrect output presented with apparent confidence), the AI vendor bears the liability. That is not the approach these contracts take. If the physician makes the mistake, it is the physician’s fault. If the AI makes the mistake, it is still the physician’s fault. Unlike the defective stain reagent manufacturer or the lab that ran a faulty test, the AI vendor has contracted away responsibility even for its own product’s failures.
“Speed up prior authorization reviews with Claude”
OpenAI’s terms for ChatGPT for Healthcare — a product explicitly marketed for clinical workflows, with HIPAA compliance and BAAs (Business Associate Agreements) — tell clinicians they “should always verify the information provided” and cannot rely “primarily or solely” on the output. Anthropic’s terms for Claude for Healthcare similarly disclaim accuracy and place verification responsibility entirely on the user. FDA-cleared imaging AI vendors and ambient documentation tools like Nuance DAX and Abridge follow the same pattern.
An example of how this plays out in practice: in Sampson v. HeartWise Health Systems (386 So.3d 411, Ala. 2023), physicians followed cardiac screening software that misclassified a young adult with a family history of congenital heart defects as normal. The patient died. The court dismissed the developer’s liability because the licensing agreement gave physicians final decision-making authority — but it reinstated the negligence claims against the physicians themselves. The vendor escaped; the doctors did not. That is what output-related exclusions do in practice.
Vendors also draw the boundary between “decision support” and “medical device” very deliberately to avoid higher-tier regulatory scrutiny. OpenAI’s Healthcare terms explicitly prohibit using the tool for real-time analysis of ECG waveforms, genomic sequences, or in vitro diagnostic signals — uses that would trigger FDA medical device classification. The boundary is partly a clinical distinction, but it is also a legal one: staying on the decision-support side keeps the vendor out of device regulation and, not coincidentally, keeps liability with the physician.
The Confidence Problem
The liability model places full responsibility on the physician to catch AI errors before they harm patients. Whether that is reasonable depends heavily on how detectable those errors are. With a reference book, a physician has confidence cues: a publication date signals currency, a known author signals authority. With a lab test, the false-positive rate is documented and known.
Current AI systems, particularly large language models, strip away most of those cues. They present all outputs with similar fluency and authority regardless of whether the underlying confidence is high, moderate, or essentially nil. A well-grounded answer and a hallucination can be indistinguishable on the screen. The same query may produce different responses on different days. There is no reference list, no publication date, no documented error rate for a given clinical context.
This matters because the liability model assumes the physician can exercise meaningful oversight. When a tool’s failures are invisible — when it sounds equally authoritative whether it knows the answer or doesn’t — that assumption becomes harder to sustain. The physician is being asked to catch errors they have no reliable signal are occurring.
When There Is No Physician in the Loop
The physician-tool model described above presupposes a physician. When AI makes clinical decisions autonomously — without a physician reviewing the output — that model simply does not apply. There is no physician to bear responsibility. The liability question has to be answered some other way.
This is not hypothetical. There are four FDA-cleared autonomous AI diagnostic systems, all for diabetic retinopathy screening: LumineticsCore (Digital Diagnostics), EyeArt (Eyenuk), AEYE-DS (AEYE Health), and Retina-AI Galaxy. And in December 2025, Utah launched the first US program allowing AI to autonomously renew prescriptions without a physician involved in each transaction — developed by Doctronic, covering roughly 190 chronic-condition medications for patients who already have an existing prescription from a human clinician.
Digital Diagnostics and Doctronic have both answered the liability question directly: they accept it. Digital Diagnostics carries medical malpractice insurance for LumineticsCore and assumes liability for injuries arising from misdiagnosis. Doctronic secured a custom malpractice policy through Beazley, a global specialty insurer, explicitly covering the AI’s clinical decisions and holding the system to physician-level accountability standards. Utah’s director of AI policy was plain about the logic: “Generally, if it’s Doctronic’s product, it’s Doctronic’s liability.”
The other FDA-cleared autonomous diagnostic systems have not made their liability terms publicly available. Eyenuk’s website has standard terms of service covering website use; there are no public product terms, no statements about malpractice insurance, no disclosure of who bears liability when EyeArt misdiagnoses a patient. AEYE Health’s website has no visible terms of service at all. These are systems making autonomous diagnostic decisions without physician oversight, and the question of who bears responsibility when they are wrong cannot be answered from publicly available information.
What Buyers Should Ask
There is almost no AI-specific case law yet, and the US has no federal statute on AI medical liability. The insurance market is already moving: some carriers are introducing policy riders for heavy AI use, and European insurers are applying sublimits and higher deductibles for unvalidated tools. Health systems operating in this environment should be asking specific questions before procurement.
For AI used alongside physicians:
What do the output-related exclusions actually cover? Understand that standard contracts place liability with the physician even when the AI’s output is the proximate cause of an error.
What documentation does the vendor require or recommend? Given that courts will ask whether the physician’s decision to follow or reject AI output was reasonable, documentation of that reasoning matters.
What training is in place to help clinicians recognize AI’s confidence problem — the fact that the output sounds equally authoritative whether the AI is certain or hallucinating?
What audit trail does the AI generate? Documentation of whether and how the physician reviewed AI output will matter in litigation.
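To make the audit-trail question concrete, here is a minimal sketch, in Python, of the kind of per-decision record a buyer might ask a vendor to produce. Everything in it is hypothetical: the class name, field names, and example values are invented for illustration, not drawn from any vendor’s actual logging schema. The point is simply that a usable audit trail ties a specific model output, a specific reviewing clinician, and a documented review decision together in a form that could be produced later in litigation.

```python
# Hypothetical sketch of an audit record for AI-assisted clinical decisions.
# All names and values are illustrative, not any vendor's real schema.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AIDecisionAuditRecord:
    model_name: str                    # which model produced the output
    model_version: str                 # exact version, since behavior changes between versions
    query: str                         # what the clinician asked or submitted
    output: str                        # the AI's output, verbatim
    generated_at: datetime             # when the output was produced
    patient_record_id: str             # link back to the chart
    reviewing_clinician: Optional[str] = None  # None means no documented human review
    review_decision: Optional[str] = None      # "accepted", "modified", or "rejected"
    review_rationale: Optional[str] = None     # the clinician's documented reasoning

    def was_reviewed(self) -> bool:
        """True only if both a reviewing clinician and a decision are recorded."""
        return self.reviewing_clinician is not None and self.review_decision is not None


# Example: a physician reviewed and rejected an AI suggestion, with reasoning recorded.
record = AIDecisionAuditRecord(
    model_name="example-clinical-llm",
    model_version="2025-01",
    query="Is cardiac workup indicated for a 24-year-old with a family history of congenital heart defects?",
    output="Symptoms are most likely musculoskeletal; cardiac workup not indicated.",
    generated_at=datetime.now(timezone.utc),
    patient_record_id="example-patient-001",
    reviewing_clinician="Dr. Example",
    review_decision="rejected",
    review_rationale="Family history warrants referral regardless of the AI suggestion.",
)
print(record.was_reviewed())  # True
```

A record along these lines also answers the earlier documentation question: it captures not only whether the physician reviewed the output, but what their reasoning was for following or rejecting it.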
For autonomous AI without a physician in the loop:
Who bears liability when the system is wrong?
Does the vendor carry malpractice insurance for the AI’s clinical decisions? Digital Diagnostics and Doctronic do. If a vendor cannot answer that question, that is itself informative.
Is the “autonomous vs. decision support” classification based on clinical design or legal positioning?
The Model Worth Watching
Digital Diagnostics has argued publicly that assuming liability is not just an ethical position but a business one: autonomous AI that accepts responsibility for its decisions gets reimbursed at a sustainable rate because it creates genuine, accountable value. CMS validated that argument by establishing a national reimbursement rate specifically for autonomous AI. Doctronic is testing the same thesis in prescription renewal, with a named insurer and a state regulatory framework built around the principle that the product’s creator bears responsibility for what it does.
The rest of the industry has mostly taken the opposite approach: disclaim the output, classify as decision support, and leave the physician holding liability for errors they may have had no reliable way to detect. That approach is legally coherent and, as Sampson shows, courts have upheld it. But it is a significant departure from the model that has governed liability across clinical tools for decades — one that physicians and health system buyers deserve to understand before they sign.