New Research Calls for Stronger Safety Tests in Biomedical AI

Members of the HIVE community recently published a new paper in npj Digital Medicine that raises important questions about how to safely use artificial intelligence in medicine. 

Reza Abbasi-Asl, HIVE member and associate professor in the Department of Neurology and the Department of Bioengineering and Therapeutic Sciences at UCSF, led the team that published the article “Robustness tests for biomedical foundation models should tailor to specifications,” with first author Patrick Xian, PhD, a postdoctoral scholar in his lab. 

Biomedical foundation models (BFMs) are large, general-purpose AI systems trained on massive amounts of health-related data. They can be adapted for many different tasks, such as analyzing medical images, predicting disease risk, or supporting clinical decision-making. “As these powerful tools make their way into hospitals and clinics, users must consider their robustness. We’re asking: how reliable, consistent, and safe are these models under real-world conditions?” explains Abbasi-Asl.

In their review of more than 50 BFMs, the authors found that nearly one-third had no robustness testing at all. To close this gap, the team proposes a new framework for evaluation: creating task-specific robustness tests that reflect the actual risks of a given medical setting. This involves identifying the most critical failure scenarios for a clinical task; designing targeted tests that simulate those challenges; and standardizing evaluations to connect high-level AI regulations with real-world implementation. 
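The framework itself is conceptual, but its logic can be illustrated in code. Below is a minimal sketch of what a task-specific robustness test might look like, assuming a hypothetical risk-prediction model and a perturbation that simulates one identified failure scenario (here, imaging noise); the function names, model, and pass/fail threshold are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def robustness_test(model, inputs, perturb, n_trials=20):
    """Apply a perturbation simulating a clinically relevant failure
    scenario and measure how much the model's predictions drift."""
    baseline = np.array([model(x) for x in inputs])
    drifts = []
    for _ in range(n_trials):
        perturbed = np.array([model(perturb(x)) for x in inputs])
        drifts.append(np.abs(perturbed - baseline).mean())
    return float(np.mean(drifts))

# Hypothetical stand-ins: a toy "risk score" model and a noise
# perturbation standing in for scanner or acquisition artifacts.
rng = np.random.default_rng(0)
model = lambda x: float(np.clip(x.mean(), 0.0, 1.0))        # toy risk predictor
perturb = lambda x: x + rng.normal(0, 0.05, size=x.shape)   # simulated noise
inputs = [rng.random(64) for _ in range(10)]

drift = robustness_test(model, inputs, perturb)
print(f"mean prediction drift under simulated noise: {drift:.4f}")
```

In a standardized evaluation, the acceptable drift threshold would be set per clinical task, reflecting the risk tolerance of that setting rather than a one-size-fits-all cutoff.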

“The priority-based approach we recommend is essential for building trust in medical AI and ensuring it can be used safely and effectively,” says Abbasi-Asl. “It’s a crucial step toward creating a standardized, reliable model lifecycle for the next generation of medical AI.”

Date: October 1, 2025
