More than a third of medical references generated by some AI platforms may be fabricated, new study finds
13 Apr 2026
Researchers warn fabricated citations in AI-generated surgical information could make it difficult for users to verify medical advice
More than a third of medical references generated by some widely used artificial intelligence (AI) platforms may be fabricated, according to a new study published in The Annals of the Royal College of Surgeons of England.
The research found that popular AI chatbots answering common surgical health questions sometimes produced ‘hallucinated’ citations – references to sources that do not exist. As more patients turn to AI tools to understand symptoms, explore diagnoses and seek medical advice, the authors warn that fabricated references undermine users’ ability to check whether information is accurate or evidence-based.
The study identified large discrepancies between AI platforms in both the quantity and reliability of the references they produced. In the worst-performing models, more than a third (34%) of references were fabricated or unverifiable, while several others produced no hallucinated references at all.
Some fabricated citations closely resembled legitimate scientific literature, with plausible article titles, invented URLs, and attributions to reputable and well-known institutions such as the Mayo Clinic. Users attempting to follow up on these references may find that they do not exist or fail to support the information provided – making it difficult to distinguish fabricated sources from genuine medical evidence.
Published ahead of the Future of Surgery Festival later this month, the research, entitled ‘Trust, Truth, and Transparency: Analysing the References Underpinning AI-Generated Surgical Information’, is one of the first comprehensive analyses of the accuracy, quality and transparency of references generated by major AI platforms.
The Future of Surgery Festival, hosted by the Royal College of Surgeons of England, is a first-of-its-kind event bringing together surgeons, trainees and healthcare professionals from across specialties to explore the future of surgical practice, including the growing role of artificial intelligence and digital technologies in healthcare.
For the study, the researchers analysed responses from ChatGPT, Google Gemini, Grok (by X), DeepSeek and other AI systems when asked typical surgical questions, including the symptoms of appendicitis, the risks of gallbladder removal, and alternatives to colon cancer surgery.
In total, 108 chatbot responses containing 1,249 references were analysed and assessed for accuracy, reliability, accessibility, and transparency.
The study’s key findings:
• Between 25% and 34% of references from some AI models were fabricated or ‘hallucinated’.
• Several AI platforms only produced references when explicitly prompted, limiting users’ ability to verify information.
• Standard and enhanced reasoning models performed very differently, with advanced systems generating higher quality references, suggesting the most reliable information may only be available via premium or subscription-based tools.
• Around half of cited sources originated from the US or UK, suggesting potential gaps where local clinical guidelines may differ.
• Many cited sources were behind academic paywalls, limiting users’ ability to cross-check AI-generated medical advice.
The authors say the findings highlight both the potential and the current limitations of using AI systems to generate medical information.
Dr Rickvir Sidhu, Dr Arrane Selvamogan and Mr Alex Boddy, who co-authored the research, said:
“AI chatbots are becoming a fantastic starting point for patients who want to understand their condition and feel more confident in the decisions they make about treatment. They can help people explore options, prepare questions, and come to their appointment feeling better informed.
“But it’s important that patients treat chatbot output with caution. Not all information is accurate or evidence-based, and some platforms may generate references that simply don’t exist. These tools should support conversations with your surgeon—not replace them. The safest decisions come from combining good digital information with expert clinical advice.”
Mr Tim Mitchell, President of the Royal College of Surgeons of England (RCS England), said:
“The rapid expansion of artificial intelligence in healthcare is a promising prospect which brings invaluable opportunities for improvements in patient care, but also ethical challenges. As this research shows, inaccuracies and ‘hallucinated’ references remain a real concern, particularly when users rely solely on free AI platforms for health advice.
“The excitement around using AI-generated information must be matched with caution, by both patients and doctors. These tools can support understanding but must not replace critical appraisal or evidence-based practice, which underpin informed decision-making and safe patient care.
“As AI evolves, improving transparency, accountability and the reliability of references must be a priority to ensure patient care is enhanced, not compromised.”
ENDS
Notes to editors:
1. https://publishing.rcseng.ac.uk/doi/10.1308/rcsann.2026.0021
2. The Future of Surgery Festival will take place on 20-21 April 2026 at the International Convention Centre in Birmingham.
3. The Royal College of Surgeons of England provides world-class education, assessment, and development to 30,000 surgeons, dental professionals, and members of the wider surgical and dental care teams, at all stages of their careers. Our vision is to see excellent surgical care for everyone. We do this by setting professional standards, facilitating research and championing the best outcomes for patients.
4. For more information, please contact the RCS England press office: telephone: 020 7869 6053/6054/6060; email: pressoffice@rcseng.ac.uk; out-of-hours media enquiries: 020 7869 6056.
