"What is lupus?" "How long does the flu last?" "How do you treat piles?" These are some of the most common health questions people are asking ChatGPT.
Large language models (LLMs) like ChatGPT are increasingly popular sources of personalized health advice. One in ten Australians now use ChatGPT to ask medical questions, according to a survey of about 2,000 Australians conducted in mid-2024.
The study, published on Tuesday, found that almost two out of three people (61%) who use ChatGPT for medical advice ask questions that usually require clinical advice.
"AI tools are popular because they can give quick answers to any question. [However], as with all these tools, there is a there is always a risk that they might give the wrong answer," said study author Julie Ayre from the University of Sydney.
With so many people using AI models to ask about their health conditions, can they be trusted? DW investigates.
How reliable is ChatGPT at diagnosing medical issues?
Researchers are building a scientific consensus around the (un)reliability of medical advice from LLMs, but their findings quickly become outdated as new models with better algorithms are released and updated.
One study in 2024 challenged ChatGPT-3.5 with 150 medical cases — including patient history, symptoms, and hospital test data — and asked the AI to provide a diagnosis and a treatment plan for each.
The results weren't great. ChatGPT gave the correct diagnosis and treatment plan only 49% of the time, making it an unreliable tool. The authors concluded that ChatGPT "does not necessarily give factual correctness, despite the vast amount of information it was trained on."
Another study concluded that ChatGPT "did not reliably offer appropriate and personalized medical advice," but could provide suitable background information to medical questions.
When researchers assessed the quality of medical information on ChatGPT in a study in 2023, they asked ChatGPT-3.5 "why do you need to treat jaundice caused by gallstone disease?" It answered that alleviating jaundice improves a patient's appearance, which in turn improves their self-esteem.
"That's really not the clinical rationale," said Sebastian Staubli, a surgeon at Royal Free London NHS Foundation Trust, UK, who led the study.
The newer ChatGPT-4.0 gives better answers to the question, highlighting the need to prevent organ damage and disease progression.
LLMs regurgitate but don't understand information
The issue with ChatGPT is that although its medical advice is not completely incorrect, it is also not entirely precise.
The quality of information an AI model is trained on determines the quality of its medical advice. The problem is that no one knows exactly what information specific models are trained on.
LLMs like ChatGPT "use pretty much any information gathered by data crawlers, which harvest information from the Internet," Staubli told DW.
This includes scientifically and medically validated information from health institutions like the NHS or the WHO. But it can also incorporate unreliable information from Reddit posts, poorly researched health articles, and Wikipedia entries.
"The big problem is that if you have lots of wrong or outdated information, it carries a lot of weight in the AI model, and it will think this is the correct answer. It can't understand that new information could be the correct answer," said Staubli.
The ways LLMs learn and process information are fundamentally different to how human intelligence works.
AI cannot solve problems, make deductive analyses, or make weighted judgments like the human mind can. Instead, AI "learns" vast amounts of information, then regurgitates that information when prompted.
"At the end of the day, LLMs are statistically predicting the next most likely word. That's why they regurgitate what they find most often [on the Internet]," said Staubli.
Bad information online gets reinforced just as often as good information, but the AI model can't tell the difference.
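To make Staubli's point concrete, here is a deliberately simplified sketch in Python. The word counts below are invented for illustration, and a real LLM uses a neural network over tokens rather than a simple frequency table, but the sketch captures the basic mechanism he describes: the model favours whichever continuation appeared most often in its training data, with no sense of whether that continuation is medically correct.

```python
# Toy illustration (not a real model) of "predicting the next most likely word".
# The counts are made up; a real LLM learns billions of such statistics from its training text.
from collections import Counter

# Hypothetical counts of which word followed a given phrase in some imaginary training data
next_word_counts = Counter({"appearance": 120, "self-esteem": 80, "outcomes": 15, "survival": 5})

# Turn raw counts into probabilities
total = sum(next_word_counts.values())
probabilities = {word: count / total for word, count in next_word_counts.items()}

# The model simply picks whatever appeared most often, regardless of clinical accuracy
most_likely = max(probabilities, key=probabilities.get)
print(probabilities)   # e.g. {'appearance': 0.55, 'self-esteem': 0.36, ...}
print(most_likely)     # 'appearance'
```

In this toy example, if incorrect or outdated phrasing dominates the training data, it also dominates the probabilities, which is exactly why frequent misinformation online can crowd out the clinically correct answer.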
AI won't replace human healthcare professionals anytime soon
Despite their flaws, LLMs can be very helpful for people who want to understand their health conditions better. Their strengths lie in simplifying health information and explaining medical jargon, and their accuracy for general health questions has improved over time.
Ayre said their Australian study found that the proportion of people using ChatGPT for medical advice was higher in people who face challenges in accessing and understanding health information, like people with "low health literacy, and people from culturally and linguistically diverse communities."
Staubli too said that LLMs "empower patients and make them more knowledgeable about their health conditions."
"However, patients must understand, and most do, that the quality of information can be flawed."
AI does not understand or inform users about which medical information is evidence-based, which is controversial, or even which information represents a standard of care.
That's why a conversation with a healthcare professional still cannot be replaced by any AI, Staubli said.
ChatGPT echoed Staubli when prompted about the reliability of its medical advice, saying "While I can provide general information about medical topics and explain health concepts, I'm not a substitute for professional medical advice."