The time has not yet come for people to seek answers about health and disease from ChatGPT and other forms of artificial intelligence. People who work with diabetes can still tell whether a question about diabetes was answered by a computer program or by a person, and sometimes the program answers incorrectly.
People with diabetes ask such questions as “How much fruit can I eat per day?” and “How should I store insulin on long journeys?”
They often ask a doctor or other healthcare professional or search for answers from the Danish Diabetes Association’s website or the Danish Diabetes Knowledge Center hosted by Steno Diabetes Center Copenhagen.
Other people ask ChatGPT or another language model.
A new study shows that asking ChatGPT may not yet be the best idea.
Large language models do not always answer questions correctly, and people working with diabetes can distinguish between human answers and answers from a computer.
“Models such as ChatGPT were not developed for clinical purposes, and you should therefore be careful about accepting the answers. Nevertheless, there are excellent opportunities to use artificial intelligence in other ways within diabetes, where the models do not have to advise people with diabetes but instead can help make knowledge about diabetes more accessible,” explains Adam Hulman, Senior Data Scientist and leader of an artificial intelligence research group at Steno Diabetes Center Aarhus and Associate Professor, Department of Public Health, Aarhus University.
The research has been published in PLOS ONE.
ChatGPT can pass a medical licensing examination
The researchers wanted to determine whether large language models can answer questions about diabetes such that the answers cannot be distinguished from those of diabetes healthcare professionals.
Various studies and tests of ChatGPT’s abilities have shown that it can pass the United States Medical Licensing Examination, which includes multiple-choice questions.
In the new study, the researchers wanted to determine whether ChatGPT can also answer questions related to diabetes with no multiple-choice options and whether people working with diabetes can distinguish between the answers ChatGPT provides and the answers available from the Danish Diabetes Association’s website or the Danish Diabetes Knowledge Center. All 183 participants were employees of Steno Diabetes Center Aarhus and included both clinical and non-clinical personnel.
The researchers asked ChatGPT to answer 10 frequently asked questions about diabetes. The researchers obtained both the questions and the correct answers from the Danish Diabetes Association and the Danish Diabetes Knowledge Center.
The researchers then asked the 183 respondents to guess which of the two answers to the same question was provided by ChatGPT and which by a healthcare professional.
Adam Hulman explains that if ChatGPT answered common questions about diabetes as well as a healthcare professional, as the researchers hypothesised, the respondents’ guesses would be correct only about 50% of the time, no better than chance.
“Percentages exceeding 50% would mean that the respondents could identify the answers from ChatGPT as originating from a computer language model,” he says.
ChatGPT could not fool the experts
The results show that ChatGPT cannot yet fool people working with diabetes into thinking it knows as much as they do.
The respondents correctly guessed that ChatGPT had answered a question in 59.5% of the cases.
This figure rose to 61% among the respondents with clinical contact with people with diabetes and to 67% among those who had previously used ChatGPT.
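The chance-level comparison can be sketched as a simple one-sided proportion test. This is an illustration, not the paper’s exact analysis: the total of 1,830 guesses assumes every one of the 183 respondents judged all 10 question pairs.

```python
from math import sqrt, erf

# Assumed setup: 183 respondents each judged 10 question pairs,
# giving roughly 1,830 guesses; the study reports 59.5% correct.
n = 183 * 10
observed_rate = 0.595
p0 = 0.50  # chance level: ChatGPT indistinguishable from a professional

# One-sided z-test for a proportion against the 50% chance level.
z = (observed_rate - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 0.5 * (1 - erf(z / sqrt(2)))

print(f"z = {z:.2f}, one-sided p = {p_value:.2g}")
```

Under these assumptions the guess rate sits many standard errors above chance, which is why even a seemingly modest 59.5% supports the conclusion that respondents could tell the answers apart.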
According to Adam Hulman, this indicates that it is mainly ChatGPT’s language that reveals when an answer comes from the model rather than a healthcare professional.
ChatGPT answers incorrectly
The study also found that ChatGPT answered two of the questions incorrectly, which is also why knowledge about diabetes should still be obtained from reliable sources and not the boundless Internet.
ChatGPT said that gestational diabetes is a form of type 2 diabetes, which is incorrect.
Second, ChatGPT reversed the two effects in its answer to a question about how prolonged and intense exercise affect blood glucose.
Many of the respondents correctly guessed that the first incorrect answer came from ChatGPT, but fewer did so for the second.
“However, for the question about storing insulin on long journeys, most respondents guessed incorrectly: 62% thought that the correct answer came from ChatGPT and that ChatGPT’s answer came from a professional,” explains Adam Hulman.
Developing a model to summarise scientific articles
According to Adam Hulman, the study is a step towards improving understanding of what large language models can currently be used for and what their limitations are.
Many people may already seek answers to diabetes-related questions using ChatGPT, but Adam Hulman says that the answers do not always match what a healthcare professional would have answered.
Nevertheless, large language models can be very valuable in diabetes care for tasks one step removed from advising people with diabetes directly.
For example, the researchers from Steno Diabetes Center Aarhus and the Danish Diabetes Knowledge Center are collaborating on a digital tool based on large language models. This tool will not directly advise patients but instead make knowledge from scientific studies available to laypeople by extracting the relevant results and issues and writing a summary in plain language.
“ChatGPT is not going to be the first artificial intelligence tool to be used in diabetes, but large language models can make a valuable contribution in other low-risk areas. This also means that artificial intelligence should be implemented in phases, so that both clinicians and laypeople become comfortable with the technology. These will be the first steps towards asking artificial intelligence for advice on specific questions regarding illness,” concludes Adam Hulman.