Computer model uses health data to predict who might develop pancreatic cancer

Therapy Breakthroughs · 9 July 2023 · 3 min · Research director, professor Søren Brunak · Written by Kristian Sjøgren

A new deep-learning algorithm identifies people with an increased risk of developing pancreatic cancer based on data on their health history. A researcher says that this could make screening people for this serious disease easier and improve survival.

Research director, professor

Søren Brunak

Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research


Pancreatic cancer is a very serious disease, with only 11% of the people diagnosed surviving more than 5 years.

One reason for the poor survival is that the symptoms are non-specific, which makes the disease difficult to identify early.

However, a machine-learning model can plough through the data on people’s health history and identify people with the highest risk of developing pancreatic cancer.

A researcher behind the development of the deep-learning algorithm says that this can help to promote early screening, diagnosis and treatment and hopefully also improve survival for the people with pancreatic cancer.

“The model can potentially identify the people with the greatest risk of developing pancreatic cancer, and then they can be invited for medical work-up. This means that everyone does not have to be examined, since by identifying high-risk individuals, we can hopefully intervene selectively before the disease spreads so much that treatment options are highly limited,” explains Søren Brunak, Professor and Research Director, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen.

The research has been published in Nature Medicine.

Trained a machine-learning model on data from millions of people

The researchers developed and trained a deep-learning algorithm to identify patterns in health data.

The data used in developing the algorithm is from the Danish National Patient Registry, which contains data on millions of people in Denmark since 1977. The Registry holds data on all hospital contacts, including broken arms, head injuries, serious stomach pain and diabetes.

Using the data, the researchers found that 24,000 people had been diagnosed with pancreatic cancer from 1977 to 2018, and they then asked the algorithm to find patterns in the diagnostic codes that preceded the diagnosis. This involved both the order of events in each person's health history and the timing of the various diagnoses in relation to each other.
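To make the idea of a health history as a timed sequence concrete, here is a minimal Python sketch. It is not the researchers' code: the `DiagnosisEvent` class, the `to_trajectory` helper, the example codes and the dates are all assumptions chosen for illustration. It only shows how a list of hospital contacts could be ordered and paired with the time elapsed since the previous contact.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DiagnosisEvent:
    code: str   # diagnostic code (illustrative ICD-10-style values below)
    when: date  # date of the hospital contact


def to_trajectory(events):
    """Sort events by date and pair each code with the days since the previous event."""
    ordered = sorted(events, key=lambda e: e.when)
    trajectory = []
    previous = None
    for event in ordered:
        gap_days = 0 if previous is None else (event.when - previous.when).days
        trajectory.append((event.code, gap_days))
        previous = event
    return trajectory


# Hypothetical history, deliberately given out of order.
history = [
    DiagnosisEvent("K21", date(2010, 1, 31)),  # acid reflux
    DiagnosisEvent("K80", date(2010, 1, 1)),   # gallstones
    DiagnosisEvent("E11", date(2011, 1, 31)),  # type 2 diabetes
]
trajectory = to_trajectory(history)
print(trajectory)
```

A sequence like this keeps both the order of the diagnoses and the gaps between them, which are the two aspects of the health history described above.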

The researchers trained the algorithm on most of the data but withheld data from 600,000 people to test whether it could subsequently identify the people who actually developed pancreatic cancer.
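Withholding part of the data for testing can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the study's procedure: `split_holdout` is a hypothetical helper, the split is a simple random one, and the numbers are scaled down from the 600,000 people mentioned above.

```python
import random


def split_holdout(patient_ids, holdout_size, seed=0):
    """Randomly set aside `holdout_size` patients for testing; the rest are for training."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    test_ids = ids[:holdout_size]
    train_ids = ids[holdout_size:]
    return train_ids, test_ids


# Toy population of 10,000 patient IDs with 600 withheld.
train_ids, test_ids = split_holdout(range(10_000), holdout_size=600)
print(len(train_ids), len(test_ids))
```

The key property is that no person appears in both groups, so the test measures how the model performs on people it has never seen.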

“People with a broken arm are rarely in doubt about the diagnosis, but the symptoms of pancreatic cancer are much more unspecific and can include stomach pain and other symptoms resulting from other diseases and disorders. A computer model can find weaker patterns in data because it analyses data from millions of people and can thus identify some people at high risk that a doctor might not pick up in the same way,” says Søren Brunak.

Model identifies individuals at high risk

When the researchers had trained the model, they tested it on the withheld data from the 600,000 people and found that it very accurately identified the people who developed pancreatic cancer.

Of the 1,000 people the algorithm assessed as having the greatest risk of developing pancreatic cancer, 320 developed it.
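The figure above is the share of true cases among the highest-ranked people, often called precision at k. The short Python sketch below shows the calculation on made-up scores; `precision_at_k` and the toy data are assumptions for illustration, and only the 320 out of 1,000 comes from the article.

```python
def precision_at_k(scores, developed_cancer, k):
    """Share of the k highest-scored people who actually developed the disease."""
    ranked = sorted(zip(scores, developed_cancer), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy example: six people, risk scores and outcomes (1 = developed cancer).
scores = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05]
developed_cancer = [1, 0, 0, 0, 1, 0]
toy_precision = precision_at_k(scores, developed_cancer, k=3)

# The reported result: 320 cases among the 1,000 people at greatest risk.
reported_precision = 320 / 1000
print(toy_precision, reported_precision)
```

In other words, about one in three of the people the algorithm ranked highest went on to develop pancreatic cancer.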

The researchers also validated the algorithm by using data from 3 million military veterans in the United States, which strengthens the case that it could be useful not only in Denmark but also elsewhere.

“One great strength of the algorithm and of the study is that we showed that the model can be used on data from two very different countries,” explains Søren Brunak.

He also notes that how the model can or will be used is a political question.

“Solely focusing on the 1,000 people that the algorithm predicts as having the greatest risk of developing pancreatic cancer will detect many cases and have relatively few false-positive results. But the numbers could also be expanded to test the 10,000 or 100,000 people identified at greatest risk. This would lead to more screening but would also detect more people with pancreatic cancer,” says Søren Brunak.
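The trade-off described in the quote can be illustrated with a small simulation. Everything in this Python sketch is synthetic and assumed: the population size, the 1% disease rate, the way scores are generated and the `screening_tradeoff` helper are illustrative choices, not values from the study.

```python
import random


def screening_tradeoff(scores, developed_cancer, cutoffs):
    """For each cutoff k, count true cases and false positives among the top k people."""
    ranked = sorted(zip(scores, developed_cancer), key=lambda pair: pair[0], reverse=True)
    rows = []
    for k in cutoffs:
        found = sum(label for _, label in ranked[:k])
        rows.append((k, found, k - found))  # (people screened, cases found, false positives)
    return rows


# Synthetic population: roughly 1% develop the disease, and cases tend to score higher.
rng = random.Random(1)
developed_cancer = [1 if rng.random() < 0.01 else 0 for _ in range(50_000)]
scores = [rng.random() + (0.5 if label else 0.0) for label in developed_cancer]

rows = screening_tradeoff(scores, developed_cancer, cutoffs=(100, 1_000, 10_000))
for screened, found, false_positives in rows:
    print(f"screen {screened:>6}: {found} cases found, {false_positives} false positives")
```

Widening the group invited for work-up can only add cases found, never remove them, but each step also adds people who are examined without having the disease. Where to draw the line is the policy choice referred to above.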

Helping doctors to become more aware of pancreatic cancer

Although the algorithm is already performing well in identifying people at high risk of pancreatic cancer, the prognostic value of the model can be improved even further.

Søren Brunak calls the current model a prototype whose predictive performance is at the lower end of what could be achieved if it were trained on even more data from other sources.

Further developing the algorithm with data from, for example, general practitioners, laboratory results, socioeconomic data, genetic data and images from computed tomography and X-rays could improve its predictive value further.

The model can be used not only to identify people at high risk of developing pancreatic cancer but also to make doctors more aware of other features associated with the disease.

For example, the algorithm identified some diagnostic codes that appear to be associated with the risk of developing pancreatic cancer that were not well characterised chronologically – including gallstones, acid reflux and stomach catarrh.

The algorithm also identified quite similar features in the data from both Denmark and the United States, despite all the differences in diagnostic coding.

There were also differences. For example, the algorithm examined the use of opioids in its risk assessment for the United States but not for Denmark, although whether using opioids is associated with developing pancreatic cancer is unknown.

“We do not assume that one computer model can be used in all countries. Instead, we imagine one that needs to be trained and validated on data from each country to be able to identify people at high risk of developing pancreatic cancer in that country,” concludes Søren Brunak.

“A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories” has been published in Nature Medicine. Several authors are affiliated with the Novo Nordisk Foundation Center for Protein Research, University of Copenhagen.
