A major Danish research project is using a massive data set about the population of Denmark to determine the progression and trajectories of various diseases, especially type 2 diabetes. The researchers hope to be able to categorize the diseases into subtypes so that treatment can be differentiated and thereby optimized.
Two people diagnosed with type 2 diabetes may have the same disease on paper, but the mechanisms and causes can present very differently in terms of disease history and characteristics.
One person developed diabetes quickly, whereas the process was gradual for someone else.
One person eventually became blind due to type 2 diabetes or faced other disease-related problems; another has no comorbid conditions.
One person is genetically predisposed to developing type 2 diabetes and has a family history; another does not.
Nevertheless, the people with different disease histories are interpreted and treated alike – but a major new research project wants to change this.
Using supercomputers and big data, researchers from the University of Copenhagen, Rigshospitalet, Statistics Denmark and other institutions will map how diseases are linked and how they progress. They will focus on the many diseases for which patients receive the same diagnosis today even though their disease trajectories and disease burden differ greatly.
The findings of the research project can be used to optimize treatments and preventive measures, to suggest new drug targets and drugs, and to design better clinical trials.
“The project will investigate how to understand disease trajectories. The focus is often solely on the differences between the people who develop an illness and those who do not. But the way patients develop diseases differs vastly, and we want to identify these systematic differences so that we can better understand the diseases and optimize the treatments,” explains the leader of the research project, Søren Brunak, who is a Research Director at the Novo Nordisk Foundation Center for Protein Research of the University of Copenhagen.
The Novo Nordisk Foundation awarded Søren Brunak and his partners a grant of DKK 60 million to delve deeper into understanding a wide range of diseases over 6 years.
Examining diseases from many perspectives
Overall, the researchers behind the new project are using big data to answer a variety of questions related to how diseases develop.
• How does a disease progress for the individual over time and can people be categorized according to their previous disease trajectory?
• Are some diseases linked to other diseases that are not directly related in a chronological sequence?
• What is happening at the genetic and protein levels?
Søren Brunak explains that having the opportunity in Denmark to gather such large quantities of data to elucidate these questions over such a long time frame is quite unique.
The reason is that, in 1968, everyone in Denmark was assigned a personal identification number (similar to a social security number) that links personal data such as disease history, income, address and number of marriages.
In 1977, the Danish National Patient Registry was established, which means that the researchers in Søren Brunak’s group can go back up to 43 years to determine the disease trajectories of people with, for example, cancer, type 2 diabetes or Alzheimer’s disease.
“We have the opportunity to investigate the development of a disease over the long term and not simply when the first signs appear. We can even take the family history into consideration systematically. For the various diseases, we would like to intervene in the development of the disease, such as the development of type 2 diabetes, as early as possible. This requires finding disease markers early. For example, these markers may be other diseases, past treatment or health conditions that arise in a pattern before a person is diagnosed with type 2 diabetes,” says Søren Brunak.
Supercomputers discover genetic links between unrelated diseases
The researchers feed all the data they collect into a supercomputer that can identify patterns in big data.
Data from medical records describe disease trajectories, medicines at different doses and longitudinal laboratory test results. As the computer chews through the data, it can find patterns that would otherwise be impossible to detect.
For example, the computer might find that some people experience hearing loss before developing cancer or type 2 diabetes.
The laboratory test results from the medical records also sometimes contain genetic information and information about the body’s levels of various proteins.
The supercomputer can therefore also identify whether specific genes are involved in a variety of diseases that do not appear to be immediately linked and whether the disease trajectory has a specific sequence.
Hearing loss and type 2 diabetes are not obviously related, but if the two are linked, certain genes may influence the development of both.
Irregular menstruation precedes breast cancer
Søren Brunak explains that two diseases can be linked genetically in different ways.
Genes with certain variants may cause one disease, which then increases the risk of developing the other disease. For example, metabolic syndrome may precede the development of type 2 diabetes.
In that case, treating the diseases in the right order matters: by treating the first disease, doctors may be able to remedy both.
One gene may also independently increase the risk of two diseases, so treating only one disease will not cure the other.
Thus, if one gene is associated with both hearing loss and type 2 diabetes, treating people for hearing loss will not alleviate type 2 diabetes.
In an earlier research project, the researchers published data showing a statistically significant pattern in which many women experience irregular menstruation before developing breast cancer.
These types of discoveries can improve treatment and diagnosis.
“Data are used to find a correlation. We cannot say that one leads to the other, but we can suggest that these two processes are connected, and screening women who have irregular menstruation a little more frequently for breast cancer may therefore be relevant. Many other diseases are probably less clearly linked to the diseases that people end up with. They may share a gene in a way that we did not expect. Many diseases come in random sequence, but in this project we focus on those that appear to have a systematic direction from one disease to another – a direction that our supercomputers can identify from the data we enter,” says Søren Brunak.
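The idea of a "systematic direction from one disease to another" can be illustrated with a minimal sketch. It is not the project's actual method: it simply counts, for patients who have both diagnoses, how often one comes first, and checks with a one-sided binomial test whether that order could be due to chance. The function names, the generic diagnosis labels "A" and "B" and the toy records are all hypothetical.

```python
from math import comb

def first_years(records):
    """Map (patient, diagnosis) to the earliest year it was recorded."""
    first = {}
    for patient, diagnosis, year in records:
        key = (patient, diagnosis)
        if key not in first or year < first[key]:
            first[key] = year
    return first

def direction_counts(records, a, b):
    """Count patients in whom a precedes b, and in whom b precedes a."""
    first = first_years(records)
    patients = {patient for patient, _ in first}
    a_first = b_first = 0
    for patient in patients:
        year_a = first.get((patient, a))
        year_b = first.get((patient, b))
        if year_a is None or year_b is None or year_a == year_b:
            continue  # needs both diagnoses in different years
        if year_a < year_b:
            a_first += 1
        else:
            b_first += 1
    return a_first, b_first

def binomial_p_value(k, n):
    """One-sided P(X >= k) for X ~ Binomial(n, 0.5), i.e. no preferred order."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Invented toy records: (patient id, diagnosis, year of diagnosis).
records = [
    (1, "A", 2001), (1, "B", 2005),
    (2, "A", 2003), (2, "B", 2004),
    (3, "B", 2000), (3, "A", 2002),
    (4, "A", 1999), (4, "B", 2010),
    (5, "A", 2002),
]

a_first, b_first = direction_counts(records, "A", "B")
p_value = binomial_p_value(a_first, a_first + b_first)
print(a_first, b_first, p_value)
```

With only four informative patients the toy example is far from significant, which is the point of working with millions of records. As the quote stresses, even a significant direction is a correlation and not proof that one disease causes the other.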
Data on 150,000 healthy Danish blood donors
In addition to data from the Danish National Patient Registry and many other registries, the project partners also have access to other unique data that can enrich the understanding of disease trajectories.
One partner is Henrik Ullum at Rigshospitalet and the University of Copenhagen, who is a leader of the Danish Blood Donor Study, in which researchers have collected blood samples from donors over a period of 10 years.
The unique thing about this part of the study is that becoming a blood donor requires being healthy, and blood donors are therefore healthier on average than Denmark’s general population. This makes blood donors ideal for comparing health and disease.
From a research perspective, this reduces the epidemiological background noise comprising the various diseases people have.
However, with time some people inevitably get sick, and some of the 150,000 blood donors in Denmark who have given researchers permission to use their blood for research over the past 10 years have developed cancer, type 2 diabetes, skin diseases and other diseases after donating blood.
“So they are initially healthy, and we know their disease history for more than the past 10 years and have regular blood tests from all of them. We can study all of this for markers and patterns of disease trajectories,” explains Søren Brunak.
Investigating differences in metabolites, genes and proteins
More specifically, the researchers will genotype the genetic material from the many blood samples.
The researchers will not sequence the entire genome but simply investigate variation in around 1 million loci in the DNA sequence to get a genetic overview of the blood donors, which can then be used to find genetic causes of various diseases or of a higher risk of developing them.
The researchers will also analyse the blood for levels of proteins and various metabolites and link these with the family history of disease to identify patterns that can be used to predict whether one person or another will likely develop a disease.
“We look broadly for patterns in all kinds of data. If the levels of a given metabolite or protein in the blood in connection with some specific genes are linked to an increased risk of developing such diseases as cancer, arthritis or diseases of the nervous system, this information can be used to improve diagnostic methods and to better understand the diseases,” says Søren Brunak.
Statistics Denmark provides specific data for the project
The third partner in the research project is Statistics Denmark, with Laust Hvas Mortensen leading the work.
Statistics Denmark will provide data that extends far beyond medical records and includes data on people’s households.
They will provide data from smart electricity and water meters, and the researchers eventually want to see what is associated with these data.
A smart meter measures the electricity or water consumption in the household at regular intervals, and the data therefore indicate when people get up or go to bed.
The data can also reveal whether some people in a household often get up at night and therefore have disturbed sleep, which is a known risk factor for developing many diseases.
“These data are not based on an individual but on the entire household, and we are only interested in looking at patterns across households in order to find reliable statistical signals. Nevertheless, they provide some very specific information that we would like to link in the long term with data from the more health-related sources,” says Søren Brunak.
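How interval readings from a smart meter could hint at disturbed sleep can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis Statistics Denmark or the researchers use: the function names, the hourly resolution, the night window of 00:00 to 05:00 and the standby baseline are all hypothetical, and the readings are invented.

```python
def night_activity_hours(readings, baseline_kwh, night_hours=range(0, 5)):
    """Return the night hours in which household consumption exceeds the standby baseline.

    readings is a list of (hour of day, kWh used in that hour) for one household and one day.
    """
    return [hour for hour, kwh in readings
            if hour in night_hours and kwh > baseline_kwh]

def disturbed_night_share(days, baseline_kwh):
    """Share of days on which at least one night hour shows activity."""
    if not days:
        return 0.0
    disturbed = sum(1 for readings in days
                    if night_activity_hours(readings, baseline_kwh))
    return disturbed / len(days)

# Invented readings for one household over two days.
days = [
    [(0, 0.1), (1, 0.1), (2, 0.6), (3, 0.1), (4, 0.1), (7, 0.9)],  # activity at 02:00
    [(0, 0.1), (1, 0.1), (2, 0.1), (3, 0.1), (4, 0.1), (7, 0.8)],  # quiet night
]

print(night_activity_hours(days[0], baseline_kwh=0.3))
print(disturbed_night_share(days, baseline_kwh=0.3))
```

The signal describes the household and not an individual, so in line with the quote it is only meaningful when aggregated across many households.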
The data from Statistics Denmark also include whether people return to work after illness, and cash register data show what people who live in a specific area buy in their local supermarket.
One disease can have many trajectories
While the supercomputer provides an overview of the big data it analyses, the researchers are asking the computer to categorize the diseases people have into subtypes that have the same disease trajectory over time.
Some people may develop various skin diseases, then bowel disease and eventually type 2 diabetes.
Others may be overweight and have high alcohol consumption before developing a disease.
A third group may have the same genes associated with higher risk or the same pattern in their blood proteins before developing type 2 diabetes.
“People can be much more diverse than what having a specific disease might seem to indicate. Perhaps what we refer to today as one disease may in fact be different diseases that require different types of treatment. This is the whole idea of precision medicine: medication should be given individually and personally based on the underlying individual disease trajectory,” explains Søren Brunak.
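Grouping patients into subtypes by trajectory can be illustrated with a deliberately simple sketch: patients who share the same ordered sequence of diagnoses before type 2 diabetes end up in the same group. The project presumably uses far richer statistical methods; the exact-match grouping, the function names and the patient data below are assumptions made for illustration only.

```python
from collections import defaultdict

def trajectory(events):
    """Return a patient's diagnoses in chronological order, keeping each diagnosis once.

    events is a list of (year, diagnosis); ties within a year are ordered alphabetically.
    """
    seen, ordered = set(), []
    for _, diagnosis in sorted(events):
        if diagnosis not in seen:
            seen.add(diagnosis)
            ordered.append(diagnosis)
    return tuple(ordered)

def group_by_trajectory(patients, index_disease):
    """Group patients with index_disease by the diagnoses that preceded it."""
    groups = defaultdict(list)
    for patient_id, events in patients.items():
        path = trajectory(events)
        if index_disease in path:
            preceding = path[:path.index(index_disease)]
            groups[preceding].append(patient_id)
    return dict(groups)

# Invented patient histories.
patients = {
    "p1": [(2001, "skin disease"), (2005, "bowel disease"), (2010, "type 2 diabetes")],
    "p2": [(2003, "skin disease"), (2006, "bowel disease"), (2012, "type 2 diabetes")],
    "p3": [(2011, "type 2 diabetes")],
    "p4": [(2004, "skin disease")],  # never develops the index disease
}

for preceding, members in group_by_trajectory(patients, "type 2 diabetes").items():
    print(preceding, members)
```

Exact matching breaks down quickly on real data, where trajectories rarely coincide step for step, so a real analysis would need a similarity measure or clustering instead.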
Women are diagnosed on average 4 years later than men
The two largest subgroups for almost all diseases are men and women, but the differences between the sexes in diagnosis and treatment are not always sufficiently distinguished.
In the current study, the researchers investigated whether men and women differ in the progression of a wide variety of diseases such as cardiovascular diseases and metabolic disorders.
This study showed that, across all diseases that both sexes can get, women are diagnosed on average 4 years later than men.
Søren Brunak says that this is theoretically positive because it could show that women have more disease-free years, but this is obviously not good if certain diseases are related to overlooked conditions. This type of analysis can therefore have implications for how diagnosis and treatment should be differentiated between men and women.
However, examining differences within each sex is also important. Although men get heart disease earlier and more frequently than women, this does not apply to some subgroups of men and women.
In addition, heart disease progresses rapidly for some people but not for others, who may not experience any complications.
Nevertheless, many people with specific heart diseases are treated fairly uniformly, which means that some are being overtreated. In this field, the research is being carried out in collaboration with scientists and clinicians from Denmark, Iceland and Norway.
“This is another example of how we want to use these data to categorize people into subtypes according to how their disease progresses over time. Two people with the same heart disease may be at the same point in their disease trajectory, but one may have developed it quickly and the other slowly. This should influence treatment so we avoid risking overmedicating people,” says Søren Brunak.
People with dementia often have both vascular dementia and Alzheimer’s
Søren Brunak’s research group has also carried out a study on dementia.
Data for people with dementia show that some have been diagnosed with Alzheimer’s, in which they have an abnormal accumulation of amyloid beta protein, resulting in senile plaques in the brain, while others have vascular dementia. However, a group of patients 10 times larger has unspecified dementia, which could be one or the other.
In this new study published in Alzheimer’s & Dementia, the researchers attempted to differentiate this large group of unidentified patients by examining the big data. One thing they did was to categorize them by age groups and investigate various potential factors underlying their disease trajectories.
Nevertheless, they could not categorize the patients with unspecified dementia into one of the two well-defined types of dementia.
“Then new information emerged indicating that people could have both types of dementia: mixed dementia. Our research showed that the patient histories did not enable us to categorize the patients into one of the two types. This supports the idea that many people probably have mixed dementia and that it is likely much more frequent than we thought,” explains Søren Brunak.
Guiding clinical trials of Alzheimer medicine
The fact that many people with dementia have mixed dementia may be very significant.
Research on medicine for Alzheimer’s, for example, shows that more than 100 drug candidates have failed in Phase 3 studies because they ultimately were not deemed effective enough.
If some participants with Alzheimer’s had mixed dementia, this makes it harder to determine whether a treatment is effective against one type of dementia but not the other.
“We have not invented the concept of mixed dementia, but our study reinforces the conclusion that many people have it, and perhaps this can be used to guide setting up clinical studies differently to consider these differences,” says Søren Brunak.
Some people with diabetes have a higher risk of dying of sepsis
The researchers already have results that show how big data can help to elucidate differences between people with the same type of diabetes.
Several years ago, several studies investigated the risk of death from sepsis (blood poisoning) among people with diabetes.
Some studies found an increased risk of dying from sepsis; other studies showed the opposite.
Using big data, Søren Brunak and his colleagues categorized people with diabetes into three groups based on the other comorbid conditions they had: 1) no comorbid condition; 2) other somatic diseases such as cancer or cardiovascular disease; and 3) alcohol dependence or another mental or behavioural disorder.
The researchers analysed the data and found that it was mainly the people with type 2 diabetes who also had a mental disorder who had an increased risk of dying from sepsis.
“Our results indicated an important difference that can only be found by categorizing people with the same disease into subtypes that have different risks and may need differentiated treatment,” he adds.
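Why pooled analyses can miss such a difference can be shown with a small worked example. The counts below are invented purely for illustration and are not the study's data; the three group labels follow the comorbidity categories described above, and the function names are hypothetical.

```python
def stratified_risks(strata):
    """Risk of death per comorbidity group; strata maps name -> (deaths, patients)."""
    return {name: deaths / total for name, (deaths, total) in strata.items()}

def pooled_risk(strata):
    """Risk of death when all groups are lumped together."""
    deaths = sum(d for d, _ in strata.values())
    total = sum(t for _, t in strata.values())
    return deaths / total

# Invented counts, NOT results from the study.
strata = {
    "no comorbid condition": (10, 1000),
    "other somatic disease": (12, 1000),
    "mental or behavioural disorder": (30, 500),
}

for name, value in stratified_risks(strata).items():
    print(f"{name}: {value:.3f}")
print(f"pooled: {pooled_risk(strata):.4f}")
```

In this made-up example the pooled risk sits between the groups and hides that one group carries most of the excess risk. Studies whose populations contain different shares of that group could therefore reach different conclusions, which is one possible reading of the conflicting earlier findings.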
Categorizing subtypes can be useful in clinical trials and treatment
Søren Brunak hopes that the researchers will be able to categorize people into many subtypes based on all the data available to them, primarily people with type 2 diabetes but also people with other diseases.
These subtypes can be used in clinical trials and to help focus on providing optimal treatment to people with different subtypes of what is considered the same disease.
“For treatment, categorizing people into risk groups is valuable because you can then intervene for the people at high risk and keep treating them, and possibly stop treatment for those at low risk,” says Søren Brunak.
In 2017, Søren Brunak was awarded a Novo Nordisk Foundation Challenge Programme grant for the project Big Life-course Data Analytics for Understanding Disease Initiation and Progression in Diabetes and its Complications.