Optimising the use of large biobanks and genetic databases

Breaking new ground | 3 March 2024 | Clinical Professor Thomas Werge | Written by Kristian Sjøgren

A new study presents a method for extracting considerably more clinically relevant information from relatively raw or seemingly less important data in large biobanks and genetic databases. According to one of the researchers, the method was developed to increase the amount of human genetic information that is useful for research, which has so far been relatively limited.


Thomas Werge

Institute of Biological Psychiatry, Mental Health Centre Sct Hans, Copenhagen University Hospital Mental Health Services and Department of Clinical Medicine, University of Copenhagen.


Imagine being a researcher who wants to investigate the genetics behind major depressive disorder (MDD).

This requires large quantities of data, for example the genetic profiles of 100,000 people linked to their health histories and other observable (phenotypic) traits.

Such studies are very expensive, and even the large biobanks may not contain the genetic and phenotypic data needed.

A new study now presents a method for estimating the probability that specific people have certain genetic and clinically relevant phenotypic traits, without observing or measuring those traits directly.

These estimated traits can then be used in studies to gain insight into the genetics behind MDD and how it differs between people.

“The new method is very useful because we can obtain considerably more insight into the genetics behind a disease or disorder without comprehensively characterising the genetics of the people involved in a specific study. In addition, we can save many resources on these typically very expensive studies, which in the long term can enable us to predict whether a given person has an increased risk of developing MDD or is likely to react positively or negatively to a specific treatment,” explains a researcher behind the study, Thomas Werge, Clinical Professor, Institute of Biological Psychiatry, Mental Health Centre Sct Hans, Copenhagen University Hospital Mental Health Services and Department of Clinical Medicine, University of Copenhagen.

The research has been published in Nature Genetics.

No need to know about the traits to be studied

The researchers validated a method for inferring people’s genetic or phenotypic traits from other phenotypic traits or other types of data.

For example, researchers may want to study dementia but only have data on people aged 30–50 years.

Few, if any, of these people are likely to have developed dementia yet, so linking their genetics to whether they have developed dementia reveals little about risk.

Instead of waiting about 20–40 years to see whether the people being studied develop dementia, researchers could use information about their parents’ history of dementia.

“This means that instead of assessing dementia among the people for whom we have genetic data, we estimate a probability of dementia based on knowing whether their parents had dementia. This uses probabilities to fill a gap of knowledge about these people’s future illness trajectory,” says Thomas Werge.

If one parent has or had dementia, a person can be assigned a certain probability of developing dementia, and the probability could be, for example, 50% higher if both parents have or had dementia.
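The parental-history idea can be sketched as a lookup that turns the number of affected parents into a probability, which then stands in for the unobserved outcome. The probabilities below are hypothetical placeholders, not values from the study; the only relationship taken from the text is that two affected parents give, for example, a 50% higher probability than one. The names `PROB_BY_AFFECTED_PARENTS` and `proxy_dementia_probability` are invented for illustration. A related published idea is case-control association mapping by proxy (sometimes called GWAX), but this sketch is not claimed to be the study's method.

```python
# Hypothetical probabilities for illustration only; they are NOT values from the study.
# The one relationship taken from the text: two affected parents give a 50% higher
# probability than one affected parent.
PROB_BY_AFFECTED_PARENTS = {0: 0.10, 1: 0.20, 2: 0.20 * 1.5}


def proxy_dementia_probability(n_affected_parents: int) -> float:
    """Map the number of parents with dementia (0, 1 or 2) to an assumed
    probability that the participant will later develop dementia."""
    if n_affected_parents not in PROB_BY_AFFECTED_PARENTS:
        raise ValueError("n_affected_parents must be 0, 1 or 2")
    return PROB_BY_AFFECTED_PARENTS[n_affected_parents]


# Young participants who have genotype data but no dementia outcome yet:
# (participant id, number of parents with dementia)
cohort = [("p1", 0), ("p2", 1), ("p3", 2)]

# The probability replaces the unobserved outcome as the phenotype for analysis.
proxy_phenotype = {pid: proxy_dementia_probability(k) for pid, k in cohort}
print(proxy_phenotype)
```

The resulting numbers can be analysed like any quantitative trait, even though nobody in the cohort has yet been observed with or without dementia.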

“We do not need to know everything about a person to be able to analyse. If we know about some features, we can calculate a probability for other features of interest, and that provides sufficient data power to be able to draw results from our studies,” adds Thomas Werge.

Inferring relationships from other data sets

The researchers showed that relationships between genetic differences and traits, identified in one data set, can be used to calculate quite accurately the probability that people in another data set have clinically important traits and genetic variants, so that these people can also be included in, and strengthen, studies of disease.

For example, researchers may know only a person’s birthweight, education, sex and age but need the person’s height for an analysis.

Here too, databases covering millions of people can reveal correlations between birthweight, education, age, sex and height, and these correlations can be used to calculate a probable height for each person in a specific study cohort.

This probable height can then be included in the study, significantly strengthening it and enabling useful conclusions.
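The height example amounts to fitting a prediction model in a large reference data set where height is measured, then applying it to a cohort where it is not. Below is a minimal sketch using ordinary least squares on fully simulated data. The variables mirror the example in the text, but every distribution and effect size, and the helper name `simulate`, are invented for illustration; real studies may use quite different models.

```python
import numpy as np

rng = np.random.default_rng(1)


def simulate(n: int) -> tuple[np.ndarray, np.ndarray]:
    """Return a design matrix [1, birthweight, education, sex, age] and height.
    All distributions and effect sizes are invented for illustration."""
    birthweight = rng.normal(3.5, 0.5, n)               # kg
    education = rng.integers(9, 21, n).astype(float)    # years of schooling, 9 to 20
    sex = rng.integers(0, 2, n).astype(float)           # 1 = male, 0 = female
    age = rng.uniform(40, 70, n)                        # years
    height = (160 + 3.0 * birthweight + 0.3 * education
              + 13.0 * sex - 0.05 * age + rng.normal(0, 6, n))  # cm
    X = np.column_stack([np.ones(n), birthweight, education, sex, age])
    return X, height


# 1) Large reference data set in which height IS measured: learn the correlations.
X_ref, height_ref = simulate(100_000)
coef, *_ = np.linalg.lstsq(X_ref, height_ref, rcond=None)

# 2) Study cohort in which height is NOT measured: compute a probable height.
X_study, height_study_hidden = simulate(1_000)  # hidden values used only to check
height_pred = X_study @ coef

r = np.corrcoef(height_pred, height_study_hidden)[0, 1]
print(f"correlation between predicted and actual height: {r:.2f}")
```

The predicted height is an expected value, not a measurement: it captures the part of height explained by the predictors and none of the rest, which is why such estimates behave like noisier stand-ins for the real trait.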

“So even if we do not know the person’s height, but only a probable height, this can be included and contribute to genetic studies. The interesting thing we show is that researchers can use parental data or other information about the people of interest. Other large data sets provide correlations between phenotypic traits and genetics that make it possible to calculate these probabilities for a given data set,” explains Thomas Werge.
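To illustrate how an estimated trait can still contribute to a genetic study, here is a minimal ordinary-least-squares association test for a single genetic variant, run once on a simulated measured trait and once on a noisier proxy of it. Everything is simulated, the function name `association_test` is hypothetical, and modelling the proxy as the true trait plus independent noise is a simplifying assumption rather than the study's model. The proxy yields a larger standard error, but with enough people the signal remains detectable.

```python
import numpy as np


def association_test(dosage: np.ndarray, phenotype: np.ndarray) -> tuple[float, float, float]:
    """OLS test of phenotype ~ intercept + allele dosage.
    Returns (effect estimate, standard error, t-statistic) for the dosage term."""
    X = np.column_stack([np.ones_like(dosage, dtype=float), dosage])
    coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of the estimates
    beta, se = float(coef[1]), float(np.sqrt(cov[1, 1]))
    return beta, se, beta / se


rng = np.random.default_rng(2)
n = 20_000
dosage = rng.binomial(2, 0.3, n).astype(float)       # allele counts 0/1/2
true_trait = 0.1 * dosage + rng.normal(0, 1, n)      # assumed effect size
proxy = true_trait + rng.normal(0, 1, n)             # assumed: trait + independent noise

for label, y in [("measured trait", true_trait), ("proxy trait", proxy)]:
    beta, se, t = association_test(dosage, y)
    print(f"{label}: beta={beta:.3f}, se={se:.4f}, t={t:.1f}")
```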

Even without data, researchers can learn more about MDD

The researchers showed that their method works in genetic studies of MDD.

MDD can be difficult to study because the diagnosis is not as clear-cut as, for example, a type 1 diabetes diagnosis (definitely yes or no) or a measurement of height or weight.

“We can understand the genetics and causes of MDD much better if we have more data on large groups of people with MDD. But as I said, this type of study is very expensive,” says Thomas Werge.

Instead of starting new studies from scratch, the researchers show that data can be drawn from major biobanks such as UK Biobank and biobanks in Denmark.

Biobanks often contain genetic data on the participants and general information about their previous illnesses, education and the like, but they typically lack detailed information about the many specific, clinically decisive traits that vary between people with MDD.

However, the researchers do not need to measure these traits directly, because they can calculate probabilities for them and thereby base their analyses on the very large data sets available.
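Calculating probable values for missing traits across a whole biobank is, in general terms, a missing-data (imputation) problem over a participants × traits matrix. The sketch below uses a generic iterative low-rank matrix completion in the style of the SoftImpute algorithm; it only illustrates the idea and is not claimed to be the method used in the study. The function name `soft_impute`, the shrinkage parameter `lam` and the simulated data are all assumptions for illustration.

```python
import numpy as np


def soft_impute(X: np.ndarray, lam: float = 1.0, max_iter: int = 200,
                tol: float = 1e-5) -> np.ndarray:
    """Fill NaN entries of a participants x traits matrix with a low-rank estimate.

    Generic SoftImpute-style iteration: replace missing cells with the current
    low-rank estimate, take an SVD, shrink singular values by `lam`, repeat.
    Observed cells are returned unchanged.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    col_means = np.nanmean(X, axis=0)                 # per-trait mean of observed values
    Xc = np.where(observed, X - col_means, 0.0)       # centre traits; missing -> 0
    Z = np.zeros_like(Xc)                             # current low-rank estimate
    for _ in range(max_iter):
        filled = np.where(observed, Xc, Z)            # observed data + current guesses
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt   # soft-threshold singular values
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return np.where(observed, X, Z + col_means)


# Simulated example: 500 participants, 6 correlated traits driven by 2 hidden factors.
rng = np.random.default_rng(0)
n, p, rank = 500, 6, 2
truth = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, p))
truth += 0.1 * rng.normal(size=(n, p))
mask = rng.random((n, p)) < 0.2                       # hide 20% of the cells
X_obs = truth.copy()
X_obs[mask] = np.nan

imputed = soft_impute(X_obs)
mean_filled = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)  # naive baseline


def rmse_on_hidden(estimate: np.ndarray) -> float:
    """Root-mean-square error on the cells that were hidden."""
    return float(np.sqrt(np.mean((estimate[mask] - truth[mask]) ** 2)))


print(f"low-rank RMSE: {rmse_on_hidden(imputed):.3f}, "
      f"column-mean RMSE: {rmse_on_hidden(mean_filled):.3f}")
```

The low-rank step exploits exactly what the article describes: traits are correlated, so the observed ones carry information about the missing ones, which a simple column average ignores.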

“The basic data and resources are there, and estimating the missing data needed does not cost a fortune or require a whole career. You can calculate something useful and thereby identify genetic factors that affect a clinically important aspect of MDD. This may be important for clinical practice and for treating people with MDD,” concludes Thomas Werge.


“Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder” has been published in Nature Genetics. The study was funded by the United States National Institutes of Health, the Lundbeck Foundation, the United States National Institute of Mental Health, the University of Copenhagen and the University of Aarhus. The Novo Nordisk Foundation has supported the Danish National Biobank.
