Tiny genes that encode even tinier proteins could not be identified until recently because of their size. But a researcher in Denmark now thinks that the human genome has thousands of them.
In an international collaboration with Danish participation, researchers have examined the human genome with a fine-toothed comb and found hundreds of genes so small that they could not be discovered until now.
Advanced experimental techniques, together with bioinformatic computing power able to process the enormous quantity of information in the human genome, made this examination possible.
The research provides the most complete overview yet of the genetic processes that keep our bodies functioning.
In experiments on human cells, the researchers also cut these potential genes to determine, first, whether they are actually genes and, second, whether they affect the cells' ability to survive.
“Until recently, genes that encode proteins that are less than 100 amino acids long could not easily be identified, and we therefore did not know whether the human genome has many or few genes smaller than that. However, determining how many genes there are and whether their function may be relevant to health is important for biomedical research,” explains Matthias Mann, Professor and Research Director, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen and Director, Max Planck Institute of Biochemistry, Munich, Germany.
The research has been published in Science.
The human genome had far fewer genes than expected
When the Human Genome Project published its finished sequence in 2004 (the initial draft emerged in 2001), thousands of researchers had worked on it for more than 10 years, at a cost of 2.7 billion dollars.
The answer everyone eagerly awaited was the definitive number of genes in the human genome: 80,000, 100,000 or maybe even 120,000?
When the article appeared in Nature, the result surprised everyone: a mere 20,000 or so genes, hardly more than in roundworms and fruit flies.
“Researchers realized then that the analysis methods used had some limitations, leaving open the question of whether the genome has additional, smaller genes and, if so, how many. The problem was that the analysis could not show whether the short genetic sequences were real genes or just noise from the techniques used to cleave and analyse the DNA,” says Matthias Mann.
Genes are DNA sequences with special properties
To answer this question, the researchers behind the new project deployed numerous tools to find genes that encode proteins of less than 100 amino acids, which had previously been the minimum detectable length.
The researchers mainly used bioinformatic techniques to compare many genomes.
This enabled them to determine that short sequences of DNA that are well conserved across many different genomes are probably genes that encode proteins.
The researchers also identified a set of more or less strict requirements that a genetic sequence must meet to be classified as a gene.
“Translation of a gene must normally be initiated from a start codon, and then there are also some other requirements for sequences to be identified as real genes, but this is not always the case,” says Matthias Mann.
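To make the start-codon criterion concrete, here is a minimal Python sketch, not the researchers' actual pipeline, that scans a DNA sequence for open reading frames (ORFs) that begin with the standard ATG start codon, end at a stop codon, and would encode a protein shorter than 100 amino acids. The function name `find_small_orfs` is hypothetical. The sketch checks only the forward strand and only ATG starts, whereas the noncanonical ORFs in the study can also begin with other codons, as the quote above notes.

```python
# Hypothetical illustration: find short ATG-initiated ORFs on the forward strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq: str, max_aa: int = 100, start_codon: str = "ATG") -> list[tuple[int, int, str]]:
    """Return sorted (start, end, orf_sequence) tuples for forward-strand ORFs
    that start with `start_codon`, end at an in-frame stop codon, and encode
    fewer than `max_aa` amino acids (the stop codon is not counted)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == start_codon:
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_aa = (j - i) // 3  # codons before the stop
                        if n_aa < max_aa:
                            orfs.append((i, j + 3, seq[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    # No downstream stop in this frame: no further complete ORFs here.
                    break
            i += 3
    return sorted(orfs)


print(find_small_orfs("CCATGAAATAGCC"))
```

Short ORFs like these occur frequently by chance in any long DNA sequence, which is why a real analysis would also scan the reverse strand and require supporting evidence such as conservation across genomes or detection of the encoded protein.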
Microproteins drown in the background noise
The best way to verify whether a sequence is a gene is to find the protein the gene encodes.
However, this can be difficult for microproteins because other molecules in the sample can mask their signal.
In the human proteome, these proteins appear as short sequences of amino acids against the background noise of many protein fragments.
These microproteins can disappear in the background noise, and confirming their existence is therefore difficult.
They can also be difficult to find if they are present only in small quantities.
“But here too the techniques have improved, so that today we can find shorter proteins, which are expressed in smaller quantities,” explains Matthias Mann.
Matthias Mann contributed to the new research by analysing the proteins using mass spectrometry.
Many more human genes than previously thought
Overall, the researchers behind the new project used several genome and protein techniques to identify thousands of new potential genes.
In parallel, the researchers used systematic CRISPR-based screening to disrupt the identified genes in human cells and determine whether they affect cell survival.
If cell growth was altered or the cells died, the researchers concluded that the genes encoded important proteins.
The researchers found hundreds of such new genes.
“There are at least several thousand genes that we have not yet identified, and we therefore probably need to increase the estimated number of genes in the human genome from 20,000 by thousands,” says Matthias Mann.
Genes affect the immune system and the cell cycle
The researchers also investigated what many of the genes do in human cells.
They examined how the encoded proteins interact with other proteins in known protein complexes and the role of these proteins in signalling pathways.
For instance, the researchers labelled the proteins with fluorescent molecules to determine where they are located in the cell.
Some turned out to be present on the cell surface and others in the mitochondria, the cell’s power plants.
In a wider context, this study revealed that many of the newly discovered genes are important for the cell cycle, because the proteins they encode take part in specific processes within it.
A large group of proteins is also involved in determining how the immune system responds to foreign substances.
“It has taken 20 years to get here, but the amazing genetic and proteomic tools we now have available will finally enable us to examine the human genome at much higher resolution, and we will be able to discover many fascinating mechanisms that explain how our organism functions in sickness and in health,” says Matthias Mann.
“Pervasive functional translation of noncanonical human open reading frames” has been published in Science. Matthias Mann is a Research Director at the Novo Nordisk Foundation Center for Protein Research, University of Copenhagen and a Director at the Max Planck Institute of Biochemistry.