The internet is awash with algorithms designed to recognise many things, from plants and animals to the sounds of birds and music. With some adjustment, such algorithms can become important tools in clinical health research.
Artificial intelligence and machine learning are still often considered buzzwords but are increasingly becoming integrated into research on health and disease.
Researchers can use artificial intelligence to extract meaningful insights from large data sets that previously yielded none.
Although clinical researchers have begun to use machine learning, they are nowhere near exploiting the enormous potential of the many algorithms and data sets that are available online.
According to a new study, published as a preprint on medRxiv, clinical researchers should increasingly reuse algorithms that are already available instead of designing their own.
“Today, there is a chasm between clinical research and what data science can do. Transfer learning repurposes algorithms developed and trained for solving one problem using a huge volume of data, such as from the Internet, to solve a different problem, often in another domain. Clinical research should use this approach to a much greater extent than it does today,” explains a researcher behind the study, Adam Hulman, Senior Data Scientist, Steno Diabetes Center Aarhus.
The review article on medRxiv takes a helicopter perspective on the current use of transfer learning and the potential for expansion.
Internet awash with useful algorithms
One example of transfer learning is an algorithm Google designed to identify the objects shown in images.
Data scientists at Google trained the algorithm to recognise various objects using millions of images. Because the algorithm had learned patterns that were specific to images from each category, it could then classify new images very accurately.
However, the algorithms can be used for much more than this. Instead of designing their own machine learning algorithms, clinical researchers can borrow the algorithms available online or from each other and tailor them in their research.
For example, an algorithm designed and trained to classify everyday images can, with small adjustments, be retrained to recognise patterns in images of the eyes of people with diabetic eye disease and thereby determine how severe the disease is.
“The algorithms are designed and trained to recognise everyday things, but they can quite easily be adapted and reused in clinical research. This has also been done for many years in medical image analysis, but the potential extends well beyond that,” says Adam Hulman.
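The mechanics of this reuse can be sketched in a few lines. The following is a minimal, hypothetical illustration of the transfer-learning idea described above, not the method of any study mentioned here: a fixed random projection stands in for a frozen pretrained network (in practice this would be a downloaded image model), and a small synthetic data set stands in for clinical data. Only a new classification "head" is trained on top of the frozen features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained feature extractor (a "backbone").
# In real transfer learning this would be a network trained on millions
# of images, with its weights downloaded and kept fixed; here a fixed
# random projection plays that role so the sketch runs without any
# machine-learning framework.
W_frozen = rng.normal(size=(64, 32))

def extract_features(x):
    """Frozen backbone: maps raw inputs to a 32-dimensional feature vector."""
    return np.tanh(x @ W_frozen)

# Small synthetic "clinical" data set: 200 samples, binary labels.
X = rng.normal(size=(200, 64))
w_true = rng.normal(size=64)
y = (X @ w_true > 0).astype(float)

# Transfer-learning step: train ONLY a new head (logistic regression,
# fitted by plain gradient descent) on the frozen features.
feats = extract_features(X)
w_head = np.zeros(32)
b_head = 0.0
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
    w_head -= lr * feats.T @ (p - y) / len(y)   # gradient of logistic loss
    b_head -= lr * np.mean(p - y)

p = 1.0 / (1.0 + np.exp(-(feats @ w_head + b_head)))
acc = float(np.mean((p > 0.5) == y))
print(f"training accuracy of the new head: {acc:.2f}")
```

The point of the sketch is that the expensive part, the backbone, is never retrained; only a small head with a few dozen parameters is fitted to the scarce local data, which is exactly why the approach suits clinical data sets that are too small to train a large model from scratch.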
Algorithms can identify patterns in more than just photos
Clinical researchers often collect data in spreadsheets but may also work with time series (such as ECG or continuous blood glucose measurements), sound recordings (such as heart sounds) and text (such as electronic health records).
Adam Hulman and colleagues explored the use of transfer learning for non-image data in the clinical literature to find examples of algorithms available on the Internet that could boost the development of clinical prediction models.
One example is a study in which the authors used image recognition algorithms to classify heart sounds.
Adam Hulman says that the researchers behind that study took two approaches to analysing their heart sound data.
First, they found an algorithm designed to recognise 500 to 600 different sounds in YouTube videos and fine-tuned it to recognise the differences in heart sounds that identify people with heart disease.
They then compared the fine-tuned algorithm with a second model that transformed the sound of the heartbeat into an image, so that an algorithm originally designed to classify everyday images could distinguish the heart sounds of sick individuals from those of healthy ones.
“There are more and more examples of researchers taking available algorithms and using them in their clinical research. In our study, we reviewed thousands of abstracts and read hundreds of articles to gather the examples and then performed basic descriptive analysis to determine, for example, how widely Google’s image recognition algorithms are used to analyse data other than images,” explains Adam Hulman.
Important to bring scientific fields together
Adam Hulman and colleagues identified 83 peer-reviewed clinical studies that used transfer learning on human non-image data.
As many as 63% of the studies had been published within the previous 12 months, which Adam Hulman says indicates that the field is gaining momentum and that more and more clinical researchers are realising the potential.
Adam Hulman and colleagues also examined the author affiliations of the 83 articles and found that 60% of the studies included at least one author with a clinical affiliation and at least one with a technical affiliation. Studies whose authors had solely technical affiliations (35%) were more common than studies whose authors had solely clinical affiliations (5%).
“This suggests that, although many studies include both clinicians and technical researchers as authors, the gap between the two fields still needs to be closed so that machine learning in clinical research becomes more than just a buzzword, something whose benefits clinicians also understand,” says Adam Hulman.
Data and code should be shared
Another aspect of the review is the availability of data, especially clinical data sets.
In data science, researchers and algorithm developers often use openly available data sets to develop the algorithms of the future. If the available data sets happen to be images of dogs and cats, they use these to create their algorithms. However, they can just as easily use available patient data, and Adam Hulman says that this can benefit clinical researchers.
“If you are a data scientist working on a research project and the question is whether you will use an openly available data set or a privately owned one, the answer is quite obvious. A culture with greater openness on clinical data is therefore also needed,” he explains.
A third aspect of the review is whether the researchers who reused algorithms from the Internet also shared their own code to benefit other researchers.
Only 27% chose to share the code of the algorithm they had created based on the work of other researchers.
“There is great potential in researchers sharing more data and algorithms. If I developed an algorithm for identifying people with diabetes, and another researcher wants to study athletes with diabetes, they may have great difficulty getting enough data for the analysis. Sharing algorithms enables clinical researchers to analyse subgroups of patients much more easily, for example, if they cannot collect enough data or develop the right algorithm from scratch,” says Adam Hulman, adding that three doctors, a mathematician and a statistician conducted their review and that involving researchers with different backgrounds was an important feature of the project.