Helping medical researchers save lives: machine learning in genomics
It goes without saying that no two people are exactly alike. But when it comes to our health, these differences can have a huge impact on how well we respond to different medical treatments.
Distinct variations in our genome, the complete set of our DNA, have a strong influence on our susceptibility to viruses, and can determine the success of treatments for complex conditions such as heart disease and diabetes. However, it has been difficult to know in advance which treatments will work best for each person. Although the key to those differences is encoded in our DNA, the sheer volume of data in each individual’s genome has made it hard for medical scientists to process and analyse that data in a timely manner.
Harnessing the Power of Cloud Computing
To solve this problem, Australia’s national science agency, the Commonwealth Scientific and Industrial Research Organisation (CSIRO), developed VariantSpark. Powered by Amazon Web Services (AWS), it has become the first machine learning (ML) framework to analyse one trillion genomic data points.
VariantSpark uses ML to model the statistical interactions between genetic factors, which helps explain how the genome influences complex diseases, and can pinpoint the genes that may cause them.
Using AWS compute services such as Amazon EC2 (Elastic Compute Cloud) and Amazon EMR (Elastic MapReduce), VariantSpark draws on hundreds of central processing units (CPUs) to process genomic datasets with one trillion entries 3.6 times faster than was previously possible. With AWS, the VariantSpark platform can securely analyse vast amounts of genomic data in just 15 hours, a task that previously took researchers years.
As a result, researchers can now investigate why different patients are susceptible to particular viruses, which will help them find life-saving treatments for complex conditions such as heart disease, diabetes, and motor neurone diseases, which cause nerve damage and weaken muscles.
VariantSpark is available on AWS Marketplace, a digital store that helps customers easily find and start using software that runs on AWS. CSIRO was the first public sector organisation in the world to list a service on AWS Marketplace, and has made VariantSpark free to use. This enables researchers from around the world to collaborate on research, which in turn helps accelerate the development of treatments.
AWS Marketplace also provides a channel for AWS Partner Network (APN) members and independent software vendors (ISVs) to sell their solutions, so research organisations can tap into an alternative channel to fund and sustain innovation.
CSIRO Bioinformatics Group leader, Dr Denis Bauer, said using ML in this way has provided a deeper understanding of complex diseases in a fraction of the time taken by traditional approaches.
“Our VariantSpark platform can analyse traits, such as diseases or susceptibilities, and uncover which genes may jointly cause them,” Dr Bauer said.
“This can provide valuable information about how the disease works on a molecular level, which can ultimately lead to better treatments.
“VariantSpark is already being used to help determine what genes might be linked to cardiovascular disease, motor neurone disease, dementia, and Alzheimer’s disease.”
Dr Bauer said the scalability of AWS has helped CSIRO secure the compute resources required to process one trillion genomic data points.
“Based on our research and testing, it shows VariantSpark is the only method able to scale to ultra-high dimensional genomic data in a reasonable timeframe of 15 hours,” Dr Bauer said.
“This is a significant milestone, as it means that VariantSpark can be scaled to analyse population-level datasets and drive better healthcare outcomes – things that were just pipedreams in the past.”
ML Critical to Future of Healthcare
CSIRO’s Australian e-Health Research Centre CEO, Dr David Hansen, says ML technologies are crucial to the future of healthcare in Australia.
“Machine learning is a critical component of understanding genomic information, which is increasingly being used to shape healthcare delivery in Australia and around the world,” Dr Hansen said.
“Despite recent technology breakthroughs with whole genome sequencing studies, the molecular and genetic origins of complex diseases are still poorly understood which makes prediction, application of appropriate preventive measures, and personalised treatment difficult.”
VariantSpark was developed by CSIRO’s digital health research team at the Australian e-Health Research Centre with support from CSIRO’s digital specialist arm, Data61.