Driving cancer research with comprehensive data types that are complete, accurate, permanent and accessible

By Dr. Marco Marra*

Although we refer to it as a single entity, cancer is not one disease but many. These diseases can differ from each other in many ways, including the areas of the body in which they occur, their responses to treatments and the molecules and genetic alterations that drive them. Recognizing these distinctions and leveraging them to improve cancer patient outcomes have been major challenges in cancer research and care.

Today, however, sophisticated scientific strategies enabled by powerful technologies, like the ability to sequence entire genomes from both tumour and normal samples, can identify (or "profile") many, if not most, of the biomolecular alterations driving cancer. Such advances are allowing us to understand the extent to which cancers differ and also the ways in which different cancers might share biomolecular alterations and perhaps even respond to the same therapies.

The ability to comprehensively profile individual tumours, rapidly and at scale, is extremely powerful: the better we understand the specific biomolecular alterations promoting cancer development and growth, the better we can create strategies to interfere with these properties to fight cancer. To drive comprehensive profiling, various initiatives have focused on generating large collections of data from tumour samples and laboratory cancer models. Such datasets have fueled innovative cancer research and have led to improvements in diagnosis, prognostication, and treatment in many cancer types. However – and this is at least in part due to technological and cost limitations – much of the data generated during the last two decades have not been comprehensive. Instead, they have focused on profiling only selected genes or regions of the genome.

As our understanding of cancer genome biology has grown, so too has the recognition that comprehensive profiling will have the enduring power to drive new and innovative analyses. I believe our ambition should be to invest in data that will be as relevant decades from now as they are today, thus maximizing the future return on our investments. Such data would enable, for example, the detection of new cancer aberrations and the identification of new cancer cell vulnerabilities that may be targetable using new drugs. To this end, I believe we should aim to produce large-scale biomolecular profiling data that are complete, accurate, permanent and accessible (CAPA).

One example where CAPA principles are especially relevant relates to the use of artificial intelligence (AI) approaches to facilitate and advance cancer research. AI is a broad term that encompasses various approaches, including machine learning. Machine learning methods are increasingly used in cancer genomics research, and it is expected that new, improved approaches will continue to be developed. While these technologies have the potential to significantly improve our ability to diagnose, understand and treat cancers, we need to train and test machine learning models with large volumes of high-quality, comprehensive data to facilitate the best possible results for researchers, physicians and, most importantly, patients.

Such data will be generated by the Marathon of Hope Cancer Centres Network, a pan-Canadian initiative led by the Terry Fox Research Institute and the Terry Fox Foundation and supported by the Government of Canada and partners across the country. Guided by CAPA principles, one of the key objectives of the first phase of this Network is to build the Gold Cohort, which will be a rich resource of genomic and clinical data from 15,000 Canadian cancer patients. The Gold Cohort promises to be among the largest and most comprehensive cancer genomic datasets produced to date and will be made available to researchers in Canada and beyond to enable, for example, high-quality, impactful cancer research to accelerate precision oncology and improve outcomes for cancer patients here and around the world. We expect the Gold Cohort will enable AI advances in the cancer research field, possibly by enabling better machine learning models to advance precision cancer genomic medicine in both the near and long term.

The Network’s Technology Working Group, which I co-chair, considered the principles of CAPA, the aims and potential contributions of the Network, the available technologies and similarly positioned international cancer initiatives. It arrived at the recommendation that the initial focus for Gold Cohort genomic data generation ought to be on whole-genome and transcriptome analysis (WGTA).

Among the technologies available today, WGTA is best positioned to provide a near-comprehensive overview of a cancer’s DNA and RNA profiles. This allows for the sensitive identification of genomic alterations that may contribute to tumour growth and that, by extension, may be targeted to fight or even eliminate the cancer. As a result, this type of analysis can also be used to inform cancer patients’ care in real time, aligning patients to therapies and potentially matching them to clinical trials. Importantly, WGTA can also help inform the interpretation of results from patients on trial, leading both to a better understanding of molecular features that influence patients’ responses to treatment and to the rational selection of treatments that are most likely to benefit individual patients. This type of approach has been developed and implemented in the Personalized OncoGenomics (POG) program at BC Cancer, which I co-founded and co-lead at Canada’s Michael Smith Genome Sciences Centre and the University of British Columbia (UBC) and which is now partially funded by the Marathon of Hope Cancer Centres Network.

In addition to profiling DNA and RNA from tumour samples, WGTA also profiles each patient’s matched ‘normal’ (germline) DNA. This information can also inform treatment decision-making and clinical trial eligibility and allows for the identification of inherited genetic mutations that may have contributed to the patient’s cancer, which can both contribute to our understanding of heritable cancers and help identify family members at risk for developing cancer in the future. This can have a significant impact by enabling enhanced screening of family members, which can lead to early detection of cancers at a stage where they are typically more treatable and potentially curable. 

As we move forward as a research community and a Network, it is imperative that we continue to invest in high-quality, comprehensive data generation and analysis and apply due consideration to emerging technologies to ensure that we maximize the impacts of research enabled by the tremendous gifts of biological samples donated by cancer patients. We must also take care to use technologies equitably and create datasets that reflect cancer patients’ diversity to enable research that can benefit Canada’s peoples, no matter where they reside. This will ensure that we drive innovative research today and that we facilitate future research, further amplifying the return on the significant investments made by all those involved in the Network. This strategy will help position Canada as a leader in precision oncology and, most importantly, has unrealized potential to improve care and outcomes for cancer patients.

 

*Dr. Marra is a Canada Research Chair in Genome Science at the University of British Columbia, Terry Fox Leader in Cancer Genome Science and the co-founder and co-lead of the Personalized OncoGenomics (POG) program at BC Cancer, Canada's Michael Smith Genome Sciences Centre and the University of British Columbia. He is also the MOHCCN Consortium Lead for the BC Cancer Consortium (BC2C) and a member of the MOHCCN Network Council.