
Artificial Intelligence:

Predicting Tumour Evolution

Our inaugural talk of 2019 took place on 22 January and garnered major interest from over 70 attendees, who came from a range of subjects – Medicine and Biology, but also Informatics, Computer Science, and Artificial Intelligence. As biological databases are becoming increasingly complex and information-rich, we invited School of Informatics machine-learning lecturer, Professor Guido Sanguinetti, to explore the thriving intersection of Biology and Informatics.

“I don’t know very much about cancer”, began Prof Sanguinetti, speaking about his recent publication on ‘Detecting repeated cancer evolution from multi-region tumour sequencing data’, “but I hired a very good post-doc, Giulio Caravagna (now at ICR London)… this talk is just about something interesting that we did together.”

 

This excerpt from his talk summarises one of the two things I found most exciting about the talk – how experts from two fields that do not normally interact can unite to create new knowledge and ideas. The second thing that gripped me was the ideological shift that learning about Prof Sanguinetti’s work created in my mind – the concept that classifying tumours based on one type of mutation may be simplistic for therapeutic stratification, and that instead the sequence in which mutations occurred can be used to predict outcomes to therapies.

​

At the beginning of his talk, Prof Sanguinetti emphasised the struggle to get the project funded and published: the team waited 13 months before finally receiving an opportunity from Nature Methods. “Basic lesson here is that if you think you have a good idea,” he explained to the audience, “don’t give in – because there will be many, many setbacks.” His team went on to develop a machine-learning method that allows classification of patients based on how their tumours will evolve, taking us one step closer to targeted therapies.

Prof Sanguinetti explained that the vast heterogeneity between individual tumours is a major challenge when developing a therapeutic strategy against cancer. There are two sources of heterogeneity – firstly, there is substantial genetic heterogeneity between cancers of the same type, and secondly there is substantial intrapatient variation. Heterogeneity is also the main way in which cancer resists treatment, and we do not know how heterogeneity relates to phenotype (i.e. treatment stratification, relapse time, etc.).

 

The speaker’s research is based on the idea that each tumour is the result of an evolutionary process. A tumour cell may undergo clonal expansion if its specific set of mutations confers a fitness advantage. We may recapitulate the cancer as a sequence of steps (a sequence of random mutations), as if we were fast-forwarding the ecological evolutionary process suggested by Darwin. Fitness advantages may be related to patient lifestyle factors, the tumour’s access to nutrition, and treatment resistance, whereby therapy-induced changes to the environment benefit cells with mutations conferring drug resistance.
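To make the idea of clonal expansion concrete, here is a toy sketch (not a model from the talk or the paper). It assumes two hypothetical clones, "parent" and "mutant", with made-up relative fitness values, and simply reweights their share of the tumour each generation.

```python
def clone_fractions(fitness, start, generations):
    """Reweight each clone's share by its relative fitness, once per generation."""
    fractions = dict(start)
    for _ in range(generations):
        weighted = {clone: fractions[clone] * fitness[clone] for clone in fractions}
        total = sum(weighted.values())
        fractions = {clone: w / total for clone, w in weighted.items()}
    return fractions

# Hypothetical values: the mutant starts rare but divides 20% faster.
fitness = {"parent": 1.0, "mutant": 1.2}
start = {"parent": 0.99, "mutant": 0.01}

after = clone_fractions(fitness, start, generations=50)
print(after)
```

Under these assumed numbers the initially rare mutant clone ends up dominating the population, which is the sense in which a fitness advantage drives clonal expansion.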

 

Prof Sanguinetti’s research starts from the premise that modelling the cancer evolutionary process can help predict the trajectory of patients on treatment, allowing for better therapeutic stratification. The use of phylogenetic techniques to explain cancer diversity has been around for some years. In a 2015 paper mapping the evolutionary history of lethal metastatic prostate cancer, Gundem and colleagues took several samples from metastatic tumour sites in patients and developed a phylogeny of the cancer.

​

In the resulting tree, grey points represent mutations that could not be timed, but happened early on during the mutation process. The various terminal leaves of the tree were then mapped to groups of cancers either in the main locus (primary tumour) or the metastatic sites (secondary tumours). This kind of analysis allows us to understand the different clonal types present in the cancer and how they relate to distal metastases.

 

With phylogenetic analyses, we can talk in terms of waves of tumour mutations (first wave, second wave, etc.), as well as the terminally differentiated clonal subgroups. Two aspects need to be considered. Firstly, the frequency with which mutations occur (the highest being 50% or 1 allele), which allows us to order mutations by frequency. Secondly, in order to get the tree of life, all the “species” or clones of the cancer need to be assayed – therefore multiple biopsies are needed per tumour, taken from a variety of areas to capture diverse data. The data considered in Prof Sanguinetti’s paper was multiregion sequencing data, compiled by taking multiple biopsies from each patient and computing the mutations present in each sub-sample.
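As a rough illustration of ordering mutations by frequency, the sketch below sorts a handful of mutations by their allele frequency in one biopsy. The mutation names, the frequencies and the `tolerance` cut-off are all invented for the example; real pipelines also correct for tumour purity and copy number, which this ignores.

```python
# Hypothetical allele frequencies for one biopsy (0.5 is the ceiling for one mutated allele).
vaf = {"mut_A": 0.48, "mut_B": 0.31, "mut_C": 0.12, "mut_D": 0.47}

def order_by_frequency(vaf):
    """Return mutation names sorted from highest to lowest frequency."""
    return sorted(vaf, key=lambda name: vaf[name], reverse=True)

def likely_early(vaf, ceiling=0.5, tolerance=0.05):
    """Mutations close to the 50% ceiling, i.e. present in nearly every tumour cell."""
    return sorted(name for name, freq in vaf.items() if freq >= ceiling - tolerance)

ordering = order_by_frequency(vaf)
early = likely_early(vaf)
print(ordering)  # ['mut_A', 'mut_D', 'mut_B', 'mut_C']
print(early)     # ['mut_A', 'mut_D']
```

Mutations near the ceiling are candidates for the first wave, while lower-frequency mutations are more likely to belong to later subclones.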

The group argues that although evolution cannot be predicted because it is a random process, we can gain statistical power to predict evolutionary trajectories by sampling multiple patients and looking for repeated evolution. In the paper, a list of driver mutations was drawn up to predict the cancer’s next move – a binary string was generated for each biopsy, and the similarity between biopsies was computed. As they progressively aggregated the data, the patterns became increasingly similar, resulting in a tree where the common mutations end up at the top, with branches for the mutations that differ between subsets of the biopsies.
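The binary-string idea can be sketched as follows. This is only an illustration, assuming a hypothetical driver list `DRIVERS` and three invented biopsies; the similarity measure used here (fraction of matching positions) is my own choice and not necessarily the one used in the paper.

```python
DRIVERS = ["D1", "D2", "D3", "D4"]  # hypothetical driver mutation list

# Hypothetical biopsies from one tumour: the set of drivers detected in each region.
biopsies = {
    "region_1": {"D1", "D2"},
    "region_2": {"D1", "D2", "D3"},
    "region_3": {"D1", "D4"},
}

def to_binary_string(mutations, drivers=DRIVERS):
    """Encode a biopsy as one character per driver: '1' if present, '0' if absent."""
    return "".join("1" if d in mutations else "0" for d in drivers)

def similarity(a, b):
    """Fraction of positions at which two binary strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

encoded = {name: to_binary_string(muts) for name, muts in biopsies.items()}
trunk = set.intersection(*biopsies.values())      # found in every biopsy: top of the tree
branches = set.union(*biopsies.values()) - trunk  # found in only some biopsies

print(encoded)  # {'region_1': '1100', 'region_2': '1110', 'region_3': '1001'}
print(similarity(encoded["region_1"], encoded["region_2"]))  # 0.75
print(sorted(trunk), sorted(branches))  # ['D1'] ['D2', 'D3', 'D4']
```

Mutations shared by every biopsy sit at the top of the tree, while those seen in only some regions form the branches.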

 

Initially, people lacked sufficient precision to obtain multiple biopsies, instead sequencing bulk tumours. The issue here was that they did not obtain clear trees, but instead multiple trees that could work equally well, resulting in significant overlaps that treat each patient as a result of the same process, not accounting for intra-patient heterogeneity.

 

Later, with greater precision, multiregion sequencing became possible, and analyses treated each patient as completely different. This approach accounted for intra-patient heterogeneity, but it remained impossible to find similarities and group patients to make predictions. This method is vulnerable to high levels of noise (related to how deeply you sequence your samples), and too much noise means you can’t find similarities between patients.

 

Prof Sanguinetti’s paper treats the building of a phylogenetic tree as an optimisation problem, based on the concept that even though every patient is different, not all processes are dramatically different. It thereby combines the two extreme approaches, grouping patients while maintaining the idea that they are all different. This allowed the team to find solutions with repeated patterns: sets of mutations that show up repeatedly in the same order.
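A much-simplified sketch of what “repeated patterns” means is given below. It is not the optimisation method from the paper: it assumes each patient’s tree has already been built and is given as a list of hypothetical (earlier, later) mutation pairs, and it only counts how many patients share each ordering.

```python
from collections import Counter

# Hypothetical per-patient trees, each written as (earlier mutation, later mutation) pairs.
patient_trees = {
    "patient_1": [("D1", "D2"), ("D2", "D3")],
    "patient_2": [("D1", "D2"), ("D1", "D4")],
    "patient_3": [("D1", "D2"), ("D2", "D3")],
}

def repeated_orderings(trees, min_patients=2):
    """Return orderings seen in at least `min_patients` patients, with their counts."""
    counts = Counter()
    for edges in trees.values():
        counts.update(set(edges))  # count each ordering at most once per patient
    return {edge: n for edge, n in counts.items() if n >= min_patients}

repeated = repeated_orderings(patient_trees)
print(repeated)
```

In this invented data, D1 followed by D2 appears in all three patients and D2 followed by D3 in two of them, so these orderings would be candidates for repeated evolution, while D1 followed by D4 is seen only once.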

 

Through the paper, the group showed that we could learn cancer models – groups with sequences of mutations – and use data from The Cancer Genome Atlas (which also has survival data) to assign survival characteristics to different groups, so that conclusions were significant from both a biological and clinical point of view.

 

Prof Sanguinetti predicts that in the future we will have a set of analyses in the clinic, which we can rapidly apply to people who come into the clinic. This applies the conceptual shift I mentioned at the start – a move towards stratifying cancer based on evolutionary history rather than just the mutations that occurred. What remains to be proven is the phenotypic relevance of evolutionary history to clinical outcomes.

​

-Vishwani Chauhan

*About the speaker*
Professor Sanguinetti is a machine learning lecturer at the School of Informatics. He obtained his undergraduate degree in Physics from the University of Geneva and his DPhil in Mathematics from Oxford. His interest lies in probabilistic modelling of biological systems, with particular emphasis on inference in dynamical systems. The list of his research projects can be found here.

 
