2019 CDAC Schedule (downloadable PDF)
Preliminary Talk Titles and Abstracts
Looking for signatures of misfunction in metagenomic data. A novel approach based on multiple probabilistic models
Biochemical and regulatory pathways have until recently been thought of and modelled within one cell type, one organism and one species. This vision is being dramatically changed by the advent of whole-microbiome sequencing studies, revealing the role of symbiotic microbial populations in fundamental biochemical functions. The new landscape we face requires the reconstruction of biochemical and regulatory pathways at the community level in a given environment. In order to understand how environmental factors affect the genetic material and the dynamics of expression from one environment to another, we want to evaluate the quantity of gene protein sequences or transcripts associated with a given pathway by precisely estimating the abundance of protein domains, and their weak presence or absence, in environmental samples.
MetaCLADE is a novel profile-based domain annotation pipeline based on a multi-source domain annotation strategy. It applies directly to reads and improves identification of the catalog of functions in microbiomes. MetaCLADE is applied to simulated data and to more than ten metagenomic and metatranscriptomic datasets from different environments, where it outperforms InterProScan in the number of annotated domains. Analyses of cancer samples have also been carried out, and they highlight signatures distinguishing patients from healthy individuals.
In conclusion, learning about the functional activity of environmental microbial communities is a crucial step towards understanding microbial interactions and large-scale environmental impact. MetaCLADE has been explicitly designed for metagenomic and metatranscriptomic data and allows for the discovery of patterns in divergent sequences, thanks to its multi-source strategy. MetaCLADE substantially improves on current domain annotation methods and reaches a fine degree of accuracy in the annotation of very different environments such as soil and marine ecosystems, ancient metagenomes, human tissues, cancer patients and healthy individuals.
- Research Talk:
GEMME: a simple and fast global epistatic model predicting mutational effects
Natural protein sequences observed today are the result of evolutionary processes selecting for function. They can inform us about which sequence variations affect proteins’ biological functions, and how, a central question in biology, bioengineering and medicine. The increasing wealth of genomic data has enabled the accurate prediction of complete mutational landscapes. State-of-the-art methods addressing this problem explicitly or implicitly model inter-dependencies between all positions in the sequence of interest to predict the effect of a particular mutation at a particular position. They infer hundreds of thousands of parameters from very large multiple sequence alignments. They require large variability in the input data and remain time-consuming. Here, we present GEMME (www.lcqb.upmc.fr/GEMME), a fast, scalable and simple method to predict mutational outcomes by considering the evolutionary history that relates natural sequences. GEMME infers evolutionary relationships between sequences by quantifying their global similarities. It then uses these relationships, encoded in a tree, to estimate conservation levels and evolution fits required to accommodate mutations. Assessed against 41 experimental high-throughput mutational scans, GEMME overall performs similarly to or better than existing methods and runs faster by several orders of magnitude. It greatly improves predictions for viral sequences and, more generally, for very conserved families.
Single-Cell Transcriptomics for Dissecting Cellular Heterogeneity in Cancer – New Technology, New Vistas and New Challenges
This tutorial offers an introduction, aimed at both experimental biologists and theoreticians, to the concepts that motivate single-cell resolution transcriptome measurements in cancer biology using state-of-the-art technologies. The focus is on single-cell RNAseq. Given the rapid development of the technology and of analysis paradigms, which are still in flux, we discuss approaches for which de facto consensus has been achieved. We focus on fundamental concepts inspired by the thinking of biologists and physicists and distinguish them from the ad hoc heuristic algorithms proposed by computer scientists, which have nevertheless become indispensable tools to “look at the data”. We ask: What new biological insights for cancer can we gain from single-cell resolution measurements of thousands of genes in thousands of tumor cells? What are the pitfalls of the data analytics and the technical challenges? First applications to patient tissues will also be discussed.
- Research Talk:
Is there an Inherent Limitation to Cancer Treatment Due to the Complex Cell Population Dynamics?
We study cancer cell population dynamics, which exhibit many more complexities than one may think when using ecology analogies: cancer cell populations, even of isogenic cells, are heterogeneous and contain distinct subpopulations of cells that (i) divide at varying rates, (ii) are susceptible to drugs to varying extents, and (iii) transition into each other – spontaneously as well as in response to cell stress imparted by any treatment (chemotherapy, radiation and even surgery). Specifically, we discovered that almost all treatments either kill the cell or induce a transition into a stem-like state, essentially converting the surviving cells into what may be cancer stem cells – which are known to be drug resistant and able to initiate a tumor. Thus, a cell “cannot not respond”: non-responding (that is, surviving) cells are not just static innocent bystanders that happen to be resistant; whatever their reason for not dying, perhaps just luck, they are pushed into an alternative state that resembles that of cancer stem cells. In other words, treatment stress triggers a symmetry breaking, resulting in cell death or regeneration. Or, with Nietzsche: “What does not kill me makes me stronger”. But worse, there is a second, non-cell-autonomous mechanism: we found that the dead cell bodies (“debris”) of cells that have succumbed to the treatment stimulate the surviving cells to transition into the stem-like state. These mechanisms imply that treatment is always a double-edged sword and thus may backfire. Could this establish an inherent limitation to treatment efficacy, such that modern, more potent cytotoxic drugs, however selective, may never work satisfactorily? The double-edged sword effect mandates mathematical modeling. In the talk I will show experimental results in cell culture and murine tumor models and discuss a (still evolving) mathematical description of the inherent incurability of cancer based on the bifurcation dynamics.
Genomic-Algorithms Tool Box: Current Developments
Since the turn of this century, the field of biology has been inundated with data, first genomic, then transcriptomic, epigenomic and so on, and now moving towards even single-cell omics. Each mode presents its unique challenges. String algorithms with layers of statistical checks and balances have successfully served the class of problems with NGS omic-reads; similarly, graph algorithms on “relationship” information and phylogeny algorithms on “evolution” models. I will discuss our use of the above to understand tumor evolution.
Also, propelled by the successes in the big data field with ML/AI tools, there is a growing impatience or urgency to make sense of the biological data that is rapidly accumulating. Topological Data Analysis (TDA) takes a refreshingly different view of data that is not necessarily geometric, providing a means to explore multi-way relationships, at arbitrary depths, within the data. Using the example of metagenomic data, I will describe an application of TDA to address the problem of accurate organism identification.
Another interesting development has been the melding of ML/AI with classical genetic epidemiology. I will discuss our efforts on this.
- Research Talk:
Precision Oncology: the Promise and the Practice
While cancer has been afflicting humankind since prehistoric times, the understanding of the biology of the disease is still very incomplete. We are in the throes of a genomic (omic) revolution, and an algorithms explosion! How does this affect what we can possibly do for a patient today? I will talk about the saga that my team and I embarked on, and the challenges we face on the sides of both biology and information.
- Tutorial: Perspective on the Field of Biomolecular Modeling
We re-assess progress in the field of biomolecular modeling and simulation, following up on our perspective published in 2011. By reviewing metrics for the field’s productivity, providing examples of successful predictions of structures/mechanisms and generation of new insights into biomolecular activity, and highlighting collaborations between modeling and experimentation (including experiments driven by modeling predictions), we outline the productive phase of the field whose short-term expectations were overestimated and long-term effects underestimated. We also discuss the impact of field exercises and web games on progress, knowledge-based versus physics-based approaches to structure prediction, and the role of algorithms and computer hardware in the future of the field. Overall, the tremendous successes of the biomolecular modeling community in the utilization of computer power, force-field improvements, and the development and application of new algorithms, such as machine learning and multiscale-resolution models, are enhancing the accuracy and scope of modeling and simulation and positioning the field as an exemplary discipline where experiment and theory/modeling are full partners.
For background reading:
- T. Schlick, R. Collepardo-Guevara, L. A. Halvorsen, S. Jung, and X. Xiao, “Biomolecular Modeling and Simulation: A Field Coming of Age”, Quart. Rev. Biophys., 44: 191–228 (2011). Published online 12 January 2011, DOI:10.1017/S0033583510000284
- Research Talk: Folding Genes Using Nucleosome Resolution Models of Chromatin
Deciphering chromosome tertiary organization is essential for understanding how genetic information is replicated, transcribed, silenced, and edited to control basic life processes. Many experimental studies of chromatin using nucleosome structure determination, ultra-structural techniques, single-force extension studies, and analysis of chromosomal interactions have revealed important chromatin characteristics, such as looping, compaction, and compartmentalization, as a function of various internal and external conditions. Modeling studies, anchored to high-resolution nucleosome models, have explored related questions systematically. In this talk, I will describe multiscale computational approaches for chromatin modeling at nucleosome resolution and recent mesoscale chromatin simulations that incorporate key physical parameters such as nucleosome positions, linker histone binding, and acetylation marks to ‘fold’ in silico the Hox C gene cluster. The folded gene reveals a contact hub that connects an acetylation-rich region with a linker histone-rich region. Such chromatin modeling techniques open the way to further computational folding of genes and genomes. Moreover, the resulting folded system emphasizes the heterogeneity of chromatin fibers and hierarchical looping motifs, and underscores how nucleosome positions in combination with epigenetic marks and linker histone binding direct the tertiary folding of fibers and genes to perform their cellular tasks. These chromatin architecture findings have implications for many important processes including cell differentiation, gene regulation, and disease progression.
- Tutorial Talk:
Automated Machine Learning for Biomedicine
Available biomedical data are exploding, presenting new opportunities for science, for understanding the system under study, and for creating diagnostic and predictive models. However, each data analysis project requires significant human time and effort, as well as deep knowledge and expertise in data analytics. A new sub-field of machine learning, called automated machine learning or AutoML, is now emerging; it aims to fully automate the end-to-end process of the analysis, increase the productivity of experts, and democratize analyses for non-experts. In this talk, we’ll present the challenges of AutoML, delve into some proposed solutions, and present our AutoML tool called Just Add Data Bio or JAD Bio. While the talk is generally addressed to people interested in machine learning, data science, and advanced statistical analysis, the examples and presentation are centered around the analysis of biological and biomedical data.
- Research Talk:
Advances in Feature Selection for High-dimensional Biomedical Data
Typically, in predictive analytics one is not only interested in how to predict, i.e. finding an optimal prediction model, but also in what is predictive, i.e., the molecular quantities that are predictive.
Feature selection is the problem of identifying a minimal-size subset of features (molecular quantities in the context of bioinformatics) that jointly (multivariately) are optimally predictive. These are also called (bio)signatures. Feature selection is not only important; sometimes it is actually the primary goal of the analysis, with the predictive model produced being just a side benefit. Feature selection is a primary tool for knowledge discovery in biomedicine. In the talk, we’ll discuss recent algorithms that scale up to very high-dimensional data (e.g. SNP data), scale up to large sample sizes (big-volume data), and scale down to very few samples, as well as the connections of feature selection with causal discovery and the causal relations of the system.
We’ll also discuss the multiple feature selection problem defined as identifying all statistically-equivalent signatures, its importance, and algorithms to solve it. Applications on real data will be presented.
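To make the definition of feature selection above concrete, here is a minimal sketch of a greedy forward selection on synthetic data. It is not one of the algorithms presented in the talk: it uses a simple residual-correlation heuristic, it does not address the multiple (statistically equivalent) signature problem, and the function names, the synthetic data, and the threshold `min_abs_corr` are all assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def forward_select(columns, target, max_features, min_abs_corr=0.1):
    """Greedily pick features most correlated with the current residual.

    columns: list of feature columns (each a list of floats).
    Returns the indices of the selected features, in selection order.
    """
    n = len(target)
    mean_t = sum(target) / n
    residual = [t - mean_t for t in target]
    selected = []
    while len(selected) < max_features:
        best, best_corr = None, 0.0
        for j, col in enumerate(columns):
            if j in selected:
                continue
            corr = abs(pearson(col, residual))
            if corr > best_corr:
                best, best_corr = j, corr
        if best is None or best_corr < min_abs_corr:
            break  # nothing left that explains the residual
        # Remove the part of the residual explained by the chosen feature
        # (univariate least squares on the centered column).
        col = columns[best]
        mean_c = sum(col) / n
        centered = [c - mean_c for c in col]
        beta = sum(c * r for c, r in zip(centered, residual)) / sum(c * c for c in centered)
        residual = [r - beta * c for r, c in zip(residual, centered)]
        selected.append(best)
    return selected


def make_demo_data(seed=0, n_samples=200, n_features=10):
    """Synthetic data: only features 0 and 3 carry signal (an illustrative assumption)."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_features)]
    target = [2.0 * columns[0][i] - 3.0 * columns[3][i] + 0.1 * rng.gauss(0.0, 1.0)
              for i in range(n_samples)]
    return columns, target


if __name__ == "__main__":
    cols, y = make_demo_data()
    print("selected feature indices:", forward_select(cols, y, max_features=2))
```

A heuristic like this can miss features that are only predictive jointly, which is one reason multivariate and causally motivated methods are of interest.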
- Tutorial: Integrated Analysis of Multiple Omics with MOFA
Advances in genomic studies have resulted in the availability of multiple measurements querying different omics from the same samples. Integration of multi-omics data allows for a unified view of the processes underlying the biological phenotypes. Among the tools developed for this purpose, Multi Omics Factor Analysis (MOFA) is suited to extracting latent factors that structure the dataset. MOFA disentangles axes of heterogeneity that are shared across multiple modalities from those specific to individual data modalities. The tutorial will show the MOFA workflow, from data preparation through model selection to interpretation of the final results.
- Research Talk: Identification of focal deletions in cancer reveals unexpected gene regulation patterns
Analysis of large-scale data for cancer samples is a powerful approach to identify molecular processes underlying cancer development. As Copy Number Alterations are a well-known hallmark of cancer, we investigated the relationship between recurrent focal deletions and gene regulation. Firstly, we confirm the expected enrichment of deletions in tumor suppressor genes, linked to their down-regulation; in addition, we identify a subset of genes which are repressed by the expression of pseudogenes in antisense orientation. We validated this unusual behavior for a gene involved in the mitotic spindle and we demonstrate that a focal deletion of the repressing pseudogene leads to aberrant mitotic patterns.
- Tutorial: 3D Big Data Interactive Visualization for Cancer Genomics and Precision Medicine
Multiple levels of molecular alteration are functionally involved in cancer initiation, progression, and response to treatment. Reliable prediction of tumour aggressiveness and therapy response requires integrative analysis of all data. Particular attention should be dedicated to interactive visual environments, where end-users can easily navigate and analyse the integrated information at the genome, gene or patient level. Solutions to this issue will be presented, and their potential will be explored interactively.
- Research Talk: Colorectal Cancer Precision Medicine: From Patients to Preclinical Models, and back
Colorectal Cancer (CRC) is a heterogeneous disease, with variable molecular pathogenesis, natural history and response to treatments. Therefore, it is important to exploit all available molecular information to enable personalized management of the disease. In this view, global transcriptional profiling of CRC samples has been exploited to challenge such heterogeneity, in three major ways: (1) Stratification of CRC in molecular subtypes, to define patient subgroups with distinct molecular and clinical features; (2) Characterization of the stromal contribution to CRC pathogenesis and response/resistance to treatments; (3) Identification of infrequent but therapeutically actionable molecular alterations. Large collections of preclinical models (cell lines and patient-derived xenografts), assembled in more than ten years of work, enable generation of therapeutic hypotheses for innovative treatments and associated predictors, followed by their preclinical testing and validation, with a perspective of rapid transfer back to patients.
- Research Talk:
Can Liquid Biopsy Detect and Monitor Early Stage Cancer? Inexpensively?
Noninvasive early detection of cancer is a daunting task, but if it can be achieved, two goals will be met. Cancer treated early is almost certainly easier and cheaper to treat. Cancer observed early also means we have a chance to monitor the beginning stages of tumor evolution densely and in sufficient patient populations to have some hope of drawing useful biological and medical insights. At present the most promising approach appears to be detecting cell-free cancer DNA in blood. Other easily accessible biological fluids merit attention, but blood is best because the clinical infrastructure needed is well developed. Such plasma liquid biopsies already offer great promise for detection and management of later-stage tumors, but early-stage detection is particularly challenging, since the number of molecules of cancer-derived DNA in a typical blood sample may be just a few per ml or even less from any particular target. To achieve sufficient detection sensitivity it will be necessary to combine results (sequences, methylation, abundance) on different informative DNA markers, and methods for doing this optimally need to be developed. One clear issue is the need to minimize the detection of DNA derived from blood cells themselves in order to have enough contrast to detect tumor-derived DNA fragments. Understanding the molecular nature of free DNA fragments in blood can guide the development of better methods. In any diagnostic test there are trade-offs to be made between sensitivity and specificity, and this issue is likely to be particularly acute for early cancer detection.
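The sensitivity/specificity trade-off mentioned above becomes especially visible when the disease is rare in the screened population. The sketch below applies Bayes' rule to compute the positive predictive value of a test; the sensitivity, specificity and prevalence figures are purely illustrative assumptions, not data from the talk.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("the test never returns a positive result for these inputs")
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical screening scenario: 1% prevalence, 90% sensitivity.
    for specificity in (0.95, 0.99, 0.999):
        ppv = positive_predictive_value(0.90, specificity, 0.01)
        print(f"specificity={specificity:.3f}  PPV={ppv:.3f}")
```

At low prevalence, small losses in specificity produce many false positives relative to true positives, which is why the trade-off is acute for early detection.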
- Research Talk:
Dynamical Aspects of Antigen Recognition, Tumor/Immune Interactions, and Spontaneous versus Induced Evolution of Drug Resistance during Cancer Treatment.
This talk will consist of two related parts. The first part, which we published in Cell Systems (2017), addresses dynamic pathogen recognition. Since the early 1990s, many authors have independently suggested that self/nonself recognition by the immune system might be modulated by the rates of change of antigen challenges. This work introduces an extremely simple and purely conceptual mathematical model that allows dynamic discrimination of immune challenges. The main component of the model is a motif which is ubiquitous in systems biology, the incoherent feedforward loop, which endows the system with the capability to estimate exponential growth exponents, a prediction which is consistent with experimental work showing that exponentially increasing antigen stimulation is a determinant of immune reactivity. Combined with a bistable system and a simple feedback repression mechanism, an interesting phenomenon emerges as the tumor growth rate increases: elimination, tolerance (tumor growth), again elimination, and finally a second zone of tolerance (tumor escape). This prediction from our model is analogous to the “two-zone tumor tolerance” phenomenon experimentally validated since the mid 1970s. Moreover, we provide a plausible biological instantiation of our circuit using combinations of regulatory and effector T cells.
The second part of the talk will be based mostly upon the recently published paper (Greene, Gevertz, and Sontag, ASCO Clinical Cancer Informatics, 2019), and deals with the following topic. Resistance to chemotherapy is a major impediment to the successful treatment of cancer. Classically, resistance has been thought to arise primarily through random genetic mutations, after which mutated cells expand via Darwinian selection.
However, recent experimental evidence suggests that the progression to resistance need not occur randomly, but instead may be induced by the therapeutic agent itself. This process of resistance induction can be a result of genetic changes, or can occur through epigenetic alterations that cause otherwise drug-sensitive cancer cells to undergo “phenotype switching”. This relatively novel notion of resistance further complicates the already challenging task of designing treatment protocols that minimize the risk of evolving resistance. In an effort to better understand treatment resistance, we have developed a mathematical modeling framework that incorporates both random and drug-induced resistance. Our model demonstrates that the ability (or lack thereof) of a drug to induce resistance can result in qualitatively different responses to the same drug dose and delivery schedule. The importance of induced resistance in treatment response led us to ask if, in our model, one can determine the resistance induction rate of a drug for a given treatment protocol. Mathematically, we show that the induction parameter in our model is theoretically identifiable. We also provide a solution to an associated optimal control problem (preprint, arXiv, 2019).
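The first part of the abstract above states that an incoherent feedforward loop can estimate exponential growth exponents. The following is a minimal sketch of that idea using a generic textbook-style two-variable loop, not the model from the cited paper: the equations, the parameter names `a`, `b`, `c`, `d`, and the Euler integration settings are all assumptions chosen for illustration. The input u(t) = exp(λt) activates both x and y, while x represses the effect of u on y; for an exponential input the output y settles at c(b + λ)/(ad), which is affine in the growth rate λ.

```python
import math


def iffl_output(growth_rate, a=1.0, b=1.0, c=1.0, d=1.0, t_end=30.0, dt=1e-3):
    """Simulate a simple incoherent feedforward loop with forward Euler.

        dx/dt = a*u - b*x        (u activates the intermediate x)
        dy/dt = c*u/x - d*y      (u activates y, x represses that activation)

    with input u(t) = exp(growth_rate * t). Returns y at time t_end.
    """
    x, y = 1.0, 0.0
    steps = int(round(t_end / dt))
    for n in range(steps):
        u = math.exp(growth_rate * n * dt)
        dx = a * u - b * x
        dy = c * u / x - d * y
        x += dt * dx
        y += dt * dy
    return y


def estimate_growth_rate(y_steady, a=1.0, b=1.0, c=1.0, d=1.0):
    """Invert the steady-state relation y = c*(b + growth_rate)/(a*d)."""
    return a * d * y_steady / c - b


if __name__ == "__main__":
    for lam in (0.0, 0.2, 0.5, 1.0):
        y = iffl_output(lam)
        print(f"true rate={lam:.2f}  estimated rate={estimate_growth_rate(y):.3f}")
```

Because the output depends on the growth rate of the input rather than on its absolute level, a downstream threshold on y can discriminate fast-growing from slow-growing challenges in this toy setting.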
In Vivo Molecular Imaging for Capturing Heterogeneity in Cancer – New Computational Approaches and New Challenges
This tutorial offers an introduction, aimed at experimental physicists, biologists, biotechnologists, and computer scientists, to the new challenges offered by in vivo molecular imaging in cancer medicine with state-of-the-art whole-body or organ-based systems. The focus is on Positron Emission Tomography, Computed Tomography and Magnetic Resonance Imaging combined with new computational methods in order to allow accurate quantification of cancer heterogeneity as candidate “biomarkers”.
We present strategies to improve the sensitivity of imaging systems and to accurately extract and quantify image features, invisible to the naked eye, that can play a key role in early diagnosis, prognosis and the assessment of response to treatment.
- Research Talk:
The value of radiomics for the oncological patient
The possibility of studying cancer heterogeneity across the entire primary lesion of a single patient, by means of existing imaging technologies, has brought medical molecular imaging onto the scene of personalized medicine, with the aim of matching the right patient to the right treatment, in vivo and non-invasively. A further significant advance in the radiomics process comes with the application of automatic classification techniques. Intelligent systems, trained on radiomic signatures from subjects with known prognosis at follow-up, make it possible to predict the clinical outcome for a single patient, stealing the show from the other -omics technologies.