
Dr Bingxin Lu
About
Biography
I am currently a Surrey Future Fellow in the Section of Systems Biology, University of Surrey.
I will be starting my own research group, so please do not hesitate to contact me if you are interested in my research. Details of potential projects will be available soon!
Previously, I was a postdoc in Chris Barnes’s group at the Department of Cell and Developmental Biology, University College London, where I worked on dynamical modeling of chromosomal instability (CIN) in cancer genomes. Before this, I was a Postdoctoral Fellow at the Genome Institute of Singapore (Weiwei Zhai’s group), where I mainly developed pipelines and methods to analyse tumour heterogeneity and clonal evolution in liver and lung cancer genomes. I completed my PhD in Computational Biology under the supervision of Hon Wai Leong at the School of Computing, National University of Singapore, where I developed machine learning and phylogenetic methods for problems related to lateral gene transfer. I obtained my Master’s and Bachelor’s degrees from the Software Engineering Institute, East China Normal University, where I led the development of platforms for high-throughput biological data analysis, including RNA-Seq and proteomic data.
Research
Research interests
My research is in the broad field of computational biology, which bridges software engineering, machine learning, algorithms, statistics, phylogenetics, population genetics, and omics. I am particularly interested in developing new computational methods and models to address important biological problems related to human health. My goal is to help the biological and biomedical community mine new knowledge from the huge and growing amounts of available data. I have developed several new methods, and applied existing ones, to tackle basic questions arising in the study of species and cancer evolution. My current primary interest is the evolutionary dynamics of cancer genomes, especially those driven by chromosomal instability, which is still less well studied than point mutations but is critical in tumorigenesis and patient treatment.
Publications
Lung cancer is the world's leading cause of cancer death and shows strong ancestry disparities. By sequencing and assembling a large genomic and transcriptomic dataset of lung adenocarcinoma (LUAD) in individuals of East Asian ancestry (EAS; n = 305), we found that East Asian LUADs had more stable genomes characterized by fewer mutations and fewer copy number alterations than LUADs from individuals of European ancestry. This difference is much stronger in smokers as compared to nonsmokers. Transcriptomic clustering identified a new EAS-specific LUAD subgroup with a less complex genomic profile and upregulated immune-related genes, allowing the possibility of immunotherapy-based approaches. Integrative analysis across clinical and molecular features showed the importance of molecular phenotypes in patient prognostic stratification. EAS LUADs had better prediction accuracy than those of European ancestry, potentially due to their less complex genomic architecture. This study elucidated a comprehensive genomic landscape of EAS LUADs and highlighted important ancestry differences between the two cohorts.
Analysis of live-cell imaging and single-cell genome sequencing data of colorectal cancer organoids identifies temporal dynamics of sub-chromosomal copy-number amplifications. Central to tumor evolution is the generation of genetic diversity. However, the extent and patterns by which de novo karyotype alterations emerge and propagate within human tumors are not well understood, especially at single-cell resolution. Here, we present 3D Live-Seq, a protocol that integrates live-cell imaging of tumor organoid outgrowth and whole-genome sequencing of each imaged cell to reconstruct evolving tumor cell karyotypes across consecutive cell generations. Using patient-derived colorectal cancer organoids and fresh tumor biopsies, we demonstrate that karyotype alterations of varying complexity are prevalent and can arise within a few cell generations. Sub-chromosomal acentric fragments were prone to replication and collective missegregation across consecutive cell divisions. In contrast, gross genome-wide karyotype alterations were generated in a single erroneous cell division, providing support that aneuploid tumor genomes can evolve via punctuated evolution. Mapping the temporal dynamics and patterns of karyotype diversification in cancer enables reconstructions of evolutionary paths to malignant fitness.
Motivation: Genetic material is transferred in a non-reproductive manner across species more frequently than commonly thought, particularly in the bacteria kingdom. On one hand, extant genomes are thus more properly considered as a fusion product of both reproductive and nonreproductive genetic transfers. This has motivated researchers to adopt phylogenetic networks to study genome evolution. On the other hand, a gene's evolution is usually tree-like and has been studied for over half a century. Accordingly, the relationships between phylogenetic trees and networks are the basis for the reconstruction and verification of phylogenetic networks. One important problem in verifying a network model is determining whether or not certain existing phylogenetic trees are displayed in a phylogenetic network. This problem is formally called the tree containment problem. It is NP-complete even for binary phylogenetic networks. Results: We design an exponential time but efficient method for determining whether or not a phylogenetic tree is displayed in an arbitrary phylogenetic network. It is developed on the basis of the so-called reticulation-visible property of phylogenetic networks.
Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to capture the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. To find the similarities and differences between them, we first chose five representative assemblers from the two classes, and then investigated and compared their algorithmic features in theory and their real performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they have different conceptual and practical implementations. Each assembly method therefore has its specific advantages and disadvantages, performing worse than others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler using a genome-guided de novo assembly approach, which achieved good performance. Based on these results, we suggest that to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.
Intra-tumor heterogeneity (ITH) is a key challenge in cancer treatment, but previous studies have focused mainly on the genomic alterations without exploring phenotypic (transcriptomic and immune) heterogeneity. Using one of the largest prospective surgical cohorts for hepatocellular carcinoma (HCC) with multi-region sampling, we sequenced whole genomes and paired transcriptomes from 67 HCC patients (331 samples). We found that while genomic ITH was rather constant across stages, phenotypic ITH had a very different trajectory and quickly diversified in stage II patients. Most strikingly, 30% of patients were found to contain more than one transcriptomic subtype within a single tumor. Such phenotypic ITH was found to be much more informative in predicting patient survival than genomic ITH and explains the poor efficacy of single-target systemic therapies in HCC. Taken together, we not only revealed an unprecedentedly dynamic landscape of phenotypic heterogeneity in HCC, but also highlighted the importance of studying phenotypic evolution across cancer types. Using a prospective cohort for Hepatocellular Carcinoma (the PLANET study), this work revealed a dynamic landscape of phenotypic intra-tumor heterogeneity, providing several novel approaches for patient treatment and prognosis prediction.
The human reference genome is still incomplete and a number of gene sequences are missing from it. The approaches to uncover them, the reasons for their absence and their functions are less explored. Here, we comprehensively identified and characterized the missing genes of the human reference genome with RNA-Seq data from 16 different human tissues. By using a combined approach of genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb transcribed regions in the human genome assemblies of Celera and HuRef either missed from their homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that are unalignable to NCBI build 37.2 but can be aligned to at least one of the genomes from Celera, HuRef, chimpanzee, macaca or mouse. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation and other structural variations. Moreover, our results further suggest that a large portion of these missing genes are conserved between human and other mammals, implying their important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missing from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.
Summary: Simulating realistic clonal dynamics of tumors is an important topic in cancer genomics. Here, we present Phylogeny guided Simulator for Tumor Evolution, a tool that can simulate different types of tumor samples, including single-sector and multi-sector bulk tumor data as well as single-cell tumor data, under a wide range of evolutionary trajectories. Phylogeny guided Simulator for Tumor Evolution provides an efficient tool for understanding clonal evolution of cancer.
Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieves higher recall than current methods, without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
Clusters of genes acquired by lateral gene transfer in microbial genomes are broadly referred to as genomic islands (GIs). GIs often carry genes important for genome evolution and adaptation to niches, such as genes involved in pathogenesis and antibiotic resistance. Therefore, GI prediction has gradually become an important part of microbial genome analysis. Despite inherent difficulties in identifying GIs, many computational methods have been developed and show good performance. In this mini-review, we first summarize the general challenges in predicting GIs. Then we group existing GI detection methods by their input, briefly describe representative methods in each group, and discuss their advantages as well as limitations. Finally, we look into the potential improvements for better GI prediction.
Si-Wu-Tang (SWT) is a Traditional Chinese Medicine (TCM) formula widely used for the treatment of gynecological diseases. To explore the pharmacological mechanism of SWT, we combined microarray data of SWT with our herbal target database TCMID to analyze the potential activity mechanism of SWT's herbal ingredients and targets. We detected 2,405 differentially expressed genes (DEGs) in the microarray data; 20 of the 102 proteins targeted by SWT were encoded by these DEGs and can be targeted by 2 FDA-approved drugs and 39 experimental drugs. The results of pathway enrichment analysis of the 20 predicted targets were consistent with those of the 2,405 differentially expressed genes, elaborating the potential pharmacological mechanisms of SWT. Further study from the perspective of the protein-protein interaction (PPI) network showed that the predicted targets of SWT function cooperatively to perform their multi-target effects. We also constructed a network combining herbs, ingredients, targets and drugs, which bridges the gap between SWT and conventional medicine, and used it to infer the potential mechanisms of herbal ingredients. Moreover, based on the hypothesis that the same or similar effects between different TCM formulae may result from targeting the same proteins, we analyzed 27 other TCM formulae which can also treat gynecological diseases; the results provide additional insight into the potential mechanisms of SWT in treating amenorrhea. Our bioinformatics approach to investigating the pharmacology of SWT may shed light on drug discovery for gynecological diseases and could be utilized to investigate other TCM formulae as well.
The accurate detection of genomic islands (GIs) in microbial genomes is important for both evolutionary study and medical research, because GIs may promote genome evolution and contain genes involved in pathogenesis. Various computational methods have been developed to predict GIs over the years. However, most of them cannot make full use of GI-associated features to achieve desirable performance. Additionally, many methods cannot be directly applied to newly sequenced genomes. We develop a new method called GI-Cluster, which provides an effective way to integrate multiple GI-related features via consensus clustering. GI-Cluster does not require training datasets or existing genome annotations, but it can still achieve comparable or better performance than supervised learning methods in comprehensive evaluations. Moreover, GI-Cluster is widely applicable, either to complete and incomplete genomes or to initial GI predictions from other programs. GI-Cluster also provides plots to visualize the distribution of predicted GIs and related features. GI-Cluster is available at https://github.com/icelu/GI_Cluster.