Publications

Kasper Karlsson, Moritz J Przybilla, Eran Kotler, Aziz Khan, Hang Xu, Kremena Karagyozova, Alexandra Sockell, Wing H Wong, Katherine Liu, Amanda Mah, Yuan-Hung Lo, Bingxin Lu, Kathleen E Houlahan, Zhicheng Ma, Carlos J Suarez, Chris P Barnes, Calvin J Kuo, Christina Curtis (2023) Deterministic evolution and stringent selection during preneoplasia, In: Nature (London)

The earliest events during human tumour initiation, although poorly characterized, may hold clues to malignancy detection and prevention. Here we model occult preneoplasia by biallelic inactivation of TP53, a common early event in gastric cancer, in human gastric organoids. Causal relationships between this initiating genetic lesion and resulting phenotypes were established using experimental evolution in multiple clonally derived cultures over 2 years. TP53 loss elicited progressive aneuploidy, including copy number alterations and structural variants prevalent in gastric cancers, with evident preferred orders. Longitudinal single-cell sequencing of TP53-deficient gastric organoids similarly indicates progression towards malignant transcriptional programmes. Moreover, high-throughput lineage tracing with expressed cellular barcodes demonstrates reproducible dynamics whereby initially rare subclones with shared transcriptional programmes repeatedly attain clonal dominance. This powerful platform for experimental evolution exposes stringent selection, clonal interference and a marked degree of phenotypic convergence in premalignant epithelial organoids. These data imply predictability in the earliest stages of tumorigenesis and show evolutionary constraints and barriers to malignant transformation, with implications for earlier detection and interception of aggressive, genome-instable tumours.

Kasper Karlsson, Moritz J Przybilla, Eran Kotler, Aziz Khan, Hang Xu, Kremena Karagyozova, Alexandra Sockell, Wing H Wong, Katherine Liu, Amanda Mah, Yuan-Hung Lo, Bingxin Lu, Kathleen E Houlahan, Zhicheng Ma, Carlos J Suarez, Chris P Barnes, Calvin J Kuo, Christina Curtis (2023) Deterministic evolution and stringent selection during preneoplasia, In: Nature

Acknowledgements: We thank Z. Hu, S. Tilk, L. Attardi and A. Bhatt for helpful discussions, the Stanford University Hospital Tissue Procurement Shared Resource facility for specimen procurement and the Stanford Functional Genomics Core for assistance with sequencing. This work was supported by the US Department of Health & Human Services National Institutes of Health Director’s Pioneer Award (no. DP1-CA238296) to C.C. and a National Cancer Institute Cancer Target Discovery and Development Center (no. U01-CA217851) to C.J.K. and C.C. K. Karlsson was supported in part by a Swedish Research Council (Vetenskapsrådet) International postdoctoral grant (no. 2018-00454).

Bingxin Lu, Kit Curtius, Trevor A Graham, Ziheng Yang, Chris P Barnes (2023) CNETML: maximum likelihood inference of phylogeny from copy number profiles of multiple samples Springer Science and Business Media LLC

Phylogenetic trees based on copy number profiles from multiple samples of a patient are helpful to understand cancer evolution. Here, we develop a new maximum likelihood method, CNETML, to infer phylogenies from such data. CNETML is the first program to jointly infer the tree topology, node ages, and mutation rates from total copy numbers of longitudinal samples. Our extensive simulations suggest CNETML performs well on copy numbers relative to ploidy and under slight violation of model assumptions. The application of CNETML to real data generates results consistent with previous discoveries and provides novel early copy number events for further investigation.

Kasper Karlsson, Moritz Przybilla, Hang Xu, Eran Kotler, Kremena Karagyozova, Alexandra Sockell, Katherine Liu, Amanda Mah, Yuan-Hung Lo, Bingxin Lu, Kathleen Houlahan, Aziz Khan, Zhicheng Ma, Carlos Suarez, Christopher Barnes, Calvin Kuo, Christina Curtis Experimental evolution in TP53 deficient gastric organoids recapitulates tumorigenesis, In: bioRxiv Cold Spring Harbor Laboratory Press

The earliest events during human tumor initiation are poorly characterized but may hold clues as to how to detect and prevent malignancy. Here we model this occult process by engineering TP53 deficiency in primary human gastric organoids and performing experimental evolution in multiple clonally derived cultures over two years, thereby defining causal relationships between this common initiating genetic lesion and resulting phenotypes. TP53 loss elicited progressive aneuploidy, including copy number alterations and complex structural variants that are common in gastric cancers and which follow preferred temporal orders. Longitudinal single-cell sequencing of TP53 deficient gastric organoids similarly indicates progression towards malignant transcriptional programs. Moreover, lineage tracing with expressed cellular barcodes demonstrates reproducible dynamics whereby initially rare subclones with shared transcriptional programs repeatedly attain clonal dominance. This powerful platform for experimental evolution exposes stringent selection, clonal interference and a striking degree of phenotypic convergence in pre-malignant epithelial organoids, implying that the earliest stages of tumorigenesis may be predictable while illuminating evolutionary constraints and barriers to malignant transformation.

Competing Interest Statement: C.C. is an advisor to and holds equity in Grail, Ravel and DeepCell, and is an advisor to Genentech and NanoString. All other authors declare no competing interests.

Bingxin Lu, Kit Curtius, Trevor Graham, Ziheng Yang, Chris Barnes CNETML: Maximum likelihood inference of phylogeny from copy number profiles of spatio-temporal samples, In: bioRxiv Cold Spring Harbor Laboratory Press

Phylogenetic trees based on copy number alterations (CNAs) for multi-region samples of a single cancer patient are helpful to understand the spatio-temporal evolution of cancers, especially in tumours driven by chromosomal instability. Due to the high cost of deep sequencing, low-coverage data are more accessible in practice, but their lower resolution only allows the calling of (relative) total copy numbers. However, methods to reconstruct sample phylogenies from CNAs often use allele-specific copy numbers, and those using total copy numbers are mostly distance matrix or maximum parsimony methods, which do not handle temporal data or estimate mutation rates. In this work, we developed CNETML, a new maximum likelihood method based on a novel evolutionary model of CNAs, to infer phylogenies from spatio-temporal samples taken within a single patient. CNETML is the first program to jointly infer the tree topology, node ages, and mutation rates from total copy numbers when samples were taken at different time points. Our extensive simulations suggest CNETML performed well even on relative copy numbers with subclonal whole genome doubling events and under slight violation of model assumptions. The application of CNETML to real data from Barrett's esophagus patients also generated results consistent with previous discoveries, as well as novel early CNAs for further investigation.

Competing Interest Statement: The authors have declared no competing interest.
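The abstracts above describe maximum likelihood inference from total copy numbers. As a much simpler stand-in for CNETML's own evolutionary model, which is not spelled out here, the Python sketch below assumes a symmetric Mk-style Markov model over a bounded set of copy-number states (0 to n_states - 1), treats sites as independent, and estimates the branch length between two profiles by grid search over the log-likelihood. The function names, rate and number of states are illustrative assumptions, not CNETML's interface.

```python
import math


def mk_transition(i, j, t, mu, n_states):
    """P(state j after time t | state i) under a symmetric Mk-style model.

    Every change between copy-number states is assumed equally likely, with
    total rate mu of leaving the current state (an illustrative assumption).
    """
    decay = math.exp(-n_states * mu * t / (n_states - 1))
    if i == j:
        return 1 / n_states + (n_states - 1) / n_states * decay
    return (1 - decay) / n_states


def log_likelihood(cn_a, cn_b, t, mu=1.0, n_states=5):
    """Log-likelihood of two total copy-number profiles separated by branch length t.

    Sites are treated as independent and the stationary distribution is
    uniform (1 / n_states); copy numbers must lie in 0..n_states - 1.
    """
    return sum(
        math.log(mk_transition(a, b, t, mu, n_states) / n_states)
        for a, b in zip(cn_a, cn_b)
    )


def ml_branch_length(cn_a, cn_b, grid=None, **model):
    """Branch length on a grid (default 0.01..5.00) that maximises the likelihood."""
    grid = grid or [x / 100 for x in range(1, 501)]
    return max(grid, key=lambda t: log_likelihood(cn_a, cn_b, t, **model))


# Hypothetical profiles: two samples differing at two of four segments.
print(ml_branch_length([0, 1, 2, 3], [0, 1, 3, 4]))
```

Under this toy model the estimate also has a closed form, t = -(K-1)/(K mu) ln(1 - K p/(K-1)) for K states and a fraction p of differing sites, so the grid search mainly shows how a likelihood is assembled from per-site transition probabilities; CNETML goes further by jointly inferring tree topology, node ages and mutation rates.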

Jianbin Chen, Hechuan Yang, Audrey Su Min Teo, Lidyana Bte Amer, Faranak Ghazi Sherbaf, Chu Quan Tan, Jacob Josiah Santiago Alvarez, Bingxin Lu, Jia Qi Lim, Angela Takano, Rahul Nahar, Yin Yeng Lee, Cheryl Zi Jin Phua, Khi Pin Chua, Lisda Suteja, Pauline Jieqi Chen, Mei Mei Chang, Tina Puay Theng Koh, Boon-Hean Ong, Devanand Anantham, Anne Ann Ling Hsu, Apoorva Gogna, Chow Wei Too, Zaw Win Aung, Yi Fei Lee, Lanying Wang, Tony Kiat Hon Lim, Andreas Wilm, Poh Sum Choi, Poh Yong Ng, Chee Keong Toh, Wan-Teck Lim, Siming Ma, Bing Lim, Jin Liu, Wai Leong Tam, Anders Jacobsen Skanderup, Joe Poh Sheng Yeong, Eng-Huat Tan, Caretha L Creasy, Daniel Shao Weng Tan, Axel M Hillmer, Weiwei Zhai (2020) Genomic landscape of lung adenocarcinoma in East Asians, In: Nature genetics 52(2) pp. 177-186

Lung cancer is the world's leading cause of cancer death and shows strong ancestry disparities. By sequencing and assembling a large genomic and transcriptomic dataset of lung adenocarcinoma (LUAD) in individuals of East Asian ancestry (EAS; n = 305), we found that East Asian LUADs had more stable genomes characterized by fewer mutations and fewer copy number alterations than LUADs from individuals of European ancestry. This difference is much stronger in smokers as compared to nonsmokers. Transcriptomic clustering identified a new EAS-specific LUAD subgroup with a less complex genomic profile and upregulated immune-related genes, allowing the possibility of immunotherapy-based approaches. Integrative analysis across clinical and molecular features showed the importance of molecular phenotypes in patient prognostic stratification. EAS LUADs had better prediction accuracy than those of European ancestry, potentially due to their less complex genomic architecture. This study elucidated a comprehensive genomic landscape of EAS LUADs and highlighted important ancestry differences between the two cohorts.

Jinwen Feng, Chen Ding, Naiqi Qiu, Xiaotian Ni, Dongdong Zhan, Wanlin Liu, Xia Xia, Peng Li, Bingxin Lu, Qi Zhao, Peng Nie, Lei Song, Quan Zhou, Mi Lai, Gaigai Guo, Weimin Zhu, Jian Ren, Tieliu Shi, Jun Qin (2017) Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis, In: Nature biotechnology 35(5) pp. 409-412

Yannik Bollen, Ellen Stelloo, Petra van Leenen, Myrna van den Bos, Bas Ponsioen, Bingxin Lu, Markus J. van Roosmalen, Ana C. F. Bolhaqueiro, Christopher Kimberley, Maximilian Mossner, William C. H. Cross, Nicolle J. M. Besselink, Bastiaan van der Roest, Sander Boymans, Koen C. Oost, Sippe G. de Vries, Holger Rehmann, Edwin Cuppen, Susanne M. A. Lens, Geert J. P. L. Kops, Wigard P. Kloosterman, Leon W. M. M. Terstappen, Chris P. Barnes, Andrea Sottoriva, Trevor A. Graham, Hugo J. G. Snippert (2021) Reconstructing single-cell karyotype alterations in colorectal cancer identifies punctuated and gradual diversification patterns, In: Nature genetics 53(8) pp. 1187-1195 NATURE PORTFOLIO

Analysis of live-cell imaging and single-cell genome sequencing data of colorectal cancer organoids identifies temporal dynamics of sub-chromosomal copy-number amplifications. Central to tumor evolution is the generation of genetic diversity. However, the extent and patterns by which de novo karyotype alterations emerge and propagate within human tumors are not well understood, especially at single-cell resolution. Here, we present 3D Live-Seq, a protocol that integrates live-cell imaging of tumor organoid outgrowth and whole-genome sequencing of each imaged cell to reconstruct evolving tumor cell karyotypes across consecutive cell generations. Using patient-derived colorectal cancer organoids and fresh tumor biopsies, we demonstrate that karyotype alterations of varying complexity are prevalent and can arise within a few cell generations. Sub-chromosomal acentric fragments were prone to replication and collective missegregation across consecutive cell divisions. In contrast, gross genome-wide karyotype alterations were generated in a single erroneous cell division, providing support that aneuploid tumor genomes can evolve via punctuated evolution. Mapping the temporal dynamics and patterns of karyotype diversification in cancer enables reconstructions of evolutionary paths to malignant fitness.

Bingxin Lu, Louxin Zhang, Hon Wai Leong (2017) A program to compute the soft Robinson–Foulds distance between phylogenetic networks, In: BMC genomics 18(Suppl 2) pp. 111-111 BioMed Central

Andreas D. M. Gunawan, Bingxin Lu, Louxin Zhang (2016) A program for verification of phylogenetic network models, In: BIOINFORMATICS 32(17) pp. 503-509 Oxford Univ Press

Motivation: Genetic material is transferred in a non-reproductive manner across species more frequently than commonly thought, particularly in the bacterial kingdom. On the one hand, extant genomes are thus more properly considered as a fusion product of both reproductive and nonreproductive genetic transfers. This has motivated researchers to adopt phylogenetic networks to study genome evolution. On the other hand, a gene's evolution is usually tree-like and has been studied for over half a century. Accordingly, the relationships between phylogenetic trees and networks are the basis for the reconstruction and verification of phylogenetic networks. One important problem in verifying a network model is determining whether or not certain existing phylogenetic trees are displayed in a phylogenetic network. This problem is formally called the tree containment problem. It is NP-complete even for binary phylogenetic networks. Results: We design an exponential-time but efficient method for determining whether or not a phylogenetic tree is displayed in an arbitrary phylogenetic network. It is developed on the basis of the so-called reticulation-visible property of phylogenetic networks.
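The tree containment problem described above can be illustrated with a naive exhaustive check: keep exactly one incoming edge per reticulation node in every possible combination, and compare the clusters (leaf sets below each node) of the resulting tree with those of the query tree. This Python sketch only suits small networks and is not the reticulation-visible algorithm from the paper; the dictionary-of-children encoding and all function names are hypothetical.

```python
from itertools import product

# Hypothetical encoding: a rooted network or tree is a dict mapping every node
# to the list of its children; leaves map to an empty list.


def parents_of(graph):
    """Map each node to the list of its parents."""
    parents = {v: [] for v in graph}
    for u, children in graph.items():
        for c in children:
            parents[c].append(u)
    return parents


def clusters(graph, root, leaves):
    """Non-empty sets of labelled leaves below each node reachable from root."""
    memo = {}

    def below(v):
        if v not in memo:
            if v in leaves:
                memo[v] = frozenset([v])
            else:
                # Internal nodes left without children get an empty set and
                # are dropped below, like pruned unlabelled dead ends.
                memo[v] = frozenset().union(*(below(c) for c in graph[v]))
        return memo[v]

    below(root)
    return {s for s in memo.values() if s}


def displays(network, net_root, tree, tree_root):
    """Brute-force tree containment: exponential in the number of reticulations."""
    net_leaves = {v for v, kids in network.items() if not kids}
    tree_leaves = {v for v, kids in tree.items() if not kids}
    target = clusters(tree, tree_root, tree_leaves)
    parents = parents_of(network)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    # Keep exactly one incoming edge per reticulation node, in every combination.
    for choice in product(*(parents[r] for r in reticulations)):
        kept_parent = dict(zip(reticulations, choice))
        sub = {
            u: [c for c in kids if kept_parent.get(c, u) == u]
            for u, kids in network.items()
        }
        # Rooted phylogenetic trees are determined by their cluster sets, so
        # equal clusters mean the tree is displayed (after suppressing
        # degree-2 nodes and pruning unlabelled leaves).
        if clusters(sub, net_root, net_leaves) == target:
            return True
    return False


# A network with one reticulation h (parents a and b) over leaves x, y, z.
network = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "z"], "h": ["y"],
           "x": [], "y": [], "z": []}
tree_xy = {"root": ["p", "z"], "p": ["x", "y"], "x": [], "y": [], "z": []}
tree_xz = {"root": ["q", "y"], "q": ["x", "z"], "x": [], "y": [], "z": []}
print(displays(network, "r", tree_xy, "root"))  # True
print(displays(network, "r", tree_xz, "root"))  # False
```

The loop visits every combination of parent choices, which is why the problem is hard in general; the paper's method exploits network structure (reticulation visibility) to avoid this blow-up.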

BingXin Lu, ZhenBing Zeng, TieLiu Shi (2013) Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq, In: Science China. Life sciences 56(2) pp. 143-155 Science China Press

Transcriptome reconstruction is an important application of RNA-Seq, providing critical information for further analysis of the transcriptome. Although RNA-Seq offers the potential to reveal the whole picture of the transcriptome, it still presents special challenges. To handle these difficulties and reconstruct the transcriptome as completely as possible, current computational approaches mainly employ two strategies: de novo assembly and genome-guided assembly. To find the similarities and differences between them, we first chose five representative assemblers from the two classes, and then compared their algorithmic features in theory and their performance in practice. We found that all the methods can be reduced to graph reduction problems, yet they have different conceptual and practical implementations. Thus each assembly method has its specific advantages and disadvantages, performing worse than others in certain aspects while outperforming them in other aspects. Finally, we merged the assemblies of the five assemblers and obtained a much better assembly. Additionally, we evaluated an assembler using a genome-guided de novo assembly approach and achieved good performance. Based on these results, we suggest that to obtain a comprehensive set of recovered transcripts, it is better to use a combination of de novo assembly and genome-guided assembly.
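The abstract notes that the assemblers studied reduce to graph problems. As a toy illustration of the de novo side, assuming error-free reads and an arbitrary k, the Python sketch below builds a de Bruijn graph of (k-1)-mers and spells out its maximal non-branching paths (unitigs). Real transcriptome assemblers add error correction, coverage filtering and isoform resolution, all omitted here; the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell every maximal non-branching path (unitig) of a de Bruijn graph.

    Isolated cycles, in which every node has one in- and one out-edge, are
    not reported by this simplified version.
    """
    indegree = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            indegree[v] += 1
    nodes = set(graph) | set(indegree)

    def internal(v):
        return indegree[v] == 1 and len(graph.get(v, ())) == 1

    contigs = []
    for u in sorted(nodes):
        if internal(u):
            continue  # unitigs start only at branching or terminal nodes
        for v in sorted(graph.get(u, ())):
            path = u + v[-1]
            while internal(v):
                v = next(iter(graph[v]))
                path += v[-1]
            contigs.append(path)
    return contigs


# Two overlapping error-free reads collapse into one contig.
print(unitigs(de_bruijn(["ATGGCGT", "GGCGTGC"], k=4)))  # ['ATGGCGTGC']
```

A branch point (for example a shared exon followed by alternative exons) splits the output into several unitigs, which is where real assemblers need extra logic to reconstruct full-length isoforms.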

Weiwei Zhai, Hannah Lai, Neslihan Arife Kaya, Jianbin Chen, Hechuan Yang, Bingxin Lu, Jia Qi Lim, Siming Ma, Sin Chi Chew, Khi Pin Chua, Jacob Josiah Santiago Alvarez, Pauline Jieqi Chen, Mei Mei Chang, Lingyan Wu, Brian K P Goh, Alexander Yaw-Fui Chung, Chung Yip Chan, Peng Chung Cheow, Ser Yee Lee, Juinn Huar Kam, Alfred Wei-Chieh Kow, Iyer Shridhar Ganpathi, Rawisak Chanwat, Jidapa Thammasiri, Boon Koon Yoong, Diana Bee-Lan Ong, Vanessa H de Villa, Rouchelle D Dela Cruz, Tracy Jiezhen Loh, Wei Keat Wan, Zeng Zeng, Anders Jacobsen Skanderup, Yin Huei Pang, Krishnakumar Madhavan, Tony Kiat-Hon Lim, Glenn Bonney, Wei Qiang Leow, Valerie Chew, Yock Young Dan, Wai Leong Tam, Han Chong Toh, Roger Sik-Yin Foo, Pierce Kah-Hoe Chow (2022) Dynamic phenotypic heterogeneity and the evolution of multiple RNA subtypes in hepatocellular carcinoma: the PLANET study, In: National science review 9(3) pp. nwab192-nwab192 Oxford University Press

Intra-tumor heterogeneity (ITH) is a key challenge in cancer treatment, but previous studies have focused mainly on the genomic alterations without exploring phenotypic (transcriptomic and immune) heterogeneity. Using one of the largest prospective surgical cohorts for hepatocellular carcinoma (HCC) with multi-region sampling, we sequenced whole genomes and paired transcriptomes from 67 HCC patients (331 samples). We found that while genomic ITH was rather constant across stages, phenotypic ITH had a very different trajectory and quickly diversified in stage II patients. Most strikingly, 30% of patients were found to contain more than one transcriptomic subtype within a single tumor. Such phenotypic ITH was found to be much more informative in predicting patient survival than genomic ITH and explains the poor efficacy of single-target systemic therapies in HCC. Taken together, we not only revealed an unprecedentedly dynamic landscape of phenotypic heterogeneity in HCC, but also highlighted the importance of studying phenotypic evolution across cancer types. Using a prospective cohort for Hepatocellular Carcinoma (the PLANET study), this work revealed a dynamic landscape of phenotypic intra-tumor heterogeneity, providing several novel approaches for patient treatment and prognosis prediction.

Geng Chen, Charles Wang, Leming Shi, Weida Tong, Xiongfei Qu, Jiwei Chen, Jianmin Yang, Caiping Shi, Long Chen, Peiying Zhou, Bingxin Lu, Tieliu Shi (2013) Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches, In: Human genetics 132(8) pp. 899-911 Springer Nature

The human reference genome is still incomplete and a number of gene sequences are missing from it. The approaches to uncover them, the reasons for their absence and their functions remain less explored. Here, we comprehensively identified and characterized the missing genes of the human reference genome with RNA-Seq data from 16 different human tissues. By using a combined approach of genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb transcribed regions in the human genome assemblies of Celera and HuRef either missed from their homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that are unalignable to NCBI build 37.2 but can be aligned to at least one of the genomes from Celera, HuRef, chimpanzee, macaca or mouse. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation and other structural variations. Moreover, our results further suggest that a large portion of these missing genes are conserved between human and other mammals, implying their important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missed from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.

Hechuan Yang, Bingxin Lu, Lan Huong Lai, Abner Herbert Lim, Jacob Josiah Santiago Alvarez, Weiwei Zhai (2019) PSiTE: a Phylogeny guided Simulator for Tumor Evolution, In: BIOINFORMATICS 35(17) pp. 3148-3150 Oxford Univ Press

Summary: Simulating realistic clonal dynamics of tumors is an important topic in cancer genomics. Here, we present Phylogeny guided Simulator for Tumor Evolution, a tool that can simulate different types of tumor samples including single sector, multi-sector bulk tumor as well as single-cell tumor data under a wide range of evolutionary trajectories. Phylogeny guided Simulator for Tumor Evolution provides an efficient tool for understanding clonal evolution of cancer.

Bingxin Lu, Hon Wai Leong (2016) GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome, In: Journal of bioinformatics and computational biology 14(1) pp. 1640003-1640003 Imperial College Press

Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieved higher recall than current methods without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
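GI-SVM's input features are k-mer composition profiles of genome segments. The Python sketch below computes normalised k-mer frequency vectors for sliding windows along a genome; the k, window and step values and the function names are illustrative assumptions, not GI-SVM's settings, and the one-class SVM itself is not reproduced.

```python
from itertools import product


def kmer_profile(seq, k):
    """Relative frequencies of all 4**k DNA k-mers in seq, in lexicographic order.

    k-mers containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def window_profiles(genome, k=2, window=1000, step=500):
    """Yield (start, k-mer profile) for sliding windows along the genome.

    The defaults are illustrative assumptions, not GI-SVM's parameters.
    """
    genome = genome.upper()
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        yield start, kmer_profile(genome[start:start + window], k)


profiles = list(window_profiles("ACGT" * 500))
print(len(profiles), len(profiles[0][1]))  # 3 16
```

Vectors like these could then be passed to an outlier detector such as scikit-learn's `OneClassSVM`, with windows whose composition deviates from the genome's typical profile treated as candidate GIs; that step and its tuning are left out of this sketch.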

Bingxin Lu, Hon Wai Leong (2016) Computational methods for predicting genomic islands in microbial genomes, In: Computational and structural biotechnology journal 14(C) pp. 200-206 Research Network of Computational and Structural Biotechnology

Clusters of genes acquired by lateral gene transfer in microbial genomes are broadly referred to as genomic islands (GIs). GIs often carry genes important for genome evolution and adaptation to niches, such as genes involved in pathogenesis and antibiotic resistance. Therefore, GI prediction has gradually become an important part of microbial genome analysis. Despite inherent difficulties in identifying GIs, many computational methods have been developed and show good performance. In this mini-review, we first summarize the general challenges in predicting GIs. Then we group existing GI detection methods by their input, briefly describe representative methods in each group, and discuss their advantages as well as limitations. Finally, we look into the potential improvements for better GI prediction.

Zhao Fang, Bingxin Lu, Mingyao Liu, Meixia Zhang, Zhenghui Yi, Chengping Wen, Tieliu Shi (2013) Evaluating the pharmacological mechanism of Chinese medicine Si-Wu-Tang through multi-level data integration, In: PloS one 8(11) pp. e72334-e72334

Si-Wu-Tang (SWT) is a Traditional Chinese Medicine (TCM) formula widely used for the treatment of gynecological diseases. To explore the pharmacological mechanism of SWT, we incorporated microarray data of SWT with our herbal target database TCMID to analyze the potential activity mechanism of SWT's herbal ingredients and targets. We detected 2,405 differentially expressed genes (DEGs) in the microarray data; 20 of the 102 proteins targeted by SWT were encoded by these DEGs and can be targeted by 2 FDA-approved drugs and 39 experimental drugs. The results of pathway enrichment analysis of the 20 predicted targets were consistent with those of the 2,405 DEGs, elaborating the potential pharmacological mechanisms of SWT. Further study from the perspective of the protein-protein interaction (PPI) network showed that the predicted targets of SWT function cooperatively to perform their multi-target effects. We also constructed a network combining herbs, ingredients, targets and drugs, which bridges the gap between SWT and conventional medicine, and used it to infer the potential mechanisms of herbal ingredients. Moreover, based on the hypothesis that the same or similar effects of different TCM formulae may result from targeting the same proteins, we analyzed 27 other TCM formulae that can also treat gynecological diseases; the results provide additional insight into the potential mechanisms of SWT in treating amenorrhea. Our bioinformatics approach to detect the pharmacology of SWT may shed light on drug discovery for gynecological diseases and could be utilized to investigate other TCM formulae as well.
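The abstract reports a pathway enrichment analysis of the 20 predicted targets without naming the statistic. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test; the Python sketch below assumes that test, the function name is hypothetical, and the numbers in the usage line are made up for illustration rather than taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background universe; K: of those, genes in the pathway;
    n: genes in the query list (e.g. predicted targets); k: query genes that
    fall in the pathway.
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers, not from the study: 5 of 20 targets in a 50-gene
# pathway, against a background of 20,000 genes.
print(hypergeom_enrichment_p(5, 20, 50, 20000))
```

When many pathways are tested, the resulting p-values would normally be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure), which this sketch does not do.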

Bingxin Lu, Hon Wai Leong (2018) GI-Cluster: Detecting genomic islands via consensus clustering on multiple features, In: Journal of bioinformatics and computational biology 16(3) pp. 1840010-1840010 Imperial College Press

The accurate detection of genomic islands (GIs) in microbial genomes is important for both evolutionary study and medical research, because GIs may promote genome evolution and contain genes involved in pathogenesis. Various computational methods have been developed to predict GIs over the years. However, most of them cannot make full use of GI-associated features to achieve desirable performance. Additionally, many methods cannot be directly applied to newly sequenced genomes. We develop a new method called GI-Cluster, which provides an effective way to integrate multiple GI-related features via consensus clustering. GI-Cluster does not require training datasets or existing genome annotations, but it can still achieve comparable or better performance than supervised learning methods in comprehensive evaluations. Moreover, GI-Cluster is widely applicable, to both complete and incomplete genomes as well as to initial GI predictions from other programs. GI-Cluster also provides plots to visualize the distribution of predicted GIs and related features. GI-Cluster is available at https://github.com/icelu/GI Cluster.
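The abstract says only that GI-Cluster integrates features via consensus clustering. One widely used formulation, assumed here rather than taken from GI-Cluster, is evidence accumulation: each feature yields a base clustering of genomic segments, a co-association matrix records how often two segments land in the same cluster, and segments whose co-association exceeds a threshold are merged. The function names, threshold and toy labelings below are hypothetical.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of items shares a cluster."""
    n, m = len(labelings[0]), len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus(labelings, threshold=0.5):
    """Merge items whose co-association exceeds threshold (single linkage via union-find)."""
    matrix = co_association(labelings)
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)
    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Three hypothetical base clusterings of five genomic segments, e.g. one per feature.
base = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus(base))  # [0, 0, 1, 1, 1]
```

Single linkage on the thresholded matrix is the simplest way to turn co-association into clusters; average-linkage hierarchical clustering on 1 minus the co-association is a common, less chain-prone alternative.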