Presented By: Interdisciplinary Seminar in Quantitative Methods (ISQM)
Clustering Analysis Through Integrating Diverse, High Dimensional and Noisy Data Sets
Hongyu Zhao, Yale University
ABSTRACT:
Sample clustering has been studied in statistics for many decades, and recent advances in collecting diverse, high-dimensional, and noisy data present new challenges for clustering analysis. For example, high-throughput genomic technologies, coupled with large-scale studies such as The Cancer Genome Atlas (TCGA) project, have generated rich resources of diverse types of omics data from thousands of patients to better understand disease etiology and treatment responses. Clustering patients into subtypes with similar disease etiologies and/or treatment responses using multiple types of omics data has the potential to improve the precision of clustering compared with using a single type of omics data. In another setting, single-cell RNA-sequencing (scRNA-seq) technology can generate genome-wide expression data at the single-cell level from hundreds to thousands of cells. One important objective in scRNA-seq analysis is to cluster cells based on gene expression patterns so that each cluster consists of cells belonging to the same cell type. In this presentation, we will discuss our recently developed methods for analyzing multi-omics cancer and single-cell RNA data sets. The improved performance of these methods will be demonstrated on various simulated data sets as well as real TCGA and scRNA-seq data sets. This is joint work with Seyoung Park and Hao Xu.