Yihui Zhou
Bio
Yi-Hui Zhou was appointed in July 2018 as a Chancellor’s Faculty Excellence Program connective cluster member in High-Dimensional Integration of Biological Systems. Zhou is an associate professor in the Department of Biological Sciences and part of the Bioinformatics Research Center. Her research interests include statistical genetics, generalized linear models, and next generation and third generation sequencing models. A major theme in her work is to develop an improved understanding of how classical statistical techniques are affected by the high-dimensional nature of modern genomics technologies, and to develop improved techniques for these so-called “high-dimensional, low sample size” data. Zhou has received funding for her work from the National Human Genome Research Institute and the Cystic Fibrosis Foundation, and has been co-investigator on numerous grants from the National Institutes of Health and the U.S. Environmental Protection Agency. She has mentored graduate and undergraduate students with academic backgrounds ranging from bioinformatics to engineering.
Zhou obtained her master’s degree in applied mathematics from the University of Maryland and her Ph.D. in biostatistics from the University of North Carolina at Chapel Hill. As associate director of outreach for the Bioinformatics Research Center, she provides a link to other local and national researchers in biological sciences and bioinformatics.
Publications
- Characterizing PFAS hazards and risks: a human population-based in vitro cardiotoxicity assessment strategy, HUMAN GENOMICS (2024)
- Control of false discoveries in grouped hypothesis testing for eQTL data, BMC BIOINFORMATICS (2024)
- Control of false discoveries in grouped hypothesis testing for eQTL data, Springer Nature (2024)
- Genetic variation in severe cystic fibrosis liver disease is associated with novel mechanisms for disease pathogenesis, HEPATOLOGY (2024)
- Hazard and risk characterization of 56 structurally diverse PFAS using a targeted battery of broad coverage assays using six human cell types, TOXICOLOGY (2024)
- Liver eQTL meta-analysis illuminates potential molecular mechanisms of cardiometabolic traits, AMERICAN JOURNAL OF HUMAN GENETICS (2024)
- Predicting Microbiome Growth Dynamics under Environmental Perturbations, Applied Microbiology (2024)
- AI in healthcare: navigating opportunities and challenges in digital communication, FRONTIERS IN DIGITAL HEALTH (2023)
- Association of Pseudomonas aeruginosa infection stage with lung function trajectory in children with cystic fibrosis, JOURNAL OF CYSTIC FIBROSIS (2023)
- Genetic Modifiers of Cystic Fibrosis Lung Disease Severity, AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE (2023)
Grants
Evaluation of the composition and hazards of chemical mixtures, or of complex products classified as UVCBs (substances of unknown or variable composition, complex reaction products, or biological materials), presents a multitude of challenges. These include the presence of unknown constituents, a limited basis for grouping additive and independent components, and the lack of toxicity data on most constituents and on whole mixtures or UVCBs. Our long-term goal is to enable timely risk-based assessment of mixtures and UVCBs that protects human health from the toxicity of known and unknown components. We will accomplish this through integration of novel toxicological (i.e., human cell-based assays), analytical (i.e., ion mobility spectrometry-mass spectrometry), and modeling (i.e., interaction, mediation and dose reconstruction) methods. We plan to demonstrate the application of these methods in the context of rapid risk assessment/management of sites contaminated with uncharacterized chemical mixtures. Aim 1 will determine grouping of chemical mixture components for assessment of hazard(s) through integration of multi-phenotype/multi-tissue bioactivity data from a compendium of human induced pluripotent stem cell (iPSC)-derived cells. Aim 2 will develop approaches for prioritization of components in whole mixtures or UVCBs that are likely to contribute most to joint toxicity through mediation and interaction methods that integrate multi-dimensional analytical and in vitro bioactivity data. Aim 3 will evaluate prediction of joint toxicity of whole or defined mixtures and UVCBs through novel probabilistic additivity models of grouped (Aim 1) and prioritized (Aim 2) components. Aim 4 will demonstrate the integration of the proposed exposure, toxicological and modeling methods into a tiered hybrid experimental-computational strategy for rapid risk assessment of complex environmental mixtures and UVCBs.
The main outcomes of this project will be a suite of analytical, in vitro, and computational methods and tools that can be applied in a tiered strategy for rapid quantitative characterization of the composition and hazards of complex environmental mixtures and UVCBs.
PreMiEr’s microbiome engineering framework will enable the development of a wide range of transformative technologies that solve societal challenges at the interface of health and the environment. However, the dissemination of these same technologies is not without risk, as it relies on the responsible development and societal acceptance of microbiome engineering approaches. Thus, in this research core, we will consider the ethical, societal, and policy implications of PreMiEr’s evolving microbiome engineering discoveries. There have been national calls for cross-disciplinary and integrated work to better understand the social implications of microbiome science and engineering [1]. In parallel, there is increasing awareness that challenges at the nexus of human- and natural-world coupled systems cannot be solved by technology alone. Through the research and deliberative engagement approaches described below, PreMiEr research will embrace the concept of responsible research and innovation and its elements of anticipation, deliberation, reflexivity, and responsiveness [6]. Particular areas of inquiry will be social equity of microbiome engineering, ownership and privacy of microbiome data and information, and ethical implications including informed consent, consumer and patient autonomy, beneficence, non-maleficence, and procedural justice. Core B will also work with the natural scientists and engineers in other thrusts and cores to help identify and address policy and societal questions associated with risk governance and analysis, oversight of microbiome engineering, and equitable distribution of risks and benefits.
The mission of the Center for Human Health and the Environment (CHHE) is to advance understanding of environmental impacts on human health. Through a systems biology framework integrating all levels of biological organization, CHHE aims to elucidate the fundamental mechanisms through which environmental exposures/stressors interface with biomolecules, pathways, the genome, and epigenome to influence human disease. CHHE will develop three interdisciplinary research teams that represent NC State’s distinctive strengths. CHHE will implement specific mechanisms to promote intra- and inter-team interactions and build interdisciplinary bridges to advance basic science discovery and translational research in environmental health science along the continuum from genes to population. These teams are:
- The Molecular/Cellular-Based Systems and Model Organisms Team will utilize cutting-edge molecular/cellular-based systems and powerful vertebrate and invertebrate model organisms to define mechanisms, pathways, GxE interactions, and individual susceptibility to environmental agents.
- The Human Population Science Team will integrate expertise on environmental exposures, epidemiology, genomics and epigenomics to identify key human pathways and link exposure and disease across populations.
- The Bioinformatics Team will develop novel analytics and computational tools to translate Big Data generated across high-throughput and multiscale experiments into systems-level discoveries.
To further increase the impact and translational capacity of these teams, CHHE will develop three new facility cores that will provide instrumentation, expertise, and training to facilitate basic mechanism- to population-based research:
- The Integrative Health Sciences Facility Core will expand the ability of CHHE members to translate basic science discoveries across species and provide mechanistic insights into epidemiological studies by partnering with: a) NC State’s Comparative Toxicogenomics Database (CTD); b) East Carolina University Brody School of Medicine; and c) NC Dept. of Health and Human Services.
- The Comparative Pathobiology Core will be located at NC State’s top-ranked College of Veterinary Medicine, with its nationally recognized veterinary pathology group, to facilitate assessment of the effects of environmental stressors in the many model organisms utilized by CHHE members.
- The Systems Technologies Core will introduce state-of-the-art proteomics capabilities and dedicated bioinformatics support to expand the ability of CHHE members to analyze next-generation sequencing data involving the genome, transcriptome and epigenome.
As a land-grant university, NC State has an extensive and active Cooperative Extension Service network throughout North Carolina. CHHE will utilize this unique network to develop a highly effective, multi-directional Community Outreach and Engagement Core to disseminate findings that will contribute to addressing disparity in exposures and health outcomes and to educate communities about environmental influences on health. A strong Career Development Core for early-stage scientists that is coordinated with a robust Pilot Project Program will support cutting-edge, collaborative and multidisciplinary environmental health projects to enhance the research success and impact of our membership. Through these activities and the purposeful interfacing of different disciplines, CHHE will build on NC State’s unique research and community outreach strengths to become a premier transformative and synergistic EHS Core Center.
There is variability in the severity of clinical disease in cystic fibrosis (CF), which reflects non-CFTR genetic variants, i.e., “genetic modifiers”, and environmental influences. The Gene Modifier Study (GMS, UNC), the Twin and Sibling Study (TSS, JHU), and the EPIC Observational Cohort Study (EPIC, Univ. Washington) have assembled the world's largest cohort of CF patients with comprehensive clinical data and DNA samples. These resources have enabled genome-wide association studies (GWAS) which discovered common genetic variants affecting CF lung disease, CF-related diabetes (CFRD), Pseudomonas infection, meconium ileus, and CF-related liver disease (CFLD). Sequencing the entire genome presents the opportunity to discover novel modifier variants, including rare (possibly high effect) variants, thereby providing new therapeutic targets for all individuals with CF. CFTR modulator treatments have become available for most people with CF, but not all genotypes of CFTR and not all complications are fully treated. To discover CF gene modifiers, we will carry out whole genome sequencing (WGS) in 5,200 individuals with CF from the GMS, TSS, and EPIC cohorts. To date, phenotype harmonization demonstrates that the three independent studies have similarities and distinctive, complementary features. Over the past 11 months, these collaborative investigators have made remarkable progress, and initial results have been reported in 9 abstracts at the NACF Conference. To continue this research, we will pursue 3 Specific Aims with focused effort at UNC on CF lung disease and liver disease, including 1) Create, filter, and annotate high-quality sequence data for the consortium samples; 2) Discover rare variants associated with key CF phenotypes; and 3) Validate and extend GWAS analyses for common CF genetic modifiers. We anticipate a continuing need for novel CF treatments, which may result from discovery of non-CFTR genetic modifiers.
The central hypotheses of this proposal are that: (i) stem cell-derived cardiomyocyte cultures constitute an effective organotypic culture model for predictive toxicity screening of environmental chemicals; (ii) a population-based experimental design utilizing a panel of human iPSCs and mouse Collaborative Cross (CC) can assess variation in toxicity to better characterize uncertainties; and (iii) integration of dosimetry with screening provides an in vivo context to in vitro data and improves human health assessments. Project 1 will conduct population-based concentration-response high-content/-throughput in vitro screening of up to 200 ToxCast chemicals in iPSC-derived cardiomyocytes from 100 humans and collect pharmacokinetic data using hepatocytes. Project 2 will conduct mouse population-based in vitro screening of these chemicals in CC ESC-derived cardiomyocytes followed by in vivo validation in the CC strains. Project 3 will conduct dose-response modeling to establish appropriate point of departure, genome-wide association analyses and in vitro-to-in vivo extrapolation modeling.
Ion mobility spectrometry coupled with mass spectrometry (IMS-MS) provides a multidimensional analytical platform capable of simultaneously measuring both the size and mass of a variety of molecules, including xenobiotics and endogenous metabolites in complex environmental and biological samples. Recently, we utilized this technique to (i) discover novel associations of IMS-MS features with health-relevant outcomes in an iPSC cardiomyocyte model system and (ii) predict health-relevant outcomes that are associated with the IMS-MS features, even without their full identification. These efforts have been successful, and our preliminary data indicate that further gains can be made through improved feature detection and alignment. In addition, we have identified numerous statistical improvements that could be made, including “fingerprinting” complex samples using global properties that do not require feature alignment, refined testing of groups of features, machine learning improvements for accurate prediction of sample attributes, and extension to multi-omics profiling. Inspired by these developments, we propose a comprehensive framework for the statistical analysis of IMS-MS data, anchored by the investigators’ strong experience in IMS-MS technology, as well as data processing and machine learning. Existing preliminary data will be used to develop many of the methods. We will more deeply analyze a combination of datasets, including linking IMS-MS to baseline transcriptomic profiling in iPSC cardiomyocytes. In addition, we will analyze a novel dataset including transcriptomic and IMS-MS lipidomic profiles of brains from newborn rats exposed to the flame-retardant mixture Firemaster 550 (FM 550) or its component classes, enabling a multi-omic assessment of exposure effects.
Finally, this statistical development will be greatly enhanced by obtaining new lipidomic data covering a mix of biological and technical replicates, which will provide a standard for improvements in IMS-MS feature alignment. The methods and results will be used to prepare for competitive R01 funding by the NCSU investigators and will be immediately useful to CHHE investigators by creating methods and code to be used in collaboration with the CHHE Systems Technologies Core.
This subcontract supports NC State's contribution to the UNC-NCSU site effort for whole genome sequencing and analysis for the combined patient cohorts of EPIC and the Cystic Fibrosis Gene Modifier Consortium. NC State investigators will perform association analyses of individual and joint phenotypes, using single SNP analyses along with gene-based and pathway-based analyses. In addition, public biological databases will be used to inform and enhance the discovery of CF-disease related associations.
This joint proposal from RTI and NCSU seeks to create a multi-faceted three-year Program in Genetic Discovery and Prediction (PGDP), initially organized around a demonstration and feasibility pilot for a highly ambitious effort the team calls the “1000 GWAS Project.” The Project will compile an unprecedented number of publicly available genome-wide association studies (GWAS, representing hundreds of thousands of patients). These studies have been used to identify genetic variants that predispose humans to disease and can be used to predict patient outcomes. The Project will re-analyze the combined data using the latest methods for genetic analysis and quality control, combined with new linkages to standard measures for phenotypes, as well as data on clinical covariates and exposures. In addition, the team will make progress on a GWAS Connector tool to support exploration and prioritization of dbGaP phenotypes for enriched secondary analysis. Finally, the Project will feed back into public repositories, providing an open-source analysis pipeline and community resource for ongoing research. The unprecedented data compilation and comprehensive analysis will reveal subtle and more complex interactions between genes, environmental exposures and resulting disease and treatment outcomes.
The objective of this research proposal is to develop an entirely new approach to the analysis and summary of genome association data. In contrast to approaches that use asymptotic parametric results or computationally intensive resampling, our approach uses exact permutation moments followed by a density approximation to the relevant statistics. The new approach will be far faster and provide more accurate p-values than current methods. We will develop these procedures into a new software package, PANGEA. PANGEA will be especially useful for next-generation sequence data, and generally for even bigger-data future applications in genomics. The proposal is divided into three Aims: (i) To develop powerful and accurate testing procedures for genetic association studies of SNPs/variants, applicable both to SNP array and NGS platforms and with flexible handling of families and effective covariate control; (ii) To develop fast and accurate empirical pathway analysis approaches for genetic association; (iii) To provide efficient and user-friendly software, further informed by comprehensive eQTL and ENCODE genomic annotation.
Groups
- Biomedicine
- Computational Genomics and Bioinformatics
- GGA Faculty: Department of Biological Sciences
- Environment
- Genetics and Genomics Pedagogy
- Genome Engineering and Synthetic Biology
- GGA Faculty
- Genetics and Genomics Pedagogy: Graduate
- Biomedicine: Humans
- Genome Engineering and Synthetic Biology: Microbes
- Environment: Natural environments