Xinge Jeng

Associate Professor

SAS Hall 5264

Bio

Ph.D., Statistics, Purdue University, 2009

Area(s) of Expertise

High-Dimensional Inference
Multiple Testing, Model Selection
Bioinformatics

Publications

Discovering Candidate Genes Regulated by GWAS Signals in Cis and Trans, Statistics in Biosciences (2025)
Spatially adaptive variable screening in presurgical functional magnetic resonance imaging data analysis, Biometrics (2024)
Weak signal inclusion under dependence and applications in genome-wide association study, The Annals of Applied Statistics (2024)
Estimating the proportion of signal variables under arbitrary covariance dependence, Electronic Journal of Statistics (2023)
Transfer learning with false negative control improves polygenic risk prediction, PLOS Genetics (2023)
Effective SNP ranking improves the performance of eQTL mapping, Genetic Epidemiology (2020)
FastLORS: Joint modelling for expression quantitative trait loci mapping in R, Stat (2020)
Model Selection With Mixed Variables on the Lasso Path, Sankhya B (2020)
Variable selection via adaptive false negative control in linear regression, Electronic Journal of Statistics (2019)
Efficient Signal Inclusion With Genomic Applications, Journal of the American Statistical Association (2018)

Grants

Date: 08/01/18 - 7/31/22

Amount: $100,000.00

Funding Agencies: National Science Foundation (NSF)

This proposal is an integrated research, education, and outreach program focused on statistical inference in high-dimensional regression and its applications in biomedical research. It has three aims.

Aim 1: To develop a new analytic framework for inference-based variable selection when the number of unknown parameters can greatly exceed the sample size. This aim seeks to propose new techniques that integrate developments in penalized regression, large-scale multiple testing, sparse signal estimation, and dimension reduction. A new analytic framework will be constructed to approximate the false discovery proportion, the false negative proportion, and the number of relevant variables. Key issues such as de-biased estimation, phase diagrams in sparse inference, and adaptivity to unspecified dependence and signal sparsity will be addressed.

Aim 2: To provide new inferential tools for precision medicine with a large number of prognostic factors. This study seeks to develop novel regression-based methods and theory that effectively use information on relevant covariates to make personalized treatment decisions. Confidence intervals and p-values for coefficients in the contrast function component of the outcome model will be derived for settings where the baseline function may be misspecified. The analytic framework proposed in Aim 1 will be extended to design new treatment regimes that address the major challenges of downplayed contrast effects and model misspecification.

Aim 3: To implement education and outreach plans that include developing a new course in the graduate curriculum; involving undergraduates and under-represented groups in research; and providing an accessible website resource for the public.

Date: 09/02/16 - 8/31/18

Amount: $151,500.00

Funding Agencies: National Institutes of Health (NIH)

Next-generation sequencing (NGS) data have been generated at an increasing rate over the last few years. Encompassing the full spectrum of genomic variation, they hold the promise of identifying new sources of heritability from rare variants that eluded traditional genome-wide association studies (GWAS). Despite substantial progress in recent years, current methods remain limited in power and robustness for the analysis of NGS data, which are characterized by extremely high dimensionality and low minor allele frequency (MAF). New methods are needed to meet these statistical challenges and realize the full potential of NGS data in identifying genetic variations that contribute to missing disease heritability. The goal of this project is to develop powerful and adaptive statistical methods for the analysis of sequencing studies. Specifically, the project aims to (1) develop adaptive procedures that can efficiently account for the presence of rare variants while significantly reducing the data dimension for follow-up analysis; and (2) develop powerful information pooling procedures that can jointly test the effects of genetic variants within a SNP set, gene, or pathway on a disease or trait. Our procedures are completely data-driven and can automatically adapt to the underlying sparsity and dependence structure of the data. Moreover, the proposed methods are computationally efficient under extremely high dimensionality. These desirable properties make the proposed methods applicable to a myriad of high-dimensional applications. Rigorous theory will be developed to understand the role of sparsity and extremely high dimensionality in NGS data analysis, and comprehensive simulations will be performed to study the proposed methods. In addition, this project will provide computationally efficient programs and evaluate the methods using several recent NGS datasets. The programs will be developed in R and Fortran. Our computational package will be made publicly available so that investigators can apply our procedures widely in sequencing studies.

Date: 08/27/15 - 8/26/17

Amount: $38,402.00

Funding Agencies: National Security Agency (NSA)

This proposal seeks to initiate a new direction in signal detection to facilitate the identification of indistinguishable signals under high dimensionality. Fast-emerging high-throughput technologies are advancing scientific applications into a new era by enabling the detection of information-bearing signals at unprecedented scale. However, analyzing high-dimensional data comes at a price, not only in computational cost but also in the capacity to identify the true signals, which are more easily obscured by the large amount of noise. Contemporary statistical methods often use false positive control as the criterion for selecting true signals, so that signals strong enough to stand outside the range of noise can be identified with high confidence. These methods, however, have limited power for the large proportion of true signals that are indistinguishable from the noise under high dimensionality. This proposal seeks to facilitate the identification of indistinguishable signals through the following specific aims: (1) To provide theoretical insights into the formation of indistinguishable signals under high dimensionality, and to develop optimally adaptive methods for their detection. (2) To develop new methodology and application strategies for efficient data screening and sample size guidance. (3) To develop data-adaptive methods for models with realistic dependence structures and heteroscedastic errors. The proposed research applies to a wide range of fields, such as quality control, telecommunications, psychology, and genomics, where signal detection is a central problem. This study will be particularly valuable in areas where the signal-to-noise ratio is relatively small under high dimensionality; one such example is rare-variant association studies, a new frontier in the genetic analysis of complex traits.
The proposed education and outreach activities will inform a broad audience about the PI's work by training K-12 science educators, teaching graduate and undergraduate students, developing a new graduate course, and involving under-represented groups in research. A website resource will be developed, and software packages will be made freely available to the general public.
