Steffen Heber
Bio
Steffen Heber is a full professor in the Department of Computer Science. He holds a joint appointment between the Department of Computer Science and the College of Sciences to support the Bioinformatics Research Center (BRC) at NCSU. Dr. Heber studied mathematics and biology at the University of Heidelberg in Germany. He received his PhD while working at the German Cancer Research Center (DKFZ) under the supervision of Martin Vingron and Joerg Hoheisel. After postdoctoral research with Pavel Pevzner at UCSD, he joined NCSU in 2003. His research focuses on bioinformatics and computational biology. Dr. Heber develops algorithms for the analysis, summarization, and quality control of complex biological data. His research interests include data-driven storytelling, gene transcription and alternative splicing, translation, nature-inspired computation, and common intervals.
Research Areas
- Advanced Learning Technologies
- Algorithms and Theory of Computation
- Artificial Intelligence and Intelligent Agents
- Data Sciences and Analytics
- Graphics, Human Computer Interaction, & User Experience
- Scientific and High Performance Computing
Education
- 2001 Ph.D. in Mathematics, title: Algorithms for Physical Mapping.
- 1998 Staatsexamen in mathematics and biology.
- 1995 Diploma in mathematics, title: Additive Periodizitaet bei Nim-Spielen (Additive Periodicity in Nim Games).
Awards
- Carol Miller Graduate Lecturer Award – 2018 and 2013
- Thank a Teacher Recognition Letter – Fall 2016 (2 nominations), Spring 2015 (3 nominations), Fall 2014, Fall 2013, Spring 2013
- IBM Faculty Award – 2008
- One of the “RASS” Top Ten Papers Advancing the Science of Risk Assessment – 2007
- Best Paper Award, ACMSE – 2007
- Faculty Research and Professional Development Award, NC State University – 2006
- Travel Award, ISCB – 2002
- Best Poster Award, DKFZ Poster Presentation – 2000
- Diploma passed with distinction – 1995
- Baccalaureate exam passed with distinction – 1987
- 1st Place, Baden-Wuerttemberg Mathematics Contest – 1986
Publications
- PeakPass: Automating ChIP-Seq Blacklist Creation, Journal of Computational Biology (2019)
- PeakPass: Automating ChIP-Seq Blacklist Creation, Bioinformatics Research and Applications, ISBRA 2019 (2019)
- RiboStreamR: a web application for quality control, analysis, and visualization of Ribo-seq data, BMC Genomics (2019)
- Disruption of Trim9 function abrogates macrophage motility in vivo, Journal of Leukocyte Biology (2017)
- riboStreamR: A web application for quality control, analysis, and visualization of Ribo-seq data, International Conference on Computational Advances in Bio and Medical Sciences (2017)
- Genome-wide search for translated upstream open reading frames in Arabidopsis thaliana, IEEE Transactions on Nanobioscience (2016)
- Transcriptomic Signature of the SHATTERPROOF2 Expression Domain Reveals the Meristematic Nature of Arabidopsis Gynoecial Medial Domain, Plant Physiology (2016)
- A Stacking-Based Approach to Identify Translated Upstream Open Reading Frames in Arabidopsis Thaliana, Bioinformatics Research and Applications (2015)
- A stacking-based approach to identify translated upstream open reading frames in Arabidopsis thaliana, Bioinformatics Research and Applications, ISBRA 2015 (2015)
- Gene-Specific Translation Regulation Mediated by the Hormone-Signaling Molecule EIN2, Cell (2015)
Grants
Overview. Changes in gene expression are at the core of many biological processes, from the development of a multicellular organism from a fertilized egg to surviving pathogen attacks or coping with environmental pressures. Although transcriptional regulation plays a critical role in modulating gene expression, a growing body of evidence indicates that gene-specific regulation at the translational level is also critical for many of these biological processes. Unfortunately, the existing technologies to quantify changes in translation, at both the genome-wide and single-gene levels, are technically demanding and costly, which hinders widespread investigation of this type of regulation. Technologies that make the quantification of translation efficiency routine have the potential to transform the field of gene regulation, allowing the discovery of many more processes and genes regulated at the translational level. This, in turn, will open new opportunities to manipulate gene expression for both basic and applied purposes. Currently, the most widely used approach to determine the translation level of a gene is ribosome profiling (also known as Ribo-seq), an expensive and technically demanding method that quantifies the level of each transcript and the corresponding number of associated ribosomes. We argue that this information could also be obtained by a much simpler process: determining the position of just the first or last ribosome (the most 5′ or most 3′) on each transcript, and then comparing the distributions of these first or last positions between two experimental conditions. Although there is no conceptual reason to expect this Ribosome Position Inference (RiboPI) approach to fail, critical technical unknowns make this a high-risk, high-reward proposal.
Intellectual merit. The main objective of this proposal is to develop an efficient, simple, and scalable RiboPI technology to quantify translation rates at both the genome-wide and single-gene levels. If successful, RiboPI will make translation regulation information as accessible as RNA-seq made transcriptomic information, reducing cost and time requirements, the complexity of the experimental procedures, and the amount of biological material needed. Not only would this make translation analysis a routine technique in many labs, but it could also bypass some limitations of current technologies, such as the difficulty of mapping the very short ribosome footprints to specific splice variants, alleles, or even homeologs in polyploid species, and it could enable targeted studies of a group of genes. To achieve this goal, we propose to develop RiboPI, which combines an experimentally simple approach to capture the position of the first or last ribosome on each transcript with computational methods that compare the distributions of these ribosome positions between experimental conditions to estimate translation levels. The proposed experimental pipeline involves testing novel combinations of in vivo and in vitro molecular biology procedures to map the first/last ribosome on a transcript efficiently and specifically. Among the unknowns that make this proposal high-risk are (1) whether suitable experimental conditions can be found (e.g., conditions that preserve ribosome binding and promote reverse transcriptase activity but melt mRNA secondary structure) and (2) whether translation efficiency can be inferred from the distributions of first/last ribosomes on transcripts. By comparing results obtained with classical Ribo-seq to those obtained with RiboPI, we will be able to determine the reliability of the new approach.
Broader impacts. In addition to the clear benefits of developing a new experimental approach to quantify gene-specific translation efficiency and popularizing this type of analysis, this project will provide an ideal training platform for undergraduate students to experience first-hand the translation of basic biological knowledge into potentially transformative new technologies.
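The abstract leaves open how translation levels would be inferred from first/last-ribosome positions. As a purely illustrative sketch, assume that heavier ribosome loading pulls the 5′-most ribosome closer to the start codon. Under that assumption, samples of first-ribosome positions for one transcript under two conditions could be compared with a two-sample Kolmogorov-Smirnov statistic and a mean offset from the start codon. The function names and toy positions below are hypothetical and are not part of RiboPI.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function that evaluates the empirical CDF of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic, max_x |F_a(x) - F_b(x)|."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are step functions that only jump at observed values,
    # so the maximum gap is attained at one of those values.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def mean_offset(positions, cds_start):
    """Mean distance (nt) of the 5'-most ribosome from the start codon."""
    return sum(p - cds_start for p in positions) / len(positions)


# Hypothetical 5'-most ribosome positions (nt) for one transcript whose
# start codon is at position 0, sampled under two conditions.
control = [10, 40, 55, 80, 120]
treated = [3, 6, 12, 15, 30]

print(round(ks_statistic(control, treated), 3))  # 0.8
print(mean_offset(control, 0), mean_offset(treated, 0))  # 61.0 13.2
```

Under the stated assumption, the shift of the treated positions toward the start codon would be read as heavier ribosome loading. A real analysis would still need an explicit model linking position distributions to translation rates, plus a significance assessment across many transcripts.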
Title: Transcriptional and translational regulatory networks of hormone signal integration in tomato and Arabidopsis. PI: Jose M. Alonso (Plant Biology, NCSU), Co-PIs: Anna Stepanova (Plant Biology, NCSU), Steffen Heber (Computer Science, NCSU), Cranos Williams (Electrical Engineering, NCSU). Overview: Plants, as sessile organisms, need to constantly adjust their intrinsic growth and developmental programs to environmental conditions. These environmentally triggered “adjustments” often involve changes in the developmentally predefined patterns of one or more hormone activities. In turn, these hormonal changes result in alterations in gene expression and concurrent alterations of cellular activities. In general, these hormone-mediated regulatory functions are achieved, at least in part, by modulating the transcriptional activity of hundreds of genes. The study of these transcriptional regulatory networks provides not only a conceptual framework to understand the fundamental biology behind these hormone-mediated processes, but also the molecular tools needed to accelerate the progress of modern agriculture. Although often overlooked, an understanding of the translational regulatory networks behind complex biological processes has the potential to enable similar advances in both basic and applied plant biology. By taking advantage of the recently developed ribosome footprinting technology, genome-wide changes in translation activity in response to ethylene were quantified at codon resolution, and new translational regulatory elements were identified in Arabidopsis. Importantly, detailed characterization of one of the identified regulatory elements indicates that this regulation is NOT miRNA-dependent, and that the element is also responsive to the plant hormone auxin, suggesting a role in the interaction between these two plant hormones.
These findings not only confirm the basic biological importance of translational regulation and its potential as a signal integration mechanism, but also open new avenues for identifying, characterizing, and utilizing additional regulatory modules in plant species of economic importance. Toward that general goal, a plant-optimized ribosome footprinting methodology will be deployed to examine the translation landscape of two plant species, tomato and Arabidopsis, in response to two plant hormones, ethylene and auxin. A time-course experiment will be performed to maximize the detection sensitivity (strong vs. weak) and diversity (early vs. late activation) of additional translational regulatory elements. The large amount and dynamic nature of the generated data will also be used to build hierarchical transcriptional and translational interaction networks between these two hormones and to explore how these diverse types of information can be used to identify key regulatory nodes. Finally, the comparison between two plant species will provide critical information on the conservation of the identified regulatory elements and, thus, inform research on future practical applications. Intellectual merit: The identification and characterization of signal integration hubs and cis-regulatory elements of translation will not only allow us to better understand how information from different origins (environment and developmental programs) is integrated, but also help devise new strategies to control this flow for the advancement of agriculture. Broader Impacts: We will create a new outreach program to promote interest among middle and high school students in combining biology, computers, and engineering. To implement this program, we will use our current NSF-supported Plants4kids platform (ref), with web-based bilingual outreach tools and monthly demos at the science museum and local schools.
Examples of demonstration modules will include comparisons between simple electronic and genetic circuits.
The coordination of spatial patterning cues and cellular proliferation underlies diverse processes from cancerous growth to reproductive development. A long-term objective of my research program is to understand how proliferative cues are coordinated with spatial information during organogenesis. In Arabidopsis thaliana, this coordination of patterning and proliferation is necessary within the carpel margin meristem (CMM) to generate ovules that, when fertilized, will become seeds. The CMM is a vital meristematic structure with a unique pattern of organ initiation and novel mechanisms of meristematic development that are not yet well characterized. In the previous funding period, we demonstrated that the SEUSS (SEU) and AINTEGUMENTA (ANT) transcription factors regulate patterning events within the gynoecium that are critical for carpel margin meristem and ovule development. Our genetic analysis demonstrates that SEU and ANT share a partially redundant and overlapping function essential for proper CMM development. As SEU and ANT do not share sequence similarity, the molecular basis for this redundancy is not understood. We propose that the SEU and ANT activities synergistically converge at key transcriptional nodes. A node in this sense is a gene or a set of related genes that requires the combined activities of SEU and ANT for its proper expression. Our recently published transcriptomic analysis identified many putative nodes encoding known transcriptional regulators. By studying these candidate nodes, we hope to better understand the transcriptional hierarchies that control CMM development and uncover the mechanistic basis of the synergistic action of SEU and ANT. Our transcriptomic study cannot determine whether these nodes are directly or indirectly regulated by SEU or ANT activity.
However, even if these node genes are indirectly controlled by SEU and ANT activity, their expression within the developing CMM suggests they may still play a critical functional role during CMM development. Furthermore, having now identified a set of genes enriched for CMM expression, we are in a position to study the cis-regulatory elements that support gene expression within the CMM and the medial gynoecial domain. Here we propose to: 1) identify direct targets of SEU regulation within the CMM to further refine the transcriptional hierarchy required for CMM development; 2) develop a protoplast sorting protocol for gynoecial cells to enable a systems-biological analysis of developmental events within the CMM; 3) determine the functional role of one of our high-priority candidate genes during CMM development; and 4) identify evolutionarily conserved cis-regulatory elements that support the expression of our candidate genes in the CMM. Intellectual Merit: Understanding the coordination of cellular proliferation and spatial patterning during organogenesis is of broad interest to scientists working in a diversity of fields. Completion of these specific aims will move us toward this long-term goal by illuminating the mechanistic basis for the overlapping functions of SEU and ANT during carpel margin meristem and ovule development. Additionally, we expect that by elucidating the molecular mechanisms of the synergistic action of SEU and ANT upon key transcriptional nodes, we will gain a greater understanding of the molecular underpinnings of non-additivity within transcriptional networks and the complexity of developmental programs. Past NSF funding for this project (IOS-0821896) has resulted in the publication of five articles in well-respected journals (two in Plant Physiology, and one each in Developmental Biology, PLoS One, and BMC Plant Biology).
Broader impacts: I ensure a broad societal impact from my program by integrating my research efforts with my teaching and training responsibilities and by widely disseminating materials and results. Furthermore, I organize an outreach group that presents hands-on science demonstrations at local North Carolina middle schools. Additionally, our work on the mechanisms of CMM development may lea
The main objective of this proposal is to develop the analytical tools to study an essential, yet still poorly understood, step in gene expression: mRNA translation. While other critical aspects of gene activity, such as changes in the level of an mRNA or a small regulatory RNA, can be easily examined and precisely quantified, only recently has it become technically possible to obtain the same type and quality of information on changes in translation. These technical developments have opened a new window of opportunity to advance our understanding of key biological processes. Thus, implementing this technology will give a competitive edge to a broad community of researchers at NC State University interested in the regulation of gene expression. The complementary research interests of the PIs, their proven experience with these types of technologies, and the qualities of the chosen experimental system provide strong assurance of the success of this proposal.
Alternative splicing (AS) is an important mechanism of gene regulation that contributes to transcriptome and proteome diversity. The role of AS in specific biological processes remains largely unexplored. Our goal is to measure the extent and functional significance of AS in Arabidopsis thaliana defense against the bacterial pathogen Pseudomonas syringae pv. tomato (Pst). Specific objectives are to 1) discover Pst-induced AS via long-read mRNA-Seq, 2) characterize genome-wide AS and gene expression patterns associated with Arabidopsis-Pst interactions via short-read mRNA-Seq, and 3) determine the impact of AS on protein structure, validate selected AS isoforms, and perform functional analysis of selected AS genes.
Due to its interdisciplinary nature and rapid pace of development, Bioinformatics is a challenging subject for students and teachers. Despite many excellent textbooks and tutorials, hardly any supplementary educational tools, such as visualizations, animations, or simulation games, are available. We will address this lack of resources by developing a library of animations for Bioinformatics algorithms and applications, organizing a symposium on Bioinformatics Education with a focus on educational tools, and developing an online Bioinformatics education resource portal.
The Bioinformatics Research Center (BRC) at NC State University is one of the world's premier centers for education and research in bioinformatics. Established by the Board of Governors of the University of North Carolina System in 2000, the BRC is located on NC State University's Centennial Campus in Raleigh. BRC research focuses on the development of new computational and statistical tools for the analysis and interpretation of genomic data. More than 40 faculty in the mathematical, biological, and computer sciences are affiliated with the bioinformatics program. The BRC is also dedicated to providing outstanding educational and training opportunities for graduate students and genomic scientists. The BRC offers graduate programs in bioinformatics and statistical genetics, in which more than 80 students are currently enrolled. A major impediment for the BRC is the lack of computational equipment. Microarray and protein mass spectrometry data are growing steadily, and computationally intensive techniques like Monte Carlo simulations and whole-genome analyses require ever more computing power. Although the university [http://hpc.ncsu.edu] and the region [http://www.ncbiogrid.org/] have considerable high-performance computing resources, these resources are often geared to purely academic use and are difficult or impossible to customize for the diverse requirements of our bioinformatics applications. This conflicts with the increasing and continuously changing computational needs that result from the BRC's interdisciplinary nature and its various industry connections. The BRC's own computing resources are outdated and insufficient for the large number of affiliated researchers. The need for computation by individual researchers and research groups is highly variable over time. At times, a researcher may need to run many demanding jobs simultaneously; at other times, the same researcher might have little need for computation.
This variability in computing needs suggests that the most sensible strategy for BRC-affiliated researchers is to share computing resources. Computing is central to bioinformatics, and it is essential that the BRC have access to adequate computing resources. In short, additional computer resources are an absolute necessity for the BRC to remain competitive as a statewide biotechnology center. This proposal requests funds for a Linux cluster of 54 dual-Xeon compute nodes to enhance the computational resources of the BRC, and to enter into an HPC partnership with the NC State Information Technology Division (ITD). The BRC purchases HPC hardware (compute blades and/or storage) and any specialized software licenses. NC State ITD provides space, system administration and support, and an option to combine the purchased computing power with that available through the general HPC program. In return for the services provided by ITD, the servers are available to the general NC State HPC cluster user community whenever the BRC is not using them. This partnership will leverage the proposed cluster investment in two ways. First, while remaining in complete control of the purchased computers, BRC faculty will gain access to the general HPC resources, currently more than 400 blade processors. Second, with a shared computing resource housed and administered by ITD, BRC faculty will not have to spend space and funds maintaining their own systems. This is especially important since many of the BRC-affiliated faculty are early-career investigators with small to medium-sized groups. If necessary, the proposed cluster could easily be expanded in the future by purchasing new nodes.
The proposed Linux cluster, in conjunction with ITD's Faculty HPC Partnership Program, will considerably advance our research and increase our competitiveness in obtaining federal funding and attracting new faculty, and it will facilitate collaborative efforts with other Triangle universities and companies by leveraging grid technology. It will strengthen the po
Alternative splicing (AS) is a major contributor to the complexity of the proteome: some genes can produce hundreds of different transcripts. AS plays an important role in development, physiology, and disease, so understanding it is of great importance. In previous work, the PI and colleagues developed the splicing graph, a novel data structure that integrates all transcripts derived from a gene and reliably recovers all sampled splice variants. The PI plans to continue this work by developing a database of alternatively spliced proteins and by focusing on long-term research questions such as: What characterizes genes with an exceptionally high abundance of alternative splicing? How is alternative splicing controlled? How does alternative splicing influence the proteome? The proposed database of alternatively spliced proteins will greatly expand the available protein isoform data and act as a platform for systematic research on the extent and regulation of alternative splicing at the protein level. The database will be used to compare and analyze the extent of alternative splicing at both the protein and the mRNA level.
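The key property of the splicing graph is that it merges all transcripts of a gene into one structure from which every sampled variant can be recovered. The sketch below illustrates that idea under a simplifying assumption. It treats each exon as a single node with a hypothetical ID, whereas the published data structure works from aligned transcript sequences and need not use whole exons as nodes. The function names are illustrative only.

```python
from collections import defaultdict


def build_splicing_graph(transcripts):
    """Collapse transcripts, given as ordered exon-ID lists, into one graph.

    Nodes are exons; an edge u -> v records that exon v directly follows
    exon u in at least one observed transcript.
    """
    edges = defaultdict(set)
    starts, ends = set(), set()
    for exons in transcripts:
        starts.add(exons[0])
        ends.add(exons[-1])
        for u, v in zip(exons, exons[1:]):
            edges[u].add(v)
    return edges, starts, ends


def enumerate_paths(edges, starts, ends):
    """Return every start-to-end path (candidate splice variant) via DFS.

    Assumes the graph is acyclic, as it is when exons are ordered along
    the genome.
    """
    paths = []

    def dfs(node, path):
        if node in ends:
            paths.append(tuple(path))
        for nxt in sorted(edges.get(node, ())):
            dfs(nxt, path + [nxt])

    for s in sorted(starts):
        dfs(s, [s])
    return paths


# Hypothetical gene with two observed transcripts.
observed = [["E1", "E2", "E4", "E5"], ["E1", "E3", "E4", "E6"]]
graph = build_splicing_graph(observed)
for p in enumerate_paths(*graph):
    print(p)
```

The sketch prints four paths. The first and fourth are the observed transcripts. The other two combine alternative exons that were never seen together, so paths through such a graph are best treated as candidate variants rather than confirmed isoforms.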