CeFH Biostatistical seminar: presentation of "A simple new approach to variable selection in regression, with application to genetic fine-mapping" by Wang et al.
About the CeFH Biostatistical seminar
The biostatistical seminar takes place on a monthly basis. The focus at these seminars is on methods and their mathematical backgrounds. Applications may also be presented. We invite a broad range of researchers from Norway and abroad to discuss various topics such as causal inference, Bayesian methods, variable selection, and more!
Statisticians and data analysts from the Norwegian Institute of Public Health are invited to every meeting. People from outside the institute are also welcome to join. If you want to present your latest research, please contact William Denault.
William Denault, PhD student at the Centre for Fertility and Health and the Department of Genetics and Bioinformatics at the Norwegian Institute of Public Health, and at the University of Bergen, will present the following paper:
A simple new approach to variable selection in regression, with application to genetic fine-mapping
By Wang et al.
We introduce a simple new approach to variable selection in linear regression, with a particular focus on quantifying uncertainty in which variables should be selected. The approach is based on a new model – the “Sum of Single Effects” (SuSiE) model – which comes from writing the sparse vector of regression coefficients as a sum of “single-effect” vectors, each with one non-zero element. We also introduce a corresponding new fitting procedure – Iterative Bayesian Stepwise Selection (IBSS) – which is a Bayesian analogue of stepwise selection methods. IBSS shares the computational simplicity and speed of traditional stepwise methods, but instead of selecting a single variable at each step, IBSS computes a distribution on variables that captures uncertainty in which variable to select. We provide a formal justification of this intuitive algorithm by showing that it optimizes a variational approximation to the posterior distribution under the SuSiE model. Further, this approximate posterior distribution naturally yields convenient novel summaries of uncertainty in variable selection, providing a Credible Set of variables for each selection. Our methods are particularly well-suited to settings where variables are highly correlated and detectable effects are sparse, both of which are characteristics of genetic fine-mapping applications. We demonstrate through numerical experiments that our methods outperform existing methods for this task, and illustrate their application to fine-mapping genetic variants influencing alternative splicing in human cell-lines. We also discuss the potential and challenges for applying these methods to generic variable selection problems.
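For attendees who would like a concrete picture before the talk, the IBSS idea described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under simplifying assumptions that are not taken from the announcement: the residual variance `sigma2` and the prior effect variance `sigma0_2` are held fixed rather than estimated, the prior over variables is uniform, the data are assumed centered with no constant columns, and a fixed number of sweeps replaces a proper convergence check. The function names and the simulated data are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def single_effect_regression(X, r, sigma2, sigma0_2):
    """Fit one 'single effect' to the residual r.

    Returns (alpha, mu1): alpha[j] is the posterior probability that
    variable j is the one non-zero effect, and mu1[j] is the posterior
    mean of the effect size given that variable j is selected.
    """
    n, p = len(X), len(X[0])
    log_bf, mu1 = [], []
    for j in range(p):
        xtx = sum(X[i][j] ** 2 for i in range(n))
        xtr = sum(X[i][j] * r[i] for i in range(n))
        beta_hat = xtr / xtx              # least-squares estimate for variable j
        s2 = sigma2 / xtx                 # its sampling variance
        shrink = sigma0_2 / (s2 + sigma0_2)
        # Log Bayes factor of "variable j has a N(0, sigma0_2) effect" vs "no effect"
        log_bf.append(0.5 * math.log(s2 / (s2 + sigma0_2))
                      + 0.5 * beta_hat ** 2 / s2 * shrink)
        mu1.append(shrink * beta_hat)
    # Uniform prior over variables: alpha is the normalized Bayes factor
    m = max(log_bf)
    w = [math.exp(v - m) for v in log_bf]
    total = sum(w)
    return [v / total for v in w], mu1


def ibss(X, y, L=2, sigma2=1.0, sigma0_2=1.0, n_iter=20):
    """Iteratively refit each of the L single effects to the residual
    left after subtracting the expected contribution of the others."""
    n, p = len(X), len(X[0])
    alpha = [[1.0 / p] * p for _ in range(L)]
    mu = [[0.0] * p for _ in range(L)]
    for _ in range(n_iter):               # fixed sweeps (assumption: no ELBO check)
        for l in range(L):
            b_other = [sum(alpha[k][j] * mu[k][j] for k in range(L) if k != l)
                       for j in range(p)]
            r = [y[i] - sum(X[i][j] * b_other[j] for j in range(p))
                 for i in range(n)]
            alpha[l], mu[l] = single_effect_regression(X, r, sigma2, sigma0_2)
    return alpha, mu


def credible_set(alpha_l, coverage=0.95):
    """Smallest set of variables whose alpha values sum to at least `coverage`."""
    order = sorted(range(len(alpha_l)), key=lambda j: -alpha_l[j])
    chosen, total = [], 0.0
    for j in order:
        chosen.append(j)
        total += alpha_l[j]
        if total >= coverage:
            break
    return chosen


# Hypothetical toy data: 50 samples, 5 variables, one true effect on variable 2
rng = random.Random(0)
n, p = 50, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * X[i][2] + rng.gauss(0, 0.1) for i in range(n)]

alpha, mu = ibss(X, y, L=2)
print("Credible set for the first effect:", credible_set(alpha[0]))
```

Each row of `alpha` is the "distribution on variables" mentioned in the abstract: instead of committing to one variable per step, the procedure keeps a probability for every candidate, and the credible set is read directly off that row.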