About the CeFH Biostatistical seminar
The biostatistical seminar takes place monthly. The focus of these seminars is on methods and their mathematical background, though applications may also be presented. We invite a broad range of researchers from Norway and abroad to discuss topics such as causal inference, Bayesian methods, variable selection, and more!
Statisticians and data analysts from the Norwegian Institute of Public Health are invited to every meeting. People from outside the institute are also welcome to join. If you want to present your latest research, please contact William Denault.
About the speaker
Tonje Lien is a postdoc at Oslo University Hospital, and her research focuses on penalized regression for combining GWAS and EWAS data.
Using high-dimensional penalized regression, we studied genome-wide DNA-methylation in bone biopsies from 80 postmenopausal women in relation to their bone mineral density (BMD). The women's BMD ranged from severely osteoporotic to normal. Global gene expression data from the same individuals were also available, and since DNA-methylation often affects gene expression, the overall aim of this work was to combine both omics data sets in an integrated analysis.
Classical penalized regression uses a single penalty for all covariates, but we assigned an individual penalty to each DNA-methylation site. These individual penalties were guided by the strength of association between DNA-methylation and gene transcript levels: methylation sites strongly associated with one or more transcripts received lower penalties and were therefore favored over sites showing weaker association with expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylation sites and their corresponding cis genes and the association between DNA-methylation sites and trans-located genes. Two integrative penalized methods were used: first, an adaptive group-regularized ridge regression, and second, a modified version of the weighted lasso for variable selection.
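The individual-penalty idea behind the weighted lasso can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' modified method: deriving each site's penalty factor as the reciprocal of its strongest absolute Pearson correlation with any transcript is a hypothetical choice, and `association_penalty_factors` and `weighted_lasso` are illustrative names. The sketch minimizes (1/(2n))||y - X b||^2 + lam * sum_j w_j |b_j| by coordinate descent, so a site with a smaller factor w_j is shrunk less and enters the model more easily. The adaptive group-regularized ridge is not shown.

```python
import numpy as np


def association_penalty_factors(meth, expr, eps=1e-3):
    """HYPOTHETICAL weighting scheme: penalty factor for each methylation site
    is 1 / (strongest absolute Pearson correlation with any transcript).
    meth: n x p methylation matrix, expr: n x q expression matrix."""
    n = meth.shape[0]
    m = (meth - meth.mean(axis=0)) / meth.std(axis=0)   # standardize sites
    e = (expr - expr.mean(axis=0)) / expr.std(axis=0)   # standardize transcripts
    corr = m.T @ e / n                                  # p x q correlation matrix
    strength = np.abs(corr).max(axis=1)                 # strongest link per site
    w = 1.0 / (strength + eps)                          # strong link -> small penalty
    return w / w.mean()                                 # rescale: average factor is 1


def soft_threshold(z, t):
    """Lasso shrinkage operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso(X, y, lam, w, n_iter=1000, tol=1e-8):
    """Coordinate descent for (1/(2n))||y - X b||^2 + lam * sum_j w_j |b_j|.
    Assumes X columns and y are centred, so no intercept is fitted."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n    # (1/n) * ||X_j||^2 for each column
    resid = y.astype(float).copy()       # residual y - X beta (beta starts at 0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            old = beta[j]
            # Partial-residual correlation for coordinate j
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * w[j]) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)   # keep residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # Simulated toy data (HYPOTHETICAL): 80 women, 200 sites, 50 transcripts.
    rng = np.random.default_rng(0)
    n, p, q = 80, 200, 50
    expr = rng.standard_normal((n, q))
    meth = rng.standard_normal((n, p))
    meth[:, :5] += expr[:, :5]   # first 5 sites track transcripts
    bmd = meth[:, :5] @ np.array([1.0, -0.8, 0.6, 0.5, -0.4]) + rng.standard_normal(n)

    X = meth - meth.mean(axis=0)
    y = bmd - bmd.mean()
    w = association_penalty_factors(meth, expr)
    beta = weighted_lasso(X, y, lam=0.1, w=w)
    print("selected sites:", np.flatnonzero(beta))
```

Setting all factors to 1 recovers the ordinary lasso, which corresponds to the classical single-penalty baseline in the abstract. How strongly the weights should depend on the association (here a plain reciprocal) is a tuning decision that this sketch does not address.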
When information from gene expression was integrated, predictive performance, measured by predictive mean squared error, improved considerably compared with classical penalized regression without data integration: by 14.7% for ridge regression and by 17% for the lasso. Our version of the weighted lasso with data integration identified 22 methylation sites of interest, several of which corresponded to genes known to be important in bone formation. Using BMD as the response and these 22 methylation sites as covariates, least squares regression gave R² = 0.726, compared with an average R² = 0.438 for 10,000 randomly selected groups of 22 DNA-methylation sites.
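The comparison against random groups can be sketched as follows. A random baseline is needed because in-sample R² never decreases when covariates are added: even with pure-noise covariates, the expected R² with k predictors and n observations is about k/(n - 1), roughly 0.28 for k = 22 and n = 80. The function names `ols_r2`, `random_group_r2` and `relative_improvement` are illustrative, and `relative_improvement` assumes the reported percentages are relative reductions in predictive MSE.

```python
import numpy as np


def ols_r2(X, y):
    """In-sample R^2 of least squares regression of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def random_group_r2(X, y, group_size, n_draws, seed=0):
    """Average R^2 over n_draws randomly chosen groups of group_size columns."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    scores = [
        ols_r2(X[:, rng.choice(p, size=group_size, replace=False)], y)
        for _ in range(n_draws)
    ]
    return float(np.mean(scores))


def relative_improvement(mse_baseline, mse_new):
    """Relative reduction in predictive MSE (ASSUMED definition of 'improvement')."""
    return (mse_baseline - mse_new) / mse_baseline
```

With a methylation matrix `X` and BMD vector `y`, one would compare `ols_r2(X[:, selected], y)` for the 22 selected columns against `random_group_r2(X, y, group_size=22, n_draws=10000)`.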
Two recent types of penalized regression methods were adapted to integrate DNA-methylation data and its association with gene expression in the analysis of bone mineral density. In both cases, predictions clearly benefited from including the additional information on gene expression.