PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection

Our PhD student Guannan Yang has submitted her first manuscript!

PLSKO is a new and robust knockoff variable generator that is applicable to various types of omics data. Knockoff generators create ‘knockoff copies’ of the original data to control the False Discovery Rate (FDR) without the need to calculate p-values.

PLSKO is:

  • assumption-free
  • able to control FDR with sufficient statistical power in omics studies
  • applicable in complex non-linear cases.

PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection. Guannan Yang, Ellen Menkhorst, Evdokia Dimitriadis, Kim-Anh Lê Cao.

 

Abstract

The knockoff framework, combined with a variable selection procedure, controls the false discovery rate (FDR) without the need for calculating p-values. Hence, it presents an attractive alternative to differential expression analysis of high-throughput biological data. However, current knockoff variable generators make strong assumptions or insufficient approximations that lead to FDR inflation when applied to biological data.

We propose Partial Least Squares Knockoff (PLSKO), an efficient and assumption-free knockoff generator that is robust to varying types of biological omics data. We compare PLSKO with a wide range of existing methods. In simulation studies, we show that PLSKO is the only method that controls FDR with sufficient statistical power in complex non-linear cases. In semi-simulation studies based on real data, we show that PLSKO generates valid knockoff variables for different types of biological data, including RNA-seq, proteomics, metabolomics and microbiome. In preeclampsia multi-omics case studies, we combine PLSKO with Aggregation Knockoff to address the randomness of knockoffs and improve power, and show that our method is able to select variables that are biologically relevant.
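
For readers new to the knockoff framework, the sketch below illustrates how variables can be selected at a target FDR without p-values, once knockoff copies exist. It is not PLSKO itself and is not taken from the manuscript: it shows only the generic selection step of the standard knockoff filter, and it assumes that a statistic W has already been computed for each variable by comparing the importance of the original variable with that of its knockoff copy (positive values favour the original). The function names `knockoff_threshold` and `knockoff_select` and the example values are hypothetical, chosen for illustration.

```python
import numpy as np


def knockoff_threshold(W, q=0.1, offset=1):
    """Data-dependent threshold of the knockoff filter.

    W      : one statistic per variable; large positive values suggest a
             true signal, and signs of null variables are symmetric.
    q      : target FDR level.
    offset : 1 gives the conservative "knockoff+" threshold,
             0 gives the more liberal variant.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the non-zero magnitudes of W, smallest first.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Variables with W <= -t estimate the false positives among W >= t.
        est_fdp = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if est_fdp <= q:
            return t
    return np.inf  # no threshold meets the target, so nothing is selected


def knockoff_select(W, q=0.1, offset=1):
    """Indices of the variables whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_threshold(W, q, offset))


# Hypothetical statistics for ten variables (illustrative values only).
W_example = [5, 4, 3, 2.5, -1, 0.5, -0.2, 2, 1.5, 3.5]
selected = knockoff_select(W_example, q=0.2)
```

The quality of the knockoff copies determines whether the negative statistics are a fair estimate of the false positives, which is why the choice of generator matters for FDR control.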