NanoStringDiff: a novel statistical method for differential expression analysis based on NanoString nCounter data

Hong Wang; Craig Horbinski; Hao Wu; Yinxing Liu; Shaoyi Sheng; Jinpeng Liu; Heidi Weiss; Arnold J Stromberg; Chi Wang

doi:10.1093/nar/gkw677

Nucleic Acids Res. 2016 Nov 16;44(20):e151. doi: 10.1093/nar/gkw677. Epub 2016 Jul 28.

Authors

Hong Wang¹, Craig Horbinski², Hao Wu³, Yinxing Liu⁴, Shaoyi Sheng⁵, Jinpeng Liu⁶, Heidi Weiss⁶,⁷, Arnold J Stromberg¹, Chi Wang⁸,⁷

Affiliations

¹ Department of Statistics, University of Kentucky, Lexington, KY 40536, USA.
² Departments of Pathology and Neurosurgery, Northwestern University, Chicago, IL 60611, USA.
³ Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.
⁴ Department of Pathology and Laboratory Medicine, University of Kentucky, Lexington, KY 40536, USA.
⁵ Paul Laurence Dunbar High School, Lexington, KY 40513, USA.
⁶ Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA.
⁷ Department of Biostatistics, University of Kentucky, Lexington, KY 40536, USA.
⁸ Biostatistics and Bioinformatics Shared Resource Facility, Markey Cancer Center, University of Kentucky, Lexington, KY 40536, USA. chi.wang@uky.edu

Abstract

The advanced medium-throughput NanoString nCounter technology has been increasingly used for mRNA or miRNA differential expression (DE) studies because of its advantages, including direct measurement of molecule expression levels without amplification, digital readout and superior applicability to formalin-fixed paraffin-embedded samples. However, the analysis of nCounter data is hampered because most existing methods are based on t-tests, which do not fit the count data generated by the NanoString nCounter system. Furthermore, the data normalization procedures of current methods are either not suitable for counts or not specific to NanoString nCounter data. We develop a novel DE detection method for NanoString nCounter data. The method, named NanoStringDiff, uses a generalized linear model of the negative binomial family to characterize count data and allows for multifactor designs. Data normalization is incorporated in the model framework through normalization parameters, which are estimated from the positive controls, negative controls and housekeeping genes embedded in the nCounter system. We propose an empirical Bayes shrinkage approach to estimate the dispersion parameter in the model and a likelihood ratio test to identify differentially expressed genes. Simulations and real data analysis demonstrate that the proposed method performs better than existing methods.
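The likelihood ratio test under a negative binomial model can be illustrated with a minimal Python sketch for a single gene. This is not the NanoStringDiff implementation: the function names (nb_loglik, fit_log_mean, nb_lrt) are hypothetical, the dispersion phi is taken as known rather than estimated by empirical Bayes shrinkage, and normalization is reduced to one assumed per-sample scaling factor instead of separate parameters derived from positive controls, negative controls and housekeeping genes.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln
from scipy.stats import chi2


def nb_loglik(y, mu, phi):
    """Negative binomial log-likelihood with mean mu and variance mu + phi * mu**2."""
    r = 1.0 / phi  # NB "size" parameter
    return np.sum(
        gammaln(y + r) - gammaln(r) - gammaln(y + 1)
        + r * np.log(r / (r + mu))
        + y * np.log(mu / (r + mu))
    )


def fit_log_mean(y, size, phi):
    """Maximized log-likelihood over one shared log-mean b, where mu_i = size_i * exp(b)."""
    res = minimize_scalar(
        lambda b: -nb_loglik(y, size * np.exp(b), phi),
        bounds=(-10.0, 20.0),
        method="bounded",
    )
    return -res.fun


def nb_lrt(y, group, size, phi):
    """Likelihood ratio test: one mean per group (full) versus a common mean (null)."""
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    size = np.asarray(size, dtype=float)
    levels = np.unique(group)
    ll_null = fit_log_mean(y, size, phi)
    ll_full = sum(
        fit_log_mean(y[group == g], size[group == g], phi) for g in levels
    )
    stat = max(2.0 * (ll_full - ll_null), 0.0)  # guard against tiny negative values
    return stat, chi2.sf(stat, df=len(levels) - 1)


# Made-up counts for one gene: three samples per condition.
counts = [20, 25, 18, 200, 230, 190]
condition = [0, 0, 0, 1, 1, 1]
scaling = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]  # assumed normalization factors
stat, p_value = nb_lrt(counts, condition, scaling, phi=0.1)
print(f"LRT statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

The scaling factors enter the mean directly, so normalization is part of the model rather than a transformation of the counts, which is the general idea the abstract describes. A multifactor design would replace the per-group mean with a regression on a design matrix.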

MeSH terms

Algorithms
Computational Biology / methods*
Computer Simulation
Datasets as Topic
Gene Expression Profiling / methods*
Gene Expression Regulation
MicroRNAs / genetics*
Models, Statistical*
RNA, Messenger / genetics*
Reproducibility of Results

Substances

MicroRNAs
RNA, Messenger