Mining, visualizing and comparing multidimensional biomolecular data using the Genomics Data Miner (GMine) Web-Server

Proietti, Carla; Zakrzewski, Martha; Watkins, Thomas S.; Berger, Bernard; Hasan, Shihab; Ratnatunga, Champa N.; Brion, Marie-Jo; Crompton, Peter D.; Miles, John J.; Doolan, Denise L.; Krause, Lutz

doi:10.1038/srep38178

Download PDF

Article
Open access
Published: 06 December 2016

Mining, visualizing and comparing multidimensional biomolecular data using the Genomics Data Miner (GMine) Web-Server

Carla Proietti¹^na1,
Martha Zakrzewski¹^na1,
Thomas S. Watkins¹^na1,
Bernard Berger²^na1,
Shihab Hasan^1,3^na1,
Champa N. Ratnatunga¹^na1,
Marie-Jo Brion³^na1,
Peter D. Crompton⁴^na1,
John J. Miles^1,5^na1,
Denise L. Doolan^1,5^na1 &
…
Lutz Krause^1,3^na1

Scientific Reports volume 6, Article number: 38178 (2016) Cite this article

2408 Accesses
20 Citations
2 Altmetric
Metrics details

Subjects

Abstract

Genomics Data Miner (GMine) is a user-friendly online software that allows non-experts to mine, cluster and compare multidimensional biomolecular datasets. Various powerful visualization techniques are provided, generating high quality figures that can be directly incorporated into scientific publications. Robust and comprehensive analyses are provided via a broad range of data-mining techniques, including univariate and multivariate statistical analysis, supervised learning, correlation networks, clustering and multivariable regression. The software has a focus on multivariate techniques, which can attribute variance in the measurements to multiple explanatory variables and confounders. Various normalization methods are provided. Extensive help pages and a tutorial are available via a wiki server. Using GMine we reanalyzed proteome microarray data of host antibody response against Plasmodium falciparum. Our results support the hypothesis that immunity to malaria is a higher-order phenomenon related to a pattern of responses and not attributable to any single antigen. We also analyzed gene expression across resting and activated T cells, identifying many immune-related genes with differential expression. This highlights both the plasticity of T cells and the operation of a hardwired activation program. These application examples demonstrate that GMine facilitates an accurate and in-depth analysis of complex molecular datasets, including genomics, transcriptomics and proteomics data.

High-throughput proteomics: a methodological mini-review

Article 03 August 2022

Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data

Article Open access 19 July 2021

Mapping protein states and interactions across the tree of life with co-fractionation mass spectrometry

Article Open access 15 December 2023

Introduction

The development of high-throughput molecular assays has revolutionized medical and biomolecular research and has greatly advanced genome, transcriptome and proteome analysis. Researchers are now able to rapidly and cost-effectively profile gene transcription, protein abundances, immune responses, DNA methylation, histone occupancy, microRNA expression, or metagenomic gene frequencies. Measurements are frequently obtained for hundreds to thousands of features and across multiple samples, biological conditions and time-points. Common characteristics of the generated data are their complexity, multidimensionality and multivariate nature, where variance in the measurements can be attributed to multiple explanatory variables and possible confounders. Analysis of the generated data therefore requires the application of advanced data-mining techniques, multivariate statistics and powerful visualization methods for data exploration.

Several computational tools are available for the analysis of complex biomolecular datasets, such as data generated by protein microarrays or NanoString expression data. The Cyber-T web server¹ allows data transformation and data comparison via Bayesian t-test and ANOVA. The standalone MeV MultiExperiment Viewer² provides powerful methods for data visualization and clustering. Matrix2png³ allows visualization of microarray data and other data types via a heatmap. Additionally, a multitude of R packages are available for the analysis of protein microarrays and other biomolecular data. However, thorough and meaningful analysis of complex biomolecular data using these currently available tools is frequently hampered by the advanced computational skills generally required to use them, by their limited selection of statistical methods, or by the lack of visualization capabilities.

To address this, we developed GMine, a powerful, yet easy to use tool for the higher-level analysis of biomolecular data. GMine allows the analysis of any genomic, metagenomic, transcriptomic or proteomic dataset with several hundred to several thousand biological features (e.g. genes or proteins) across different samples and biological conditions. Measurements can represent intensities, frequencies or counts (e.g. signal intensities, expression values, methylation levels, or sequence read counts generated in metagenomic, ChIP-seq or RNA-seq experiments). The software has been developed with an initial focus on protein microarray antibody responses and NanoString gene expression data. These two techniques are becoming widely used in biomolecular and medical research, but there is a general lack of available easy-to-use software for the analysis of the generated data.

Whole-proteome microarrays are increasingly popular for vaccine target antigen identification, studying immune responses, biomarker discovery, and clinical diagnostics^{4,5,6,7,8,9,10}. In the field of infectious diseases, protein microarrays are widely used for measuring host immune response to proteins of infectious agents, including malaria, schistosomes or viruses^{4,5,11,12,13,14,15}. Protein microarrays allow measurement of antibody responses against the entire proteome of a pathogen. This results in large and complex datasets, with signal intensities measured for thousands of proteins and hundreds of samples. In addition, antibody responses can be attributed to multiple covariates, including gender, age, geographic location, or infection history. The analysis of this type of data therefore requires advanced statistical methods that can assess complex associations while accounting for the various contributing confounding variables; this is enabled by GMine.

NanoString technology performs multiplex quantification of gene expression with high levels of precision and sensitivity, and can detect very small changes of a single transcript. Given the capacity to directly ‘count’ molecules, the platform is free from the inherent multiplex/PCR bias seen in other technologies, such as qPCR, microarray or RNAseq. NanoString enables the precise quantification of expression levels of a large number of genes across multiple biological samples and changes in gene expression can usually be attributed to multiple explanatory variables and confounding factors. The analysis of the generated high-dimensional and multivariate data therefore requires the application of advanced data-mining methods, which is now possible via the GMine web-server.

To demonstrate the utility of GMine for proteome and expression data, we applied our software to two types of studies. First, we used GMine to further analyze previously published proteome microarray data generated in a prospective study investigating host immune response to the malaria parasite Plasmodium falciparum¹⁵. Our intent was to apply GMine to this existing dataset to identify P. falciparum antigens associated with protection from clinical malaria, in order to facilitate the rational design of malaria vaccines. As a second applied study, we used GMine to analyze gene transcription occurring during the course of T cell activation, by comparing the expression of 135 key genes in resting and activated primary human CD4⁺ and CD8⁺ T cells. These two T cell subtypes play crucial roles in orchestrating and executing human immune system function, and understanding their molecular basis is important for guiding and modulating function for preventive or therapeutic purposes.

Results

GMine Server and User Interface

Genomics Data Miner (GMine) is accessible via a free web-server, which provides a user-friendly interface to advanced statistical methods and data-mining algorithms (Table 1). The web-frontend is implemented in Java using the JavaServer Faces architecture and the backend is implemented in Perl and the R statistical programming environment. The user interface is based on HTML and Cascading Style Sheets and interactive visualizations have been realized using JavaScript. No installation, configuration, registration or login is required. Data is kept privately and cannot be viewed by other users. Uploaded data and calculated results are deleted after a user session has terminated. Extensive help pages and a tutorial are provided via a wiki server. GMine is freely available at http://cgenome.net/GMine/.

Table 1 Overview of graphical and statistical methods provided by GMine.

Full size table

The web-frontend provides several analysis tabs that provide access to different data-mining methods (Table 2). High-quality figures are generated in PNG, PDF or SVG format, which can readily be used across an array of scientific outputs. The SVG format allows an easy modification of figures in vector graphics editors, such as Inkscape. Generated figures can be adjusted according to user preferences, including color scheme, coloring of samples by biological condition, figure resolution and figure dimensions.

Table 2 Overview of GMine analysis tabs.

Full size table

Data Upload and Normalization

As input, GMine requires an ‘n × m’ data matrix and a metadata file providing meta-information for each sample (Table S1). The server can be used for any n × m data matrix (with n × m < two million), where the columns correspond to samples, the rows represent features (e.g. proteins) and the values of the data matrix represent measured intensities, frequencies or counts (e.g. signal intensities, expression values, or methylation levels). Input files are simple text files in tab separated or comma separated format and can be created in spreadsheet programs such as Excel. The data upload page provides several normalization and data transformation methods, including quantile normalization, centered log ratio and variance stabilization and calibration for microarray data (VSN) (Table 1). The effects of transformation and normalization procedures on the data are visualized by boxplots and scatterplots (Figure S1). The provided VSN algorithm was originally developed for gene expression microarrays, and is now also widely used for the normalization of protein array data. VSN corrects for the dependency between intensity and variance, which can bias the analysis in standard statistical procedures. The algorithm transforms the data such that the variance remains nearly constant over the whole intensity spectrum.

Analysis Enabled by GMine

Quantitative visualization and univariate data analysis

Data is visualized quantitatively using heatmaps, bubble plots, scatter plots, stripcharts, barcharts and boxplots (Figs 1, 2, 3, 4, 5, 6 and 7). Biological conditions or classes (e.g. case/control) can be compared with a wide range of parametric and non-parametric statistical tests (Table 1) and p-values are adjusted for multiple testing by FDR and Bonferroni correction. Distribution of p-values is presented as histograms and quantile-quantile (QQ) plots. QQ plots characterize the extent to which the observed distribution of the test statistics follows the expected (null) distribution. This allows the detection of evidence for systematic bias. GMine also facilitates hypothesis testing by the Bayesian t-test, which addresses problems associated with low replication levels and technology biases and has been proposed for the analysis of DNA microarrays, protein arrays, quantitative mass spectrometry and next-generation sequencing (RNA-seq) data^1,16.

Multivariate analysis

Multivariate statistics are powerful techniques that can identify complex associations between measurements and multiple explanatory variables. GMine provides a wide range of unsupervised multivariate methods for ordination and clustering (Table 1). When visualizing data in heatmaps (Figs 1a, 3a, 5a and 7a), GMine allows trimming of outliers, selection of the color palette and adjustment of the color range center.

The significance of associations between measurements and multiple explanatory variables can be examined using the supervised statistical methods redundancy analysis (RDA), canonical correspondence analysis (CCA), and multivariable regression. GMine further enables data analysis by Partial Least Squares Regression (PLS), a multivariate methods which can identify associations between individual features and multiple explanatory variables.

Correlation networks visualize positive and negative associations between features, between explanatory variables, and between features and explanatory variables (Figs 1c and 7e). Features and explanatory variables are represented as nodes and measures as node sizes. Edges represent positive (yellow) and negative (blue) associations. Nodes are ordinated in the network by principal coordinates analysis (PCoA), which places correlating nodes in close proximity and anti-correlating nodes at distant locations.

Analysis of repeated measures

GMine allows for the analysis of experimental data from a repeated measures design. For instance, repeated measures are obtained in longitudinal studies or when multiple samples are collected from different tissues of the same subject, e.g. tumor and matched normal samples. In GMine, repeated measures can be analyzed using mixed effect regression models, which incorporate the individual (or animal or cage) as a random effect and explanatory variables (e.g. case/control or treatment) as fixed effects. These models can distinguish between group-specific effects (e.g. average in cases and controls) and subject or cage-specific effects.

Biomarker discovery and classification

GMine provides simple yet powerful methods for biomarker discovery. The discriminatory power of the uploaded data to distinguish between two sample groups (e.g. cases/controls) is ascertained using a support vector machine (SVM) evaluated by leave-one-out cross-validation. The discriminatory power of biomarker candidates is described by the area under the curve (AUC), odds ratio, delta (difference in means in units of standard deviation), and fold change (Fig. 4).

Identification of relevant features from large multidimensional datasets

The high-dimensionality of biomolecular datasets with far more variables than samples can affect the ability of statistical analysis to extract meaningful information^17,18,19,20. Furthermore, biomolecular data frequently contains many features that are either redundant or irrelevant, and can thus be removed without much loss of information. Robust statistical techniques are needed that are able to extract biologically informative features differentiating two or more phenotypes from large multidimensional data. To identify the subset of relevant features predictive of an outcome of interest, GMine provides access to the advanced feature selection methods: step-wise linear regression, LASSO regularized regression, random forest and the linear discriminant analysis (LDA) effect size method (LEfSe)²⁰. LASSO performs both feature selection and regularization to prevent overfitting. Random forest identifies the subset of most relevant features by constructing a collection of decision trees. Overfitting is avoided by constructing trees incorporating only a random subset of the features. LEfSe identifies features most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance²⁰. In GMine, selected features are presented as barcharts, where bars depict the importance of each feature (Figures S4 and S7).

As a preprocessing step, GMine also facilitates the reduction of data dimensionality by factor analysis. The main aim of factor analysis is to remove redundant features (i.e. highly correlating features) without much loss of information. Correlating features are described by a lower number of latent variables, called factors. Following factor analysis, the reduced dataset can be analyzed in GMine using the full range of statistical techniques provided.

Application of GMine for the Analysis of Malaria Protein Microarray Data

Using GMine, we re-examined protein microarray data generated in a prospective, longitudinal study investigating host immune response to the malaria parasite P. falciparum¹⁵. The study cohort included 194 individuals from the malaria endemic Kambila region in Mali. From each participant, blood samples were collected before and after the 8-month malaria season. The analyzed dataset represented host antibody response against 491 seropositive P. falciparum proteins profiled for 388 serum samples (194 individuals, 2 samples per individual).

Antibody response to the 491 P. falciparum proteins was strongly correlated (Fig. 1a,b). However, despite the high level of correlation, antibody profiles were highly subject-specific and clustered by age, time relative to the malaria season (before/after), concurrent parasitemia, incidence of febrile malaria during the 8-month study period, and days until the first malaria episode (Figs 1a and 2b). Antibody reactivity to 483 P. falciparum proteins (98.4% of the 491 seropositive proteins) increased with subject age (Fig. 2a) (ANOVA, FDR < 0.05) and was generally higher after the malaria season (Fig. 2a,c). Children showed a stronger increase in antibody response than adults (Fig. 2a) from before to after the malaria season. The significance of observed associations was tested using the multivariate statistical method redundancy analysis (RDA). In accordance with our clustering results, antibody responses were significantly associated with age, time of sample collection (before or after malaria season) and concurrent parasitemia (p < 0.001) (Figure S2). Age, days until first malaria episode, concurrent parasitemia and time of sample collection at the end of malaria season were positively associated with antibody reactivity; incidence of clinical malaria showed a negative association (Fig. 2d, Figure S2). Gender, hemoglobin type and days until the first episode were not associated with antibody reactivity (RDA, p > 0.05) (Figure S2).

Association between antibody response and immunity to malaria

We examined the association between antibody response to P. falciparum proteins before the malaria season and subsequent immunity to malaria during the 8-month malaria season. Using multiple linear regression we regressed antibody response on age and immunity to malaria (absence of clinical malaria yes/no during the 8-month season). None of the 491 seropositive P. falciparum proteins was association with protection from malaria after correction for multiple testing (Table S2).

Association between antibody response and immunity to malaria in children

To avoid the confounding effect of age on malaria risk, we re-analyzed the association between antibody reactivity to P. falciparum proteins and subsequent protection against clinical malaria in the subset of children aged 8 to 10 years. For this analysis we compared antibody profiles of children who did not experience clinical malaria during the 8-month malaria season (“protected”, n = 19) versus those who experienced ≥1 malaria episodes (“susceptible”, n = 29).

Protected and susceptible children had remarkably different antibody profiles before the malaria season, with protected children showing higher breadth (number of positive proteins) and stronger reactivity to a wide range of parasite proteins (Cluster 1 in Fig. 3a,b and c). In accordance with our results obtained by unsupervised clustering (Fig. 3a,d), multivariate analysis showed a significant association between antibody response and protection status (p = 0.02, CCA) (Figs 3e and S3). Antibody profiles against P. falciparum proteins were highly predictive of immunity to malaria. A support vector machine (SVM) evaluated by leave-one-out cross-validation was able to predict immunity to malaria with 95% accuracy.

We identified 62 parasite proteins with significantly increased mean signal intensity in children with resistance to malaria (FDR < 0.05, t-test) (Table S3). Three of these 62 proteins were highly associated with resistance (Bonferroni corrected p-value < 0.05) and were able to discriminate between protected and susceptible children with high accuracy (AUC of 0.8–0.9) (Table S3, Figs 4 and 5). However, antibody reactivity against proteins representing some of the leading malaria vaccine candidates circumsporozoite protein (CSP, PFC0210c), merozoite surface protein 1 (MSP-1, PFI1475w), merozoite surface protein 2 (MSP-2, PFB0300c), apical membrane antigen 1 (AMA-1, PF11_0344) and liver stage antigen 3 (LSA3, PFB0915w) did not discriminate between protected and susceptible children (FDR > 0.05) (Fig. 6). The high-dimensionality of the gene expression dataset can potentially affect the ability of standard statistical analysis to extract meaningful information. We therefore additionally analyzed the data using the LEfSe algorithm, which identified 12 Plasmodium proteins associated with protection (Table S9).

Our results indicate that immunity to malaria is related to a pattern of responses to multiple parasite proteins and not attributable to any single antigen. We therefore sought to identify the signature of the most relevant proteins associated with protection against clinical malaria. Stepwise forward logistic regression identified a six protein signature (Figure S4), which achieved perfect separation (AUC of 1) and improved power for discriminating between protected and susceptible children when compared to individual proteins (Table S3).

Application of GMine for the Analysis of Gene Expression in Human T Cells Following Activation

Using GMine, we examined gene expression of primary CD4⁺ and CD8⁺ T cells in resting (non-activated) and early activated states of 5 adults. T cell-mediated immunity is the central element of the adaptive immune system and involves constant surveillance from antigen-specific T cells to eliminate infections and malignant cells^21,22,23,24. T helper cells (CD4⁺) and effector T cells (CD8⁺) play crucial roles in the coordination and function of the adaptive immune system. CD8⁺ T cells are highly effective in direct lysis of pathogen infected cells. CD4⁺ T helper cells produce cytokines that are directly toxic to target cells, stimulate other immune cell lineages, or mobilize powerful inflammatory mechanisms.

Each T cell lineage was sorted from peripheral blood mononuclear cells (PBMCs) at rest or post activation (4 hrs) with PMA/ionomycin in a paired design from N = 5 healthy individuals. We examined gene expression in each T cell subtype using a custom 135-plex NanoString array covering core genes involved in human T cell recognition, survival, migration, adhesion, cytokine/chemokine secretion, differentiation and exhaustion.

Hierarchical clustering and principal components analysis (PCA) showed clear clustering of gene expression profiles by T cell subtype (CD4⁺/CD8⁺) (Fig. 7a and b) and activation status (Fig. 7a and c). These results were supported by multivariate statistics using canonical correspondence analysis (CCA), demonstrating that resting CD4⁺ and CD8⁺ T cells have significantly different gene expression profiles (p = 0.001, CCA). Analysis by CCA further showed that T cell activation significantly altered gene expression (p = 0.001).

Resting CD4⁺ and CD8⁺ T cells show markedly different gene expression profiles

As expected, resting CD4⁺ and resting CD8⁺ showed different expression profiles (Fig. 7b) and could be distinguished by differential expression of 67 genes (paired t-test, FDR < 0.05, Table S4), most notably the chemokine CCR4, the costimulatory molecule ICOS, the immune checkpoint receptor CTLA4, the chemokine receptor CCR7 and the transcription factor AHR. Resting CD8⁺ showed higher expression of the classic effector molecules PRF1, GZMA and IFN-γ compared to resting CD4⁺ T cells. Similar to our results obtained by paired t-test, LEfSe identified 63 genes that were differentially expressed between CD4⁺ and CD8⁺ T cells (Table S10).

Changes in gene expression after activation with PMA/ionomycin

Post activation, 90 genes were differentially expressed in the CD4⁺ compartment and 84 genes were differentially expressed in the CD8⁺ compartment (FDR < 0.05, paired t-test) (Tables S5, S6). Many genes showed rapid and marked upregulation post activation highlighting how fast transcriptional changes can occur in human lymphocytes. For example, in the CD4⁺ populations, IL-2 increased from one to over 150,000 molecules per sample within four hours. Across both T cell lineages, IL-2 was the most statistically robust marker of recent activation (Tables S5, S6 and Figure S5). The largest fold changes in activated CD4⁺ T cells were observed for IL-17 and IL-22 (Table S5 and Figure S5a), indicating that mitogen activation strongly drives a Th17 response. Unexpectedly, the “early activation” markers CD27 and CD28 were downregulated in post activated CD4⁺ T cells (Figure S5a). The most downregulated gene was KLF2, showing abolished expression upon activation (Table S5 and Figure S5a). KLF2 is involved in T cell trafficking and maintaining quiescence.

In the CD8⁺ T cell subset, the top three most highly expressed cytokines post activation were the classic Th1 effector molecules IL-2, IFN-γ and TNF (Table S6 and Figure S5b). Interestingly, post activated CD8⁺ showed no increase in any effector molecule measured, including GZMA/B/H/K/M/Y. Some effector molecules even decreased (Table S6). This is in contrast to CD4⁺ T cells which showed marked increase of GZMB post activation (Figure S6). Similar results were obtained using Bayesian t-test (data not shown).

Similar to our results obtained by paired t-test, LEfSe identified an association between the activation status of CD4⁺ and CD8⁺ and the expression of 87 and 91 genes, respectively (Tables S11 and S12).

CD4⁺ and CD8⁺ T cells show similar changes in gene expression post activation

Unexpectedly, CD4⁺ and CD8⁺ showed highly similar expression profiles post activation and 73 genes exhibited the same directional expression change across both subsets. This common activation footprint was reflected in a joined cluster of activated CD4⁺ and CD8⁺ lineages in a PCA analysis (Fig. 7c) and suggests the existence of some hardwired activation programs encoded within functionally diverse immune lineages. However, 22 genes showed differential expression in only one of the CD4⁺ and CD8⁺ subtypes (Tables S7, S8).

Gene expression profiles highly predictive of activation status

Network analysis of either CD4⁺ or CD8⁺ gene expression profiles resulted in two distinct, highly correlated gene clusters, representing genes with high expression in activated or resting T cells, respectively (Fig. 7e). Gene expression profiles were highly predictive of T cell activation status. A support vector machine (SVM) trained on the gene expression profiles of either CD4⁺ or CD8⁺ and evaluated by leave-one-out cross-validation achieved 100% accuracy in predicting the activation status (resting/activated). An optimal gene signature for predicting T cell activation status was determined by LASSO regularized regression and comprised five genes for the CD4⁺ (Figure S7a) and eight genes for the CD8⁺ compartments (Figure S7b). Two genes were shared across both signatures (STAT5A, IL2), three genes were unique for CD4⁺ (CD40LG, IRF4, IL10) and six genes were unique for CD8⁺ (IL22, GFI1, IL23A, COR6, IL12RB2, ILGA4).

Conclusions

GMine provides a comprehensive statistical and visualization toolbox that allows users to rapidly obtain comprehensive analyses from multidimensional biomolecular datasets, without scripting or programming knowledge. Robust and thorough analyses are obtained via a broad range of algorithms for clustering, classification and statistical testing. The software enables lab-based researchers, who may be unfamiliar with advanced statistical software packages, to use these complex types of analyses routinely in their work in order to gain a more comprehensive understanding of their data.

Our re-analysis of P. falciparum protein array data replicated previous results, demonstrating that GMine facilitates robust analysis of complex biomolecular datasets. In accordance with Crompton et al.¹⁵ we found that antibody responses to Plasmodium proteins are associated with time relative to malaria season, subject age and protection from malaria. Consistent with the previous study, we observed that susceptible children have strong reactivity to some proteins, but protected children tend to have a significantly higher breadth of response. These results are consistent with other reports indicating that the breadth of antibody response to malaria proteins rather than strong reactivity to individual proteins is critical for developing immunity to malaria^25,26,27. Antibody response to the malaria vaccine candidates CSP, LSA-3, AMA-1, MSP-1 and MSP-2 did not discriminate between protected and susceptible children in agreement with the previous study¹⁵, as well as with results of clinical trials of candidate AMA-1²⁸ and MSP-1²⁹ vaccines.

While Crompton et al. focused on individual antigens, we extended this study by analyzing antibody reactivity profiles and antibody responses against combinations of parasite antigens. We demonstrated that children who were protected or susceptible to malaria differed in their global antibody reactivity to P. falciparum proteins. Antibody profiles of protected and susceptible children formed distinct clusters when analyzed by hierarchical clustering and were predictive of immunity to malaria when using a SVM classifier. Our results further extend previous findings by identifying an antibody signature that dramatically improves the power to discriminate protected from susceptible children as compared with antibody responses to single antigens. These findings support the hypothesis that immunity to malaria is a higher-order phenomenon related to a combination of antibody reactivity rather than to single antibody reactions, consistent with our previous observations in experimentally immunized populations⁶.

We further extended the previous report by showing that antibody reactivity profiles are associated with age and time during the malaria season on a proteome-wide level, as evident by our cluster analysis. Finally, multivariate analysis in GMine found new positive correlations between antibody reactivity against Plasmodium proteins and concurrent parasitemia and day until first febrile malaria episode. Gender and hemoglobin type were not associated with antibody response, consistent with previous studies³⁰.

We also used GMine to study the expression of 135 key immune-related genes in CD4⁺ and CD8⁺ T cells in the resting and post activation state. The immunomonitoring of these core immune cell subsets has led to the discovery of many biological correlates of disease. A more complete understanding of T cell function will lead to new avenues for immunotherapeutic development. Previous studies have examined changes in human naïve and memory CD4⁺ T cell phenotypes following mitogen activation using microarrays and found 100–300 genes differentially expressed between states^31,32,33. Our study establishes proof-of-concept for the potential of GMine to dramatically advance this or similar fields, taking advantage of the new direct digital NanoString technology which is free from the inherent biases of PCR. Using a custom 135 gene array designed for the sensitive and precise profiling of key immune-related genes, GMine analyses showed a digital view of T cell gene expression (Fig. 7). Second, using GMine, we found that the gene expression profiles of CD8⁺ and CD4⁺ T cells are unique at rest but very similar post activation. The latter was a surprising finding and suggests the existence of a common “hardwired” activation system across many highly diverse human T cell lineages. To our knowledge this has not been described or investigated previously; with prior studies investigating only a single lineage. to our knowledge, previous studies investigated only a single lineage^31,32,33,34. The rapid post activation phenotype comprised a cytotoxic gene signature across both CD8⁺ and CD4⁺ T cells. Although not widely appreciated, “helper” CD4⁺ T cells have the potential for cytotoxic activity through the rapid production of effector molecules such as granzymes and FasL³⁵. Our findings highlight the extreme plasticity inbuilt in T cell biology and function.

Our results demonstrate that GMine provides a new framework for analyzing high-throughput omics’ data and that robust and comprehensive analysis can rapidly be obtained using a wide range of advanced data-mining techniques.

Methods

GMine Implementation

The web-frontend is implemented in Java using the JavaServer Faces architecture and the backend is implemented in Perl and the R statistical programming environment using existing packages and newly developed modules. The software is running on a virtual server of the National eResearch Collaboration Tools and Resources (Nectar) cloud. A wiki page providing extensive help and a tutorial, has been created using the MediaWiki package.

Importance of features selected by feature selection is characterized by Akaike information criterion (AIC) of the model if the feature was dropped from the model (step-wise regression), absolute of the t-statistic (LASSO regression), or random permutation (random forest). To generate correlation networks, first associations between features are computed using the Pearson’s correlation index or Spearman’s rho and used to ordinate nodes in a two dimensional plot by principal coordinates analysis (PCoA). In this way, correlating nodes are placed in close proximity and anti-correlating nodes are placed at distant locations in the network. Correlating nodes are then connected with edges.

Malaria Whole-Proteome Microarray Data

We re-examined our previously published protein microarray data generated in a prospective study investigating host immune response to the malaria parasite Plasmodium falciparum¹⁵. Crompton et al. used a protein microarray consisting of 2,320 probes (representing ~23% of the P. falciparum proteome) to profile host immune response (IgG) against malaria parasite proteins. The cohort comprised 225 individuals from Kambila, Mali. Plasma samples were collected from each individual before and after the 8-month malaria season. Individuals were classified as protected (no malaria episode during the 8 months study) or susceptible (≥1 malaria episodes). In our reanalysis we included the subset of 194 individuals for whom samples were available at both time points (before and after the malaria season). In accordance with the previous study, we assigned individuals to two age groups: age 2–10 years (n = 157) and 18–25 years (n = 37). Our analysis included antibody profiles of 388 serum samples (2 × 194).

Data preprocessing

Before re-analyzing the data in GMine, we first background corrected signal intensities and identified seropositive proteins. Protein microarray raw signal intensities (SI) were corrected for spot-specific background using the Axon GenePix Pro 7 software (Molecular Devices, Sunnyvale, CA). Mean intensity of the No-DNA negative control spots from each sample was subtracted from all individual target protein fragment spots. Negative or zero values after background subtraction were assigned a net value of 0. Background corrected data were normalized by log2-transformation. Proteins were considered seropositive for antibody response if the signal intensity was greater than the mean signal intensity of no-DNA negative controls plus 1.5 times the standard deviation (SD). Only seropositive proteins were analyzed further.

Analysis in GMine

For the re-analysis of Plasmodium antibody profiles in GMine, we used the normalized antibody signal intensities against the 491 seropositive protein fragments. Two samples were included for each of the 194 individuals, collected before (May) and after (December) the malaria season, resulting in a total of 388 samples. The following clinical variables were included: subject age (assigned to age groups 2–4/5–7/8–10/18–25 years), time of sample collection (before/after the malaria season), parasitemia (microscopically detectable parasite [yes/no]), hemoglobin type (AA, AC, CC), gender, incidence of malaria (number of clinical malaria episode during 8 month study period), and days until the first malaria episode after the sample collection in May [range: 29 days to 242 days (no episodes)].

Antibody profiles were correlated using Pearson’s correlation. Difference in average antibody signal intensities between age groups were tested by ANOVA. Antibody signal intensities before and after the malaria season were compared by paired t- test. The 388 antibody profiles were associated with multiple explanatory variables by the multivariate statistical methods redundancy analysis (RDA) and canonical correspondence analysis (CCA). In this analysis age, time of the malaria season, parasitemia, hemoglobin type, gender, days until the first malaria episode and incidence of malaria were included as explanatory variables.

A subset of 48 children aged 8 to 10 years was evaluated for association between antibody response and protection against clinical malaria. Children were defined as protected if they did not experience clinical malaria episode during the 8-month study period (n = 19) and susceptible otherwise (n = 29 children). Difference in average signal intensity (magnitude of response) and breadth of response (number of seropositive proteins detected in each individual) between protected and susceptible children were compared by t-test.

Multivariate analysis PCA, PCoA, RDA and CCA were used to assess the association between antibody signal intensities before the malaria season in May and subsequent protection against clinical malaria during the 8-month malaria season. CCA and RDA were run on antibody signal intensities including protection status (protected/susceptible), parasitemia and hemoglobin type as explanatory variables.

The association between antibody response to individual Plasmodium proteins and protection from clinical malaria was estimated by linear regression including age as covariate. The regression models had the form: antibody signal intensity = protection status (protected/susceptible) plus age. Antibody signal intensities before the malaria season between protected and susceptible children were compared by t-test. Area under the curve (AUC) was used to measure the power to discriminate between protected and susceptible children.

A support vector machine (SVM) was trained on the antibody profiles measured in May to discriminate between susceptible and protected children. Classification accuracy was evaluated by leave-one-out cross-validation. A forward stepwise logistic regression model was applied to identify the minimal subset of Plasmodium proteins associated with protection from clinical malaria. Protection status (protected/susceptible) was included as dependent variable and protein specific antibody signal intensities as explanatory variables. The quality of the final model including all selected proteins was evaluated by AUC and Akaike information criterion (AIC). P-values were adjusted for multiple testing by the False Discovery Rate (FDR) and Bonferroni correction.

Primary T Cell Activation and Nanostring Profiling

Gene expression changes in primary T cells lineages were explored using the NanoString technology in order to determine key pathways differentially used between rest and stimulation state. We designed a custom 135-plex nCounter custom codeset around genes encoding for T cell recognition, including cytokines/chemokines, cytokine receptors/chemokine receptors, effector molecules, subset markers, surface receptors, transcription factors, early activation markers, checkpoint markers and signaling molecules. Genes were selected based on published human T cell microarray data and current knowledge in infectious disease, autoimmunity and cancer immunobiology. Peripheral blood mononuclear cells (PBMCs) were derived from 5 healthy genetically unrelated individuals. Experimental protocols were approved by the QIMR Berghofer Medical Research Institute Human Research Ethics Committee and the methods were carried out in accordance with the relevant guidelines and regulations. All samples were obtained with written informed consent. PBMC were isolated by Ficoll-Paque PLUS (GE Healthcare) density gradient centrifugation and cryopreserved in R10 medium (RPMI-1640 containing 10% FBS) supplemented with 10% DMSO (Sigma-Aldrich). PBMCs were then split and left to rest or activated with 1X phorbol12-myristate13-acetate (PMA)/ionomycin (Jomar Life Research) for 4 hrs in R10. Cells were washed twice with PBS (Life Technologies) and stained with LIVE/DEAD^® Fixable Aqua and Violet Dead Cell Stain Kits (Life Technologies) and anti-CD14/CD16/CD19/CD3/CD4/CD8 (BioLegend). CD4⁺ (CD14⁻CD16⁻CD19⁻CD3⁺ CD4⁺ CD8⁻) and CD8⁺ (CD14⁻CD16⁻CD19⁻CD3⁺ CD4⁻CD8⁺ ) T cell populations from both activated and resting subsets were then sorted using an Aria IIIu (BD Biosciences) to > 99% purity. Total RNA from each population was extracted using RNAzol RT (Astral Scientific), isopropanol (Sigma Aldrich) and ethanol (Sigma Aldrich). Overnight nCounter codeset hybridisation was performed as per manufacturer’s instruction using 100 ng of sample RNA quantified by Nanodrop 1000 (Thermo Scientific).

Data preprocessing

Samples were processed using the NanoString GEN2 Prep Station and data acquired using the nCounter Digital Analyser (NanoString Technologies). Quality control and normalization was performed using nSolver (NanoString Technologies) as per the recommended settings. Quality control parameters included background, positive control and housekeeping normalization using spiked positive and negative controls, as well as five housekeeping genes (ACTB, B2M, GAPDH, HPRT1 and RPLP0). Normalization was performed on CD4⁺ and CD8⁺ T cell populations separately.

Analysis in GMine

Normalized data were log2-transformed in GMine. Gene expression profiles were associated with the T cell subtype (CD4⁺/CD8⁺) and with T cell activation status (activated/resting) using the multivariate statistical method canonical correspondence analysis (CCA). T cell subtype and T cell activation state were included as explanatory variables. Changes in expression of individual genes between activated and non-activated (resting) cells and between CD4⁺ and CD8⁺ were identified by paired t-test. P-values were adjusted for multiple testing by the False Discovery Rate (FDR) and Bonferroni correction. Gene expression profiles were correlated by Pearson’s correlation. A support vector machine (SVM) was trained on the gene expression profiles of either CD4⁺ or CD8⁺ to discriminate between activated and resting cell population and evaluated by leave-one-out cross-validation. Gene signatures for predicting T cell activation status were determined by LASSO regularized regression of the differentially expressed genes identified by the paired t-test. The quality of the final model including all selected genes was evaluated by AUC and Akaike information criterion (AIC).

Additional Information

How to cite this article: Proietti, C. et al. Mining, visualizing and comparing multidimensional biomolecular data using the Genomics Data Miner (GMine) Web-Server. Sci. Rep. 6, 38178; doi: 10.1038/srep38178 (2016).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Kayala, M. A. Cyber-T web server: differential analysis of high-throughput data. Nucleic Acids Res 40, W553–559, doi: 10.1093/nar/gks420 (2012).
Article CAS PubMed PubMed Central Google Scholar
Saeed, A. I. TM4: a free, open-source system for microarray data management and analysis. Biotechniques 34, 374–378 (2003).
Article CAS PubMed Google Scholar
Pavlidis, P. Matrix2png: a utility for visualizing matrix data. Bioinformatics 19, 295–296 (2003).
Article CAS PubMed Google Scholar
Driguez, P. Schistosomiasis vaccine discovery using immunomics. Parasit Vectors 3, 4, doi: 10.1186/1756-3305-3-41756-3305-3-4 [pii] (2010).
Article PubMed PubMed Central Google Scholar
Doolan, D. L. Profiling humoral immune responses to P. falciparum infection with protein microarrays. Proteomics 8, 4680–4694, doi: 10.1002/pmic.200800194 (2008).
Article CAS PubMed PubMed Central Google Scholar
Trieu, A. Sterile protective immunity to malaria is associated with a panel of novel P. falciparum antigens. Mol Cell Proteomics 10, M111 007948, doi: 10.1074/mcp.M111.007948M111.007948 [pii] (2011).
Article PubMed PubMed Central Google Scholar
Tu, S. Protein microarrays for studies of drug mechanisms and biomarker discovery in the era of systems biology. Curr Pharm Des 20, 49–55 (2014).
Article CAS PubMed Google Scholar
Ramachandran, N. Applications of protein microarrays for biomarker discovery. Proteomics Clin Appl 2, 1444–1459, doi: 10.1002/prca.200800032 (2008).
Article CAS PubMed PubMed Central Google Scholar
Song, Q. Novel autoimmune hepatitis-specific autoantigens identified using protein microarray technology. J Proteome Res 9, 30–39, doi: 10.1021/pr900131e (2010).
Article CAS PubMed PubMed Central Google Scholar
Davies, D. H. Profiling the humoral immune response to infection by using proteome microarrays: high-throughput vaccine and diagnostic antigen discovery. Proc Natl Acad Sci USA 102, 547–552, doi: 10.1073/pnas.0408782102 (2005).
Article ADS CAS PubMed PubMed Central Google Scholar
Luevano, M. High-throughput profiling of the humoral immune responses against thirteen human papillomavirus types by proteome microarrays. Virology 405, 31–40, doi: 10.1016/j.virol.2010.05.011 (2010).
Article CAS PubMed Google Scholar
Sundaresh, S. Identification of humoral immune responses in protein microarrays using DNA microarray data analysis techniques. Bioinformatics 22, 1760–1766, doi: btl162 [pii]10.1093/bioinformatics/btl162 (2006).
Article CAS PubMed Google Scholar
Davies, D. H. Vaccinia virus H3L envelope protein is a major target of neutralizing antibodies in humans and elicits protection against lethal challenge in mice. J Virol 79, 11724–11733, doi: 10.1128/JVI.79.18.11724-11733.2005 (2005).
Article CAS PubMed PubMed Central Google Scholar
Davies, D. H. Proteome-wide analysis of the serological response to vaccinia and smallpox. Proteomics 7, 1678–1686, doi: 10.1002/pmic.200600926 (2007).
Article CAS PubMed Google Scholar
Crompton, P. D. A prospective analysis of the Ab response to Plasmodium falciparum before and after a malaria season by protein microarray. Proc Natl Acad Sci USA 107, 6958–6963, doi: 10.1073/pnas.1001323107 (2010).
Article ADS PubMed PubMed Central Google Scholar
Baldi, P. Bayesian framework for the analysis of microarray expression data: regularized t -test and statistical inferences of gene changes. Bioinformatics 17, 509–519 (2001).
Article CAS PubMed Google Scholar
Sivaraksa, M. Predictive gene lists for breast cancer prognosis: a topographic visualisation study. BMC Med Genomics 1, 8, doi: 10.1186/1755-8794-1-8 (2008).
Article CAS PubMed PubMed Central Google Scholar
Smyth, G. K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3, Article3, doi: 10.2202/1544-6115.1027 (2004).
Article MathSciNet PubMed Google Scholar
Clarke, R. The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nature Reviews. Cancer 8, 37–49, doi: 10.1038/nrc2294 (2008).
Article CAS PubMed PubMed Central Google Scholar
Segata, N. Metagenomic biomarker discovery and explanation. Genome Biol 12, R60, doi: 10.1186/gb-2011-12-6-r60 (2011).
Article PubMed PubMed Central Google Scholar
Miles, J. J. Understanding the complexity and malleability of T-cell recognition. Immunol Cell Biol 93, 433–441, doi: 10.1038/icb.2014.112 (2015).
Article CAS PubMed Google Scholar
Rossjohn, J. T cell antigen receptor recognition of antigen-presenting molecules. Annu Rev Immunol 33, 169–200, doi: 10.1146/annurev-immunol-032414-112334 (2015).
Article CAS PubMed Google Scholar
Bridgeman, J. S. Structural and biophysical determinants of alphabeta T-cell antigen recognition. Immunology 135, 9–18, doi: 10.1111/j.1365-2567.2011.03515.x (2012).
Article CAS PubMed PubMed Central Google Scholar
Broere, F. In Principles of immunopharmacology (eds Nijkamp, Franciscus Petrus & Parnham, Michael J. ) 1 online resource (xxxi, 728 p (Birkh*user Verlag, 2011).
Osier, F. H. New antigens for a multicomponent blood-stage malaria vaccine. Sci. Transl. Med. 6, 247ra102, doi: 10.1126/scitranslmed.3008705 (2014).
Article CAS PubMed PubMed Central Google Scholar
Dent, A. E. Plasmodium falciparum Protein Microarray Antibody Profiles Correlate With Protection From Symptomatic Malaria in Kenya. J Infect Dis 212, 1429–1438, doi: 10.1093/infdis/jiv224 (2015).
Article CAS PubMed PubMed Central Google Scholar
Richards, J. S. Identification and prioritization of merozoite antigens as targets of protective human immunity to Plasmodium falciparum malaria for vaccine and biomarker development. J Immunol 191, 795–809, doi: 10.4049/jimmunol.1300778 (2013).
Article CAS PubMed PubMed Central Google Scholar
Sagara, I. A randomized controlled phase 2 trial of the blood stage AMA1-C1/Alhydrogel malaria vaccine in children in Mali. Vaccine 27, 3090–3098, doi: 10.1016/j.vaccine.2009.03.014 (2009).
Article CAS PubMed PubMed Central Google Scholar
Ogutu, B. R. Blood stage malaria vaccine eliciting high antigen-specific antibody concentrations confers no protection to young children in Western Kenya. PLoS One 4, e4708, doi: 10.1371/journal.pone.0004708 (2009).
Article ADS CAS PubMed PubMed Central Google Scholar
Tan, X. Hemoglobin S and C heterozygosity enhances neither the magnitude nor breadth of antibody responses to a diverse array of Plasmodium falciparum antigens. J Infect Dis 204, 1750–1761, doi: 10.1093/infdis/jir638 (2011).
Article CAS PubMed PubMed Central Google Scholar
Liu, K. Augmentation in expression of activation-induced genes differentiates memory from naive CD4⁺ T cells and is a molecular mechanism for enhanced cellular response of memory CD4⁺ T cells. J Immunol 166, 7335–7344 (2001).
Article CAS PubMed Google Scholar
Hess, K. Kinetic assessment of general gene expression changes during human naive CD4⁺ T cell activation. Int Immunol 16, 1711–1721, doi: 10.1093/intimm/dxh172 (2004).
Article CAS PubMed Google Scholar
Bektas, A. Age-associated alterations in inducible gene transcription in human CD4⁺ T lymphocytes. Aging 5, 18–36 (2013).
Article CAS PubMed PubMed Central Google Scholar
Lin, Z. Comparative microarray analysis of gene expression during activation of human peripheral blood T cells and leukemic Jurkat T cells. Lab Invest; a journal of technical methods and pathology 83, 765–776 (2003).
Article CAS Google Scholar
Medina, M. A. Granzyme B- and Fas ligand-mediated cytotoxic function induced by mitogenic CD28 stimulation of human memory CD4⁺ T cells. J Leukoc Biol 91, 759–771, doi: 10.1189/jlb.0511264 (2012).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We would like to thank all GMine users who have provided valuable feedback, gave suggestions on how to improve the software or helped to identify problems and bugs. We also acknowledge the National eResearch Collaboration Tools and Resources (Nectar) project for providing cloud computing resources for GMine. The Translational Research Institute is supported by a grant from the Australian Government. This work was supported by project grant funding the NHMRC (APP1069281 & APP1078987) and Perpetual (FR2013/0946). JJM is supported by the NHMRC Career Development Fellowship. DLD is supported by an NHMRC Principal Research Fellowship. PDC is supported by the Intramural Research Program of the NIH, National Institute of Allergy and Infectious Diseases.

Author information

Proietti Carla and Zakrzewski Martha contributed equally to this work.

Authors and Affiliations

QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
Carla Proietti, Martha Zakrzewski, Thomas S. Watkins, Shihab Hasan, Champa N. Ratnatunga, John J. Miles, Denise L. Doolan & Lutz Krause
Nestlé Research Centre, Vers-chez-les-Blanc, Lausanne, Switzerland
Bernard Berger
The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
Shihab Hasan, Marie-Jo Brion & Lutz Krause
Malaria Infection Biology and Immunity Unit, Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Maryland, 20852, USA
Peter D. Crompton
Centre for Biosecurity and Tropical Infectious Diseases, Australian Institute of Tropical Health & Medicine, James Cook University, Cairns, QLD, Australia
John J. Miles & Denise L. Doolan

Authors

Carla Proietti
View author publications
You can also search for this author in PubMed Google Scholar
Martha Zakrzewski
View author publications
You can also search for this author in PubMed Google Scholar
Thomas S. Watkins
View author publications
You can also search for this author in PubMed Google Scholar
Bernard Berger
View author publications
You can also search for this author in PubMed Google Scholar
Shihab Hasan
View author publications
You can also search for this author in PubMed Google Scholar
Champa N. Ratnatunga
View author publications
You can also search for this author in PubMed Google Scholar
Marie-Jo Brion
View author publications
You can also search for this author in PubMed Google Scholar
Peter D. Crompton
View author publications
You can also search for this author in PubMed Google Scholar
John J. Miles
View author publications
You can also search for this author in PubMed Google Scholar
Denise L. Doolan
View author publications
You can also search for this author in PubMed Google Scholar
Lutz Krause
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceived and designed the experiments: L.K., C.P., B.B., and J.J.M. Software development: L.K., M.Z., S.H. and C.N.R. Analyzed the data: C.P., T.S.W. and J.J.M. Prepared figures: C.P. Contributed reagents/materials/analysis tools: P.D.C. and D.L.D. Wrote the paper: L.K., C.P., M.J.B., J.J.M. and D.L.D. All authors reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Figures

Supplementary Tables

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Reprints and permissions

About this article

Cite this article

Proietti, C., Zakrzewski, M., Watkins, T. et al. Mining, visualizing and comparing multidimensional biomolecular data using the Genomics Data Miner (GMine) Web-Server. Sci Rep 6, 38178 (2016). https://doi.org/10.1038/srep38178

Download citation

Received: 28 July 2016
Accepted: 04 November 2016
Published: 06 December 2016
DOI: https://doi.org/10.1038/srep38178

This article is cited by

Characterization of the humoral immune response to the EBV proteome in extranodal NK/T-cell lymphoma
- Zhiwei Liu
- Yomani D. Sarathkumara
- Qing Lan
Scientific Reports (2021)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

High-throughput proteomics: a methodological mini-review

Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data

Mapping protein states and interactions across the tree of life with co-fractionation mass spectrometry

Introduction

Results

GMine Server and User Interface

Data Upload and Normalization

Analysis Enabled by GMine

Quantitative visualization and univariate data analysis

Multivariate analysis

Analysis of repeated measures

Biomarker discovery and classification

Identification of relevant features from large multidimensional datasets

Application of GMine for the Analysis of Malaria Protein Microarray Data

Association between antibody response and immunity to malaria

Association between antibody response and immunity to malaria in children

Application of GMine for the Analysis of Gene Expression in Human T Cells Following Activation

Resting CD4+ and CD8+ T cells show markedly different gene expression profiles

Changes in gene expression after activation with PMA/ionomycin

CD4+ and CD8+ T cells show similar changes in gene expression post activation

Gene expression profiles highly predictive of activation status

Conclusions

Methods

GMine Implementation

Malaria Whole-Proteome Microarray Data

Data preprocessing

Analysis in GMine

Primary T Cell Activation and Nanostring Profiling

Data preprocessing

Analysis in GMine

Additional Information

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Ethics declarations

Competing interests

Electronic supplementary material

Supplementary Figures

Supplementary Tables

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Characterization of the humoral immune response to the EBV proteome in extranodal NK/T-cell lymphoma

Comments

Search

Quick links

Resting CD4⁺ and CD8⁺ T cells show markedly different gene expression profiles

CD4⁺ and CD8⁺ T cells show similar changes in gene expression post activation