A cross-study gene set enrichment analysis identifies critical pathways in endometriosis

Background Endometriosis is an enigmatic disease. Gene expression profiling of endometriosis has been used in several studies, but few studies have gone further to classify subtypes of endometriosis based on expression patterns or to identify the pathways involved. The pathways observed so far are inconsistent between studies, and these candidate pathways presumably represent only a fraction of the pathways involved in endometriosis. Methods We applied standardised microarray preprocessing and gene set enrichment analysis (GSEA) to six independent studies and demonstrated increased concordance between these gene expression data sets. Results We found 16 up-regulated and 19 down-regulated pathways common to the ovarian endometriosis data sets, and 22 up-regulated and one down-regulated pathway common to the peritoneal endometriosis data sets. Among them, 12 up-regulated pathways and one down-regulated pathway were consistent between ovarian and peritoneal endometriosis. The main canonical pathways identified are related to immunological and inflammatory disease. Of the three uterine cycle phases, the early secretory phase had the most over-represented pathways. There were no overlapping significant pathways between the data set from human endometrial endothelial cells and the data sets from ovarian endometriosis, which used whole tissue. Conclusion The study of complex diseases through pathway analysis can highlight genes weakly connected to the phenotype, which may be difficult to detect using classical univariate statistics. By standardised microarray preprocessing and GSEA, we have increased the concordance in identifying many biological mechanisms involved in endometriosis. The identified gene pathways will shed light on the understanding of endometriosis and promote the development of novel therapies.


Background
Endometriosis is defined as the presence of endometrium-like tissue in sites outside the uterine cavity and occurs in 6-10% of women in the general population [1]. The main clinical features are chronic pelvic pain, pain during intercourse, and infertility [2]. As the cellular and molecular mechanisms involved in endometriosis are still being uncovered, the classification of this disease has evolved from a local disorder to a complex, chronic systemic disease [3]. Despite extensive research, the etiology of endometriosis remains obscure. Gene expression profiling has been used in several studies of endometriosis, in which from a few to hundreds of differentially expressed genes were identified [4][5][6][7][8][9][10][11][12][13][14][15][16][17]. The roles of previously identified genes in the pathogenesis of endometriosis have been discussed further in those studies. However, it is hard to interpret individual genes on a list with many significant genes.
The challenge in the analysis of genome-wide expression no longer lies in obtaining gene expression profiles, but rather in interpreting the results to gain insights into biological mechanisms [18]. Pathway analysis of microarray data evaluates the gene expression profiles of a priori defined biological pathways in association with a phenotype of interest. Recently, gene expression patterns have also been used to classify subtypes of endometriosis and to identify the pathways involved in endometriosis [4,13-16]. So far the observed pathways have been discordant between studies, which suggests that these previously identified pathways represent only a fraction of the pathways involved in endometriosis.
Gene Set Enrichment Analysis (GSEA), currently the best-known and most widely used approach to gene set analysis, was introduced by Mootha et al. [19] to identify pre-defined gene sets that exhibit significant differences in expression between samples from healthy controls and patients. The methodology was subsequently refined by Subramanian et al. [18]. The algorithm calculates the statistical significance of expression changes across groups of genes or pathways rather than individual genes, thus allowing identification of the groups or pathways most strongly affected by the observed expression changes. Basing the analysis on a group of relevant genes instead of on an individual gene increases the likelihood that investigators identify the critical functional processes underlying the biological phenomenon under study. GSEA is likely to be more powerful than conventional single-gene methods in the study of complex diseases in which many genes make subtle contributions [20].
In a single data set, GSEA will generally not yield significant findings beyond the major pathways. Here we use standardised microarray preprocessing and GSEA with comprehensive expression profiles in an attempt to find greater data convergence and provide systematic insight into the pathways altered during endometriosis pathogenesis.

Datasets
We searched GEO (http://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) for gene expression profiling studies related to endometriosis. Data were included in our re-analysis if they met the following conditions: 1) the data are genome-wide; 2) the comparison was conducted between endometriosis patients and controls; 3) complete raw or normalized microarray data are available.
In total, six public gene expression data sets, each of which assessed endometriosis transcripts on a genome-wide basis, were included in our study. In data set GSE7307, a total of 677 samples from more than 90 distinct tissue types were processed, but only the profiles related to endometriosis and eutopic endometrium were considered here. The data generated from human endometrial endothelial cells by Sha et al. [4] were also included in our combined re-analysis for comparison with the whole-tissue endometriosis data sets. Information about these data sets, such as the microarray platform, sample type, and sample size, is listed in Table 1.

Table 1 notes. Paired: comparison of eutopic endometrium with ectopic endometrium from the same patients, using entire endometrial tissue. Unpaired: comparison of eutopic endometrium from women with endometriosis with eutopic endometrium from women without endometriosis. HEECS: human endometrial endothelial cell samples.

Data Preprocessing
Data preprocessing was performed using software packages from version 2.4.0 of Bioconductor [21] and R version 2.9.0 [22]. Each Affymetrix data set was background adjusted and normalized, and log2 probe-set intensities were calculated, using the Robust Multichip Averaging (RMA) algorithm in the affy package [23,24]. For the CodeLink arrays in GSE5108, the normalization already applied by the original study was retained. Genes that could not be mapped to any KEGG pathway were excluded from further analysis. The interquartile range (IQR) was used as a measure of variability: from the resulting distribution of IQR values for all genes, we set a cut-off that excluded genes with an IQR under 0.5. When multiple probe sets targeted one gene, the probe set with the largest variability was kept. Pathway analysis was performed separately in each data set.
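The filtering steps described above can be illustrated with a minimal Python sketch. It is not the Bioconductor code used in the study. The function names, the toy probe-set identifiers and gene symbols, and the choice of quartile interpolation (`method="inclusive"`) are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range with linear interpolation between order statistics."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_probe_sets(expression, probe_to_gene, kegg_genes, iqr_cutoff=0.5):
    """Return {gene: probe_id}, keeping one probe set per gene.

    expression    : {probe_id: [log2 intensity per sample]} (hypothetical layout)
    probe_to_gene : {probe_id: gene symbol}
    kegg_genes    : set of genes that map to at least one KEGG pathway
    """
    best = {}  # gene -> (probe_id, IQR of that probe set)
    for probe_id, values in expression.items():
        gene = probe_to_gene.get(probe_id)
        if gene is None or gene not in kegg_genes:
            continue  # gene cannot be mapped to a KEGG pathway
        spread = iqr(values)
        if spread < iqr_cutoff:
            continue  # variability below the cut-off
        if gene not in best or spread > best[gene][1]:
            best[gene] = (probe_id, spread)  # keep the most variable probe set
    return {gene: probe_id for gene, (probe_id, _) in best.items()}


# Toy input, invented for illustration only.
expression = {
    "p1": [5.0, 5.1, 5.2, 5.3],   # low variability, removed by the IQR cut-off
    "p2": [4.0, 6.0, 8.0, 10.0],  # IQR 3.0
    "p3": [4.0, 5.0, 6.0, 7.0],   # IQR 1.5, same gene as p2
    "p4": [1.0, 3.0, 5.0, 7.0],   # gene not in any KEGG pathway
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_B", "p3": "GENE_B", "p4": "GENE_C"}
kegg_genes = {"GENE_A", "GENE_B"}

kept = filter_probe_sets(expression, probe_to_gene, kegg_genes)
print(kept)
```

In this toy input only one probe set survives: GENE_A fails the variability cut-off, GENE_C has no KEGG mapping, and of the two probe sets for GENE_B the more variable one is retained.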

Gene set enrichment analysis of pathways
GSEA was performed using the implementation in the Category package (version 2.10.1 [25]). The goal of GSEA is to determine whether the members of a gene set S are randomly distributed throughout the entire reference gene list L or are primarily found at the top or bottom. One of the advantages of GSEA is its relative robustness to noise and outliers in the data. Gene sets represented by fewer than 10 genes were excluded. For each pathway, the mean of the t-statistics of its genes was computed. Using a permutation test with 1000 permutations, significantly changed pathways were identified at p-value ≤ 0.05.
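The procedure described above (mean gene-level t-statistic per pathway, 1000 permutations, minimum set size of 10) can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the Category package's code. It assumes an equal-variance two-sample t-statistic, permutation of the case/control labels, and a p-value computed as the plain fraction of permutations at least as extreme as the observed score. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean, variance


def t_statistic(cases, controls):
    """Two-sample t-statistic with pooled variance (equal-variance assumption)."""
    n1, n2 = len(cases), len(controls)
    pooled = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / (n1 + n2 - 2)
    return (mean(cases) - mean(controls)) / (pooled * (1 / n1 + 1 / n2)) ** 0.5


def pathway_score(expression, labels, genes):
    """Mean of the gene-level t-statistics over the genes of one pathway."""
    scores = []
    for gene in genes:
        values = expression[gene]
        cases = [v for v, is_case in zip(values, labels) if is_case]
        controls = [v for v, is_case in zip(values, labels) if not is_case]
        scores.append(t_statistic(cases, controls))
    return mean(scores)


def gsea_permutation_test(expression, labels, genes, n_perm=1000, min_size=10, seed=0):
    """Return (observed score, p for up-regulation, p for down-regulation).

    Returns None when the gene set has fewer than min_size genes.
    """
    if len(genes) < min_size:
        return None
    rng = random.Random(seed)
    observed = pathway_score(expression, labels, genes)
    shuffled = list(labels)
    n_ge = n_le = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the case/control labels
        score = pathway_score(expression, shuffled, genes)
        if score >= observed:
            n_ge += 1
        if score <= observed:
            n_le += 1
    return observed, n_ge / n_perm, n_le / n_perm


# Toy data, invented for illustration: 10 genes, 4 cases followed by 4 controls,
# with every gene shifted upwards in the cases.
labels = [True] * 4 + [False] * 4
pathway_genes = [f"gene_{g}" for g in range(10)]
expression = {
    f"gene_{g}": [7.0 + 0.1 * j + 0.013 * g for j in range(4)]
    + [5.0 + 0.1 * j + 0.01 * g for j in range(4)]
    for g in range(10)
}

observed, p_up, p_down = gsea_permutation_test(expression, labels, pathway_genes)
print(observed, p_up, p_down)
```

Under this sketch a pathway would be called up-regulated when `p_up` ≤ 0.05 and down-regulated when `p_down` ≤ 0.05. The toy pathway above is strongly up-regulated by construction.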

Results
For the studies that examined multiple locations (ovarian and peritoneal) or uterine cycle phases (proliferative to secretory) of endometriosis, each type or phase was treated as a separate data set. The six studies thus provided 9 case-control data sets, including 74 endometriosis cases and 74 controls. The same GSEA method was applied to all 9 data sets. For each individual analysis, we obtained the significant pathways and the genes included [See Additional files 1, 2, 3 and 4]. The results of the analysis are summarised in Table 2. We postulated that the pathways and genes that appear consistently as differentially expressed in multiple studies are more likely to be important in endometriosis. To look for such convergence, we compared the GSEA results.
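The comparison of GSEA results across data sets amounts to intersecting the lists of significant pathways, separately for each direction of change. A minimal Python sketch is given here. The data set labels and pathway names are placeholders, not results from this study, and the function name is hypothetical.

```python
def common_pathways(results, direction):
    """Pathways significant in the given direction in every data set.

    results   : {data set name: {"up": [...], "down": [...]}} (hypothetical layout)
    direction : "up" or "down"
    """
    sets = [set(per_dataset[direction]) for per_dataset in results.values()]
    return set.intersection(*sets) if sets else set()


# Placeholder input, invented for illustration only.
toy_results = {
    "dataset_1": {"up": ["pathway_A", "pathway_B"], "down": ["pathway_X"]},
    "dataset_2": {"up": ["pathway_A", "pathway_C"], "down": ["pathway_X", "pathway_Y"]},
    "dataset_3": {"up": ["pathway_A"], "down": ["pathway_Y"]},
}

print(common_pathways(toy_results, "up"))
print(common_pathways(toy_results, "down"))
```

A pathway is reported as common only if it is significant in the same direction in all of the compared data sets, which is a deliberately strict criterion.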

Common significant pathways in ovarian endometriosis
Endometriosis is most commonly localized in the ovaries.
We first compared the ovarian endometriosis data sets for their convergence and reproducibility. The results for the ovarian endometriosis data sets are presented in Table 3.

Common significant pathways in peritoneal endometriosis
We also analyzed the data sets from peritoneal endometriosis, which is another important type of endometriosis. In total, 23 pathways (22 up-regulated and one down-regulated) were common to the peritoneal endometriosis data sets (Figure 2). Of these 23 pathways, 12 up-regulated pathways and one down-regulated pathway were also identified in the three ovarian endometriosis data sets. Four pathways (Drug metabolism - cytochrome P450, Metabolism of xenobiotics by cytochrome P450, Olfactory transduction, Toll-like receptor signaling pathway) were significant in the data of Eyster et al. [15], but not in the data of Hull et al. [17]. The genes collectively involved in each pathway were extracted [See Additional file 6].

Differential pathways between the timing of the uterine cycle
Burney et al. conducted a global gene expression analysis of endometrium from women with and without moderate/severe stage endometriosis and compared the gene expression signatures across various phases of the menstrual cycle [14]. Specimens were classified as proliferative (PE, d 8-14), early secretory (ESE, d 15-18), or midsecretory (MSE, d 19-23). They did not examine specimens from later than Day 23 of the menstrual cycle. Each phase included endometriosis of more than one type. We re-analyzed their microarray CEL files, treating each phase as a separate data set. In the proliferative phase, 4 pathways were up-regulated and 3 pathways were down-regulated. The early secretory phase had the most over-represented pathways: 12 were up-regulated and 21 were down-regulated. No pathway was significantly up-regulated at p ≤ 0.05 in the midsecretory phase, while 22 down-regulated pathways were identified [See Additional file 3]. The overlap of pathways among these phases is very low (Figure 3).

Differential expression in human endometrial endothelial cells
Sha et al. selected the eutopic endothelial compartment as the subject for exploring the differential expression profile between endometriosis patients and normal controls [4]. The significant pathways from this data set showed no overlap with the common list from the ovarian endometriosis data sets, which used whole tissue as the sample.

Discussion
Endometriosis is an enigmatic disease. No existing single theory can explain all cases of endometriosis. Genome-wide microarrays are very powerful because they allow comprehensive identification of gene families or pathways that change in concert in a disease state. Biologically relevant inference should therefore be reproducible across laboratories. For single-gene analysis, different statistical methods and different data sets examining the same biological condition may lead to significant discrepancies [26]. Pathway analysis applied to different data sets yields common results, diminishing the large discrepancies observed in direct comparisons of lists of differentially expressed genes obtained from different data sets. Therefore, the study of complex diseases through pathway analysis can highlight genes weakly connected to the phenotype, which may be difficult to detect using classical univariate statistics.
We have performed gene set enrichment analysis of six independent, publicly available gene expression data sets to understand in depth the common biological mechanisms involved in endometriosis. Our study compared gene expression between lesion locations (ovarian vs. peritoneal), phases of the uterine cycle (proliferative to midsecretory) and cell types (endometrial endothelial cells vs. whole tissue), as well as overall eutopic versus ectopic endometrium. The transcriptomes of eutopic endometrium and ectopic endometrial lesions have been taken to suggest that ovarian endometriosis and peritoneal disease are different disorders [13]. Our findings suggest that most of the pathways affected in ovarian and peritoneal endometriosis are consistent. Many of the differentially expressed pathways found in this study have already been reported to be involved in endometriosis pathogenesis.
Here, this discussion presents several of the differentially expressed pathways and hypotheses regarding the role of these pathways in endometriosis.

[Figure: Significant pathways identified and overlap between the 3 ovarian endometriosis data sets]

The most significant of the common up-regulated pathways are involved in the immune system and immune disorders. It has been widely documented that endometriosis, as an inflammatory disease, induces an immune response, leading to both cellular and humoral immune changes [27,28]. The association between endometriosis and immune disorders is supported by the literature [29]. However, some studies concluded that women with endometriosis do not have a higher risk of asthma, systemic lupus erythematosus or Sjögren's syndrome than other subjects [30,31]. Our GSEA results showed that the Asthma, Graft-versus-host disease, Autoimmune thyroid disease, Allograft rejection, Systemic lupus erythematosus and Type I diabetes mellitus pathways are significantly differentially expressed between endometriosis and eutopic endometrium. We found that human leukocyte antigen (HLA) genes are critical genes in these pathways. HLA genes are key components of the major histocompatibility complex (MHC), which is involved in immune cell signalling processes such as T-cell activation. People with certain HLA antigens are more likely to develop certain autoimmune diseases, such as Type I diabetes, ankylosing spondylitis, celiac disease, systemic lupus erythematosus, myasthenia gravis and Sjögren's syndrome [32].
The Cytokine-cytokine receptor interaction and Cell adhesion molecules (CAMs) pathways included in the GSEA were up-regulated in endometriosis. Cell adhesion molecules are (glyco)proteins expressed on the cell surface that play a critical role in a wide array of biological processes, including hemostasis, the immune response, inflammation, embryogenesis, and development of neuronal tissue. Clinical observations and in vitro experiments imply that endometriotic cells are invasive and able to metastasize. By analogy with tumour metastasis, it is likely that cell adhesion molecules are central to the invasion and metastasis of endometriotic cells. The expression of some integrins is aberrant in endometriotic lesions compared to eutopic endometrium [33]. Cytokines are key mediators of intercellular communication within the immune system. Several cytokines, including interleukin (IL)-1, IL-6, IL-8 and IL-10, tumor necrosis factor (TNF)-α, and vascular endothelial growth factor (VEGF), were reported to be increased in the peritoneal fluid (PF) of women with endometriosis [34][35][36][37][38][39][40][41]. The peroxisome proliferator-activated receptor (PPAR) signaling pathway is up-regulated according to GSEA. PPARs are nuclear hormone receptors that are activated by fatty acids and their derivatives. PPAR-γ is present in human ovarian cells. Activation of PPAR-γ enhances steroidogenesis via activation of StAR protein and leads to the activation of insulin-signaling pathways [42].
The expression patterns of ER (estrogen receptors) and PR (progesterone receptors) in endometriotic lesions are different from those in the eutopic endometrium. Endometriosis is an estrogen-dependent disease [43]. Studies using hormone-ligand binding assays and enzyme immunoassays showed a consistent reduction in the content of ER and PR in endometriotic implants [44][45][46][47]. The Androgen and estrogen metabolism pathway appeared in most of our re-analysis results. The Oxidative phosphorylation pathway, which possibly affects oocyte quality, fertilization rate, and further embryo development [48], is down-regulated in our analysis.

[Figure: Significant pathways identified and overlap between the 2 peritoneal endometriosis data sets]

Burney et al. studied proliferative, early secretory and midsecretory eutopic endometrium (up to Day 23) from women with endometriosis and controls. They found that endometrial gene expression differed most between these groups in the early secretory phase (Days 15-18). They found far fewer differences in the mid-secretory phase, where no transcripts were found to be up- or down-regulated 4-fold. This result is consistent with the findings of other studies [12,49,50]. In line with their result, our GSEA found no significantly up-regulated pathway in the mid-secretory phase. The molecular phenotype of midsecretory eutopic endometrium from women with endometriosis and from controls appears to be very similar [50]. Current efforts to develop minimally invasive diagnostic tests for the presence of endometriosis, and tests to distinguish minimal/mild from moderate/severe disease, by sampling the endometrium should be focused on the early secretory phase of the menstrual cycle [50].
We hypothesize that all cell types in the endometriotic lesion contribute to the pathology of the disease. Matsuzaki et al. compared global gene expression in eutopic endometrium from controls and patients with deep endometriosis at various time points throughout the menstrual cycle. They found that no genes were up- or down-regulated in all phases of the cycle in either tissue compartment [12]. None of the genes from their study that had been identified as differentially expressed in either the stromal or epithelial compartment was shown to be differentially expressed in whole-tissue studies. This may be due to the relative contribution that the epithelial and stromal transcriptomes make to whole-tissue gene expression. Our GSEA result showed that the significant pathways from the human endometrial endothelial cells had low overlap with the list from the ovarian endometriosis data sets, which used whole tissue.

Conclusion
The pathogenesis of endometriosis is likely multifactorial. A deeper understanding of the mechanisms of this disease can be reached by focusing on the deregulation of gene sets or pathways rather than on individual genes. By standardised microarray preprocessing and GSEA, we have increased the concordance in identifying many biological mechanisms involved in endometriosis, including mechanisms that are novel in terms of their connection to endometriosis (as mined from the existing literature). More studies of the specific roles and interactions of the genes included in the related pathways are needed to improve the understanding of endometriosis.

Additional material
Additional file 1
Additional file 2

[Figure: Significant pathways identified and overlap between the phases of uterine cycle]