Study population
One hundred and ten patients, aged 18–39 years, who were eligible for treatment by intrauterine insemination (IUI) between June 1997 and December 1999, participated in the study. This study is part of a prospective randomized study on the determination of ovarian reserve in regularly menstruating patients [13]. Their infertility was idiopathic for > 3 years and/or due to a male factor and/or cervical hostility. Cervical hostility was diagnosed by means of a well-timed negative postcoital test, that is, no progressively motile spermatozoa seen at a magnification of 400× in good cervical mucus despite normal semen parameters.
Patients had to have regular, ovulatory menstrual cycles, confirmed by a biphasic body temperature chart and endometrial biopsy dating in the luteal phase. They also had to have two ovaries and either two patent tubes on hysterosalpingography or at least one patent Fallopian tube with no further pathology on diagnostic laparoscopy. None had previously undergone IVF treatment. Excluded were patients with oligo- or amenorrhoea (9 or fewer cycles a year) or a severe male factor, defined as (1) fewer than 1 million motile spermatozoa after Percoll centrifugation (gradient 40/90) and/or (2) > 20% antibodies present on the spermatozoa after processing with Percoll centrifugation (gradient 40/90) and/or (3) > 50% of the spermatozoa without an acrosome. Other exclusion criteria were untreated or insufficiently corrected endocrinopathies, clinically relevant systemic diseases, or a body mass index > 28 kg/m².
The protocol was approved by the Institutional Review Board and the Committee on Ethics of Research Involving Human Subjects of the Vrije Universiteit Medical Centre, Amsterdam, the Netherlands. All couples participating in the study gave signed informed consent.
Treatment protocol
Patients were randomized into two groups by a computer-designed system using blocks of four [13]. Fifty-six patients underwent transvaginal sonography, to measure the basal ovarian volume and count the basal antral follicles, and a clomiphene citrate challenge test (CCCT); 54 patients underwent the same transvaginal sonography and an exogenous follicle stimulating hormone ovarian reserve test (EFORT). In all patients, the test was followed by an IVF treatment under a long protocol. The bFSH, bE2 and bInhibin B levels were determined as an integral part of all CCCTs and EFORTs, as described previously [13]. Van der Meer et al. [15] showed that in eumenorrheic patients, the median (range) FSH threshold level for monofollicular growth was 5.3 (4.3–8.2) IU/l and the median (range) threshold dose was 75 IU (0.5–1.75) FSH/day.
The FSH threshold was determined by a low-dose step-up regimen of FSH given intravenously after pituitary desensitization with a GnRH agonist. It was concluded that an increment of half an ampoule of FSH (37.5 IU) above the threshold dose for monofollicular growth already produces the maximum response. In IVF stimulation, the maximal effect seems to be reached with FSH dosages up to 225 IU [16–18]. Combining these findings, an initial stimulation with 3 ampoules of 75 IU of FSH under a long (GnRH agonist suppressed) protocol probably gives a maximal IVF stimulation, the outcome of which could be used as the gold standard for the cohort size.
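The allocation step at the start of this section can be illustrated with a permuted-block scheme. The following is a minimal sketch, assuming blocks of four with two patients per arm in each block. The function name, the fixed seed and the truncation of the final incomplete block are illustrative assumptions, not the procedure documented in [13].

```python
import random


def permuted_block_allocation(n_patients, block_size=4, arms=("CCCT", "EFORT"), seed=0):
    """Allocate patients to arms in shuffled blocks (hypothetical sketch).

    Every complete block contains each arm equally often, so the arms
    can differ by at most block_size / 2 patients at any point.
    """
    rng = random.Random(seed)          # fixed seed only for reproducibility here
    per_arm = block_size // len(arms)  # 2 patients per arm in a block of 4
    allocation = []
    while len(allocation) < n_patients:
        block = list(arms) * per_arm
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_patients]     # the last block may be incomplete


allocation = permuted_block_allocation(110)
print(allocation.count("CCCT"), allocation.count("EFORT"))
```

With 110 patients, the 27 complete blocks contribute 54 patients to each arm, and the remaining two patients come from an incomplete block, so the group sizes can differ by at most two.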
Transvaginal sonography measurements
All ultrasound examinations were performed by one of the authors (J.K., R.S.) using an Aloka SSD-1700 ultrasound apparatus (5.0 MHz probe).
The volume of each ovary was calculated by measuring its diameter in three perpendicular directions (D1, D2, D3) and applying the formula for an ellipsoid: D1 × D2 × D3 × π/6. The volumes of both ovaries were added to give the total basal ovarian volume (BOV).
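As a worked illustration of the ellipsoid formula, the sketch below computes the volume of each ovary and sums them into the BOV. The units (diameters in mm, volume in ml) and the example diameters are assumptions made for illustration; the text does not state them.

```python
import math


def ellipsoid_volume_ml(d1_mm, d2_mm, d3_mm):
    """Ellipsoid volume D1 x D2 x D3 x pi/6, converted from mm^3 to ml."""
    volume_mm3 = d1_mm * d2_mm * d3_mm * math.pi / 6
    return volume_mm3 / 1000.0  # 1 ml = 1000 mm^3


def basal_ovarian_volume_ml(left_diameters_mm, right_diameters_mm):
    """Total basal ovarian volume (BOV): sum of the two ovarian volumes."""
    return ellipsoid_volume_ml(*left_diameters_mm) + ellipsoid_volume_ml(*right_diameters_mm)


# Hypothetical diameters, not patient data.
left = (30.0, 20.0, 15.0)
right = (28.0, 18.0, 16.0)
bov = basal_ovarian_volume_ml(left, right)
print(round(ellipsoid_volume_ml(*left), 2), round(bov, 2))
```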
To determine the diameter of a follicle, the mean of measurements in two perpendicular directions was taken. The numbers of follicles in both ovaries were added to give the total antral follicle count (AFC). The follicles visualized and counted by transvaginal sonography (TVS) in the early follicular phase were 2–10 mm in size.
Clomiphene citrate challenge test (CCCT)
Starting on the fifth day of the menstrual cycle (CD 1 = day of onset of menses), 100 mg of clomiphene citrate (Serophene®; Serono, Geneva, Switzerland) was administered for 5 days. Serum FSH was determined on CD 2 or 3 (basal value, bFSH) and on CD 10 (stimulated value, sFSH). The CCCT [13] was analysed using the parameter bFSH + sFSH.
Exogenous Follicle stimulating hormone Ovarian Reserve Test (EFORT)
On CD 3, 300 IU recFSH (Gonal-F®; Serono, Geneva, Switzerland) was administered subcutaneously (s.c.). Blood samples for the determination of FSH, E2 and Inhibin B were drawn just before (basal values) and 24 h after (stimulated values) the administration of FSH. Analysis of the EFORT [13] included the following parameters: the E2 increment and the Inhibin B increment 24 h after administration of FSH.
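The summary parameters of the two dynamic tests are simple arithmetic on the basal and stimulated values. The sketch below assumes that an increment means the stimulated value minus the basal value; the function names and example concentrations are hypothetical.

```python
def ccct_parameter(basal_fsh, stimulated_fsh):
    """CCCT parameter: bFSH (CD 2 or 3) + sFSH (CD 10), in IU/l."""
    return basal_fsh + stimulated_fsh


def efort_increments(basal, stimulated):
    """EFORT parameters: rise in E2 and Inhibin B 24 h after FSH.

    Assumes increment = stimulated value - basal value.
    """
    return {analyte: stimulated[analyte] - basal[analyte]
            for analyte in ("E2", "inhibin_B")}


# Hypothetical values, not patient data.
print(ccct_parameter(6.5, 8.0))
print(efort_increments({"E2": 100.0, "inhibin_B": 80.0},
                       {"E2": 250.0, "inhibin_B": 200.0}))
```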
IVF-treatment
The ovarian hyperstimulation protocol followed a long GnRH-agonist protocol starting in the midluteal phase. On CD 3 of the first cycle, the ovarian volume and antral follicle count were measured by transvaginal sonography (TVS) as described above. Also on CD 3, the CCCT or the EFORT was performed as described above. In the subsequent midluteal phase, seven days after ovulation, daily s.c. injections of triptorelin acetate (Decapeptyl®, 0.1 mg/day; Ferring, Hoofddorp, the Netherlands) were started. Because of the administration of the GnRH agonist, patients were advised to use a barrier method of contraception during this cycle. On CD 3 of the next cycle, ovarian hyperstimulation was started with daily s.c. injections of a fixed dose of 225 IU uFSH (Metrodin HP®, 75 IU/amp; Serono, Geneva, Switzerland), because this dosage probably gives a maximal effect in follicle stimulation. Standard procedures were followed, including TVS (Aloka SSD-1700, 5.0 MHz probe) on CD 2 or 3 and on CD 9 or 10. Daily TVS was performed from the moment the leading follicle reached a diameter of 16 mm. Ovarian hyperstimulation was continued until the largest follicle reached a diameter of at least 18 mm. The maximum duration of uFSH administration allowed was 16 days. When these criteria were met, Metrodin HP® and Decapeptyl® were discontinued and 10,000 IU of hCG (Profasi®, 10,000 IU/amp; Serono, Geneva, Switzerland) was administered. On the day of hCG administration, E2 was determined and TVS was performed to count the result of ovarian hyperstimulation (all follicles ≥ 10 mm), expressed as the total number of follicles. TVS-guided follicular aspiration (FA) was performed 36 hours after hCG administration under systemic analgesia (7.5 mg diazepam orally and 50–100 mg pethidine hydrochloride intramuscularly), and all follicles present were aspirated.
Serum assay
Serum E2 was determined by a competitive immunoassay (Amerlite, Amersham, UK). For E2, the inter-assay coefficient of variation (CV) was 11% at 250 pmol/l and 8% at 8000 pmol/l; the intra-assay CV was 10% at 350 pmol/l, 8% at 1100 pmol/l and 8% at 5000 pmol/l. The lower limit of detection for E2 was 90 pmol/l. In the EFORT and CCCT, E2 was measured by a sensitive radioimmunoassay (Sorin Biomedica, Saluggia, Italy). This measurement of E2 is abbreviated as EE. For EE, the inter-assay CV was 11% at 60 pmol/l, 8% at 200 pmol/l, 11% at 550 pmol/l and 8% at 900 pmol/l. The intra-assay CV was 4% at 110 pmol/l and 5% at 1000 pmol/l. The lower limit of detection for EE was 18 pmol/l. FSH was determined by a commercially available immunometric assay (Amerlite, Amersham, UK). For FSH, the inter-assay CV was 9% at 3 IU/l and 5% at 35 IU/l; the intra-assay CV was 9% at 5 IU/l, 8% at 15 IU/l and 6% at 40 IU/l. The lower limit of detection for FSH was 0.5 IU/l. Inhibin B was determined immunometrically by a commercially available assay (Serotec Limited, Oxford, UK). For Inhibin B, the inter-assay CV was 17% at 25 ng/l, 14% at 55 ng/l and 9% at 120 ng/l, and the intra-assay CV was 8% up to 40 ng/l and 5% at > 40 ng/l. The lower limit of detection for Inhibin B was 13 ng/l.
Halfway through the study (after 62 patients), the Amerlite assay used to assess FSH was withdrawn from the market and replaced by another commercially available assay (Delfia, Wallac, Finland). The two assays were compared and showed excellent linear correlation, although the values shifted (Delfia FSH = 1.28 × Amerlite FSH + 0.01; r = 0.9964). For the Delfia FSH assay, the inter-assay CV was 5% at 3.5 IU/l and 3% at 15 IU/l, and the lower limit of detection was 0.5 IU/l. All FSH determinations were recalculated and are expressed according to the Delfia assay.
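The recalculation of Amerlite results onto the Delfia scale follows directly from the reported regression line. A minimal sketch, in which the function name and the example value are illustrative:

```python
def amerlite_to_delfia(fsh_amerlite_iu_l):
    """Convert an Amerlite FSH value (IU/l) to the Delfia scale.

    Uses the reported relation: Delfia FSH = 1.28 x Amerlite FSH + 0.01.
    """
    return 1.28 * fsh_amerlite_iu_l + 0.01


# Hypothetical Amerlite reading of 5.0 IU/l.
print(round(amerlite_to_delfia(5.0), 2))
```

Because the slope is greater than one, a fixed FSH cut-off derived on one assay cannot be applied unchanged to values from the other.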
Values below the detection limit of an assay were assigned a value equal to the detection limit of that assay.
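This substitution rule can be sketched as follows, using the detection limits reported above. It assumes that below-limit results are available as numbers; the dictionary keys and function name are hypothetical.

```python
# Lower limits of detection as reported for each assay.
DETECTION_LIMITS = {
    "E2": 90.0,         # pmol/l, competitive immunoassay
    "EE": 18.0,         # pmol/l, sensitive radioimmunoassay for E2
    "FSH": 0.5,         # IU/l
    "inhibin_B": 13.0,  # ng/l
}


def apply_detection_limit(analyte, value):
    """Replace a value below the assay's detection limit by the limit itself."""
    return max(value, DETECTION_LIMITS[analyte])


print(apply_detection_limit("inhibin_B", 9.0))   # below the limit
print(apply_detection_limit("inhibin_B", 45.0))  # unchanged
```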
Statistical analysis
The outcome measure of the first part of this study was the result of ovarian hyperstimulation expressed as the number of follicles. In our former study [13], we estimated by univariate linear regression the value of the following independent variables: age, bFSH, CCCT results, E2 increment in EFORT and inhibin B increment in EFORT. In this study, we estimated by univariate linear regression the value of the total basal ovarian volume and the total basal antral follicle count in predicting the ovarian response. Stepwise regression analysis was used to find a prediction model for the ovarian response. The R² of the correlation of the variable(s) with the total number of follicles obtained after stimulation reflects the proportion of the variability in the number of follicles explained by the variable(s).
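To make the meaning of R² concrete, the sketch below fits a univariate least-squares line and reports the proportion of variance explained. It is a plain-Python illustration with made-up numbers; the analyses in this study were run in SPSS, and the function name is hypothetical.

```python
def univariate_linear_regression(x, y):
    """Ordinary least squares for one predictor.

    Returns (slope, intercept, r_squared), where r_squared is
    1 - residual sum of squares / total sum of squares.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / syy


# Hypothetical antral follicle counts and follicle numbers after stimulation.
afc = [4, 8, 10, 14, 20]
follicles = [5, 9, 12, 13, 22]
slope, intercept, r_squared = univariate_linear_regression(afc, follicles)
print(round(r_squared, 3))
```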
The outcome measure of the second part of this study was the result of ovarian hyperstimulation expressed as the number of retrieved oocytes.
We defined a 'poor' ovarian response as fewer than 6 oocytes after ovarian hyperstimulation in an IVF treatment and a 'hyper' response as more than 20 oocytes after such a treatment. Among women undergoing in vitro fertilization, the chances of a live birth are related to the number of eggs fertilized, presumably because of the greater selection of embryos for transfer. The low success rate when only two eggs are fertilized reflects the lack of choice among embryos for transfer [19]. In our laboratory, the overall chance of fertilisation is 50–60%. Taken together, at least 6 oocytes are required to obtain three or more fertilized eggs.
A hyper response was defined as > 20 oocytes because pregnancy rates do not increase when > 20 oocytes are retrieved. Moreover, such cases carry a significant risk of severe ovarian hyperstimulation syndrome (OHSS) [14].
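The response categories and the reasoning behind the lower threshold can be written out as a short sketch. The label 'normal' for the intermediate group and the function names are assumptions made for illustration.

```python
def classify_response(n_oocytes):
    """Classify ovarian response: poor (< 6), hyper (> 20), otherwise normal."""
    if n_oocytes < 6:
        return "poor"
    if n_oocytes > 20:
        return "hyper"
    return "normal"


def expected_fertilized(n_oocytes, fertilisation_rate):
    """Expected number of fertilized eggs at a given fertilisation rate."""
    return n_oocytes * fertilisation_rate


# At the lower bound of the 50-60% fertilisation rate, 6 oocytes
# give an expected 3 fertilized eggs; 5 oocytes give only 2.5.
print(expected_fertilized(6, 0.5), expected_fertilized(5, 0.5))
print([classify_response(n) for n in (5, 6, 20, 21)])
```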
In our former study [14], we examined by univariate logistic regression the value of the following independent variables: age, bFSH, bInhibin B, CCCT results, E2 increment in EFORT and inhibin B increment in EFORT. In this study, we examined by univariate logistic regression the value of the total basal ovarian volume and the total basal antral follicle count in predicting a poor and a hyper response after ovarian hyperstimulation in IVF. Subsequently, multivariate logistic regression analyses were used to develop prediction models for the ovarian response. The area under the receiver operating characteristic curve (ROC-AUC) was computed to assess the predictive accuracy of the logistic models. Evaluation of an ovarian reserve test (ORT) using ovarian response as the reference or outcome variable should include assessment of the predictive accuracy and clinical value of the test. Accuracy refers to the degree to which the outcome condition is predicted correctly. Summary statistics of accuracy include sensitivity (rate of correct identification of cases with poor response) and specificity (rate of correct identification of cases without poor response). To identify all cases that will respond poorly to stimulation without misclassifying many normal responders, the test must have high sensitivity and high specificity.
The receiver operating characteristic (ROC) curve is a plot of the true positive rate against the false positive rate for the different possible cut-points of a diagnostic test. An ROC curve demonstrates the trade-off between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity). The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test. The area under the ROC curve provides information on the overall discriminatory capacity of the test. A value of 1.0 implies perfect discrimination and a value of 0.5 indicates completely absent discrimination.
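The construction of the ROC curve and its area can be sketched in a few lines. The example assumes that a higher score means a higher predicted probability of the outcome (for a marker such as the antral follicle count, where low values suggest a poor response, the sign would be reversed first). The data and function names are hypothetical, and the area is obtained with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) for every cut-point.

    labels: 1 = outcome present, 0 = outcome absent.
    A case is test-positive when its score is >= the cut-point.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities and observed outcomes.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(roc_points(scores, labels)), 3))
```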
To define a 'normal' and an 'abnormal' test result, sensitivity, specificity, positive predictive value and accuracy were used to find the optimal cut-off level.
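The four summary statistics used to compare candidate cut-off levels can be computed from a 2 × 2 classification. The sketch below assumes a marker for which values below the cut-off count as an abnormal test predicting a poor response; the data, the cut-off of 6 and the function name are hypothetical.

```python
def diagnostic_summary(values, has_outcome, cutoff):
    """Sensitivity, specificity, positive predictive value and accuracy.

    A test is 'abnormal' when value < cutoff (assumption for this sketch).
    has_outcome: 1 = poor response observed, 0 = not observed.
    """
    tp = sum(1 for v, y in zip(values, has_outcome) if v < cutoff and y == 1)
    fp = sum(1 for v, y in zip(values, has_outcome) if v < cutoff and y == 0)
    fn = sum(1 for v, y in zip(values, has_outcome) if v >= cutoff and y == 1)
    tn = sum(1 for v, y in zip(values, has_outcome) if v >= cutoff and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # PPV is undefined when no test is abnormal at this cut-off.
        "ppv": tp / (tp + fp) if (tp + fp) else None,
        "accuracy": (tp + tn) / len(values),
    }


# Hypothetical antral follicle counts and observed poor responses.
afc = [3, 4, 5, 8, 10, 12, 15, 20]
poor = [1, 1, 0, 1, 0, 0, 0, 0]
summary = diagnostic_summary(afc, poor, cutoff=6)
print(summary)
```

Repeating the calculation over a range of cut-offs shows how sensitivity is traded against specificity when a threshold is chosen.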
Comparison of means was done with the unpaired t-test. For all tests the significance level was 0.05.
Statistical analysis of the data was performed with SPSS (Statistical Package for the Social Sciences; SPSS, Inc., Chicago, IL) for Windows.