In summary, these findings indicate that (a) almost all serum FSH measurements after long-term storage at -25°C were lower than the original values, with an average degradation of 25%, and (b) samples stored at -25°C for a shorter time (less than 11 months) showed much less decline in immunoreactivity than samples stored for an average of two years. In addition, almost all serum FSH measurements from laboratories using the same assay manufacturer (Labs A and C) were similar when frozen storage time was less than 11 months. Taken together, these results imply that frozen storage time has a large impact on the degradation of serum FSH samples, whereas the specific immunoassay appears to be a less important factor in explaining the variation in FSH values. This conclusion is supported by the ANOVA results.
Inter-laboratory variation increased with longer frozen storage at -25°C, irrespective of which laboratory or immunoassay was used. The similarity of FSH measurements after frozen storage of 4-9 weeks and of 10.5 months suggests that serum FSH samples are stable for at least ten months. Although the Short Term Batch and the Long Term Batch consist of different serum samples, these data strongly suggest that serum FSH samples degrade to unacceptable levels after 2-3 years of frozen storage at -25°C. The degree of degradation after long-term frozen storage (-25% between Labs A and B, -28% between Labs A and C) is unacceptable for research purposes, with the possible exception of studies of postmenopausal women, in which any value > 40 mIU/mL might be sufficient. Agreement after short-term storage in the Short Term Batch is markedly better (less than 5% variation between Labs A and C, which used the same assay manufacturer).
The results indicate an interesting difference between assay manufacturers in how degradation varies with FSH level. These data imply that long-term degradation increases at higher FSH levels more with the Abbott Labs machine (Lab B) than with the Immulite machine (Labs A and C). This is an area for future research, as the Short Term Batch had no samples over 15 mIU/mL and neither batch had any samples greater than 25 mIU/mL.
A strength of this study is the use of three different laboratories for the assays and two distinct batches of samples stored for different lengths of time. The use of multiple laboratories served to model collaborative research. Furthermore, comparing two laboratories within the same institution helped show that discrepancies can occur even within a single institution. Having two batches of samples frozen for different lengths of time allowed us to investigate the impact of storage time without the repeated freeze-thaw cycles that measuring the same samples at multiple time points would have required (unless the original samples had been aliquoted into multiple subsamples); such cycles could confound interpretation of the measurements.
Limitations of this study include a small sample size and a limited population. There were only 30 samples in total, which limits the precision of our conclusions. However, this study can serve as a pilot for identifying variables for future research on this topic. The samples came only from women attending a fertility clinic, some of whom will have higher FSH values than average for women of reproductive age. Degradation may differ between samples with average and elevated FSH values; the trend test results suggest this possibility. The range of FSH measurements was narrower in the Short Term Batch than in the Long Term Batch: Lab A FSH values ranged from 2.2 to 14.5 mIU/mL in the Short Term Batch versus 5.9 to 22.6 mIU/mL in the Long Term Batch. Thus, the two batches do not equally represent premenopausal women with elevated FSH levels.
In a future study, it may also be beneficial to average multiple measurements for each sample. In this study, each sample was measured only once per assay, which may have introduced some inaccuracy because each assay has inherent variability. In addition, this study had a complex design in which both frozen storage time and assay manufacturer varied. Further studies could examine these variables independently.
This study raises important considerations regarding the use of samples from different sites and the effect of frozen storage duration. Areas for future research include freezing samples for various lengths of time to determine how degradation progresses in samples stored for longer than 10 months. Other storage temperatures could also be investigated: all samples in this study were stored at -25°C, but other temperatures may maintain sample integrity longer. Urine FSH samples are ideally stored at 4°C, and serum and urine FSH measurements are highly correlated, so the storage temperature for serum FSH samples should be explored further to determine ideal storage conditions.
In addition, variability may be related to differences in individual laboratory procedures or in the technique of laboratory staff. The laboratories in this study used different immunoassay procedures. Therefore, as has been suggested in previous research on FSH reliability, it may be useful to standardize immunoassays within a research protocol.
FSH measurements have important clinical implications for medical management as well as for study entry criteria. Because multiple clinical sites increasingly collaborate on research studies to increase sample sizes, it is now essential to ensure acceptable reliability of FSH measurements across research laboratories. This study demonstrates that frozen storage time primarily, and assay manufacturer secondarily, may reduce the reliability of measurements of human serum FSH samples.