Does conventional morphological evaluation still play a role in predicting blastocyst formation?

Background Advanced models based on time-lapse imaging and artificial intelligence have been used to predict blastocyst formation. However, conventional morphological evaluation of embryos is still widely used. The purpose of the present study was to evaluate the predictive power of conventional morphological evaluation for blastocyst formation. Methods Data from 15,613 patients who underwent blastocyst culture at our institution from January 2013 through December 2020 were reviewed retrospectively. Generalized estimating equations (GEE) were used to establish the morphology-based model. To estimate whether including more features regarding patient characteristics and cycle parameters improves the predictive power, we also established models including 27 additional features with either LASSO regression or XGBoost. The predicted number of blastocysts was correlated with the observed number of blastocysts and was used to predict blastocyst transfer cancellation in either fresh or frozen cycles. Results Based on early cleavage and routinely observed morphological parameters (cell number, fragmentation, and symmetry), the GEE model predicted blastocyst formation with an AUC of 0.779 (95% CI: 0.77–0.787) and an accuracy of 74.7% (95% CI: 73.9%–75.5%) in the validation set. The LASSO regression and XGBoost models based on the combination of cycle characteristics and embryo morphology yielded similar predictive power, with AUCs of 0.78 (95% CI: 0.771–0.789) and 0.754 (95% CI: 0.745–0.763), respectively. For per-cycle blastocyst yield, the predicted number of blastocysts using morphological parameters alone correlated strongly with the observed blastocyst number (r = 0.897, P < 0.0001) and predicted blastocyst transfer cancellation with an AUC of 0.926 (95% CI: 0.911–0.94). Conclusion The data suggest that routine morphological observation remains a feasible tool to support an informed decision regarding the day of transfer. However, models based on the combination of cycle characteristics and embryo morphology do not increase the predictive power significantly. Supplementary Information The online version contains supplementary material available at 10.1186/s12958-022-00945-y.


Introduction
Advances in embryo culture systems have encouraged the move toward blastocyst transfer [1]. Extending embryo culture to the blastocyst stage may have several advantages, including a higher implantation rate than cleavage-stage transfer and the potential to reduce the number of embryos transferred. Theoretically, blastocyst culture may help select the most viable embryo for transfer. However, a low blastocyst formation rate may lead to an increased risk of transfer cancellation [1]. While good-prognosis patients may benefit from blastocyst transfer, patients with unfavorable characteristics, such as poor response or advanced age, may suffer an increased incidence of canceled transfers [2,3]. A canceled transfer may add to the burden of infertile couples, both emotionally and economically. Therefore, predicting the likelihood of blastocyst formation might be the key to meaningful informed consent before blastocyst culture is offered, especially for patients with few embryos available on day 3. To facilitate clinical decision-making before blastocyst culture and transfer, several models have been developed to predict blastocyst transfer cancellation or blastocyst formation for individual patients [4,5]; these demonstrated that the greatest predictive value may lie in the number and quality of day 3 embryos. In recent years, 'OMICS' technologies [6] and algorithms derived from time-lapse microscopy [7] have been used to predict the fate of day 3 embryos during in vitro culture. While 'OMICS' technologies, such as proteomics and metabolomics for non-invasive assessment of embryo developmental capacity, are yet to be recommended for routine use [1], time-lapse microscopy has been introduced into routine clinical practice and has shown a capacity to predict blastocyst formation with AUCs ranging from 0.6 to 0.8 across different studies [8][9][10][11][12][13][14][15][16][17][18][19].
Unfortunately, novel technologies inevitably require additional cost or equipment and the expense of technologies may limit their widespread use.
Although limited by subjectivity and modest efficacy, conventional embryo morphological assessment at fixed time points remains the standard of practice [20] in the era of 'OMICS' and time-lapse microscopy. Since the early days of blastocyst culture, associations between day 3 morphology and blastocyst formation have been demonstrated. However, poor-looking day 3 embryos rejected by conventional morphological assessment may still develop into blastocysts, and it is believed that the association between morphology and blastocyst formation does not necessarily reflect blastocyst viability. Nevertheless, studies that predicted blastocyst formation using both conventional morphological assessment and time-lapse microscopy in the same population [16,18] showed that the AUCs of conventional morphological assessment were close to those of time-lapse microscopy. In particular, in the work of Petersen et al., which compared six time-lapse algorithms in the same study, only two algorithms surpassed an algorithm based on the conventional Alpha/ESHRE consensus assessment in terms of predictive power [16]. These data therefore suggest that routine laboratory practice remains a useful tool to predict blastocyst culture cancellation and to provide meaningful clinical consultation without additional cost or equipment. However, most previous studies focused on the assessment and selection of individual embryos, and the performance of morphology-based algorithms in predicting canceled blastocyst transfer cycles is less well known. According to previous studies [4,5], several other clinical and cycle-based factors are associated with blastocyst development besides the number and quality of day 3 embryos, and the morphology and morphokinetics of day 3 embryos are also confounded by the cycle-based factors of the cycles from which they are derived [21].
The purpose of the study was to estimate the value of conventional embryo assessment up to day 3 as a tool to predict cycle-based blastocyst transfer cancellation rates.
In addition, the contribution of cycle-based factors to the predictive power was evaluated by comparing algorithms that included cycle-based factors with those that did not.

Study subjects
A retrospective analysis was performed on patients who underwent IVF/ICSI treatment in the Center for Reproduction Medicine of the affiliated Chenggong Hospital of Xiamen University, China, between January 2013 and December 2020. Institutional Review Board approval for this retrospective study was obtained from the Ethical Committee of the Medical College Xiamen University. Informed consent was not necessary because the research was based on non-identifiable records, as approved by the ethics committee.
Data from cycles between January 2013 and December 2019 were used to create the models to predict blastulation. Data from cycles between January 2020 and December 2020 were used to validate the models. Cycles were included if they underwent blastocyst culture and all parameters were precisely recorded.
All patients were treated with a conventional agonist or antagonist stimulation protocol in our center as previously described [22]. The initial and ongoing dosages were determined by the patient's age, antral follicle count (AFC), BMI, and ovarian response. When at least one follicle reached a mean diameter of 18 mm, an intramuscular injection of human chorionic gonadotropin (4000-6000 IU, hCG; Livzen, China) or a subcutaneous injection of recombinant human chorionic gonadotropin (250 μg, Ovidrel, Merck-Serono, Switzerland) was administered for final triggering. Oocytes were retrieved under transvaginal ultrasound guidance 34-36 h after hCG injection.

Embryo culture and assessment
Conventional IVF and ICSI were carried out according to the protocols of our center [23]. After insemination, oocytes were cultured individually in preequilibrated Cleavage Medium (Cook) under mineral oil in traditional incubators (C200, Labotect) at 37℃, 6% CO2 and 5% O2 in a humidified atmosphere. On the morning of day 3, the culture medium was switched to Blastocyst Medium (Cook) under the same culture conditions. The culture system remained unchanged throughout the study period. Embryos were observed at the time points specified by the Istanbul consensus [24]. Fertilization, early cleavage, the number and symmetry of blastomeres, the fragmentation level on day 3, and blastocyst formation on days 5 and 6 were recorded.

Statistical analysis
The endpoint, blastocyst formation, was defined as the formation of viable blastocysts for either transfer or cryopreservation. Generalized estimating equations (GEE) were used to establish the morphology-based model. The features included in the model were early cleavage (with or without), the cell number on day 3 (2-3 cells, 4-6 cells, 7 cells, 8 cells, 9-11 cells, > 12 cells, and compact), fragmentation rate (continuous), and asymmetry (with or without). The grouping strategy for cell number on day 3 was based on the distribution of blastocyst formation (Figure S1).
To estimate whether including features regarding patient characteristics and cycle parameters improves the predictive power, we also established additional models including 27 more features. The additional features were: female age, male age, GnRH analogues, insemination protocol, TESA/PESA, maternal height, maternal weight, maternal BMI, maternal basal FSH, maternal basal LH, maternal basal PRL, maternal basal E2, maternal basal T, basal AFC, gonadotropin dose, gonadotropin duration, HMG dose, HMG duration, starting dose, FSH on the day of stimulation, LH on the day of stimulation, E2 on the day of stimulation, E2 on the day of triggering, LH on the day of triggering, P on the day of triggering, oocyte yield, and maturation rate of oocytes in the cycle.
Two strategies were used to incorporate the features into the predictive models. First, a Least Absolute Shrinkage and Selection Operator (LASSO) model [25] was used for feature selection, and the resulting features along with the morphological parameters were used to predict blastocyst formation (LASSO model). Second, an Extreme Gradient Boosting (XGBoost) algorithm [26] was used to build gradient boosting trees with the features (XGBoost model).
The predictive power of the models was quantified by the area under the receiver operating characteristic (ROC) curve (AUC). A 95% confidence interval (95% CI) was calculated for the AUC. A cutoff point for prediction was determined according to the maximum informedness (sensitivity + specificity - 1), and the positive predictive value (PPV) and negative predictive value (NPV) at that cutoff were calculated accordingly.
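The cutoff selection described above can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `best_cutoff_by_informedness` and the toy labels and probabilities are hypothetical, and each distinct predicted probability is simply tried as a candidate cutoff.

```python
def best_cutoff_by_informedness(y_true, y_prob):
    """Return (cutoff, informedness, ppv, npv) for the cutoff that
    maximizes informedness = sensitivity + specificity - 1."""
    best = None
    for cutoff in sorted(set(y_prob)):
        tp = fp = tn = fn = 0
        for y, p in zip(y_true, y_prob):
            predicted_positive = p >= cutoff
            if predicted_positive and y:
                tp += 1
            elif predicted_positive and not y:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        informedness = sensitivity + specificity - 1
        ppv = tp / (tp + fp) if (tp + fp) else float("nan")
        npv = tn / (tn + fn) if (tn + fn) else float("nan")
        # Keep the first cutoff that reaches the highest informedness.
        if best is None or informedness > best[1]:
            best = (cutoff, informedness, ppv, npv)
    return best


# Hypothetical toy data: 1 = blastocyst formed, 0 = no blastocyst.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90]
cutoff, informedness, ppv, npv = best_cutoff_by_informedness(y_true, y_prob)
```

Ties are resolved in favor of the lowest cutoff here; that is a design choice of the sketch, not something stated in the text.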
Because cancellation of blastocyst transfer is a cycle-level outcome, we also evaluated the clinical usefulness of the blastocyst formation predictions for the individual embryos in a given cycle. The models were used to calculate the predicted number of blastocysts in each cycle. The predicted number of blastocysts was correlated with the observed number of blastocysts using Spearman's rank correlation, and the mean absolute difference between prediction and observation was calculated.
The cumulative probability of predicted blastocyst formation of the individual embryos in a cycle was used to predict whether the cycle would have a blastocyst for transfer. The cumulative probability was defined as follows: Cumulative probability = 1 - ∏(1 - individual embryo blastocyst formation rate).
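The per-cycle aggregation can be sketched as below. It is only a sketch under stated assumptions: the function names and the example probabilities are hypothetical, the predicted count is taken as the sum of the per-embryo predictions (as reported in the Results), and the product form of the cumulative probability implicitly treats embryos in a cycle as independent.

```python
import math


def predicted_blastocyst_count(embryo_probs):
    """Predicted number of blastocysts: sum of per-embryo probabilities."""
    return sum(embryo_probs)


def prob_at_least_one_blastocyst(embryo_probs):
    """Cumulative probability = 1 - prod(1 - p_i) over embryos in the cycle."""
    return 1.0 - math.prod(1.0 - p for p in embryo_probs)


def mean_absolute_error(predicted_counts, observed_counts):
    """Mean absolute difference between predicted and observed counts."""
    diffs = [abs(p - o) for p, o in zip(predicted_counts, observed_counts)]
    return sum(diffs) / len(diffs)


# Hypothetical example: two cycles with per-embryo predicted probabilities.
cycles = [[0.8, 0.5, 0.2], [0.3, 0.1]]
observed = [2, 0]  # observed blastocysts per cycle (made-up values)

predicted = [predicted_blastocyst_count(c) for c in cycles]
cumulative = [prob_at_least_one_blastocyst(c) for c in cycles]
mae = mean_absolute_error(predicted, observed)
```

In this toy example the first cycle has three embryos, so the chance of at least one blastocyst is one minus the probability that all three fail.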
The predictive power was compared with that of a cycle-based XGBoost model with 29 features. The features included the aforementioned patient characteristics and cycle parameters, as well as the number of good-quality embryos and whether all cleavage-stage embryos in the cycle were subjected to blastocyst culture.
Calibration curves were used to report the overall agreement between model predictions and observed outcomes. A calibration curve was plotted by comparing model values with observed rates, grouped by deciles of model values. When the predicted number of blastocysts was used for prediction, a logistic transformation was applied in order to obtain a linear relationship.
The calibration slope, with a target value of 1, was used to evaluate the spread of the estimated rates. A slope < 1 suggests that the predictions are too extreme and a slope > 1 suggests the opposite. The calibration intercept, with a target value of 0, is an assessment of calibration-in-the-large. A negative intercept suggests overestimation, whereas a positive intercept suggests underestimation.
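One common way to obtain these two quantities is logistic recalibration, sketched below. The text does not state how the slope and intercept were estimated, so this is an assumption: the slope comes from a two-parameter logistic fit of the outcome on the logit of the predicted probability, and the calibration-in-the-large intercept from a fit with the slope fixed at 1. Function names and toy data are hypothetical, and predicted probabilities must lie strictly between 0 and 1.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_slope(y_true, y_prob, n_iter=25):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson; return (b, a)."""
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient with respect to a
            g1 += (yi - mu) * xi     # gradient with respect to b
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b, a


def calibration_in_the_large(y_true, y_prob, n_iter=25):
    """Intercept a of logit(P(y=1)) = a + logit(p), with the slope fixed at 1."""
    x = [_logit(p) for p in y_prob]
    a = 0.0
    for _ in range(n_iter):
        g = h = 0.0
        for xi, yi in zip(x, y_true):
            mu = _sigmoid(a + xi)
            g += yi - mu
            h += mu * (1.0 - mu)
        a += g / h
    return a


# Hypothetical overconfident model: predicts 0.1 / 0.5 / 0.9 in three groups
# of ten, while the observed event rates are 0.2 / 0.5 / 0.8.
probs = [0.1] * 10 + [0.5] * 10 + [0.9] * 10
outcomes = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
slope, intercept = calibration_slope(outcomes, probs)

# Hypothetical overestimating model: predicts 0.5 while 3 of 10 events occur.
citl = calibration_in_the_large([1] * 3 + [0] * 7, [0.5] * 10)
```

In the first toy dataset the fitted slope falls below 1, matching the "too extreme" reading above; in the second, the intercept is negative, matching overestimation.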

Results
Training data included 13,674 cycles. The median maternal age was 30 [28-33] years. ICSI was performed in 3010 (23.1%) cycles and IVF in 10,038 (76.9%) cycles. A total [...]
Based on early cleavage and routinely observed morphological parameters (cell number, fragmentation, and symmetry), we established a predictive model with GEE. The coefficients and intercept are shown in Supplementary Table 1 (Table S1). The GEE model predicted blastocyst formation with an AUC of 0.771 (95% CI: 0.768-0.774) in the training set and 0.779 (95% CI: 0.77-0.787) in the validation set. A cutoff of 0.51 was determined according to the maximum informedness. The accuracy of prediction according [...] Table 3). The ROC curves and calibration curves were also comparable (Figure S2). We further explored the discrimination of the GEE model at the given cutoff value for blastocyst formation in different subgroups of patients (Table 4). The predictive power in terms of AUC was broadly similar across subgroups and indicated fair performance. However, the discriminative power appeared to be lower in older patients and in patients with lower oocyte yield.
In clinical practice, whether a blastocyst transfer cycle is canceled may depend on the outcome of all embryos subjected to blastocyst culture in the cycle. To mimic this scenario, we generated a per-cycle blastocyst prediction based on the models. The predicted number of blastocysts per cycle was simply the sum of the individual embryo predictions. For per-cycle blastocyst yield, the predicted number of blastocysts using morphological parameters alone correlated strongly with the observed blastocyst number (r = 0.897, P < 0.0001), with a mean absolute error of 0.95 (95% CI: 0.92-0.99).
The predicted number of blastocysts was also used to predict the chance of blastocyst transfer, with an AUC of 0.926 (95% CI: 0.911-0.94). Its predictive power for blastocyst transfer cancellation surpassed that of an XGBoost model based on 29 features (AUC 0.885, 95% CI: 0.867-0.903). Figure 1 shows the AUCs and calibration curves of both models. The cycle-based model appeared to overestimate the chance of blastocyst transfer (slope = 1.01, intercept = -0.009), while the predicted number of blastocysts gave predictions closer to the observed probability (slope = 1.15, intercept = -0.185).
In Table 5, the prediction of blastocyst transfer is stratified by patient subgroup. The predictive power in terms of AUC did not differ significantly in patients older than 34 years compared with the unselected population. On the other hand, the AUC was lower in patients with no good-quality embryos and in patients with only part of their embryos cultured. However, the AUC in patients with no good-quality embryos still suggested moderate discriminability, with a value of 0.74 (95% CI: 0.68-0.79).

Discussion
Although challenged by novel technologies for embryo assessment, conventional static morphological assessment is still widely used with an established consensus of practice [24,27], and it has generated large datasets over the past decades. A feasible clinical prediction model based on these datasets may benefit from the large sample size and be easily incorporated into routine procedures without additional cost. In the present study, we demonstrated the predictive value of conventional static morphological assessment for blastocyst formation and provided a simplified algorithm to predict canceled blastocyst transfer cycles with moderate predictive power. In addition, by comparison with models including cycle-based parameters, our data also suggest that increasing the complexity of the model by including parameters other than those of the embryos themselves may not significantly improve the predictive power.
Since the early days of blastocyst culture, many studies have investigated the association between conventional morphology assessment and the rate of blastocyst formation [28][29][30]. However, only a few studies have quantitatively evaluated the predictive power [16,18]. Basile et al. evaluated the predictive power for blastocyst formation of morphological criteria defined by the Spanish Association of Embryologists (ASEBIR), showing an AUC of 0.717 (95% CI: 0.703-0.732), which is close to the AUC derived from the Eeva time-lapse system in the same population. In a classic paper comparing [...]
Compared with previous studies, our algorithms are based on data-driven models using a full set of conventional morphology parameters. Using the conventional morphology parameters themselves rather than pre-established classifications may provide more information and therefore increase the predictive power. Time-lapse microscopy, which generates thousands of images during the in vitro culture of an embryo, provides far more information than conventional static observation. Theoretically, this advantage may make it a superior morphological tool for predicting the fate of a cultured embryo. However, several earlier studies did not demonstrate satisfactory predictive power for blastocyst formation, with AUCs ranging from 0.6 to 0.7. More recently, the time-lapse algorithm developed by Motato et al. predicted blastocyst formation with an AUC of 0.849 (95% CI: 0.835-0.854) [17]. This method, however, requires a culture duration of up to 96 h, which may delay decision making. A recent study integrated deep learning algorithms into the time-lapse system, and the predictive power in terms of AUC reached 0.82 [8]. Compared with the historical performance of time-lapse systems in predicting blastocyst formation, conventional static observation yields acceptable predictive power and requires only limited resources.
Most previous morphological studies focused on tracking the fate of an individual embryo, aiming to select the most competent embryo. On the other hand, whether blastocyst culture yields viable blastocysts for transfer is also relevant to decision making. It has been proposed that four good embryos on day 3 may provide reassurance that the patient will benefit from blastocyst transfer [31]. However, the performance of a day 3 morphology-based algorithm in predicting a blastocyst transfer cycle canceled due to failed blastocyst formation has yet to be determined. A few studies used cycle-based characteristics, such as maternal age, oocyte yield and the number of good-quality embryos, to predict the probability of blastocyst transfer cancellation [4,5]. Dessolle et al. established a cycle-based model with multivariate logistic regression and showed an AUC of 0.75 (95% CI: 0.73-0.77) for predicting canceled cycles [4]. More recently, a model using multiple classification algorithms predicted cycle-based blastocyst formation with an AUC of 0.922 [5]. Both algorithms require patient characteristics in combination with embryo quality on day 3. Our data suggest that the cumulative probability derived from morphology-based prediction alone also yields notable predictive power. Being independent of cycle-based characteristics, the prediction may be more flexible, as it is also applicable to cycles in which only part of the embryos are subjected to blastocyst culture. Interestingly, both our cycle-based model and the model of Dessolle et al. showed a tendency toward overestimation according to the calibration plots, although different statistical methods and populations were involved. This may reflect an intrinsic feature of cycle-based prediction.
The prevalence of blastocyst formation in different populations is another issue that should be considered, beyond discriminability, when attempting to predict the chance of embryo transfer in a given cycle. The rates of blastocyst formation vary widely among patients, ranging from 0% to almost 100% [1]. In an unselected population, the chance of having at least one blastocyst to transfer in a cycle may be rather high. For instance, Dessolle et al. observed that the percentage of cycles with blastocyst transfer was about 79% in their study cohort [4]. We also observed a cycle-based blastocyst formation rate approximating 90% in the overall population. With such a high prevalence, a naive guess that always predicts 'yes' could still yield high accuracy. On the other hand, patients with low embryo yield or advanced age may face a higher chance of blastocyst culture failure [2,3] and need more detailed consultation before making a decision. Therefore, we sought to evaluate the discriminability of the model in different subgroups of patients, which may represent different scenarios, and reasonable AUCs were observed. Notably, a remarkable decrease in the blastocyst formation rate was observed among patients who had no good-quality embryos for blastocyst culture, while moderate discriminability was still observed in the ROC curve. The results suggest that conventional morphological observation remains a useful consultation tool even when no good-quality embryo is scored.
It is known that in vitro development of embryos is associated with patient- and treatment-related factors [21]. Beyond morphological and morphokinetic characteristics, patient- and treatment-related factors may also affect intrinsic characteristics, such as aneuploidy or metabolism [32,33]. These characteristics are not necessarily associated with the appearance of the embryos. Therefore, adding factors such as age may increase the information available for prediction. To test this hypothesis, we compared the multivariate model including only morphological parameters with models constructed with LASSO regression or XGBoost that included both morphological parameters and patient/treatment-related factors. Although the patient/treatment-related factors were significantly associated with blastulation, the LASSO and XGBoost models did not significantly surpass the simple multivariate model in terms of predictive power. Well-known prognostic factors for blastocyst formation, such as AMH and maternal age, also affect the quantity and morphology of day 3 embryos [34,35]. The existence of mediation effects, in which day 3 embryo quality serves as a mediator, may partially explain why including more patient/treatment-related factors did not further improve the performance of the model in this study.
The study is strengthened by a large sample size, which may provide narrow confidence intervals for the coefficients and reduce the uncertainty of the performance estimates. In addition, we evaluated the calibration of the model in several different clinical scenarios, including advanced maternal age, few embryos for culture, and blastocyst culture of surplus embryos. The study also has several drawbacks, including its retrospective design and the subjectivity of the methodology. Because the study was conducted at a single center, we could not test the performance of the model in other culture systems. Nevertheless, the study may encourage the establishment of predictive models based on existing large datasets in other culture systems.