The paper introduced here builds an artificial intelligence (AI) system for embryo assessment in Assisted Reproductive Technology (ART). Using time-lapse images acquired up to Day 3 (D3) of embryonic development together with the age at oocyte retrieval, the system predicts blastocyst formation and quality on Days 5-6 (D5-D6). This gives clinicians a reference for choosing the timing of embryo transfer or cryopreservation, especially for patients with poor prognosis in regions where the oocyte supply is scarce. The study classified and evaluated embryos from Day 2 to Day 4 (D2-D4) using time-lapse images from different devices, and reported about 95% accuracy in cell-stage classification and ROC AUCs of 0.87-0.88 for blastocyst prediction. However, the dataset has not been published, and the results may depend on the particular data distribution.
Yanai A, Horie A, Sakurai A, Imakita S, Nakamura M, Ikeda A, Shitanaka S, Ohara T, Nakakita B, Ueda A, Kitawaki Y, Sagae Y, Okunomiya A, Mandai M. Innovative AI models for clinical decision-making: predicting blastocyst formation and quality from time-lapse embryo images up to embryonic day 3. Comput Biol Med. 2025 Sep;195:110637. doi: 10.1016/j.compbiomed.2025.110637. Epub 2025 Jun 21. PMID: 40544805.

I. Clinical Background: The Urgent Need to Break Through Embryo Assessment Challenges
In Assisted Reproductive Technology (ART), the timing of embryo transfer affects pregnancy outcomes. Japanese data from 2021 showed that the pregnancy rate after blastocyst transfer (41%) was roughly twice that after cleavage-stage transfer (20%). However, for patients with a poor prognosis, such as those of advanced age (>40 years) or with few embryos, extended culture often fails to yield transferable blastocysts. Among patients over 43 years of age, 47% still require cleavage-stage transfer, which underlines the importance of accurate early assessment. Conventional static morphological assessment on Day 3 (D3) predicts blastocyst formation poorly. Existing time-lapse imaging models reach a D3 prediction AUC of only 0.69-0.81, rely on manual annotation, and are adapted to a single device. The “black box” nature of AI also reduces clinical trust, so a more comprehensive assessment approach is needed.

II. Study Design: Multi-Center Data and Two-Stage AI Model Construction
This is a multi-center retrospective study. It integrated embryo culture data from four medical institutions in Japan from 2018 to 2022, covering 2,792 oocyte retrieval cycles and four different types of time-lapse incubators to ensure data diversity and clinical representativeness. The inclusion criteria focused on two-pronuclear (2PN) cleavage-stage embryos with good developmental potential and excluded poor-quality samples and embryos with missing images or displacement. After preprocessing to standardize image size and quality, a total of 23,852 images were used for model training. Blastocyst outcomes were classified according to the Gardner criteria into blastocyst formation, high-quality blastocyst, poor-quality blastocyst, and developmental arrest, giving the models clear target labels that reflect clinical consensus.
AI modeling was carried out in two stages. In the first stage, an automatic annotation model was built by fine-tuning NASNet-A Large, pre-trained on ImageNet-1k, on 23,852 images labeled with 17 classes (1-8 cell stages + Veeck 1-3 grades). In the second stage, XGBoost was used to build the blastocyst prediction models, which combine the morphological features obtained from automatic annotation with clinical information such as age at oocyte retrieval. This yielded prediction tools for blastocyst formation, high-quality blastocysts, and poor-quality/arrested embryos.
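The hand-off between the two stages can be illustrated with a minimal sketch. It assumes that the second-stage features are the first-stage class probabilities sampled at fixed time points, plus the age at oocyte retrieval. This layout is an assumption made for illustration; the summary above does not specify how the authors arranged their features. The names FrameAnnotation, nearest_frame and build_feature_row are hypothetical, and the toy example uses 3 classes instead of 17 to stay readable.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FrameAnnotation:
    hpi: float                # hours post-insemination at which the frame was taken
    class_probs: List[float]  # stage-1 classifier output, one probability per class


def nearest_frame(frames: List[FrameAnnotation], target_hpi: float) -> FrameAnnotation:
    """Return the annotated frame closest in time to target_hpi."""
    return min(frames, key=lambda f: abs(f.hpi - target_hpi))


def build_feature_row(frames, age_at_retrieval, target_hpis) -> Dict[str, float]:
    """Flatten per-frame class probabilities plus age into one tabular row."""
    row = {"age_at_retrieval": float(age_at_retrieval)}
    for t in target_hpis:
        frame = nearest_frame(frames, t)
        for k, p in enumerate(frame.class_probs):
            # Assumed naming scheme: probability of class k at time t.
            row[f"p_class{k}_at_{t}hpi"] = p
    return row


# Toy embryo with made-up probabilities (not data from the study).
frames = [
    FrameAnnotation(hpi=29.75, class_probs=[0.1, 0.8, 0.1]),
    FrameAnnotation(hpi=45.00, class_probs=[0.0, 0.6, 0.4]),
    FrameAnnotation(hpi=62.75, class_probs=[0.0, 0.3, 0.7]),
]
row = build_feature_row(frames, age_at_retrieval=38, target_hpis=[29.75, 62.75])
print(row)
```

A row of this kind is the sort of tabular input that a gradient-boosted tree model such as XGBoost consumes, with one row per embryo and one binary outcome label per prediction task.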

III. Study Results: Model Performance and Generalization
The three blastocyst-related prediction models constructed in this study performed well in testing. The ROC AUC, the core evaluation metric, reached 0.87 for the blastocyst formation model and 0.88 for the high-quality blastocyst model, both with good calibration. The prediction model for poor-quality/arrested embryos (PBAE) achieved an ROC AUC of 0.87 and a precision-recall AUC (PR AUC) of 0.90. Its decision threshold can also be adjusted to clinical needs in order to balance precision against sensitivity.
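To make the metrics concrete, the sketch below computes the ROC AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, and shows how moving the decision threshold trades precision against sensitivity (recall). The labels and scores are invented toy values, not study data, and the function names are illustrative.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when scores >= threshold are called positive."""
    pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: 1 = poor-quality/arrested embryo, 0 = usable blastocyst.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
y_score = [0.95, 0.85, 0.70, 0.65, 0.55, 0.40, 0.30, 0.10]

print(roc_auc(y_true, y_score))                   # 0.9375
print(precision_recall_at(y_true, y_score, 0.5))  # (0.8, 1.0)
print(precision_recall_at(y_true, y_score, 0.8))  # (1.0, 0.5)
```

In this toy case, raising the threshold from 0.5 to 0.8 removes the one false alarm (precision rises from 0.8 to 1.0) but misses half of the truly arrested embryos (recall falls from 1.0 to 0.5). A clinic that wants to avoid discarding viable embryos would lean toward the higher threshold.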
In terms of model generalization, subgroup analysis by institution showed that the AUC of the blastocyst formation prediction model was at least 0.83 in all four medical institutions, with no significant differences between them. Subgroup analysis by age (<35 years, 35-39 years, ≥40 years) indicated that the model’s prediction performance varied little across age groups. When the PBAE model was applied to the subgroup of D3 transferable embryos (≥4 cells), the AUC was 0.85 and the Brier score was 0.16, values that suggest it is suitable for clinical screening.
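For readers unfamiliar with the Brier score: it is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower is better, and a model that always predicts 0.5 scores exactly 0.25. The sketch below computes it overall and per subgroup. The institution labels and probabilities are toy values, and metric_by_subgroup is a hypothetical helper, not the authors' analysis code.

```python
from collections import defaultdict


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def metric_by_subgroup(metric, groups, y_true, y_prob):
    """Apply a metric separately to each subgroup (e.g. institution or age band)."""
    buckets = defaultdict(lambda: ([], []))
    for g, y, p in zip(groups, y_true, y_prob):
        buckets[g][0].append(y)
        buckets[g][1].append(p)
    return {g: metric(ys, ps) for g, (ys, ps) in buckets.items()}


# Toy data: two hypothetical institutions, two embryos each.
groups = ["A", "A", "B", "B"]
y_true = [1, 0, 1, 0]
y_prob = [0.8, 0.3, 0.6, 0.1]

overall = brier_score(y_true, y_prob)             # approx. 0.075
uninformative = brier_score(y_true, [0.5] * 4)    # exactly 0.25
per_site = metric_by_subgroup(brier_score, groups, y_true, y_prob)
print(overall, uninformative, per_site)
```

Against the 0.25 reference of an uninformative 0.5 prediction, the reported Brier score of 0.16 indicates probabilities that carry real information, although how much margin is clinically sufficient remains a judgment call.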
SHAP analysis revealed that the most influential features for model prediction included the proportion of 8-cell embryos at 62.75 hours post-insemination (hpi), the proportion of embryos with ≥2 cells at 29.75 hpi, and the age at oocyte retrieval. The direction in which each of these features affected the predictions was consistent with clinical understanding.
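The idea behind SHAP can be illustrated with a brute-force Shapley value computation on a toy model. This is a sketch of the underlying concept only: the SHAP library uses faster, tree-specific algorithms for models such as XGBoost, and the linear scoring function, its weights, and the baseline values below are invented for illustration and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(z):
    # Hypothetical score: [8-cell signal at 62.75 hpi, >=2-cell signal at 29.75 hpi, age].
    return 2.0 * z[0] + 1.0 * z[1] - 0.05 * z[2]


x = [0.9, 1.0, 42]          # embryo being explained (toy values)
baseline = [0.5, 0.8, 36]   # reference embryo (toy values)
phi = shapley_values(toy_score, x, baseline)
print(phi)  # approximately [0.8, 0.2, -0.3]
```

The attributions sum to the difference between the score of the explained embryo and the score of the baseline, which is the additivity property that makes SHAP plots readable. In this toy case the higher age contributes negatively, which mirrors the kind of clinically plausible direction described above.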

IV. Study Value: Technological Innovation Meets Regional Needs
The AI automatic annotation model constructed in this study showed strong compatibility, working across four different types of time-lapse incubators. The classification accuracy for each cell stage was as high as 95%, with no need for manual annotation. Assessing a single embryo takes very little time, which improves assessment efficiency and reduces the subjective errors associated with manual annotation.
In terms of prediction performance, the ROC AUC for blastocyst formation and high-quality blastocyst prediction at the D3 stage reached 0.87 and 0.88 respectively, clearly above the range reported for previous D3 prediction models (0.69-0.81). The PR AUC of the PBAE model was 0.90, which means it can effectively identify embryos with no potential for blastocyst development and helps clinicians avoid futile extended culture.
The model was validated across multiple institutions, different age groups, and the D3 transferable embryo subgroup, and showed stable performance and adaptability to various incubators. This lowers the barrier to adoption for small and medium-sized ART centers. Moreover, the model maintained good accuracy for embryos of patients aged ≥40 years, which matches the clinical needs of regions with a scarce oocyte supply and a high proportion of older patients.

V. Study Limitations: Objective Constraints and Directions for Improvement
Firstly, blastocyst grading relied on the conventional Gardner standard of each participating institution without a unified standardized assessment process. The Gardner grading itself has a certain degree of subjectivity, which may lead to differences in blastocyst outcome labels and potentially affect model training and performance.
Secondly, 619 embryos were excluded for technical reasons, namely poor image quality (e.g., blurriness, abnormal exposure) or embryo displacement. Although this ensured the quality of the modeling data, it may reduce the model’s ability to generalize to the more variable imaging conditions of real clinical practice. Moreover, the timing of fertilization was taken from medical records rather than direct observation, which could introduce errors in the time annotation.
Furthermore, the model did not incorporate clinical parameters that may affect embryo development, such as fertilization method (IVF/ICSI), sperm quality, and ovulation induction protocols. Also, due to the low rate of pregnancy loss in the mid-to-late stages of clinical pregnancy (only 7%), the sample size was insufficient to deeply analyze the association between embryo features and live birth outcomes.
Lastly, the sample size for some incubator models was relatively small, resulting in occasional non-significant results in related subgroup analyses. Larger-scale multi-center data are needed to further verify the consistency of model performance across different devices.

VI. Clinical Significance and Outlook
The AI model in this study can identify embryos with no blastocyst potential at the D3 stage with good accuracy. This helps patients with poor prognosis, such as those of advanced age or with few embryos, to avoid futile culture and reduces wasted resources and waiting time, which makes the model particularly suitable for regions with a scarce oocyte supply. Its high annotation consistency promotes the shift of embryo assessment from “experience-dependent” to “data-driven.” Its compatibility with multiple types of incubators also supports standardization at small and medium-sized ART centers. The SHAP technique enhances model interpretability and clinical trust, providing evidence-based support for related guideline development. In the future, if the model is integrated with more clinical parameters and validated in larger multi-center studies, it holds promise for further improving clinical decision-making in ART.

