Performance and clinical implications of machine learning models for detecting cervical ossification of the posterior longitudinal ligament: a systematic review

Article information

Asian Spine J. 2025;19(1):148-159
Publication date (electronic) : 2025 January 20
doi : https://doi.org/10.31616/asj.2024.0452
1Department of Orthopaedics, School of Medicine, University of Phayao, Phayao, Thailand
2Department of Orthopaedic Surgery, Seoul Seonam Hospital, Seoul, Korea
3Department of Mathematics, School of Science, University of Phayao, Phayao, Thailand
4Department of Orthopaedics, Phramongkutklao Hospital and College of Medicine, Bangkok, Thailand
5Department of Orthopaedics, Srinagarind Hospital, Khon Kaen University, Khon Kaen, Thailand
6Department of Radiology, Srinagarind Hospital, Khon Kaen University, Khon Kaen, Thailand
7Department of Neurosurgery, CHA Bundang Medical Center, CHA University School of Medicine, Seongnam, Korea
Corresponding author: Wongthawat Liawrungrueang, Department of Orthopaedics, School of Medicine, University of Phayao, Phayao 56000, Thailand, Tel: +66-89-148-3458, Fax: +66-54-466-759, E-mail: mint11871@hotmail.com, mint11871@gmail.com
Received 2024 November 29; Revised 2024 October 23; Accepted 2024 December 12.

Abstract

Ossification of the posterior longitudinal ligament (OPLL) is a significant spinal condition that can lead to severe neurological deficits. Recent advancements in machine learning (ML) and deep learning (DL) have led to the development of promising tools for the early detection and diagnosis of OPLL. This systematic review evaluated the diagnostic performance and clinical implications of ML and DL models in OPLL detection. A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. PubMed/Medline and Scopus databases were searched for studies published between January 2000 and September 2024. Eligible studies included those utilizing ML or DL models for OPLL detection using imaging data. All studies were assessed for the risk of bias using appropriate tools. The key performance metrics, including accuracy, sensitivity, specificity, and area under the curve (AUC), were analyzed. Eleven studies, comprising a total of 6,031 patients, were included. The ML and DL models demonstrated high diagnostic performance, with accuracy rates ranging from 69.6% to 98.9% and AUC values up to 0.99. Convolutional neural networks and random forest models were the most commonly used approaches. The overall risk of bias was moderate, with concerns primarily related to participant selection and missing data. In conclusion, ML and DL models show great potential for accurate detection of OPLL, particularly when integrated with imaging techniques. However, to ensure clinical applicability, further research is warranted to validate these findings in more extensive and diverse populations.

Introduction

Ossification of the posterior longitudinal ligament (OPLL) is characterized by abnormal ligament calcification along the spinal column, predominantly affecting the cervical spine [1,2]. This progressive condition can lead to spinal canal stenosis and subsequently myelopathy, resulting in severe neurological deficits such as motor and sensory impairments [3,4]. The prevalence of OPLL varies geographically, with higher incidence rates observed in East Asian populations than in Western countries. Early and accurate diagnosis is crucial to prevent irreversible neurological damage and plan appropriate surgical interventions [1,2,5,6].

Current standard imaging modalities, such as plain radiography, computed tomography (CT), and magnetic resonance imaging (MRI), are commonly used for diagnosing OPLL [2,7,8]. However, these techniques have limited sensitivity and specificity, particularly in detecting early or subtle cases. Advanced imaging techniques such as CT myelography offer better diagnostic accuracy but are invasive and associated with higher radiation exposure [9]. Consequently, noninvasive, automated diagnostic tools are increasingly needed to enhance the accuracy and efficiency of OPLL detection.

Machine learning (ML) and deep learning (DL) models have demonstrated significant potential in medical imaging, enhancing diagnostic capabilities across various clinical areas (Fig. 1) [10–12]. Recently, ML and DL have been increasingly applied to spinal conditions, including OPLL, to improve diagnostic accuracy and reduce the burden on healthcare systems. These models, particularly convolutional neural networks (CNNs), can analyze complex imaging data, identify subtle patterns, and accurately differentiate OPLL from other spinal pathologies [11,13–15]. Several studies have demonstrated the potential of ML and DL models in OPLL detection; for example, neural networks can detect OPLL on plain cervical radiographs, highlighting their utility in clinical screening [16,17]. However, several factors hinder the adoption of ML and DL models in clinical practice, including heterogeneity in study designs, small sample sizes, and the lack of external validation in diverse populations [16–18]. Therefore, the diagnostic performance and clinical utility of these models in OPLL detection must be comprehensively evaluated.

Fig. 1

Synergistic relationships in types of machine learning.

This systematic review aimed to provide an in-depth analysis of the diagnostic performance of ML and DL models in OPLL detection, assessing their accuracy, sensitivity, specificity, and clinical implications and highlighting the strengths and limitations of existing research. The findings will guide future research directions and present the potential clinical applications of ML and DL in the early diagnosis and management of OPLL.

Materials and Methods

This study was conducted in accordance with the Declaration of Helsinki and with approval from the Ethics Committee and Institutional Review Board (IRB) of the University of Phayao (IRB approval no., HREC-UP-HSST 1.1/003/68). The data used in this research were acquired from a public resource.

Literature search strategy

A systematic literature search was conducted across the PubMed/Medline, Scopus, and Google Scholar databases to identify studies evaluating the performance of ML and DL models in diagnosing OPLL. The search strategy followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines using a combination of Medical Subject Headings terms and relevant keywords [19]. The search terms included the following: “ossification of the posterior longitudinal ligament,” “OPLL,” “cervical OPLL,” “cervical spine ossification,” “machine learning,” “deep learning,” “artificial intelligence,” “convolutional neural network,” “CNN,” “neural network,” “random forest,” “support vector machine,” “radiography,” “X-ray,” “computed tomography,” “CT,” and “magnetic resonance imaging.” The search was restricted to studies published in English between January 2000 and September 2024. Additional relevant studies were identified by manually searching the reference lists of the included articles. Two reviewers independently screened the studies, disagreements were resolved through discussion, and a third reviewer was consulted when necessary. This ensured consistency and transparency in the study selection process. The PRISMA workflow diagram is presented in Fig. 2.
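As a purely illustrative aid, the following minimal Python sketch shows how the concept blocks listed above could be assembled into a Boolean query string; the grouping of terms and the helper function are hypothetical, and the field tags and database-specific syntax of the actual searches are not reproduced here.

```python
# Minimal sketch: combining the search terms listed above into a Boolean query.
# The grouping into concept blocks and the helper below are illustrative only.
condition_terms = [
    "ossification of the posterior longitudinal ligament", "OPLL",
    "cervical OPLL", "cervical spine ossification",
]
model_terms = [
    "machine learning", "deep learning", "artificial intelligence",
    "convolutional neural network", "CNN", "neural network",
    "random forest", "support vector machine",
]
imaging_terms = [
    "radiography", "X-ray", "computed tomography", "CT",
    "magnetic resonance imaging",
]

def or_block(terms):
    """Quote each term and join the block with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Concept blocks are combined with AND, as in a typical structured search strategy.
query = " AND ".join(or_block(block) for block in (condition_terms, model_terms, imaging_terms))
print(query)
```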

Fig. 2

Preferred Reporting Items for Systematic Reviews and Meta-Analyses workflow diagram. OPLL, ossification of the posterior longitudinal ligament.

Inclusion and exclusion criteria

Original research articles evaluating the diagnostic performance of ML or DL models in OPLL detection were eligible. Eligible studies had to involve patients with confirmed OPLL or related spinal conditions, use imaging data from radiography, CT, or MRI for model development and evaluation, and report diagnostic performance metrics, such as accuracy, sensitivity, specificity, and area under the curve (AUC). Reviews, meta-analyses, case reports, conference abstracts, editorials, studies not primarily focused on the diagnostic application of ML or DL models for OPLL detection, and those that did not report relevant diagnostic performance metrics were excluded.

Data extraction

Data were independently extracted by two reviewers using a standardized extraction form. Extracted information included study characteristics (author, year of publication, country, study design, sample size, and patient demographics), model information (ML or DL model type, imaging modality used, and training/validation methods), diagnostic performance metrics (accuracy, sensitivity, specificity, and AUC), and key findings and limitations. Disagreements were resolved by consensus or by involving a third reviewer to ensure data accuracy and completeness.
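For illustration only, the standardized extraction form can be mirrored as a simple data structure; the field names below are hypothetical, and the example values are taken from the Murata et al. [16] entries in Tables 1 and 3.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record mirroring the extraction items described above.
@dataclass
class ExtractedStudy:
    author: str
    year: int
    country: str
    study_design: str
    sample_size: int
    model_type: str              # e.g., "ResNet12 CNN" or "random forest"
    imaging_modality: str        # e.g., "cervical lateral X-ray", "CT", "MRI"
    accuracy: Optional[float] = None
    sensitivity: Optional[float] = None
    specificity: Optional[float] = None
    auc: Optional[float] = None
    key_findings: str = ""
    limitations: str = ""

# Example row (values from Tables 1 and 3, Murata et al. [16]).
murata_2021 = ExtractedStudy(
    author="Murata et al.", year=2021, country="Japan", study_design="retrospective",
    sample_size=672, model_type="ResNet12 CNN", imaging_modality="cervical lateral X-ray",
    accuracy=0.989, sensitivity=0.970, specificity=0.994, auc=0.99,
)
```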

Assessment of the risk of bias

The risk of bias for each included study was assessed using the Risk of Bias in Nonrandomized Studies of Interventions (ROBINS-I) tool [20]. This tool evaluates the risk of bias across seven domains: biases due to confounding, selection of participants, classification of interventions, deviations from intended interventions, missing data, measurement of outcomes, and selection of reported results. Each domain was rated as “low,” “moderate,” “serious,” or “critical” risk of bias based on predefined criteria. The overall risk of bias for each study was determined by the highest level of risk identified in any single domain. Studies with a “serious” or “critical” risk of bias were considered to have significant limitations that could affect the validity of their findings.
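The aggregation rule described above, in which the overall judgment equals the most severe rating across the seven domains, can be expressed as a short sketch; the example ratings correspond to the Murata et al. [16] row of Table 2.

```python
# Overall ROBINS-I judgment = worst rating found in any single domain.
SEVERITY = ["low", "moderate", "serious", "critical"]  # ordered from least to most severe

def overall_robins_i(domain_ratings: dict) -> str:
    """Return the most severe risk-of-bias level among the domain ratings."""
    return max(domain_ratings.values(), key=SEVERITY.index)

# Example ratings (Murata et al. [16], Table 2).
ratings = {
    "confounding": "moderate",
    "selection of participants": "low",
    "classification of interventions": "low",
    "deviations from intended interventions": "moderate",
    "missing data": "serious",
    "measurement of outcomes": "moderate",
    "selection of reported results": "low",
}
print(overall_robins_i(ratings))  # -> "serious"
```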

Results

Study characteristics

This systematic review included a total of 11 studies involving 6,031 patients, all of which focused on the diagnostic performance of ML and DL models for OPLL detection. These studies were conducted between 2021 and 2024 and originated from Japan, South Korea, China, and Israel. Most studies employed a retrospective design, and only one was a prospective multicenter trial. The sample sizes varied from 100 to 901 patients, employing diverse imaging modalities, including plain radiography, CT, and MRI. Demographics and characteristics of the studies are shown in Table 1 [7,16,17,21–28].


Risk of bias analysis

The risk of bias across the included studies was assessed using the ROBINS-I tool [20]. The overall risk of bias was moderate, with common issues related to participant selection, confounding, and missing data. The studies by Murata et al. [16] and Shemesh et al. [7] exhibited serious concerns because of biases in participant selection and missing data, potentially affecting the generalizability of the studies. Although most studies demonstrated a low risk of bias in the intervention classification and outcome measurement, they were limited by small sample sizes and lack of external validation, which may affect the validity of their findings. A detailed assessment of the risk of bias for each study is presented in Fig. 3 and Table 2 [7,16,17,21–28].

Fig. 3

Result of risk of bias analysis using the Risk of Bias in Nonrandomized Studies of Interventions (ROBINS-I) tool.


Performance of ML and DL models in OPLL detection

The included studies highlighted the high diagnostic performance of ML and DL models in OPLL detection (Table 3) [7,16,17,21–28]. The reported accuracy ranged from 69.6% to 98.9%, with AUC values up to 0.99. Murata et al. [16] reported the highest accuracy of 98.9% using a Residual Neural Network (ResNet12) on cervical lateral X-ray images, achieving a sensitivity of 97.0%, specificity of 99.4%, and AUC of 0.99. Tamai et al. [22] demonstrated an AUC of 0.94 with the EfficientNetB2 CNN model, outperforming experienced spine surgeons. Maki et al. [21] utilized various ML models, such as LightGBM and XGBoost, to predict surgical outcomes in patients with OPLL, and the random forest model showed an AUC of 0.75 at the 2-year follow-up. Chae et al. [25] reported that their DL model significantly improved radiologist performance in diagnosing OPLL, achieving a sensitivity of 91% and an AUC of 0.851. Fig. 4 summarizes the diagnostic performance of ML models in detecting cervical OPLL across the included studies. Fig. 4A highlights the high accuracy of models such as ResNet12 and ResNet101 [7,16,21–24,28]. Fig. 4B illustrates the balance between sensitivity and specificity, and certain models achieved near-perfect specificity [7,16,21–24,28]. Fig. 4C emphasizes the AUC, where models such as ResNet101 demonstrate robust diagnostic performance [7,16,21–24,28]. Fig. 4D provides a comprehensive heatmap summarizing the accuracy, sensitivity, specificity, and AUC, showcasing the relative strengths and limitations of each model [7,16,17,21–28].
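For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, specificity, and AUC are derived from binary predictions; the toy data are synthetic and not taken from any included study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic example: 1 = OPLL present, 0 = OPLL absent.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_prob = np.array([0.95, 0.88, 0.74, 0.41, 0.60, 0.35, 0.22, 0.18, 0.09, 0.05])
y_pred = (y_prob >= 0.5).astype(int)  # threshold the predicted probabilities

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # true positives
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # true negatives
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives

accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)          # proportion of OPLL cases correctly detected
specificity = tn / (tn + fp)          # proportion of non-OPLL cases correctly ruled out
auc = roc_auc_score(y_true, y_prob)   # threshold-independent discrimination

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}, AUC={auc:.2f}")
```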


Fig. 4

Performance metrics of machine learning models for ossification of the posterior longitudinal ligament (OPLL) detection. (A) Accuracy across studies, (B) sensitivity and specificity comparison, (C) area under the curve (AUC) of machine learning models, and (D) heatmap summarizing the performance metrics across studies.

Clinical implications of ML and DL models

ML and DL models have shown significant clinical potential for the early detection and management of OPLL (Table 3) [7,16,17,21–28]. High-accuracy models, such as those developed by Murata et al. [16] and Miura et al. [17], could be integrated into clinical workflows for screening, particularly in primary care and emergency settings, reducing the need for invasive and costly imaging modalities such as CT or MRI. Predictive models, such as those developed by Maki et al. [21] and Ito et al. [26], could aid in preoperative planning and risk stratification, enabling clinicians to identify high-risk cases and optimize surgical strategies. In addition, DL models used by Chae et al. [25] and Shemesh et al. [7] enhanced the diagnostic performance of radiologists, particularly in complex cases. They could be employed to support less experienced clinicians, thus improving the overall diagnostic accuracy and patient care.

Limitations and future directions

Despite the promising outcomes, several limitations must be addressed (Table 3) [7,16,17,21–28]. Most studies were conducted in single-center settings with limited sample sizes, primarily involving East Asian populations, which may restrict the generalizability of the findings to other demographics. Moreover, many studies lacked external validation and prospective designs, underscoring the need for larger multicenter trials to verify the clinical utility of these models. Future studies should validate these models in diverse populations and integrate them into clinical practice to assess their actual effect on patient outcomes and healthcare workflows.

Discussion

Recent advancements in artificial intelligence (AI), particularly ML and DL, have greatly improved the diagnostic accuracy of medical imaging. Techniques such as ResNet and gradient-weighted class activation mapping (Grad-CAM) are widely used. ResNet overcomes challenges in training deep models using skip connections, enabling better feature extraction for tasks such as detecting OPLL. Grad-CAM enhances model interpretability by creating heatmaps that highlight important regions in images, offering clinicians valuable insights into AI decision-making. These methods are transforming OPLL detection and diagnosis, bridging gaps in radiological expertise. The findings of this systematic review indicate that ML and DL models demonstrate high diagnostic performance in OPLL detection using various imaging modalities, such as radiography, CT, and MRI. Models such as ResNet12 and EfficientNetB2 have achieved remarkable diagnostic accuracies, often surpassing the performance of experienced spine surgeons [16,22]. Murata et al. [16] reported an accuracy of 98.9% and an AUC of 0.99 using a Residual Neural Network on cervical radiographs, suggesting the potential utility of the model for clinical screening and early detection of OPLL. Tamai et al. [22] reported that the DL model, based on EfficientNetB2, achieved an AUC of 0.94, outperforming spine surgeons in diagnostic accuracy. Such high levels of performance underscore the capability of DL models to detect subtle patterns in imaging data that may be missed by human observers.
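The skip-connection principle behind ResNet-style models can be illustrated with a generic residual block; this is a minimal sketch in PyTorch and not the architecture of any included study.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Generic residual block: the input is added back to the convolutional output."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                                   # skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)               # add the input back and activate

block = ResidualBlock(channels=16)
feature_map = torch.randn(1, 16, 64, 64)               # stand-in for an intermediate feature map
print(block(feature_map).shape)                         # torch.Size([1, 16, 64, 64])
```

Because the skip connection lets gradients bypass the convolutional layers, very deep networks of this kind remain trainable, which is the property that makes ResNet-style feature extraction practical for imaging tasks such as OPLL detection.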

The clinical relevance of these findings is substantial, particularly considering the challenges associated with the diagnosis of OPLL during its early stages. Traditional imaging modalities, such as plain radiographs and MRI, often struggle to detect early or subtle OPLL cases, which can lead to delayed diagnosis and progression of neurological symptoms [1,2,29]. DL models can analyze complex imaging data with high precision, facilitating earlier and more accurate diagnosis. This capability is valuable in primary care and emergency settings, where access to specialized spinal imaging and expertise may be limited. By integrating these models into routine clinical workflows, healthcare providers could improve the accuracy and efficiency of OPLL diagnosis, thereby reducing the need for more invasive and costly imaging techniques, such as CT myelography.

Moreover, several studies in this review explored the use of ML models to predict surgical outcomes and complications in patients with OPLL [22,25,26]. One study employed a combination of LightGBM, XGBoost, and random forest models to predict clinically significant improvements following surgery for cervical OPLL; the models demonstrated good predictive ability, with an AUC of 0.75 for the random forest model at the 2-year follow-up [21]. Such predictive models could be highly beneficial in clinical practice, aiding surgeons in preoperative planning and patient counseling. By identifying key prognostic factors, these models can help clinicians better stratify surgical risk, optimize patient selection for surgical interventions, and develop personalized treatment plans [21].
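The tabular outcome-prediction workflow described above can be sketched as follows, using a random forest and a hold-out AUC; the features and labels are synthetic and purely illustrative of the approach, not a reanalysis of any study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 400
# Hypothetical predictors (e.g., preoperative JOA score, symptom duration, age).
X = np.column_stack([
    rng.normal(12, 3, n),          # preoperative JOA score (synthetic)
    rng.exponential(24, n),        # symptom duration in months (synthetic)
    rng.integers(40, 80, n),       # age in years (synthetic)
])
# Synthetic binary outcome: clinically significant improvement after surgery.
y = (X[:, 0] + rng.normal(0, 3, n) > 12).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"hold-out AUC: {auc:.2f}")
```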

Nevertheless, several limitations and challenges must be resolved before these models can be widely adopted in clinical practice. A major drawback of the current literature is the predominance of single-center retrospective studies, which may introduce biases related to patient selection and data heterogeneity. Furthermore, most studies have been conducted within East Asian populations, which limits the generalizability of the findings to other demographic groups. The ML and DL models used, along with the performance metrics reported across studies, vary considerably, which hinders identification of the most effective approach for OPLL detection. This heterogeneity highlights the need for standardized methodologies and performance metrics in future research.

Another critical challenge is the interpretability of ML and DL models. Although these models can achieve high levels of diagnostic accuracy, their decision-making processes are often opaque, making it challenging for clinicians to understand how specific diagnoses are determined. This lack of transparency can hinder the acceptance of these tools in clinical settings, where explainability is essential for ensuring trust and facilitating shared decision-making between clinicians and patients. To improve the clinical applicability of these models, future investigations should focus on developing more interpretable algorithms, possibly by using visual explanation techniques, such as Grad-CAM, or incorporating hybrid models that combine DL with traditional statistical methods [17].
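As an illustration of such visual-explanation techniques, the following minimal Grad-CAM sketch (assuming a generic torchvision ResNet backbone) produces a heatmap over an input image; it does not reproduce the pipeline of any included study.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()      # generic backbone for illustration
target_layer = model.layer4[-1]                   # last convolutional stage

activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)                   # stand-in for a preprocessed radiograph
score = model(x)[0].max()                         # logit of the top-scoring class
score.backward()                                  # gradients flow back to the target layer

weights = gradients["g"].mean(dim=(2, 3), keepdim=True)               # pooled gradients
cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))   # weighted feature maps
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)              # normalize to [0, 1]
print(cam.shape)                                   # torch.Size([1, 1, 224, 224])
```

The normalized map can then be overlaid on the radiograph so that clinicians can see which regions drove the model's prediction, which is the interpretability benefit discussed above.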

The integration of ML and DL models into clinical practice also poses significant logistical and technical challenges. Their implementation requires a robust infrastructure, including high-quality annotated imaging datasets and computational resources for model training and validation [11,26,30]. Moreover, healthcare providers must be trained for effective use of these tools, and clinical workflows may need to be adapted to incorporate automated diagnostic support. Addressing these challenges will require collaboration among clinicians, data scientists, and policymakers to develop practical strategies for the deployment and maintenance of ML and DL tools in healthcare settings [11]. To facilitate the transition of ML and DL models for OPLL detection into clinical practice, several critical aspects must be addressed, such as acquisition of regulatory approval, integration with hospital systems, and development of practical implementation strategies. Compliance with local and international regulations, such as the U.S. Food and Drug Administration guidelines and the European Union’s Medical Device Regulation, is essential to ensure the safety, performance, and explainability of AI tools. Seamless integration with existing hospital infrastructures, including compatibility with electronic health record (EHR) systems and radiological workflows and compliance with data protection laws, is necessary for successful adoption. Practical strategies, such as phased implementation, clinician training, and continuous feedback mechanisms, can help mitigate implementation challenges. Moreover, cost–benefit analyses and stakeholder engagement are crucial to overcoming barriers such as high initial costs and resistance to new technologies. By addressing these aspects, ML and DL models can significantly enhance the diagnostic accuracy and efficiency of OPLL detection and ensure their safe and reliable deployment in clinical settings.

ML and DL models offer considerable advantages in OPLL diagnosis and management. These models improve diagnostic accuracy by facilitating earlier and more reliable identification of OPLL, which reduces diagnostic errors and improves patient outcomes. They also streamline clinical workflows by automating time-consuming diagnostic processes, thereby shortening the time required for image interpretation and decision-making. Furthermore, ML and DL models improve surgical planning by identifying patient-specific risks and predicting surgical outcomes, leading to better resource allocation and increased patient satisfaction. However, several challenges must be addressed before fully integrating these technologies into clinical practice. ML and DL systems require substantial initial investment, including the costs of acquiring advanced computational hardware, integrating these systems into existing hospital infrastructures, and ensuring compatibility with electronic health record systems. Comprehensive training programs for healthcare professionals, such as radiologists and spine surgeons, are essential to maximize the utility of these models, requiring both time and financial resources. In addition, ongoing costs for system maintenance, software updates, and quality assurance processes add to the long-term financial burden. Despite these challenges, the economic implications of ML and DL integration are encouraging. In high-volume centers, the initial costs can be justified by the long-term benefits, such as improved diagnostic efficiency and optimized surgical planning. Cost–benefit analyses highlight the financial benefits of these systems in institutions attending to large patient caseloads, where decreases in diagnostic errors and resource utilization translate into significant savings. Over time, ML and DL models may also reduce overall healthcare costs by optimizing workflows and decreasing reliance on expensive imaging modalities, offering a sustainable solution to enhancing patient care.

Future studies should focus on large-scale multicenter investigations to validate the diagnostic performance of these models across diverse populations and clinical settings. Such studies should strive to standardize imaging protocols and data preprocessing methods to improve the comparability and reproducibility of the results. Moreover, prospective studies are needed to assess the effects of ML and DL models on clinical outcomes, including diagnostic accuracy, treatment decision-making, and patient satisfaction. Evaluating the cost-effectiveness of these models will be crucial for their wider acceptance, as healthcare systems increasingly prioritize interventions that offer high value relative to their costs.

Conclusions

ML and DL models show significant promise for OPLL detection and management, with potential applications ranging from screening to surgical planning. However, various challenges related to data availability, model interpretability, and clinical integration must be addressed before these tools can be widely implemented. Future investigations should prioritize extensive validation studies and the development of clear, user-friendly models that can seamlessly integrate into clinical practice, ultimately improving the care of patients with OPLL.

Key Points

  • Machine learning (ML) and deep learning (DL) models demonstrated high diagnostic accuracy for detecting cervical ossification of the posterior longitudinal ligament, with accuracy rates ranging from 69.6% to 98.9% and area under the curve values up to 0.99.

  • Convolutional neural networks and random forest models were the most frequently used ML/DL approaches. These models utilized imaging modalities such as plain radiography, computed tomography, and magnetic resonance imaging.

  • ML/DL models have the potential to enhance diagnostic accuracy, reduce reliance on invasive imaging methods, and support clinical decision-making in primary care, emergency, and surgical planning settings.

  • Standardization of methodologies, large-scale multicenter validation studies, and improved model interpretability are critical future directions for the integration of ML/DL tools into clinical practice.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

The authors would like to thank the Thailand Science Research and Innovation Fund (Fundamental Fund 2025, Grant No. 5025/2567) and the School of Medicine, University of Phayao.

Author Contributions

Conceptualization: WL. Methodology: WL. Data curation: WL, STC, PS, WC, NT, PT, IH. Formal analysis: WL, WC. Visualization: WL, STC, PS, WC, NT, PT, IH. Project administration: WL. Writing–original draft preparation: WL. Writing–review and editing: WL. Supervision: WL. Final approval of the manuscript: all authors.

References

1. Tsai YJ, Doyle A. Prevalence of ossification of the posterior longitudinal ligament (OPLL) in the Pacific populations in Auckland, New Zealand: a retrospective multicentre study. J Med Imaging Radiat Oncol 2024;68:641–4.
2. Chang H, Kong CG, Won HY, Kim JH, Park JB. Inter- and intra-observer variability of a cervical OPLL classification using reconstructed CT images. Clin Orthop Surg 2010;2:8–12.
3. Zhang B, Zhang Y, Ma B, et al. Does surgical treatment increase the progression of spinal cord injury in patients with ossification of posterior longitudinal ligament of cervical spine?: a systematic review and meta-analysis. J Orthop Surg (Hong Kong) 2021;29:2309499020981782.
4. Yoshii T, Egawa S, Hirai T, et al. A systematic review and meta-analysis comparing anterior decompression with fusion and posterior laminoplasty for cervical ossification of the posterior longitudinal ligament. J Orthop Sci 2020;25:58–65.
5. Yang H, Yang L, Chen D, Wang X, Lu X, Yuan W. Implications of different patterns of “double-layer sign” in cervical ossification of the posterior longitudinal ligament. Eur Spine J 2015;24:1631–9.
6. Xing D, Wang J, Ma JX, et al. Qualitative evidence from a systematic review of prognostic predictors for surgical outcomes following cervical ossification of the posterior longitudinal ligament. J Clin Neurosci 2013;20:625–33.
7. Shemesh S, Kimchi G, Yaniv G, Harel R. MRI-based detection of cervical ossification of the posterior longitudinal ligament using a novel automated machine learning diagnostic tool. Neurosurg Focus 2023;54:E11.
8. Wang S, Song H, Xu X, et al. The CT classification of multilevel cervical ossification of the posterior longitudinal ligament to guide hybrid anterior controllable antedisplacement and fusion vs. posterior laminoplasty. Orthop Surg 2024;16:1571–80.
9. Alaa H, Tung NT, Ueno T, et al. Importance of gap evaluation in the ossification of posterior longitudinal ligament lesions using 3-dimensional computed tomography. Spine J 2025;25:69–79.
10. Liawrungrueang W, Kim P, Kotheeranurak V, Jitpakdee K, Sarasombath P. Automatic detection, classification, and grading of lumbar intervertebral disc degeneration using an artificial neural network model. Diagnostics (Basel) 2023;13:663.
11. Liawrungrueang W, Cho ST, Sarasombath P, Kim I, Kim JH. Current trends in artificial intelligence-assisted spine surgery: a systematic review. Asian Spine J 2024;18:146–57.
12. Liawrungrueang W, Cho ST, Kotheeranurak V, Pun A, Jitpakdee K, Sarasombath P. Artificial neural networks for the detection of odontoid fractures using the Konstanz Information Miner Analytics Platform. Asian Spine J 2024;18:407–14.
13. Liawrungrueang W, Park JB, Cholamjiak W, Sarasombath P, Riew KD. Artificial intelligence-assisted MRI diagnosis in lumbar degenerative disc disease: a systematic review. Global Spine J 2024 Aug 15 [Epub]. https://doi.org/10.1177/21925682241274372 .
14. Liawrungrueang W, Cho ST, Kotheeranurak V, Jitpakdee K, Kim P, Sarasombath P. Osteoporotic vertebral compression fracture (OVCF) detection using artificial neural networks model based on the AO spine-DGOU osteoporotic fracture classification system. N Am Spine Soc J 2024;19:100515.
15. Liawrungrueang W, Han I, Cholamjiak W, Sarasombath P, Riew KD. Artificial intelligence detection of cervical spine fractures using convolutional neural network models. Neurospine 2024;21:833–41.
16. Murata K, Endo K, Aihara T, et al. Use of residual neural network for the detection of ossification of the posterior longitudinal ligament on plain cervical radiography. Eur Spine J 2021;30:2185–90.
17. Miura M, Maki S, Miura K, et al. Automated detection of cervical ossification of the posterior longitudinal ligament in plain lateral radiographs of the cervical spine using a convolutional neural network. Sci Rep 2021;11:12702.
18. Li KY, Weng JJ, Li HL, Ye HB, Xiang JW, Tian NF. Development of a deep-learning model for diagnosing lumbar spinal stenosis based on CT images. Spine (Phila Pa 1976) 2024;49:884–91.
19. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009;151:264–9.
20. Sterne JA, Hernán MA, Reeves BC, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 2016;355:i4919.
21. Maki S, Furuya T, Yoshii T, et al. Machine learning approach in predicting clinically significant improvements after surgery in patients with cervical ossification of the posterior longitudinal ligament. Spine (Phila Pa 1976) 2021;46:1683–9.
22. Tamai K, Terai H, Hoshino M, et al. A deep learning algorithm to identify cervical ossification of posterior longitudinal ligaments on radiography. Sci Rep 2022;12:2113.
23. Kim SH, Lee SH, Shin DA. Could machine learning better predict postoperative C5 palsy of cervical ossification of the posterior longitudinal ligament? Clin Spine Surg 2022;35:E419–25.
24. Ogawa T, Yoshii T, Oyama J, et al. Detecting ossification of the posterior longitudinal ligament on plain radiographs using a deep convolutional neural network: a pilot study. Spine J 2022;22:934–40.
25. Chae HD, Hong SH, Yeoh HJ, et al. Improved diagnostic performance of plain radiography for cervical ossification of the posterior longitudinal ligament using deep learning. PLoS One 2022;17:e0267643.
26. Ito S, Nakashima H, Yoshii T, et al. Deep learning-based prediction model for postoperative complications of cervical posterior longitudinal ligament ossification. Eur Spine J 2023;32:3797–806.
27. Zhu J, Lu Q, Zhan X, et al. To infer the probability of cervical ossification of the posterior longitudinal ligament and explore its impact on cervical surgery. Sci Rep 2023;13:9816.
28. Qu Z, Deng B, Sun W, Yang R, Feng H. A convolutional neural network for automated detection of cervical ossification of the posterior longitudinal ligament using magnetic resonance imaging. Clin Spine Surg 2024;37:E106–12.
29. Fukutake K, Ishiwatari T, Takahashi H, et al. Investigation of ossification in the posterior longitudinal ligament using micro-focus X-ray CT scanning and histological examination. Diagn Pathol 2015;10:205.
30. Yao YC, Lin CL, Chen HH, et al. Development and validation of deep learning models for identifying the brand of pedicle screws on plain spine radiographs. JOR Spine 2024;7:e70001.


Table 1

Demographics and characteristics of the included studies

No. Author(s) Year Country Study type Journal Objective of study
1 Murata et al. [16] 2021 Japan Retrospective European Spine Journal Analyzing an automated OPLL diagnosis algorithm with RNN for the screening of OPLL
2 Maki et al. [21] 2021 Japan Retrospective Spine Create an ML prognostic model for surgical outcomes in patients with cervical OPLL.
3 Miura et al. [17] 2021 Japan Retrospective Scientific Reports Evaluate the ability of a CNN to differentially diagnose cervical spondylosis and cervical OPLL.
4 Tamai et al. [22] 2022 Japan Retrospective Scientific Reports Validate the diagnostic yield of a deep learning algorithm that diagnoses the presence/absence of OPLL on cervical radiography and highlights areas of ossification in positive cases, and compare its diagnostic accuracy with that of experienced spine physicians.
5 Kim et al. [23] 2022 South Korea Retrospective Clinical Spine Surgery Investigate whether ML can perform better than a conventional logistic regression in predicting postoperative C5 palsy in patients with cervical OPLL.
6 Ogawa et al. [24] 2022 Japan Retrospective (diagnostic) The Spine Journal Evaluate the performance of a CNN model for diagnosing cervical OPLL.
7 Chae et al. [25] 2022 South Korea Retrospective PLoS One Investigate whether DL can improve the diagnostic performance of radiologists for cervical OPLL using plain radiographs.
8 Ito et al. [26] 2023 Japan Prospective (multicenter) European Spine Journal Create a DLM to predict postoperative complications in patients with cervical OPLL.
9 Shemesh et al. [7] 2023 Israel Retrospective Neurosurgical Focus Develop AI software and a validated model for identifying and representing cervical OPLL on MRI, obviating the need for spine CT.
10 Zhu et al. [27] 2023 China Retrospective Scientific Reports Predict the incidence of cervical OPLL and explore the postoperative differences between OPLL and other cervical spine surgery patients.
11 Qu et al. [28] 2024 China Retrospective Clinical Spine Surgery Develop and validate a CNN model to distinguish between cervical OPLL and multilevel degenerative spinal stenosis using MRI and to compare diagnostic ability with spine surgeons.

OPLL, ossification of the posterior longitudinal ligament; RNN, residual neural network; ML, machine learning; CNN, convolutional neural network; DL, deep learning; DLM, deep learning model; AI, artificial intelligence; MRI, magnetic resonance imaging; CT, computed tomography.

Table 2

Risk of bias analysis using the ROBINS-I tool for the included nonrandomized studies

Study no. Authors Confounding Selection of participants Classification of interventions Deviations from intended interventions Missing data Measurement of outcomes Selection of reported results Overall risk of bias
1 Murata et al. [16] (2021) Moderate Low Low Moderate Serious Moderate Low Serious
2 Maki et al. [21] (2021) Moderate Low Low Low Low Moderate Low Moderate
3 Miura et al. [17] (2021) Moderate Low Low Low Low Moderate Low Moderate
4 Tamai et al. [22] (2022) Moderate Low Low Low Low Moderate Low Moderate
5 Kim et al. [23] (2022) Moderate Low Low Low Low Moderate Low Moderate
6 Ogawa et al. [24] (2022) Moderate Low Low Low Low Moderate Low Moderate
7 Chae et al. [25] (2022) Moderate Low Low Low Low Moderate Low Moderate
8 Ito et al. [26] (2023) Moderate Low Low Low Low Moderate Low Moderate
9 Shemesh et al. [7] (2023) Moderate Serious Low Low Moderate Moderate Low Serious
10 Zhu et al. [27] (2023) Moderate Low Low Low Moderate Moderate Low Moderate
11 Qu et al. [28] (2024) Moderate Low Low Low Moderate Moderate Low Moderate

ROBINS-I, Risk of Bias in Nonrandomized Studies of Interventions.

Table 3

Performance of artificial neural network models in cervical ossification of the posterior longitudinal ligament detection and clinical implications

No. Author(s) Year Sample size Imaging technique Method/model used Accuracy Sensitivity Specificity AUC F1 score PPV NPV Reference standard Findings Limitations Clinical implications
1 Murata et al. [16] 2021 672 patients (2,318 images) Cervical lateral X-ray ResNet12 98.9% 97.0% 99.4% 0.99 NA NA NA Expert radiologist-confirmed OPLL diagnosis based on X-ray imaging High diagnostic accuracy for OPLL, useful for clinical screening Limited to continuous/mixed OPLL cases; no clinical assessment RNN could help in early OPLL detection, reducing the need for advanced imaging.
2 Maki et al. [21] 2021 478 patients (393 completed 1-year follow-up, 370 at 2-year follow-up) MRI, radiography RF, XGBoost, LightGBM 69.6% 88.4% 39.1% 0.75 NA NA NA Consensus diagnosis by three spine surgeons using MRI and radiography ML models identified key predictors such as preoperative JOA score and symptom duration. Internal validation only; small external cohort ML models could aid in surgical planning and patient counseling.
3 Miura et al. [17] 2021 680 patients (250 OPLL, 250 spondylosis, 180 normal) Plain lateral cervical radiographs EfficientNetB4 CNN 86% 86% NA NA 87% 87% NA Expert consensus diagnosis using radiographic findings CNN performance equal to or superior to spine surgeons Limited generalizability due to case distribution, lack of visual explanations Potential as a screening tool for OPLL, assisting non-experts in identifying cases.
4 Tamai et al. [22] 2022 486 patients (243 OPLL, 243 controls) Cervical radiography, CT EfficientNetB2 CNN 88% 90% 86% 0.94 NA NA NA Spine surgeon-confirmed diagnosis based on radiographic findings The CNN model had significantly higher diagnostic accuracy than spine surgeons. Limited generalizability to non-Japanese populations, inability to distinguish small OPLL from osteophytes CNN models could assist in identifying OPLL in radiographs, reducing dependence on CT.
5 Kim et al. [23] 2022 901 patients (26 with C5 palsy) MRI, cervical radiography Adaptive Reinforcement Learning (ADA) with downsampling 94.3% 80% 83.53% 0.88 NA 12.5% 99.3% Diagnosis based on imaging and postoperative outcomes validated by spine surgeons Best ML model (ADA) outperformed logistic regression in predicting postoperative C5 palsy. Small C5 palsy group; potential overfitting; no external validation ML models can identify high-risk patients for C5 palsy, aiding in surgical planning.
6 Ogawa et al. [24] 2022 100 patients (50 OPLL, 50 control) Plain lateral cervical radiographs VGG16 CNN 90% 80% 100% 0.924 NA NA NA CT-confirmed OPLL diagnosis based on radiographs CNN outperformed spine surgeons in accuracy and specificity Small sample size, exclusion of surgical patients, limited OPLL lesion visualization due to overlapping CNN could aid radiologists in improving OPLL detection on radiographs.
7 Chae et al. [25] 2022 407 patients (207 OPLL, 200 controls) Cervical radiography, CT Residual U-Net with atrous spatial pooling NA 91% 69% 0.851 NA NA NA CT-confirmed OPLL diagnosis based on radiographs DL model enhanced diagnostic performance of radiologists in detecting cervical OPLL Small sample size; no external validation; segmental-type OPLL challenging to detect DL models can serve as secondary readers, improving diagnostic confidence and accuracy.
8 Ito et al. [26] 2023 478 patients CT, MRI DLM 74.6% (all); 91.7% (neurological) NA NA 0.917 (neurological) NA NA NA CT-confirmed OPLL diagnosis based on radiographs DLM predicted neurological complications with higher accuracy than logistic regression. Single-database validation; cohort limited to Japanese population DLM aids in surgical planning and patient counseling by identifying complication risks.
9 Shemesh et al. [7] 2023 900 patients (65 cervical OPLL) MRI, CT VGG16 CNN 98% 85% 98% 0.917 NA 85% 98% MRI-based AI system validated against CT findings MRI-based model demonstrated high diagnostic accuracy Single-center study, false positives due to ossified bulging discs MRI-based AI models can reduce reliance on CT scans, aiding surgical decision-making.
10 Zhu et al. [27] 2023 775 patients (144 OPLL) CT, MRI, radiography Random forest, SVM, GLM, XGB, nomogram NA NA NA 0.76 (training), 0.728 (validation) NA NA NA Multidisciplinary spine surgeon consensus diagnosis using imaging ML model predicted cervical OPLL and postoperative differences with moderate accuracy. Single-center study; reliance on internal validation; no comparison with external cohorts ML models can assist in early detection of OPLL, aiding in optimizing surgical planning.
11 Qu et al. [28] 2024 684 patients (272 OPLL, 412 controls) MRI (T1WI, T2WI sagittal) ResNet34, ResNet50, ResNet101 92.98%–97.66% 83.82%–94.12% 99.03%–100% 0.914–0.971 NA NA NA Diagnosis confirmed by spine surgeons using MRI and clinical records ResNet101 had the highest diagnostic accuracy, sensitivity, and specificity compared to spine surgeons. Single-center study, no inclusion of axial images, no evaluation for small OPLL lesions CNN models significantly improve diagnostic accuracy for OPLL using MRI.

AUC, area under the curve; F1 score, harmonic mean of precision and recall; PPV, positive predictive value; NPV, negative predictive value; NA, not applicable; OPLL, ossification of the posterior longitudinal ligament; RNN, residual neural network; MRI, magnetic resonance imaging; RF, random forest; ML, machine learning; JOA, Japanese Orthopaedic Association; CNN, convolutional neural network; CT, computed tomography; DL, deep learning; DLM, deep learning model; AI, artificial intelligence; SVM, support vector machine; GLM, generalized linear model; XGB, XGBoost (extreme gradient boosting); T1WI, T1-weighted imaging; T2WI, T2-weighted imaging.