Development and internal validation of machine-learning models for predicting survival in patients who underwent surgery for spinal metastases

Article information

Asian Spine J. 2024;.asj.2023.0314
Publication date (electronic) : 2024 May 20
doi : https://doi.org/10.31616/asj.2023.0314
1Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
2Siriraj Informatics and Data Innovation Center, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
3Department of Orthopaedic Surgery, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok, Thailand
Corresponding author: Panya Luksanapruksa Division of Spine Surgery, Department of Orthopedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wanglang Road, Bangkoknoi, Bangkok 10700, Thailand Tel: +66-2-419-7969; Fax: +66-2-419-7961; E-mail: panya.luk@mahidol.ac.th; cutecarg@yahoo.com
Received 2023 September 19; Revised 2024 January 17; Accepted 2024 January 23.

Abstract

Study Design

A retrospective study.

Purpose

This study aimed to develop machine-learning algorithms for predicting survival in patients who underwent surgery for spinal metastasis.

Overview of Literature

This study develops machine-learning models to predict postoperative survival in patients with spinal metastasis, addressing gaps in traditional prognostic systems. Using data from 389 patients, the study highlights the effectiveness of the XGBoost and CatBoost algorithms for 90-, 180-, and 365-day survival prediction, with preoperative serum albumin as a key predictor. These models offer a promising approach for enhancing clinical decision-making and personalized patient care.

Methods

A registry of patients who underwent surgery (instrumentation, decompression, or fusion) for spinal metastases between 2004 and 2018 was used. The outcome measure was survival at postoperative days 90, 180, and 365. Preoperative variables were used to develop machine-learning algorithms to predict the probability of survival at each time point. The performance of the algorithms was measured using the area under the receiver operating characteristic curve (AUC).

Results

A total of 389 patients were identified, with 90-, 180-, and 365-day mortality rates of 18%, 41%, and 45% postoperatively, respectively. The XGBoost algorithm showed the best performance for predicting 180-day and 365-day survival (AUCs of 0.744 and 0.693, respectively). The CatBoost algorithm demonstrated the best performance for predicting 90-day survival (AUC of 0.758). Serum albumin had the highest positive correlation with survival after surgery.

Conclusions

These machine-learning algorithms showed promising results in predicting survival in patients who underwent palliative spinal surgery for spinal metastasis, which may assist surgeons in choosing appropriate treatment and increase awareness of mortality-related factors before surgery.

Introduction

Cancer is among the leading causes of death worldwide. According to estimates from the World Health Organization in 2019, cancer is the first or second leading cause of death before the age of 70 years in 112 of 183 countries [1]. The spine is one of the most common sites of bone metastases, with a prevalence of up to 50% [2], and spinal metastases occur in 5%–10% of patients with cancer [3].

Improvements in cancer treatment have prolonged the survival of patients with metastatic disease and, with it, increased the incidence of metastatic epidural spinal cord compression (MESCC). Surgical intervention is often undertaken to reduce pain, stabilize the spine, or address neurologic deficits, and has shown better results than conservative treatment [4,5]. In particular, surgery combined with radiotherapy is superior to radiotherapy alone, especially in patients with MESCC [6,7].

Estimating life expectancy is an important part of the systemic assessment in the neurologic, oncologic, mechanical, and systemic (NOMS) framework for patients with spinal metastasis [8] and helps guide decisions for or against surgical intervention [9]. Because appropriate surgical strategies depend on the estimated postoperative survival, many prognostic scoring systems have been reported, such as the revised Tokuhashi score [10], Bauer score [11], Tomita score [12], and Skeletal Oncology Research Group (SORG) algorithm [13]. However, these tools have been reported to lose accuracy over time and to underestimate survival, because each scoring system was developed in a specific study population [14,15] and because of changes in disease and patient characteristics, improved results of concurrent therapy, and mortality- and survival-related factors that these tools do not include [16-18].

To improve the accuracy of prediction, this study aimed to develop predictive tools using machine learning. These tools consider several mortality-related factors, including patient characteristics, disease characteristics, laboratory investigations, and nonsurgical interventions. Because machine-learning models can capture complex, nonlinear relationships among many variables and can be retrained as new data accumulate, they may handle such heterogeneous data more effectively than conventional scoring systems.

Materials and Methods

Guideline

This study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines [19].

Source of data

Data were obtained from the medical database of the Department of Orthopedic Surgery. Consecutive patients who underwent spinal surgery between January 2004 and December 2018 were retrospectively identified.

Participants

Consecutive patients who underwent surgery (spinal decompression, fusion, or instrumentation) for spinal metastases between January 2004 and December 2018 were enrolled. The inclusion criteria were as follows: (1) a diagnosis of spinal metastasis, identified using International Classification of Diseases, 10th revision, Thai Modification (ICD-10-TM) codes C79.5 and C79.8; (2) age ≥18 years; and (3) a history of surgery for spinal metastasis, identified using International Classification of Diseases, 9th revision, Clinical Modification with extension codes (ICD-9-CM) procedure codes 03.0, 03.4, 03.09, 81.0, and 81.00–81.08.
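As an illustration only, the three inclusion criteria can be expressed as a filter over admission records. The sketch below assumes a hypothetical record layout (`Admission`, with diagnosis and procedure codes stored as dotted strings) and exact matching against the listed codes, with 81.00–81.08 expanded explicitly; it is not the query actually run against the departmental database.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# ICD-10-TM diagnosis codes for spinal metastasis (inclusion criterion 1)
DX_CODES = {"C79.5", "C79.8"}
# ICD-9-CM procedure codes (inclusion criterion 3); 81.00-81.08 expanded explicitly
PROC_CODES = {"03.0", "03.4", "03.09", "81.0"} | {f"81.0{i}" for i in range(9)}


@dataclass
class Admission:
    """Hypothetical admission record; field names are illustrative only."""
    patient_id: str
    age: int
    dx_codes: set[str] = field(default_factory=set)
    proc_codes: set[str] = field(default_factory=set)


def is_eligible(rec: Admission) -> bool:
    """Apply the three inclusion criteria using exact code matching."""
    has_metastasis_dx = bool(rec.dx_codes & DX_CODES)
    is_adult = rec.age >= 18
    had_spine_surgery = bool(rec.proc_codes & PROC_CODES)
    return has_metastasis_dx and is_adult and had_spine_surgery


# Usage with invented records: only "A" meets all three criteria
records = [
    Admission("A", 62, {"C79.5"}, {"81.05"}),
    Admission("B", 17, {"C79.5"}, {"03.09"}),  # excluded: age < 18
    Admission("C", 70, {"C34.9"}, {"81.05"}),  # excluded: no spinal metastasis code
]
cohort = [r.patient_id for r in records if is_eligible(r)]
print(cohort)  # ['A']
```

If a database stores only category-level codes, prefix matching would be needed instead of the exact set lookup assumed here.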

Ethics approval

This study was approved by the Siriraj Institutional Review Board (SIRB) (COA no. Si401/2020; SIRB protocol no. 195/2563(IRB1)). Informed consent was obtained from all individual participants included in the study.

Outcome and predictive variables

The primary outcome of this study was survival at postoperative days 90, 180, and 365. In selecting preoperative predictors, we drew on established scoring systems [10-13,20,21] and on recent studies that have highlighted prognostic factors for patients with spinal metastasis [16,18,22]. The previously reported factors are shown in Fig. 1. Our goal was to construct a predictive framework that not only aligns with recognized scoring models but also integrates emerging survival-related factors, some of which have traditionally been overlooked.
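The three binary outcomes can be derived from follow-up data roughly as in the sketch below. It assumes hypothetical inputs (days from surgery to death or last follow-up, plus a death indicator); treating patients lost to follow-up before a time point as unlabeled (`None`) is an assumption, because the text does not state how such cases were handled.

```python
from typing import Optional

HORIZONS = (90, 180, 365)  # postoperative days


def survival_label(days: int, died: bool, horizon: int) -> Optional[int]:
    """Binary survival outcome at `horizon` days after surgery.

    days: days from surgery to death (if died) or to last follow-up (if alive).
    Returns 1 if alive at `horizon`, 0 if died on or before it,
    and None if follow-up ended earlier without death (assumed unlabeled).
    """
    if died and days <= horizon:
        return 0
    if days >= horizon:
        return 1
    return None


# Usage with invented patients
died_day_120 = [survival_label(120, True, h) for h in HORIZONS]
censored_day_200 = [survival_label(200, False, h) for h in HORIZONS]
print(died_day_120)      # [1, 0, 0]
print(censored_day_200)  # [1, 1, None]
```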

Fig. 1.

Categorized factors associated with survival in previous studies. SORG, Skeletal Oncology Research Group.

Variables

These variables were categorized into four groups:

1) Patient characteristics

These included age (years), sex, body mass index (kg/m²), Charlson comorbidity index (CCI) in addition to metastatic cancer [23], American Society of Anesthesiologists (ASA) physical status classification, Frankel grade, Eastern Cooperative Oncology Group (ECOG) performance status, Karnofsky performance status (KPS), and smoking history. These variables were included because of their established relevance in previous scoring systems, such as the revised Tokuhashi score and SORG algorithms, and their reported effect on survival in recent literature.

2) Disease characteristics

These included the primary site of cancer (or unknown primary tumor), primary tumor histology, level of spinal metastases (cervical, thoracic, lumbar, sacral, or a combination of regions), presence of myelopathy, symptomatic neurological compression level (cervical, upper or lower thoracic, upper lumbar, or lower lumbar and sacral), presence of solid-organ metastasis, and time from clinical presentation (pain or neurological deficit) to surgery (days). These factors are critical because they directly reflect tumor biology and the patient’s clinical presentation, both of which can be significant indicators of survival.

3) Laboratory investigations

These included hemoglobin (g/dL), platelet–lymphocyte ratio, albumin (g/dL), alkaline phosphatase (ALP, IU/L), creatinine (mg/dL), and serum calcium (mg/dL). These markers reflect the patient’s physiological state and have been associated with postoperative survival.

4) Other interventions

Preoperative chemotherapy was included because of its potential effect on patient outcomes, reflecting both the current treatment landscape and recent findings on the interplay between systemic therapy and surgical intervention.

Missing data

The numbers (%) of missing values were as follows: CCI, 7 (1.8%); ECOG performance status, 8 (2%); ASA classification, 4 (1%); KPS, 4 (1%); Frankel grade classification, 7 (1.8%); exact time from clinical presentation to surgery, 133 (34.2%); body mass index, 87 (22.3%); hemoglobin, 13 (3.3%); albumin level, 28 (7%); ALP, 37 (9.5%); platelet–lymphocyte ratio, 77 (19.8%); serum calcium, 103 (26%); and primary tumor histology, 84 (22%).

Preprocessing

When preoperative data were missing, multiple imputation by chained equations was used [24]. To reduce the influence of differing units and scales, numerical variables were standardized to a mean of 0 and a standard deviation of 1, and categorical variables were dummy-encoded. Laboratory values lying more than 3 standard deviations from the mean value at our hospital were treated as outliers and removed.
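A minimal sketch of the outlier, scaling, and dummy-encoding steps follows, written with the Python standard library only. The function names and the reference mean and SD are hypothetical; dropping the first category level as the reference is an assumed convention, and the imputation step is not reproduced (a chained-equations imputer such as scikit-learn's experimental `IterativeImputer` is one option, not necessarily the tool used in this study).

```python
from __future__ import annotations

import statistics


def is_outlier(value: float, ref_mean: float, ref_sd: float, k: float = 3.0) -> bool:
    """True if a lab value lies more than k SDs from a reference (e.g., hospital-wide) mean."""
    return abs(value - ref_mean) > k * ref_sd


def standardize(values: list[float]) -> list[float]:
    """Z-score a numeric column to mean 0 and (population) SD 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def dummy_encode(values: list[str], drop_first: bool = True) -> dict[str, list[int]]:
    """0/1 indicator columns per category; the first sorted level is the reference if drop_first."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    return {level: [int(v == level) for v in values] for level in kept}


# Usage with invented values (reference albumin mean 3.8 g/dL and SD 0.6 g/dL are hypothetical)
albumin = [3.8, 4.1, 1.5, 3.5, 3.0]
kept = [a for a in albumin if not is_outlier(a, ref_mean=3.8, ref_sd=0.6)]
z = standardize(kept)
site = dummy_encode(["Breast", "Lung", "Unknown", "Lung"])
print(kept)  # [3.8, 4.1, 3.5, 3.0]
print(site)  # {'Lung': [0, 1, 0, 1], 'Unknown': [0, 0, 1, 0]}
```

In practice, scaling parameters are usually estimated on the training split only and then applied to the testing split, so that no information from the testing data leaks into training.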

Algorithm training and validation

The following algorithms were evaluated: extreme gradient boosting (XGBoost), logistic regression, linear discriminant analysis, random forest, naive Bayes, gradient boosting, quadratic discriminant analysis, AdaBoost, CatBoost (https://catboost.ai/), light gradient boosting machine (LightGBM), extra trees, k-nearest neighbors, and decision tree classifiers. All models were created with Python ver. 3.9 (Python Software Foundation, Wilmington, DE, USA) using the Scikit-learn library ver. 1.0.1 (https://scikit-learn.org/stable/whats_new/v1.0.html#version-1-0-1), released under the open-source simplified BSD (Berkeley Software Distribution) license.

The dataset was randomly divided into training and testing datasets at an 80:20 ratio. Within the training dataset, manual tuning, grid search, and random search were used to identify the hyperparameters that gave the highest accuracy in five-fold cross-validation of each model. A class-weighting strategy was also applied so that the trained models would give both outcome classes equal weight despite class imbalance.
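The splitting and weighting scheme described above can be sketched as follows. The 80:20 random split, five-fold partition of the training indices, and balanced class weights follow the text; the seed, the round-robin fold assignment, and the helper names are assumptions for illustration. The weight formula n / (n_classes × n_c) is the one scikit-learn uses for `class_weight="balanced"`; whether the study used exactly this weighting is not stated.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Iterator


def train_test_split_idx(n: int, test_frac: float = 0.2, seed: int = 42) -> tuple[list[int], list[int]]:
    """Random split of indices 0..n-1 into training and testing sets (80:20 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def kfold_idx(train_idx: list[int], k: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (fit, validate) index lists for k-fold cross-validation within the training set."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """w_c = n / (n_classes * n_c), as in scikit-learn's class_weight='balanced'."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {c: n / (n_classes * m) for c, m in counts.items()}


# Usage: 389 patients; 365-day outcome with 174 deaths (0) and 215 survivors (1)
train_idx, test_idx = train_test_split_idx(389)
folds = list(kfold_idx(train_idx))
weights = balanced_class_weights([0] * 174 + [1] * 215)
print(len(train_idx), len(test_idx))  # 311 78
print({c: round(w, 3) for c, w in weights.items()})  # {0: 1.118, 1: 0.905}
```

With these weights, the 174 deaths and 215 survivors each contribute a total weight of 194.5, so neither class dominates the training loss. A grid or random search would score each candidate hyperparameter setting on the five `(fit, validate)` pairs and keep the setting with the best mean score.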

The performance of the algorithms was evaluated on the testing dataset by comparing the area under the receiver operating characteristic (ROC) curve (AUC), F1-score, accuracy, and calibration loss among models [25,26]. Classification performance was also assessed using a confusion matrix, which compares the actual outcomes in the testing dataset with the predicted outcomes. Because threshold-dependent metrics trade off against one another (for example, precision against recall), the final model for deployment was selected using the AUC, which summarizes discrimination across all thresholds. The contribution of each variable to the prediction model was evaluated using Shapley Additive Explanations (SHAP) values. In the SHAP summary plots, each point represents the SHAP value of one feature for one prediction, and red and blue indicate higher and lower feature values, respectively.
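To make the evaluation metrics concrete, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann–Whitney formulation), together with confusion-matrix counts and the threshold-dependent metrics at a cutoff of 0.5. The toy labels and scores are invented for illustration and do not come from the study data.

```python
from __future__ import annotations


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true: list[int], scores: list[float], threshold: float = 0.5) -> dict[str, float]:
    """Confusion-matrix counts and threshold-dependent metrics at a given cutoff."""
    y_pred = [int(s >= threshold) for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}


# Invented toy data: 1 = outcome of interest, scores = predicted probabilities
y = [1, 1, 0, 0, 1, 0]
s = [0.9, 0.6, 0.55, 0.2, 0.4, 0.7]
print(round(roc_auc(y, s), 3))  # 0.667
m = confusion_metrics(y, s)
print(m["tp"], m["tn"], m["fp"], m["fn"])  # 2 1 2 1
```

Changing `threshold` moves cases between the confusion-matrix cells and changes precision, recall, and F1, whereas `roc_auc` depends only on how the scores rank the cases, which is why the AUC is a natural criterion for model selection.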

The best-performing model for each time point is described in detail in the Results.

Results

Participants

In total, 389 patients were identified; 71 (18%), 160 (41%), and 174 (45%) had died by postoperative days 90, 180, and 365, respectively. Of these patients, 167 (43%) were female. The median age was 57 years (interquartile range, 16.5 years). The baseline characteristics of the study population are shown in Table 1.

Characteristics of patients (n=389)

Model development and performance for survival prediction

1) 365-day survival prediction

Most algorithms achieved fair to good performance in five-fold cross-validation on the training dataset (ROC AUC, 0.632–0.731) (Table 2). XGBoost (https://xgboost.ai/) was chosen as the final model, with a testing-dataset AUC of 0.693, accuracy of 0.564, precision of 0.508, recall of 0.857, and F1 score of 0.638. Albumin was the most influential factor in 365-day survival prediction. Higher serum albumin, lower ALP, and a lower platelet–lymphocyte ratio were associated with a higher predicted probability of survival. The ROC curve of the XGBoost algorithm and the SHAP values are shown in Fig. 2.
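SHAP values are Shapley values of the model output. For intuition, the sketch below computes exact Shapley values by enumerating feature subsets, treating a feature outside a coalition as set to a baseline value. This is a simplified, exponential-time illustration: the SHAP library's tree explainers use different algorithms and background-data conventions, and the three-feature linear model below (loosely named after albumin, ALP, and platelet–lymphocyte ratio, with invented weights) is hypothetical, not the fitted XGBoost model.

```python
from __future__ import annotations

from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(f: Callable[[list[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model on standardized features [albumin, ALP, platelet-lymphocyte ratio];
# the weights are invented and do not come from the fitted models.
def toy_logit(z: list[float]) -> float:
    return 0.2 + 0.8 * z[0] - 0.5 * z[1] - 0.3 * z[2]


x = [1.0, -0.5, 0.4]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_logit, x, baseline)
print([round(p, 3) for p in phi])  # [0.8, 0.25, -0.12]
```

For a linear model each value reduces to weight × (value − baseline), and the values always sum to f(x) − f(baseline); this additivity is what lets a SHAP summary plot be read as a per-prediction decomposition.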

Comparison of the model performance in cross-validation of the dataset for 365-day survival prediction

Fig. 2.

Characteristics of XGBoost model for 365-day survival prediction: receiver operating characteristic (ROC) curve and Shapley Additive Explanations (SHAP) value summary graph and their impact on the prediction. AUC, area under the receiver operating characteristic curve.

2) 180-day survival prediction

Most algorithms achieved fair to good performance in five-fold cross-validation on the training dataset (ROC AUC, 0.654–0.726) (Table 3). XGBoost was chosen as the final model, with a testing-dataset AUC of 0.744, accuracy of 0.5, precision of 0.473, recall of 1, and F1 score of 0.642. Albumin was the most influential factor in 180-day survival prediction. Higher serum albumin, lower serum calcium, and higher hemoglobin levels were associated with a higher predicted probability of survival. The ROC curve of the XGBoost algorithm and the SHAP values are shown in Fig. 3.

Comparison of the model performance in cross-validation of the dataset for 180-day survival prediction

Fig. 3.

Characteristics of XGBoost model for 180-day survival prediction: receiver operating characteristic (ROC) curve and Shapley Additive Explanations (SHAP) value summary graph and their impact on the prediction. AUC, area under the receiver operating characteristic curve.

3) 90-day survival prediction

Most algorithms achieved fair to good performance in five-fold cross-validation on the training dataset (ROC AUC, 0.701–0.749) (Table 4). CatBoost was chosen as the final model, with a testing-dataset AUC of 0.758, accuracy of 0.705, precision of 0.658, recall of 0.714, and F1 score of 0.685. Albumin was the most influential factor in 90-day survival prediction. Higher serum albumin, lower ALP, and primary breast cancer were associated with a higher predicted probability of survival. The ROC curve of the CatBoost algorithm and the SHAP values are shown in Fig. 4.
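As a quick consistency check, the F1 scores reported for the three final models equal the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R), to three decimal places. The sketch below performs this arithmetic using only values stated in the Results.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) on the testing datasets, as stated in the Results
REPORTED = {
    "365-day XGBoost": (0.508, 0.857, 0.638),
    "180-day XGBoost": (0.473, 1.0, 0.642),
    "90-day CatBoost": (0.658, 0.714, 0.685),
}

for model, (p, r, reported_f1) in REPORTED.items():
    print(model, round(f1_score(p, r), 3), reported_f1)
```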

Comparison of the model performance in cross-validation of the dataset for 90-day survival prediction

Fig. 4.

Characteristics of CatBoost model for 90-day survival prediction: receiver operating characteristic (ROC) curve and Shapley Additive Explanations (SHAP) value summary graph and their impact on the prediction. AUC, area under the receiver operating characteristic curve.

Discussion

The primary goal of spinal metastasis surgery is palliative, focusing on preserving or improving the quality of life by controlling pain and maintaining mobility. The treatment of metastatic spinal cancers involves a multidisciplinary approach, including chemotherapy, radiotherapy, and surgery. Various decision-making systems have been developed to assist in choosing the most appropriate treatment for each patient.

An updated review of treatment strategies for spinal metastasis by Hong et al. [27] in 2022 classified decision-making systems as follows. First, classification-based prognostic models, such as the Tomita et al. [12], Tokuhashi et al. [10], Bauer and Wedin [11], and Katagiri et al. [20] scoring systems, estimate patient survival from prognostic factors such as the type of primary cancer and presence of visceral metastasis. Despite their widespread use, recent studies have highlighted their poor accuracy, partly because they cannot account for advances in cancer treatment [28-30]. This has prompted the development of “second-generation models,” and many previous studies have used machine-learning algorithms to develop decision-making systems with satisfactory accuracy in predicting survival [31,32]. Second, principle-based systems, such as the NOMS framework [8] and the LMNOP (location, mechanical instability, neurology, oncology, and patient’s factors) system [33], offer more specific treatment suggestions based on a patient’s oncologic, systemic, and functional states. They reflect advances in various treatments better than classification-based models do. Future decision-making systems for spinal metastasis are expected to incorporate multi-institutional data, consider tumor genetics, use novel methods such as artificial intelligence, and integrate prognostic and principle-based systems into a more comprehensive approach.

In this study, we developed tools for predicting survival in patients with spinal metastasis after palliative surgery. The variables were selected from previous predictive models of metastatic spine disease [10-13,20,21] and from other mortality-related factors in patients with spinal metastasis [16,18,22]. The models achieved satisfactory performance, with AUCs of 0.793, 0.726, and 0.731 for 90-, 180-, and 365-day survival prediction, respectively. XGBoost performed best in both cross-validation on the training dataset and evaluation on the testing dataset for 180- and 365-day survival prediction, whereas CatBoost performed best in both for 90-day survival prediction.

We grouped the variables of existing predictive scoring systems into four categories and found that the preoperative albumin level was the most important variable in survival prediction, consistent with previous studies [16,18]; however, albumin has not been included in the most widely cited predictive scoring systems [10-13,20,21]. Other important laboratory markers included the platelet–lymphocyte ratio, serum calcium, hemoglobin, serum creatinine, and ALP levels. KPS was also an important variable in 90-day survival prediction, consistent with previous reports [10,11,17,34].

In a meta-analysis study, Luksanapruksa et al. [18] identified 17 independent poor prognostic factors and categorized them into cancer-specific and nonspecific prognostic factors, such as KPS, time to develop motor deficit before treatment, ECOG performance status, sex, presence of visceral metastases, and primary tumor type. These factors were included as variables during model development in the present study. However, the primary tumor type appears to be one of the most influential factors reported in the present study, and this may have resulted from the moderate amount of missing data of the variables in our study.

This study showed that a higher albumin level and a lower platelet–lymphocyte ratio were associated with a higher probability of survival after surgery, which corresponds to the findings of Schoenfeld et al. [16], who reported that the platelet–lymphocyte ratio and serum albumin at presentation were significantly associated with survival and with 6-month and 1-year mortality.

The important variables can be divided into modifiable factors (albumin, serum calcium, and hemoglobin levels) and nonmodifiable factors. Correcting modifiable factors to appropriate levels before surgery may increase the probability of survival after palliative surgery.

Previous machine-learning algorithms for predicting survival in patients with spinal metastasis include the SORG algorithms [31], which have since undergone updated external validation [32,35,36]. Karhade et al. [31] reported that 181 (25.1%) and 385 (54.3%) patients died within 90 days and 1 year, respectively. The stochastic gradient boosting algorithm performed best for 90-day and 1-year mortality prediction, with testing-dataset AUCs of 0.83 and 0.89, respectively. External validation in several cohorts also showed AUCs of 0.726–0.84 for 90-day and 0.738–0.9 for 365-day survival prediction (Table 5).

Comparison of algorithms between studies

Notably, external validation in populations from other countries showed reduced performance of the 90- and 365-day predictive models. This may result from differences in patient characteristics, patient selection guidelines at each medical center, or treatment algorithms. Thus, a machine-learning model that is highly accurate in its development cohort may not achieve the same accuracy in other populations.

Although machine-learning predictive tools hold great promise, they should not be seen as replacements for clinical judgment and expertise. Instead, they should be viewed as complementary tools that can help clinicians make informed decisions. As with any predictive model, the results must be interpreted within the context of the individual patient’s unique circumstances and medical history.

Despite the promising results of this study and the potential benefits of machine-learning tools for predicting survival in patients with spinal metastasis, certain limitations need to be acknowledged. First, our data were obtained from a single academic medical center, which limited the number of patients included and may have reduced the performance of some machine-learning models. Second, several variables had missing data. This limitation is inherent to retrospective studies and could affect the accuracy of our predictive models; although multiple imputation by chained equations was applied, missing data still introduce some uncertainty. Machine-learning algorithms generally require large, diverse datasets to achieve optimal accuracy and generalizability, and the relatively limited sample size in this study may have prevented the models from reaching higher accuracy. Larger, multicenter datasets would likely yield more robust predictive models with greater accuracy and applicability to broader patient populations.

Conclusions

This study highlights the potential of machine-learning algorithms in predicting survival in patients who underwent palliative surgery for spinal metastasis. Despite the limitations of this study, it paves the way for future research in this area. As the machine-learning field continues to advance and more data become available, we anticipate even greater accuracy and utility of predictive tools for enhancing patient care and treatment outcomes in the realm of spinal metastasis.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Author Contributions

Conceptualization: BS, SW, PL. Data curation: BS, PI, SW. Formal analysis: KV, PC. Writing–original draft: BS, SW, PL. Writing–review & editing: BS, KV, PI, PC, SW, PL. Project administration: KV. Supervision: PL. All authors read and approved the final manuscript.

Acknowledgments

The authors gratefully acknowledge the patients who agreed to participate in this study and Miss Pinprapha Boonhyad of the Division of Research, Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University for assistance with statistical analysis, manuscript preparation, and journal submission process.

References

1. Sung H, Ferlay J, Siegel RL, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–49.
2. Aebi M. Spinal metastasis in the elderly. Eur Spine J 2003;12 Suppl 2:S202–13.
3. Yoshihara H, Yoneoka D. Trends in the surgical treatment for spinal metastasis and the in-hospital patient outcomes in the United States from 2000 to 2009. Spine J 2014;14:1844–9.
4. Schoenfeld AJ, Losina E, Ferrone ML, et al. Ambulatory status after surgical and nonsurgical treatment for spinal metastasis. Cancer 2019;125:2631–7.
5. Schoenfeld AJ, Schwab JH, Ferrone ML, et al. Non-operative management of spinal metastases: a prognostic model for failure. Clin Neurol Neurosurg 2020;188:105574.
6. Fehlings MG, Nater A, Tetreault L, et al. Survival and clinical outcomes in surgically treated patients with metastatic epidural spinal cord compression: results of the prospective multicenter AOSpine study. J Clin Oncol 2016;34:268–76.
7. Patchell RA, Tibbs PA, Regine WF, et al. Direct decompressive surgical resection in the treatment of spinal cord compression caused by metastatic cancer: a randomised trial. Lancet 2005;366:643–8.
8. Laufer I, Rubin DG, Lis E, et al. The NOMS framework: approach to the treatment of spinal metastatic tumors. Oncologist 2013;18:744–51.
9. Cassidy JT, Baker JF, Lenehan B. The role of prognostic scoring systems in assessing surgical candidacy for patients with vertebral metastasis: a narrative review. Global Spine J 2018;8:638–51.
10. Tokuhashi Y, Matsuzaki H, Oda H, Oshima M, Ryu J. A revised scoring system for preoperative evaluation of metastatic spine tumor prognosis. Spine (Phila Pa 1976) 2005;30:2186–91.
11. Bauer HC, Wedin R. Survival after surgery for spinal and extremity metastases: prognostication in 241 patients. Acta Orthop Scand 1995;66:143–6.
12. Tomita K, Kawahara N, Kobayashi T, Yoshida A, Murakami H, Akamaru T. Surgical strategy for spinal metastases. Spine (Phila Pa 1976) 2001;26:298–306.
13. Paulino Pereira NR, Janssen SJ, van Dijk E, et al. Development of a prognostic survival algorithm for patients with metastatic spine disease. J Bone Joint Surg Am 2016;98:1767–76.
14. Ahmed AK, Goodwin CR, Heravi A, et al. Predicting survival for metastatic spine disease: a comparison of nine scoring systems. Spine J 2018;18:1804–14.
15. Tabourel G, Terrier LM, Dubory A, et al. Are spine metastasis survival scoring systems outdated and do they underestimate life expectancy?: caution in surgical recommendation guidance. J Neurosurg Spine 2021;35:527–34.
16. Schoenfeld AJ, Ferrone ML, Passias PG, et al. Laboratory markers as useful prognostic measures for survival in patients with spinal metastases. Spine J 2020;20:5–13.
17. Tokuhashi Y, Uei H, Oshima M, Ajiro Y. Scoring system for prediction of metastatic spine tumor prognosis. World J Orthop 2014;5:262–71.
18. Luksanapruksa P, Buchowski JM, Hotchkiss W, Tongsai S, Wilartratsami S, Chotivichit A. Prognostic factors in patients with spinal metastasis: a systematic review and meta-analysis. Spine J 2017;17:689–708.
19. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594.
20. Katagiri H, Okada R, Takagi T, et al. New prognostic factors and scoring system for patients with skeletal metastasis. Cancer Med 2014;3:1359–67.
21. van der Linden YM, Dijkstra SP, Vonk EJ, Marijnen CA, Leer JW; Dutch Bone Metastasis Study Group. Prediction of survival in patients with metastases in the spinal column: results based on a randomized trial of radiotherapy. Cancer 2005;103:320–8.
22. Arrigo RT, Kalanithi P, Cheng I, et al. Predictors of survival after surgical treatment of spinal metastasis. Neurosurgery 2011;68:674–81.
23. Quan H, Li B, Couris CM, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol 2011;173:676–82.
24. Laqueur HS, Shev AB, Kagawa RM. SuperMICE: an ensemble machine learning approach to multiple imputation by chained equations. Am J Epidemiol 2022;191:516–25.
25. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J 2014;35:1925–31.
26. Van Calster B, Vickers AJ. Calibration of risk prediction models: impact on decision-analytic performance. Med Decis Making 2015;35:162–9.
27. Hong SH, Chang BS, Kim H, Kang DH, Chang SY. An updated review on the treatment strategy for spinal metastasis from the spine surgeon’s perspective. Asian Spine J 2022;16:799–811.
28. Nakajima H, Watanabe S, Honjoh K, et al. Prognosis after palliative surgery for patients with spinal metastasis: comparison of predicted and actual survival. Cancers (Basel) 2022;14:3868.
29. Hessler C, Vettorazzi E, Madert J, Bokemeyer C, Panse J. Actual and predicted survival time of patients with spinal metastases of lung cancer: evaluation of the robustness of the Tokuhashi score. Spine (Phila Pa 1976) 2011;36:983–9.
30. Nater A, Tetreault LA, Kopjar B, et al. Predictive factors of survival in a surgical series of metastatic epidural spinal cord compression and complete external validation of 8 multivariate models of survival in a prospective North American multicenter study. Cancer 2018;124:3536–50.
31. Karhade AV, Thio QC, Ogink PT, et al. Predicting 90-day and 1-year mortality in spinal metastatic disease: development and internal validation. Neurosurgery 2019;85:E671–81.
32. Yang JJ, Chen CW, Fourman MS, et al. International external validation of the SORG machine learning algorithms for predicting 90-day and one-year survival of patients with spine metastases using a Taiwanese cohort. Spine J 2021;21:1670–8.
33. Paton GR, Frangou E, Fourney DR. Contemporary treatment strategy for spinal metastasis: the “LMNOP” system. Can J Neurol Sci 2011;38:396–403.
34. Tang V, Harvey D, Park Dorsay J, Jiang S, Rathbone MP. Prognostic indicators in metastatic spinal cord compression: using functional independence measure and Tokuhashi scale to optimize rehabilitation planning. Spinal Cord 2007;45:671–7.
35. Karhade AV, Ahmed AK, Pennington Z, et al. External validation of the SORG 90-day and 1-year machine learning algorithms for survival in spinal metastatic disease. Spine J 2020;20:14–21.
36. Shah AA, Karhade AV, Park HY, et al. Updated external validation of the SORG machine learning algorithms for prediction of ninety-day and one-year mortality after surgery for spinal metastasis. Spine J 2021;21:1679–86.

Table 1.

Characteristics of patients (n=389)

Characteristic Value
Age (yr) 57 [16.5]
Female sex 167 (42.9)
Body mass index (kg/m²) 22.04 [4.9]
History of smoking 56 (14.4)
American Society of Anesthesiologists Physical Status Classification
 Class 2 282 (72.5)
 Class 3 108 (26.3)
Eastern Cooperative Oncology Group performance status
 0–2 249 (64.0)
 3–4 132 (33.9)
Karnofsky Performance Score
 0–40 47 (12.1)
 50–70 232 (59.6)
 80–100 106 (27.2)
Charlson Comorbidity Index
 ≤8 290 (74.6)
 >8 92 (23.6)
Level of spinal metastases
 Cervical 13 (3.3)
 Thoracic 96 (24.7)
 Lumbar 61 (15.7)
 Cervical and thoracic 20 (5.1)
 Thoracic and lumbar 45 (11.6)
 More than 2 regions 22 (5.6)
Symptomatic neurologic compression level
 Cervical 20 (5.1)
 Upper thoracic (T1–T6) 53 (13.6)
 Lower thoracic (T7–T12) 64 (16.5)
 Upper lumbar (L1–L3) 52 (13.4)
 Lower lumbar and sacrum 26 (6.7)
 No compression 73 (18.8)
Presence of myelopathy 150 (38.6)
Metastasis to other organs 80 (20.6)
Primary site of cancer
 Breast 79 (20.3)
 Lung 69 (17.7)
 Unknown 64 (16.4)
 Prostate 41 (10.5)
 Liver 17 (4.4)
 Thyroid 17 (4.4)
 Hematologic malignancy 16 (4.1)
 Others 86 (22.1)
Hemoglobin (g/dL) 12.1 [2.2]
Albumin (g/dL) 3.8 [0.7]
Alkaline phosphatase (U/L) 118 [101.5]
Platelet–lymphocyte ratio 199 [140.28]
Serum creatinine (mg/dL) 0.78 [0.34]
Serum calcium (mg/dL) 9.2 [0.9]
Preoperative chemotherapy 65 (16.7)
Postoperative chemotherapy 103 (26.5)
Postoperative local radiotherapy 193 (49.6)
Time from presentation to surgery (days) 14 [23]

Values are presented as median [interquartile range] or number (%).
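Table 1 reports continuous variables as median [interquartile range], where the bracketed value is the width Q3 − Q1 rather than a pair of bounds, and categorical variables as number (%). The sketch below shows both summaries using only Python's standard library. The function names are hypothetical, and the quartile method is an assumption: the source does not state which one was used, and different methods can give slightly different IQRs on the same data.

```python
import statistics

def median_iqr(values):
    """Summarize a continuous variable as (median, IQR), with IQR = Q3 - Q1.

    Quartiles use statistics.quantiles' default ("exclusive") method; the
    method behind Table 1 is not stated, so this choice is an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1

def count_percent(flags):
    """Summarize a yes/no variable as (count, percentage rounded to one decimal)."""
    count = sum(bool(f) for f in flags)
    return count, round(100 * count / len(flags), 1)

# Hypothetical ages, for illustration only.
print(median_iqr([45, 50, 57, 60, 70]))
# Same counts as "Female sex" in Table 1: 167 of 389 patients.
print(count_percent([True] * 167 + [False] * 222))
```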

Table 2.

Comparison of model performance in cross-validation for 365-day survival prediction

| Model | AUC | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|---|
| Extreme gradient boosting | 0.7308 | 0.5819 | 0.9203 | 0.5202 | 0.6637 |
| Logistic regression | 0.7229 | 0.6526 | 0.6462 | 0.6130 | 0.6239 |
| Linear discriminant analysis | 0.7223 | 0.6691 | 0.6462 | 0.6321 | 0.6339 |
| Random forest classifier | 0.7200 | 0.6622 | 0.6247 | 0.6257 | 0.6227 |
| Naive Bayes classifier | 0.7198 | 0.6495 | 0.7824 | 0.5845 | 0.6659 |
| Gradient boosting classifier | 0.7173 | 0.6688 | 0.6104 | 0.6336 | 0.6193 |
| Quadratic discriminant analysis | 0.715 | 0.6432 | 0.7764 | 0.5779 | 0.6606 |
| AdaBoost classifier | 0.7101 | 0.6367 | 0.5962 | 0.5919 | 0.5917 |
| CatBoost classifier | 0.7083 | 0.6818 | 0.6819 | 0.6352 | 0.6548 |
| Light gradient boosting machine | 0.6987 | 0.6332 | 0.5758 | 0.6013 | 0.5838 |
| Extra trees classifier | 0.6964 | 0.6463 | 0.5676 | 0.6217 | 0.5871 |
| K neighbors classifier | 0.6789 | 0.5593 | 0.8698 | 0.5092 | 0.6400 |
| Decision tree classifier | 0.6317 | 0.5981 | 0.5538 | 0.5542 | 0.5497 |

AUC, area under the receiver operating characteristic curve.
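Tables 2–4 rank the candidate algorithms by AUC and also report accuracy, recall, precision, and F1. A minimal standard-library sketch of how these five numbers can be computed for one validation fold follows. It assumes that outcome 1 is the positive class and that predictions are dichotomized at 0.5; neither choice is stated in this section, and the function name is hypothetical. AUC is computed in its rank form: the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """AUC, accuracy, recall, precision, and F1 for one fold.

    y_true: 0/1 labels (which outcome is coded 1 is an assumption).
    y_prob: predicted probabilities of class 1.
    threshold: cut-off for a positive prediction (0.5 is assumed).
    """
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    # Rank-based AUC: fraction of positive/negative pairs ordered correctly.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    auc = wins / (len(pos) * len(neg))

    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"AUC": auc, "Accuracy": accuracy, "Recall": recall,
            "Precision": precision, "F1": f1}

# Hypothetical fold with four patients.
print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))
```

If the cross-validated values in the tables are means over folds, as is common, the reported F1 need not equal 2 × precision × recall / (precision + recall) computed from the mean precision and recall. For extreme gradient boosting in Table 2, that formula gives about 0.665 against the reported 0.6637.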

Table 3.

Comparison of model performance in cross-validation for 180-day survival prediction

| Model | AUC | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|---|
| Extreme gradient boosting | 0.7261 | 0.476 | 0.9643 | 0.4595 | 0.6219 |
| Gradient boosting classifier | 0.7231 | 0.6786 | 0.6258 | 0.6612 | 0.6331 |
| CatBoost classifier | 0.7203 | 0.6757 | 0.6775 | 0.6445 | 0.6514 |
| Random forest classifier | 0.7202 | 0.6593 | 0.5973 | 0.6323 | 0.6082 |
| Linear discriminant analysis | 0.7199 | 0.6528 | 0.6401 | 0.6157 | 0.6219 |
| Logistic regression | 0.7182 | 0.6401 | 0.633 | 0.6013 | 0.6114 |
| Quadratic discriminant analysis | 0.7179 | 0.6499 | 0.7192 | 0.5896 | 0.6462 |
| Extra trees classifier | 0.7172 | 0.6462 | 0.6192 | 0.6058 | 0.6041 |
| Naive Bayes classifier | 0.7168 | 0.6528 | 0.7055 | 0.6006 | 0.6429 |
| Light gradient boosting machine | 0.7103 | 0.653 | 0.6126 | 0.6177 | 0.6 |
| AdaBoost classifier | 0.6945 | 0.6272 | 0.5703 | 0.6166 | 0.5757 |
| K neighbors classifier | 0.6736 | 0.6302 | 0.7621 | 0.5666 | 0.6471 |
| Decision tree classifier | 0.6541 | 0.6177 | 0.583 | 0.5946 | 0.5779 |

AUC, area under the receiver operating characteristic curve.
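In Table 3, extreme gradient boosting has the highest AUC (0.7261) but the lowest accuracy (0.476), together with a recall of 0.9643. One possible reading, not stated in this section, is that most of its predicted probabilities fell above the classification cut-off. AUC depends only on how patients are ranked, whereas recall, precision, and accuracy depend on where the cut-off is placed. The hypothetical scores below illustrate this: the ranking, and therefore the AUC, is identical at both cut-offs, yet the other metrics change sharply. The function name, scores, and cut-off values are assumptions; the threshold used in the study is not reported here.

```python
def confusion_at(y_true, y_prob, threshold):
    """Return (recall, precision, accuracy) when predicting 1 for y_prob >= threshold."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p >= threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    tn = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p < threshold)
    positives = sum(y_true)
    recall = tp / positives if positives else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    return recall, precision, accuracy

# Hypothetical scores: every patient scores above 0.5, but positives still
# tend to rank above negatives, so the AUC is the same at any cut-off.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.95, 0.90, 0.70, 0.85, 0.65, 0.60, 0.58, 0.55]

for cut in (0.5, 0.8):
    print(cut, confusion_at(y_true, y_prob, cut))
```

At a cut-off of 0.5 every patient is predicted positive, so recall is perfect while precision and accuracy are both 0.375; raising the cut-off to 0.8 trades some recall for much higher accuracy (0.75).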

Table 4.

Comparison of model performance in cross-validation for 90-day survival prediction

| Model | AUC | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|---|
| CatBoost classifier | 0.7496 | 0.682 | 0.6610 | 0.6382 | 0.6451 |
| Light gradient boosting machine | 0.7397 | 0.6818 | 0.6687 | 0.6439 | 0.6505 |
| AdaBoost classifier | 0.7355 | 0.6659 | 0.7412 | 0.6085 | 0.6642 |
| Extra trees classifier | 0.7310 | 0.669 | 0.5885 | 0.6545 | 0.6099 |
| Random forest classifier | 0.7297 | 0.6788 | 0.6533 | 0.6415 | 0.6400 |
| Extreme gradient boosting | 0.7252 | 0.5561 | 0.9786 | 0.5036 | 0.6642 |
| Gradient boosting classifier | 0.724 | 0.6499 | 0.6044 | 0.6084 | 0.6008 |
| Naive Bayes classifier | 0.7237 | 0.5949 | 0.7923 | 0.5391 | 0.6362 |
| Linear discriminant analysis | 0.7111 | 0.6562 | 0.639 | 0.6105 | 0.6218 |
| K neighbors classifier | 0.7076 | 0.6077 | 0.8418 | 0.5437 | 0.6576 |
| Logistic regression | 0.7049 | 0.6467 | 0.6319 | 0.5994 | 0.6130 |
| Quadratic discriminant analysis | 0.7014 | 0.6081 | 0.6269 | 0.5540 | 0.5835 |
| Decision tree classifier | 0.6725 | 0.5981 | 0.7368 | 0.5771 | 0.5801 |

AUC, area under the receiver operating characteristic curve.
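Tables 2–4 report performance in cross-validation. This section does not state the number of folds or how they were formed, so the following is only a minimal sketch of one common scheme, stratified k-fold splitting, in which each fold keeps roughly the same proportion of deaths as the whole cohort. The function name, the fold count `k=5`, the seed, and the label vector are hypothetical.

```python
import random

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    Indices of each outcome class are shuffled and dealt round-robin into the
    folds, so every test fold holds a similar share of each class.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Hypothetical outcome vector: 1 = died within the horizon, 0 = alive.
labels = [1] * 70 + [0] * 319
for train, test in stratified_k_fold(labels, k=5):
    print(len(train), len(test), sum(labels[i] for i in test))
```

With 70 deaths among 389 hypothetical patients (close to the 18% 90-day mortality shown in Table 5), each of the five test folds receives exactly 14 deaths. Stratification matters most when one outcome is relatively rare, because an unstratified split can leave a fold with too few events for a stable AUC estimate.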

Table 5.

Comparison of algorithms between studies

| Study | Author (year) | No. of patients | Category | 90-Day prediction | 180-Day prediction | 365-Day prediction |
|---|---|---|---|---|---|---|
| Present study | | 389 | Mortality rate (%) | 18.00 | 41.00 | 45.00 |
| | | | Algorithm | CatBoost | Extreme gradient boosting | Extreme gradient boosting |
| | | | AUC | 0.758 | 0.744 | 0.693 |
| Development and internal validation of SORG | Karhade et al. [31] (2019) | 732 | Mortality rate (%) | 25.10 | | 54.30 |
| | | | Algorithm | Stochastic gradient boosting | | Stochastic gradient boosting |
| | | | AUC | 0.83 | | 0.89 |
| External validation of SORG | Karhade et al. [35] (2020) | 176 | Mortality rate (%) | 22.70 | | 56.20 |
| | | | AUC | 0.75 | | 0.77 |
| Updated external validation of SORG | Shah et al. [36] (2021) | 298 | Mortality rate (%) | 21.90 | | 52.60 |
| | | | AUC | 0.84 | | 0.90 |
| International validation of SORG in Taiwan | Yang et al. [32] (2021) | 427 | Mortality rate (%) | 26.00 | | 60.00 |
| | | | AUC | 0.73 | | 0.74 |

AUC, area under the receiver operating characteristic curve; SORG, Skeletal Oncology Research Group. Blank cells: not assessed (the SORG algorithms compared here predict 90-day and 1-year survival only).
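Table 5 compares mortality rates at 90, 180, and 365 days. Deriving a binary outcome for each horizon from follow-up data requires a rule for patients whose follow-up ended before that horizon without a recorded death. The sketch below, with hypothetical records and function names, simply excludes such patients; this is an assumption for illustration, since this section does not describe how the study handled incomplete follow-up.

```python
def outcome_label(days_followed, died, horizon):
    """Binary outcome at `horizon` days after surgery.

    Returns 1 if the patient died on or before the horizon, 0 if the patient is
    known to be alive at the horizon, and None if follow-up ended earlier
    without a recorded death (status unknown; excluded in this sketch).
    """
    if died and days_followed <= horizon:
        return 1
    if days_followed >= horizon:
        return 0
    return None

def mortality_rate(cohort, horizon):
    """Percentage of patients with a known status who died by `horizon`."""
    labels = [outcome_label(days, died, horizon) for days, died in cohort]
    known = [y for y in labels if y is not None]
    return 100 * sum(known) / len(known)

# Hypothetical records: (days from surgery to death or last contact, died?)
cohort = [(45, True), (120, True), (400, False), (300, True), (200, False), (800, True)]
for horizon in (90, 180, 365):
    print(horizon, round(mortality_rate(cohort, horizon), 1))
```

The hypothetical patient last seen alive at day 200 counts toward the 90-day and 180-day denominators but not the 365-day one, so under this rule the mortality rates at different horizons can rest on different denominators.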