Development of Machine-Learning Models to Predict Ambulation Outcomes Following Spinal Metastasis Surgery
Abstract
Study Design
Retrospective cohort study.
Purpose
This study aimed to develop machine-learning algorithms to predict ambulation outcomes following surgery for spinal metastasis.
Overview of Literature
Postoperative ambulation status following spinal metastasis surgery is currently difficult to predict. The improved ability to predict this important postoperative outcome would facilitate management decision-making and help in determining realistic treatment goals.
Methods
This retrospective study included patients who underwent surgery for spinal metastasis at a university-based medical center in Thailand between January 2009 and November 2021. Collected data included preoperative parameters and ambulatory status 90 and 180 days following surgery. Thirteen machine-learning algorithms, namely, artificial neural network, logistic regression, CatBoost classifier, linear discriminant analysis, extreme gradient boosting, extra trees classifier, random forest classifier, gradient boosting classifier, light gradient boosting machine, naïve Bayes, K-neighbor classifier, AdaBoost classifier, and decision tree classifier, were developed to predict ambulatory status 90 and 180 days following surgery. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and F1-score.
Results
In total, 167 patients were enrolled. The number of patients classified as ambulatory 90 and 180 days following surgery was 140 (81.9%) and 137 (82.0%), respectively. The extreme gradient boosting algorithm was found to most accurately predict 180-day ambulatory outcome (AUC, 0.85; F1-score, 0.90), and the decision tree algorithm most accurately predicted 90-day ambulatory outcome (AUC, 0.94; F1-score, 0.88).
Conclusions
Machine-learning algorithms were effective in predicting ambulatory status following surgery for spinal metastasis. Based on our data, the extreme gradient boosting and decision tree best predicted postoperative ambulatory status 180 and 90 days after spinal metastasis surgery, respectively.
Introduction
The incidence of spinal metastasis is increasing as evidenced by recent studies that have reported that spinal metastasis occurs in 5%–10% of all patients with cancer [1–3]. To determine treatment options, several factors must be considered, such as disease factors, patient factors, and patient expectations. Considering all these factors, establishing clear and realistic treatment goals is important [4,5].
Treatments for spinal metastasis have rapidly improved to maximize survival and clinical outcomes [6]. However, despite advancements in treatment, some patients continue to have poor clinical outcomes and are unable to ambulate following spinal metastasis surgery [7–10]. A previous study proposed models for predicting ambulatory ability following spinal metastasis surgery, which were developed using conventional statistical methods; however, those models yielded only fair to moderate performance [11].
To make better use of the rapidly growing volume of available data, artificial intelligence and machine learning (ML) have recently been employed to develop new tools to improve spine treatment and research [12,13]. Several applications of ML in spine surgery have been reported, with promising results that outperformed conventional statistical methods [14–17].
Postoperative ambulation status following spinal metastasis surgery is difficult to predict, and improved ability to predict this important postoperative outcome would facilitate management decision-making and help in determining clear and realistic treatment goals. Accordingly, this study aimed to develop ML algorithms to predict ambulation outcomes following surgery for spinal metastasis.
Materials and Methods
1. Guidelines
This study followed the “Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis” (TRIPOD) guidelines and the “Guidelines for developing and reporting machine learning models in biomedical research.” All methods were performed in accordance with the relevant guidelines and regulations and with the Declaration of Helsinki. The study protocol was approved by the Institutional Review Board of Siriraj Hospital (COA no., 978/2021, 937/2564 [IRB1]).
2. Patient selection
Consecutive patients who underwent surgery for spinal metastasis at a university-based medical center in Thailand between January 2009 and November 2021 were retrospectively enrolled. The inclusion criteria were as follows: (1) diagnosis of spinal metastasis, (2) age ≥18 years, and (3) history of surgery for cervical, thoracic, lumbar, and/or sacral metastasis/metastases. Patients who died within 180 days following surgery or who had no records of their ambulatory status 180 days following surgery were excluded. Patients who could not ambulate because of causes other than myelopathy, such as intractable pain, general muscle weakness, or extraspinal problems, were also excluded from the study. Written informed consent was waived by the Siriraj Institutional Review Board because of the retrospective nature of this study.
3. Variables
Preoperative parameters were collected through a retrospective chart review. Factors that were previously reported to be significantly associated with ambulatory status following spinal metastasis surgery were collected, including age, sex, body mass index (BMI) (kg/m²), smoking status, American Society of Anesthesiologists classification, presence of myelopathy before surgery, duration of neurological deficit, Frankel grading, level of spinal compression, level of spinal metastasis, comorbidities, extraspinal bone metastasis, visceral metastasis, preoperative treatment (chemotherapy, radiotherapy, and targeted therapy), primary tumor origin, serum calcium level, albumin level, creatinine level, and preoperative ambulatory status [15,18–22]. Primary tumor histology was also included to fully and clearly describe the primary tumor.
4. Outcomes
A previous study reported that functional recovery reaches a plateau 6 months following spinal metastasis surgery [23]. Therefore, ambulatory status 180 days following surgery was selected as the primary outcome, and ambulatory status 90 days following surgery as the secondary outcome. Patients who could walk, with or without a gait aid, were classified as “ambulators.” Conversely, patients who could not walk were classified as “non-ambulators.” Patients were allocated to the ambulatory and non-ambulatory groups according to their medical records. To keep the assessment of predictors blinded to the outcomes, two orthopedic surgeons reviewed the predictors separately from the outcomes.
5. Preprocessing
Patients with no primary or secondary outcome data were excluded. Where preoperative data were unavailable, the missing values were filled in using multiple imputation by chained equations.
To reduce the influence of different variable units and magnitudes, numerical variables were scaled to a mean of 0 and a standard deviation of 1, and dummy encoding was employed for categorical variables. Outliers, defined as laboratory values more than three standard deviations from the average laboratory value at our hospital, were removed.
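The scaling, dummy-encoding, and outlier rules described above can be illustrated with a minimal sketch. The helper names, the example albumin and Frankel values, and the choice of the first category as the reference level are assumptions made for illustration only; they are not taken from the study data or code.

```python
from statistics import mean, pstdev

def standardize(values):
    """Scale values to mean 0 and standard deviation 1 (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def dummy_encode(values):
    """Dummy-encode a categorical column, dropping the first level as reference."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]

def within_three_sd(value, reference_mean, reference_sd):
    """True if a laboratory value lies within three SDs of the reference mean."""
    return abs(value - reference_mean) <= 3 * reference_sd

# Hypothetical example values, not taken from the study data.
albumin = [3.1, 3.8, 4.2, 2.9, 3.5]
frankel = ["D", "E", "C", "E", "D"]

albumin_z = standardize(albumin)         # z-scores with mean 0 and SD 1
frankel_dummies = dummy_encode(frankel)  # columns for D and E; C is the reference
```

Under this scheme a row with all-zero dummy columns belongs to the reference category, which avoids a redundant column in linear models such as logistic regression.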
6. Prediction models
The ML models included in this study were used in a previous study to evaluate survival among patients with metastatic disease [24]. To identify the best-performing model for both the primary and secondary outcomes, the performance of all the included ML models was compared.
Thirteen ML models were included in this study, namely, artificial neural network, logistic regression, CatBoost classifier, linear discriminant analysis, extreme gradient boosting, extra trees classifier, random forest classifier, gradient boosting classifier, light gradient boosting machine, naïve Bayes, K-neighbor classifier, AdaBoost classifier, and decision tree classifier. All models were created with Python ver. 3.9 (https://www.python.org/) using Scikit-learn library ver. 1.0.1 (https://scikit-learn.org/stable/) under an open-source simplified BSD (Berkeley Software Distribution) license [25]. Grid search was used for hyperparameter tuning of each model with a random state of 1337, and regularization techniques such as L2 regularization were used. For the neural network, PyTorch ver. 1.10 (https://pytorch.org/) was used in model development. After experimenting with various multilayer perceptrons, the optimal configuration was selected for comparison with the other ML models. The size of the hidden layer was 10. The rectified linear unit (ReLU) was selected as the activation function. The Adam optimizer was used with an initial learning rate of 0.001, beta 1 of 0.90, beta 2 of 0.999, and epsilon of 1e-8.
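To make the role of the reported Adam settings concrete (learning rate 0.001, beta 1 of 0.90, beta 2 of 0.999, epsilon of 1e-8), the following sketch applies one Adam update to a single scalar parameter. It is a plain-Python illustration of the standard update rule, not the PyTorch code used in the study; the function name and the starting values are hypothetical.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.90, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Hypothetical starting point: parameter 0.5, gradient 2.0, empty moment estimates.
theta, m, v = 0.5, 0.0, 0.0
theta, m, v = adam_step(theta, grad=2.0, m=m, v=v, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by approximately the learning rate regardless of the gradient's magnitude; epsilon only guards against division by zero.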
The dataset was randomly divided into the training and testing sets at an 80:20 ratio. Model training was conducted using the training set with performance validation by fivefold cross-validation. A class weighting strategy was also used to ensure that the trained model would take each class into equal account despite class imbalance.
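The split, cross-validation, class weighting, and grid search described above could be combined along the following lines. This is a sketch under stated assumptions rather than the study's code: the data are synthetic, the stratified split, the `"balanced"` weighting, the parameter grid, and the use of AUC as the search criterion are all illustrative choices, and only the 80:20 ratio, the five folds, and the random state of 1337 come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

SEED = 1337  # random state reported in the text

# Synthetic stand-in for the cohort: 167 rows, roughly 82% in the positive class.
X, y = make_classification(n_samples=167, n_features=10, weights=[0.18, 0.82],
                           random_state=SEED)

# 80:20 split; stratification is an assumption that keeps class ratios similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=SEED)

# Grid search with fivefold cross-validation on the training set only.
search = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=SEED),
    param_grid={"max_depth": [2, 3, 4], "min_samples_leaf": [1, 5, 10]},  # hypothetical grid
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
)
search.fit(X_train, y_train)

# Final check on the held-out 20%, which the search never saw.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
```

Keeping the testing set out of the search means the reported test AUC is not influenced by hyperparameter selection.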
Model performance was evaluated on the testing dataset by comparing the area under the receiver operating characteristic curve (AUC), F1-score, accuracy, kappa, and Matthews correlation coefficient among the 13 models. An AUC of 0.7–0.8 indicated fair performance, and an AUC of >0.8 indicated good performance. The F1-score, which is the harmonic mean of precision and recall, has a maximum possible value of 1.0, which indicates perfect performance. Accuracy, precision, and recall were also reported as performance evaluation criteria. However, these metrics involve tradeoffs, such as the tradeoff between precision and recall; thus, the optimal model was selected for deployment using AUC.
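For readers less familiar with these metrics, the sketch below computes AUC, the Matthews correlation coefficient, and Cohen's kappa from first principles. The labels and predicted scores are hypothetical, the function names are illustrative, and the 0.5 decision threshold is an assumption; the study's own values were presumably obtained with library routines.

```python
import math

def auc_from_scores(y_true, scores):
    """AUC as the chance that a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def matthews_corrcoef(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cohen_kappa(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0

# Hypothetical labels (1 = ambulator) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = auc_from_scores(y_true, scores)
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
kappa = cohen_kappa(*confusion_counts(y_true, y_pred))
```

AUC is computed from the ranking of scores and does not depend on the threshold, whereas the confusion-matrix metrics change when the threshold moves, which is one reason a threshold-free criterion is convenient for model selection.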
Results
Although 405 patients with spinal metastasis met the inclusion criteria, only 245 were still alive 180 days following spinal metastasis surgery. Patients who met the exclusion criteria were excluded from the study. Finally, 167 participants were enrolled in this study, including 75 men (44.9%) and 92 women (55.1%). The mean age of all patients was 56.9±11.3 years. Moreover, 140 (81.9%) and 137 (82.0%) patients were classified as ambulatory 90 and 180 days following spinal metastasis surgery, respectively.
Missing data included BMI in seven patients (4.2%), serum calcium level in 21 (12.6%), serum creatinine level in 3 (1.8%), and level of surgery in 1 (0.6%). The baseline characteristics were compared between the ambulatory and non-ambulatory groups at 180 days (Table 1).
Importance factors selected by the extreme gradient boosting classifier that significantly predicted the 180-day ambulatory outcome included serum albumin level, presence of symptomatic spinal compression at the thoracic level, preoperative neurological and ambulatory status, and BMI, as shown in Fig. 1. The importance factors selected by the decision tree algorithm that significantly predicted the 90-day ambulatory outcome included preoperative ambulatory status, age, serum albumin level, days of neurological deficit, and presence of symptomatic spinal compression at the thoracic level, as shown in Fig. 2.
1. Model evaluation for the prediction of the 180-day ambulatory outcome
Among the 13 models that were evaluated, the extreme gradient boosting algorithm had the best performance for predicting the 180-day ambulatory outcome (AUC, 0.85; accuracy, 0.82; precision, 0.82; recall, 1; F1-score, 0.90) (Fig. 3). Data specific to the 180-day prediction performance of all evaluated models are presented in Table 2.
2. Model evaluation for prediction of 90-day ambulatory outcome
Of the 13 evaluated models, the decision tree algorithm demonstrated the best ability to predict 90-day postoperative ambulatory outcome (AUC, 0.94; accuracy, 0.82; precision, 1; recall, 0.79; and F1-score, 0.88) (Fig. 4). Details relating to the 90-day prediction performance of all models are shown in Table 3.
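As a consistency check, the reported F1-scores can be recomputed from the reported precision and recall values, since the F1-score is their harmonic mean. The short sketch below does this for both best-performing models; the function name is illustrative, and the inputs are the rounded values given in the text, so small rounding differences are possible.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Extreme gradient boosting, 180-day outcome (precision 0.82, recall 1).
f1_180 = f1_score(precision=0.82, recall=1.0)

# Decision tree, 90-day outcome (precision 1, recall 0.79).
f1_90 = f1_score(precision=1.0, recall=0.79)
```

Rounded to two decimals, these give 0.90 and 0.88, in agreement with the F1-scores reported for the 180- and 90-day models.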
Discussion
Previous studies have reported the benefit of surgery in spinal metastasis relative to regaining ambulatory status, pain relief [8], quality-of-life score, and functional outcome score [23]. Despite promising results from surgery, 3.6%–15.3% of patients remained dependent, and postoperative complications were as high as 29%–34% [7–10]. Consistent with the rates reported from a previous study [10], 82% of patients with spinal metastasis in our study were ambulatory 180 days after their surgery.
Factors previously reported to be significantly associated with postoperative clinical outcome were baseline health-related quality of life, preoperative functional status, preoperative neurological function, interval from symptom onset to treatment, and chronology of motor deficit progression [10,19]. This study demonstrated similar factors for the 90-day outcome, which are related to the preoperative patient status, including preoperative neurological and ambulatory status. By contrast, disease-related factors were found to be most associated with the 180-day postoperative ambulatory outcome, such as level of compression and extent of metastasis. This finding may be explained by the scenario that after a recovery period, the effects of surgery may decrease, and the natural course of the disease may become more dominant.
A clear and realistic treatment goal requires accurate information. Previous studies have proposed models to predict ambulatory status following spinal metastasis surgery that were developed using conventional statistical methods. Numerous studies have demonstrated the efficacy of spinal metastasis surgery in improving ambulatory function using various clinical, radiographic, and treatment-related factors [26–28]. Several studies have identified successful factors for postoperative ambulation. A meta-analysis of patients with spinal metastasis who underwent surgery found that pretreatment ambulatory status, the interval between symptom onset and treatment, and time to the development of motor deficits were associated with postoperative outcomes [10]. In a separate retrospective study, a motor grade of 4 or 5 and the occurrence of major complications were significant factors for the resumption of ambulation [29]. However, few studies have attempted to develop predictive tools. Ohashi et al. [11] retrospectively reviewed 82 cases and reported that ambulatory status recovery is correlated with a duration from the onset of neurological symptoms to gait disability of <5 days (AUC, 0.72) and a Tokuhashi score of <7.5 points (AUC, 0.71). In this study, we successfully developed 13 ML algorithms and identified the best predictive model for ambulatory status 180 (extreme gradient boosting) and 90 (decision tree) days following surgery with AUC values of 0.85 and 0.94, respectively.
In the 180- and 90-day groups, the extreme gradient boosting model and decision tree model yielded the best results, respectively. In contrast to the logistic regression, naïve Bayes, and neural network algorithms, the extreme gradient boosting and decision tree models are both built on decision trees, which is a practical strategy for relatively small datasets with imbalanced classes, such as the one used in this study. This may explain why the extreme gradient boosting and decision tree models outperformed the other ML models included in this study.
Since most of our patients were ambulators 180 days following surgery, this imbalance in data adversely affected ML algorithm development. To remedy this issue, we used a class weighting strategy to optimize the training process, and we included the F1-score for model evaluation. The F1-score provides valuable insights as a metric for examining imbalanced datasets. The extreme gradient boosting model, which was shown to best predict ambulatory status 180 days following surgery, had an improved F1-score of 0.90. Another common problem when developing an ML model is overfitting. To counter the potential of overfitting, a 5-fold cross-validation was implemented to continuously monitor model performance during training. Each model was then further evaluated using the testing dataset.
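One common way to implement class weighting is the inverse-frequency heuristic, in which each class receives a weight of n_samples / (n_classes × class_count). The sketch below applies it to the 180-day counts of the whole cohort (137 ambulators of 167 patients). The exact weighting formula used in this study is not stated, so the heuristic is an assumption, and in practice the weights would be derived from the training set rather than the full cohort.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}

# 180-day outcome counts from the Results: 137 ambulators of 167 patients.
counts = {"ambulator": 137, "non_ambulator": 167 - 137}
weights = balanced_class_weights(counts)

# Total weight contributed by each class after weighting.
total_ambulator = counts["ambulator"] * weights["ambulator"]
total_non_ambulator = counts["non_ambulator"] * weights["non_ambulator"]
```

With these weights, the 30 non-ambulators contribute the same total weight to the training loss as the 137 ambulators, so errors on the minority class are not drowned out by the majority class.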
The previously published SORG ML algorithm was widely adopted for treatment decision-making and prediction of survival in patients with spinal metastatic diseases [17]. In addition to the survival rate, postoperative ambulatory status is also a very important factor. To our knowledge, this is the first study to report models that predict 180- and 90-day ambulatory outcomes following spinal metastasis surgery using ML algorithms. As previously mentioned, these models exhibit superior accuracy in predicting this critical factor compared with previously published tools, facilitating the establishment of realistic surgical goals, and aiding in treatment planning. Our combined ML model, which allows the user to predict either 180- or 90-day ambulatory status following surgery, has been deployed as an open-access web application, which can be found at https://share.streamlit.io/orthosiriraj/outcome_post_op_metas_spine/main/main.py.
Regarding limitations of the data analysis, first, our model comparison did not provide the confidence interval of each model’s performance during the training phase. Second, only the best-performing model was used for the comparison among algorithms. Third, this study is also limited by its retrospective single-center design. Fourth, our center is a national tertiary referral hospital, which could limit the generalizability of our findings to other care settings. Fifth, the relatively small amount of included data could limit the performance of ML. To remedy this limitation and continuously improve the performance of our developed algorithms, we will continue to collect data to refine our ML models. More multicenter studies and external validation are needed to confirm the results of this study and establish the validity of these algorithms for use in real-world clinical practice.
Conclusions
ML algorithms are effective for predicting ambulatory status after surgery for spinal metastasis. The extreme gradient boosting and decision tree algorithms best predicted postoperative ambulatory status 180 and 90 days after spinal metastasis surgery, respectively. Once externally validated for use in routine clinical practice, these algorithms will improve case management decision-making and help in determining clear and realistic goals of treatment.
Acknowledgments
The authors gratefully acknowledge Miss Sirima Nilnok of the Research Unit of the Department of Orthopaedic Surgery, Faculty of Medicine Siriraj Hospital, Mahidol University for her assistance with statistical analysis, manuscript preparation, and coordination of the journal submission process.
Notes
Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Author Contributions
PL, SW, PI, and PC designed the study. PC, BS, and PI collected and analyzed the data and contributed substantially to the interpretation of data. PL supervised the project. PC, BS, and PI drafted the article. All authors have read and approved the manuscript.