Introduction
The classification of thoracolumbar injury remains controversial despite the studies conducted on the subject over the past several decades [1-6]. A classification of thoracolumbar fracture was first suggested by Böhler [7] in 1929. Although numerous authors have since reported new classification systems, there is still no officially accepted classification system for thoracolumbar fracture.
Problems with validity, reliability and reproducibility have been repeatedly pointed out [8-10] for the classification systems commonly applied in clinical practice, such as the Denis classification system and the Arbeitsgemeinschaft für Osteosynthesefragen (AO) classification system. These problems include shortcomings in the assessment of the traumatic force, in accurate morphologic evaluation and in effective prognostic assessment of traumatic injuries [11].
However, with the recent remarkable development of radiology and the consequent widening of knowledge about the anatomy as well as the biomechanics of thoracolumbar injury, continuous efforts have been made to establish a more ideal classification system that is applicable to clinical practice. As one of these efforts, Vaccaro et al. [12] in 2005 suggested the Thoracolumbar Injury Classification and Severity Score (TLICS).
Therefore, we evaluated the intraobserver and interobserver reliability of the TLICS, and we assessed the validity of selecting the treatment based on this classification system.
Results
The intrarater reliability of each variable of the TLICS showed substantial agreement (k = 0.753) for determining the injury pattern, almost perfect agreement (k = 0.81) for the presence of injury in the posterior ligamentous complex (PLC), almost perfect agreement (k = 0.96) for the neurological status assessment and substantial agreement (k = 0.724) for the total score (Table 4). The interrater reliability showed substantial agreement (k = 0.608) for determining the injury pattern, substantial agreement (k = 0.641) for the presence of injury in the PLC, almost perfect agreement (k = 0.91) for the neurological status assessment and moderate agreement (k = 0.576) for the total score (Table 5).
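The agreement labels used above can be illustrated with a short sketch. It assumes Cohen's kappa for two raters and the widely used Landis and Koch bands (slight, fair, moderate, substantial, almost perfect); neither the exact kappa variant nor the banding rule is stated in this text, and the function names are hypothetical. The label function rounds kappa to two decimals before banding, which is also an assumption.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who rated the same cases (assumed variant)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map kappa to a Landis and Koch band (assumption: rounded to 2 decimals)."""
    k = round(kappa, 2)
    if k < 0.0:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings of four fractures by two investigators.
rater_1 = ["burst", "burst", "compression", "compression"]
rater_2 = ["burst", "compression", "compression", "compression"]
kappa = cohen_kappa(rater_1, rater_2)
print(kappa, agreement_label(kappa))
```

With three investigators, a multi-rater statistic or an average of pairwise values would be needed; the two-rater form is shown only because it is the simplest case.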
Regarding the validity of treatment selection according to the TLICS, of the total of 684 cases, 114 cases were double-measured by the 3 investigators. There were 362 cases in which the actual total TLICS score was higher than 5 points, and surgery was performed in 355 of them. Among the 195 cases with a total score lower than 3 points, surgery was not performed in 176. The percent correct for treatment selection according to the TLICS was 95%, the sensitivity was 98% and the specificity was 90% (Table 6).
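The reported percentages can be checked from the counts given above. The sketch below assumes that the denominators are the two groups defined by the TLICS score (362 high-score and 195 low-score cases), since this is the arrangement that reproduces the stated figures; the function and argument names are hypothetical.

```python
def treatment_agreement(high_total, high_operated, low_total, low_not_operated):
    """Rates of agreement between TLICS score group and actual treatment.

    Assumption: denominators are the TLICS score groups, which is the
    arrangement that reproduces the percentages reported in the text.
    """
    if not (0 <= high_operated <= high_total and 0 <= low_not_operated <= low_total):
        raise ValueError("counts must be consistent with their group totals")
    if high_total == 0 or low_total == 0:
        raise ValueError("both groups must contain at least one case")
    return {
        # High-score cases that were actually operated on.
        "sensitivity": high_operated / high_total,
        # Low-score cases that were actually treated without surgery.
        "specificity": low_not_operated / low_total,
        # All cases whose treatment matched the score group.
        "percent_correct": (high_operated + low_not_operated) / (high_total + low_total),
    }


# Counts taken from the Results text.
rates = treatment_agreement(362, 355, 195, 176)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these ratios give 98%, 90% and 95%, matching the values reported for Table 6.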
Discussion
The classification of traumatic thoracolumbar injury was first reported in 1929 by Böhler [7], and spinal fracture was afterwards classified into 5 types according to the mechanism of injury. In 1949, Nicoll [14] revised the concept of instability and claimed that the fracture gap caused by comminution of the vertebral body, as well as injury of the PLC, could induce instability. In 1970, Holdsworth [15] explained thoracolumbar injury by introducing the two-column theory. Building on this, Louis [16] in 1977 explained the structure of the vertebral body with the three-column concept.
With the rapid development of radiology, Denis [17] in 1983 introduced the three-column concept based on the findings of thoracolumbar CT. According to the injury mechanism and the degree of injury, he classified spinal injury into 4 types of major injury (compression fracture, burst fracture, seat-belt injury and fracture-dislocation) and 4 types of minor injury (articular process fracture, transverse process fracture, spinous process fracture and isthmus fracture). With the widening knowledge of the mechanisms of spinal injury, McCormack et al. [18] in 1994 suggested the load-sharing classification for determining the necessity of reinforcing the anterior column during surgical treatment of burst fracture. In addition, Magerl et al. [19] regarded compression, distraction and rotation as the major external forces acting on the vertebral body and, on this basis, reported the AO classification, which divides thoracolumbar injury into a total of 53 fracture groups.
In this manner, numerous classification systems for thoracolumbar injury have been reported, and studies on their validity have been continuously conducted, yet none is currently accepted as optimal. An ideal classification method for thoracolumbar injury should be comprehensive and readily applicable, its reliability and reproducibility should be ensured, and it should help to determine the degree of injury, decide the treatment strategy and evaluate the prognosis. In addition, it should serve as a common language for effective and reliable communication among clinicians. For this, assessment of both the intraobserver and interobserver reliability is essential.
The classification systems most frequently applied to thoracolumbar injury in recent clinical practice are the Denis [17] classification and the AO classification of Magerl et al. [19]. Blauth et al. [20] reported that the interobserver reliability of the basic types of the AO classification (types A, B and C) was low (fair agreement, k = 0.33), and that it decreased even further when the basic types were subclassified into subgroups. Oner et al. [8] and Wood et al. [9] reported that the Denis classification system showed higher interobserver reliability than the AO classification system (Oner, k = 0.60 and 0.35, respectively; Wood, k = 0.606 and 0.475, respectively). Nonetheless, both classification methods showed unsatisfactory interobserver reliability. The AO classification system contains the most information on the classification of injury and so has the advantage that it can classify almost any type of injury, yet its unavoidable complexity and consequent low reproducibility limit its application to clinical practice. The Denis classification, on the other hand, is too simple: it does not include important anatomical and pathophysiologic factors that play an important role in deciding the treatment protocol, such as injury in the PLC or nerve injury. In 1986, Bucholz and Grill [21] pointed out that the limitation of the Denis [17] classification system lies in its failure to consider the dynamic mechanism of spinal injury and the level of the associated nerve injury.
Vaccaro et al. [22] in 2005 suggested a new classification method, the Thoracolumbar Injury Severity Score (TLISS), intended to compensate for the problems of the previous classification systems and to satisfy the requirements of an ideal classification system. This classification scores three variables that are considered to be directly associated with the stability of the vertebral body and the prognosis: the mechanism of injury as determined by radiologic tests, the presence of injury in the PLC and the neurologic status, each of which is scored to help decide on the treatment plan. Afterward, the TLISS was modified into the TLICS, which replaces the subjectively determined mechanism of injury with the radiological injury pattern (the morphology and the fracture pattern), which can be classified objectively. Vaccaro et al. [12] later reported the high reliability and reproducibility of the TLICS. In addition, Whang et al. [23] reported a study on the reliability of the TLICS, which was found to be satisfactory: determining the injury pattern showed moderate agreement (k = 0.626), the presence of injury in the PLC showed moderate agreement (k = 0.447) and the total score showed moderate agreement (k = 0.455).
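The way the three TLICS variables combine into a total score can be sketched as follows. The point values and the treatment cut-offs (3 or less nonoperative, 4 either, 5 or more operative) are not given in this text; they follow the commonly published TLICS table and should be read as assumptions, as should all names in the code.

```python
# Point values assumed from the commonly published TLICS table (not from this text).
MORPHOLOGY_POINTS = {
    "compression": 1,
    "burst": 2,
    "translation_rotation": 3,
    "distraction": 4,
}
PLC_POINTS = {"intact": 0, "suspected": 2, "injured": 3}
NEUROLOGIC_POINTS = {
    "intact": 0,
    "nerve_root": 2,
    "complete_cord": 2,
    "incomplete_cord": 3,
    "cauda_equina": 3,
}


def tlics_total(morphology, plc, neurologic):
    """Sum the points of the three TLICS variables."""
    try:
        return (
            MORPHOLOGY_POINTS[morphology]
            + PLC_POINTS[plc]
            + NEUROLOGIC_POINTS[neurologic]
        )
    except KeyError as err:
        raise ValueError(f"unknown TLICS category: {err}") from None


def suggested_treatment(total):
    """Map a total score to a suggestion, using the assumed cut-offs."""
    if total <= 3:
        return "nonoperative"
    if total == 4:
        return "either"
    return "operative"


# Burst fracture with intact PLC and no neurologic deficit.
score = tlics_total("burst", "intact", "intact")
print(score, suggested_treatment(score))
```

Under these assumed values, a burst fracture with an intact PLC and normal neurologic status scores 2 points.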
In our study, the intraobserver reliability of the TLICS showed substantial or better agreement for all categories, and the interobserver reliability also showed substantial or better agreement for all categories except the total score, which showed moderate agreement (k = 0.576); our results were better than those reported by Whang et al. [23]. The k-value required for a clinically reliable classification system has been set at higher than 0.55 [24], and considering the opinion of Oner et al. [25] that this standard is too strict to apply to a classification system for traumatic vertebral fracture, our results could be considered to show excellent reliability. In addition, the results of our study were excellent compared with the intraobserver and interobserver reliability of the Denis and AO classification systems reported by Oner et al. [8] and Wood et al. [9].
As for validity, when the treatment plan indicated by the TLICS was compared with the treatment actually applied to patients, the decision-making for surgery was satisfactory in terms of the percent correct, the sensitivity and the specificity. Nevertheless, the specificity was relatively low (90%), which might have resulted from the fact that the choice of treatment varied with the preference of the clinician in some cases. Indeed, in our study we performed operative treatment for burst fractures with canal encroachment by a bony fragment reaching 50% but without PLC injury or neurologic symptoms (TLICS total score = 2), and for burst fractures associated with fracture of the lamina or of the adjacent spinous process, again without neurologic symptoms or PLC injury (TLICS total score = 2) (Figs. 3 and 4). In addition, in cases where injury of the PLC is not apparent but only suspected, operative treatment may be preferred depending on the clinician's opinion. Further evaluation is required, considering that the previous classification systems were unrelated to the selection of treatment plans and that studies on the validity of treatment selection are scarce compared with studies on the reliability of the TLICS.