Introduction
Lumbar spinal stenosis (LSS) is a common and disabling cause of back and leg pain in older people. It is caused by various forms of degeneration of the spinal components, including the intervertebral disc, facet, and ligamentum flavum [1]. Patients with LSS typically present with a range of low back and leg pain and a variety of neurological symptoms that are aggravated by walking and relieved by bending forward [2,3]. Although these symptoms generally improve with appropriate conservative treatment, depending on symptom severity and functioning, surgical intervention is occasionally necessary [4]. LSS has emerged as the most common indication for spinal surgery [5,6]. Hence, it is important to measure symptom severity and functioning in patients with LSS. By determining the cut-off values of an instrument used to evaluate LSS symptoms, such a tool may also be applied to the clinical decision-making process.
Various questionnaires such as the Oswestry Disability Index (ODI) and Oxford Claudication Score have been used to evaluate symptom severity and functioning in patients with LSS. However, none of these are spinal stenosis–specific. Hence, the LSS-specific symptom scale (Fukushima Lumbar Spinal Stenosis Scale, FLS) was developed for this purpose [7–10]. FLS is a disease-specific tool that measures the severity of symptoms in LSS, and it is simple and self-administered [7]. Furthermore, the Zurich Claudication Questionnaire (ZCQ), also known as the Brigham or Swiss Spinal Stenosis questionnaire, is a patient-reported instrument designed to quantify symptom intensity, functional limitation, and postoperative satisfaction in individuals with LSS [11]. Both scales have been translated into Korean (K-FLS and K-ZCQ, respectively) and are used to measure health-related quality of life (HRQOL) in patients with LSS [12].
Therefore, the purpose of this study was to explore the correlation between the instrument scores and determine their relationship with other HRQOL measures, such as the Korean version of ODI, which is the standard instrument used for this purpose [13].
Materials and Methods
Study design and participants
This prospective study was conducted between October 2023 and March 2024. We used two distinct analytical cohorts to validate the psychometric properties of K-FLS and K-ZCQ, viz., a cross-sectional cohort for reliability and validity and a longitudinal cohort for responsiveness.
Cross-sectional cohort (reliability and validity)
A total of 123 patients were included in this group, comprising 35 preoperative and 88 postoperative patients. The postoperative patients had undergone surgery at least 3 months before and were in a clinically stable state at the time of the survey. Including both groups allowed us to validate the instruments across the entire spectrum of disease severity, from severe preoperative symptoms to milder postoperative residual symptoms.
Longitudinal cohort (responsiveness)
To determine responsiveness to surgical intervention, we prospectively followed up a subset of the preoperative group. Among the 35 preoperative patients, 27 underwent spinal surgery and completed the 3-month postoperative evaluation.
Inclusion and exclusion criteria
Inclusion criteria were as follows: Participants were aged 50–80 years and had a diagnosis of LSS. Surgical candidacy was defined as a previous LSS surgery within the preceding 24 months or a planned operation at enrollment. LSS was operationally defined as lumbar magnetic resonance imaging (MRI) findings consistent with Schizas morphological grade ≥B together with ≥1 of the following: ambulation-limiting neurogenic claudication, sensory symptoms in the buttock and/or lower leg (pain, numbness, or tingling), distal lower limb motor weakness, or bowel/bladder dysfunction [14,15].
Exclusion criteria were as follows: To minimize confounding, we excluded individuals with serious systemic illness that may affect disability or general health (e.g., sepsis and active malignancy), peripheral polyneuropathy, myelopathic signs (including gait imbalance), acute osteoporotic vertebral compression fracture, gait limitation primarily attributable to lower extremity osteoarthritis (ankle, knee, or hip), ischemic heart disease or peripheral arterial occlusive disease consistent with vascular claudication, or any other neurological/musculoskeletal disorder that independently impairs gait. All patients presented with typical symptoms of LSS, which include neurogenic intermittent claudication and numbness and/or leg pain. In all patients, diagnosis was confirmed by more than one spine surgeon. The level(s) of stenosis were localized using MRI.
Of the initial 140 patients (84 women and 56 men), the following 11 were excluded: five with osteoporotic spine fractures, four with knee osteoarthritis, and two with parkinsonism. Finally, 129 patients (77 women and 52 men) were included in this study, of whom 123 (74 women and 49 men) responded to the second survey. The average age of the 123 participants was 68.8±6.7 years at the time of the surveys.
Data collection procedure
The first assessment was performed at the outpatient clinic using a self-administered questionnaire. The second assessment for test–retest reliability was conducted after 2 weeks. To maximize patient compliance and minimize loss to follow-up, the second survey was administered through a structured phone interview for patients who could not visit the clinic.
Outcome measures
We used K-FLS, K-ZCQ, K-ODI, and the visual analog scale (VAS) for back and leg pain. LSS symptoms were measured using an LSS symptom–specific scale (FLS), which was designed by Sekiguchi et al. [7–9] and asks patients about their symptoms in specific situations. The K-FLS consists of 25 items, each scored from 4 (worst) to 0 (best), and the total score is calculated as the sum of the answers to all questions. Overall scores on the questionnaire range from 0 to 100, with higher scores indicating worse conditions. The K-FLS comprises the following four major factors: physical pain (five items), activity limitations (11 items), physical activity (six items), and mental health (three items). The K-ZCQ consists of three domains, each rated on a Likert-type scale. The symptom severity domain has seven questions with five response options, the functional disability domain contains five questions scored on a 4-point scale, and the treatment satisfaction domain consists of six questions that are also rated on a 4-point scale. In all domains, greater numerical values correspond to worse disease status.
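The K-FLS total score described above can be illustrated with a minimal Python sketch. It assumes only what the text states (25 items, each scored 0 to 4, summed to a 0–100 total); the function name `kfls_total` is hypothetical, and subscale scores are omitted because the item-to-factor mapping is not given here.

```python
from typing import Sequence

N_ITEMS = 25                 # number of K-FLS items, as stated in the text
MIN_ITEM, MAX_ITEM = 0, 4    # each item ranges from 0 (best) to 4 (worst)


def kfls_total(responses: Sequence[int]) -> int:
    """Sum the 25 item responses into a total of 0-100 (higher = worse)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    for value in responses:
        if not MIN_ITEM <= value <= MAX_ITEM:
            raise ValueError(f"item score {value} is outside 0-4")
    return sum(responses)


# Hypothetical respondent: 10 items answered 3, 15 items answered 1.
example_total = kfls_total([3] * 10 + [1] * 15)
print(example_total)  # 45
```

Rejecting incomplete or out-of-range responses, rather than imputing them, is a design choice of this sketch and not a rule taken from the questionnaire itself.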
Statistical analysis
The reliability of the K-FLS and K-ZCQ questionnaires was determined using test–retest comparisons between the initial and subsequent evaluations. Intraclass correlation coefficients (ICC[2,1]) and agreement statistics were used to measure reliability for each item. Internal consistency was evaluated using Cronbach’s α. As the K-FLS and K-ZCQ utilize ordinal Likert scales, Spearman’s rank correlation coefficients were used to determine concurrent and construct validity. Convergent validity (criterion validity) was evaluated to determine how well each questionnaire correlates with another valid instrument measuring similar constructs. Accordingly, the correlations among K-FLS, K-ZCQ, and K-ODI were calculated. Spearman’s rank correlation coefficient values of ≥0.40 were considered satisfactory (r of 0.81–1.0 as excellent, 0.61–0.80 as very good, 0.41–0.60 as good, 0.21–0.40 as fair, and 0.0–0.20 as poor) [16]. Finally, responsiveness to change was evaluated as a psychometric property of the questionnaires: patients’ preoperative and postoperative scores were compared using the paired t-test to determine whether each scale could capture changes after surgery. Responsiveness was further evaluated using effect size (ES). This index was derived by dividing the mean difference between baseline and 3-month postoperative scores by the standard deviation of baseline values. According to conventional criteria, an ES of ≥0.8 indicates a large magnitude of change [17]. Larger ES values indicate greater sensitivity of the instrument in capturing clinical improvement. Data analysis was performed using SPSS ver. 16.0 (SPSS Inc., Chicago, IL, USA).
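As an illustration of the indices described above, the following Python sketch computes the effect size, Cronbach's α, and the descriptive band for a correlation coefficient. It is not the SPSS procedure used in the study. The function names are hypothetical, the sample (n−1) standard deviation and variance are assumptions, and the data are invented for demonstration.

```python
from statistics import mean, stdev, variance
from typing import Sequence


def effect_size(baseline: Sequence[float], follow_up: Sequence[float]) -> float:
    """ES = (mean baseline - mean follow-up) / SD of baseline.

    Positive values mean improvement when higher scores are worse.
    Assumption: the sample (n-1) standard deviation is used.
    """
    return (mean(baseline) - mean(follow_up)) / stdev(baseline)


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def correlation_band(r: float) -> str:
    """Map |r| to the descriptive bands quoted in the text."""
    a = abs(r)
    if a > 0.80:
        return "excellent"
    if a > 0.60:
        return "very good"
    if a > 0.40:
        return "good"
    if a > 0.20:
        return "fair"
    return "poor"


# Invented scores for three patients, before and after surgery.
pre = [60, 70, 80]
post = [30, 40, 50]
print(effect_size(pre, post))  # 3.0
```

In this invented example the mean falls by 30 points against a baseline SD of 10, so the ES is 3.0, well above the 0.8 threshold for a large change.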
The adequate sample size was determined using the formula $n = Z^{2}P(1-P)/d^{2}$, where n is the sample size, Z is the level of confidence (for the 95% confidence level, which is conventional, the Z-value is 1.96), P is the expected response rate of 90%, and d is the precision of 5% [18]. Accordingly, the calculation yielded a required sample size of 138. Hence, we aimed to enroll approximately 140 participants. This study was conducted according to the Declaration of Helsinki, and the study protocol was reviewed and approved by the Institutional Review Board of Pusan National University Hospital (No. D-2109-003-184). Written informed consent was obtained from all participants.
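The sample size arithmetic reported above can be checked with a short Python sketch. The function name `sample_size` is hypothetical; the inputs are the values stated in the text (Z = 1.96, P = 0.90, d = 0.05).

```python
import math


def sample_size(z: float, p: float, d: float) -> float:
    """Precision-based sample size: n = Z^2 * P * (1 - P) / d^2."""
    return z ** 2 * p * (1 - p) / d ** 2


n = sample_size(z=1.96, p=0.90, d=0.05)
print(round(n, 1))    # 138.3
print(math.floor(n))  # 138, the value reported in the text
```

The unrounded result is about 138.3, which matches the reported 138 when rounded down. Rounding up to 139 would be the more conservative convention, and the enrollment target of approximately 140 covers either choice.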
Results
Initially, 140 patients with LSS who were surgical candidates were enrolled, including both preoperative and postoperative patients. A total of 123 patients completed the second assessment, of whom 35 were in the preoperative group and 88 were in the postoperative group at the time of the survey. Among the 35 preoperative patients, 27 completed the survey at 3 months after surgery and were included in the responsiveness analysis.
Table 1 shows the demographic and clinical characteristics of the patients, and Table 2 presents the mean scores of K-FLS, K-ZCQ, and K-ODI for the study population.
All items of K-FLS and K-ZCQ demonstrated kappa statistics agreement exceeding 0.6 (range, 0.66–0.87 and 0.69–0.90, respectively). K-FLS exhibited excellent test–retest reliability (ICC, 0.93). K-ZCQ demonstrated good to excellent reliability depending on the domain (ICC range, 0.77–0.90) (Table 3).
Spearman’s rank correlation coefficients were used to determine construct validity by comparing the responses on K-FLS and K-ZCQ with those on K-ODI and VAS for back and leg pain. Both K-FLS and K-ZCQ exhibited strong correlations with K-ODI and VAS for back and leg pain, supporting their convergent validity (Table 4). The 27 patients who completed K-FLS and K-ZCQ both before and 3 months after surgery showed significant changes in scores. Moreover, the ES was highest for K-FLS, followed by VAS for leg pain, K-ZCQ, K-ODI, and VAS for back pain, all of which were >0.8 (Table 5).
Discussion
In studies on patients with LSS, outcome evaluation has frequently depended on generic instruments such as ODI, VAS, and Short Form-36 (SF-36). Although these tools are widely applied in both clinical practice and research, they were not originally designed for LSS and thus lack disease specificity. Recently, the FLS and ZCQ have been translated into Korean, and the validity of the resulting K-FLS and K-ZCQ has been analyzed [10,12].
The K-FLS comprises 25 items and can therefore evaluate quality of life (QoL) in detail; however, it takes longer to complete, which may reduce the number of study participants. Conversely, the K-ZCQ consists of 12 items for patients before surgery and 18 items for patients after surgery and therefore takes little time to complete. However, its brevity may limit how thoroughly QoL can be evaluated. Therefore, we compared the two LSS questionnaires to provide comparative results that can inform future research.
Although the two LSS questionnaires have been explored individually, they have rarely been compared with each other. Overall, the survey scores in our study were similar to those in previous studies that investigated these questionnaires individually [7–11,19–23]. In our study, both K-FLS and K-ZCQ significantly correlated with K-ODI and VAS; however, K-FLS demonstrated stronger correlations with K-ODI and VAS than K-ZCQ did. Furthermore, in the responsiveness evaluation, K-FLS demonstrated a higher ES than any subscale of K-ZCQ. Ranked by ES, responsiveness was highest for K-FLS, followed by VAS for leg pain, K-ZCQ, K-ODI, and VAS for back pain.
The superior responsiveness and correlation of K-FLS observed in our study can be attributed to its structural characteristics. Although K-ZCQ focuses primarily on physical function and pain intensity related to neurogenic claudication, K-FLS encompasses a broader multidimensional assessment, including numbness, psychosocial aspects, and detailed daily life limitations. Because patients with LSS often experience complex symptoms beyond simple pain—such as sensory disturbances and reduced social participation—K-FLS may be more sensitive in capturing the entire spectrum of clinical improvements after surgery than K-ZCQ.
On the basis of our findings, we suggest a complementary use of both instruments in clinical practice. The K-ZCQ, with its brevity and ease of scoring, remains a valuable screening tool for busy outpatient settings. Nevertheless, for evaluating detailed surgical outcomes or conducting clinical studies where detecting subtle changes in patient status is crucial, we recommend K-FLS due to its higher sensitivity and responsiveness (ES: 2.69 vs. 1.39 for ZCQ symptoms). This distinction allows surgeons to select the most appropriate instrument according to their specific clinical or research goals.
Several limitations must be considered when interpreting the findings of this study. First, the relatively small sample size recruited from a single center may limit the generalizability of the findings. Second, the study design involved two different cohorts (cross-sectional for reliability and longitudinal for responsiveness) rather than a single cohort followed up throughout. A completely prospective longitudinal study with a larger sample would strengthen the validity. Third, the retest survey was administered through telephone interviews for patients who could not visit the clinic. Although this method was selected to maximize compliance, the difference in the administration mode (telephone vs. self-administered) could potentially introduce measurement bias. Finally, we compared only the validated Korean versions; hence, cross-cultural differences compared with the original versions need to be addressed in future studies.
Conclusions
Our aim was to compare two commonly used LSS questionnaires in patients being treated for spinal stenosis. Comparing questionnaire results in patients with LSS is a complex process, and these results do not necessarily correlate with radiological findings or disease activity. We therefore compared the two LSS questionnaires in terms of their correlation with other QoL questionnaires and their responsiveness. The K-FLS demonstrated stronger correlations with K-ODI and VAS and higher responsiveness than K-ZCQ.
Key Points
Both the Korean version of the Fukushima Lumbar Spinal Stenosis Scale (K-FLS) and the Korean version of the Zurich Claudication Questionnaire (K-ZCQ) exhibited good validity and reliability in Korean patients with lumbar spinal stenosis.
K-FLS demonstrated stronger correlations with disability and pain measures than K-ZCQ.
K-FLS showed greater responsiveness to clinical changes after surgery than K-ZCQ.
Both questionnaires are useful, but K-FLS may be more sensitive for evaluating treatment outcomes.