John A. Hipp, PhD,1 Richard D. Guyer, MD,2 Jack E. Zigler, MD,2 Donna D. Ohnmeiss, Dr. Med,3 Nicholas D. Wharton, MS1
1Medical Metrics, Houston, TX, 2Texas Back Institute, Plano, TX, 3Texas Back Institute Research Foundation, Plano, TX
Lumbar spinal instability is frequently referenced in clinical practice and the scientific literature despite the lack of a standard definition or validated radiographic test. The Quantitative Stability Index (QSI) is being developed as a novel objective test for sagittal plane lumbar instability. The QSI is calculated using lumbar flexion-extension radiographs. The goal of the current study was to use the facet fluid sign on MRI as the "gold standard" and determine if the QSI is significantly different in the presence of the fluid sign.
Sixty-two paired preoperative MRI and flexion-extension exams were obtained from a large FDA IDE study. The MRI exams were assessed for the presence of a facet fluid sign, and the QSI was calculated from sagittal plane intervertebral rotation and translation measurements. The QSI is based on the translation per degree of rotation (TPDR) and is calculated as a Z-score. A QSI > 2 indicates that the TPDR is > 2 standard deviations above the mean for an asymptomatic and radiographically normal population. The reproducibility of the QSI was also tested.
The mean difference between trained observers in the measured QSI was between -0.28 and 0.36. The average QSI was significantly (P = 0.047, one-way analysis of variance) higher at levels with a definite fluid sign (2.3±3.2 versus 0.60±2.4).
Although imperfect, the facet fluid sign observed on MRI may be the best currently available test for lumbar spine instability. Using the facet fluid sign as the "gold standard," the current study documents that the QSI can be expected to be significantly higher in the presence of the facet fluid sign. This supports the use of the QSI as a possible test for sagittal plane lumbar instability.
A validated, objective and practical test for spinal instability would facilitate research to understand the importance of instability in diagnosis and treatment of low-back related disorders.
Despite the lack of a widely accepted and validated definition, spinal instability is a concept commonly used in clinical practice to diagnose and plan treatments, and it is very frequently reported or discussed in the scientific literature. An April 2015 Google Scholar search, excluding patents and citations, identified 831 results for the exact phrase "lumbar spinal instability" and another 2390 results for the exact phrase "stability of the lumbar spine." As early as 1957, Morgan and King described a relationship between back pain and lumbar instability.1 They also noted a relationship between instability and disc degeneration, as did Knutsson in 1944.2 While many clinicians agree that instability is important, and spinal instability is a frequently cited justification for fusion surgery, the true clinical relevance of spinal instability is poorly understood. Instability definitions frequently include the magnitude of intervertebral rotation or translation as part of the instability criteria, although there is no consensus on the motion thresholds that define instability. A validated and reliable instability metric that is consistently applied in clinical studies would enable a better understanding of spinal instability and help to objectify its use in diagnosis and treatment planning.
Although instability may occur in multiple planes, the focus of the current study is sagittal plane instability. An objective quantitative metric based on the ratio of intervertebral translation to intervertebral rotation, measured from flexion-extension radiographs, has been suggested as a metric for lumbar instability,3 and may prove effective at identifying abnormal intervertebral motion in the lumbar spine. The Quantitative Stability Index (QSI) is being developed as a novel, practical, and objective clinical test for instability. To help validate this metric, for the purpose of this study, the presence of fluid in the facet joints observed in MRI exams of the spine was used as the "gold standard" to identify unstable motion segments. The facet fluid sign is considered to be the best currently available indicator for instability.4-9 The null hypothesis was that the QSI metric would be the same at levels with a fluid sign as it is at levels without a fluid sign.
To test the null hypothesis that the QSI is the same in the presence or absence of the facet fluid sign, T2-weighted preoperative MRI exams were assessed for the facet fluid sign by an independent musculoskeletal radiologist who was blinded to the stability metric. In addition to 25 years of clinical practice, the radiologist had over 6 years of imaging core-lab experience reading imaging studies from multiple large clinical trials of spine treatments, and had reviewed the available literature on the facet fluid sign. Sixty-two preoperative MRI exams were obtained from a large Food and Drug Administration (FDA) regulated Investigational Device Exemption (IDE) study in which the inclusion criteria required a radiographically confirmed diagnosis of degenerative spinal stenosis and documented stenosis symptoms. The exclusion criteria included "significant instability" (defined as > 3 mm translation or ≥ 5° angulation) and spondylolisthesis greater than grade 1. There were no specific disc height or disc degeneration criteria.
To avoid confounding effects of the level implanted, only the L4-5 level was assessed. The facet fluid sign was graded as: "none" if there was no evidence of fluid in the left or right facet joints; "possible" if there was some suggestion of fluid in the joints; or "definite" if there was a > 2 mm wide layer of hyperintensity within either the left or right joint.
For purposes of the IDE study, sagittal plane intervertebral rotation and intervertebral translation had previously been prospectively measured from preoperative flexion-extension studies. The flexion-extension studies had been collected at multiple clinical sites following the image acquisition protocol for the IDE. The rotation and translation data, for those subjects who had a preoperative MRI, were used to retrospectively calculate the instability index. The translation and rotation measurements had been obtained using FDA-cleared, computer-assisted software (QMA, Medical Metrics, Houston, TX). The accuracy and reproducibility of the translation and rotation measurements have been previously reported.10-13 The rotation and translation measurements were produced by analysts who had previously received extensive training and certification in the use of the QMA software. Translation was measured as the displacement of the posterior-most edge of the inferior endplate of the superior vertebra, in a direction defined by the superior endplate of the inferior vertebra. The QSI is calculated from the amount of translation per degree of rotation (TPDR) in the sagittal plane (Figure 1). TPDR is based on the assumption that in a healthy disc, the relationship between translation and rotation is approximately linear when the magnitude of intervertebral rotation is outside of the neutral zone. For the purposes of this study, it was assumed, based on review of the available literature, that a minimum of 3° of sagittal plane intervertebral rotation is required to be outside of the neutral zone.14,15 This threshold was also chosen to avoid excluding a large proportion of cases (preoperatively, at the L4-5 level, 38% of levels had < 3°, 50% had < 4°, and 61% had < 5°).
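The TPDR calculation described above can be illustrated with a minimal sketch. It assumes that translation is expressed as a percent of endplate width, that both inputs are magnitudes measured between flexion and extension, and that levels with less than 3° of rotation are simply not scored. The function name, argument names, and example values are hypothetical and are not taken from the QMA software.

```python
from typing import Optional

# Minimum rotation assumed in the study to be outside of the neutral zone.
MIN_ROTATION_DEG = 3.0


def tpdr(translation_pct_endplate: float,
         rotation_deg: float,
         min_rotation_deg: float = MIN_ROTATION_DEG) -> Optional[float]:
    """Translation per degree of rotation (% endplate width per degree).

    Both inputs are assumed to be non-negative magnitudes measured between
    the flexion and extension radiographs. Returns None when rotation is
    below the threshold, because the ratio is not interpreted there.
    """
    if translation_pct_endplate < 0 or rotation_deg < 0:
        raise ValueError("inputs are expected to be non-negative magnitudes")
    if rotation_deg < min_rotation_deg:
        return None
    return translation_pct_endplate / rotation_deg


# Hypothetical level: 6.0 % endplate width of translation over 8.0 degrees.
example_tpdr = tpdr(6.0, 8.0)
# Hypothetical level with too little rotation to be scored.
too_little_rotation = tpdr(2.0, 2.5)
```

Returning None rather than a number for low-rotation levels keeps unscored levels from being mistaken for stable ones.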
An additional assumption underlying the instability metric is that TPDR is relatively consistent between individuals who have no abnormalities at the motion segment. That assumption facilitates use of data for "normal" spines to define the thresholds that differentiate "normal" from "abnormal" motion. The normal ranges of TPDR used in calculating QSI were obtained from previously collected data for 162 volunteers.16 The 18 to 82 year old (mean 42.2, SD 15.3) volunteers had been studied under an IRB approved protocol (Baylor College of Medicine, Houston, TX), and had satisfied the inclusion and exclusion criteria. Volunteers had been included if they were skeletally mature, and had no history of a spinal disorder, spinal surgery, or back pain or related symptoms which required a visit to a physician. Volunteers had been excluded if they had a history of treatment prescribed by a clinician for any symptom (back pain or radicular symptoms) or current complaints of low back symptoms. The presence of known congenital vertebral malformations that would alter the usual biomechanics of the lumbar spine was also an exclusion criterion. Radiographic evidence of degeneration was found in some of the levels in the asymptomatic volunteers. Data for levels with radiographic evidence of degeneration were excluded. There were 658 non-degenerated levels (out of 802 measured levels). TPDR was only calculated if there was at least 3° of rotation (only 6 levels from the asymptomatic population were excluded due to < 3°). There were significant (P<0.0001) differences in TPDR between levels, so level-dependent data are required to determine if a subject’s TPDR is within or outside of normal limits. This complicates the interpretation of the TPDR metric, as the measured value for each level must be interpreted relative to what is normal for that level. To remove this limitation, a Z-score was calculated from TPDR. The normalized TPDR is referred to as the Quantitative Stability Index (QSI).
The QSI eliminates the level-dependence of TPDR data, and is calculated from TPDR data as the measured TPDR for the subject minus the mean normal TPDR for the level being measured, divided by the standard deviation for TPDR at that level in the normal asymptomatic population. The QSI can be classified as abnormal if it is greater than 2 standard deviations from the mean for the asymptomatic population, as this would be above the 95% confidence interval, which is a generally accepted definition for normal/abnormal. A QSI of zero means that the TPDR is exactly equal to the average TPDR for a radiographically normal level in asymptomatic volunteers. A negative QSI means the TPDR is below normal and a positive QSI means that the TPDR was above normal. A QSI of 2 can be used to identify a TPDR that is just outside of the normal range. A QSI of 4 would mean that the TPDR is 4 standard deviations above normal and that would be considered definitely unstable. QSI is reported on a continuous scale as it is assumed that instability can occur in various degrees of severity.
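The Z-score described above can be sketched in a few lines. The normative mean and standard deviation below are hypothetical placeholders chosen only to make the arithmetic easy to follow; the actual level-specific values come from the asymptomatic reference population and are not reported in this text. All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalTPDR:
    """Mean and standard deviation of TPDR for one level in normal spines."""
    mean: float
    sd: float


# HYPOTHETICAL placeholder norms, NOT the values used in the study.
NORMAL_TPDR = {"L4-5": NormalTPDR(mean=0.5, sd=0.25)}


def qsi(tpdr_value: float, level: str, norms=NORMAL_TPDR) -> float:
    """Z-score of a measured TPDR relative to the normal data for that level."""
    ref = norms[level]
    return (tpdr_value - ref.mean) / ref.sd


def is_abnormal(qsi_value: float, threshold: float = 2.0) -> bool:
    """Flag a QSI more than `threshold` standard deviations above normal."""
    return qsi_value > threshold


at_normal_mean = qsi(0.5, "L4-5")   # TPDR equal to the normal mean
at_threshold = qsi(1.0, "L4-5")     # two SDs above the placeholder mean
well_above = qsi(1.5, "L4-5")       # four SDs above the placeholder mean
```

Because the reference mean and standard deviation are looked up per level, the same threshold of 2 can be applied at every level, which is the stated purpose of normalizing TPDR.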
The reproducibility of the QSI was established by having three analysts each independently produce QSI for 53 lumbar motion segments (with > 3° of rotation). The 53 motion segments were selected from the image database at Medical Metrics to evenly represent high quality, average quality, and poor quality radiographs.
The reproducibility was quantified by intraclass correlation (two-way, mixed effects) and Bland-Altman analysis. Histograms and tests for skewness and kurtosis were used to assess the distribution of the data for asymptomatic subjects. The association between the fluid sign and the QSI was tested using analysis of variance. All statistics were obtained using Stata version 11 (StataCorp, College Station, TX).
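As an illustration of the Bland-Altman step, the following sketch computes the mean difference (bias) between two analysts and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The QSI values are hypothetical and serve only to show the calculation; the study's own analysis was run in Stata and may differ in detail.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float],
                 b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# HYPOTHETICAL QSI values from two analysts for the same four levels.
analyst_a = [0.5, 1.0, 2.5, -0.5]
analyst_b = [0.25, 1.5, 2.0, -0.25]
bias, lower, upper = bland_altman(analyst_a, analyst_b)
```

A bias near zero with narrow limits of agreement would indicate that the reported QSI depends little on which analyst made the measurement.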
In the reproducibility study, average intervertebral rotation was 10.0±5.0° and average intervertebral translation was 4.7±2.3 % sagittal plane endplate width. Based on the reproducibility data, the reported QSI did not depend on which analyst measured rotation or translation (P = 0.69, analysis of variance). The average ICC was 0.96 and the ICC between individual analysts was 0.9. The mean difference in measured QSI between analysts was between -0.28 and 0.36, based on the Bland-Altman analyses. However, there were a small number (6 of 53) of levels where the difference between analysts was > 1. Review of those cases revealed that the central x-ray beam was substantially oblique to the endplates at the level being analyzed. An out-of-plane (OOP) index was measured for all 53 levels as shown in Figure 2. Using the average of the three analysts as the reference, the error in each measurement was calculated as the difference between each analyst’s measurement and the average. The inter-observer error was strongly associated with the OOP index (P<0.0001). The error was also associated with levels where the OOP index was very different in the flexion versus the extension radiographs. The error in QSI was < 1 in all but 2 levels where the OOP index (maximum of flexion or extension) was < 0.25 and the difference in the OOP index between the flexion and extension radiographs was < 0.2. Based on these data, a rational guideline would be not to report QSI if the OOP index is > 0.25 or if there is a large difference in the OOP index between the flexion and extension radiographs, and to accept that an individual QSI measurement could be up to ±0.5 from the actual QSI.
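The proposed reporting guideline can be written as a simple quality gate. This is a sketch under stated assumptions: the 0.25 limit on the OOP index comes from the guideline above, whereas the 0.2 cutoff for a "large" flexion-extension difference is borrowed from the reproducibility data and is an assumption here, since the guideline itself does not state a number. Function and parameter names are hypothetical.

```python
def qsi_reportable(oop_flexion: float,
                   oop_extension: float,
                   max_oop: float = 0.25,
                   max_oop_difference: float = 0.2) -> bool:
    """Decide whether radiograph quality supports reporting a QSI.

    max_oop: limit on the larger of the flexion and extension OOP indices.
    max_oop_difference: ASSUMED cutoff for a "large" difference in the OOP
    index between the flexion and extension radiographs.
    """
    worst_oop = max(oop_flexion, oop_extension)
    oop_difference = abs(oop_flexion - oop_extension)
    return worst_oop <= max_oop and oop_difference < max_oop_difference


# Hypothetical OOP index values for three exams.
good_pair = qsi_reportable(0.10, 0.15)        # both checks pass
oblique_beam = qsi_reportable(0.30, 0.20)     # OOP index above 0.25
mismatched_pair = qsi_reportable(0.02, 0.24)  # difference of 0.22
```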
The TPDR data for each level from L1-2 to L4-5 were normally distributed in the asymptomatic population when data for degenerated levels are excluded (P>0.05 for skewness and kurtosis). At the L5-S1 level, the TPDR data are skewed to lower levels of TPDR. This skewness does not prevent use of QSI at L5-S1, but it should be recognized that QSI is skewed toward low values at L5-S1 in the asymptomatic population.
With respect to determining if QSI is different in the presence of a facet fluid sign, preoperative MRI exams were assessed for 62 subjects from the IDE study for whom intervertebral motion measurements had previously been generated at the L4-5 level and who had > 3° of intervertebral rotation. There were 15 levels with a definite fluid sign, 12 with a possible fluid sign, and 35 with no fluid sign. Average intervertebral rotation was 6.8±3.4° and average intervertebral translation was 4.3±2.7 % endplate width. The average QSI was significantly (P = 0.047, one-way analysis of variance) higher at levels with a definite fluid sign (2.3±3.2) versus levels without (0.60±2.4). The average QSI for levels with a possible fluid sign was 0.76±1.98, which was not different from levels with no fluid sign (Figure 3). Consistent with the observed association between a definite fluid sign and a higher QSI, Chaput et al. reported that the larger effusions (fluid sign) were most predictive of instability.6 As a test of the hypothesis that rotation alone or translation alone would have an association with the fluid sign, one-way analysis of variance tests were also performed for translation and rotation. Neither rotation (P = 0.40) nor translation (P = 0.73) was significant on its own, supporting that it is the ratio of translation to rotation that is a metric for instability.
Panjabi provided the following general description of spinal instability: "the basic concept of spinal instability is that abnormally large intervertebral motions cause either compression and/or stretching of the inflamed neural elements or abnormal deformations of ligaments, joint capsules, annular fibers, and end-plates, which are known to have significant density of nociceptors."18 Many investigators have attempted to establish a definition of "abnormally large intervertebral motions," as described in multiple review papers.19-21
Much of the published research focused on identifying an upper limit of intervertebral rotation or translation that can be used to classify a measurement as normal or abnormal. Toward this goal, many publications provide data on normal intervertebral rotation and translation between flexion and extension with the volunteers seated or standing.17,22-30 These publications generally provide a mean and standard deviation for each level. The 95% confidence interval can be calculated from their published data, and the upper limit used as a threshold to define abnormal motion (UL95%). One problem with that approach is that there is a relatively wide variation in the upper limit of motion between studies, suggesting that the data used to define the upper limit may be specific to the flexion-extension protocol or the radiographic measurement method. Another problem with that strategy is that to apply the UL95% to a patient, it is necessary for the patient to move as much as the asymptomatic volunteers did. However, restricted spinal mobility can be expected in some back pain patients for reasons such as pain with motion, surgical fusion, or fear of further injuring their back.31 Thus, the apparent amount of measured rotation or translation may be much less than can actually occur in the patient’s spine if the patient were motivated to maximally flex and extend. If the patient does not maximally flex and extend during the exam, then measurements from the radiographs will under-estimate the true motion that can occur at each level. A spine might thereby be reported as having motion below the UL95% for normal even if motion can actually be greater than the UL95% during activities of daily living.
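The UL95% approach and its dependence on patient effort can be illustrated as follows. The sketch assumes the upper limit is taken as the mean plus 1.96 standard deviations of the normal data, which is a common convention but is not specified in the cited studies as summarized here. The normative values and patient measurements are hypothetical.

```python
def upper_limit_95(normal_mean: float, normal_sd: float, z: float = 1.96) -> float:
    """Upper limit of the normal range, assuming mean + z * SD."""
    return normal_mean + z * normal_sd


def exceeds_ul95(measured: float, normal_mean: float, normal_sd: float) -> bool:
    """True if a measured motion is above the upper limit of normal."""
    return measured > upper_limit_95(normal_mean, normal_sd)


# HYPOTHETICAL normal translation for one level: mean 2.0 mm, SD 1.0 mm.
ul95 = upper_limit_95(2.0, 1.0)

# The same hypothetical spine imaged with full effort and with guarded effort.
flagged_full_effort = exceeds_ul95(4.5, 2.0, 1.0)
flagged_guarded_effort = exceeds_ul95(2.5, 2.0, 1.0)
```

In this hypothetical case the level is flagged only when the patient moves fully, which is the limitation of quantity-of-motion thresholds described above.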
It may not be possible to reliably assure that patients move enough to determine the true maximum rotation and translation possible at each level in their spine. Sengupta et al. stated that "A common misconception of instability is an abnormal increased range of motion (ROM) in the lumbar motion segment".32 For these reasons, measurements of the quality of motion may be more valuable than measurements of the quantity of motion. A reliable assessment of the quality of motion requires that there be enough motion between flexion and extension at each intervertebral level to stress the restraints to intervertebral motion.33 Just as an anterior cruciate ligament (ACL) injury cannot be detected unless the knee is stressed to the point where an intact ACL would restrict motion, incompetency of intervertebral motion restraints cannot be detected unless the spine is stressed to the point where the restraints would normally be expected to restrict motion and contribute significantly to the slope of the elastic region of the rotation versus moment curve.
The ratio of translation to rotation (TPDR) has been described as a simple quality of motion metric, although a practical and clinically applicable implementation has not previously been validated. TPDR has been suggested as a metric for instability or as a method to control for variability due to patient effort.3,34-37 As defined by Bogduk, instability occurs when there is an inordinate amount of translation for the degree of rotation.38 TPDR can be rationalized by recognizing that the spine primarily facilitates motion of the body through controlled intervertebral rotations. The vertebral morphology, facet joints, intervertebral disc and ligaments that connect one vertebra to another control the motion between vertebrae, and the surrounding muscles and integrated proprioception system create the intervertebral rotations required to accomplish the movements and positioning of the human body.
Whereas it is easy to understand how controlled intervertebral rotations are required to position the body as needed for activities of daily living, there is no obvious value in primarily controlling intervertebral translations. For example, there would be no reason why the body would need to create several millimeters of shear between vertebrae without any rotation. The amount of intervertebral translation that occurs during intervertebral rotations is likely the minimum required to allow for the required rotations. With damage or degeneration of the disc, facet joints and intervertebral ligaments, the amount of translation that occurs for a required rotation can increase.39-41 For example, Frei et al. have demonstrated increased translation after removing the disc nucleus.39 While the content of the spinal canal and neural foramen can be protected during controlled rotations with minimal translations, abnormal translation for a given rotation can compromise the spinal canal or foramen.42
Knutsson recognized in a study of lumbar instability published in 1944 that excessive translation might be an early sign of degenerative changes.2 In the uninjured and non-degenerated spine, there is an approximately linear relation between translation and rotation,43,44 and the slope of this relationship is fairly similar between individuals. With injury or degeneration, the slope of the translation versus rotation curve can change. Weiler et al. were the first to describe using the ratio of translation to rotation and reported that the ratio was significantly higher in the presence of degenerative changes.3 Weiler et al. calculated the ratio using translation measured in millimeters. This requires accurate determination of magnification in the images, and that is not practical in routine clinical practice. An alternative is to calculate the ratio using translation expressed as a percent of endplate width. Normalization of the TPDR by the endplate width removes the influence of vertebral size on the TPDR and facilitates comparisons of data across different individuals.
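The practical advantage of expressing translation as a percent of endplate width can be shown with a short sketch: when translation and endplate width are measured on the same image, any unknown magnification scales both and cancels in the ratio. The dimensions and the magnification factor below are hypothetical.

```python
def translation_pct_endplate(translation: float, endplate_width: float) -> float:
    """Translation as a percent of endplate width.

    Both arguments must be in the same units (mm, pixels, etc.) and are
    assumed to be measured on the same image.
    """
    if endplate_width <= 0:
        raise ValueError("endplate width must be positive")
    return 100.0 * translation / endplate_width


# HYPOTHETICAL anatomy: 2.0 mm of translation on a 35.0 mm endplate.
true_translation_mm = 2.0
true_endplate_mm = 35.0
actual_pct = translation_pct_endplate(true_translation_mm, true_endplate_mm)

# The same anatomy as it appears on a radiograph with an unknown
# magnification (1.3 is an arbitrary illustrative value).
magnification = 1.3
on_image_pct = translation_pct_endplate(true_translation_mm * magnification,
                                        true_endplate_mm * magnification)
```

The two percentages agree to within floating-point rounding, so the magnification never needs to be determined.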
One limitation of the QSI metric is the requirement for flexion-extension studies where the central x-ray beam is approximately co-planar with the vertebral endplates, and where the amount of intervertebral motion is of a magnitude where translation and rotation would be expected to be linearly related based on data from a normal population. These requirements might be achieved for most patients in routine clinical practice by use of a controlled flexion-extension image acquisition protocol, but that has not yet been validated. An additional limitation of the QSI metric is that accurate and reproducible measurements of rotation and translation are required, as well as a database of rotation and translation measurements that can define "normal" motion. The rotation and translation measurements for a patient must be calculated using the same methods used to define "normal" motion. It is unlikely that QSI could be calculated reliably from manual measurements of rotation and translation. Nevertheless, if QSI (or equivalent validated metric) subsequently proves valuable in improving clinical outcomes, the technical challenges to providing objective metrics in routine clinical practice are relatively minimal.
Center-of-rotation (COR) is another "quality of motion" metric that has been described or used in multiple publications. COR was also calculated for patients in the current study, again as a Z-score, so that COR is reported as the number of standard deviations from the average for an asymptomatic population. Some levels had an abnormal z-COR (too caudal, too cranial, too anterior, or too posterior) but had QSI within ±2. In general, levels with QSI > 2 tended to have a COR that was too caudal, but not always. These preliminary observations suggest that the patterns of "quality of motion" abnormalities are complex, and they will be addressed in a subsequent manuscript. It remains to be determined whether QSI, COR, or a combination will prove most efficacious in diagnosis and treatment planning.
The facet fluid sign has been documented in multiple studies as an indicator of instability.4-9 Hasegawa et al. found facet opening to be the strongest of the instability predictors that they tested.45 Rihn et al. reported that the grade of facet effusion was significantly correlated with sagittal radiographic instability in degenerative lumbar disease.5 Based on the available scientific literature, the facet fluid sign was selected as the presumptive "gold standard" to validate QSI as a test for lumbar instability. However, the facet fluid sign may be an imperfect indicator of instability since this sign requires that a gap form between the articular processes of the joint when the patient lies in the supine position, and that the gap have time to fill with fluid before the MRI is obtained. It is not proven that the joints will always fill with fluid between when the patient gets on the MRI table and when the MRI is obtained. Gas can sometimes be observed with CT exams within the gap between articular processes (e.g. in Ben-Galim et al.46), demonstrating that the gaps do not always immediately fill with fluid. In addition, it is possible that the facets might not always open up when the patient is supine (due to muscle spasms, how the patient is positioned, amount of lordosis in the spine, the specific damage or degeneration that exists, etc.). Thus, a perfect relationship between the fluid sign and instability (or an abnormally high QSI) would not be expected. Unfortunately, validation of a diagnostic test requires a "gold standard" and a truly "gold" standard for instability has yet to be established.
Nevertheless, the fluid sign is considered one of the best available indicators of at least one type of instability, and using the fluid sign as the best-available "gold standard," these data help to validate QSI as a potential metric for sagittal plane lumbar instability. The QSI is simple to interpret since it is the number of standard deviations from normal in an asymptomatic and radiographically normal population. A QSI > 2 would indicate that the amount of translation per degree of rotation is > 2 standard deviations above average, which would be very rare in a normal spine. A QSI = 4 would indicate far greater instability. With good quality radiographs, the QSI can be reproducibly calculated and thereby used to assess for instability or the development of instability. This could be useful in monitoring patients following uninstrumented decompressions, or in monitoring levels adjacent to fusions. In addition, the QSI metric may have a health-care cost advantage over the fluid sign, since MRI is an expensive modality. A reliable assessment of instability that can be accomplished using much less expensive imaging would be advantageous. For lack of a true "gold standard," it may be difficult to calculate clinically relevant sensitivity, specificity, negative predictive power and positive predictive power. It may instead be more efficient to focus on research to validate use of QSI in diagnosis and treatment algorithms. Additional research is needed to determine if a TPDR-based metric is a solution to the seven-decade2 search for an objective spine stability metric. If so, further research is needed to identify patient populations where instability is prevalent, followed by research to validate diagnosis and treatment algorithms that use objective assessment of instability to improve clinical outcomes.
The support of Vertiflex in allowing use of imaging data from an FDA IDE to address a research hypothesis independent of their clinical study is very gratefully acknowledged.
John Hipp owns stock in, draws a salary from, and is the Chief Scientist and scientific founder of Medical Metrics. Nicholas Wharton is also an employee, stockholder, and officer for Medical Metrics. Rick Guyer is a consultant for DePuy Synthes, is on an advisory board for K2M, is on an advisory board and has stock options in Spinal Kinetics, and receives royalties from Alphatec. Jack Zigler declares no relevant financial disclosures regarding this manuscript. Donna Ohnmeiss also declares no relevant financial disclosures.
John A. Hipp, PhD. Medical Metrics, Inc, 2121 Sage Rd, Suite 300, Houston, TX 77056. firstname.lastname@example.org