ABSTRACT
Background: Heterotopic ossification (HO) is a known risk following cervical total disc replacement (CTDR) surgery, but the cause and effect of HO are not well understood. Reported HO rates vary, and few studies are specifically designed to report HO. The effects of HO on outcomes and the risk factors for its development have been hypothesized and reported in small-population, retrospective analyses using univariate statistics.
Methods: Posthoc, multiple-phase analysis of radiographic, clinical, and demographic data for CTDR as it relates to HO was performed. HO was radiographically graded for 164 one-level and 225 two-level CTDR patients using the McAfee and Mehren system. Analysis was performed to correlate HO grades to clinical outcomes and to evaluate potential risk factors for the development of HO using demographics and baseline clinical measures.
Results: At 7 years, 1-level clinically relevant HO grades were 17.6% grade 3 and 11.1% grade 4. Two-level clinically relevant HO grades, evaluated using the highest patient grade, were 26.6% grade 3 and 10.8% grade 4. Interaction between HO and time revealed significance for neck disability index (NDI; P = .04) and Visual Analog Scale (VAS) neck pain (P = .02). When analyzed at each time point, NDI was significant at 48–84 months and VAS neck pain at 60 months. For predictors, 2 analyses were run; odds ratios indicated that follow-up visit, male sex, and preoperative VAS neck pain are related to HO development, whereas hazard ratios indicated male sex, obesity, endplate coverage, levels treated, and preoperative VAS neck pain.
Conclusions: This is the largest study to report HO rates together with related outcomes and risk factors. To develop an accurate predictive model, further large-scale analyses need to be performed. Based on the results reported here, clinically relevant HO should be more accurately described as motion-restricting HO until a definitive link to outcomes has been established.
INTRODUCTION
The introduction of cervical total disc replacement (CTDR) began with the concept of motion preservation to treat degenerative disc disease (DDD). Motion preservation most closely mimics natural motion of the spine and is believed to preserve the adjacent segments from degeneration over the long term compared with cervical anterior discectomy and fusion (ACDF).1–4 CTDR has been well studied, with multiple US Food and Drug Administration approvals, and long-term clinical trial data indicate it is a safe and effective treatment for cervical DDD.5–15
An unintended and not well understood sequela of CTDR is heterotopic ossification (HO). Following CTDR, HO can develop around the device between vertebral bodies. In severe cases, HO limits range of motion, sometimes creating fusion of the segment.16–19 Reported rates of HO following CTDR vary drastically, creating more debate and concern around the true rate and impact of HO.20–23 The long-term effects of HO resulting in unintended fusion have not been analyzed.
Analysis of HO risk factors began in hip arthroplasty with studies indicating sex differences in HO development.24,25 For CTDR, HO risk factors have been hypothesized, including lack of nonsteroidal anti-inflammatory drug (NSAID) use postoperatively, sex, age, surgical level, number of treated levels, preoperative degeneration, and surgical technique. Although there are multiple studies that attempt to correlate CTDR HO to risk factors, these studies are limited in scope due to small patient populations (largest is 170 patients), retrospective data, and univariate statistical analyses.18,26–31 Results of these studies are varied and sometimes contradictory, so the question of risk factors for development of HO remains unanswered. This study includes the largest known prospectively collected CTDR data set analyzed with multiple statistical models as a first step to identifying risk factors associated with HO.
MATERIALS AND METHODS
Data were collected as part of a prospective, randomized, controlled, multicenter clinical trial comparing CTDR (Mobi-C, Zimmer Biomet, Austin, Texas) with ACDF for treatment of symptomatic DDD with radiculopathy or myeloradiculopathy at 1 or 2 contiguous levels from C3 to C7. ACDF data collected were not used for this analysis. Enrollment for the CTDR study arm included 164 one-level and 225 two-level patients. Details of the clinical trial have been reported previously.13,15
Study Design
This study was designed as a posthoc, multiple-phase analysis of radiographic, clinical, and demographic data for 1- and 2-level CTDR as it relates to HO.
Radiographic Analysis
Available radiographs were analyzed annually from years 1 to 5 and at 7 years postoperatively. Radiographic evaluations were performed by independent radiologists (MMI Inc., Houston, Texas).
HO was graded using the adapted McAfee and Mehren system by 2 independent US-based, board-certified, fellowship-trained, practicing radiologists.32 In the event of disagreement, a third reader adjudicated their assessments. Representative HO grades are shown in Figure 1, and the protocol definitions of HO are included as Table 1. Grades 0, 1, and 2 were classified as not clinically relevant, whereas grades 3 and 4 HO were classified as clinically relevant.
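The grading workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the trial's actual software: the function names are hypothetical, and it assumes that the third reader's grade is taken as final whenever the 2 primary readers disagree, which the protocol summary does not spell out.

```python
from typing import Optional

# Per the protocol summary: grades 3 and 4 are "clinically relevant".
CLINICALLY_RELEVANT_GRADES = {3, 4}
VALID_GRADES = {0, 1, 2, 3, 4}


def final_grade(reader_a: int, reader_b: int,
                adjudicator: Optional[int] = None) -> int:
    """Return the agreed grade, or the adjudicated grade on disagreement.

    Assumption (not stated in the source): the third reader's grade
    replaces both primary reads when they differ.
    """
    for grade in (reader_a, reader_b):
        if grade not in VALID_GRADES:
            raise ValueError(f"invalid HO grade: {grade}")
    if reader_a == reader_b:
        return reader_a
    if adjudicator is None:
        raise ValueError("readers disagree; a third read is required")
    if adjudicator not in VALID_GRADES:
        raise ValueError(f"invalid HO grade: {adjudicator}")
    return adjudicator


def is_clinically_relevant(grade: int) -> bool:
    """True for grades 3 and 4, False for grades 0, 1, and 2."""
    if grade not in VALID_GRADES:
        raise ValueError(f"invalid HO grade: {grade}")
    return grade in CLINICALLY_RELEVANT_GRADES


if __name__ == "__main__":
    # Hypothetical reads: the primary readers disagree (2 vs 3),
    # and the third reader assigns grade 3.
    grade = final_grade(2, 3, adjudicator=3)
    print(grade, is_clinically_relevant(grade))
```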
Range of motion was measured and reported preoperatively and at last follow-up.
Postoperative Care
The protocol under which the data were initially collected included specific provisions for postoperative NSAID use. The protocol prohibited the use of NSAIDs for both CTDR and ACDF from 1 week preoperatively to 3 months postoperatively, unless specifically prescribed to treat HO.
Baseline Characteristics and Clinical Outcomes
Demographics, baseline measures, and clinical outcomes were collected prospectively in the trial. Posthoc analysis was performed to correlate grades of HO to patient clinical outcomes and to evaluate potential risk factors for development of grade 3 or 4 HO using demographics and baseline clinical measures. Demographics and baseline clinical scores included age, sex, body mass index (BMI), segmental range of motion (ROM) in flexion/extension, visual analog scale (VAS) neck pain, and disc space height.
Patient clinical outcomes included neck disability index (NDI), VAS neck/arm pain, and Short Form 12 (SF-12) Physical Component Score (PCS)/Mental Component Score (MCS).
Statistical Methods
Mixed-effects analysis of variance (ANOVA) was used to assess the impact of clinically relevant HO on NDI, VAS neck/arm pain, and SF-12 PCS/MCS scores. A model was created for each of the 5 patient-reported outcomes. For 2-level patients, HO status was classified by the highest grade of HO at either level. The effect of clinically relevant HO was adjusted for follow-up visit, number of levels treated, BMI, age, and sex, with random effects for study site and patients within study sites. An interaction term for HO status and follow-up visit was included to determine whether the effect of HO differed across time. If the interaction term was significant, the HO effect was tested at each follow-up visit. Type 3 tests of fixed effects were used to calculate P values.
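The patient-level HO classification used in the outcome models can be illustrated with a short Python sketch. The function name and input layout are hypothetical; the rule itself (highest grade at either level, with grades 3 and 4 counted as clinically relevant) follows the description above.

```python
from typing import Sequence

# Grades 3 and 4 are treated as clinically relevant.
RELEVANT_THRESHOLD = 3


def patient_ho_status(level_grades: Sequence[int]) -> bool:
    """Classify a patient by the highest HO grade across treated levels.

    level_grades holds one grade for a 1-level patient, or the superior
    and inferior grades for a 2-level patient.
    """
    if not level_grades:
        raise ValueError("at least one graded level is required")
    if any(g not in range(5) for g in level_grades):
        raise ValueError("HO grades must be integers from 0 to 4")
    return max(level_grades) >= RELEVANT_THRESHOLD


if __name__ == "__main__":
    # Hypothetical 2-level patient: superior grade 2, inferior grade 3.
    print(patient_ho_status([2, 3]))
    # Hypothetical 1-level patient with grade 2.
    print(patient_ho_status([2]))
```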
Significant predictors of clinically relevant HO were identified by mixed-effects logistic regression. Features were chosen based on exploratory analysis and those believed to have potential relevance to HO. Preoperative features included age, sex, BMI, ROM in flexion/extension, VAS neck pain, and disc space height. Postoperative predictors included follow-up time, endplate coverage, and number of levels treated. Endplate coverage was calculated as the average percent of anterior-posterior device contact of both inferior and superior endplates. For 2-level patients, each level was treated as a separate observation. Random effects for individual patients and level within patients were included to adjust for within-patient and within-level dependencies across multiple follow-up observations. Missing data were imputed using multivariate chained equations. P values and 95% confidence intervals were calculated using Wald approximation.
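Two of the quantities described above lend themselves to a short worked sketch in Python. The first function averages the percent coverage of the superior and inferior endplates. The second converts a logistic regression coefficient and its standard error into an odds ratio with a Wald confidence interval and 2-sided P value. The function names and the example coefficient are hypothetical and are not taken from the study's models, which were fit in R.

```python
import math
from statistics import NormalDist


def endplate_coverage(superior_pct: float, inferior_pct: float) -> float:
    """Average the anterior-posterior percent coverage of both endplates."""
    for pct in (superior_pct, inferior_pct):
        if not 0.0 <= pct <= 100.0:
            raise ValueError("coverage must be a percentage from 0 to 100")
    return (superior_pct + inferior_pct) / 2.0


def wald_odds_ratio(beta: float, se: float, level: float = 0.95):
    """Return (OR, CI lower, CI upper, two-sided P) by Wald approximation.

    beta is a log-odds coefficient and se its standard error.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    z_stat = beta / se
    p_value = 2.0 * (1.0 - normal.cdf(abs(z_stat)))
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    return math.exp(beta), lower, upper, p_value


if __name__ == "__main__":
    # Hypothetical measurements and coefficient, for illustration only.
    print(endplate_coverage(80.0, 90.0))
    odds_ratio, lo, hi, p = wald_odds_ratio(beta=0.5, se=0.2)
    print(f"OR={odds_ratio:.2f}, 95% CI {lo:.2f}-{hi:.2f}, P={p:.3f}")
```

An interval computed this way is symmetric on the log-odds scale, so it is asymmetric around the odds ratio itself.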
Additionally, a mixed-effects Cox proportional hazards model was fit with the same features as the previous logistic model. This was done to (1) validate the robustness of the logistic model results and (2) express variable effects as hazard ratios, which are more readily interpretable as relative probability ratios for obtaining clinically relevant HO at any given point.
A P value threshold of 0.05 was used to determine statistical significance of model parameter estimates. Statistical computations were performed in R version 3.2.4 (R Foundation for Statistical Computing, Vienna, Austria).
RESULTS
Radiographic Analysis
HO data were available for 65.9% (108 of 164) of 1-level and 70.2% (158 of 225) of 2-level patients at 7 years. Grades of HO at each time point are included as Figures 2, 3, and 4. Two-level HO is presented separately for superior and inferior levels.
One-level clinically relevant HO grades were 28.7%, with 17.6% (grade 3) and 11.1% (grade 4). The nonrelevant HO included 1.9% (grade 0), 2.8% (grade 1), and 66.7% (grade 2). Two-level clinically relevant HO grades were 23.3% superior and 31.7% inferior, with inferior values of 27.0% (grade 3) and 4.7% (grade 4), and superior values of 16.8% (grade 3) and 6.5% (grade 4). The nonrelevant HO for the superior segment included 1.9% (grade 0), 2.6% (grade 1), and 72.3% (grade 2), whereas the inferior segment included 1.4% (grade 0), 2.0% (grade 1), and 64.9% (grade 2). The grades of HO progressed over time, but they slowed after 5 years. From years 5 to 7, only 6.6% of 1-level and 15.2% of 2-level patients experienced a progression of 1 grade or more.
Figure 5 illustrates 2-level HO per patient using the highest grade of HO present. Occurrence of grade 4 HO at 7 years between 1-level (11.1%) and 2-level (10.8%) patients was similar.
ROM at the last measured HO was negatively correlated with HO grade (Figures 6 and 7), confirming grades 3 and 4 HO were associated with restricted segment ROM (1-level: Spearman ρ = −0.63, P < .0001; and 2-level: Spearman ρ = −0.53, P < .0001).
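For readers unfamiliar with the statistic, Spearman's ρ is the Pearson correlation of the ranks, with tied values sharing their average rank. The Python sketch below is a generic illustration of that calculation; the grade and ROM values in the example are hypothetical and are not study data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical HO grades and segmental ROM (degrees), illustration only.
    grades = [0, 1, 2, 2, 3, 4]
    rom = [12.0, 11.0, 9.0, 10.0, 3.0, 0.0]
    print(round(spearman_rho(grades, rom), 3))
```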
Baseline Characteristics and Clinical Outcomes
Patient baseline characteristics and scores were sorted into clinically relevant and nonrelevant HO present at 7 years (Supplemental Table 1). These results were categorized to evaluate for trends; therefore, statistics were not run on these data.
Progression of HO and Clinical Outcomes
Results comparing patients with and without clinically relevant HO are presented in Table 2. Clinically relevant HO had no significant effect on any outcome when generalized across time. Interaction between HO and follow-up visit was significant for NDI (P = .04) and VAS neck pain (P = .02), indicating the effect of HO on NDI and VAS neck pain is time dependent and should be analyzed separately at each follow-up period.
For NDI scores, no significant differences between HO groups were observed at early follow-up (12–36 months). At later times (48–84 months), group NDI scores displayed evidence of divergence, with significantly higher scores in the clinically relevant group (Figure 8 and Table 3). VAS neck pain scores trended higher for the clinically relevant group at later time points, but only month 60 was statistically significant (Figure 9 and Table 3). Sex was a significant effect for NDI, VAS neck/arm pain, and SF-12 PCS scores, with women trending toward higher pain and function scores.
Clinical Predictors of Clinically Relevant HO
Mixed-effects logistic regression was used to identify significant predictor variables for clinically relevant HO development. Table 4 includes the resulting odds ratio (OR) estimates and associated 95% confidence intervals (CIs). Longer follow-up, male sex, and higher preoperative VAS neck pain scores demonstrated significantly higher odds of clinically relevant HO. Obesity was associated with higher odds for clinically relevant HO, whereas increasing endplate coverage and 2-level treatment were associated with lower odds of HO; however, these correlations did not reach statistical significance. Preoperative age, ROM, and disc height did not demonstrate a definitive relationship with HO development.
The features from the logistic model were used to fit the mixed-effects Cox proportional hazard model. Cox regression estimates model parameters in terms of hazard ratios, as opposed to logistic regression ORs. Results from the Cox regression (Table 5) largely agreed with the logistic regression analysis. When adjusting for other variables, men were approximately 3 times more likely to develop clinically relevant HO than women (P < .0001). Probability of developing HO was inversely related to endplate coverage; that is, a 5% decrease in endplate coverage resulted in 1.3 times higher risk for developing clinically relevant HO (P = .004). Number of levels treated was inversely related to HO development, with a 2-level patient half as likely to develop clinically relevant HO as a 1-level counterpart (P = .012). Preoperative VAS neck pain was also related to HO development, with a 30-point increase on the VAS neck pain scale resulting in 1.6 times higher risk (P = .008). Obesity was a significant predictor, with obese patients being 3 times more likely to develop HO (P = .048). Preoperative age, ROM, and disc height were not significant predictors of clinically relevant HO.
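Hazard ratios for continuous predictors are tied to the size of the change being described (here, a 5% change in endplate coverage and a 30-point change in VAS neck pain). Because a Cox model is linear in the log hazard, a ratio reported for one step size can be re-expressed for another by exponent scaling. The Python sketch below illustrates that arithmetic. The function name is hypothetical, and the rescaled values are implications of the model form under the proportional hazards assumption, not additional results reported by the study.

```python
def rescale_hazard_ratio(hr: float, reported_change: float,
                         new_change: float) -> float:
    """Re-express a hazard ratio for a different covariate step size.

    If a change of reported_change units multiplies the hazard by hr,
    a log-linear model implies a change of new_change units multiplies
    it by hr ** (new_change / reported_change).
    """
    if hr <= 0:
        raise ValueError("hazard ratio must be positive")
    if reported_change == 0:
        raise ValueError("reported_change must be nonzero")
    return hr ** (new_change / reported_change)


if __name__ == "__main__":
    # Reported in the text: 1.3 times higher risk per 5% decrease in
    # endplate coverage. Implied ratio for a 10% decrease:
    print(rescale_hazard_ratio(1.3, reported_change=5, new_change=10))
    # Reported in the text: 1.6 times higher risk per 30-point increase
    # in preoperative VAS neck pain. Implied ratio per 10 points:
    print(rescale_hazard_ratio(1.6, reported_change=30, new_change=10))
```

The same scaling applies to odds ratios from the logistic model, since that model is linear in the log odds.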
DISCUSSION
Understanding potential risk factors for developing HO is critical to assisting surgeons with proper patient selection. The ability to accurately predict long-term patient outcomes would be an invaluable tool. For this analysis, we used multiple modeling techniques to validate the consistency of the results and to determine the most accurate model for future analyses.
This study constitutes the largest published single-source results of HO following CTDR surgery, and it includes 1- and 2-level patient cohorts with radiographic HO data. Radiographic HO data, sometimes viewed as a limitation, were valuable for comparison across multiple studies and devices. Also, in 2011 Tu et al30 published a study that graded HO using CT and plain radiographs in 36 patients. They reported that radiographs had 80% sensitivity and 88.89% specificity in HO detection. Agreement between the CT grading and radiographic grading had an intraclass correlation coefficient of 0.822 (95% CI, 0.710–0.894, P < .001). Grades 3 and 4 were the same when graded by CT and radiograph.
Radiographic reported grade 4 HO per patient (1-level, 11.1%; and 2-level, 10.8%) was similar to HO reported in the literature, although long-term follow-up is not available in all other studies. Multiple meta-analyses and individual device reports have been published on CTDR HO, with minimum 2-year follow-up. Grades of HO using the original McAfee scale include a broad range: grade 0, 12.0%–92.3%; grade 1, 0%–37.4%; grade 2, 2.6%–35.5%; grade 3, 0%–45.0%; grade 4, 0.4%–22.2%.20–22 A review by Kang et al23 included a pooled incidence across multiple studies of 27.7% grade 3 and 7.8% grade 4 HO.
Yi et al19 published a study on HO across multiple CTDR devices. The rates of HO are high, but the study does not differentiate the severity of HO nor does it define clinical impacts.19 There are other studies that either do not use the McAfee scale, or do not report individual HO grades; these studies are not discussed in our report.33–35
The limited long-term data available include 10-year follow-up of the Bryan cervical disc in China. The study included 42 patients with 10-year follow-up; HO rates using the McAfee and Mehren scale were 2.4% grade 2, 33.3% grade 3, and 33.3% grade 4 HO.36
Using a modified McAfee scale, Suchomel et al37 reported on 54 patients treated with ProDisc-C.22,37 At 4-year follow-up, 50 patients, with 60 implants, had HO rates of 37% for grades 0–2, 45% for grade 3, and 18% for grade 4. The authors believed these HO rates were higher than anticipated, and they offer no statistical analysis of potential predictors or impact on clinical outcomes. However, the limited sample size would restrict a robust analysis of this nature.
Although not all US investigational device exemption studies report HO in the literature, the rates that are reported are on the lower end when compared to other studies.
Although rates of HO vary, HO is a result of CTDR surgery, so we sought to gain a better understanding of potential predictors of HO development and its impact on outcomes. Published studies that analyzed outcomes have shown that clinically relevant HO does not impact clinical outcomes, but their statistical methods have been less sophisticated than those used in this study.20,38 The use of a mixed-effects ANOVA to evaluate outcomes was important because it allowed the effect of clinically relevant HO to be adjusted for imbalances in covariates, such as BMI and sex, and allowed us to model HO effects over time. Time is a natural confounder for HO on pain and function scores, because as time progresses, HO rates increase, and with time we witness a rise in patient outcome scores. Although significance was found for NDI (2 points) and VAS neck pain (1.9 points) at later time points, the differences did not reach clinical significance.39 The effects of clinically relevant HO on patient outcomes remain unclear, although this analysis is the first to use modeling methodology to explore these relationships.
In conjunction with understanding the effects of clinically relevant HO on outcomes, it is important to determine possible predictors of clinically relevant HO. The mixed-effects logistic regression, using ORs, identified male sex and higher preoperative VAS scores as significant predictors of clinically relevant HO. Nonsignificant predictors included obesity, endplate coverage, and number of levels treated. Our analysis did not include the use of NSAIDs as a variable in the risk factor analysis, because NSAID use was prohibited by the protocol. Previous studies in arthroplasty, specifically hip arthroplasty, have shown that anti-inflammatory drugs reduce HO. The reason is unclear, but it is hypothesized to be an inhibition of the formation of prostaglandins, which activate local bone growth following trauma.40 Although hip HO was shown to be reduced with anti-inflammatory use, the mechanism is not fully understood, and the HO etiology appears different between lateral and central regions.41
Conducting the first in-depth analysis of HO, we endeavored to understand the various models of analysis and the consistency of findings. The mixed-effects Cox proportional hazard model presents an advantage by estimating hazard ratios, which are direct probability ratios of developing clinically relevant HO of one group compared to another, at any given time. Results of the Cox model were similar to the logistic regression analysis, corroborating significance for male sex and preoperative VAS. Although both models found a correlation between endplate coverage and number of levels treated, only the Cox model produced a strong enough correlation to reach statistical significance. These results pose interesting relationships between HO and predictor variables, but the mechanisms contributing to these relationships are unknown and require further analysis.
Leung et al16 published results of an observational study of the European Bryan disc. Ninety patients completed 12-month postoperative radiographs, with 17.8% experiencing HO; 6.7% were grade 3 or 4. Leung and his coauthors found that male and older patients were at statistically significant higher risk of clinically relevant HO. This appears to be the first publication to study HO in the cervical spine similarly to previous reports on HO in total joint replacement.24,25
Wu et al26,27 did not grade the severity of HO present, but they found that significantly more HO was present in 2-level cases using the Bryan disc. Because the specific grades of HO were not reported, it is difficult to discern the differences when compared with this analysis. Chang et al28 analyzed and reported on patients receiving CTDR at C3–C4 versus other cervical levels. Although they found C3–C4 had significantly more and higher grades of HO, the C3–C4 group also had a larger percentage of male patients.28 Male sex appears in multiple studies as a predictor of higher HO rates, although no study to date has offered an explanation for this.18,29 Other studies have found that sex did not impact HO. Tu et al30 found a 1.9% rate of grade 4 HO in 36 patients treated with the Bryan disc, with no differences between sexes. However, the average follow-up was less than 24 months and the small patient population would make it difficult to detect differences.30 Ganbat et al31 performed 3-dimensional finite element analysis on HO. When subjected to compressive force, the area of the vertebral body uncovered by the implant footprint had the most HO formation. This is consistent with our finding that more implant endplate coverage reduces the odds of HO. Radiographically, this study did not evaluate the presence or severity of preoperative osteophytes, which has also been linked to HO development.29
In exploratory analysis we found relationships between sex, BMI, pain, and endplate coverage. These relations stress the importance of analyzing predictors for HO in a multivariate context, as we have done here, because these features potentially confound each other's effect on HO. Although this is the first in-depth analysis of level 1 evidence data on HO, this study has limitations. The analysis was performed posthoc, and the issue could be further studied in prospectively planned studies with preplanned statistical analysis. The findings here are limited for clinical decision-making, because we cannot yet make causal inferences. The rates of HO were shown to progress over time, warranting further research into the relationship of HO and inflammatory response. Because the use of NSAIDs was prohibited by the IDE protocol, the effect on HO rate here is unknown.
There remains a paucity of literature analyzing potential surgical technique and implant-specific causes of HO following CTDR surgery. Further analysis needs to be conducted to understand the significance and relationship between each of these possible predictors, and other potential predictors, such as adjacent-level degeneration, sagittal alignment, and operative levels.
Although spine surgeons have traditionally referred to HO as clinically relevant and nonrelevant, this nomenclature appears to be founded on ROM and not impact on clinical outcomes. Based on this analysis, the largest to date, it seems clear that HO terminology should be more accurately defined as motion-restricting and non–motion-restricting.
Footnotes
Disclosures and COI: P.D.N. receives royalties and is on the speaker's bureau for LDR Spine (now ZimmerBiomet); K.A.F. received institutional support from LDR Spine (now ZimmerBiomet); K.E.M. is a former employee of LDR Spine (now ZimmerBiomet); and M.B.S. received institutional support from LDR Spine (ZimmerBiomet).
- ©International Society for the Advancement of Spine Surgery