Abstract
Decision-making in spine surgery is complex due to patient heterogeneity, the complexity of spinal pathologies, and the variety of surgical options available for a given pathology. Artificial intelligence/machine learning algorithms provide an opportunity to improve patient selection, surgical planning, and outcomes. The purpose of this article is to present the experience with and applications of artificial intelligence/machine learning in spine surgery at 2 large academic health care systems.
Background
Surgical decision-making in spine surgery is complex due to heterogeneity in patients, pathologies, and surgical options. Spine patients may present with significant variability in self-reported health status, comorbidities, and surgical approaches to care.1–6 The heterogeneity of patient presentations and surgical approaches has important implications for the cost and outcomes of care.7
Patient-collected data are growing in both size and complexity. Consequently, clinicians are limited in their ability to consider all patient data in decision-making regarding risk and outcome assessment. Precision medicine approaches to care may empower patients and physicians to make informed decisions regarding appropriate care and hold promise to optimize outcomes of care.7 Artificial intelligence/machine learning (AI/ML) are valuable analytic techniques for the development of models to help personalize decision-making. AI/ML tools can reveal correlations between variables that may not be apparent in traditional multivariate analysis and may reduce the biases of hypothesis-driven modeling.8 However, AI/ML techniques are limited by important issues including poor reporting methodologies, increased clinician burden, fairness, privacy/anonymity, explainability, and interpretability.6,9,10 Particularly for the medical domain, interpretability is critical in providing trust between the AI/ML algorithm and the health care system.11 Although similar algorithms with similar accuracy can behave differently in deployed settings, global and local interpretability methods can help in model selection and generalization.12,13 It is crucial to think about AI/ML algorithms as devices with end users and process flow. These devices require documentation, regulations, and maintenance.14,15 The regulation of ML algorithms is evolving given limited precedent.
AI/ML has been used across multiple aspects of spine surgery16 including spinal medical imaging,17,18 predicting surgical outcome,19–22 identifying characteristics of deformity patients,23,24 enhancing robotics,25,26 forecasting length of stay and discharge disposition,27–29 capturing surgeons’ decision-making,13 and predicting mortality and complications.23,24,30 Increasing availability of patient data through electronic health records is leading to integrated clinical prediction models for precision medicine at the point of care.31 However, the development, implementation, and maintenance of these tools can vary widely depending on institutional culture, level of patient interaction, model accuracy, and perceived trust.32
The level of automation of AI/ML can be categorized into 3 groups: assistive, autonomous information, and autonomous decision.15 Assistive describes a device with an overlapping role between a provider and the device, where providers confirm or approve device-provided information in overlapping cases. Autonomous information provides separation between the device and the provider; the latter will act on the interpretation made by the device. The final category, autonomous decision, makes decisions without provider input (Figure 1a). Approximately one-half of medical devices using ML are assistive devices. Increasing the autonomy of these devices would require close monitoring of model performance and more comprehensive regulatory approval to deploy them. Control theory, including closed loop control, would provide the appropriate safety framework to increase autonomy without increasing risk (Figure 1b). The closed loop control system relies on feedback from the system at agreed-upon intervals. The feedback should match the expected performance. Differences between the expected and the observed metrics would require adjustment of the system. These adjustments can range from automated to manual model re-training and can include assessing data skewness.
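The closed loop described above can be illustrated with a minimal sketch. It assumes a single scalar performance metric (for example, area under the curve) measured at each agreed-upon interval; the function name, tolerance values, and action labels are hypothetical and are not taken from either health care system.

```python
from dataclasses import dataclass


@dataclass
class MonitorResult:
    observed: float
    expected: float
    action: str  # "none", "manual_review", or "retrain"


def closed_loop_check(observed: float, expected: float,
                      review_tol: float = 0.02,
                      retrain_tol: float = 0.05) -> MonitorResult:
    """Compare an observed performance metric with its expected value.

    The tolerances are hypothetical; a real deployment would agree on them
    in advance, together with the review interval.
    """
    gap = expected - observed  # a positive gap means the model under-performs
    if gap > retrain_tol:
        action = "retrain"
    elif gap > review_tol:
        action = "manual_review"
    else:
        action = "none"
    return MonitorResult(observed, expected, action)


# Made-up metric values at three consecutive review intervals.
expected_auc = 0.80
for observed_auc in (0.81, 0.77, 0.72):
    result = closed_loop_check(observed_auc, expected_auc)
    print(f"observed={result.observed:.2f} -> {result.action}")
```

Small gaps are routed to manual review and larger gaps to re-training, mirroring the range of manual to automated adjustments; where those lines are drawn is a policy choice rather than something the sketch can decide.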
Real-world application of AI/ML for modeling outcomes, complications, and costs of care is a difficult task that has been spearheaded by Ames et al and the International Spine Study Group.33–35 The cost of care for adult spinal deformity is highly variable, and cost outliers are an important threat to the sustainability of complex spine surgery, especially in smaller hospitals. Ames et al used regression tree and random forest models to predict cost outliers in deformity surgery.36 Adult spinal deformity is characterized by significant variability regarding management approaches, complications, and outcomes. Durand et al used AI/ML-based models to predict whether adults with spinal deformity were treated operatively or non-operatively with 86% accuracy.37 Patient clustering developed with AI/ML may be useful in the classification of patients and in predicting appropriate surgical approaches and expected outcomes of care.38,39 Predicting complications is important in guiding appropriate care for patients with spinal deformity. Scheer et al used traditional statistical techniques to develop a predictive model for complications in spinal deformity with 87% accuracy.40,41 The application of AI/ML with larger datasets holds promise for more precise and patient-specific prediction tools regarding complications, outcomes, and appropriate care in spinal deformity.42,43 Model development and validation will be an iterative process with the introduction of new data points and techniques.
The most successful algorithms in industry tend to run in the background with few user interactions. These include algorithms for energy management, fraud detection, and spam filtering.44–46 Improving the effectiveness of AI/ML algorithms would require designing a solution as a system with appropriate control and safety logic, a transparent expected goal, a higher level of autonomy, and clear documentation of the users and maintenance team. In the current article, we present 2 real-world AI/ML applications at academic centers for augmenting and complementing surgeons’ knowledge in decision-making, patient selection, and patient optimization.
Health Care Utilization Metrics at the Cleveland Clinic
At the Cleveland Clinic, an AI/ML tool is utilized to improve patient outcomes while minimizing health care expenses. To create this tool, a real-time database was generated of over 55,970 surgical encounters between 2007 and 2022 for patients who underwent surgical intervention at the Cleveland Clinic Health Care System in Ohio and Florida. Patient outcomes, imaging, laboratory results, vitals, medications, costs, and other metrics were all considered in the analytics. From this dataset, limitations were identified in using patient-reported outcomes (PROs)/quality of life measures as the sole basis for making decisions. For example, up to 10% of the patients reported fluctuations in their PROs from improving to worsening or worsening to improving during the first 12 months following surgery. This instability makes it difficult to rely solely on PROs for a visit-to-visit analysis. Consequently, we developed a health care utilization metric to complement PROs in the analysis of patient performance. This metric is a linear combination of a patient’s encounters with the health care system, use of opioids, office visits, MyChart messages, physical therapy, imaging, postoperative epidural injections, etc. One purpose of this metric development was to objectify postoperative performance using a translatable and transferable language that other health systems and payers would understand. The heterogeneity of PRO use and adoption across the nation is well documented. The health care utilization metric describes a different aspect of a patient’s recovery but does not replace PROs, with a correlation coefficient of −0.34 between utilization and PRO (Figure 2). In addition to utilization, early and delayed costs are collected.
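As an illustration of what a linear combination of utilization components could look like, the following sketch computes a weighted sum of postoperative encounter counts. The component names and weights are hypothetical placeholders; the coefficients actually used in the Cleveland Clinic metric are not reported here.

```python
from typing import Dict

# Hypothetical weights: the actual coefficients are not reported in this article.
WEIGHTS: Dict[str, float] = {
    "opioid_prescriptions": 2.0,
    "office_visits": 1.0,
    "mychart_messages": 0.5,
    "physical_therapy_sessions": 0.5,
    "imaging_studies": 1.5,
    "epidural_injections": 3.0,
}


def utilization_score(counts: Dict[str, float],
                      weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of postoperative encounter counts (missing items count as 0)."""
    return sum(w * counts.get(name, 0.0) for name, w in weights.items())


# Made-up patient: 2 opioid prescriptions, 3 office visits, 4 MyChart messages.
patient = {"opioid_prescriptions": 2, "office_visits": 3, "mychart_messages": 4}
print(utilization_score(patient))  # 2*2.0 + 3*1.0 + 4*0.5 = 9.0
```

Because every component is a count that any health system can extract from its records, a score of this form is easy to recompute elsewhere, which fits the stated goal of a transferable language for postoperative performance.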
Three target objectives are optimized per patient before and after surgical intervention (Figure 3). The target objectives are to (1) maximize PROs at 1-year follow-up, (2) minimize health care utilization, and (3) minimize costs. Unlike the usual classification tasks, multiobjective optimization introduces additional hyperparameters to be selected, namely the relative importance of the 3 objectives with respect to each other. Optimization happens at 3 levels (patient, surgeon, and hospital), with each level having specified modifiable features. Decision-making is based on a search algorithm over the simulated outcomes under varying circumstances using causal inference logic as described by Judea Pearl in his book on causality.31 The current system is an assistive device to complement surgeon input. The autonomy of the system can increase to assist with insurance approval and medical preoperative optimization.
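The role of the relative-importance hyperparameters can be illustrated with a weighted-sum scalarization, one common way to combine several objectives into a single score. This is a sketch under stated assumptions, not the system described above: it does not implement causal inference, the scenario names and predicted values are made up, and all quantities are assumed to be pre-scaled to a 0-1 range.

```python
from typing import Dict, Tuple

Outcome = Dict[str, float]  # predicted "pro", "utilization", "cost", each scaled 0-1


def scalarize(outcome: Outcome, weights: Tuple[float, float, float]) -> float:
    """Weighted-sum score: reward PRO, penalize utilization and cost."""
    w_pro, w_util, w_cost = weights
    return (w_pro * outcome["pro"]
            - w_util * outcome["utilization"]
            - w_cost * outcome["cost"])


def best_scenario(scenarios: Dict[str, Outcome],
                  weights: Tuple[float, float, float]) -> str:
    """Exhaustive search over simulated scenarios for the highest score."""
    return max(scenarios, key=lambda name: scalarize(scenarios[name], weights))


# Hypothetical simulated outcomes for two care pathways.
scenarios = {
    "surgery_now": {"pro": 0.70, "utilization": 0.60, "cost": 0.80},
    "optimize_then_surgery": {"pro": 0.75, "utilization": 0.40, "cost": 0.90},
}
print(best_scenario(scenarios, (1.0, 0.5, 0.5)))  # optimize_then_surgery
print(best_scenario(scenarios, (1.0, 0.1, 1.0)))  # surgery_now
```

The two calls differ only in the weights, yet they select different pathways, which is why the relative importance of the 3 objectives has to be chosen deliberately rather than left implicit.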
Predictive Modeling for Surgical Interventions at UCSF
The assessment and development of AI/ML tools in the University of California, San Francisco (UCSF) Spine Service are a major focus of UCSF’s clinical research initiatives. Accurate information regarding the length of stay and discharge disposition has important implications regarding patient counseling and hospital budgeting. An assistive AI/ML tool for the accurate preoperative identification of patients at risk for extended length of stay after surgery can provide substantial benefits, including more transparent communication on expected benefits and risks of surgery, postoperative planning, cost savings, pre-emptive administrative action, and optimization of modifiable patient risk factors.47–49 Internal work for adopting predictive models led to a comparison of results for commonly used AI/ML prediction tools: the Risk Assessment and Prediction Tool (RAPT) score50,51 and the American College of Surgeons’ (ACS) National Surgical Quality Improvement Program (NSQIP).29 The RAPT score has been used to predict patient outcomes following surgery. It is a cumulative scaled score ranging from 1 to 12 and is composed of components that correspond to patient community support, the extent of home care, gait aid, and preoperative functional ability.50,51 While the RAPT score can be calculated by hand, other tools such as the ACS NSQIP require an online calculator, which utilizes 21 manually input preoperative factors to predict both length of stay and discharge status. To compare these tools, we selected a subset of adult elective spinal fusion patients previously described in Arora et al that had available RAPT scores (1251 patients) or ACS NSQIP scores (420 patients); 140 patients had both scores.29 For predicting patient-specific length of stay in the hospital after surgery, the Pearson’s correlation for ACS NSQIP was 0.461 (P < 2e-16; Figure 4A) and for the RAPT score was −0.361 (P < 2e-16; Figure 4B).
Although both of these scores were significantly correlated, the ACS NSQIP was significantly more correlated with the true length of stay than the RAPT score (Fisher’s r-to-z transformation; P = 0.0192). However, comparing ACS NSQIP to RAPT directly, Pearson’s correlation was −0.286, meaning that one patient can have very different scores using different AI/ML models (Figure 4C). This indicates that more work should be done to properly understand how and for what types of patients these models differ, which could potentially be used to improve care.
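For readers unfamiliar with Fisher’s r-to-z transformation, the following is a minimal sketch of how two correlation coefficients can be compared. It assumes the two samples are independent and compares the magnitudes of the correlations, since the RAPT score correlates negatively with length of stay. Because 140 patients had both scores, the independence assumption does not strictly hold, and the sketch should not be expected to reproduce the P value reported above; the function name is hypothetical.

```python
import math
from typing import Tuple


def fisher_z_compare(r1: float, n1: int, r2: float, n2: int) -> Tuple[float, float]:
    """Two-sided test that two correlations from independent samples differ."""
    z1, z2 = math.atanh(r1), math.atanh(r2)           # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # standard error of z1 - z2
    z = (z1 - z2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # normal-tail probability
    return z, p_two_sided


# Magnitudes of the reported correlations with length of stay and their sample sizes.
z, p = fisher_z_compare(0.461, 420, 0.361, 1251)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A test that accounts for the overlapping patients, or a one-sided alternative, would give a different P value, so the choice of test variant should be reported alongside the result.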
Discussion
In this study, we report real-world applications of AI/ML-based tools for managing surgical spine patients at 2 academic health care systems and identify major pitfalls. First, using data from the Cleveland Clinic, we demonstrate the need for a health care utilization metric when utilizing AI/ML tools since PRO data alone were shown to be insufficient for studying patient outcomes. This metric is then used, together with cost and traditional outcome measures, to optimize patient care. Second, using data from the UCSF Spine Service, we demonstrate that existing tools for predicting patient-specific outcomes are often inaccurate and can disagree with one another. Thus, more research is needed to determine which AI/ML tools for predicting outcomes should be implemented in new clinical settings and how.
The application of predictive models and algorithms based upon AI/ML offers promise to improve the management of patients in spine clinics and inpatient settings. Accurate models will empower patients and surgeons to make informed choices regarding optimal care pathways. For health care systems, accurate models will also guide appropriate resource utilization and improve sustainability in providing cost-effective care for patients. However, careful consideration must be given to ensuring that these models provide meaningful insights for clinicians. Moreover, the effective adoption and application of AI/ML-based algorithms will require ongoing validation within and between health care systems.
Footnotes
Funding The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests The authors report no conflicts of interest in this work.
Disclosures Sigurd Berven reports consulting fees for Medtronic, Globus, Innovasis, Camber, Accelus, and Kuros; royalties from Medtronic, Stryker, and Elsevier; support for attending meetings/travel from AO Spine; participation on a data safety monitoring board or advisory board for Zimmer; and stock/stock options from Globus, Propio, GreenSun, and Novapproach. Christopher Ames reports royalties/licenses from Stryker, DePuy Synthes, Next Orthosurgical, Medicrea, Biomet Zimmer Spine, Nuvasive, and K2M; consulting fees from DePuy Synthes, Medtronic, Medicrea, K2M, Agada Medical, and Carlsmed; leadership/fiduciary roles in ISSG (Executive Committee), Operative Neurosurgery, Neurospine (Editorial Board), SRS Safety and Value Committee (Chair), and Global Spinal Analytics (Director); research interests in Titan Spine, DePuy Synthes, and ISSG; and grant funding from SRS. Thomas Mroz reports royalties/licenses from Stryker; payment for serving as faculty from AO Spine; participation on a data safety monitoring board or advisory board for Medtronic (Medtronic TLIF Study); and leadership role for the Global Spine Journal (Deputy Editor). The remaining authors have nothing to report.
This manuscript is generously published free of charge by ISASS, the International Society for the Advancement of Spine Surgery. Copyright © 2023 ISASS. To see more or order reprints or permissions, see http://ijssurgery.com.