Machine learning – Neurosurgery Blog

23/03/2026

Machine learning models for predicting patient satisfaction after adult spinal deformity surgery

J Neurosurg Spine 44:457–468, 2026

This clinical study develops and internally validates machine learning–guided logistic regression models to predict patient satisfaction 24 months after adult spinal deformity (ASD) surgery, using 213 patients and three feature-selection methods. Nine routinely measurable predictors—including postoperative WOMAC function, frailty, pelvic compensation, imaging MCID achievement, rFCSA, and SVA—were identified and ranked by SHAP for their influence on satisfaction.

The model showed strong discrimination (AUROC 0.846) and calibration, yielded a nomogram for individualized prognostication, and emphasizes modifiable targets for perioperative care and rehabilitation. Limitations include single-center retrospective design, modest sample size, and inclusion of postoperative variables limiting preoperative decision use.

Goal Develop and internally validate models to predict patient satisfaction 24 months after adult spinal deformity (ASD) surgery, using SRS-22r satisfaction (high satisfaction defined as score ≥ 4.5).

Cohort 213 ASD patients met criteria; 128 (60%) used for training and 85 (40%) for internal test validation.

Pipeline Used three ML feature-selection methods—LASSO, recursive feature elimination (RFE), and Boruta—and retained variables consistently selected by all three.

Final predictors Nine key indicators were retained: rFCSA, fatty infiltration, frailty, pelvic compensation, postoperative SVA, imaging MCID achievement, postoperative subtotal score, postoperative WOMAC function, and change in WOMAC function.

Model Built an interpretable logistic regression model from these predictors; binary cutoff optimized via ROC/Youden index, with SHAP used to rank feature importance.

Performance In the test set, the model achieved AUROC 0.846 and accuracy 0.812 (also reported AUPRC 0.894 and Brier score 0.153).

Top drivers (SHAP order) Higher postoperative WOMAC function, absence of frailty, imaging MCID achieved, larger WOMAC function improvement, higher rFCSA, higher postoperative subtotal, lower postoperative SVA, successful pelvic compensation, and lower fatty infiltration increased satisfaction likelihood.

Implication/limitation Intended mainly to identify modifiable factors to guide postoperative rehabilitation; practical preoperative counseling is limited because key inputs include postoperative variables, and external multicenter validation is still needed.
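
The summary above says the binary cutoff was optimized via the ROC curve and the Youden index. As a rough illustration of that idea only (not the authors' code), the sketch below scans candidate thresholds and keeps the one maximizing J = sensitivity + specificity - 1. The function name `youden_cutoff` and the toy labels and scores are assumptions made up for this example.

```python
def youden_cutoff(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its score is >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Hypothetical data: 1 = high satisfaction, scores = predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90]
cutoff, j_stat = youden_cutoff(y_true, y_score)
print(cutoff, j_stat)
```

When two thresholds tie on J, this sketch keeps the lower one; a real analysis would need to state its own tie-breaking rule.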

19/11/202518/11/2025

An Artificial Intelligence Tool for the Diagnosis of Facial Pain

Neurosurgery 97:993–1002, 2025

This study presents development and validation of an AI-based diagnostic decision support tool that distinguishes temporomandibular disorders (TMDs) from trigeminal neuralgia (TN) using a standardized facial pain questionnaire and targeted orofacial examination. Supervised machine learning models (Random Forest, Logistic Regression, SVM) were trained on data from 101 patients, with the Random Forest achieving the best performance (≈90% accuracy; ROC-AUC ~0.95).

The analysis identifies clinically interpretable predictors—TMJ and masticatory muscle tenderness favor TMD, while brief electric-shock–like pain and prior response to trigeminal surgery favor TN—and evaluates class imbalance effects and limitations for clinical deployment. The work emphasizes the need for external validation, cautious integration into workflows, and balanced training to improve generalizability.

• Differentiation Challenge: Temporomandibular disorders (TMDs) and trigeminal neuralgia (TN) both cause orofacial pain but require very different treatments, making accurate diagnosis crucial; TMDs are far more common and often misdiagnosed as TN, leading to inappropriate management.

• AI Diagnostic Tool: A machine learning (ML) model using questionnaire data and physical examination can reliably distinguish TMD from TN with approximately 90% accuracy, with a Random Forest Classifier showing the best performance (F1 score up to 0.953).

• Key Predictive Features: The most important diagnostic indicators are TMJ tenderness and masticatory muscle tenderness (favoring TMD), and brief, unpredictable, electric shock–like pain episodes (favoring TN).

• Data Collection: Comprehensive data—including both patient-reported symptoms and structured physical examination—significantly improves diagnostic accuracy compared to using only a subset of features.

• Prevalence and Misdiagnosis: TMDs affect 5–12% of the population, while TN is much rarer (0.03–0.3%); the high prevalence of TMD means misdiagnosis as TN is a significant concern, with many patients meeting criteria for TN2 possibly having TMD instead.

• Model Robustness: Training ML models on balanced datasets (even when real-world prevalence is imbalanced) improves accuracy and reduces false positives for the minority class (TN).

• Clinical Utility: The AI tool provides transparent, interpretable results that align with clinical reasoning, supporting clinicians in differentiating between TMD and TN, but external validation in diverse populations is needed before routine clinical adoption.

• Limitations: Further research is required for external validation, integration into workflows, and to address potential algorithmic bias; overreliance on algorithmic output should be avoided in favor of combined clinical expertise.
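
The point about balanced training sets can be illustrated with a small sketch. The summary does not say how the study balanced its data, so random undersampling of the majority class is an assumption here, as are the function name `undersample_to_balance` and the toy patient identifiers.

```python
import random
from collections import Counter


def undersample_to_balance(rows, labels, seed=0):
    """Randomly drop majority-class rows until every class has equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            balanced.append((row, label))
    rng.shuffle(balanced)  # avoid class-ordered training data
    return balanced


# Hypothetical cohort: 8 TMD cases and 3 TN cases.
rows = [f"patient_{i}" for i in range(11)]
labels = ["TMD"] * 8 + ["TN"] * 3
balanced = undersample_to_balance(rows, labels)
class_counts = Counter(label for _, label in balanced)
print(class_counts)
```

Undersampling discards data, which matters in a cohort of about 100 patients; class weighting or oversampling are common alternatives.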

07/11/202506/11/2025

Machine Learning–Based Rupture Risk Prediction for Intracranial Aneurysms: A Systematic Review and Meta-Analysis

Neurosurgery 97:1072–1082, 2025

This systematic review and meta-analysis evaluates machine learning (ML) applications for predicting intracranial aneurysm rupture, comparing 124 ML models across 36 retrospective studies (22,462 patients) with the PHASES score. Results show ML—especially deep learning and SVM—achieves higher AUC and specificity than PHASES, with hemodynamic inputs improving test-set specificity but not external validation.

The authors highlight methodological heterogeneity, risks of bias, and overfitting concerns from retrospective single‑center data, urging prospective, standardized studies and external validation before clinical integration of ML rupture‑risk tools.

• Machine Learning (ML) Models: ML techniques, including deep learning (DL), support vector machines (SVM), and regression models, show higher specificity and overall diagnostic accuracy than the traditional PHASES score for predicting intracranial aneurysm rupture risk, with comparable sensitivity.

• Deep Learning Performance: DL models achieved the highest sensitivity (up to 0.87), specificity (up to 0.86), and area under the curve (AUC-ROC up to 0.92) among all ML families, indicating strong discriminative ability in rupture risk prediction.

• PHASES Score Limitations: The PHASES score, though widely used, demonstrates lower specificity (0.51) and modest overall discriminative ability (AUC-ROC 0.66), and does not incorporate important risk factors like aneurysm morphology or family history.

• Hemodynamic Parameters: Incorporating hemodynamic variables (e.g., wall shear stress, flow patterns) into ML models improves specificity and accuracy in test sets, but benefits are less pronounced in external validation, possibly due to sample size and generalizability issues.

• Retrospective Data and Overfitting: All included ML models were trained on retrospective, post-rupture data, raising concerns about overfitting and the applicability of these models to pre-rupture clinical decision-making.

• Generalizability Concerns: ML models often perform less well on external validation data due to biases in patient selection, single-center data, and differences in imaging or clinical protocols, while the PHASES score maintains more consistent performance across settings.

• Need for Prospective Validation: There is a critical need for prospective studies and standardized protocols to confirm the clinical utility and reliability of ML-based rupture risk prediction models before integration into routine practice.

• Clinical Implications: ML approaches, especially DL and SVM, have the potential to enhance individualized risk stratification and reduce overtreatment, but methodological challenges and validation in diverse populations remain essential for safe clinical adoption.

22/10/202521/10/2025

Development and Validation of Interpretable Machine Learning Models Incorporating Paraspinal Muscle Quality to Predict Cage Subsidence Risk Following Posterior Lumbar Interbody Fusion

Spine 2025;50:1375–1385

This multicenter retrospective study developed and validated an interpretable LightGBM machine learning model incorporating paraspinal muscle quality and bone metrics to accurately predict cage subsidence risk after PLIF. Key risk factors included lower psoas muscle index, higher fat infiltration, reduced bone density, and suboptimal cage parameters.

• A machine learning model (LightGBM) was developed to predict cage subsidence risk after PLIF, achieving high accuracy (AUC 0.9752, 92% accuracy, F1 score 0.92).

• Key independent risk factors include lower psoas muscle index (PMI), higher fat infiltration (FI), reduced bone density (HU value, VBQ), suboptimal cage position/height, and greater postoperative changes in intervertebral height (IH) and segmental angle (SA).

• Paraspinal muscle quality was a major contributor; removing muscle indicators reduced model accuracy substantially.

• Patients with cage subsidence had poorer paraspinal muscle and bone quality compared to those without subsidence.

• The model was externally validated and deployed as a web-based tool for real-time, individualized clinical risk assessment.

• Findings support personalized surgical planning and risk mitigation strategies for PLIF patients.

• The study emphasizes a multifactorial approach, integrating skeletal, muscular, and surgical parameters for optimal prediction.

• Limitations include retrospective design, use of a single cage type, and lack of comorbidity indices; further prospective studies are needed.

29/08/202528/08/2025

AtlasGPT: a language model grounded in neurosurgery with domain-specific data and document retrieval

J Neurosurg 143:560–567, 2025

AtlasGPT, a neurosurgery-specific large language model grounded in expert-verified sources and retrieval-augmented generation, outperformed GPT-4 and Gemini Advanced on a neurosurgery board exam, showed greater resistance to medical misinformation, and generated more comprehensive, relevant, and well-referenced answer explanations than standard preparation materials.

• AtlasGPT is a neurosurgery-specific large language model (LLM) built on GPT-4 with retrieval-augmented generation (RAG) from trusted neurosurgical sources.

• AtlasGPT outperformed GPT-4 and Gemini Advanced on a 149-question neurosurgery board exam (accuracy: 90.6% vs 80.5%).

• AtlasGPT showed the highest accuracy on spine and imaging-based questions, even without access to image data.

• In adversarial testing, AtlasGPT was more robust to misinformation, being fooled only 14% of the time, compared to 44% for GPT-4 and 68% for Gemini Advanced.

• Expert neurosurgeons rated AtlasGPT’s explanations as more comprehensive, relevant, and better referenced than official board prep materials.

• AtlasGPT did not produce hallucinations or harmful content in its responses.

• The study suggests domain-specific LLMs like AtlasGPT can enhance medical education, decision-making, and exam preparation in complex fields.

• Limitations include use of a single question bank and need for broader source material in future work.

08/08/202507/08/2025

Intraoperative brain tumor classification via laser-induced fluorescence spectroscopy and machine learning

J Neurosurg 143:313–322, 2025

A laser-based device, TumorID, combined with machine learning, rapidly and nondestructively classifies brain tumor tissue intraoperatively. Tested on 46 patients, it distinguished glioma, meningioma, pituitary adenoma, and normal tissue with high accuracy, offering potential to improve neurosurgical decision-making and outcomes.

• TumorID is a laser-induced endogenous fluorescence spectroscopy device paired with machine learning for rapid intraoperative brain tumor classification.

• It distinguishes glioma, meningioma, pituitary adenoma, and nonneoplastic tissue in near real time using a 405-nm laser and support vector machine (SVM) algorithm.

• The device requires only 0.5 seconds per scan and does not damage tissue.

• In a study of 46 patients and 761 scans, TumorID achieved a multiclass AUROC of 0.809, demonstrating high classification accuracy.

• Neutral porphyrin emission regions were most significant for tissue differentiation.

• TumorID offers objective, fast, and nondestructive tissue diagnostics, potentially improving surgical decision-making and resection outcomes.

• Future directions include in vivo use, prediction of tumor subtypes and genetics, and integration with other data sources for improved accuracy.

21/07/202520/07/2025

Stratifying trigeminal neuralgia and characterizing an abnormal property of brain functional organization: a resting-state fMRI and machine learning study

J Neurosurg 143:74–82, 2025

Resting-state fMRI and machine learning revealed distinct brain connectivity and activity differences between classical and idiopathic trigeminal neuralgia (TN) and controls. These findings identify potential neuroimaging biomarkers for TN subtypes, aiding diagnosis and understanding of TN pathophysiology.

• Primary trigeminal neuralgia (TN) includes classical (CTN) and idiopathic (ITN) types, sharing clinical features but differing in neurovascular compression (NVC) presence.

• Resting-state fMRI and machine learning were used to analyze brain functional connectivity and spontaneous activity in 50 TN patients (28 CTN, 22 ITN) and 43 controls.

• TN patients showed increased connectivity between the medial prefrontal cortex (mPFC) and left planum temporale, and decreased connectivity between mPFC and left superior frontal gyrus.

• CTN patients had further reduced connectivity between the left insula and left occipital pole, and decreased activity in the right temporal pole compared to ITN.

• TN patients exhibited heightened neural activity in frontal regions compared to controls.

• Machine learning (support vector machine) distinguished TN patients from controls with moderate accuracy (AUC 0.80).

• Findings suggest potential fMRI biomarkers for TN subtypes, aiding understanding of pathophysiology and improving diagnosis.

• Study limitations include small sample size and exclusion of bilateral/secondary TN, warranting further research.

10/04/2025

Artificial intelligence as a modality to enhance the readability of neurosurgical literature for patients

J Neurosurg 142:1189–1195, 2025

The study evaluates ChatGPT 3.5 and GPT4’s ability to generate readable, accurate summaries of neurosurgical literature, enhancing patient comprehension. GPT4 showed higher readability and accuracy, suggesting its potential in improving patient education and bridging the gap between medical findings and public understanding.

Study Overview

• Objective: Assess ChatGPT’s ability to generate readable, accurate neurosurgical summaries.

• Methods: Analyzed 150 abstracts from top neurosurgical journals.

• Models Used: GPT3.5 and GPT4.

Findings

• Readability Improvement: GPT4 summaries more readable than original abstracts.

• Scientific Accuracy: 84.2% of GPT4 summaries maintained moderate accuracy.

• Readability Metrics: GPT4 outperformed GPT3.5 in multiple readability scores.

Implications

• Patient Education: GPT4 can enhance neurosurgical literature comprehension for patients.

• Health Literacy: Potential to improve health literacy nationwide.

Limitations and Future Research

• Accessibility: GPT4’s restricted access limits broader application.

• Future Studies: Explore GPT4’s use in other medical specialties.

05/09/202404/09/2024

Optimal Implant Sizing Using Machine Learning Is Associated With Increased Range of Motion After Cervical Disk Arthroplasty

Neurosurgery 95:627–633, 2024

Cervical disk arthroplasty (CDA) offers the advantage of motion preservation in the treatment of focal cervical pathology. At present, implant sizing is performed using subjective tactile feedback and imaging of trial cages. This study aims to construct interpretable machine learning (IML) models to accurately predict postoperative range of motion (ROM) and identify the optimal implant sizes that maximize ROM in patients undergoing CDA.

METHODS: Adult patients who underwent CDA for single-level disease from 2012 to 2020 were identified. Patient demographics, comorbidities, and outcomes were collected, including symptoms, examination findings, subsidence, and reoperation. Affected disk height, healthy rostral disk height, and implant height were collected at sequential time points. Linear regression and IML models, including bagged regression tree, bagged multivariate adaptive regression spline, and k-nearest neighbors, were used to predict ROM change. Model performance was assessed by calculating the root mean square error (RMSE) between predicted and actual changes in ROM in the validation cohort. Variable importance was assessed using RMSE loss. Area under the curve analyses were performed to identify the ideal implant size cutoffs in predicting improved ROM.

RESULTS: Forty-seven patients were included. The average RMSE between predicted and actual ROM was 7.6° (range: 5.8-10.1) in the k-nearest neighbors model, 7.8° (range: 6.5-10.0) in the bagged regression tree model, 7.8° (range: 6.2-10.0) in the bagged multivariate adaptive regression spline model, and 15.8° (range: 14.3-17.5°) in a linear regression model. In the highest-performing IML model, graft size was the most important predictor with RMSE loss of 6.2, followed by age (RMSE loss = 5.9) and preoperative caudal disk height (RMSE loss = 5.8). Implant size at 110% of the normal adjacent disk height was the optimal cutoff associated with improved ROM.

CONCLUSION: IML models can reliably predict change in ROM after CDA within an average of 7.6 degrees of error. Implants sized comparably with the healthy adjacent disk may maximize ROM.
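
The abstract evaluates a k-nearest neighbors model by the RMSE between predicted and actual ROM change. A minimal sketch of both pieces might look like the following; it is not the study's pipeline, and the function names, the two-feature layout (implant-to-disk height ratio, age in decades), and every number in the toy data are assumptions invented for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical training data: (height ratio, age in decades) -> ROM change in degrees.
train_x = [(1.00, 4.0), (1.10, 4.5), (1.30, 6.0), (1.05, 5.0), (1.40, 6.5)]
train_y = [6.0, 8.0, -2.0, 5.0, -4.0]

# Hypothetical validation cases.
valid_x = [(1.08, 4.2), (1.35, 6.2)]
valid_y = [7.0, -3.0]

predictions = [knn_predict(train_x, train_y, x, k=2) for x in valid_x]
print(rmse(valid_y, predictions))
```

In practice the features would be scaled before computing distances, since unscaled age would otherwise dominate the height ratio.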

05/08/2024

Eloquent noneloquence: redefinition of cortical eloquence based on outcomes of superficial cerebral cavernous malformation resection

J Neurosurg 141:291–305, 2024

Cerebral cavernous malformations (CMs) are pathological lesions that cause discrete cortical disruption with hemorrhage, and their transcortical resections can cause additional iatrogenic disruption. The analysis of microsurgically treated CMs might identify areas of “eloquent noneloquence,” or cortex that is associated with unexpected deficits when injured or transgressed.

METHODS Patients from a consecutive microsurgical series of superficial cerebral CMs who presented to the authors’ center over a 13-year period were retrospectively analyzed. Neurological outcomes were measured using the modified Rankin Scale (mRS), and new, permanent neurological or cognitive symptoms not detected by changes in mRS scores were measured as additional functional decline. Patients with multiple lesions and surgical encounters for different lesions within the study interval were represented within the cohort as multiple patient entries. Virtual object models for CMs and approach trajectories to subcortical lesions were merged into a template brain model for subtyping and Quicktome connectomic analyses. Parcellation outputs from the models were analyzed for regional cerebral clustering.

RESULTS Overall, 362 CMs were resected in 346 patients, and convexity subtypes were the most common (132/362, 36.5%). Relative to the preoperative mRS score, 327 of 362 cases (90.3%) were in patients who improved or remained stable, 35 (9.7%) were in patients whose conditions worsened, and 47 (13.0%) were in patients who had additional functional decline. Machine learning analyses of lesion objects and trajectory cylinder mapping identified 7 hotspots of novel eloquence: supplementary motor area (bilateral), anterior cingulate cortex (bilateral), posterior cingulate cortex (bilateral), anterior insula (left), frontal pole (right), mesial temporal lobe (left), and occipital cortex (right).

CONCLUSIONS Transgyral and transsulcal resections that circumvent areas of traditional eloquence and navigate areas of presumed noneloquence may nonetheless result in unfavorable outcomes, demonstrating that brain long considered by neurosurgeons to be noneloquent may be eloquent. Eloquent hotspots within multiple large-scale networks redefine the neurosurgical concept of eloquence and call for more refined dissection techniques that maximize transsulcal dissection, intracapsular resection, and tissue preservation. Human connectomics, awareness of brain networks, and prioritization of cognitive outcomes require that we update our concept of cortical eloquence and incorporate this information into our surgical strategies.

19/12/202318/12/2023

Use of cortical volume to predict response to temporary CSF drainage in patients with idiopathic normal pressure hydrocephalus

J Neurosurg 139:1776–1783, 2023

Temporary drainage of CSF with lumbar puncture or lumbar drainage has a high predictive value for identifying patients with suspected idiopathic normal pressure hydrocephalus (iNPH) who may benefit from ventriculoperitoneal shunt insertion. However, it is unclear what differentiates responders from nonresponders. The authors hypothesized that nonresponders to temporary CSF drainage would have patterns of reduced regional gray matter volume (GMV) as compared with those of responders. The objective of the current investigation was to compare regional GMV between temporary CSF drainage responders and nonresponders. Machine learning using extracted GMV was then used to predict outcomes.

METHODS This retrospective cohort study included 132 patients with iNPH who underwent temporary CSF drainage and structural MRI. Demographic and clinical variables were examined between groups. Voxel-based morphometry was used to calculate GMV across the brain. Group differences in regional GMV were assessed and correlated with change in results on the Montreal Cognitive Assessment (MoCA) and gait velocity. A support vector machine (SVM) model that used extracted GMV values and was validated with leave-one-out cross-validation was used to predict clinical outcome.

RESULTS There were 87 responders and 45 nonresponders. There were no group differences in terms of age, sex, baseline MoCA score, Evans index, presence of disproportionately enlarged subarachnoid space hydrocephalus, baseline total CSF volume, or baseline white matter T2-weighted hyperintensity volume (p > 0.05). Nonresponders demonstrated decreased GMV in the right supplementary motor area (SMA) and right posterior parietal cortex as compared with responders (p < 0.001, p < 0.05 with false discovery rate cluster correction). GMV in the posterior parietal cortex was associated with change in MoCA (r² = 0.075, p < 0.05) and gait velocity (r² = 0.076, p < 0.05). Response status was classified by the SVM with 75.8% accuracy.

CONCLUSIONS Decreased GMV in the SMA and posterior parietal cortex may help identify patients with iNPH who are unlikely to benefit from temporary CSF drainage. These patients may have limited capacity for recovery due to atrophy in these regions that are known to be important for motor and cognitive integration. This study represents an important step toward improving patient selection and predicting clinical outcomes in the treatment of iNPH.
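
Leave-one-out cross-validation, used above to validate the SVM, holds out each patient in turn, trains on the rest, and scores the held-out prediction. The sketch below shows only that loop. To stay dependency-free it swaps in a nearest-centroid classifier instead of an SVM, and the function names and toy GMV values are assumptions for illustration.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class mean is closest to query (stand-in for an SVM)."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - q) ** 2
                   for s, q in zip(sums[label], query))

    return min(sums, key=squared_distance)


def leave_one_out_accuracy(xs, ys, predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical regional GMV features (arbitrary units) for six patients.
xs = [(1.0,), (1.2,), (0.9,), (3.0,), (3.1,), (2.8,)]
ys = ["nonresponder"] * 3 + ["responder"] * 3
loo_accuracy = leave_one_out_accuracy(xs, ys, nearest_centroid_predict)
print(loo_accuracy)
```

Any feature selection or scaling has to be repeated inside each fold, otherwise the held-out patient leaks into training.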

28/11/202227/11/2022

Network-level prediction of set-shifting deterioration after lower-grade glioma resection

J Neurosurg 137:1329–1337, 2022

The aim of this study was to predict set-shifting deterioration after resection of low-grade glioma.

METHODS The authors retrospectively analyzed a bicentric series of 102 patients who underwent surgery for low-grade glioma. The difference between the completion times of the Trail Making Test parts B and A (TMT B-A) was evaluated preoperatively and 3–4 months after surgery. High dimensionality of the information related to the surgical cavity topography was reduced to a small set of predictors in four different ways: 1) overlap between surgical cavity and each of the 122 cortical parcels composing Yeo’s 17-network parcellation of the brain; 2) Tractotron: disconnection by the cavity of the major white matter bundles; 3) overlap between the surgical cavity and each of Yeo’s networks; and 4) disconets: signature of structural disconnection by the cavity of each of Yeo’s networks. A random forest algorithm was implemented to predict the postoperative change in the TMT B-A z-score.

RESULTS The last two network-based approaches yielded significant accuracies in left-out subjects (area under the receiver operating characteristic curve [AUC] approximately equal to 0.8, p approximately equal to 0.001) and outperformed the two alternatives. In single tree hierarchical models, the degree of damage to Yeo corticocortical network 12 (CC 12) was a critical node: patients with damage to CC 12 higher than 7.5% (cortical overlap) or 7.2% (disconets) had much higher risk to deteriorate, establishing for the first time a causal link between damage to this network and impaired set-shifting.

CONCLUSIONS The authors’ results give strong support to the idea that network-level approaches are a powerful way to address the lesion-symptom mapping problem, enabling machine learning–powered individual outcome predictions.
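
The first predictor family in the methods is the overlap between the surgical cavity and each parcel or network. The abstract does not define how overlap is computed, so the sketch below assumes one plausible definition: the percentage of a parcel's voxels that fall inside the cavity. The function name, the voxel coordinates, and the use of the reported 7.5% value as a simple flag are illustrative assumptions, not the authors' method.

```python
def parcel_overlap_percent(cavity_voxels, parcel_voxels):
    """Percentage of the parcel's voxels that lie inside the cavity (assumed definition)."""
    parcel = set(parcel_voxels)
    if not parcel:
        raise ValueError("parcel has no voxels")
    return 100.0 * len(parcel & set(cavity_voxels)) / len(parcel)


# Hypothetical voxel coordinates (x, y, z) on a shared template grid.
parcel = [(x, y, 0) for x in range(8) for y in range(5)]   # 40 voxels
cavity = [(x, y, 0) for x in range(6, 12) for y in range(2)]  # 12 voxels, 4 shared

overlap = parcel_overlap_percent(cavity, parcel)
exceeds_reported_cutoff = overlap > 7.5  # 7.5% is the cortical-overlap value in the abstract
print(overlap, exceeds_reported_cutoff)
```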

29/09/202228/09/2022

Survival Prediction After Neurosurgical Resection of Brain Metastases: A Machine Learning Approach

Neurosurgery 91:381–388, 2022

Current prognostic models for brain metastases (BMs) have been constructed and validated almost entirely with data from patients receiving up-front radiotherapy, leaving uncertainty about surgical patients.

OBJECTIVE: To build and validate a model predicting 6-month survival after BM resection using different machine learning algorithms.

METHODS: An institutional database of 1062 patients who underwent resection for BM was split into an 80:20 training and testing set. Seven different machine learning algorithms were trained and assessed for performance; an established prognostic model for patients with BM undergoing radiotherapy, the diagnosis-specific graded prognostic assessment, was also evaluated. Model performance was assessed using area under the curve (AUC) and calibration.

RESULTS: The logistic regression showed the best performance with an AUC of 0.71 in the hold-out test set, a calibration slope of 0.76, and a calibration intercept of 0.03. The diagnosis-specific graded prognostic assessment had an AUC of 0.66. Patients were stratified into regular-risk, high-risk and very high-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (P < .0005). The model was implemented into a web application that can be accessed through http://brainmets.morethanml.com.

CONCLUSION: We developed and internally validated a prediction model that accurately predicts 6-month survival after neurosurgical resection for BM and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
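
For readers unfamiliar with the AUC values quoted above, one way to read them is as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes the AUC directly from that definition; the function name and the toy labels and scores are assumptions for illustration, not data from the study.

```python
def auc_pairwise(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = event of interest, scores = model outputs.
y_true = [0, 0, 1, 1]
y_score = [0.10, 0.40, 0.35, 0.80]
auc_value = auc_pairwise(y_true, y_score)
print(auc_value)
```

The pairwise loop is quadratic in cohort size, which is fine for a sketch; rank-based formulas give the same value more efficiently.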

16/08/202215/08/2022

Artificial intelligence in predicting early‑onset adjacent segment degeneration following anterior cervical discectomy and fusion

European Spine Journal (2022) 31:2104–2114

Anterior cervical discectomy and fusion (ACDF) is a common surgical treatment for degenerative disease in the cervical spine. However, resultant biomechanical alterations may predispose to early-onset adjacent segment degeneration (EO-ASD), which may become symptomatic and require reoperation. This study aimed to develop and validate a machine learning (ML) model to predict EO-ASD following ACDF.

Methods Retrospective review of prospectively collected data of patients undergoing ACDF at a quaternary referral medical center was performed. Patients > 18 years of age with > 6 months of follow-up and complete pre- and postoperative X-ray and MRI imaging were included. An ML-based algorithm was developed to predict EO-ASD based on preoperative demographic, clinical, and radiographic parameters, and model performance was evaluated according to discrimination and overall performance.

Results In total, 366 ACDF patients were included (50.8% male, mean age 51.4 ± 11.1 years). Over 18.7 ± 20.9 months of follow-up, 97 (26.5%) patients developed EO-ASD. The model demonstrated good discrimination and overall performance according to precision (EO-ASD: 0.70, non-ASD: 0.88), recall (EO-ASD: 0.73, non-ASD: 0.87), accuracy (0.82), F1-score (0.79), Brier score (0.203), and AUC (0.794), with C4/C5 posterior disc bulge, C4/C5 anterior disc bulge, C6 posterior superior osteophyte, presence of osteophytes, and C6/C7 anterior disc bulge identified as the most important predictive features.

Conclusions Through an ML approach, the model identified risk factors and predicted development of EO-ASD following ACDF with good discrimination and overall performance. By addressing the shortcomings of traditional statistics, ML techniques can support discovery, clinical decision-making, and precision-based spine care.
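
The results above report precision, recall, accuracy, F1-score, and Brier score. As a reminder of how these relate, the sketch below derives all five from the same set of predicted probabilities. The function name `binary_metrics`, the 0.5 threshold, and the toy data are assumptions for illustration and do not reproduce the study's numbers.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute precision, recall, F1, accuracy, and Brier score for a binary outcome."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Brier score: mean squared gap between predicted probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / len(y_true),
        "brier": brier,
    }


# Hypothetical example: 1 = developed EO-ASD, probabilities from some model.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.2]
metrics = binary_metrics(y_true, y_prob)
print(metrics)
```

Precision, recall, F1, and accuracy depend on the chosen threshold, whereas the Brier score is computed on the probabilities themselves, which is why it is read as a measure of overall performance.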

01/04/202231/03/2022

Prediction of Shunt Responsiveness in Suspected Patients With Normal Pressure Hydrocephalus Using the Lumbar Infusion Test: A Machine Learning Approach

Neurosurgery 90:407–418, 2022

Machine learning (ML) approaches can significantly improve the classical Rout-based evaluation of the lumbar infusion test (LIT) and the clinical management of normal pressure hydrocephalus.

OBJECTIVE: To develop an ML model that accurately identifies patients as candidates for permanent cerebral spinal fluid shunt implantation using only intracranial pressure and electrocardiogram signals recorded throughout LIT.

METHODS: This was a single-center cohort study of prospectively collected data of 96 patients who underwent LIT and 5-day external lumbar cerebral spinal fluid drainage (external lumbar drainage) as a reference diagnostic method. A set of selected 48 intracranial pressure/electrocardiogram complex signal waveform features describing nonlinear behavior, wavelet transform spectral signatures, or recurrent map patterns were calculated for each patient. After applying a leave-one-out cross-validation training–testing split of the data set, we trained and evaluated the performance of various state-of-the-art ML algorithms.

RESULTS: The highest performing ML algorithm was the eXtreme Gradient Boosting. This model showed a good calibration and discrimination on the testing data, with an area under the receiver operating characteristic curve of 0.891 (accuracy: 82.3%, sensitivity: 86.1%, and specificity: 73.9%) obtained for 8 selected features. Our ML model clearly outperforms the classical Rout-based manual classification commonly used in clinical practice, which had an accuracy of 62.5%.

CONCLUSION: This study successfully used the ML approach to predict the outcome of a 5-day external lumbar drainage and hence which patients are likely to benefit from permanent shunt implantation. Our automated ML model thus enhances the diagnostic utility of LIT in management.

31/01/2022

Deep Learning for Outcome Prediction in Neurosurgery

Neurosurgery 90:16–38, 2022

Deep learning (DL) is a powerful machine learning technique that has increasingly been used to predict surgical outcomes. However, the large quantity of data required and lack of model interpretability represent substantial barriers to the validity and reproducibility of DL models.

The objective of this study was to systematically review the characteristics of DL studies involving neurosurgical outcome prediction and to assess their bias and reporting quality.

A literature search of the PubMed, Scopus, and Embase databases identified 1949 records, of which 35 studies were included. Of these, 32 (91%) developed and validated a DL model while 3 (9%) validated a pre-existing model. The most commonly represented subspecialty areas were oncology (16 of 35, 46%), spine (8 of 35, 23%), and vascular (6 of 35, 17%). Risk of bias was low in 18 studies (51%), unclear in 5 (14%), and high in 12 (34%), most commonly because of data quality deficiencies.

Adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting standards was low, with a median of 12 TRIPOD items (39%) per study not reported. Model transparency was severely limited because code was provided in only 3 studies (9%) and final models in 2 (6%).

With the exception of public databases, no study data sets were readily available. No studies described DL models as ready for clinical use. The use of DL for neurosurgical outcome prediction remains nascent. Lack of appropriate data sets poses a major concern for bias. Although studies have demonstrated promising results, greater transparency in model development and reporting is needed to facilitate reproducibility and validation.

29/07/2021

Predicting Spinal Surgery Candidacy From Imaging Data Using Machine Learning

Neurosurgery 89:116–121, 2021

The referral process for consultation with a spine surgeon remains inefficient, given that a substantial proportion of referrals to spine surgeons are nonoperative.

OBJECTIVE: To develop a machine-learning-based algorithm which accurately identifies patients as candidates for consultation with a spine surgeon, using only magnetic resonance imaging (MRI).

METHODS: We trained a deep U-Net machine learning model to delineate spinal canals on axial slices of 100 normal lumbar MRI scans which were previously delineated by expert radiologists and neurosurgeons. We then tested the model against lumbar MRI scans for 140 patients who had undergone lumbar spine MRI at our institution (60 of whom ultimately underwent surgery, and 80 of whom did not). The model generated automated segmentations of the lumbar spinal canals and calculated a maximum degree of spinal stenosis for each patient, which served as our biomarker for surgical pathology warranting expert consultation.

RESULTS: The machine learning model correctly predicted surgical candidacy (ie, whether patients ultimately underwent lumbar spinal decompression) with high accuracy (area under the curve = 0.88), using only imaging data from lumbar MRI scans.
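As a rough illustration of what an area under the curve measures for a single continuous biomarker, the sketch below computes it as the fraction of surgical/nonsurgical patient pairs in which the surgical patient has the higher score, with ties counted as half. The function name, stenosis scores, and labels are invented for illustration and are not the study's data or code.

```python
from typing import List


def auc_from_scores(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical maximum-stenosis scores; 1 = underwent surgery, 0 = did not.
stenosis = [0.80, 0.60, 0.55, 0.30, 0.50, 0.20]
surgery = [1, 1, 0, 0, 1, 0]
auc = auc_from_scores(stenosis, surgery)  # 8 of 9 pairs ranked correctly
```

This pairwise form is equivalent to integrating the receiver operating characteristic curve, and it makes clear that the value describes ranking quality rather than the share of patients classified correctly at one threshold.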

CONCLUSION: Automated interpretation of lumbar MRI scans was sufficient to correctly determine surgical candidacy in nearly 90% of cases. Given that a significant proportion of referrals placed for spine surgery evaluation fail to meet criteria for surgical intervention, our model could serve as a valuable tool for patient triage and thereby address some of the inefficiencies within the outpatient surgical referral process.

27/07/2021

Machine Learning for the Prediction of Molecular Markers in Glioma on Magnetic Resonance Imaging: A Systematic Review and Meta-Analysis

Neurosurgery 89:31–44, 2021

Molecular characterization of glioma has implications for prognosis, treatment planning, and prediction of treatment response. Current histopathology is limited by intratumoral heterogeneity and variability in detection methods. Advances in computational techniques have led to interest in mining quantitative imaging features to noninvasively detect genetic mutations.

OBJECTIVE: To evaluate the diagnostic accuracy of machine learning (ML) models in molecular subtyping gliomas on preoperative magnetic resonance imaging (MRI).

METHODS: A systematic search was performed following PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines to identify studies up to April 1, 2020. Methodological quality of studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2. Diagnostic performance estimates were obtained using a bivariate model and heterogeneity was explored using metaregression.

RESULTS: Forty-four original articles were included. The pooled sensitivity and specificity for predicting isocitrate dehydrogenase (IDH) mutation in training datasets were 0.88 (95% CI 0.83-0.91) and 0.86 (95% CI 0.79-0.91), respectively, and 0.83 to 0.85 in validation sets. Use of data augmentation and MRI sequence type were weakly associated with heterogeneity. Both O6-methylguanine-DNA methyltransferase (MGMT) gene promoter methylation and 1p/19q codeletion could be predicted with a pooled sensitivity and specificity between 0.76 and 0.83 in training datasets.

CONCLUSION: ML application to preoperative MRI demonstrated promising results for predicting IDH mutation, MGMT methylation, and 1p/19q codeletion in glioma. Optimized ML models could lead to a noninvasive, objective tool that captures molecular information important for clinical decision-making. Future studies should use multicenter data and external validation and should investigate the clinical feasibility of ML models.

12/11/202011/11/2020

Correlations between genomic subgroup and clinical features in a cohort of more than 3000 meningiomas

J Neurosurg 133:1345–1354, 2020

Recent large-cohort sequencing studies have investigated the genomic landscape of meningiomas, identifying somatic coding alterations in NF2, SMARCB1, SMARCE1, TRAF7, KLF4, POLR2A, BAP1, and members of the PI3K and Hedgehog signaling pathways. Initial associations between clinical features and genomic subgroups have been described, including location, grade, and histology. However, further investigation using an expanded collection of samples is needed to confirm previous findings, as well as elucidate relationships not evident in smaller discovery cohorts.

METHODS Targeted sequencing of established meningioma driver genes was performed on a multi-institution cohort of 3016 meningiomas for classification into mutually exclusive subgroups. Relevant clinical information was collected for all available cases and correlated with genomic subgroup. Nominal variables were analyzed using Fisher’s exact tests, while ordinal and continuous variables were assessed using Kruskal-Wallis and 1-way ANOVA tests, respectively. Machine-learning approaches were used to predict genomic subgroup based on noninvasive clinical features.
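As an illustration of the kind of test applied to nominal variables here, the following is a minimal sketch of a two-sided Fisher's exact test for a 2×2 table (for example, sex against membership in one genomic subgroup). The function name and the counts are hypothetical and the abstract does not describe the authors' software; the two-sided rule used, summing all tables no more probable than the observed one, is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    tol = 1 + 1e-9             # guard against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * tol)


# Invented counts: rows = male/female, columns = in subgroup / not in subgroup.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With these invented counts the p-value is 34/70, or about 0.49, so a table this small would show no significant association.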

RESULTS Genomic subgroups were strongly associated with tumor locations, including correlation of HH tumors with midline location, and non-NF2 tumors in anterior skull base regions. NF2 meningiomas were significantly enriched in male patients, while KLF4 and POLR2A mutations were associated with female sex. Among histologies, the results confirmed previously identified relationships and showed enrichment of microcystic features among “mutation unknown” samples. Additionally, KLF4-mutant meningiomas were associated with larger peritumoral brain edema, while SMARCB1 cases exhibited elevated Ki-67 index. Machine-learning methods revealed that observable, noninvasive patient features were largely predictive of each tumor’s underlying driver mutation.

CONCLUSIONS Using a rigorous and comprehensive approach, this study expands previously described correlations between genomic drivers and clinical features, enhancing our understanding of meningioma pathogenesis, and laying further groundwork for the use of targeted therapies. Importantly, the authors found that noninvasive patient variables exhibited a moderate predictive value of underlying genomic subgroup, which could improve with additional training data. With continued development, this framework may enable selection of appropriate precision medications without the need for invasive sampling procedures.

19/02/202018/02/2020

An Online Calculator for the Prediction of Survival in Glioblastoma Patients Using Classical Statistics and Machine Learning

Neurosurgery, Volume 86, Issue 2, February 2020, Pages E184–E192

Although survival statistics in patients with glioblastoma multiforme (GBM) are well-defined at the group level, predicting individual patient survival remains challenging because of significant variation within strata.

OBJECTIVE: To compare statistical and machine learning algorithms in their ability to predict survival in GBM patients and deploy the best performing model as an online survival calculator.

METHODS: Patients undergoing an operation for a histopathologically confirmed GBM were extracted from the Surveillance, Epidemiology, and End Results (SEER) database (2005-2015) and split into a training and hold-out test set in an 80/20 ratio. Fifteen statistical and machine learning algorithms were trained based on 13 demographic, socioeconomic, clinical, and radiographic features to predict overall survival and 1-yr survival status and to compute personalized survival curves.

RESULTS: In total, 20 821 patients met our inclusion criteria. The accelerated failure time model demonstrated superior performance in terms of discrimination (concordance index = 0.70), calibration, interpretability, predictive applicability, and computational efficiency compared to Cox proportional hazards regression and other machine learning algorithms. This model was deployed through a free, publicly available software interface (https://cnoc-bwh.shinyapps.io/gbmsurvivalpredictor/).
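To give a sense of what a concordance index of 0.70 means, here is a minimal sketch of Harrell's concordance index for right-censored survival data. The function name, follow-up times, event flags, and risk scores are invented for illustration; the authors' actual implementation is not described in the abstract.

```python
from typing import List


def concordance_index(times: List[float], events: List[int],
                      risk: List[float]) -> float:
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter follow-up time than patient j. It is concordant when the model
    gave i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Invented data: months of follow-up, 1 = death observed, 0 = censored.
times = [5.0, 10.0, 12.0, 20.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]   # higher value = shorter predicted survival
c_index = concordance_index(times, events, risk)  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so 0.70 means that in roughly 7 of 10 comparable patient pairs the patient who died earlier was assigned the higher risk.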

CONCLUSION: The development and deployment of survival prediction tools require a multimodal assessment rather than a single metric comparison. This study provides a framework for the development of prediction tools in cancer patients, as well as an online survival calculator for patients with GBM. Future efforts should improve the interpretability, predictive applicability, and computational efficiency of existing machine learning algorithms, increase the granularity of population-based registries, and externally validate the proposed prediction tool.
