Deep Learning for Outcome Prediction in Neurosurgery

Neurosurgery 90:16–38, 2022

Deep learning (DL) is a powerful machine learning technique that has increasingly been used to predict surgical outcomes. However, the large quantity of data required and lack of model interpretability represent substantial barriers to the validity and reproducibility of DL models.

The objective of this study was to systematically review the characteristics of DL studies involving neurosurgical outcome prediction and to assess their bias and reporting quality.

A literature search of the PubMed, Scopus, and Embase databases identified 1949 records, of which 35 studies were included. Of these, 32 (91%) developed and validated a DL model, while 3 (9%) validated a pre-existing model. The most commonly represented subspecialty areas were oncology (16 of 35, 46%), spine (8 of 35, 23%), and vascular (6 of 35, 17%). Risk of bias was low in 18 studies (51%), unclear in 5 (14%), and high in 12 (34%), most commonly because of data quality deficiencies.

Adherence to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting standards was low, with a median of 12 TRIPOD items (39%) per study not reported. Model transparency was severely limited because code was provided in only 3 studies (9%) and final models in 2 (6%).

With the exception of public databases, no study data sets were readily available. No studies described DL models as ready for clinical use. The use of DL for neurosurgical outcome prediction remains nascent. Lack of appropriate data sets poses a major concern for bias. Although studies have demonstrated promising results, greater transparency in model development and reporting is needed to facilitate reproducibility and validation.