Article ; Online: Use of Response Permutation to Measure an Imaging Dataset's Susceptibility to Overfitting by Selected Standard Analysis Pipelines.
2024
Abstract | Rationale and objectives: This study demonstrates a method for quantifying the impact of overfitting on the area under the receiver operating characteristic curve (AUC) when using standard analysis pipelines to develop imaging biomarkers. We illustrate the approach using two publicly available repositories of radiology and pathology images for breast cancer diagnosis. Materials and methods: For each dataset, we permuted the outcome (cancer diagnosis) values to eliminate any true association between imaging features and outcome. Seven types of classification models (logistic regression, linear discriminant analysis, Naïve Bayes, linear support vector machine, nonlinear support vector machine, random forest, and multi-layer perceptron) were fitted to each scrambled dataset and evaluated by each of four techniques (all data, hold-out, 10-fold cross-validation, and bootstrapping). After repeating this process for a total of 50 outcome permutations, we averaged the resulting AUCs. Any increase over a null AUC of 0.5 can be attributed to overfitting. Results: Applying this approach and varying sample size and the number of imaging features, we found that failing to control for overfitting could result in near-perfect prediction (AUC near 1.0). Cross-validation offered greater protection against overfitting than the other evaluation techniques, and for most classification algorithms a sample size of at least 200 was required to assess as few as 10 features with less than 0.05 AUC inflation attributable to overfitting. Conclusion: This approach could be applied to any curated dataset to suggest how many features, and which analysis approaches, can be used while limiting overfitting. |
---|---|
Language | English |
Publishing date | 2024-04-12 |
Publishing country | United States |
Document type | Journal Article |
ZDB-ID | 1355509-1 |
ISSN | 1878-4046 ; 1076-6332 |
ISSN (online) | 1878-4046 |
ISSN (print) | 1076-6332 |
DOI | 10.1016/j.acra.2024.02.028 |
Database | MEDLINE (Medical Literature Analysis and Retrieval System Online) |
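
The response-permutation idea described in the abstract can be illustrated with a minimal sketch. It assumes simulated pure-noise features in place of a real imaging dataset and uses a simple difference-of-class-means linear scorer as a hypothetical stand-in for the seven classifiers the study actually fitted. Only two of the four evaluation techniques are shown (all data, i.e. training and scoring on the same samples, and 10-fold cross-validation); hold-out and bootstrapping are omitted for brevity. All function names and parameters (`permutation_null`, `n_samples`, `n_features`, `n_perm`) are illustrative, not taken from the paper. Because the outcome is permuted, the true AUC is 0.5, so any excess over 0.5 reflects overfitting.

```python
import random


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: P(positive score > negative score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X, y):
    """Hypothetical stand-in classifier: weights = mean(class 1) - mean(class 0)."""
    d = len(X[0])
    mean = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        mean[c] = [sum(r[k] for r in rows) / len(rows) for k in range(d)]
    return [m1 - m0 for m1, m0 in zip(mean[1], mean[0])]


def score(w, X):
    """Linear score w . x for each sample (an offset would not change the AUC)."""
    return [sum(wk * xk for wk, xk in zip(w, x)) for x in X]


def resubstitution_auc(X, y):
    # "All data": the model is trained and evaluated on the same samples.
    return auc(score(fit_centroid_scorer(X, y), X), y)


def cross_validated_auc(X, y, k=10):
    # k-fold CV: every sample is scored by a model fitted without it;
    # out-of-fold scores are pooled before computing a single AUC.
    n = len(y)
    oof = [0.0] * n
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        w = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train])
        for i, s in zip(test, score(w, [X[i] for i in test])):
            oof[i] = s
    return auc(oof, y)


def permutation_null(n_samples=40, n_features=10, n_perm=50, seed=0):
    """Average AUCs over n_perm outcome permutations; returns (all-data AUC, CV AUC)."""
    rng = random.Random(seed)
    # Simulated noise features standing in for a curated imaging dataset (assumption).
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced binary outcome
    resub, cv = [], []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # destroys any real feature-outcome association
        resub.append(resubstitution_auc(X, y_perm))
        cv.append(cross_validated_auc(X, y_perm))
    return sum(resub) / n_perm, sum(cv) / n_perm


if __name__ == "__main__":
    resub_auc, cv_auc = permutation_null()
    print(f"all-data AUC inflation:   {resub_auc - 0.5:.3f}")
    print(f"10-fold CV AUC inflation: {cv_auc - 0.5:.3f}")
```

With these defaults the all-data estimate should sit well above 0.5 even though no real signal exists, while the cross-validated estimate should stay near (or slightly below) 0.5. Varying `n_samples` and `n_features` mimics the sample-size and feature-count sweep described in the abstract.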
Held by ZB MED Cologne/Königswinter
Zs.A 4594 | Location: depending on availability (see holdings statement): volumes up to 1994, order articles via the online order form; volumes 1995 to 2021, reading room (2nd floor); volumes from 2022, reading room (ground floor) |