Which metric is most appropriate for evaluating a model on imbalanced classification data?

Question

Accepted Answer

The correct answer is Option B: F1-score or AUC-ROC. On imbalanced datasets, accuracy is misleading because a model that always predicts the majority class still achieves high accuracy. F1-score is the harmonic mean of precision and recall for the minority class, so it is high only when the model both finds the rare samples and avoids false alarms. AUC-ROC measures the model's ability to discriminate between the classes across all decision thresholds. Both reflect model quality on imbalanced data far better than accuracy does.

Option A (precision only) ignores false negatives, so it misses the recall dimension that is critical for minority-class detection. Option C (recall only) ignores false positives and can be gamed by predicting the positive class for every sample. Option D (accuracy) is the most misleading metric on imbalanced data, because a model that never predicts the rare class can still score very high.
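The argument above can be checked with a small sketch. The data here is a made-up toy set (95 negatives, 5 positives), and the helper names `accuracy`, `precision_recall_f1`, and `auc_roc` are illustrative, written in plain Python rather than taken from any particular library. It compares a model that always predicts the majority class with one that always predicts the positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Convention assumed here: an undefined ratio (0/0) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc_roc(y_true, scores, positive=1):
    """AUC-ROC as the probability that a random positive outscores a
    random negative, counting ties as one half (pairwise definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 majority-class (0) and 5 minority-class (1) samples.
y_true = [0] * 95 + [1] * 5

# Model 1: always predicts the majority class.
always_negative = [0] * 100
acc_neg = accuracy(y_true, always_negative)
prec_neg, rec_neg, f1_neg = precision_recall_f1(y_true, always_negative)
auc_neg = auc_roc(y_true, [0.0] * 100)  # constant score for every sample

# Model 2: always predicts the positive class.
always_positive = [1] * 100
acc_pos = accuracy(y_true, always_positive)
prec_pos, rec_pos, f1_pos = precision_recall_f1(y_true, always_positive)

print(f"always negative: acc={acc_neg:.2f} f1={f1_neg:.2f} auc={auc_neg:.2f}")
print(f"always positive: acc={acc_pos:.2f} recall={rec_pos:.2f} f1={f1_pos:.3f}")
```

On this toy set the always-negative model reaches 0.95 accuracy while its F1 is 0 and its AUC-ROC is 0.5 (no better than chance). The always-positive model reaches a recall of 1.0, yet its F1 is only about 0.095 because its precision is 0.05. This matches the reasoning for rejecting accuracy and recall as standalone metrics.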