Which metric is most appropriate for evaluating a multi-class classification model?

Question

Accepted Answer

The correct answer is Option A: Accuracy, Precision, Recall, and F1-score, used together.

Option A is correct because a multi-class model has to be judged across all classes, and no single one of these metrics does that. Accuracy gives the overall fraction of correct predictions, precision and recall expose false positives and false negatives for each class, and F1-score is the harmonic mean of precision and recall. Precision, recall, and F1 can each be reported per class or combined as a macro or weighted average.

Option B is insufficient because accuracy alone can be misleading when classes are imbalanced: a model that always predicts the majority class scores high even though it fails on every minority class.

Option C is incorrect because precision alone ignores false negatives, so it cannot show how many true instances of a class the model misses.

Option D is incorrect because recall alone ignores false positives, so it cannot show how often the model assigns a class label incorrectly.
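
To make the imbalance argument concrete, here is a minimal sketch in plain Python that computes accuracy, per-class precision/recall/F1, and macro and weighted F1 from one-vs-rest counts. The toy labels, the helper names (`per_class_metrics`, `macro_f1`, `weighted_f1`), and the convention of scoring an undefined precision or recall as 0.0 are all assumptions made for illustration, not part of the original question.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    metrics = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Assumption: an undefined ratio (zero denominator) is scored as 0.0.
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    m = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in m.values()) / len(m)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's true count."""
    m = per_class_metrics(y_true, y_pred)
    support = Counter(y_true)
    return sum(m[label][2] * support[label] for label in m) / len(y_true)


# Hypothetical imbalanced data: 8 "a", 1 "b", 1 "c".
# The "model" always predicts the majority class "a".
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10

acc = accuracy(y_true, y_pred)
macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
per_class = per_class_metrics(y_true, y_pred)

print(f"accuracy    = {acc:.3f}")
print(f"macro F1    = {macro:.3f}")
print(f"weighted F1 = {weighted:.3f}")
for label, (p, r, f1) in per_class.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

On this toy data accuracy is 0.8 while macro F1 is roughly 0.30, because classes "b" and "c" are never predicted and each contributes an F1 of 0. Weighted F1 sits in between, since it follows the class frequencies, which is why it helps to report the per-class values alongside any average.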