Define accuracy, precision, recall, and F1-score as metrics for evaluating classification models and explain their significance. Discuss the strengths and limitations of each metric.
Definition: Accuracy measures the proportion of correctly classified instances out of the total number of instances in the dataset.
Formula:
$$ \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} $$
Limitations: Accuracy can be misleading on imbalanced datasets where one class dominates: a model that always predicts the majority class can score highly while learning nothing about the minority class, so accuracy alone may not reflect the model's true performance.
Example:
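A minimal sketch in plain Python (the labels are invented for illustration) showing how accuracy can look strong on an imbalanced dataset even when the model is useless:

```python
# Hypothetical imbalanced labels: 95 negatives, 5 positives.
# A degenerate model that always predicts "negative" never detects
# a single positive, yet still scores 95% accuracy.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # "always negative" classifier

def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy(y_true, y_pred))  # 0.95
```

This is exactly the imbalance pitfall described above: the 95% score says nothing about the minority class.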
Definition: Precision measures the proportion of true positive predictions out of all positive predictions made by the model.
Formula:
$$ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} $$
Limitations: Precision does not consider false negatives, which can be problematic when false negatives are costly. Optimizing for precision alone may minimize false positives at the expense of allowing more false negatives.
Example:
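A short sketch with made-up labels (a hypothetical spam filter) showing that precision ignores missed positives:

```python
def precision(y_true, y_pred):
    """TP / (TP + FP): fraction of positive predictions that are correct."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp)

# Hypothetical spam filter: 8 true positives, 2 false positives,
# and 5 spam messages missed entirely (false negatives).
y_true = [1] * 8 + [0] * 2 + [1] * 5
y_pred = [1] * 10 + [0] * 5
print(precision(y_true, y_pred))  # 0.8
```

The five missed spam messages do not lower the score at all, which is precisely the limitation noted above.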
Definition: Recall measures the proportion of true positive predictions out of all actual positive instances in the dataset.
Formula:
$$ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} $$
Limitations: Recall does not consider false positives, which can be problematic when false positives are costly. Optimizing for recall alone may minimize false negatives at the expense of allowing more false positives.
Example:
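A mirror-image sketch (a hypothetical disease screen, labels invented) showing that recall ignores false alarms:

```python
def recall(y_true, y_pred):
    """TP / (TP + FN): fraction of actual positives that were found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

# Hypothetical disease screen: 4 actual positives, 2 of them missed,
# plus one healthy patient wrongly flagged (false positive).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]
print(recall(y_true, y_pred))  # 0.5
```

The false positive does not affect the score: recall only penalizes missed positives.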
Definition: F1-score is the harmonic mean of precision and recall, providing a balanced measure of a model's performance.
Formula:
$$ \text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$
Limitations: F1-score treats precision and recall equally, which may not be suitable for all scenarios. It may not be ideal when the cost of false positives and false negatives differs significantly.
Example:
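A one-function sketch combining the precision and recall values from the examples above (0.8 and 0.5, both hypothetical):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With precision 0.8 and recall 0.5, the harmonic mean sits closer
# to the weaker value than the arithmetic mean (0.65) would.
print(round(f1_score(0.8, 0.5), 3))  # 0.615
```

Because the harmonic mean is dragged toward the smaller input, a model cannot achieve a high F1-score by excelling at only one of the two metrics.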
Describe how a confusion matrix is constructed and how it can be used to evaluate model performance.
A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table whose cells combine predicted and actual values, computed on a set of test data for which the true labels are known.
Here, for the binary case, each cell counts one combination of actual and predicted class: True Positives (TP) are actual positives predicted positive, True Negatives (TN) are actual negatives predicted negative, False Positives (FP) are actual negatives predicted positive, and False Negatives (FN) are actual positives predicted negative.
Using Confusion Matrix to Evaluate Model Performance: every metric defined above (accuracy, precision, recall, F1-score) can be computed directly from the four cells, and the matrix additionally reveals which kind of error (FP vs. FN) the model tends to make.
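A minimal sketch in plain Python (the label vectors are made up) that builds the 2×2 matrix and then reads the metrics straight off its cells:

```python
# Hypothetical true labels and model predictions for 10 samples.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Count the four cell values.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

matrix = [[tp, fn],   # actual-positive row
          [fp, tn]]   # actual-negative row

# All earlier metrics fall out of the cells.
accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
print(matrix, accuracy, precision, recall)
```

Here the model makes one error of each kind (FP = FN = 1), so precision and recall coincide; in practice the matrix makes any asymmetry between the two error types immediately visible.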
Explain the concept of a ROC curve and discuss how it can be used to evaluate the performance of binary classification models.
The Receiver Operating Characteristic (ROC) curve plots the TPR (True Positive Rate) against the FPR (False Positive Rate) at various threshold values, showing how well the model separates the 'signal' (positives) from the 'noise' (negatives).
The Area Under the Curve (AUC) measures the ability of a classifier to distinguish between classes: it is the area enclosed between the ROC curve and the X-axis, ranging from 0.5 (random guessing, the diagonal) to 1.0 (perfect separation).
In a ROC curve, the X-axis shows the False Positive Rate (FPR) and the Y-axis shows the True Positive Rate (TPR). A higher X value means more False Positives (FP) relative to True Negatives (TN), while a higher Y value means more True Positives (TP) relative to False Negatives (FN). The choice of threshold therefore depends on how one wants to balance FP against FN.
Using ROC Curve for Evaluation: a curve that hugs the top-left corner indicates a strong classifier, the diagonal corresponds to random guessing, and the AUC summarizes the whole curve in a single number for comparing models independently of any one threshold.
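A plain-Python sketch (predicted probabilities and labels are invented) of how the (FPR, TPR) points of a ROC curve are produced by sweeping the threshold:

```python
# Hypothetical labels and predicted probabilities for 8 samples.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) point per classification threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for thr in thresholds:
        preds = [1 if s >= thr else 0 for s in scores]
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, preds))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, preds))
        points.append((fp / neg, tp / pos))
    return points

print(roc_points(y_true, scores, [0.0, 0.5, 1.0]))
# [(1.0, 1.0), (0.25, 0.75), (0.0, 0.0)]
```

Lowering the threshold moves the point up and to the right (more positives of both kinds); a real ROC curve simply uses every distinct score as a threshold.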
Explain the concept of cross-validation and compare k-fold cross-validation with stratified cross-validation.
Cross-validation is a resampling technique for assessing how well a model generalizes: instead of relying on a single train/test split, the model is repeatedly trained and validated on different subsets of the data, and the validation scores are averaged.
K-fold cross validation
In this technique, the whole dataset is partitioned into k parts of equal size, each called a fold; it is known as k-fold since there are k parts, where k can be any integer (3, 4, 5, etc.).
One fold is used for validation while the remaining k−1 folds are used for training. The procedure is repeated k times so that each fold serves exactly once as the validation set.
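The splitting step above can be sketched in plain Python (contiguous folds, a common but not the only convention):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of (near-)equal size."""
    indices = list(range(n))
    # Earlier folds absorb the remainder when n is not divisible by k.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds

# Each fold serves once as the validation set; the rest train the model.
folds = kfold_indices(10, 5)
for val_fold in folds:
    train = [i for f in folds if f is not val_fold for i in f]
```

With n = 10 and k = 5 this yields five folds of two indices each, and each loop iteration trains on the other eight samples.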
Stratified k-fold validation works like plain k-fold, except that each fold is constructed to preserve the overall class proportions of the dataset. This makes it the preferred choice for imbalanced datasets, where plain k-fold can produce folds containing few or even no minority-class samples, distorting the validation scores.
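A minimal sketch of stratified fold assignment in plain Python (round-robin per class; the toy labels are invented), keeping each class's proportion identical in every fold:

```python
def stratified_kfold(y, k):
    """Assign sample indices to k folds, preserving class proportions.

    Samples of each class are dealt round-robin across the folds,
    so every fold receives roughly 1/k of each class.
    """
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds

y = [0] * 8 + [1] * 4        # imbalanced toy labels (2:1 ratio)
folds = stratified_kfold(y, 4)
# Every fold gets 2 negatives and 1 positive, matching the 2:1 ratio.
```

Plain k-fold on the same data could easily place all four positives in a single fold; the stratified split cannot.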
Describe the process of hyperparameter tuning and model selection and discuss its importance in improving model performance.
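One common approach (a sketch of grid search, not the only tuning method) is to evaluate every combination of candidate hyperparameter values and keep the best-scoring one. Everything here is hypothetical: `score` stands in for a function that would train the model and return its cross-validation score, and the grid values are invented.

```python
import itertools

# Stand-in for "train the model with these hyperparameters and return
# its cross-validation score"; this toy version peaks at lr=0.1, depth=3.
def score(params):
    return -(params["lr"] - 0.1) ** 2 - (params["depth"] - 3) ** 2

# Hypothetical search grid.
grid = {"lr": [0.01, 0.1, 1.0], "depth": [2, 3, 5]}

best_params, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    s = score(params)
    if s > best_score:
        best_params, best_score = params, s

print(best_params)  # {'lr': 0.1, 'depth': 3}
```

In practice the scoring step uses cross-validation (as described above) so that the chosen hyperparameters generalize rather than overfit a single validation split, and alternatives such as random search or Bayesian optimization explore the space more cheaply when the grid is large.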