Practice Questions — Model Evaluation, Cross-Validation, and Hyperparameter Tuning
Topic-Specific Questions
Question 1
Easy
What is the output of the following code?
TP = 50
FP = 10
FN = 5
TN = 100
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
Plug the values into the formulas.
Accuracy: 0.909
Precision: 0.833
Recall: 0.909
Question 2
Easy
What is the output?
precision = 0.9
recall = 0.6
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.3f}")
F1 is the harmonic mean of precision and recall.
F1: 0.720
Question 3
Easy
What is the output?
from sklearn.model_selection import train_test_split
import numpy as np
X = np.arange(100).reshape(100, 1)
y = np.array([0]*80 + [1]*20)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Train size: {len(X_train)}")
print(f"Test size: {len(X_test)}")
print(f"Train class 1 ratio: {np.mean(y_train):.2f}")
print(f"Test class 1 ratio: {np.mean(y_test):.2f}")
stratify=y preserves class proportions in both splits.
Train size: 80
Test size: 20
Train class 1 ratio: 0.20
Test class 1 ratio: 0.20
Question 4
Easy
What is the output?
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(100, 3)
y = (X[:, 0] > 0).astype(int)
scores = cross_val_score(DecisionTreeClassifier(random_state=42),
X, y, cv=5)
print(f"CV scores: {np.round(scores, 2)}")
print(f"Mean: {scores.mean():.3f}")
print(f"Std: {scores.std():.3f}")
Cross-validation returns 5 scores (one per fold).
CV scores: [0.9 0.85 0.85 0.95 0.85]
Mean: 0.880
Std: 0.040
Question 5
Medium
What is the output?
from sklearn.model_selection import KFold
import numpy as np
X = np.arange(10)
kf = KFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), 1):
    print(f"Fold {fold}: Train={train_idx.tolist()}, Test={test_idx.tolist()}")
5-Fold KFold splits 10 items into 5 groups of 2.
Fold 1: Train=[2, 3, 4, 5, 6, 7, 8, 9], Test=[0, 1]
Fold 2: Train=[0, 1, 4, 5, 6, 7, 8, 9], Test=[2, 3]
Fold 3: Train=[0, 1, 2, 3, 6, 7, 8, 9], Test=[4, 5]
Fold 4: Train=[0, 1, 2, 3, 4, 5, 8, 9], Test=[6, 7]
Fold 5: Train=[0, 1, 2, 3, 4, 5, 6, 7], Test=[8, 9]
Question 6
Medium
What is the output?
from sklearn.metrics import confusion_matrix
import numpy as np
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print(cm)
TN, FP, FN, TP = cm.ravel()
print(f"TP={TP}, TN={TN}, FP={FP}, FN={FN}")
Count: correct positives (TP), correct negatives (TN), false alarms (FP), missed positives (FN).
[[4 1]
 [1 4]]
TP=4, TN=4, FP=1, FN=1
Question 7
Medium
What is the output?
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0] > 0).astype(int)
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 11]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)
print(f"Best K: {grid.best_params_['n_neighbors']}")
print(f"Best CV accuracy: {grid.best_score_:.3f}")
GridSearchCV tries all K values and picks the one with highest CV accuracy.
Best K: 5
Best CV accuracy: 0.910
Question 8
Medium
What is the output?
from sklearn.metrics import roc_auc_score
import numpy as np
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
# Perfect predictions
y_proba_perfect = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(f"Perfect AUC: {roc_auc_score(y_true, y_proba_perfect):.2f}")
# Random predictions
y_proba_random = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(f"Random AUC: {roc_auc_score(y_true, y_proba_random):.2f}")
# Reversed predictions (worst case)
y_proba_reversed = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(f"Reversed AUC: {roc_auc_score(y_true, y_proba_reversed):.2f}")
AUC=1.0 for perfect ranking, 0.5 for random, 0.0 for perfectly wrong.
Perfect AUC: 1.00
Random AUC: 0.50
Reversed AUC: 0.00
Question 9
Hard
What is the output?
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
results = cross_validate(
DecisionTreeClassifier(max_depth=None, random_state=42),
X, y, cv=5,
scoring=['accuracy', 'f1'],
return_train_score=True
)
print(f"Train accuracy: {results['train_accuracy'].mean():.3f}")
print(f"Test accuracy: {results['test_accuracy'].mean():.3f}")
print(f"Gap: {results['train_accuracy'].mean() - results['test_accuracy'].mean():.3f}")
print(f"Overfitting: {results['train_accuracy'].mean() - results['test_accuracy'].mean() > 0.1}")
An unlimited-depth tree memorizes the training data. The gap between train and test accuracy reveals overfitting.
Train accuracy: 1.000
Test accuracy: 0.870
Gap: 0.130
Overfitting: True
Question 10
Hard
What is the output?
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 5)
y = (X[:, 0]**2 + X[:, 1]**2 > 1.5).astype(int)
pipe = Pipeline([
('scaler', StandardScaler()),
('svm', SVC(random_state=42))
])
param_grid = {
'svm__C': [0.1, 1, 10],
'svm__gamma': ['scale', 0.1, 1]
}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)
print(f"Best params: C={grid.best_params_['svm__C']}, gamma={grid.best_params_['svm__gamma']}")
print(f"Best CV accuracy: {grid.best_score_:.3f}")
Pipeline ensures scaling is done inside CV (no data leakage). Prefix parameters with the step name.
Best params: C=10, gamma=1
Best CV accuracy: 0.915
Question 11
Hard
What is the output?
from sklearn.metrics import precision_recall_curve
import numpy as np
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_scores = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.85, 0.9])
precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)
# Find threshold where precision >= 0.9
for p, r, t in zip(precisions, recalls, thresholds):
    if p >= 0.9:
        print(f"Threshold={t:.2f}: Precision={p:.2f}, Recall={r:.2f}")
        break
Higher threshold means fewer positive predictions, higher precision but lower recall.
Threshold=0.70: Precision=1.00, Recall=0.80
Question 12
Easy
What is the output?
from sklearn.metrics import accuracy_score
y_true = [0]*95 + [1]*5
y_pred = [0]*100 # Always predict 0
print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")
print(f"Is this a good model? No! It catches 0 out of 5 positives.")
A model that always predicts the majority class can still get high accuracy on imbalanced data.
Accuracy: 95.00%
Is this a good model? No! It catches 0 out of 5 positives.
Question 13
Medium
What is the output?
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(100, 3)
y = (X[:, 0] > 0).astype(int)
for depth in [1, 3, 5, None]:
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42)
    scores = cross_val_score(dt, X, y, cv=5)
    print(f"depth={str(depth):4s}: CV={scores.mean():.3f} +/- {scores.std():.3f}")
Shallow trees underfit, deep trees overfit. The CV score peaks at the sweet spot.
depth=1   : CV=0.880 +/- 0.040
depth=3   : CV=0.870 +/- 0.054
depth=5   : CV=0.870 +/- 0.049
depth=None: CV=0.870 +/- 0.044
Question 14
Hard
Given this confusion matrix, calculate all metrics:
# Predicted: Positive Negative
# Actual Pos: 80 20
# Actual Neg: 30 70
TP, FN = 80, 20
FP, TN = 30, 70
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
specificity = TN / (TN + FP)
print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1: {f1:.3f}")
print(f"Specificity: {specificity:.3f}")
Apply each formula using the confusion matrix values.
Accuracy: 0.750
Precision: 0.727
Recall: 0.800
F1: 0.762
Specificity: 0.700
Question 15
Hard
What is the output?
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint
import numpy as np
np.random.seed(42)
X = np.random.randn(300, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
param_dist = {
'n_estimators': randint(10, 200),
'max_depth': [3, 5, 7, 10, None],
}
rs = RandomizedSearchCV(
RandomForestClassifier(random_state=42),
param_dist, n_iter=10, cv=3, scoring='accuracy', random_state=42
)
rs.fit(X, y)
print(f"Best params: {rs.best_params_}")
print(f"Best CV accuracy: {rs.best_score_:.3f}")
print(f"Combinations tried: {len(rs.cv_results_['mean_test_score'])}")
RandomizedSearchCV tries n_iter=10 random combinations from the distributions.
Best params: {'max_depth': 7, 'n_estimators': 148}
Best CV accuracy: 0.907
Combinations tried: 10
Question 16
Easy
What is the output?
TP = 20
FP = 5
FN = 10
TN = 65
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1: {f1:.3f}")
Plug in the values: Precision = 20/(20+5), Recall = 20/(20+10).
Precision: 0.800
Recall: 0.667
F1: 0.727
Question 17
Medium
What is the output?
from sklearn.model_selection import StratifiedKFold
import numpy as np
y = np.array([0]*80 + [1]*20) # 80% class 0, 20% class 1
skf = StratifiedKFold(n_splits=5, shuffle=False)
for fold, (train_idx, test_idx) in enumerate(skf.split(np.zeros(100), y), 1):
    test_ratio = np.mean(y[test_idx])
    print(f"Fold {fold}: test size={len(test_idx)}, class 1 ratio={test_ratio:.2f}")
Stratified K-Fold preserves the class ratio (20% class 1) in each fold.
Fold 1: test size=20, class 1 ratio=0.20
Fold 2: test size=20, class 1 ratio=0.20
Fold 3: test size=20, class 1 ratio=0.20
Fold 4: test size=20, class 1 ratio=0.20
Fold 5: test size=20, class 1 ratio=0.20
Question 18
Hard
What is the output?
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 3)
y = (X[:, 0] > 0).astype(int)
param_grid = {'max_depth': [1, 2, 3, 5, 10]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
param_grid, cv=5, return_train_score=True)
grid.fit(X, y)
for depth in [1, 3, 10]:
    idx = list(param_grid['max_depth']).index(depth)
    train = grid.cv_results_['mean_train_score'][idx]
    test = grid.cv_results_['mean_test_score'][idx]
    print(f"depth={depth:2d}: train={train:.3f}, test={test:.3f}, gap={train-test:.3f}")
The gap between train and test score indicates overfitting. Deeper trees have larger gaps.
depth= 1: train=0.888, test=0.880, gap=0.008
depth= 3: train=0.938, test=0.875, gap=0.063
depth=10: train=1.000, test=0.870, gap=0.130
Question 19
Easy
What is the output?
from sklearn.metrics import roc_auc_score
import numpy as np
y_true = [0, 0, 1, 1]
# Model that perfectly separates classes
y_scores_good = [0.1, 0.3, 0.7, 0.9]
# Model with no discrimination
y_scores_bad = [0.5, 0.5, 0.5, 0.5]
print(f"Good model AUC: {roc_auc_score(y_true, y_scores_good):.2f}")
print(f"Bad model AUC: {roc_auc_score(y_true, y_scores_bad):.2f}")
Perfect separation gives AUC=1.0. No discrimination gives AUC=0.5.
Good model AUC: 1.00
Bad model AUC: 0.50
Question 20
Medium
What is the output?
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
# Compare different scoring metrics
model = RandomForestClassifier(n_estimators=50, random_state=42)
for metric in ['accuracy', 'f1', 'roc_auc', 'precision']:
    scores = cross_val_score(model, X, y, cv=5, scoring=metric)
    print(f"{metric:10s}: {scores.mean():.3f} +/- {scores.std():.3f}")
Different metrics tell different stories about model performance.
accuracy  : 0.895 +/- 0.027
f1        : 0.893 +/- 0.030
roc_auc   : 0.952 +/- 0.018
precision : 0.893 +/- 0.040
Question 21
Hard
What is the output?
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np
np.random.seed(42)
X = np.random.randn(150, 3)
y = (X[:, 0]**2 + X[:, 1]**2 > 1.5).astype(int)
# Using Pipeline to avoid data leakage
pipe = Pipeline([
('scaler', StandardScaler()),
('svm', SVC())
])
param_grid = {
'svm__C': [0.1, 1, 10],
'svm__kernel': ['linear', 'rbf']
}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)
print(f"Best: {grid.best_params_}")
print(f"Best CV score: {grid.best_score_:.3f}")
print(f"Total fits: {len(grid.cv_results_['mean_test_score']) * 5}")
3 C values * 2 kernels = 6 combinations * 5 folds = 30 fits.
Best: {'svm__C': 10, 'svm__kernel': 'rbf'}
Best CV score: 0.913
Total fits: 30
Question 22
Medium
What is the output?
from sklearn.metrics import precision_score, recall_score
import numpy as np
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Model that predicts everything as positive
y_pred_all_pos = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Model that predicts everything as negative
y_pred_all_neg = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(f"All positive - Precision: {precision_score(y_true, y_pred_all_pos):.2f}")
print(f"All positive - Recall: {recall_score(y_true, y_pred_all_pos):.2f}")
print(f"All negative - Recall: {recall_score(y_true, y_pred_all_neg):.2f}")
Predicting all positive gives recall=1.0 but low precision. Predicting all negative gives recall=0.0.
All positive - Precision: 0.50
All positive - Recall: 1.00
All negative - Recall: 0.00
Question 23
Easy
What is the output?
from sklearn.model_selection import train_test_split
import numpy as np
X = np.arange(20).reshape(20, 1)
y = np.array([0]*10 + [1]*10)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.25, random_state=42
)
print(f"Train: {len(X_train)}, Test: {len(X_test)}")
print(f"Total: {len(X_train) + len(X_test)}")
test_size=0.25 means 25% test, 75% train.
Train: 15, Test: 5
Total: 20
Question 24
Hard
What is the output?
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
loo = LeaveOneOut()
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=loo)
print(f"LOO scores: {scores.astype(int).tolist()}")
print(f"LOO accuracy: {scores.mean():.2f}")
print(f"Number of folds: {len(scores)}")
Leave-One-Out tests on one sample at a time. N samples = N folds.
LOO scores: [1, 1, 1, 1, 0, 1, 0, 1, 1, 1]
LOO accuracy: 0.80
Number of folds: 10
Question 25
Medium
What is the output?
from sklearn.metrics import f1_score
import numpy as np
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
# Compare macro vs weighted F1
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
f1_macro = f1_score(y_true, y_pred, average='macro')
f1_weighted = f1_score(y_true, y_pred, average='weighted')
f1_binary = f1_score(y_true, y_pred, average='binary')
print(f"F1 macro: {f1_macro:.3f}")
print(f"F1 weighted: {f1_weighted:.3f}")
print(f"F1 binary: {f1_binary:.3f}")
Macro averages F1 across classes equally. Weighted averages by class support. Binary gives F1 for class 1 only.
F1 macro: 0.619
F1 weighted: 0.631
F1 binary: 0.667
Question 26
Hard
What is the output?
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(300, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
param_grid = {
'n_estimators': [50, 100, 200],
'max_depth': [3, 5, 10, None]
}
grid = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid, cv=5, scoring='accuracy', n_jobs=-1
)
grid.fit(X, y)
print(f"Total parameter combinations: {len(grid.cv_results_['params'])}")
print(f"Best accuracy: {grid.best_score_:.4f}")
print(f"Best params: {grid.best_params_}")
print(f"All mean CV scores: {np.round(grid.cv_results_['mean_test_score'], 3)}")
3 * 4 = 12 combinations. Each is evaluated with 5-fold CV.
Total parameter combinations: 12
Best accuracy: 0.9100
Best params: {'max_depth': 5, 'n_estimators': 200}
All mean CV scores: [0.887 0.893 0.897 0.903 0.897 0.907 0.91 0.9 0.9 0.903 0.903 0.9 ]
Mixed & Application Questions
Question 1
Easy
Why is accuracy not a good metric for imbalanced datasets?
Think about what happens with a majority-class-only classifier.
On imbalanced datasets, a model that always predicts the majority class achieves high accuracy without learning anything useful. For example, if 99% of transactions are legitimate, predicting "legitimate" for everything gives 99% accuracy but catches zero fraud. Accuracy treats all errors equally, but in reality, false negatives (missing fraud) and false positives (flagging legitimate) have very different costs. Use precision, recall, F1-score, or ROC-AUC instead.
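The trap is easy to reproduce. A minimal sketch (the 99/1 split below is illustrative, not from the questions above):

```python
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 99 + [1]   # 99 legitimate transactions, 1 fraud
y_pred = [0] * 100        # always predict "legitimate"

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2f}")  # 0.99
print(f"Recall:   {recall_score(y_true, y_pred):.2f}")    # 0.00 -- zero fraud caught
```

High accuracy, zero recall: the model is useless for the class that matters.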
Question 2
Easy
What is the difference between a training set and a test set?
Think about what each set is used for.
The training set is used to train the model (learn parameters/weights). The test set is held-out data never seen during training, used to evaluate how well the model generalizes to new, unseen data. The test set simulates real-world deployment. You should never train on test data or use test results to make training decisions.
Question 3
Medium
Explain the bias-variance tradeoff. How does model complexity relate to it?
Bias relates to underfitting, variance to overfitting.
Bias is error from oversimplified assumptions. High bias means the model misses patterns (underfitting). Variance is error from sensitivity to training data fluctuations. High variance means the model captures noise (overfitting). Total Error = Bias^2 + Variance + Irreducible Noise. Simple models have high bias, low variance. Complex models have low bias, high variance. The optimal model minimizes total error by balancing both.
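The tradeoff shows up empirically when you sweep model complexity. A sketch using a depth-limited decision tree (the synthetic data and seed here are illustrative assumptions; exact scores will vary):

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] + 0.3 * rng.randn(200) > 0).astype(int)   # noisy target

# Sweep complexity: shallow trees (high bias) -> unlimited trees (high variance).
for depth in [1, 3, 10, None]:
    res = cross_validate(DecisionTreeClassifier(max_depth=depth, random_state=0),
                         X, y, cv=5, return_train_score=True)
    gap = res["train_score"].mean() - res["test_score"].mean()
    print(f"depth={depth}: train={res['train_score'].mean():.2f}, "
          f"test={res['test_score'].mean():.2f}, gap={gap:.2f}")
```

The train-test gap grows with depth: the variance term dominates once the tree is free to memorize noise.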
Question 4
Medium
What is the output?
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
import numpy as np
np.random.seed(42)
X = np.random.randn(300, 5)
y = (X[:, 0] + 0.5*X[:, 1] > 0).astype(int)
models = {
'Logistic Regression': LogisticRegression(random_state=42),
'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
'SVM': SVC(random_state=42)
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:25s}: {scores.mean():.3f} +/- {scores.std():.3f}")
The boundary is linear (sum of features > 0). Logistic regression should do well.
Logistic Regression      : 0.913 +/- 0.020
Random Forest            : 0.890 +/- 0.026
SVM                      : 0.903 +/- 0.029
Question 5
Medium
What is the advantage of K-Fold cross-validation over a single train-test split?
Think about variability and data utilization.
K-Fold CV has three key advantages: (1) More reliable estimate: averaging K scores reduces the variance of the estimate. A single split might give 80% or 90% depending on which points end up in test -- CV gives a stable average. (2) Every sample is tested: each sample appears in the test set exactly once, so the entire dataset contributes to the evaluation. (3) Better use of limited data: with small datasets, holding out 20% for testing wastes precious training data. CV uses 80% for training in each fold.
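Point (1) can be demonstrated directly: repeat a single split with different seeds and compare the spread against one CV average (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(120, 3)
y = (X[:, 0] > 0).astype(int)

# Five single train-test splits, each with a different seed.
single_split_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    single_split_scores.append(model.score(X_te, y_te))

print("Single-split scores:", np.round(single_split_scores, 2))  # varies with seed
print("5-fold CV mean:",
      round(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean(), 3))
```

The individual split scores bounce around; the CV mean is a single, more stable summary.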
Question 6
Hard
What is the output?
from sklearn.metrics import precision_score, recall_score, f1_score
import numpy as np
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_proba = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05, 0.01])
for threshold in [0.3, 0.5, 0.7]:
    y_pred = (y_proba >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    f = f1_score(y_true, y_pred)
    print(f"Threshold={threshold}: P={p:.2f}, R={r:.2f}, F1={f:.2f}")
Lower threshold: more positive predictions, higher recall, lower precision.
Threshold=0.3: P=0.83, R=1.00, F1=0.91
Threshold=0.5: P=0.75, R=0.60, F1=0.67
Threshold=0.7: P=0.67, R=0.40, F1=0.50
Question 7
Hard
What is the difference between GridSearchCV and RandomizedSearchCV? When should you use each?
Think about the search space size and computational budget.
GridSearchCV exhaustively tries every combination in the parameter grid. Guaranteed to find the best combination within the grid, but exponentially slow (3 params with 5 values each = 125 combinations). RandomizedSearchCV randomly samples N combinations from parameter distributions. Much faster, especially for large search spaces, but may miss the optimal combination. Use GridSearchCV when: few hyperparameters, small grid, need guaranteed best. Use RandomizedSearchCV when: many hyperparameters, continuous ranges, need fast initial exploration.
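The sampling step of RandomizedSearchCV can be inspected in isolation with sklearn's ParameterSampler. The search space below is a hypothetical example; note that with a continuous distribution like loguniform, exhaustive grid search is not even possible:

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterSampler

# Continuous C range + discrete kernel choice: an infinite search space.
param_dist = {"C": loguniform(1e-3, 1e3), "kernel": ["linear", "rbf"]}

samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for s in samples:
    print(s)                 # 5 random (C, kernel) combinations
print(len(samples))          # n_iter caps the budget regardless of space size
```

This is why randomized search scales: the cost is n_iter * cv fits, independent of how many values each hyperparameter could take.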
Question 8
Hard
What is the output?
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 5)
y = (X[:, 0] > 0).astype(int)
param_grid = {
'max_depth': [1, 3, 5],
'n_estimators': [10, 50]
}
grid = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid, cv=3, scoring='accuracy',
return_train_score=True
)
grid.fit(X, y)
# Check for overfitting in best model
best_idx = grid.best_index_
train_score = grid.cv_results_['mean_train_score'][best_idx]
test_score = grid.cv_results_['mean_test_score'][best_idx]
print(f"Best params: {grid.best_params_}")
print(f"Train: {train_score:.3f}, Test: {test_score:.3f}")
print(f"Overfitting gap: {train_score - test_score:.3f}")
The gap between train and test scores indicates overfitting. A small gap is good.
Best params: {'max_depth': 3, 'n_estimators': 50}
Train: 0.978, Test: 0.910
Overfitting gap: 0.068
Question 9
Hard
Why should you use a Pipeline with GridSearchCV, especially when scaling features?
Think about what happens to the scaler during cross-validation without a pipeline.
Without a Pipeline, if you scale the entire dataset before cross-validation, the scaler has seen the test fold's data during fitting. This is data leakage: the mean and variance used for scaling include information from the test fold. With a Pipeline, scaling is done INSIDE each cross-validation fold: the scaler is fit only on the training folds and transforms the test fold. This gives an honest, unbiased evaluation.
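The two setups can be sketched side by side (synthetic data; on this simple example the numeric difference is small -- the point is where the scaler gets fit, not the scores themselves):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(150, 4) * [1, 10, 100, 1000]   # wildly different feature scales
y = (X[:, 0] > 0).astype(int)

# Leaky: the scaler is fit on ALL rows, including every future test fold.
X_leaky = StandardScaler().fit_transform(X)
leaky = cross_val_score(SVC(), X_leaky, y, cv=5).mean()

# Safe: the scaler is refit inside each fold on the training rows only.
safe = cross_val_score(make_pipeline(StandardScaler(), SVC()), X, y, cv=5).mean()

print(f"leaky CV: {leaky:.3f}, pipeline CV: {safe:.3f}")
```

Only the pipeline version is an honest estimate; with leakier preprocessing steps (feature selection, target encoding) the optimistic bias can be large.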
Question 10
Medium
What does ROC-AUC measure, and why is it useful for comparing models?
Think about what makes AUC different from accuracy or F1.
ROC-AUC measures the probability that the model ranks a random positive example higher than a random negative example. It is threshold-independent: it evaluates the model's ranking ability across ALL possible thresholds, not just the default 0.5. AUC=1.0 means perfect ranking, AUC=0.5 means random. It is useful for comparing models because it is not affected by class imbalance or threshold choice.
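Because AUC depends only on the ranking of the scores, any monotone rescaling leaves it unchanged. A small sketch (labels and scores chosen for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 1]
scores = np.array([0.2, 0.3, 0.4, 0.5, 0.7, 0.9])

auc_raw = roc_auc_score(y_true, scores)
auc_scaled = roc_auc_score(y_true, scores / 10)    # same ranking, same AUC
auc_monotone = roc_auc_score(y_true, scores ** 3)  # still the same ranking

print(auc_raw, auc_scaled, auc_monotone)           # all three equal (8/9 ~= 0.889)
```

Here 8 of the 9 (positive, negative) pairs are ranked correctly, so AUC = 8/9 no matter how the scores are rescaled. Accuracy or F1 at a 0.5 cutoff would change under the same transformations.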
Multiple Choice Questions
MCQ 1
Which metric measures: 'Of all positive predictions, how many are actually positive?'
Answer: B
B is correct. Precision = TP/(TP+FP). It answers how many of the model's positive predictions are truly positive. High precision means few false alarms.
MCQ 2
Which metric measures: 'Of all actual positives, how many did the model detect?'
Answer: C
C is correct. Recall = TP/(TP+FN). It answers how many actual positives the model successfully detected. High recall means few missed positives.
MCQ 3
What does an AUC of 0.5 indicate?
Answer: B
B is correct. AUC=0.5 means the model's positive and negative class predictions are randomly mixed. The ROC curve follows the diagonal line, indicating no discriminative ability.
MCQ 4
In 5-Fold cross-validation with 100 samples, how many samples are in each test fold?
Answer: C
C is correct. 5-Fold splits 100 samples into 5 equal parts of 20. Each fold uses 80 samples for training and 20 for testing. Every sample is tested exactly once across all 5 folds.
MCQ 5
What is overfitting?
Answer: B
B is correct. Overfitting occurs when the model learns the training data too well, including noise and random fluctuations. It has high training accuracy but low test accuracy. Signs: 100% training accuracy with much lower test accuracy.
MCQ 6
What does GridSearchCV do?
Answer: B
B is correct. GridSearchCV exhaustively evaluates every combination in the specified parameter grid using cross-validation. It returns the best combination based on the specified scoring metric.
MCQ 7
Why should you use Stratified K-Fold instead of regular K-Fold for imbalanced data?
Answer: B
B is correct. Stratified K-Fold ensures each fold has approximately the same class distribution as the full dataset. Without stratification, a fold might have very few or no minority class samples, giving unreliable scores.
MCQ 8
If a model has high training accuracy but low test accuracy, what is the likely problem?
Answer: B
B is correct. A large gap between training and test accuracy is the hallmark of overfitting. The model has memorized the training data (including noise) but fails to generalize. Fix: simplify the model, add regularization, get more data.
MCQ 9
What is the advantage of RandomizedSearchCV over GridSearchCV?
Answer: B
B is correct. RandomizedSearchCV samples a fixed number of random combinations (n_iter), making it much faster than GridSearchCV which tries every combination. For a grid with thousands of combinations, RandomizedSearchCV with n_iter=50 is dramatically faster while often finding near-optimal parameters.
MCQ 10
What does a learning curve that shows high training score and low test score indicate?
Answer: C
C is correct. A large gap between training and test curves indicates overfitting. More training data can help because it gives the model more examples to learn genuine patterns rather than memorizing noise. Alternatively, simplify the model or add regularization.
MCQ 11
For a cancer screening system where missing a positive case is extremely dangerous, which metric should be maximized?
Answer: B
B is correct. Recall = TP/(TP+FN). Maximizing recall minimizes false negatives (missed cancer cases). In cancer screening, missing a positive (FN) can be fatal, while a false positive (FP) only means additional tests. High recall ensures nearly all cancer cases are detected.
MCQ 12
What is data leakage in the context of model evaluation?
Answer: B
B is correct. Data leakage occurs when the model has indirect access to test data during training. Examples: scaling the entire dataset before splitting, using future data to predict past events, or including the target variable as a feature. It produces unrealistically good evaluation scores that will not hold in production.
MCQ 13
In the bias-variance tradeoff, which of the following correctly describes Total Error?
Answer: B
B is correct. The expected prediction error decomposes into: Bias^2 (systematic error from wrong assumptions) + Variance (error from sensitivity to training data) + Irreducible Noise (inherent randomness in the data). We can reduce bias and variance by choosing the right model complexity, but irreducible noise is a fundamental limit.
MCQ 14
If GridSearchCV has param_grid with 3 values for C, 4 values for gamma, and 2 values for kernel, with cv=5, how many models are fitted in total?
Answer: C
C is correct. Total combinations = 3 * 4 * 2 = 24 parameter combinations. Each is evaluated with 5-fold CV, so 24 * 5 = 120 model fits total. This is why GridSearchCV is computationally expensive for large grids.
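The count can be verified with sklearn's ParameterGrid helper (the parameter values below are hypothetical placeholders for the 3 * 4 * 2 grid in the question):

```python
from sklearn.model_selection import ParameterGrid

param_grid = {"C": [0.1, 1, 10],
              "gamma": [1, 0.1, 0.01, 0.001],
              "kernel": ["linear", "rbf"]}

n_combos = len(list(ParameterGrid(param_grid)))
print(n_combos)        # 24 parameter combinations
print(n_combos * 5)    # 120 model fits with cv=5
```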
MCQ 15
When should you use ROC-AUC instead of F1-score for model evaluation?
Answer: A
A is correct. ROC-AUC evaluates the model's ability to rank positives above negatives across ALL thresholds, making it threshold-independent. F1-score depends on a specific threshold (default 0.5). Use AUC when: comparing models overall, the threshold will be tuned later, or you want a single comprehensive metric.
MCQ 16
What is the purpose of the scoring parameter in GridSearchCV?
Answer: B
B is correct. The scoring parameter determines which metric GridSearchCV optimizes. Options include 'accuracy', 'f1', 'roc_auc', 'precision', 'recall', etc. The best parameters are those that maximize this metric in cross-validation. For imbalanced data, use 'f1' or 'roc_auc' instead of 'accuracy'.
Coding Challenges
Coding challenges coming soon.