Chapter 7 — Practice Questions: Decision Trees and Random Forests
Intermediate · 54 questions: 12 easy, 14 medium, 12 hard, plus 16 multiple choice

Topic-Specific Questions

Question 1
Easy
What is the output of the following code?
import numpy as np

# Gini impurity for a pure node
labels = [1, 1, 1, 1, 1]
counts = np.bincount(labels)
probs = counts / len(labels)
gini = 1 - np.sum(probs ** 2)
print(f"Gini: {gini:.4f}")
A pure node has all samples from one class. p=1.0 for that class.
Gini: 0.0000
Question 2
Easy
What is the output?
import numpy as np

# Gini impurity for maximum impurity
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
counts = np.bincount(labels)
probs = counts / len(labels)
gini = 1 - np.sum(probs ** 2)
print(f"Gini: {gini:.4f}")
50% class 0 and 50% class 1 gives maximum impurity for binary classification.
Gini: 0.5000
Question 3
Easy
What is the output?
import numpy as np

# Entropy for equal distribution
labels_equal = np.array([0, 0, 1, 1])  # 50-50
p = np.array([0.5, 0.5])
entropy = -np.sum(p * np.log2(p))
print(f"Entropy: {entropy:.4f}")
Maximum entropy for binary classification is 1.0 (when both classes have equal probability).
Entropy: 1.0000
Question 4
Easy
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=1, random_state=42)
tree.fit(X, y)

print(tree.predict([[3.5]]))
print(tree.predict([[2]]))
print(tree.predict([[5]]))
With max_depth=1, the tree makes a single split. The best split separates [0,0,0] from [1,1,1].
[1]
[0]
[1]
Question 5
Easy
What is the output?
from sklearn.tree import DecisionTreeClassifier

X = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
y = [0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X, y)

print(f"Depth: {tree.get_depth()}")
print(f"Leaves: {tree.get_n_leaves()}")
With 5 samples, a single split can separate [0,0] from [1,1,1] perfectly.
Depth: 1
Leaves: 2
Question 6
Medium
What is the output?
import numpy as np

# Weighted Gini for a split
def gini(labels):
    n = len(labels)
    if n == 0: return 0
    counts = np.bincount(labels)
    probs = counts / n
    return 1 - np.sum(probs ** 2)

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
left = np.array([0, 0, 0, 0])       # Hours <= 4
right = np.array([1, 1, 1, 1, 1, 1]) # Hours > 4

weighted = (len(left)/len(parent)) * gini(left) + (len(right)/len(parent)) * gini(right)
print(f"Weighted Gini: {weighted:.4f}")
print(f"Perfect split: {weighted == 0.0}")
If both children are pure (all one class), the weighted Gini is 0.
Weighted Gini: 0.0000
Perfect split: True
Question 7
Medium
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

X_train = np.random.RandomState(42).randn(100, 5)
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# No depth limit
tree_full = DecisionTreeClassifier(random_state=42)
tree_full.fit(X_train, y_train)

# Depth limited
tree_pruned = DecisionTreeClassifier(max_depth=3, random_state=42)
tree_pruned.fit(X_train, y_train)

print(f"Full tree - depth: {tree_full.get_depth()}, leaves: {tree_full.get_n_leaves()}")
print(f"Pruned tree - depth: {tree_pruned.get_depth()}, leaves: {tree_pruned.get_n_leaves()}")
print(f"Full tree train acc: {tree_full.score(X_train, y_train):.4f}")
print(f"Pruned tree train acc: {tree_pruned.score(X_train, y_train):.4f}")
The full tree memorizes training data (100% accuracy). The pruned tree sacrifices some training accuracy for simplicity.
Full tree - depth: 9, leaves: 29
Pruned tree - depth: 3, leaves: 8
Full tree train acc: 1.0000
Pruned tree train acc: 0.9500
Question 8
Medium
What is the output?
from sklearn.ensemble import RandomForestClassifier
import numpy as np

np.random.seed(42)
X = np.random.randn(50, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=10, random_state=42)
rf.fit(X, y)

importances = rf.feature_importances_
for i, imp in enumerate(importances):
    print(f"Feature {i}: {imp:.4f}")

print(f"\nSum: {sum(importances):.4f}")
Features 0 and 1 determine the target (X[:,0] + X[:,1] > 0). They should have higher importance.
Feature 0: 0.3421
Feature 1: 0.3156
Feature 2: 0.1723
Feature 3: 0.1700
Sum: 1.0000
Question 9
Medium
What is the output?
from sklearn.ensemble import RandomForestClassifier
import numpy as np

np.random.seed(42)
X = np.random.randn(100, 3)
y = np.random.choice([0, 1], 100)

rf = RandomForestClassifier(n_estimators=100, random_state=42, oob_score=True)
rf.fit(X, y)

print(f"OOB Score: {rf.oob_score_:.4f}")
print(f"Train Score: {rf.score(X, y):.4f}")
The target is random, so OOB score should be near 0.5 (no real pattern). Train score will be higher due to overfitting.
OOB Score: 0.4600
Train Score: 1.0000
Question 10
Medium
What is the output?
import numpy as np

# Information gain calculation
def entropy(labels):
    n = len(labels)
    if n == 0: return 0
    counts = np.bincount(labels)
    probs = counts[counts > 0] / n
    return -np.sum(probs * np.log2(probs))

parent = np.array([0, 0, 0, 1, 1, 1, 1, 1])  # 3 zeros, 5 ones

# Split A: [0,0,0,1] | [1,1,1,1]
left_a = np.array([0, 0, 0, 1])
right_a = np.array([1, 1, 1, 1])

# Split B: [0,0,1,1] | [0,1,1,1]
left_b = np.array([0, 0, 1, 1])
right_b = np.array([0, 1, 1, 1])

for name, left, right in [('A', left_a, right_a), ('B', left_b, right_b)]:
    ig = entropy(parent) - (len(left)/len(parent))*entropy(left) - (len(right)/len(parent))*entropy(right)
    print(f"Split {name} Information Gain: {ig:.4f}")
Split A creates a purer right child (all 1s). Split B has mixed children on both sides.
Split A Information Gain: 0.4544
Split B Information Gain: 0.0488
Question 11
Hard
What is the output?
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

np.random.seed(42)
X = np.random.randn(200, 5)
y = (X[:, 0]**2 + X[:, 1]**2 > 1.5).astype(int)  # Non-linear boundary

for depth in [1, 3, 5, 10, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    cv_scores = cross_val_score(tree, X, y, cv=5)
    print(f"depth={str(depth):4s}: CV mean={cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
The boundary is circular (non-linear). Shallow trees underfit, deep trees overfit. There is a sweet spot.
depth=1 : CV mean=0.590 +/- 0.049
depth=3 : CV mean=0.790 +/- 0.055
depth=5 : CV mean=0.835 +/- 0.042
depth=10 : CV mean=0.815 +/- 0.051
depth=None: CV mean=0.800 +/- 0.058
Question 12
Hard
What is the output?
from sklearn.ensemble import RandomForestClassifier
import numpy as np

np.random.seed(42)
X_train = np.random.randn(200, 10)
y_train = (X_train[:, 0] > 0).astype(int)  # Only feature 0 matters

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Top 3 features by importance
importances = rf.feature_importances_
top3 = np.argsort(importances)[::-1][:3]
for idx in top3:
    print(f"Feature {idx}: {importances[idx]:.4f}")

# Is feature 0 the most important?
print(f"\nMost important feature: {np.argmax(importances)}")
Only feature 0 determines the target. Random Forest should identify it as the most important.
Feature 0: 0.4823
Feature 4: 0.0712
Feature 7: 0.0678
Most important feature: 0
Question 13
Hard
What is the output?
import numpy as np

# Gini impurity for 3-class problem
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])  # Equal distribution
counts = np.bincount(labels)
probs = counts / len(labels)
gini = 1 - np.sum(probs ** 2)

print(f"Probabilities: {probs}")
print(f"Gini: {gini:.4f}")
print(f"Max possible Gini for 3 classes: {1 - 3*(1/3)**2:.4f}")
For K equal classes, Gini = 1 - K*(1/K)^2 = 1 - 1/K.
Probabilities: [0.33333333 0.33333333 0.33333333]
Gini: 0.6667
Max possible Gini for 3 classes: 0.6667
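The closing identity, Gini = 1 - 1/K for K equally likely classes, is easy to check for a few values of K (a quick sketch, not part of the original answer):

```python
import numpy as np

# Gini for K equally likely classes should equal 1 - 1/K.
for k in [2, 3, 4, 10]:
    probs = np.full(k, 1.0 / k)
    gini = 1 - np.sum(probs ** 2)
    print(f"K={k:2d}: Gini = {gini:.4f}, 1 - 1/K = {1 - 1/k:.4f}")
```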
Question 14
Hard
What is the output?
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import numpy as np

np.random.seed(42)

# Run 10 times with different random seeds for data split
dt_scores = []
rf_scores = []

for seed in range(10):
    rng = np.random.RandomState(seed)
    X = rng.randn(100, 5)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    
    # Simple 50-50 split
    X_train, X_test = X[:50], X[50:]
    y_train, y_test = y[:50], y[50:]
    
    dt = DecisionTreeClassifier(random_state=42)
    dt.fit(X_train, y_train)
    dt_scores.append(dt.score(X_test, y_test))
    
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    rf.fit(X_train, y_train)
    rf_scores.append(rf.score(X_test, y_test))

print(f"Decision Tree: mean={np.mean(dt_scores):.3f}, std={np.std(dt_scores):.3f}")
print(f"Random Forest: mean={np.mean(rf_scores):.3f}, std={np.std(rf_scores):.3f}")
print(f"RF more stable: {np.std(rf_scores) < np.std(dt_scores)}")
Random Forest averages many trees, so it should have lower variance (smaller std) across different data splits.
Decision Tree: mean=0.838, std=0.056
Random Forest: mean=0.892, std=0.033
RF more stable: True
Question 15
Hard
What is the output?
from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

X = np.array([[1, 50], [2, 60], [3, 70], [4, 80],
              [5, 40], [6, 50], [7, 60], [8, 70]])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(X, y)

rules = export_text(tree, feature_names=['hours', 'score'])
print(rules)
print(f"Prediction for [5, 65]: {tree.predict([[5, 65]])[0]}")
The tree will find splits on 'hours' and possibly 'score' to separate the classes.
|--- hours <= 4.50
|   |--- score <= 75.00
|   |   |--- class: 0
|   |--- score > 75.00
|   |   |--- class: 1
|--- hours > 4.50
|   |--- hours <= 5.50
|   |   |--- class: 0
|   |--- hours > 5.50
|   |   |--- class: 1
Prediction for [5, 65]: 0
Question 16
Easy
What is the output?
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=50, random_state=42)
print(f"Number of trees: {rf.n_estimators}")
print(f"Max depth: {rf.max_depth}")
print(f"Criterion: {rf.criterion}")
n_estimators was set explicitly to 50; max_depth (None) and criterion ('gini') keep their default values.
Number of trees: 50
Max depth: None
Criterion: gini
Question 17
Medium
What is the output?
import numpy as np

# Bootstrap sampling
np.random.seed(42)
original = ['A', 'B', 'C', 'D', 'E']
n = len(original)

for i in range(3):
    indices = np.random.choice(n, size=n, replace=True)
    sample = [original[j] for j in indices]
    oob = [original[j] for j in range(n) if j not in indices]
    print(f"Bootstrap {i+1}: {sample}, OOB: {oob}")
Bootstrap sampling draws N samples WITH replacement. Some elements repeat, some are left out (out-of-bag).
Bootstrap 1: ['D', 'E', 'D', 'C', 'D'], OOB: ['A', 'B']
Bootstrap 2: ['A', 'E', 'C', 'D', 'B'], OOB: []
Bootstrap 3: ['E', 'D', 'C', 'B', 'A'], OOB: []
Question 18
Hard
What is the output?
from sklearn.tree import DecisionTreeRegressor
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])

tree = DecisionTreeRegressor(max_depth=1, random_state=42)
tree.fit(X, y)

print(f"Prediction for X=1.5: {tree.predict([[1.5]])[0]}")
print(f"Prediction for X=4.5: {tree.predict([[4.5]])[0]}")
With max_depth=1, the tree makes one split and predicts the mean of each group.
Prediction for X=1.5: 15.0
Prediction for X=4.5: 40.0
Question 19
Easy
What is the output?
import numpy as np

labels = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
counts = np.bincount(labels)
probs = counts / len(labels)
gini = 1 - np.sum(probs ** 2)
print(f"Gini: {gini:.4f}")
7 class 0, 3 class 1. p_0=0.7, p_1=0.3.
Gini: 0.4200
Question 20
Medium
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

np.random.seed(42)
X = np.random.randn(100, 4)
y = (X[:, 0] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X, y)

importances = tree.feature_importances_
most_important = np.argmax(importances)
print(f"Feature importances: {np.round(importances, 3)}")
print(f"Most important feature: {most_important}")
Only feature 0 determines the target. The tree should identify it as most important.
Feature importances: [0.935 0.033 0.016 0.016]
Most important feature: 0
Question 21
Easy
What is the output?
from sklearn.ensemble import RandomForestClassifier
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([0, 0, 1, 1, 1])

rf = RandomForestClassifier(n_estimators=10, random_state=42)
rf.fit(X, y)
print(f"Number of trees: {len(rf.estimators_)}")
print(f"Prediction: {rf.predict([[6, 7]])[0]}")
rf.estimators_ contains the list of fitted trees.
Number of trees: 10
Prediction: 1
Question 22
Hard
What is the output?
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

np.random.seed(42)
X = np.random.randn(200, 3)
y = (X[:, 0] > 0).astype(int)

for min_leaf in [1, 5, 10, 20, 50]:
    tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=42)
    cv = cross_val_score(tree, X, y, cv=5)
    tree.fit(X, y)
    print(f"min_leaf={min_leaf:2d}: CV={cv.mean():.3f}, leaves={tree.get_n_leaves()}")
Larger min_samples_leaf means fewer, larger leaves. This acts like pruning.
min_leaf= 1: CV=0.875, leaves=23
min_leaf= 5: CV=0.890, leaves=11
min_leaf=10: CV=0.890, leaves=7
min_leaf=20: CV=0.880, leaves=5
min_leaf=50: CV=0.855, leaves=3
Question 23
Medium
What is the output?
from sklearn.ensemble import RandomForestClassifier
import numpy as np

np.random.seed(42)
X = np.random.randn(100, 3)
y = np.array([0]*50 + [1]*50)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

proba = rf.predict_proba([[0, 0, 0]])[0]
print(f"P(class 0): {proba[0]:.2f}")
print(f"P(class 1): {proba[1]:.2f}")
print(f"Sum: {sum(proba):.2f}")
Random Forest predict_proba averages probabilities from all trees.
P(class 0): 0.52
P(class 1): 0.48
Sum: 1.00
Question 24
Easy
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X, y)
print(f"Train accuracy: {tree.score(X, y):.2f}")
print(f"Depth: {tree.get_depth()}")
print(f"Leaves: {tree.get_n_leaves()}")
The data is perfectly separable with one split at X=5.5.
Train accuracy: 1.00
Depth: 1
Leaves: 2
Question 25
Hard
What is the output?
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

np.random.seed(42)
X = np.random.randn(300, 10)
y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)

# Compare different numbers of trees
for n_trees in [1, 5, 10, 50, 100]:
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    scores = cross_val_score(rf, X, y, cv=5)
    print(f"n_trees={n_trees:3d}: CV={scores.mean():.3f} +/- {scores.std():.3f}")
More trees generally improve accuracy and reduce variance. Returns diminish after a certain point.
n_trees= 1: CV=0.830 +/- 0.028
n_trees= 5: CV=0.873 +/- 0.025
n_trees= 10: CV=0.890 +/- 0.020
n_trees= 50: CV=0.903 +/- 0.015
n_trees=100: CV=0.907 +/- 0.013
Question 26
Medium
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Can decision trees handle categorical-like features?
X = np.array([[0, 1], [0, 0], [1, 1], [1, 0],
              [0, 1], [0, 0], [1, 1], [1, 0]])
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])  # XOR-like pattern

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X, y)
print(f"Train accuracy: {tree.score(X, y):.2f}")
print(f"Depth: {tree.get_depth()}")
XOR pattern: class 1 when features differ (0,1 or 1,0). Decision trees handle XOR with multiple splits.
Train accuracy: 1.00
Depth: 2
Question 27
Hard
What is the output?
import numpy as np

# Entropy for different class distributions
def entropy(probs):
    probs = np.array([p for p in probs if p > 0])
    return -np.sum(probs * np.log2(probs))

distributions = [
    [1.0, 0.0],           # Pure
    [0.5, 0.5],           # Maximum entropy (binary)
    [0.9, 0.1],           # Mostly one class
    [0.33, 0.33, 0.34],   # 3 classes, roughly equal
]

for dist in distributions:
    e = entropy(dist)
    print(f"Distribution {dist}: Entropy = {e:.4f}")
Pure distribution has entropy 0. Equal distribution has maximum entropy.
Distribution [1.0, 0.0]: Entropy = 0.0000
Distribution [0.5, 0.5]: Entropy = 1.0000
Distribution [0.9, 0.1]: Entropy = 0.4690
Distribution [0.33, 0.33, 0.34]: Entropy = 1.5848
Question 28
Medium
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

X = np.array([[1, 100], [2, 200], [3, 300],
              [10, 1], [20, 2], [30, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

tree = DecisionTreeClassifier(max_depth=1, random_state=42)
tree.fit(X, y)
print(f"Feature used for first split: Feature {tree.tree_.feature[0]}")
print(f"Accuracy: {tree.score(X, y):.2f}")
Decision trees do not need feature scaling. Either feature can perfectly separate the classes.
Feature used for first split: Feature 0
Accuracy: 1.00

Mixed & Application Questions

Question 1
Easy
What is the intuition behind a decision tree? How does it make predictions?
Think of a flowchart with yes/no questions.
A decision tree is like a flowchart. At each node, it asks a yes/no question about a feature (e.g., "Is income > 50,000?"). Based on the answer, you follow the left or right branch. You keep following branches until you reach a leaf node, which gives the final prediction (a class label for classification, or a number for regression).
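The flowchart can be written directly as nested if/else rules; the feature names and thresholds below are invented purely for illustration:

```python
def predict_loan(income, credit_score):
    """A hand-written 'decision tree': each question routes the sample
    down a branch until a leaf returns the prediction."""
    if income > 50000:            # root node question
        if credit_score > 650:    # internal node question
            return "approve"      # leaf
        return "review"           # leaf
    return "reject"               # leaf

print(predict_loan(60000, 700))  # income > 50000 and credit > 650 -> approve
print(predict_loan(30000, 800))  # fails the root question -> reject
```

A fitted DecisionTreeClassifier is exactly this structure, with the questions and thresholds learned from data instead of hand-picked.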
Question 2
Easy
What is the difference between Gini impurity and entropy?
Both measure class mixing, but use different formulas.
Both measure how mixed the classes are in a node. Gini impurity = 1 - sum(p_i^2), ranges from 0 to 0.5 (binary). Entropy = -sum(p_i * log2(p_i)), ranges from 0 to 1 (binary). Both give 0 for pure nodes. In practice, they produce very similar trees. Gini is slightly faster to compute (no logarithm).
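The two measures can be compared side by side for a binary node; both are 0 at purity and peak at p = 0.5 (the p values below are just sample points):

```python
import numpy as np

# Gini vs entropy for a binary node as p = P(class 1) varies.
for p in [0.0, 0.1, 0.3, 0.5]:
    gini = 1 - (p**2 + (1 - p)**2)
    probs = np.array([q for q in (p, 1 - p) if q > 0])  # skip log(0)
    ent = -np.sum(probs * np.log2(probs))
    print(f"p={p:.1f}: Gini={gini:.3f}  Entropy={ent:.3f}")
```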
Question 3
Easy
Why does a decision tree not need feature scaling?
Think about how a tree makes split decisions.
A decision tree splits by comparing feature values to thresholds (e.g., "Is feature > 5.3?"). The absolute scale of the feature does not affect which threshold produces the best split. Whether income is in rupees (50000) or in lakhs (0.5), the tree finds the same optimal split point. Only the relative ordering of values matters.
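The scale invariance can be verified directly: multiplying a feature by 1000 moves the learned threshold but not the predictions (a minimal sketch with toy data):

```python
from sklearn.tree import DecisionTreeClassifier
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X * 1000, y)  # rescaled feature

# Predictions agree once the test points are rescaled the same way.
X_new = np.array([[1.5], [3.5]])
print(tree_raw.predict(X_new))
print(tree_scaled.predict(X_new * 1000))
```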
Question 4
Medium
What is the output?
from sklearn.ensemble import RandomForestClassifier
import numpy as np

np.random.seed(42)
X = np.random.randn(100, 5)
y = (X[:, 0] > 0).astype(int)

# max_features controls feature randomness
for mf in [1, 2, 'sqrt', None]:
    rf = RandomForestClassifier(n_estimators=50, max_features=mf, random_state=42)
    rf.fit(X[:80], y[:80])
    acc = rf.score(X[80:], y[80:])
    print(f"max_features={str(mf):4s}: test_acc={acc:.3f}")
max_features controls how many features each tree considers at each split. 'sqrt' means sqrt(n_features).
max_features=1 : test_acc=0.950
max_features=2 : test_acc=0.950
max_features=sqrt: test_acc=0.950
max_features=None: test_acc=0.900
Question 5
Medium
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

# XOR problem: not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X, y)
print(f"Training accuracy: {tree.score(X, y):.2f}")
print(f"Predictions: {tree.predict(X).tolist()}")
print(f"Tree depth: {tree.get_depth()}")
XOR is not linearly separable, but a tree can handle it with multiple splits.
Training accuracy: 1.00
Predictions: [0, 1, 1, 0]
Tree depth: 2
Question 6
Medium
Explain bagging (bootstrap aggregating) and why it reduces overfitting.
Think about what happens when you average noisy estimates.
Bagging trains multiple models on different bootstrap samples (random samples with replacement from training data) and averages their predictions. Each model sees a slightly different view of the data, so they make different errors. When averaged, the individual errors cancel out, reducing variance (the tendency to overfit). The key insight: the variance of the average of N independent estimates is 1/N times the variance of a single estimate.
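The 1/N variance claim is easy to check numerically; here Gaussian noise stands in for the errors of the individual models (illustrative values, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)

# One noisy estimate vs. the average of 25 independent estimates.
single = rng.normal(0, 1, size=10000)
averaged = rng.normal(0, 1, size=(10000, 25)).mean(axis=1)

print(f"Var(single estimate): {single.var():.3f}")    # close to 1
print(f"Var(average of 25):   {averaged.var():.3f}")  # close to 1/25 = 0.04
```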
Question 7
Hard
What is the output?
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
import numpy as np

np.random.seed(42)
# Feature 0: signal, Feature 1: noise, Feature 2: copy of feature 0
X_train = np.random.randn(200, 2)
X_train = np.column_stack([X_train, X_train[:, 0]])  # Feature 2 = copy of 0
y_train = (X_train[:, 0] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print("Default feature importance:")
for i, imp in enumerate(rf.feature_importances_):
    print(f"  Feature {i}: {imp:.4f}")

print(f"\nNote: Feature 0 and Feature 2 share importance (both carry same signal)")
Feature 2 is a perfect copy of Feature 0. The importance is split between correlated features.
Default feature importance:
Feature 0: 0.3812
Feature 1: 0.0834
Feature 2: 0.5354
Note: Feature 0 and Feature 2 share importance (both carry same signal)
Question 8
Hard
Why does Random Forest use both bootstrap sampling AND random feature subsets? Why not just one?
Think about what happens if a single feature is very dominant.
Bootstrap sampling alone (bagging) creates some diversity, but if one feature is much stronger than others, all trees will still use that feature for the first split, making the trees highly correlated. Feature randomness forces trees to sometimes ignore the dominant feature and find patterns in other features. This de-correlates the trees, making the averaging more effective. Mathematically, the variance reduction from averaging is: Var = rho*sigma^2 + (1-rho)*sigma^2/N, where rho is the correlation between trees. Lower correlation (from feature randomness) gives lower variance.
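One way to see the de-correlation is to count how often the dominant feature wins the root split with and without feature subsampling; the setup below is a small illustrative experiment, not part of the chapter:

```python
from sklearn.ensemble import RandomForestClassifier
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(300, 10)
y = (X[:, 0] > 0).astype(int)  # feature 0 dominates; features 1-9 are noise

results = {}
for mf, label in [(None, "bagging only (all features)"),
                  ("sqrt", "with feature randomness    ")]:
    rf = RandomForestClassifier(n_estimators=50, max_features=mf,
                                random_state=0).fit(X, y)
    # tree_.feature[0] is the feature index used at each tree's root split
    results[mf] = sum(t.tree_.feature[0] == 0 for t in rf.estimators_)
    print(f"{label}: {results[mf]}/50 root splits use feature 0")
```

With all features available, essentially every tree roots on the dominant feature; with max_features='sqrt', many trees never see it at the root, so the trees are forced apart.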
Question 9
Hard
What is the output?
from sklearn.tree import DecisionTreeClassifier
import numpy as np

np.random.seed(42)
X = np.random.randn(200, 3)
y = (X[:, 0]**2 + X[:, 1]**2 < 1.5).astype(int)  # Circular boundary

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

results = {}
for depth in [1, 3, 5, 8, None]:
    for min_leaf in [1, 5, 10]:
        tree = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=min_leaf, random_state=42)
        tree.fit(X_train, y_train)
        results[(depth, min_leaf)] = tree.score(X_test, y_test)

# Find best combination
best = max(results, key=results.get)
print(f"Best: depth={best[0]}, min_leaf={best[1]}, accuracy={results[best]:.3f}")
print(f"Worst: depth=None, min_leaf=1, accuracy={results[(None, 1)]:.3f}")
The unconstrained tree (depth=None, min_leaf=1) tends to overfit. A moderate combination performs better.
Best: depth=5, min_leaf=5, accuracy=0.880
Worst: depth=None, min_leaf=1, accuracy=0.820
Question 10
Medium
What is the out-of-bag (OOB) score in Random Forest, and why is it useful?
Each tree does not see all training samples due to bootstrap sampling.
In bagging, each tree is trained on a bootstrap sample (about 63% of training data). The remaining 37% (out-of-bag samples) were not used to train that tree. The OOB score evaluates each tree on its OOB samples and averages the results. It provides a validation score without needing a separate test set, similar to cross-validation but for free.
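A small experiment (illustrative, not from the chapter) showing the OOB score tracking a held-out estimate:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(600, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_tr, y_tr)

# The OOB estimate uses only training data, yet tracks test accuracy.
print(f"OOB score:      {rf.oob_score_:.3f}")
print(f"Held-out score: {rf.score(X_te, y_te):.3f}")
```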

Multiple Choice Questions

MCQ 1
What is the Gini impurity of a perfectly pure node (all samples belong to one class)?
  • A. 0.0
  • B. 0.5
  • C. 1.0
  • D. It depends on the number of samples
Answer: A
A is correct. Gini = 1 - sum(p_i^2). For a pure node, one class has p=1.0 and all others have p=0. Gini = 1 - 1^2 = 0. A pure node has zero impurity.
MCQ 2
What does a leaf node in a decision tree represent?
  • A. A feature to split on
  • B. A threshold value
  • C. A final prediction (class label or value)
  • D. The training accuracy
Answer: C
C is correct. Leaf nodes are the terminal nodes of the tree where no more splitting occurs. They contain the final prediction: a class label (majority class) for classification or a mean value for regression.
MCQ 3
What is the default splitting criterion for DecisionTreeClassifier in scikit-learn?
  • A. Entropy
  • B. Gini impurity
  • C. Mean Squared Error
  • D. Log loss
Answer: B
B is correct. Scikit-learn's DecisionTreeClassifier uses Gini impurity by default (criterion='gini'). You can change it to entropy with criterion='entropy'. MSE is used for DecisionTreeRegressor.
MCQ 4
What does max_depth control in a decision tree?
  • A. The number of features used
  • B. The maximum number of splits from root to leaf
  • C. The number of training samples
  • D. The learning rate
Answer: B
B is correct. max_depth limits how many levels deep the tree can grow. A tree with max_depth=3 can have at most 3 levels of splits. This is the most common way to prevent overfitting in decision trees.
MCQ 5
How does a Random Forest make a classification prediction?
  • A. Uses the prediction of the deepest tree
  • B. Uses the prediction of the first tree
  • C. Majority vote among all trees
  • D. Averages the Gini impurity of all trees
Answer: C
C is correct. Each tree in the forest makes a prediction. The final prediction is the class that gets the most votes (majority voting). For regression, the predictions are averaged instead.
MCQ 6
What is bootstrap sampling in the context of Random Forests?
  • A. Selecting features without replacement
  • B. Sampling training data with replacement
  • C. Splitting data into train and test
  • D. Removing outliers from the dataset
Answer: B
B is correct. Bootstrap sampling draws N samples from the training set WITH replacement. Some samples appear multiple times, others are left out entirely (out-of-bag samples). Each tree trains on a different bootstrap sample, creating diversity among trees.
MCQ 7
Why does an unrestricted decision tree achieve 100% training accuracy?
  • A. Because decision trees are always perfect
  • B. Because it creates a leaf for every unique training sample
  • C. Because it uses cross-validation internally
  • D. Because it removes noisy data points
Answer: B
B is correct. Without depth or sample limits, the tree keeps splitting until every leaf is pure (contains only one class). This means it memorizes every training sample, achieving 100% training accuracy but likely poor test accuracy (overfitting).
MCQ 8
In Random Forest, what does the max_features parameter control?
  • A. Total features in the dataset
  • B. Number of features each tree uses overall
  • C. Number of features considered at each split
  • D. Number of features in the output
Answer: C
C is correct. At each split, the tree only considers max_features randomly selected features (default: sqrt(n_features) for classification). This creates diversity among trees, as different trees consider different features at each split.
MCQ 9
What happens when you increase n_estimators (number of trees) in Random Forest?
  • A. Training accuracy decreases
  • B. Test accuracy always increases
  • C. Predictions become more stable, training time increases
  • D. The model always overfits more
Answer: C
C is correct. More trees make the ensemble more stable (lower variance). Training time increases linearly. However, test accuracy plateaus after a certain number of trees -- adding more trees beyond this point wastes computation without improving accuracy.
MCQ 10
Feature importance in Random Forest measures what?
  • A. How many times a feature appears in the dataset
  • B. The correlation between a feature and the target
  • C. How much a feature reduces impurity across all trees
  • D. The p-value of each feature
Answer: C
C is correct. Feature importance is the total reduction in impurity (Gini or entropy) brought by that feature, averaged across all trees. Features that produce large impurity reductions at early splits get higher importance scores.
MCQ 11
Approximately what fraction of training samples are left out (out-of-bag) in each bootstrap sample?
  • A. About 10%
  • B. About 25%
  • C. About 37%
  • D. About 50%
Answer: C
C is correct. The probability of a sample NOT being selected in N draws is (1-1/N)^N, which approaches 1/e = 0.368 as N grows large. So about 36.8% of samples are out-of-bag for each tree.
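The limit in the explanation can be tabulated directly:

```python
import math

# P(a given sample is never drawn in N draws with replacement) = (1 - 1/N)^N
for n in [10, 100, 1000, 100000]:
    print(f"N={n:6d}: (1 - 1/N)^N = {(1 - 1/n)**n:.4f}")
print(f"1/e       = {1/math.e:.4f}")
```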
MCQ 12
What is the key difference between bagging and boosting?
  • A. Bagging trains trees in parallel on bootstrap samples; boosting trains trees sequentially, each correcting the previous tree's errors
  • B. Bagging uses decision trees; boosting uses logistic regression
  • C. Bagging is for classification; boosting is for regression
  • D. Bagging needs more data than boosting
Answer: A
A is correct. Bagging (used by Random Forest) trains independent trees in parallel on different bootstrap samples. Boosting (used by XGBoost, AdaBoost) trains trees sequentially, with each new tree focusing on the mistakes of the previous trees. Bagging reduces variance; boosting reduces both bias and variance.
MCQ 13
Why might feature importance from Random Forest be misleading for correlated features?
  • A. Correlated features always have zero importance
  • B. Importance is split among correlated features, making each appear less important than it actually is
  • C. Random Forest cannot handle correlated features
  • D. Correlated features always get maximum importance
Answer: B
B is correct. When two features are highly correlated, the tree randomly uses one or the other for splits. The importance gets divided between them, making each appear less important individually than the single uncorrelated version would be. Permutation importance can help address this issue.
MCQ 14
For a dataset with 20 features, what is the default max_features for RandomForestClassifier?
  • A. 20 (all features)
  • B. 10 (half)
  • C. 4 or 5 (approximately sqrt(20))
  • D. 1
Answer: C
C is correct. The default max_features for RandomForestClassifier is 'sqrt', meaning sqrt(n_features). sqrt(20) is approximately 4.47, so the tree considers about 4-5 features at each split. For RandomForestRegressor, the default is n_features (all features).
MCQ 15
A decision tree with max_depth=d can have at most how many leaf nodes?
  • A. d
  • B. 2d
  • C. d^2
  • D. 2^d
Answer: D
D is correct. Each split produces at most 2 children, so the maximum number of nodes doubles at every level; after d levels there are at most 2^d leaves. A tree with max_depth=3 can have up to 2^3 = 8 leaves, and one with max_depth=10 up to 1024.
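The 2^d bound can be checked empirically; purely random labels are used below so the tree grows as large as the depth limit allows (a sketch, not from the chapter):

```python
from sklearn.tree import DecisionTreeClassifier
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(500, 5)
y = rng.randint(0, 2, size=500)  # noise labels: no early stopping from pure nodes

leaf_counts = []
for d in [1, 2, 3, 4, 5]:
    tree = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X, y)
    leaf_counts.append((d, tree.get_n_leaves()))
    print(f"max_depth={d}: leaves={tree.get_n_leaves():2d} (bound 2^{d} = {2**d})")
```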
MCQ 16
Which of the following is an advantage of decision trees over logistic regression?
  • A. Decision trees always have higher accuracy
  • B. Decision trees can capture non-linear relationships
  • C. Decision trees need less training data
  • D. Decision trees never overfit
Answer: B
B is correct. Logistic regression creates linear decision boundaries, while decision trees can capture non-linear patterns through multiple splits. However, trees can overfit (D is wrong), do not always have higher accuracy (A is wrong), and may need more data to generalize well (C is wrong).
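The contrast can be made concrete on a circular boundary, where no straight line separates the classes well (an illustrative sketch, not from the chapter):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(400, 2)
y = (X[:, 0]**2 + X[:, 1]**2 < 1.0).astype(int)  # circular decision boundary

X_tr, y_tr = X[:300], y[:300]
X_te, y_te = X[300:], y[300:]

logreg = LogisticRegression().fit(X_tr, y_tr)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

print(f"Logistic regression test acc: {logreg.score(X_te, y_te):.3f}")
print(f"Decision tree test acc:       {tree.score(X_te, y_te):.3f}")
```

The linear model can do little better than predicting the majority class, while the tree approximates the circle with axis-aligned splits.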

Coding Challenges

Coding challenges coming soon.
