Practice Questions — KNN and Naive Bayes Classifiers
Topic-Specific Questions
Question 1
Easy
What is the output of the following code?
import numpy as np
# Euclidean distance
a = np.array([1, 2])
b = np.array([4, 6])
dist = np.sqrt(np.sum((a - b) ** 2))
print(f"Distance: {dist:.1f}")

Euclidean distance: sqrt((4-1)^2 + (6-2)^2) = sqrt(9 + 16) = 5.

Distance: 5.0

Question 2
Easy
What is the output?
import numpy as np
# Manhattan distance
a = np.array([1, 2])
b = np.array([4, 6])
dist = np.sum(np.abs(a - b))
print(f"Manhattan: {dist}")

Manhattan distance: |4-1| + |6-2| = 3 + 4 = 7.

Manhattan: 7

Question 3
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1], [2], [3], [7], [8], [9]])
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[5]]))
print(knn.predict([[2]]))

For x=5, the two nearest are 3 (class 0) and 7 (class 1); the third is a tie between 2 and 8 (both at distance 3), and here 8 is selected, giving votes 0, 1, 1. For x=2, the 3 nearest are 1, 2, 3, all class 0.

[1]
[0]

Question 4
Easy
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
X = np.array([[1], [2], [3], [7], [8], [9]])
y = np.array([0, 0, 0, 1, 1, 1])
nb = GaussianNB()
nb.fit(X, y)
print(f"Class means: {nb.theta_.flatten()}")
print(f"Prediction for [5]: {nb.predict([[5]])[0]}")

GaussianNB estimates a mean and variance per class and feature: class 0 mean is 2, class 1 mean is 8, with equal variances. The point 5 is exactly midway, so the posteriors tie and argmax returns the first class, 0.

Class means: [2. 8.]
Prediction for [5]: 0

Question 5
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
print(knn.predict([[5, 5]]))
print(knn.predict([[0.5, 0.5]]))

K=1 predicts the class of the single nearest neighbor. From [5, 5], the closest points are (1, 0) and (0, 1) at distance ≈ 6.40, nearer than (10, 10) at ≈ 7.07; [0.5, 0.5] sits inside the class-0 cluster.

[0]
[0]

Question 6
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
for k in [1, 3, 5, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    pred = knn.predict([[5.5]])[0]
    print(f"K={k}: predict([5.5]) = {pred}")

5.5 sits exactly on the class boundary, and several neighbor distances tie, so the small-K results depend on tie-breaking. As K grows, points from both sides are included; at K=9 class 0 holds the majority.

K=1: predict([5.5]) = 1
K=3: predict([5.5]) = 1
K=5: predict([5.5]) = 1
K=9: predict([5.5]) = 0

Question 7
Medium
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
X = np.array([[1, 10], [2, 20], [3, 30],
[10, 1], [20, 2], [30, 3]])
y = np.array([0, 0, 0, 1, 1, 1])
nb = GaussianNB()
nb.fit(X, y)
proba = nb.predict_proba([[5, 5]])[0]
print(f"P(class 0): {proba[0]:.3f}")
print(f"P(class 1): {proba[1]:.3f}")
print(f"Prediction: {nb.predict([[5, 5]])[0]}")

Class 0 has high values in feature 2, class 1 has high values in feature 1, and the layout is symmetric, so [5, 5] gets equal posteriors; the 0.50/0.50 tie resolves to class 0.

P(class 0): 0.500
P(class 1): 0.500
Prediction: 0

Question 8
Medium
What is the output?
# Bayes theorem calculation
P_spam = 0.3 # 30% of emails are spam
P_not_spam = 0.7 # 70% are not spam
# P("free" | spam) = 0.8
# P("free" | not spam) = 0.1
P_free_given_spam = 0.8
P_free_given_not_spam = 0.1
# P(spam | "free") = ?
P_free = P_free_given_spam * P_spam + P_free_given_not_spam * P_not_spam
P_spam_given_free = (P_free_given_spam * P_spam) / P_free
print(f"P(free): {P_free:.3f}")
print(f"P(spam | free): {P_spam_given_free:.3f}")

Apply Bayes' theorem: P(spam|free) = P(free|spam) * P(spam) / P(free) = 0.24 / 0.31 ≈ 0.774.

P(free): 0.310
P(spam | free): 0.774

Question 9
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
# Feature 1: age (20-60), Feature 2: salary (20000-200000)
X = np.array([[25, 30000], [30, 40000], [55, 180000]])
y = np.array([0, 0, 1])
# Without scaling
knn1 = KNeighborsClassifier(n_neighbors=1)
knn1.fit(X, y)
pred1 = knn1.predict([[40, 100000]])[0]
# With scaling
scaler = StandardScaler()
X_s = scaler.fit_transform(X)
test_s = scaler.transform([[40, 100000]])
knn2 = KNeighborsClassifier(n_neighbors=1)
knn2.fit(X_s, y)
pred2 = knn2.predict(test_s)[0]
print(f"Without scaling: {pred1}")
print(f"With scaling: {pred2}")

Without scaling, salary (range ~150,000) dominates the distance and age is effectively ignored; after standardization both features contribute and the nearest neighbor changes.

Without scaling: 0
With scaling: 1

Question 10
Hard
What is the output?
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
texts = ["free money", "free gift", "free prize",
"work meeting", "project deadline", "team lunch"]
labels = [1, 1, 1, 0, 0, 0] # 1=spam
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
print(f"Vocabulary: {sorted(vectorizer.vocabulary_.keys())}")
print(f"Feature matrix shape: {X.shape}")
mnb = MultinomialNB(alpha=1.0)
mnb.fit(X, labels)
test_emails = ["free lunch", "free money gift"]
X_test = vectorizer.transform(test_emails)
for email, pred in zip(test_emails, mnb.predict(X_test)):
    print(f"\"{email}\" -> {'Spam' if pred == 1 else 'Not Spam'}")

"free" appears only in spam emails; "lunch" appears only in non-spam, so "free lunch" carries conflicting signals (here spam wins).

Vocabulary: ['deadline', 'free', 'gift', 'lunch', 'meeting', 'money', 'prize', 'project', 'team', 'work']
Feature matrix shape: (6, 10)
"free lunch" -> Spam
"free money gift" -> Spam

Question 11
Hard
What is the output?
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int) # Circular boundary
results = {}
for k in [1, 5, 15, 50, 100]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)
    results[k] = scores.mean()
    print(f"K={k:3d}: CV accuracy = {scores.mean():.3f}")
best_k = max(results, key=results.get)
print(f"\nBest K: {best_k}")

The circular boundary is non-linear. Small K captures it but can overfit the noise; very large K underfits (K=100 is half the dataset).

K=  1: CV accuracy = 0.925
K=  5: CV accuracy = 0.950
K= 15: CV accuracy = 0.935
K= 50: CV accuracy = 0.890
K=100: CV accuracy = 0.760

Best K: 5

Question 12
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
np.random.seed(42)
# Generate data where NB assumption is violated
# Features are highly correlated (x2 = x1 + noise)
X_train = np.random.randn(100, 1)
X_train = np.column_stack([X_train, X_train + np.random.normal(0, 0.1, (100, 1))])
y_train = (X_train[:, 0] > 0).astype(int)
X_test = np.random.randn(50, 1)
X_test = np.column_stack([X_test, X_test + np.random.normal(0, 0.1, (50, 1))])
y_test = (X_test[:, 0] > 0).astype(int)
nb = GaussianNB()
nb.fit(X_train, y_train)
print(f"Training accuracy: {nb.score(X_train, y_train):.3f}")
print(f"Test accuracy: {nb.score(X_test, y_test):.3f}")
print(f"NB still works despite correlated features!")

The two features are nearly identical (highly correlated), which violates the independence assumption, yet the class-conditional means still separate the classes cleanly.

Training accuracy: 0.990
Test accuracy: 0.980
NB still works despite correlated features!

Question 13
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]])
y = np.array([0, 0, 0, 1, 1, 1])
knn_uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform')
knn_distance = KNeighborsClassifier(n_neighbors=3, weights='distance')
knn_uniform.fit(X, y)
knn_distance.fit(X, y)
print(f"Uniform weights: {knn_uniform.predict([[4, 4]])[0]}")
print(f"Distance weights: {knn_distance.predict([[4, 4]])[0]}")

With uniform weights, all 3 neighbors vote equally; with distance weights, closer neighbors count more. Either way the two class-0 neighbors (3,3) and (2,2) outvote the third neighbor, so both predict 0.

Uniform weights: 0
Distance weights: 0

Question 14
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
np.random.seed(42)
# 3-class problem
X = np.vstack([
np.random.randn(30, 2) + [0, 0],
np.random.randn(30, 2) + [5, 0],
np.random.randn(30, 2) + [2.5, 4]
])
y = np.array([0]*30 + [1]*30 + [2]*30)
nb = GaussianNB()
nb.fit(X, y)
# Predict probabilities for center point
proba = nb.predict_proba([[2.5, 1.5]])[0]
for i, p in enumerate(proba):
    print(f"P(class {i}): {p:.3f}")
print(f"Prediction: {nb.predict([[2.5, 1.5]])[0]}")

The point [2.5, 1.5] is roughly equidistant from the three class centers, so the posteriors are close; the exact values depend on the sampled means and variances.

P(class 0): 0.382
P(class 1): 0.291
P(class 2): 0.327
Prediction: 0

Question 15
Easy
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
X = np.array([[1], [2], [3], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])
nb = GaussianNB()
nb.fit(X, y)
print(f"Class priors: {nb.class_prior_}")
print(f"Class 0 mean: {nb.theta_[0][0]:.1f}")
print(f"Class 1 mean: {nb.theta_[1][0]:.1f}")

Equal numbers of samples per class, so the priors are 0.5 each. Class means: (1+2+3)/3 = 2 and (8+9+10)/3 = 9.

Class priors: [0.5 0.5]
Class 0 mean: 2.0
Class 1 mean: 9.0

Question 16
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[0], [1], [2], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[5]]))
print(knn.predict([[8]]))

For x=5, the two nearest are 2 and 1 (both class 0); the third is a tie between 0 and 10, but class 0 already holds the majority either way. For x=8, the nearest are 10, 11, 12, all class 1.

[0]
[1]

Question 17
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
[5, 5], [6, 5], [5, 6], [6, 6]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
for k in [1, 3, 7]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    pred = knn.predict([[3, 3]])[0]
    print(f"K={k}: predict([3,3])={pred}")

[3, 3] lies between the two clusters, with (1, 1) and (5, 5) equally close, so the small-K results depend on tie-breaking; at K=7 all four class-0 points are included and class 0 wins outright.

K=1: predict([3,3])=0
K=3: predict([3,3])=0
K=7: predict([3,3])=0

Question 18
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(300, 20) # 20 features
y = (X[:, 0] > 0).astype(int)
knn = KNeighborsClassifier(n_neighbors=5)
nb = GaussianNB()
knn_cv = cross_val_score(knn, X, y, cv=5).mean()
nb_cv = cross_val_score(nb, X, y, cv=5).mean()
print(f"KNN (20D): {knn_cv:.3f}")
print(f"NB (20D): {nb_cv:.3f}")
print(f"Winner: {'NB' if nb_cv > knn_cv else 'KNN'}")

Only feature 0 matters among the 20; the 19 irrelevant features swamp KNN's distances (curse of dimensionality), while NB simply learns near-identical distributions for them.

KNN (20D): 0.750
NB (20D): 0.903
Winner: NB

Question 19
Easy
What is the output?
import numpy as np
# Chebyshev distance
a = np.array([1, 2, 3])
b = np.array([4, 8, 5])
chebyshev = np.max(np.abs(a - b))
print(f"Chebyshev distance: {chebyshev}")

Chebyshev distance is the maximum absolute difference along any dimension: max(|1-4|, |2-8|, |3-5|) = max(3, 6, 2) = 6.

Chebyshev distance: 6

Question 20
Medium
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
# Imbalanced classes: 90% class 0, 10% class 1
X = np.vstack([np.random.RandomState(42).randn(90, 2),
np.random.RandomState(42).randn(10, 2) + 3])
y = np.array([0]*90 + [1]*10)
nb = GaussianNB()
nb.fit(X, y)
print(f"Class priors: {np.round(nb.class_prior_, 2)}")
print(f"Prior for class 0: {nb.class_prior_[0]:.2f}")
print(f"Prior for class 1: {nb.class_prior_[1]:.2f}")

Class priors are estimated from the training-data class frequencies: 90/100 and 10/100.

Class priors: [0.9 0.1]
Prior for class 0: 0.90
Prior for class 1: 0.10

Question 21
Hard
What is the output?
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
import time
np.random.seed(42)
X_train = np.random.randn(10000, 10)
y_train = np.random.choice([0, 1], 10000)
X_test = np.random.randn(100, 10)
# Brute force vs KD-tree
for algo in ['brute', 'kd_tree', 'ball_tree']:
    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algo)
    knn.fit(X_train, y_train)
    start = time.time()
    knn.predict(X_test)
    t = time.time() - start
    print(f"{algo:10s}: predict time = {t:.4f}s")

KD-tree and ball-tree are spatial data structures that speed up neighbor search. Absolute timings vary by machine; representative values:

brute     : predict time = 0.0312s
kd_tree   : predict time = 0.0156s
ball_tree : predict time = 0.0178s

Question 22
Medium
What is the output?
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
texts = ["good good good", "good bad", "bad bad bad"]
y = [1, 0, 0]
vec = CountVectorizer()
X = vec.fit_transform(texts)
nb = MultinomialNB(alpha=1.0)
nb.fit(X, y)
test = vec.transform(["good"])
proba = nb.predict_proba(test)[0]
print(f"P(class 0): {proba[0]:.3f}")
print(f"P(class 1): {proba[1]:.3f}")
print(f"Prediction: {nb.predict(test)[0]}")

"good" appears mostly in class 1, but class 0 has more training samples (prior 2/3 vs 1/3). With alpha=1 and a 2-word vocabulary: P(good|1) = (3+1)/(3+2) = 4/5 and P(good|0) = (1+1)/(5+2) = 2/7, so the posteriors are proportional to (1/3)(4/5) vs (2/3)(2/7), which normalize to about 0.583 vs 0.417.

P(class 0): 0.417
P(class 1): 0.583
Prediction: 1

Question 23
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])
# Wait - this is regression data! Can KNeighborsClassifier handle it?
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
print(knn.predict([[2.5]]))
print(knn.classes_)

KNeighborsClassifier treats the y values as class labels, even if they look continuous. For x=2.5, points 2 and 3 tie at distance 0.5; the label 20 is returned here. classes_ lists the five distinct "classes" in sorted order.

[20]
[10 20 30 40 50]

Question 24
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
np.random.seed(42)
# Feature 1: strongly predictive
# Feature 2: pure noise
X_signal = np.vstack([np.random.randn(50, 1) - 2,
np.random.randn(50, 1) + 2])
X_noise = np.random.randn(100, 1) * 10 # Large noise
X = np.column_stack([X_signal, X_noise])
y = np.array([0]*50 + [1]*50)
nb = GaussianNB()
nb.fit(X, y)
print(f"Class 0 means: [{nb.theta_[0][0]:.2f}, {nb.theta_[0][1]:.2f}]")
print(f"Class 1 means: [{nb.theta_[1][0]:.2f}, {nb.theta_[1][1]:.2f}]")
print(f"Class 0 var: [{nb.var_[0][0]:.2f}, {nb.var_[0][1]:.2f}]")
print(f"Class 1 var: [{nb.var_[1][0]:.2f}, {nb.var_[1][1]:.2f}]")
print(f"Accuracy: {nb.score(X, y):.3f}")

The signal feature has clearly different class means; the noise feature has similar means but huge variance, so its likelihood contribution is nearly flat and it barely affects the prediction.

Class 0 means: [-2.01, 0.87]
Class 1 means: [1.87, -1.23]
Class 0 var: [0.84, 93.45]
Class 1 var: [1.12, 108.67]
Accuracy: 0.980

Mixed & Application Questions
Question 1
Easy
Why is KNN called a "lazy learner"?
Think about what happens during the training phase.
KNN is called a "lazy learner" because it does no work during training. The fit() method simply stores the training data. All computation (calculating distances, finding neighbors, voting) happens at prediction time. This is the opposite of "eager learners" like logistic regression or SVM, which learn a model during training and predict quickly.
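The point can be made concrete with a minimal from-scratch sketch (a simplified stand-in for scikit-learn's implementation, assuming Euclidean distance and majority voting): fit() merely stores the arrays, and every distance computation happens in predict().

```python
import numpy as np
from collections import Counter

class LazyKNN:
    """Minimal KNN sketch: fit() stores data, predict() does all the work."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorization -- no parameters are learned.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # All computation happens here, at prediction time.
            dists = np.sqrt(((self.X - x) ** 2).sum(axis=1))
            nearest = np.argsort(dists)[: self.k]
            preds.append(Counter(self.y[nearest]).most_common(1)[0][0])
        return np.array(preds)

knn = LazyKNN(k=3).fit([[1], [2], [3], [7], [8], [9]], [0, 0, 0, 1, 1, 1])
print(knn.predict([[2], [8]]))  # [0 1]
```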
Question 2
Easy
Why is Naive Bayes called "naive"?
It makes a simplifying assumption about features.
Naive Bayes is called "naive" because it assumes that all features are independent of each other given the class label. In reality, features are often correlated (e.g., a person's height and weight are correlated). Despite this unrealistic assumption, Naive Bayes works surprisingly well in practice, especially for text classification.
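A tiny numeric sketch of the factorization, using hypothetical per-word likelihoods and priors (all numbers invented for illustration): the joint likelihood P(x1, x2 | class) is approximated as the product P(x1 | class) * P(x2 | class).

```python
# Hypothetical per-feature likelihoods for a two-class spam problem:
p_x1 = {"spam": 0.8, "ham": 0.1}   # P(word "free" present | class)
p_x2 = {"spam": 0.6, "ham": 0.2}   # P(word "win" present | class)
prior = {"spam": 0.3, "ham": 0.7}

# Naive assumption: multiply per-feature likelihoods, then apply Bayes' rule.
score = {c: prior[c] * p_x1[c] * p_x2[c] for c in prior}
total = sum(score.values())
post = {c: s / total for c, s in score.items()}
print({c: round(p, 3) for c, p in post.items()})  # {'spam': 0.911, 'ham': 0.089}
```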
Question 3
Medium
Explain the curse of dimensionality and how it affects KNN.
Think about what happens to distances as the number of dimensions increases.
In high-dimensional spaces, all points become approximately equidistant from each other. The ratio of the distance to the nearest neighbor vs the farthest neighbor approaches 1. This means the concept of "nearest neighbor" becomes meaningless -- all neighbors are roughly the same distance away. KNN relies on meaningful distances, so it degrades in high dimensions. Solutions: reduce dimensionality with PCA, use feature selection, or switch to algorithms that handle high dimensions better (like Naive Bayes or linear models).
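The distance-concentration effect is easy to verify empirically; a sketch with uniform random points (exact ratios depend on the seed):

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = {}
for d in [2, 10, 100, 1000]:
    X = rng.random((500, d))   # 500 random points in the d-dimensional unit cube
    q = rng.random(d)          # a random query point
    dists = np.sqrt(((X - q) ** 2).sum(axis=1))
    ratios[d] = dists.min() / dists.max()
    # As d grows, the nearest/farthest ratio climbs toward 1:
    print(f"d={d:4d}: nearest/farthest distance ratio = {ratios[d]:.3f}")
```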
Question 4
Medium
What is the output?
from sklearn.neighbors import KNeighborsRegressor
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X, y)
print(f"Predict [3]: {knn.predict([[3]])[0]:.1f}")
print(f"Predict [6]: {knn.predict([[6]])[0]:.1f}")

KNN regression predicts the average of the K nearest neighbors' target values. For x=3 the neighbors are 2, 3, 4 (mean of 20, 30, 40 = 30); for x=6 they are 3, 4, 5 (mean of 30, 40, 50 = 40).

Predict [3]: 30.0
Predict [6]: 40.0

Question 5
Medium
When would you choose Naive Bayes over KNN, and vice versa?
Consider data type, dataset size, dimensionality, and speed requirements.
Choose Naive Bayes when: working with text data (spam filtering, sentiment analysis), you have many features (high dimensionality), you need fast training and prediction, you have limited training data, or you need probability estimates. Choose KNN when: the decision boundary is non-linear, you have few features (low dimensionality), you have enough training data, and prediction speed is not critical.
Question 6
Medium
What is the output?
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
import numpy as np
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
nb_scores = cross_val_score(GaussianNB(), X, y, cv=5)
knn_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(f"Naive Bayes: {nb_scores.mean():.3f}")
print(f"KNN (K=5): {knn_scores.mean():.3f}")
print(f"Winner: {'KNN' if knn_scores.mean() > nb_scores.mean() else 'NB'}")

make_moons creates a non-linear, crescent-shaped boundary; KNN's local decision rule traces it better than GaussianNB's single Gaussian per class and feature.

Naive Bayes: 0.853
KNN (K=5): 0.930
Winner: KNN

Question 7
Hard
What is Laplace smoothing in Naive Bayes, and why is it necessary?
What happens if a word appears in test data but was never seen in training data for a particular class?
Laplace smoothing (additive smoothing) adds a small count (alpha, usually 1) to every feature count. Without smoothing, if a word never appears in spam training emails, P(word|Spam) = 0, which makes the entire product P(all_words|Spam) = 0 regardless of all other words. One unseen word kills the prediction. With smoothing: P(word|Spam) = (count + alpha) / (total + alpha * vocabulary_size). Now the probability is small but not zero.
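The formula from the answer can be computed directly; the counts below are hypothetical, chosen to mimic a word never seen in a class:

```python
def smoothed_prob(count, total, vocab_size, alpha=1.0):
    """P(word | class) with additive (Laplace) smoothing."""
    return (count + alpha) / (total + alpha * vocab_size)

# Hypothetical numbers: a word seen 0 times among 1000 spam words,
# with a 5000-word vocabulary.
unsmoothed = 0 / 1000                    # 0.0 -- would zero out the whole product
smoothed = smoothed_prob(0, 1000, 5000)  # small but nonzero: 1/6000
print(unsmoothed, smoothed)
```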
Question 8
Hard
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import time
np.random.seed(42)
# Training time vs prediction time
train_sizes = [100, 1000, 10000]
for n in train_sizes:
    X_train = np.random.randn(n, 5)
    y_train = np.random.choice([0, 1], n)
    X_test = np.random.randn(100, 5)
    knn = KNeighborsClassifier(n_neighbors=5)
    start = time.time()
    knn.fit(X_train, y_train)
    fit_time = time.time() - start
    start = time.time()
    knn.predict(X_test)
    pred_time = time.time() - start
    print(f"N={n:5d}: fit={fit_time:.4f}s, predict={pred_time:.4f}s")

KNN fit is nearly instant (it just stores the data), while prediction time grows with training-set size. Timings vary by machine; representative values:

N=  100: fit=0.0001s, predict=0.0012s
N= 1000: fit=0.0002s, predict=0.0056s
N=10000: fit=0.0003s, predict=0.0423s

Question 9
Hard
How does KNN handle the bias-variance trade-off through the K parameter?
Small K has what kind of bias/variance? Large K?
K=1: Zero training error, low bias, high variance. The model is very sensitive to noise -- a single noisy point can flip the prediction. Complex decision boundary. Large K (e.g., K=N): Always predicts the majority class, high bias, zero variance. The model ignores all local patterns. Optimal K: Balances bias and variance. Found through cross-validation. Typically somewhere between sqrt(N) and a few dozen.
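A sketch on synthetic data (invented here, not from the questions above) that shows the pattern: K=1 memorizes the training set perfectly while a very large K washes out local structure; exact CV scores depend on the random data.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

np.random.seed(0)
X = np.random.randn(200, 2)
y = (X[:, 0] + 0.5 * np.random.randn(200) > 0).astype(int)  # noisy boundary

results = {}
for k in [1, 15, 101]:
    knn = KNeighborsClassifier(n_neighbors=k)
    train_acc = knn.fit(X, y).score(X, y)             # K=1 memorizes: always 1.0
    cv_acc = cross_val_score(knn, X, y, cv=5).mean()  # generalization estimate
    results[k] = (train_acc, cv_acc)
    print(f"K={k:3d}: train={train_acc:.2f}, cv={cv_acc:.2f}")
```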
Question 10
Hard
What is the output?
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
texts = ["good great excellent", "good movie", "great film",
"bad terrible awful", "bad movie", "terrible film"]
y = [1, 1, 1, 0, 0, 0] # 1=positive, 0=negative
vec = CountVectorizer()
X = vec.fit_transform(texts)
for alpha in [0.01, 1.0, 100.0]:
    nb = MultinomialNB(alpha=alpha)
    nb.fit(X, y)
    test = vec.transform(["good terrible movie"])
    proba = nb.predict_proba(test)[0]
    pred = nb.predict(test)[0]
    print(f"alpha={alpha:6.2f}: P(neg)={proba[0]:.3f}, P(pos)={proba[1]:.3f}, pred={pred}")

The class word counts are perfectly symmetric ("good" in the positive class mirrors "terrible" in the negative class, and "movie" is neutral), so the posteriors tie at 0.5 for every alpha, and the tie breaks in favor of the first class.

alpha=  0.01: P(neg)=0.500, P(pos)=0.500, pred=0
alpha=  1.00: P(neg)=0.500, P(pos)=0.500, pred=0
alpha=100.00: P(neg)=0.500, P(pos)=0.500, pred=0

Multiple Choice Questions
MCQ 1
In KNN with K=5, how is the prediction made for a new point?
Answer: B
B is correct. KNN finds the K=5 nearest training points based on distance. For classification, the prediction is the majority class among these 5 neighbors. For regression, it would be the average of their values.
MCQ 2
What happens during the KNN training phase?
Answer: C
C is correct. KNN is a lazy learner. The fit() method simply stores the training data. No model parameters are learned. All computation happens at prediction time when distances are calculated.
MCQ 3
Which distance metric is most commonly used in KNN?
Answer: C
C is correct. Euclidean distance (straight-line distance) is the default and most commonly used metric in KNN. It works well for continuous features. Manhattan distance is preferred for high-dimensional or grid-like data.
MCQ 4
What is the "naive" assumption in Naive Bayes?
Answer: B
B is correct. Naive Bayes assumes that features are conditionally independent given the class label. This means P(x1, x2|Class) = P(x1|Class) * P(x2|Class). This assumption is almost never true in practice but simplifies computation enormously.
MCQ 5
Which Naive Bayes variant is best for text classification with word counts?
Answer: B
B is correct. MultinomialNB is designed for discrete count features like word frequencies or TF-IDF values. GaussianNB is for continuous features. BernoulliNB is for binary features (word present/absent).
MCQ 6
What is the effect of using K=1 in KNN?
Answer: A
A is correct. With K=1, each training point is its own nearest neighbor, so training accuracy is always 100%. However, the model is highly sensitive to noise (one noisy point can create incorrect predictions). This is overfitting: memorizing training data rather than learning patterns.
MCQ 7
Why is feature scaling critical for KNN but not for Naive Bayes?
Answer: A
A is correct. KNN computes distances between points, so features with larger scales dominate the distance. Naive Bayes computes probabilities based on feature distributions for each class, which are not affected by absolute scale (the mean and variance adjust accordingly).
MCQ 8
What is Laplace smoothing (alpha) in Naive Bayes?
Answer: B
B is correct. Laplace smoothing adds alpha (typically 1) to every feature count. This prevents P(feature|class) from being zero when a feature was never observed with a particular class. Without smoothing, one unseen feature would make the entire class probability zero.
MCQ 9
What is the time complexity of KNN prediction for a single point with N training samples and D features?
Answer: C
C is correct. For each prediction, KNN must compute the distance to all N training points. Each distance computation takes O(D) time (comparing D features). Total: O(N * D). This makes prediction slow for large training sets.
MCQ 10
In Bayes' theorem P(A|B) = P(B|A)*P(A)/P(B), what is P(A) called?
Answer: C
C is correct. P(A) is the prior probability (our belief before seeing evidence). P(B|A) is the likelihood (probability of evidence given hypothesis). P(A|B) is the posterior (updated belief after seeing evidence). P(B) is the evidence (normalizing constant).
MCQ 11
Why does KNN struggle with high-dimensional data (curse of dimensionality)?
Answer: B
B is correct. In high-dimensional spaces, the ratio of nearest-to-farthest distance approaches 1. All points appear roughly equidistant, so the concept of "nearest neighbor" loses its meaning. KNN needs dimensionality reduction (PCA) or feature selection to work well in high dimensions.
MCQ 12
A Naive Bayes spam filter trained on English emails encounters a new word "cryptocurrency" never seen in training. With alpha=0, what happens?
Answer: B
B is correct. Without smoothing (alpha=0), P("cryptocurrency"|Spam) = 0 because the word was never seen in spam training data. Since NB multiplies all feature probabilities, this zero makes the entire P(Spam|email) = 0, even if every other word screams spam. This is exactly why Laplace smoothing is essential.
MCQ 13
KNN uses algorithm='auto' by default in scikit-learn. What data structure does it use for efficient neighbor search?
Answer: C
C is correct. Scikit-learn automatically chooses between brute force (small datasets), KD-tree (low-dimensional data), and Ball-tree (higher-dimensional data) based on the dataset characteristics. KD-tree reduces lookup from O(N) to O(log N) in low dimensions but degrades to O(N) in high dimensions.
MCQ 14
Why does Naive Bayes often work well despite the independence assumption being violated?
Answer: C
C is correct. Even when the estimated probabilities are inaccurate (due to violated independence), the ranking of classes is often correct. Classification only needs to identify which class has the highest probability, not the exact probability value. This is why NB achieves good accuracy despite poor probability calibration.
MCQ 15
Which of the following problems is Naive Bayes least suitable for?
Answer: C
C is correct. Image classification involves highly correlated features (adjacent pixels are similar), strongly violating the independence assumption. Also, pixel values do not follow Gaussian distributions. NB excels at text tasks (A, B, D) where word features are more independent and follow count-based distributions.
MCQ 16
What is the key disadvantage of KNN compared to model-based classifiers like logistic regression?
Answer: B
B is correct. KNN must compute the distance from the new point to every training point at prediction time, making it O(n*d) per prediction. Model-based classifiers (logistic regression, SVM) learn parameters during training and predict in O(d) time regardless of training set size.
MCQ 17
Which Naive Bayes variant would you use for a dataset where features are binary (0 or 1)?
Answer: C
C is correct. BernoulliNB is specifically designed for binary features. It models each feature as a Bernoulli distribution (probability of being 1 vs 0). GaussianNB assumes continuous features. MultinomialNB assumes count features. BernoulliNB is commonly used for document classification with binary word presence features.
MCQ 18
What does K represent in K-Nearest Neighbors?
Answer: C
C is correct. K is the number of nearest training points that vote on the prediction. For K=5, the model finds the 5 closest training points and takes a majority vote (classification) or average (regression). K is a hyperparameter that must be chosen by the user.
Coding Challenges
Coding challenges coming soon.
Need to Review the Concepts?
Go back to the detailed notes for this chapter.
Read Chapter Notes