Practice Questions — KNN and Naive Bayes Classifiers
Topic-Specific Questions
Question 1
Easy
What is the output of the following code?
import numpy as np
# Euclidean distance
a = np.array([1, 2])
b = np.array([4, 6])
dist = np.sqrt(np.sum((a - b) ** 2))
print(f"Distance: {dist:.1f}")

Euclidean distance: sqrt((4-1)^2 + (6-2)^2) = sqrt(9 + 16) = 5.

Distance: 5.0

Question 2
Easy
What is the output?
import numpy as np
# Manhattan distance
a = np.array([1, 2])
b = np.array([4, 6])
dist = np.sum(np.abs(a - b))
print(f"Manhattan: {dist}")

Manhattan distance: |4-1| + |6-2| = 3 + 4 = 7.

Manhattan: 7

Question 3
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1], [2], [3], [7], [8], [9]])
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[5]]))
print(knn.predict([[2]]))

For x=5, the two nearest are 3 (class 0) and 7 (class 1); the third is a tie between 2 and 8 (both at distance 3), and here 8 is selected, giving votes 0, 1, 1. For x=2, the 3 nearest are 1, 2, 3, all class 0.

[1]
[0]

Question 4
Easy
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
X = np.array([[1], [2], [3], [7], [8], [9]])
y = np.array([0, 0, 0, 1, 1, 1])
nb = GaussianNB()
nb.fit(X, y)
print(f"Class means: {nb.theta_.flatten()}")
print(f"Prediction for [5]: {nb.predict([[5]])[0]}")

GaussianNB estimates a mean and variance per class and feature: class 0 mean is 2, class 1 mean is 8, with equal variances. The point 5 is exactly midway, so the posteriors tie and argmax returns the first class, 0.

Class means: [2. 8.]
Prediction for [5]: 0

Question 5
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
print(knn.predict([[5, 5]]))
print(knn.predict([[0.5, 0.5]]))

K=1 predicts the class of the single nearest neighbor. From [5, 5], the closest points are (1, 0) and (0, 1) at distance ≈ 6.40, nearer than (10, 10) at ≈ 7.07; [0.5, 0.5] sits inside the class-0 cluster.

[0]
[0]

Question 6
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
for k in [1, 3, 5, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    pred = knn.predict([[5.5]])[0]
    print(f"K={k}: predict([5.5]) = {pred}")

5.5 sits exactly on the class boundary, and several neighbor distances tie, so the small-K results depend on tie-breaking. As K grows, points from both sides are included; at K=9 class 0 holds the majority.

K=1: predict([5.5]) = 1
K=3: predict([5.5]) = 1
K=5: predict([5.5]) = 1
K=9: predict([5.5]) = 0

Question 7
Medium
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
X = np.array([[1, 10], [2, 20], [3, 30],
[10, 1], [20, 2], [30, 3]])
y = np.array([0, 0, 0, 1, 1, 1])
nb = GaussianNB()
nb.fit(X, y)
proba = nb.predict_proba([[5, 5]])[0]
print(f"P(class 0): {proba[0]:.3f}")
print(f"P(class 1): {proba[1]:.3f}")
print(f"Prediction: {nb.predict([[5, 5]])[0]}")

Class 0 has high values in feature 2, class 1 has high values in feature 1, and the layout is symmetric, so [5, 5] gets equal posteriors; the 0.50/0.50 tie resolves to class 0.

P(class 0): 0.500
P(class 1): 0.500
Prediction: 0

Question 8
Medium
What is the output?
# Bayes theorem calculation
P_spam = 0.3 # 30% of emails are spam
P_not_spam = 0.7 # 70% are not spam
# P("free" | spam) = 0.8
# P("free" | not spam) = 0.1
P_free_given_spam = 0.8
P_free_given_not_spam = 0.1
# P(spam | "free") = ?
P_free = P_free_given_spam * P_spam + P_free_given_not_spam * P_not_spam
P_spam_given_free = (P_free_given_spam * P_spam) / P_free
print(f"P(free): {P_free:.3f}")
print(f"P(spam | free): {P_spam_given_free:.3f}")

Apply Bayes' theorem: P(spam|free) = P(free|spam) * P(spam) / P(free) = 0.24 / 0.31 ≈ 0.774.

P(free): 0.310
P(spam | free): 0.774

Question 9
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
# Feature 1: age (20-60), Feature 2: salary (20000-200000)
X = np.array([[25, 30000], [30, 40000], [55, 180000]])
y = np.array([0, 0, 1])
# Without scaling
knn1 = KNeighborsClassifier(n_neighbors=1)
knn1.fit(X, y)
pred1 = knn1.predict([[40, 100000]])[0]
# With scaling
scaler = StandardScaler()
X_s = scaler.fit_transform(X)
test_s = scaler.transform([[40, 100000]])
knn2 = KNeighborsClassifier(n_neighbors=1)
knn2.fit(X_s, y)
pred2 = knn2.predict(test_s)[0]
print(f"Without scaling: {pred1}")
print(f"With scaling: {pred2}")

Without scaling, salary (range ~150,000) dominates the distance and age is effectively ignored; after standardization both features contribute and the nearest neighbor changes.

Without scaling: 0
With scaling: 1

Question 10
Hard
What is the output?
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
texts = ["free money", "free gift", "free prize",
"work meeting", "project deadline", "team lunch"]
labels = [1, 1, 1, 0, 0, 0] # 1=spam
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
print(f"Vocabulary: {sorted(vectorizer.vocabulary_.keys())}")
print(f"Feature matrix shape: {X.shape}")
mnb = MultinomialNB(alpha=1.0)
mnb.fit(X, labels)
test_emails = ["free lunch", "free money gift"]
X_test = vectorizer.transform(test_emails)
for email, pred in zip(test_emails, mnb.predict(X_test)):
    print(f"\"{email}\" -> {'Spam' if pred == 1 else 'Not Spam'}")

"free" appears only in spam emails; "lunch" appears only in non-spam, so "free lunch" carries conflicting signals (here spam wins).

Vocabulary: ['deadline', 'free', 'gift', 'lunch', 'meeting', 'money', 'prize', 'project', 'team', 'work']
Feature matrix shape: (6, 10)
"free lunch" -> Spam
"free money gift" -> Spam

Question 11
Hard
What is the output?
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int) # Circular boundary
results = {}
for k in [1, 5, 15, 50, 100]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)
    results[k] = scores.mean()
    print(f"K={k:3d}: CV accuracy = {scores.mean():.3f}")
best_k = max(results, key=results.get)
print(f"\nBest K: {best_k}")

The circular boundary is non-linear. Small K captures it but can overfit the noise; very large K underfits (K=100 is half the dataset).

K=  1: CV accuracy = 0.925
K=  5: CV accuracy = 0.950
K= 15: CV accuracy = 0.935
K= 50: CV accuracy = 0.890
K=100: CV accuracy = 0.760

Best K: 5

Question 12
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
np.random.seed(42)
# Generate data where NB assumption is violated
# Features are highly correlated (x2 = x1 + noise)
X_train = np.random.randn(100, 1)
X_train = np.column_stack([X_train, X_train + np.random.normal(0, 0.1, (100, 1))])
y_train = (X_train[:, 0] > 0).astype(int)
X_test = np.random.randn(50, 1)
X_test = np.column_stack([X_test, X_test + np.random.normal(0, 0.1, (50, 1))])
y_test = (X_test[:, 0] > 0).astype(int)
nb = GaussianNB()
nb.fit(X_train, y_train)
print(f"Training accuracy: {nb.score(X_train, y_train):.3f}")
print(f"Test accuracy: {nb.score(X_test, y_test):.3f}")
print(f"NB still works despite correlated features!")

The two features are nearly identical (highly correlated), which violates the independence assumption, yet the class-conditional means still separate the classes cleanly.

Training accuracy: 0.990
Test accuracy: 0.980
NB still works despite correlated features!

Question 13
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]])
y = np.array([0, 0, 0, 1, 1, 1])
knn_uniform = KNeighborsClassifier(n_neighbors=3, weights='uniform')
knn_distance = KNeighborsClassifier(n_neighbors=3, weights='distance')
knn_uniform.fit(X, y)
knn_distance.fit(X, y)
print(f"Uniform weights: {knn_uniform.predict([[4, 4]])[0]}")
print(f"Distance weights: {knn_distance.predict([[4, 4]])[0]}")

With uniform weights, all 3 neighbors vote equally; with distance weights, closer neighbors count more. Either way the two class-0 neighbors (3,3) and (2,2) outvote the third neighbor, so both predict 0.

Uniform weights: 0
Distance weights: 0

Question 14
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
np.random.seed(42)
# 3-class problem
X = np.vstack([
np.random.randn(30, 2) + [0, 0],
np.random.randn(30, 2) + [5, 0],
np.random.randn(30, 2) + [2.5, 4]
])
y = np.array([0]*30 + [1]*30 + [2]*30)
nb = GaussianNB()
nb.fit(X, y)
# Predict probabilities for center point
proba = nb.predict_proba([[2.5, 1.5]])[0]
for i, p in enumerate(proba):
    print(f"P(class {i}): {p:.3f}")
print(f"Prediction: {nb.predict([[2.5, 1.5]])[0]}")

The point [2.5, 1.5] is roughly equidistant from the three class centers, so the posteriors are close; the exact values depend on the sampled means and variances.

P(class 0): 0.382
P(class 1): 0.291
P(class 2): 0.327
Prediction: 0

Question 15
Easy
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
X = np.array([[1], [2], [3], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])
nb = GaussianNB()
nb.fit(X, y)
print(f"Class priors: {nb.class_prior_}")
print(f"Class 0 mean: {nb.theta_[0][0]:.1f}")
print(f"Class 1 mean: {nb.theta_[1][0]:.1f}")

Equal numbers of samples per class, so the priors are 0.5 each. Class means: (1+2+3)/3 = 2 and (8+9+10)/3 = 9.

Class priors: [0.5 0.5]
Class 0 mean: 2.0
Class 1 mean: 9.0

Question 16
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[0], [1], [2], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[5]]))
print(knn.predict([[8]]))

For x=5, the two nearest are 2 and 1 (both class 0); the third is a tie between 0 and 10, but class 0 already holds the majority either way. For x=8, the nearest are 10, 11, 12, all class 1.

[0]
[1]

Question 17
Medium
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
[5, 5], [6, 5], [5, 6], [6, 6]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
for k in [1, 3, 7]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    pred = knn.predict([[3, 3]])[0]
    print(f"K={k}: predict([3,3])={pred}")

[3, 3] lies between the two clusters, with (1, 1) and (5, 5) equally close, so the small-K results depend on tie-breaking; at K=7 all four class-0 points are included and class 0 wins outright.

K=1: predict([3,3])=0
K=3: predict([3,3])=0
K=7: predict([3,3])=0

Question 18
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(300, 20) # 20 features
y = (X[:, 0] > 0).astype(int)
knn = KNeighborsClassifier(n_neighbors=5)
nb = GaussianNB()
knn_cv = cross_val_score(knn, X, y, cv=5).mean()
nb_cv = cross_val_score(nb, X, y, cv=5).mean()
print(f"KNN (20D): {knn_cv:.3f}")
print(f"NB (20D): {nb_cv:.3f}")
print(f"Winner: {'NB' if nb_cv > knn_cv else 'KNN'}")

Only feature 0 matters among the 20; the 19 irrelevant features swamp KNN's distances (curse of dimensionality), while NB simply learns near-identical distributions for them.

KNN (20D): 0.750
NB (20D): 0.903
Winner: NB

Question 19
Easy
What is the output?
import numpy as np
# Chebyshev distance
a = np.array([1, 2, 3])
b = np.array([4, 8, 5])
chebyshev = np.max(np.abs(a - b))
print(f"Chebyshev distance: {chebyshev}")

Chebyshev distance is the maximum absolute difference along any dimension: max(|1-4|, |2-8|, |3-5|) = max(3, 6, 2) = 6.

Chebyshev distance: 6

Question 20
Medium
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
# Imbalanced classes: 90% class 0, 10% class 1
X = np.vstack([np.random.RandomState(42).randn(90, 2),
np.random.RandomState(42).randn(10, 2) + 3])
y = np.array([0]*90 + [1]*10)
nb = GaussianNB()
nb.fit(X, y)
print(f"Class priors: {np.round(nb.class_prior_, 2)}")
print(f"Prior for class 0: {nb.class_prior_[0]:.2f}")
print(f"Prior for class 1: {nb.class_prior_[1]:.2f}")

Class priors are estimated from the training-data class frequencies: 90/100 and 10/100.

Class priors: [0.9 0.1]
Prior for class 0: 0.90
Prior for class 1: 0.10

Question 21
Hard
What is the output?
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import numpy as np
import time
np.random.seed(42)
X_train = np.random.randn(10000, 10)
y_train = np.random.choice([0, 1], 10000)
X_test = np.random.randn(100, 10)
# Brute force vs KD-tree
for algo in ['brute', 'kd_tree', 'ball_tree']:
    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algo)
    knn.fit(X_train, y_train)
    start = time.time()
    knn.predict(X_test)
    t = time.time() - start
    print(f"{algo:10s}: predict time = {t:.4f}s")

KD-tree and ball-tree are spatial data structures that speed up neighbor search. Absolute timings vary by machine; representative values:

brute     : predict time = 0.0312s
kd_tree   : predict time = 0.0156s
ball_tree : predict time = 0.0178s

Question 22
Medium
What is the output?
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
texts = ["good good good", "good bad", "bad bad bad"]
y = [1, 0, 0]
vec = CountVectorizer()
X = vec.fit_transform(texts)
nb = MultinomialNB(alpha=1.0)
nb.fit(X, y)
test = vec.transform(["good"])
proba = nb.predict_proba(test)[0]
print(f"P(class 0): {proba[0]:.3f}")
print(f"P(class 1): {proba[1]:.3f}")
print(f"Prediction: {nb.predict(test)[0]}")

"good" appears mostly in class 1, but class 0 has more training samples (prior 2/3 vs 1/3). With alpha=1 and a 2-word vocabulary: P(good|1) = (3+1)/(3+2) = 4/5 and P(good|0) = (1+1)/(5+2) = 2/7, so the posteriors are proportional to (1/3)(4/5) vs (2/3)(2/7), which normalize to about 0.583 vs 0.417.

P(class 0): 0.417
P(class 1): 0.583
Prediction: 1

Question 23
Easy
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])
# Wait - this is regression data! Can KNeighborsClassifier handle it?
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
print(knn.predict([[2.5]]))
print(knn.classes_)

KNeighborsClassifier treats the y values as class labels, even if they look continuous. For x=2.5, points 2 and 3 tie at distance 0.5; the label 20 is returned here. classes_ lists the five distinct "classes" in sorted order.

[20]
[10 20 30 40 50]

Question 24
Hard
What is the output?
from sklearn.naive_bayes import GaussianNB
import numpy as np
np.random.seed(42)
# Feature 1: strongly predictive
# Feature 2: pure noise
X_signal = np.vstack([np.random.randn(50, 1) - 2,
np.random.randn(50, 1) + 2])
X_noise = np.random.randn(100, 1) * 10 # Large noise
X = np.column_stack([X_signal, X_noise])
y = np.array([0]*50 + [1]*50)
nb = GaussianNB()
nb.fit(X, y)
print(f"Class 0 means: [{nb.theta_[0][0]:.2f}, {nb.theta_[0][1]:.2f}]")
print(f"Class 1 means: [{nb.theta_[1][0]:.2f}, {nb.theta_[1][1]:.2f}]")
print(f"Class 0 var: [{nb.var_[0][0]:.2f}, {nb.var_[0][1]:.2f}]")
print(f"Class 1 var: [{nb.var_[1][0]:.2f}, {nb.var_[1][1]:.2f}]")
print(f"Accuracy: {nb.score(X, y):.3f}")

The signal feature has clearly different class means; the noise feature has similar means but huge variance, so its likelihood contribution is nearly flat and it barely affects the prediction.

Class 0 means: [-2.01, 0.87]
Class 1 means: [1.87, -1.23]
Class 0 var: [0.84, 93.45]
Class 1 var: [1.12, 108.67]
Accuracy: 0.980

Mixed & Application Questions
Question 1
Easy
Why is KNN called a "lazy learner"?
Think about what happens during the training phase.
KNN is called a "lazy learner" because it does no work during training. The fit() method simply stores the training data. All computation (calculating distances, finding neighbors, voting) happens at prediction time. This is the opposite of "eager learners" like logistic regression or SVM, which learn a model during training and predict quickly.
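The point can be made concrete with a minimal from-scratch sketch (a simplified stand-in for scikit-learn's implementation, assuming Euclidean distance and majority voting): fit() merely stores the arrays, and every distance computation happens in predict().

```python
import numpy as np
from collections import Counter

class LazyKNN:
    """Minimal KNN sketch: fit() stores data, predict() does all the work."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorization -- no parameters are learned.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # All computation happens here, at prediction time.
            dists = np.sqrt(((self.X - x) ** 2).sum(axis=1))
            nearest = np.argsort(dists)[: self.k]
            preds.append(Counter(self.y[nearest]).most_common(1)[0][0])
        return np.array(preds)

knn = LazyKNN(k=3).fit([[1], [2], [3], [7], [8], [9]], [0, 0, 0, 1, 1, 1])
print(knn.predict([[2], [8]]))  # [0 1]
```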
Question 2
Easy
Why is Naive Bayes called "naive"?
It makes a simplifying assumption about features.
Naive Bayes is called "naive" because it assumes that all features are independent of each other given the class label. In reality, features are often correlated (e.g., a person's height and weight are correlated). Despite this unrealistic assumption, Naive Bayes works surprisingly well in practice, especially for text classification.
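A tiny numeric sketch of the factorization, using hypothetical per-word likelihoods and priors (all numbers invented for illustration): the joint likelihood P(x1, x2 | class) is approximated as the product P(x1 | class) * P(x2 | class).

```python
# Hypothetical per-feature likelihoods for a two-class spam problem:
p_x1 = {"spam": 0.8, "ham": 0.1}   # P(word "free" present | class)
p_x2 = {"spam": 0.6, "ham": 0.2}   # P(word "win" present | class)
prior = {"spam": 0.3, "ham": 0.7}

# Naive assumption: multiply per-feature likelihoods, then apply Bayes' rule.
score = {c: prior[c] * p_x1[c] * p_x2[c] for c in prior}
total = sum(score.values())
post = {c: s / total for c, s in score.items()}
print({c: round(p, 3) for c, p in post.items()})  # {'spam': 0.911, 'ham': 0.089}
```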
Question 3
Medium
Explain the curse of dimensionality and how it affects KNN.
Think about what happens to distances as the number of dimensions increases.
In high-dimensional spaces, all points become approximately equidistant from each other. The ratio of the distance to the nearest neighbor vs the farthest neighbor approaches 1. This means the concept of "nearest neighbor" becomes meaningless -- all neighbors are roughly the same distance away. KNN relies on meaningful distances, so it degrades in high dimensions. Solutions: reduce dimensionality with PCA, use feature selection, or switch to algorithms that handle high dimensions better (like Naive Bayes or linear models).
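The distance-concentration effect is easy to verify empirically; a sketch with uniform random points (exact ratios depend on the seed):

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = {}
for d in [2, 10, 100, 1000]:
    X = rng.random((500, d))   # 500 random points in the d-dimensional unit cube
    q = rng.random(d)          # a random query point
    dists = np.sqrt(((X - q) ** 2).sum(axis=1))
    ratios[d] = dists.min() / dists.max()
    # As d grows, the nearest/farthest ratio climbs toward 1:
    print(f"d={d:4d}: nearest/farthest distance ratio = {ratios[d]:.3f}")
```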
Question 4
Medium
What is the output?
from sklearn.neighbors import KNeighborsRegressor
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])
knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(X, y)
print(f"Predict [3]: {knn.predict([[3]])[0]:.1f}")
print(f"Predict [6]: {knn.predict([[6]])[0]:.1f}")

KNN regression predicts the average of the K nearest neighbors' target values. For x=3 the neighbors are 2, 3, 4 (mean of 20, 30, 40 = 30); for x=6 they are 3, 4, 5 (mean of 30, 40, 50 = 40).

Predict [3]: 30.0
Predict [6]: 40.0

Question 5
Medium
When would you choose Naive Bayes over KNN, and vice versa?
Consider data type, dataset size, dimensionality, and speed requirements.
Choose Naive Bayes when: working with text data (spam filtering, sentiment analysis), you have many features (high dimensionality), you need fast training and prediction, you have limited training data, or you need probability estimates. Choose KNN when: the decision boundary is non-linear, you have few features (low dimensionality), you have enough training data, and prediction speed is not critical.
Question 6
Medium
What is the output?
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
import numpy as np
X, y = make_moons(n_samples=300, noise=0.25, random_state=42)
nb_scores = cross_val_score(GaussianNB(), X, y, cv=5)
knn_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(f"Naive Bayes: {nb_scores.mean():.3f}")
print(f"KNN (K=5): {knn_scores.mean():.3f}")
print(f"Winner: {'KNN' if knn_scores.mean() > nb_scores.mean() else 'NB'}")

make_moons creates a non-linear, crescent-shaped boundary; KNN's local decision rule traces it better than GaussianNB's single Gaussian per class and feature.

Naive Bayes: 0.853
KNN (K=5): 0.930
Winner: KNN

Question 7
Hard
What is Laplace smoothing in Naive Bayes, and why is it necessary?
What happens if a word appears in test data but was never seen in training data for a particular class?
Laplace smoothing (additive smoothing) adds a small count (alpha, usually 1) to every feature count. Without smoothing, if a word never appears in spam training emails, P(word|Spam) = 0, which makes the entire product P(all_words|Spam) = 0 regardless of all other words. One unseen word kills the prediction. With smoothing: P(word|Spam) = (count + alpha) / (total + alpha * vocabulary_size). Now the probability is small but not zero.
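The formula from the answer can be computed directly; the counts below are hypothetical, chosen to mimic a word never seen in a class:

```python
def smoothed_prob(count, total, vocab_size, alpha=1.0):
    """P(word | class) with additive (Laplace) smoothing."""
    return (count + alpha) / (total + alpha * vocab_size)

# Hypothetical numbers: a word seen 0 times among 1000 spam words,
# with a 5000-word vocabulary.
unsmoothed = 0 / 1000                    # 0.0 -- would zero out the whole product
smoothed = smoothed_prob(0, 1000, 5000)  # small but nonzero: 1/6000
print(unsmoothed, smoothed)
```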
Question 8
Hard
What is the output?
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import time
np.random.seed(42)
# Training time vs prediction time
train_sizes = [100, 1000, 10000]
for n in train_sizes:
    X_train = np.random.randn(n, 5)
    y_train = np.random.choice([0, 1], n)
    X_test = np.random.randn(100, 5)
    knn = KNeighborsClassifier(n_neighbors=5)
    start = time.time()
    knn.fit(X_train, y_train)
    fit_time = time.time() - start
    start = time.time()
    knn.predict(X_test)
    pred_time = time.time() - start
    print(f"N={n:5d}: fit={fit_time:.4f}s, predict={pred_time:.4f}s")

KNN fit is nearly instant (it just stores the data), while prediction time grows with training-set size. Timings vary by machine; representative values:

N=  100: fit=0.0001s, predict=0.0012s
N= 1000: fit=0.0002s, predict=0.0056s
N=10000: fit=0.0003s, predict=0.0423s

Question 9
Hard
How does KNN handle the bias-variance trade-off through the K parameter?
Small K has what kind of bias/variance? Large K?
K=1: Zero training error, low bias, high variance. The model is very sensitive to noise -- a single noisy point can flip the prediction. Complex decision boundary. Large K (e.g., K=N): Always predicts the majority class, high bias, zero variance. The model ignores all local patterns. Optimal K: Balances bias and variance. Found through cross-validation. Typically somewhere between sqrt(N) and a few dozen.
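A sketch on synthetic data (invented here, not from the questions above) that shows the pattern: K=1 memorizes the training set perfectly while a very large K washes out local structure; exact CV scores depend on the random data.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

np.random.seed(0)
X = np.random.randn(200, 2)
y = (X[:, 0] + 0.5 * np.random.randn(200) > 0).astype(int)  # noisy boundary

results = {}
for k in [1, 15, 101]:
    knn = KNeighborsClassifier(n_neighbors=k)
    train_acc = knn.fit(X, y).score(X, y)             # K=1 memorizes: always 1.0
    cv_acc = cross_val_score(knn, X, y, cv=5).mean()  # generalization estimate
    results[k] = (train_acc, cv_acc)
    print(f"K={k:3d}: train={train_acc:.2f}, cv={cv_acc:.2f}")
```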
Question 10
Hard
What is the output?
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
import numpy as np
texts = ["good great excellent", "good movie", "great film",
"bad terrible awful", "bad movie", "terrible film"]
y = [1, 1, 1, 0, 0, 0] # 1=positive, 0=negative
vec = CountVectorizer()
X = vec.fit_transform(texts)
for alpha in [0.01, 1.0, 100.0]:
    nb = MultinomialNB(alpha=alpha)
    nb.fit(X, y)
    test = vec.transform(["good terrible movie"])
    proba = nb.predict_proba(test)[0]
    pred = nb.predict(test)[0]
    print(f"alpha={alpha:6.2f}: P(neg)={proba[0]:.3f}, P(pos)={proba[1]:.3f}, pred={pred}")

The class word counts are perfectly symmetric ("good" in the positive class mirrors "terrible" in the negative class, and "movie" is neutral), so the posteriors tie at 0.5 for every alpha, and the tie breaks in favor of the first class.

alpha=  0.01: P(neg)=0.500, P(pos)=0.500, pred=0
alpha=  1.00: P(neg)=0.500, P(pos)=0.500, pred=0
alpha=100.00: P(neg)=0.500, P(pos)=0.500, pred=0

Multiple Choice Questions
MCQ 1
In KNN with K=5, how is the prediction made for a new point?
Answer: B
B is correct. KNN finds the K=5 nearest training points based on distance. For classification, the prediction is the majority class among these 5 neighbors. For regression, it would be the average of their values.
MCQ 2
What happens during the KNN training phase?
Answer: C
C is correct. KNN is a lazy learner. The fit() method simply stores the training data. No model parameters are learned. All computation happens at prediction time when distances are calculated.
MCQ 3
Which distance metric is most commonly used in KNN?
Answer: C
C is correct. Euclidean distance (straight-line distance) is the default and most commonly used metric in KNN. It works well for continuous features. Manhattan distance is preferred for high-dimensional or grid-like data.
MCQ 4
What is the "naive" assumption in Naive Bayes?
Answer: B
B is correct. Naive Bayes assumes that features are conditionally independent given the class label. This means P(x1, x2|Class) = P(x1|Class) * P(x2|Class). This assumption is almost never true in practice but simplifies computation enormously.
MCQ 5
Which Naive Bayes variant is best for text classification with word counts?
Answer: B
B is correct. MultinomialNB is designed for discrete count features like word frequencies or TF-IDF values. GaussianNB is for continuous features. BernoulliNB is for binary features (word present/absent).
MCQ 6
What is the effect of using K=1 in KNN?
Answer: A
A is correct. With K=1, each training point is its own nearest neighbor, so training accuracy is always 100%. However, the model is highly sensitive to noise (one noisy point can create incorrect predictions). This is overfitting: memorizing training data rather than learning patterns.
MCQ 7
Why is feature scaling critical for KNN but not for Naive Bayes?
Answer: A
A is correct. KNN computes distances between points, so features with larger scales dominate the distance. Naive Bayes computes probabilities based on feature distributions for each class, which are not affected by absolute scale (the mean and variance adjust accordingly).
MCQ 8
What is Laplace smoothing (alpha) in Naive Bayes?
Answer: B
B is correct. Laplace smoothing adds alpha (typically 1) to every feature count. This prevents P(feature|class) from being zero when a feature was never observed with a particular class. Without smoothing, one unseen feature would make the entire class probability zero.
MCQ 9
What is the time complexity of KNN prediction for a single point with N training samples and D features?
Answer: C
C is correct. For each prediction, KNN must compute the distance to all N training points. Each distance computation takes O(D) time (comparing D features). Total: O(N * D). This makes prediction slow for large training sets.
MCQ 10
In Bayes' theorem P(A|B) = P(B|A)*P(A)/P(B), what is P(A) called?
Answer: C
C is correct. P(A) is the prior probability (our belief before seeing evidence). P(B|A) is the likelihood (probability of evidence given hypothesis). P(A|B) is the posterior (updated belief after seeing evidence). P(B) is the evidence (normalizing constant).
MCQ 11
Why does KNN struggle with high-dimensional data (curse of dimensionality)?
Answer: B
B is correct. In high-dimensional spaces, the ratio of nearest-to-farthest distance approaches 1. All points appear roughly equidistant, so the concept of "nearest neighbor" loses its meaning. KNN needs dimensionality reduction (PCA) or feature selection to work well in high dimensions.
MCQ 12
A Naive Bayes spam filter trained on English emails encounters a new word "cryptocurrency" never seen in training. With alpha=0, what happens?
Answer: B
B is correct. Without smoothing (alpha=0), P("cryptocurrency"|Spam) = 0 because the word was never seen in spam training data. Since NB multiplies all feature probabilities, this zero makes the entire P(Spam|email) = 0, even if every other word screams spam. This is exactly why Laplace smoothing is essential.
MCQ 13
KNN uses algorithm='auto' by default in scikit-learn. What data structure does it use for efficient neighbor search?
Answer: C
C is correct. Scikit-learn automatically chooses between brute force (small datasets), KD-tree (low-dimensional data), and Ball-tree (higher-dimensional data) based on the dataset characteristics. KD-tree reduces lookup from O(N) to O(log N) in low dimensions but degrades to O(N) in high dimensions.
MCQ 14
Why does Naive Bayes often work well despite the independence assumption being violated?
Answer: C
C is correct. Even when the estimated probabilities are inaccurate (due to violated independence), the ranking of classes is often correct. Classification only needs to identify which class has the highest probability, not the exact probability value. This is why NB achieves good accuracy despite poor probability calibration.
MCQ 15
Which of the following problems is Naive Bayes least suitable for?
Answer: C
C is correct. Image classification involves highly correlated features (adjacent pixels are similar), strongly violating the independence assumption. Also, pixel values do not follow Gaussian distributions. NB excels at text tasks (A, B, D) where word features are more independent and follow count-based distributions.
MCQ 16
What is the key disadvantage of KNN compared to model-based classifiers like logistic regression?
Answer: B
B is correct. KNN must compute the distance from the new point to every training point at prediction time, making it O(n*d) per prediction. Model-based classifiers (logistic regression, SVM) learn parameters during training and predict in O(d) time regardless of training set size.
MCQ 17
Which Naive Bayes variant would you use for a dataset where features are binary (0 or 1)?
Answer: C
C is correct. BernoulliNB is specifically designed for binary features. It models each feature as a Bernoulli distribution (probability of being 1 vs 0). GaussianNB assumes continuous features. MultinomialNB assumes count features. BernoulliNB is commonly used for document classification with binary word presence features.
MCQ 18
What does K represent in K-Nearest Neighbors?
Answer: C
C is correct. K is the number of nearest training points that vote on the prediction. For K=5, the model finds the 5 closest training points and takes a majority vote (classification) or average (regression). K is a hyperparameter that must be chosen by the user.
Coding Challenges
Coding challenges coming soon.
Need to Review the Concepts?
Go back to the detailed notes for this chapter.
Read Chapter Notes