What Is It?
What Is K-Nearest Neighbors (KNN)?
K-Nearest Neighbors is one of the simplest and most intuitive ML algorithms. The idea is: to classify a new point, look at the K closest training points and let them vote. The majority class among those K neighbors is the prediction.
# KNN in one sentence:
# "Tell me who your neighbors are, and I will tell you who you are."
#
# How KNN classifies a new point (star):
#
# x x x        o o
# x x          o o
# x   *        o o     (K=3: 2 nearest are 'x', 1 is 'o')
# x x          o o     -> Prediction: 'x' (majority vote)
#              o o
#
# K=3: look at the 3 closest points, majority class wins
KNN is a lazy learner: it does not actually learn a model during training. It just stores the training data. All the computation happens at prediction time when it searches for the K nearest neighbors.
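The whole algorithm fits in a few lines. A minimal sketch in plain Python (the helper `knn_predict` and the toy points are illustrative, not part of any library):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train is a list of ((x, y), label) pairs; squared Euclidean distance
    by_dist = sorted(train, key=lambda p: (p[0][0] - query[0]) ** 2
                                        + (p[0][1] - query[1]) ** 2)
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), 'x'), ((1, 2), 'x'), ((2, 1), 'x'),
         ((5, 5), 'o'), ((6, 5), 'o'), ((5, 6), 'o')]
print(knn_predict(train, (2, 2)))  # the three nearest points are all 'x'
```

Note that `fit` would be nothing more than storing `train`: all the work happens inside `knn_predict`.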
What Is Naive Bayes?
Naive Bayes is a probabilistic classifier based on Bayes' theorem. It calculates the probability of each class given the features, and predicts the class with the highest probability. The "naive" part comes from the assumption that all features are independent of each other (which is rarely true in practice, yet the algorithm works surprisingly well).
# Bayes' Theorem:
# P(Class | Features) = P(Features | Class) * P(Class) / P(Features)
#
# For classification, we compare:
# P(Spam | words) vs P(Not Spam | words)
#
# The class with higher posterior probability wins.
#
# "Naive" assumption: features are independent
# P(word1, word2, word3 | Spam) = P(word1|Spam) * P(word2|Spam) * P(word3|Spam)
# This simplification makes computation tractable.
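Plugging toy numbers into the comparison makes it concrete. A sketch with made-up probabilities (every value below is illustrative, not estimated from real data):

```python
# Hypothetical estimates from a toy training set
p_spam, p_ham = 0.4, 0.6            # priors P(Spam), P(Not Spam)
p_words_spam = 0.8 * 0.7 * 0.9      # P(word_i | Spam), multiplied (naive assumption)
p_words_ham = 0.1 * 0.2 * 0.3       # P(word_i | Not Spam)

# P(Features) cancels, so compare likelihood * prior directly
score_spam = p_words_spam * p_spam  # 0.504 * 0.4 = 0.2016
score_ham = p_words_ham * p_ham     # 0.006 * 0.6 = 0.0036

print("Spam" if score_spam > score_ham else "Not Spam")  # -> Spam
```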
Why Does It Matter?
Why Learn KNN and Naive Bayes?
1. KNN: Zero Training, Instant Understanding
KNN requires no training phase at all. You can explain it to anyone in 30 seconds: "Find the K closest points and vote." This makes it perfect for prototyping and baselines. When a new ML project starts, KNN is often the first algorithm tried.
2. Naive Bayes: Fast, Scalable, and Surprisingly Effective
Naive Bayes is extremely fast to train (one pass through the data) and predict (just multiply probabilities). Despite its "naive" independence assumption, it works remarkably well for text classification, spam filtering, and sentiment analysis. Many early email spam filters were built on Naive Bayes.
3. Different Strengths, Perfect for Comparison
KNN is distance-based (geometric intuition) while Naive Bayes is probability-based (statistical intuition). Understanding both gives you two fundamentally different approaches to classification. Comparing them on the same dataset teaches you when each approach excels.
4. Foundation for Advanced Methods
KNN is the foundation for recommendation systems ("users who bought this also bought..."). Naive Bayes is the foundation for more sophisticated probabilistic models like Bayesian networks and topic models (LDA). Understanding these simple versions prepares you for the advanced ones.
5. Interview Favorites
Both KNN and Naive Bayes are extremely popular in ML interviews. Interviewers love asking: "Implement KNN from scratch", "Explain Bayes' theorem with an example", "When would KNN fail?", "Why is it called naive?"
Detailed Explanation
Part 1: K-Nearest Neighbors (KNN)
1.1 How KNN Works
# KNN Algorithm:
# 1. Store all training data (no actual training)
# 2. For a new point:
# a. Calculate distance from the new point to ALL training points
# b. Sort by distance
# c. Pick the K closest points (neighbors)
# d. For classification: majority vote among K neighbors
# e. For regression: average of K neighbors' values
1.2 Distance Metrics
KNN needs a way to measure "closeness". The most common distance metrics are:
# Euclidean Distance (L2): sqrt(sum((x_i - y_i)^2))
# - Most common, works well for continuous features
# - Sensitive to scale (must normalize features!)
# Example: distance between (1,2) and (4,6)
# = sqrt((4-1)^2 + (6-2)^2) = sqrt(9+16) = sqrt(25) = 5.0
# Manhattan Distance (L1): sum(|x_i - y_i|)
# - Sum of absolute differences
# - Better for high-dimensional data or grid-like movement
# Example: distance between (1,2) and (4,6)
# = |4-1| + |6-2| = 3 + 4 = 7
# Minkowski Distance (Lp): (sum(|x_i - y_i|^p))^(1/p)
# - Generalization: p=2 is Euclidean, p=1 is Manhattan
1.3 Choosing K
# Small K (e.g., K=1):
# - Very sensitive to noise (one noisy neighbor can flip the prediction)
# - Complex decision boundary (overfitting)
# - Low bias, high variance
#
# Large K (e.g., K=50):
# - Very smooth boundary (may miss local patterns -> underfitting)
# - Less sensitive to noise
# - High bias, low variance
#
# Rule of thumb: start with K = sqrt(N) where N = training set size
# Always use odd K for binary classification (avoids ties)
# Use cross-validation to find the best K
1.4 Curse of Dimensionality
KNN struggles in high-dimensional spaces. As the number of features increases, the concept of "distance" becomes meaningless because all points become approximately equidistant. This is the curse of dimensionality.
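This effect is easy to observe numerically. A sketch comparing the nearest and farthest distances from a random query point as dimensionality grows (random uniform data; the exact ratios depend on the seed):

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((500, d))              # 500 random points in the unit cube
    q = rng.random(d)                     # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    ratio = dists.max() / dists.min()
    print(f"d={d:4d}: farthest/nearest distance ratio = {ratio:.2f}")
# The ratio shrinks toward 1 as d grows: all points become
# roughly equidistant, so "nearest" neighbors lose meaning.
```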
# In 1D: 10 points cover the space well
# In 2D: need 10^2 = 100 points
# In 10D: need 10^10 points to cover the space!
#
# Solution: use dimensionality reduction (PCA) before KNN
# or use feature selection to reduce the number of features
1.5 Importance of Scaling
Since KNN uses distances, feature scaling is critical. A feature with range [0, 100000] will dominate over a feature with range [0, 1].
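A quick sketch of why this matters, using toy [age, salary] points (the numbers are made up for illustration):

```python
import numpy as np

a = np.array([25.0, 30000.0])   # [age, salary]
b = np.array([60.0, 31000.0])   # very different age, similar salary
c = np.array([26.0, 90000.0])   # similar age, very different salary

print(np.linalg.norm(a - b))    # ~1000.6: the age gap of 35 barely registers
print(np.linalg.norm(a - c))    # ~60000.0: salary dominates completely

# After standardizing each feature (zero mean, unit variance),
# both features contribute comparably to the distance.
pts = np.vstack([a, b, c])
scaled = (pts - pts.mean(axis=0)) / pts.std(axis=0)
print(np.linalg.norm(scaled[0] - scaled[1]))
```

Without scaling, `a` looks closer to `b` than to `c` purely because of salary units; after standardizing, the large age gap between `a` and `b` counts again.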
Part 2: Naive Bayes
2.1 Bayes' Theorem Recap
# Bayes' Theorem:
# P(A|B) = P(B|A) * P(A) / P(B)
#
# In ML terms:
# P(Class|Features) = P(Features|Class) * P(Class) / P(Features)
#
# posterior = likelihood * prior / evidence
#
# Since P(Features) is the same for all classes, we compare:
# P(Class_k|Features) proportional to P(Features|Class_k) * P(Class_k)
#
# Example: Is this email spam?
# P(Spam|"free money") proportional to P("free money"|Spam) * P(Spam)
# P(Not Spam|"free money") proportional to P("free money"|Not Spam) * P(Not Spam)
2.2 The "Naive" Independence Assumption
Computing P(Features|Class) for many features simultaneously is intractable. The naive assumption simplifies this by assuming all features are independent:
# Without naive assumption (intractable):
# P(x1, x2, x3, ..., xn | Class) = ??? (need exponential data)
#
# With naive assumption:
# P(x1, x2, x3, ..., xn | Class) = P(x1|Class) * P(x2|Class) * ... * P(xn|Class)
#
# Each P(xi|Class) can be estimated from data easily!
# This is why it is called "Naive" Bayes.
2.3 Types of Naive Bayes
# 1. Gaussian Naive Bayes (GaussianNB)
# - Features are continuous and assumed to follow a normal distribution
# - P(x_i|Class) = (1/sqrt(2*pi*sigma^2)) * exp(-(x_i - mu)^2 / (2*sigma^2))
# - Use for: continuous features (height, weight, temperature)
#
# 2. Multinomial Naive Bayes (MultinomialNB)
# - Features are discrete counts (word frequencies, occurrence counts)
# - P(x_i|Class) = (count of x_i in Class + alpha) / (total words in Class + alpha*vocab_size)
# - Use for: text classification (bag-of-words, TF-IDF)
#
# 3. Bernoulli Naive Bayes (BernoulliNB)
# - Features are binary (present/absent, 0/1)
# - P(x_i=1|Class) = (count of docs with x_i in Class + alpha) / (total docs in Class + 2*alpha)
# - Use for: binary features, document classification (word present or not)
2.4 Laplace Smoothing
If a word never appears in spam emails during training, P(word|Spam) = 0, which would make the entire product 0 (one unseen word kills the entire prediction). Laplace smoothing (alpha=1) adds a small count to prevent zero probabilities.
# Without smoothing: P("cryptocurrency"|Spam) = 0/100 = 0
# With smoothing (alpha=1): P("cryptocurrency"|Spam) = (0+1)/(100+V) where V = vocab size
# Now the probability is small but not zero.
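The arithmetic is straightforward. A sketch with hypothetical counts (1000 total words in the spam class, vocabulary of 50; the helper `word_prob` is illustrative):

```python
def word_prob(count, total_words, vocab_size, alpha=1.0):
    """P(word | class) with Laplace (add-alpha) smoothing."""
    return (count + alpha) / (total_words + alpha * vocab_size)

# Unseen word: zero count, but nonzero smoothed probability
print(word_prob(0, 1000, 50))    # 1/1050 ~= 0.00095, not 0
# Frequent word: smoothing barely moves the estimate
print(word_prob(300, 1000, 50))  # 301/1050 ~= 0.2867 (raw estimate: 0.30)
```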
Code Examples
import numpy as np
from collections import Counter
class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        """Store training data (no actual training)."""
        self.X_train = np.array(X)
        self.y_train = np.array(y)

    def euclidean_distance(self, x1, x2):
        return np.sqrt(np.sum((x1 - x2) ** 2))

    def predict_single(self, x):
        # Calculate distances to all training points
        distances = [self.euclidean_distance(x, x_train)
                     for x_train in self.X_train]
        # Get K nearest neighbor indices
        k_indices = np.argsort(distances)[:self.k]
        k_labels = self.y_train[k_indices]
        # Majority vote
        most_common = Counter(k_labels).most_common(1)
        return most_common[0][0]

    def predict(self, X):
        return np.array([self.predict_single(x) for x in X])

    def score(self, X, y):
        predictions = self.predict(X)
        return np.mean(predictions == y)
# Generate data
np.random.seed(42)
# Class 0: centered around (2, 2)
X0 = np.random.randn(30, 2) + np.array([2, 2])
# Class 1: centered around (6, 6)
X1 = np.random.randn(30, 2) + np.array([6, 6])
X = np.vstack([X0, X1])
y = np.array([0]*30 + [1]*30)
# Shuffle
idx = np.random.permutation(60)
X, y = X[idx], y[idx]
# Split
X_train, X_test = X[:48], X[48:]
y_train, y_test = y[:48], y[48:]
# Train and evaluate with different K
for k in [1, 3, 5, 7, 11]:
    knn = KNNClassifier(k=k)
    knn.fit(X_train, y_train)
    train_acc = knn.score(X_train, y_train)
    test_acc = knn.score(X_test, y_test)
    print(f"K={k:2d}: Train={train_acc:.3f}, Test={test_acc:.3f}")
# Predict for a specific point
knn = KNNClassifier(k=3)
knn.fit(X_train, y_train)
point = np.array([4, 4])
pred = knn.predict_single(point)
print(f"\nPrediction for point {point}: Class {pred}")
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
# Load iris dataset
iris = load_iris()
X, y = iris.data, iris.target
# Split and scale
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
# Find optimal K using cross-validation
k_range = range(1, 31)
cv_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train_s, y_train, cv=5)
    cv_scores.append(scores.mean())
best_k = k_range[np.argmax(cv_scores)]
print(f"Best K: {best_k} (CV accuracy: {max(cv_scores):.4f})")
# Plot K vs accuracy
plt.figure(figsize=(10, 5))
plt.plot(k_range, cv_scores, 'bo-', markersize=4)
plt.axvline(x=best_k, color='r', linestyle='--', label=f'Best K={best_k}')
plt.xlabel('K (Number of Neighbors)')
plt.ylabel('Cross-Validation Accuracy')
plt.title('KNN: Finding Optimal K')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Train final model with best K
knn_best = KNeighborsClassifier(n_neighbors=best_k)
knn_best.fit(X_train_s, y_train)
y_pred = knn_best.predict(X_test_s)
print(f"\nTest Accuracy: {knn_best.score(X_test_s, y_test):.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.datasets import load_iris, fetch_20newsgroups
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
# 1. Gaussian Naive Bayes (continuous features)
print("=== Gaussian Naive Bayes ===")
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.3, random_state=42
)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
print(f"Iris accuracy: {gnb.score(X_test, y_test):.4f}")
print(f"Class prior probabilities: {np.round(gnb.class_prior_, 3)}")
print(f"Feature means per class:")
for i, name in enumerate(iris.target_names):
    print(f"  {name}: {np.round(gnb.theta_[i], 2)}")
# 2. Multinomial Naive Bayes (word counts / TF-IDF)
print(f"\n=== Multinomial Naive Bayes ===")
texts = [
"free money win lottery prize",
"congratulations you won free gift",
"claim your free reward now",
"limited offer free discount sale",
"meeting scheduled for tomorrow",
"project deadline next week",
"team lunch plans for friday",
"quarterly report due monday",
"free lunch tomorrow everyone invited",
"submit your expense report free"
]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] # 1=spam, 0=not spam
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(texts)
mnb = MultinomialNB(alpha=1.0) # alpha=1 is Laplace smoothing
mnb.fit(X_text, labels)
# Predict new emails
new_emails = ["free money for you", "meeting tomorrow at office"]
X_new = vectorizer.transform(new_emails)
predictions = mnb.predict(X_new)
for email, pred in zip(new_emails, predictions):
    print(f"  '{email}' -> {'Spam' if pred == 1 else 'Not Spam'}")
# 3. Bernoulli Naive Bayes (binary features)
print(f"\n=== Bernoulli Naive Bayes ===")
# Create binary features (word present or not)
X_binary = (X_text > 0).astype(int)
bnb = BernoulliNB(alpha=1.0)
bnb.fit(X_binary, labels)
print(f"Bernoulli NB accuracy on training: {bnb.score(X_binary, labels):.4f}")
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification, make_moons
import time
print("=== Dataset 1: Linear Pattern ===")
X1, y1 = make_classification(n_samples=500, n_features=10,
n_informative=5, random_state=42)
scaler = StandardScaler()
X1_s = scaler.fit_transform(X1)
knn = KNeighborsClassifier(n_neighbors=5)
nb = GaussianNB()
knn_scores = cross_val_score(knn, X1_s, y1, cv=5)
nb_scores = cross_val_score(nb, X1, y1, cv=5) # NB doesn't need scaling
print(f"KNN (K=5): {knn_scores.mean():.3f} +/- {knn_scores.std():.3f}")
print(f"Naive Bayes: {nb_scores.mean():.3f} +/- {nb_scores.std():.3f}")
print(f"\n=== Dataset 2: Non-linear Pattern (Moons) ===")
X2, y2 = make_moons(n_samples=500, noise=0.3, random_state=42)
X2_s = scaler.fit_transform(X2)
knn_scores2 = cross_val_score(knn, X2_s, y2, cv=5)
nb_scores2 = cross_val_score(nb, X2, y2, cv=5)
print(f"KNN (K=5): {knn_scores2.mean():.3f} +/- {knn_scores2.std():.3f}")
print(f"Naive Bayes: {nb_scores2.mean():.3f} +/- {nb_scores2.std():.3f}")
print(f"\n=== Speed Comparison (10,000 samples) ===")
X3, y3 = make_classification(n_samples=10000, n_features=20, random_state=42)
X3_s = scaler.fit_transform(X3)
start = time.time()
knn.fit(X3_s[:8000], y3[:8000])
knn.predict(X3_s[8000:])
t_knn = time.time() - start
start = time.time()
nb.fit(X3[:8000], y3[:8000])
nb.predict(X3[8000:])
t_nb = time.time() - start
print(f"KNN: {t_knn:.4f}s")
print(f"NB: {t_nb:.4f}s")
print(f"NB is {t_knn/t_nb:.1f}x faster")
print(f"\n=== Summary ===")
print("KNN: Better for non-linear patterns, needs scaling, slow at prediction")
print("NB: Better for high-dimensional/text data, fast, no scaling needed")
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.pipeline import Pipeline
# Simulated email dataset
emails = [
# Spam emails
"Congratulations you have won a free iPhone click here",
"Free money transfer to your account immediately",
"Win a free vacation package limited time offer",
"Claim your prize money now free gift card",
"Earn money from home free registration no experience",
"Free trial offer click below to claim your reward",
"You are selected for a free laptop giveaway",
"Urgent your account will be closed send password",
"Make money fast guaranteed income work from home",
"Limited offer discount sale free shipping today only",
"Free casino bonus play now and win big prizes",
"Congratulations your email won the lottery jackpot",
"Double your income with this free investment guide",
"Free credit score check no obligation required",
"Get rich quick scheme guaranteed money back",
# Not spam emails
"Meeting scheduled for tomorrow at 3 PM in conference room",
"Please review the quarterly report and provide feedback",
"Team lunch at the new restaurant this Friday",
"Reminder project deadline is next Monday",
"Can you send me the updated budget spreadsheet",
"The client meeting has been rescheduled to Thursday",
"Please find the attached invoice for last month",
"Happy birthday Priya team celebration at noon",
"We need to discuss the new feature requirements",
"The server maintenance is scheduled for this weekend",
"Updated the documentation please review before release",
"Great job on the presentation yesterday everyone",
"Could you help me debug this API endpoint issue",
"The code review for the payment module is pending",
"Welcome Rohan to the engineering team orientation Monday",
]
labels = [1]*15 + [0]*15 # 1=spam, 0=not spam
# Split data
X_train, X_test, y_train, y_test = train_test_split(
emails, labels, test_size=0.3, random_state=42, stratify=labels
)
# Create pipeline: TF-IDF + Multinomial NB
pipeline = Pipeline([
('tfidf', TfidfVectorizer(stop_words='english', max_features=500)),
('nb', MultinomialNB(alpha=1.0))
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(f"Accuracy: {pipeline.score(X_test, y_test):.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['Not Spam', 'Spam']))
print(f"Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# Predict new emails
new_emails = [
"Free money waiting for you click now",
"Can we reschedule tomorrow's meeting to 4 PM",
"Congratulations you won a free trip to Goa",
"The deployment was successful no issues found"
]
print(f"\nNew email predictions:")
predictions = pipeline.predict(new_emails)
proba = pipeline.predict_proba(new_emails)
for email, pred, prob in zip(new_emails, predictions, proba):
    label = 'SPAM' if pred == 1 else 'NOT SPAM'
    conf = max(prob) * 100
    print(f"  [{label:8s}] ({conf:.0f}% confident) \"{email}\"")
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
# Custom distance calculation
def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def chebyshev(a, b):
    return np.max(np.abs(a - b))
# Compare distances for two points
p1 = np.array([1, 2])
p2 = np.array([4, 6])
print("Distance from (1,2) to (4,6):")
print(f" Euclidean: {euclidean(p1, p2):.2f}")
print(f" Manhattan: {manhattan(p1, p2):.2f}")
print(f" Chebyshev: {chebyshev(p1, p2):.2f}")
# Compare metrics on a dataset
X, y = make_classification(n_samples=300, n_features=10,
n_informative=5, random_state=42)
scaler = StandardScaler()
X_s = scaler.fit_transform(X)
print(f"\nCross-validation accuracy by distance metric:")
metrics = ['euclidean', 'manhattan', 'chebyshev', 'minkowski']
for metric in metrics:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    scores = cross_val_score(knn, X_s, y, cv=5)
    print(f"  {metric:12s}: {scores.mean():.3f} +/- {scores.std():.3f}")
# Visualize what K-neighborhoods look like
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
np.random.seed(42)
X_vis = np.random.randn(50, 2)
query = np.array([0, 0])
for ax, metric_name, metric_func in zip(axes,
        ['Euclidean', 'Manhattan', 'Chebyshev'],
        [euclidean, manhattan, chebyshev]):
    distances = [metric_func(query, x) for x in X_vis]
    sorted_idx = np.argsort(distances)
    k_nearest = sorted_idx[:5]
    ax.scatter(X_vis[:, 0], X_vis[:, 1], c='gray', alpha=0.5, s=30)
    ax.scatter(X_vis[k_nearest, 0], X_vis[k_nearest, 1],
               c='red', s=100, zorder=5, label='5 Nearest')
    ax.scatter(*query, c='blue', s=200, marker='*', zorder=6, label='Query')
    ax.set_title(f'{metric_name} Distance')
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.set_aspect('equal')
plt.tight_layout()
plt.show()
import numpy as np
def bayes_theorem(prior, likelihood, evidence):
    """Calculate posterior probability using Bayes' theorem."""
    return (likelihood * prior) / evidence
# Example: Medical test for a disease
# - Disease prevalence (prior): 1% of population has the disease
# - Test sensitivity (P(positive|disease)): 99%
# - Test specificity (P(negative|no disease)): 95%
P_disease = 0.01 # Prior: P(Disease)
P_no_disease = 0.99 # Prior: P(No Disease)
P_pos_given_disease = 0.99 # Sensitivity: P(Positive | Disease)
P_neg_given_no_disease = 0.95 # Specificity: P(Negative | No Disease)
P_pos_given_no_disease = 1 - P_neg_given_no_disease # False positive rate = 0.05
# P(Positive) = P(Pos|Disease)*P(Disease) + P(Pos|No Disease)*P(No Disease)
P_positive = P_pos_given_disease * P_disease + P_pos_given_no_disease * P_no_disease
# Bayes' Theorem: P(Disease | Positive)
P_disease_given_pos = bayes_theorem(P_disease, P_pos_given_disease, P_positive)
print("Medical Diagnosis Example")
print("=" * 50)
print(f"Disease prevalence: {P_disease:.2%}")
print(f"Test sensitivity: {P_pos_given_disease:.2%}")
print(f"Test specificity: {P_neg_given_no_disease:.2%}")
print(f"False positive rate: {P_pos_given_no_disease:.2%}")
print(f"\nP(Positive test): {P_positive:.4f} ({P_positive:.2%})")
print(f"P(Disease | Positive): {P_disease_given_pos:.4f} ({P_disease_given_pos:.2%})")
print(f"\nSurprising result: Even with a 99% accurate test,")
print(f"a positive result only means {P_disease_given_pos:.1%} chance of disease!")
print(f"This is because the disease is rare (1%), so most positives are false positives.")
# What if we test again (independent test)?
P_disease_after_second = bayes_theorem(
P_disease_given_pos, # New prior after first test
P_pos_given_disease, # Same sensitivity
P_pos_given_disease * P_disease_given_pos + P_pos_given_no_disease * (1 - P_disease_given_pos)
)
print(f"\nAfter SECOND positive test: {P_disease_after_second:.1%} chance of disease")
Common Mistakes
Not Scaling Features Before KNN
from sklearn.neighbors import KNeighborsClassifier
# Feature 1: age (20-60), Feature 2: salary (20000-200000)
X = [[25, 30000], [30, 45000], [50, 150000]]
y = [0, 0, 1]
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X, y)
# Distance is dominated by salary (range ~120,000 here vs age range ~25)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
X = [[25, 30000], [30, 45000], [50, 150000]]
y = [0, 0, 1]
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_scaled, y)
# Now both features contribute equally to distance
Using Even K for Binary Classification
from sklearn.neighbors import KNeighborsClassifier
# K=4 can cause ties in binary classification
knn = KNeighborsClassifier(n_neighbors=4) # 2 vs 2 tie possible!
knn.fit(X_train, y_train)
# If 2 neighbors are class 0 and 2 are class 1, the result is ambiguous
from sklearn.neighbors import KNeighborsClassifier
# Use odd K for binary classification to avoid ties
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# With K=5, there is always a majority (3 vs 2 or 4 vs 1 or 5 vs 0)
Applying Naive Bayes to Data with Strong Feature Dependencies
from sklearn.naive_bayes import GaussianNB
# Features are highly correlated (height in cm AND height in inches)
# NB assumes independence -- this violates the assumption badly
X = [[170, 67], [180, 71], [160, 63], [175, 69]] # cm, inches
y = [0, 1, 0, 1]
nb = GaussianNB()
nb.fit(X, y)
# The model double-counts the height information
from sklearn.naive_bayes import GaussianNB
# Remove one of the correlated features
# Keep only height in cm
X = [[170], [180], [160], [175]]
y = [0, 1, 0, 1]
nb = GaussianNB()
nb.fit(X, y)
# Or use feature selection/PCA to remove correlated features
Using MultinomialNB with Negative Feature Values
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import StandardScaler
# StandardScaler creates negative values
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X) # Contains negative values!
mnb = MultinomialNB()
mnb.fit(X_scaled, y)  # ValueError: negative values!
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.preprocessing import MinMaxScaler
# Option 1: Use MinMaxScaler (all values >= 0)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
mnb = MultinomialNB()
mnb.fit(X_scaled, y) # Works!
# Option 2: Use GaussianNB (handles negative values)
gnb = GaussianNB()
gnb.fit(X, y)  # Works with any values!
Summary
- KNN classifies by finding the K closest training points and taking a majority vote. It is a lazy learner: no model is trained, all computation happens at prediction time.
- Distance metrics: Euclidean (sqrt of sum of squared differences, most common), Manhattan (sum of absolute differences, good for high dimensions), Minkowski (generalization).
- Choosing K: Small K (1-3) overfits (complex boundary, noisy). Large K underfits (smooth boundary, misses patterns). Use cross-validation to find optimal K. Use odd K for binary classification.
- KNN requires feature scaling (distances are scale-dependent). It suffers from the curse of dimensionality in high-dimensional spaces.
- Naive Bayes uses Bayes theorem: P(Class|Features) is proportional to P(Features|Class) * P(Class). It assumes feature independence ("naive").
- Three types: GaussianNB (continuous features, assumes normal distribution), MultinomialNB (word counts/TF-IDF, for text), BernoulliNB (binary features, word present/absent).
- Laplace smoothing (alpha=1) prevents zero probabilities for unseen features. Without it, one unseen word makes the entire probability zero.
- Naive Bayes is extremely fast (training is O(n*d), prediction is O(d)) and works well for text classification, spam filtering, and high-dimensional data.
- KNN vs Naive Bayes: KNN is better for non-linear patterns and small datasets. NB is better for text data, high-dimensional data, and when speed matters.
- KNN prediction is slow (O(n*d) per prediction -- must compute distance to all training points). Consider using KD-tree or Ball-tree for faster lookup.
- Both algorithms are strong baselines and frequently asked about in ML interviews. Know how to implement KNN from scratch and explain Bayes theorem with a real example.