Chapter 11 · Intermediate · 55 Questions

Practice Questions — Unsupervised Learning: Clustering (K-Means, DBSCAN, Hierarchical)

10 Easy · 13 Medium · 11 Hard

Topic-Specific Questions

Question 1
Easy
What is the output of the following code?
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
km = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = km.fit_predict(X)
print(labels)
K-Means groups points that are close together. The first three points are near x=1, the last three near x=10.
[1 1 1 0 0 0] (or [0 0 0 1 1 1] -- cluster label numbers may vary but the grouping is the same)
Question 2
Easy
What is the fundamental difference between supervised and unsupervised learning?
Think about whether the training data has labels or not.
In supervised learning, the training data has labels (known outputs for each input), and the model learns to predict these labels. In unsupervised learning, the training data has no labels -- the model must discover patterns, structures, or groupings on its own.
Question 3
Easy
Name three real-world applications of clustering.
Think about grouping customers, detecting fraud, and organizing documents.
1. Customer segmentation: Grouping customers by purchasing behavior for targeted marketing. 2. Anomaly detection: Identifying fraudulent transactions that do not belong to any normal cluster. 3. Document grouping: Organizing news articles or emails into topic groups automatically.
Question 4
Easy
What does the K in K-Means represent?
It is a parameter you must choose before running the algorithm.
K represents the number of clusters you want the algorithm to find. You must specify K before running K-Means. The algorithm then partitions the data into exactly K groups.
Question 5
Easy
What is the output?
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]])
km = KMeans(n_clusters=2, random_state=0, n_init=10)
km.fit(X)
print(km.cluster_centers_.shape)
print(km.inertia_)
cluster_centers_ has shape (K, n_features). Inertia is the sum of squared distances to centroids.
(2, 2)
A small inertia value, approximately 2.67 (each tight three-point cluster contributes 4/3 to the within-cluster sum of squares).
Question 6
Easy
What is the elbow method and why is it used?
It helps choose the number of clusters K by looking at how inertia changes.
The elbow method plots the number of clusters (K) on the x-axis versus the inertia (within-cluster sum of squared distances) on the y-axis. As K increases, inertia decreases. The elbow point is where the rate of decrease sharply changes (the curve bends). This suggests the optimal K beyond which adding more clusters gives diminishing returns.
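The elbow method takes only a few lines to sketch; the make_blobs setup below (4 centers) is an illustrative choice:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 true centers (an illustrative choice)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    inertias.append(km.inertia_)
    print(f"K={k}: inertia={km.inertia_:.1f}")
```

Plotting K against these inertia values shows the bend; here the drop is steep up to K=4 and marginal afterwards.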
Question 7
Medium
What is the output?
from sklearn.cluster import DBSCAN
import numpy as np

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])
db = DBSCAN(eps=1.5, min_samples=2)
labels = db.fit_predict(X)
print(labels)
print(f"Noise points: {list(labels).count(-1)}")
Points within eps=1.5 of each other form clusters. The point at [25, 80] is far from everything else.
[0 0 0 1 1 -1]
Noise points: 1
Question 8
Medium
What are the two key parameters of DBSCAN, and what does each control?
One controls the neighborhood radius and the other controls the minimum density.
eps (epsilon): The maximum distance between two points for them to be considered neighbors. min_samples: The minimum number of points within eps distance required for a point to be considered a core point (part of a dense region).
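A small sketch of how the two parameters interact on toy data (the eps values are illustrative):

```python
from sklearn.cluster import DBSCAN
import numpy as np

# Five points on a line: three close together, two further right
X = np.array([[0, 0], [0.5, 0], [1.0, 0], [5.0, 0], [5.5, 0]])

# Small eps: only nearby points are neighbours -> two separate clusters
tight = DBSCAN(eps=0.6, min_samples=2).fit_predict(X)
# Large eps: the 4-unit gap is bridged -> everything merges into one cluster
loose = DBSCAN(eps=5.0, min_samples=2).fit_predict(X)
print(tight)
print(loose)
```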
Question 9
Medium
What is the difference between a core point, a border point, and a noise point in DBSCAN?
It depends on how many neighbors a point has within eps distance.
Core point: Has at least min_samples points within eps distance (including itself). It is in the dense interior of a cluster. Border point: Has fewer than min_samples neighbors but is within eps distance of a core point. It sits on the edge of a cluster. Noise point: Has fewer than min_samples neighbors and is not within eps of any core point. Labeled -1 by DBSCAN.
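After fitting, sklearn exposes the core point indices, so the three kinds can be labeled directly; the eps/min_samples values below are chosen so the toy data contains all three:

```python
from sklearn.cluster import DBSCAN
import numpy as np

X = np.array([[0, 0], [0.3, 0], [0.6, 0], [1.0, 0], [5, 5]])
db = DBSCAN(eps=0.45, min_samples=3).fit(X)

# sklearn stores the indices of core points after fitting
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True

for i, (label, is_core) in enumerate(zip(db.labels_, core_mask)):
    if label == -1:
        kind = "noise"
    elif is_core:
        kind = "core"
    else:
        kind = "border"
    print(f"Point {i}: {kind} (cluster {label})")
```

The two middle points have three neighbours within eps and are core; the endpoints are border points reachable from them; the far point is noise.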
Question 10
Medium
What is the output?
from sklearn.cluster import AgglomerativeClustering
import numpy as np

X = np.array([[1, 1], [1.5, 1.5], [5, 5], [5.5, 5.5], [9, 9], [9.5, 9.5]])
agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = agg.fit_predict(X)
print(labels)
Ward linkage minimizes within-cluster variance. Points near each other get grouped together.
[0 0 1 1 2 2] (label numbers may vary but pairs are grouped together)
Question 11
Medium
What is the silhouette score and what range of values can it take?
It measures how similar a point is to its own cluster compared to other clusters.
The silhouette score measures clustering quality. For each point, it compares the average distance to points in its own cluster (a) versus the average distance to points in the nearest other cluster (b). The formula is: (b - a) / max(a, b). The score ranges from -1 to +1. +1 means the point is far from other clusters and close to its own. 0 means it is on the boundary. -1 means it is likely in the wrong cluster.
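The formula can be checked by hand against sklearn on toy data. (With only two clusters, the mean distance to the other cluster is exactly the b term; with more clusters, b is the minimum of the per-cluster means.)

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two clearly separated pairs (toy data)
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])

def silhouette_by_hand(X, labels):
    scores = []
    for i in range(len(X)):
        same = [j for j in range(len(X)) if labels[j] == labels[i] and j != i]
        other = [j for j in range(len(X)) if labels[j] != labels[i]]
        a = np.mean([np.linalg.norm(X[i] - X[j]) for j in same])   # own cluster
        b = np.mean([np.linalg.norm(X[i] - X[j]) for j in other])  # other cluster
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

print(silhouette_by_hand(X, labels))
print(silhouette_score(X, labels))  # matches the manual computation
```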
Question 12
Medium
What is the output?
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 1], [10, 10], [10, 11], [11, 10]])
km = KMeans(n_clusters=2, random_state=42, n_init=10)
km.fit(X)
print(km.predict([[1.5, 1.5]]))
print(km.predict([[10.5, 10.5]]))
predict() assigns new points to the nearest existing centroid.
[1] (or [0]) for the point near (1,1)
[0] (or [1]) for the point near (10,10)
The two predictions will have different labels.
Question 13
Medium
Write code to find the optimal K for K-Means using silhouette score. Use make_blobs(n_samples=300, centers=4, random_state=42). Test K from 2 to 8 and print the best K.
Loop through K values, compute silhouette_score for each, and pick the K with the highest score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

best_k = 2
best_score = -1
for k in range(2, 9):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X)
    score = silhouette_score(X, labels)
    print(f"K={k}: silhouette={score:.4f}")
    if score > best_score:
        best_score = score
        best_k = k

print(f"Best K: {best_k} with silhouette score: {best_score:.4f}")
Question 14
Hard
Explain why K-Means fails on crescent-shaped (moon) data but DBSCAN succeeds.
Think about how each algorithm defines cluster membership.
K-Means assigns each point to the nearest centroid and creates Voronoi partitions (straight-line boundaries). For crescent shapes, the centroid of each crescent is in the space between them, causing K-Means to draw a straight line that splits both crescents incorrectly. DBSCAN follows the density of points along each crescent. Since points within each crescent are densely connected, DBSCAN traces the shape correctly regardless of curvature.
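A quick comparison on make_moons makes this concrete; the DBSCAN parameters below are illustrative choices for the scaled data, and the true labels y are available only because the data is synthetic:

```python
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaved crescents; y records the true crescent of each point
X, y = make_moons(n_samples=300, noise=0.05, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

km_labels = KMeans(n_clusters=2, random_state=42, n_init=10).fit_predict(X_scaled)
db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_scaled)

# Adjusted Rand Index: 1.0 = perfect agreement with the true grouping
print(f"K-Means ARI: {adjusted_rand_score(y, km_labels):.3f}")
print(f"DBSCAN ARI:  {adjusted_rand_score(y, db_labels):.3f}")
```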
Question 15
Hard
What is the difference between agglomerative and divisive hierarchical clustering? Which is more commonly used and why?
One starts from individual points and merges; the other starts from one big cluster and splits.
Agglomerative (bottom-up): Starts with each data point as its own cluster. At each step, the two closest clusters are merged. Continues until all points are in one cluster. Divisive (top-down): Starts with all points in one cluster. At each step, the most heterogeneous cluster is split. Continues until each point is its own cluster. Agglomerative is more commonly used because it is computationally cheaper -- O(n^2 log n) for agglomerative vs O(2^n) for optimal divisive splits.
Question 16
Hard
Write code to apply DBSCAN on make_moons(n_samples=300, noise=0.1, random_state=42). Scale the data, find the clusters, and print the number of clusters and noise points.
Use StandardScaler before DBSCAN. Count clusters excluding the -1 label.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=300, noise=0.1, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X_scaled)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = list(labels).count(-1)

print(f"Clusters: {n_clusters}")
print(f"Noise points: {n_noise}")
print(f"Points per cluster:")
for lbl in sorted(set(labels)):
    name = "Noise" if lbl == -1 else f"Cluster {lbl}"
    print(f"  {name}: {list(labels).count(lbl)}")
Question 17
Hard
Why does K-Means use the init='k-means++' initialization strategy by default in sklearn? What problem does it solve?
Random initialization can lead to poor convergence. K-Means++ provides a smarter starting point.
Random initialization can place centroids close together or in suboptimal locations, causing K-Means to converge to a local minimum (a solution that is not globally optimal). k-means++ selects initial centroids that are far apart from each other: the first centroid is chosen randomly, and each subsequent centroid is selected with probability proportional to its squared distance from the nearest existing centroid. This ensures centroids start spread across the data, leading to faster convergence and better final clusters.
Question 18
Medium
Write code to cluster the Iris dataset (without using the labels) using K-Means with K=3, and then compare the cluster assignments to the actual species labels using a crosstab.
Use KMeans(n_clusters=3), fit_predict on iris.data, then pd.crosstab to compare labels.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
km = KMeans(n_clusters=3, random_state=42, n_init=10)
clusters = km.fit_predict(iris.data)

print("Cluster vs Actual Species:")
print(pd.crosstab(clusters, iris.target, rownames=['Cluster'], colnames=['Species']))
Question 19
Hard
Explain the difference between K-Means inertia and silhouette score. When would they give conflicting recommendations for K?
Inertia always improves with more K. Silhouette can peak at a lower K.
Inertia measures within-cluster sum of squared distances. It always decreases as K increases (more clusters = tighter clusters). Silhouette score measures how well each point fits its cluster versus neighboring clusters, ranging from -1 to +1, and can decrease when K is too high. They conflict when increasing K reduces inertia slightly but fragments natural clusters, causing silhouette to drop.
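The contrast is easy to see on synthetic blobs (parameters illustrative): inertia keeps improving past the true K while silhouette peaks at it:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated blobs (illustrative)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)

inertias, sils = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X)
    inertias[k] = km.inertia_
    sils[k] = silhouette_score(X, labels)
    print(f"K={k}: inertia={inertias[k]:10.1f}  silhouette={sils[k]:.3f}")

# Inertia keeps falling past the true K=3; silhouette peaks at K=3
```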
Question 20
Easy
What is the output?
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[0, 0], [10, 10]])
km = KMeans(n_clusters=2, random_state=42, n_init=10)
km.fit(X)
print(km.labels_)
print(km.inertia_)
With 2 points and 2 clusters, each point is its own centroid.
[1 0] (or [0 1])
0.0
Question 21
Medium
What is image compression using K-Means? Explain the process.
Each pixel has an RGB color. K-Means groups similar colors together.
Image compression with K-Means reduces the number of distinct colors in an image. Each pixel is treated as a 3D data point (R, G, B values). K-Means clusters all pixels into K clusters. Each pixel is then replaced by its cluster centroid color. An image with millions of unique colors is reduced to K colors (e.g., 16 or 32), significantly reducing file size while preserving visual quality.
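The process can be sketched on a stand-in "image" of random pixels; a real image array (e.g. loaded with matplotlib) would slot into the same pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for a real image: a small array of random RGB pixels
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(32, 32, 3))

pixels = image.reshape(-1, 3).astype(float)  # each pixel -> one 3D point (R, G, B)
km = KMeans(n_clusters=16, random_state=42, n_init=10)
labels = km.fit_predict(pixels)

# Replace every pixel with the colour of its cluster centroid
compressed = km.cluster_centers_[labels].reshape(image.shape)

print(f"Unique colours before: {len(np.unique(pixels, axis=0))}")
print(f"Unique colours after:  {len(np.unique(compressed.reshape(-1, 3), axis=0))}")
```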
Question 22
Hard
Write code to apply both K-Means and Agglomerative Clustering on the Iris dataset (K=3 for both) and compare their silhouette scores.
Fit both models, compute silhouette_score for each, and print which is higher.
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

km = KMeans(n_clusters=3, random_state=42, n_init=10)
km_labels = km.fit_predict(X)
km_sil = silhouette_score(X, km_labels)

agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
agg_labels = agg.fit_predict(X)
agg_sil = silhouette_score(X, agg_labels)

print(f"K-Means silhouette: {km_sil:.4f}")
print(f"Agglomerative silhouette: {agg_sil:.4f}")
print(f"Winner: {'K-Means' if km_sil > agg_sil else 'Agglomerative'}")

Mixed & Application Questions

Question 1
Easy
What is the output?
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
km = KMeans(n_clusters=1, random_state=42, n_init=10)
km.fit(X)
print(km.cluster_centers_)
print(km.labels_)
With K=1, all points belong to one cluster. The centroid is the mean of all points.
[[0.5 0.5]]
[0 0 0 0]
Question 2
Easy
What is the output?
from sklearn.cluster import DBSCAN
import numpy as np

X = np.array([[1, 1], [100, 100], [200, 200]])
db = DBSCAN(eps=5, min_samples=2)
labels = db.fit_predict(X)
print(labels)
All three points are far apart. With eps=5 and min_samples=2, no point has a neighbor within distance 5.
[-1 -1 -1]
Question 3
Easy
Why must you scale features before applying K-Means or DBSCAN?
These algorithms use distance. What happens if one feature has a much larger range than another?
K-Means and DBSCAN both use Euclidean distance. If features have different scales (e.g., income in thousands vs age in years), the large-scale feature dominates the distance calculation and the small-scale feature is effectively ignored. Scaling (using StandardScaler or MinMaxScaler) ensures all features contribute equally to the distance.
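A tiny illustration of the distortion (the income/age numbers are made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# income vs age -- wildly different scales
X = np.array([[30000.0, 25.0],
              [31000.0, 60.0],
              [90000.0, 26.0]])

# Raw Euclidean distance: income dominates, so the 25- and 60-year-old
# with similar incomes look far closer than the two similar-aged customers
d_raw_01 = np.linalg.norm(X[0] - X[1])
d_raw_02 = np.linalg.norm(X[0] - X[2])
print(f"raw:    d(0,1)={d_raw_01:.1f}  d(0,2)={d_raw_02:.1f}")

# After scaling, the age difference matters again
X_scaled = StandardScaler().fit_transform(X)
d_sc_01 = np.linalg.norm(X_scaled[0] - X_scaled[1])
d_sc_02 = np.linalg.norm(X_scaled[0] - X_scaled[2])
print(f"scaled: d(0,1)={d_sc_01:.2f}  d(0,2)={d_sc_02:.2f}")
```

Before scaling, the two distances differ by a factor of roughly 60 purely because of the income axis; after scaling they are comparable.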
Question 4
Medium
What is the output?
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[0, 0], [0.1, 0.1], [10, 10], [10.1, 10.1]])
km = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = km.fit_predict(X)
score = silhouette_score(X, labels)
print(f"Score: {score:.4f}")
print(f"Score > 0.9: {score > 0.9}")
Two very tight, well-separated clusters should have a silhouette score close to +1.
Score: 0.9900 (approximately)
Score > 0.9: True
Question 5
Medium
Write code that generates blob data with 3 centers, runs K-Means with K=3, and prints the cluster sizes (number of points per cluster).
Use make_blobs, fit KMeans, then count labels using np.bincount or a loop.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import numpy as np

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
km = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = km.fit_predict(X)

print("Cluster sizes:")
for i in range(3):
    size = np.sum(labels == i)
    print(f"  Cluster {i}: {size} points")
print(f"Total: {len(labels)}")
Question 6
Medium
A data scientist runs K-Means with K=5 on customer data. The silhouette score is 0.15. What does this indicate, and what should they try next?
A silhouette score near 0 means clusters are overlapping or poorly defined.
A silhouette score of 0.15 indicates poor clustering -- clusters overlap significantly and points are not well-separated. The data scientist should: 1. Try different values of K (the true number of clusters might not be 5). 2. Scale the features if not already done. 3. Try different algorithms (DBSCAN if clusters are non-spherical). 4. Examine whether the features used are actually meaningful for clustering. 5. Visualize the data using PCA or t-SNE to understand its structure.
Question 7
Medium
What is the output?
from sklearn.cluster import DBSCAN
import numpy as np

X = np.array([[1, 1], [1.1, 1.1], [1.2, 1.2], [1.3, 1.3], [10, 10]])
db = DBSCAN(eps=0.5, min_samples=3)
labels = db.fit_predict(X)
print(labels)
print(f"Clusters: {len(set(labels)) - (1 if -1 in labels else 0)}")
The first four points are very close together. The last point is far away and alone.
[0 0 0 0 -1]
Clusters: 1
Question 8
Hard
Raghav runs DBSCAN on a dataset and gets 0 clusters (all points labeled as noise). What are two possible reasons and how should he fix each?
Think about what makes points noise in DBSCAN: eps and min_samples.
1. eps is too small: If eps is smaller than the typical distance between neighboring points, no point will have enough neighbors to be a core point. Fix: increase eps. Use the k-distance plot to choose an appropriate eps. 2. min_samples is too high: If min_samples is larger than the number of points in any natural cluster, no dense regions can form. Fix: decrease min_samples. A common starting point is min_samples = 2 * n_features. Also check whether the data is unscaled -- if features have vastly different ranges, distances between points can be much larger than expected.
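A minimal sketch of the eps-too-small failure and its fix (the eps values are illustrative):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=2, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# eps far too small: no point has enough neighbours -> everything is noise
bad = DBSCAN(eps=0.01, min_samples=5).fit_predict(X_scaled)
print(f"eps=0.01 noise points: {list(bad).count(-1)}")

# a more reasonable eps recovers the clusters
good = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_scaled)
n_clusters = len(set(good)) - (1 if -1 in good else 0)
print(f"eps=0.3  noise points: {list(good).count(-1)}, clusters: {n_clusters}")
```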
Question 9
Hard
Write code to compare K-Means and DBSCAN on make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=42). Print the silhouette score for each algorithm.
Scale the data. Run KMeans(n_clusters=2) and DBSCAN(eps=0.3, min_samples=5). Handle DBSCAN noise for silhouette.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_circles
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

X, _ = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# K-Means
km = KMeans(n_clusters=2, random_state=42, n_init=10)
km_labels = km.fit_predict(X_scaled)
km_sil = silhouette_score(X_scaled, km_labels)

# DBSCAN
db = DBSCAN(eps=0.3, min_samples=5)
db_labels = db.fit_predict(X_scaled)
mask = db_labels != -1
db_sil = silhouette_score(X_scaled[mask], db_labels[mask]) if len(set(db_labels[mask])) >= 2 else -1

print(f"K-Means silhouette: {km_sil:.4f}")
print(f"DBSCAN silhouette:  {db_sil:.4f}")
print(f"DBSCAN wins: {db_sil > km_sil}")
Question 10
Hard
Explain the four linkage methods in hierarchical clustering (single, complete, average, ward). Which is the default in sklearn and why?
Each measures inter-cluster distance differently. Ward minimizes variance.
Single linkage: Distance = minimum distance between any two points in the two clusters. Produces elongated chains. Complete linkage: Distance = maximum distance between any two points. Produces compact clusters. Average linkage: Distance = average of all pairwise distances between points in the two clusters. A balance between single and complete. Ward linkage: Merges the pair of clusters that causes the smallest increase in total within-cluster variance. Default in sklearn because it produces the most balanced, compact clusters similar to K-Means output.
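The different behaviours show up on synthetic data; the bridge-plus-outlier construction below is an illustrative setup chosen to expose single linkage's chaining:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two compact groups joined by a thin "bridge", plus one far-away outlier
rng = np.random.default_rng(0)
left = rng.normal([0, 0], 0.2, (30, 2))
right = rng.normal([10, 0], 0.2, (30, 2))
bridge = np.column_stack([np.linspace(1, 9, 15), np.zeros(15)])
outlier = np.array([[5.0, 30.0]])
X = np.vstack([left, right, bridge, outlier])

sizes = {}
for linkage in ["single", "complete", "average", "ward"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    sizes[linkage] = sorted(np.bincount(labels))
    print(f"{linkage:8s} cluster sizes: {sizes[linkage]}")
```

Single linkage chains through the bridge, so its 2-cluster cut merely isolates the lone outlier, while ward splits the data into two balanced groups.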
Question 11
Hard
Write a function find_best_eps(X, min_samples) that returns a suggested eps value for DBSCAN using the k-distance method. Use NearestNeighbors to find the distance to the k-th neighbor, sort them, and return the distance at the 90th percentile as the suggested eps.
Use NearestNeighbors(n_neighbors=min_samples), fit, kneighbors, sort distances, take 90th percentile.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def find_best_eps(X, min_samples):
    nn = NearestNeighbors(n_neighbors=min_samples)
    nn.fit(X)
    distances, _ = nn.kneighbors(X)
    k_distances = np.sort(distances[:, min_samples - 1])
    suggested_eps = np.percentile(k_distances, 90)
    return suggested_eps

# Test it
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)
eps = find_best_eps(X_scaled, min_samples=5)
print(f"Suggested eps: {eps:.4f}")
Question 12
Hard
Meera has a dataset with clusters of very different densities -- one tight cluster with 50 points and one sparse cluster with 200 points. Which algorithm would struggle and which would work? Explain.
DBSCAN uses a single eps. Can one eps value capture both tight and sparse clusters?
DBSCAN struggles with varying density clusters. A single eps value cannot capture both a tight cluster (needing small eps) and a sparse cluster (needing large eps). A small eps would fragment the sparse cluster; a large eps would merge the tight cluster with noise. K-Means would handle this better because it does not depend on density -- it assigns points to the nearest centroid regardless of cluster compactness. Alternatively, HDBSCAN (Hierarchical DBSCAN) handles varying densities by building a hierarchy of DBSCAN solutions at different eps values.

Multiple Choice Questions

MCQ 1
Which of the following is NOT a clustering algorithm?
  • A. K-Means
  • B. DBSCAN
  • C. Logistic Regression
  • D. Agglomerative Clustering
Answer: C
C is correct. Logistic Regression is a supervised classification algorithm, not a clustering algorithm. K-Means, DBSCAN, and Agglomerative Clustering are all unsupervised clustering algorithms.
MCQ 2
What does DBSCAN label a point that does not belong to any cluster?
  • A. 0
  • B. -1
  • C. NaN
  • D. None
Answer: B
B is correct. DBSCAN labels noise points (outliers) as -1. These are points that do not have enough neighbors within eps distance to be core points and are not reachable from any core point.
MCQ 3
What does the elbow method help determine in K-Means?
  • A. The best features to use
  • B. The optimal number of clusters (K)
  • C. The learning rate
  • D. The number of iterations
Answer: B
B is correct. The elbow method plots K versus inertia (within-cluster sum of squares). The 'elbow' point -- where the curve bends and starts to flatten -- suggests the optimal number of clusters.
MCQ 4
What is the range of the silhouette score?
  • A. 0 to 1
  • B. -1 to 1
  • C. 0 to 100
  • D. -inf to +inf
Answer: B
B is correct. The silhouette score ranges from -1 (worst, point is in the wrong cluster) to +1 (best, point is well-matched to its own cluster and separated from others). 0 indicates the point is on the boundary between clusters.
MCQ 5
Which of the following statements about K-Means is TRUE?
  • A. K-Means can find any shape of cluster
  • B. K-Means automatically determines the number of clusters
  • C. K-Means requires you to specify K (number of clusters) beforehand
  • D. K-Means is immune to outliers
Answer: C
C is correct. K-Means requires K as input. It cannot determine the number of clusters automatically (A is wrong -- it assumes spherical clusters, B is wrong -- K must be specified, D is wrong -- outliers can pull centroids).
MCQ 6
What does KMeans.inertia_ measure?
  • A. The number of iterations to converge
  • B. The sum of squared distances from points to their assigned centroids
  • C. The accuracy of clustering
  • D. The number of noise points
Answer: B
B is correct. Inertia (also called within-cluster sum of squares or WCSS) is the sum of squared distances from each point to its assigned centroid. Lower inertia means tighter clusters. It always decreases as K increases.
MCQ 7
Priya has data with crescent-shaped clusters. Which algorithm should she use?
  • A. K-Means
  • B. Linear Regression
  • C. DBSCAN
  • D. Naive Bayes
Answer: C
C is correct. DBSCAN can find arbitrarily shaped clusters because it follows density, not distance to centroids. K-Means (A) assumes spherical clusters and would fail. Linear Regression (B) and Naive Bayes (D) are supervised algorithms, not clustering algorithms.
MCQ 8
In DBSCAN, what is a 'core point'?
  • A. A point at the center of the dataset
  • B. A point with at least min_samples neighbors within eps distance
  • C. The first point processed by the algorithm
  • D. A point that belongs to exactly one cluster
Answer: B
B is correct. A core point has at least min_samples points (including itself) within a radius of eps. Core points form the dense interior of clusters. Points near core points but without enough neighbors themselves are border points.
MCQ 9
Which hierarchical clustering linkage method minimizes within-cluster variance?
  • A. Single linkage
  • B. Complete linkage
  • C. Average linkage
  • D. Ward linkage
Answer: D
D is correct. Ward linkage merges clusters that result in the minimum increase in total within-cluster variance. It produces compact, balanced clusters similar to K-Means. Single linkage (A) uses minimum distance and can create chains.
MCQ 10
What is a dendrogram?
  • A. A type of neural network
  • B. A tree diagram showing the order of cluster merges in hierarchical clustering
  • C. A plot of K versus inertia
  • D. A scatter plot of clustered data
Answer: B
B is correct. A dendrogram is a tree-shaped diagram that shows how hierarchical clustering progressively merges (or splits) clusters. The height at which branches merge indicates the distance between clusters. Cutting the dendrogram at a specific height gives a flat clustering.
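The merge tree behind a dendrogram can be inspected with scipy's hierarchy utilities; scipy.cluster.hierarchy.dendrogram would draw the diagram (it needs matplotlib), so this sketch just prints the linkage matrix for toy data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight pairs; the merge tree (dendrogram) records how they combine
X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
Z = linkage(X, method="ward")

# Each row of Z is one merge: [cluster_i, cluster_j, merge_height, new_size]
print(Z)

# Cutting the tree below the tall final merge yields two flat clusters
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)
```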
MCQ 11
Arjun scales his data with StandardScaler before DBSCAN. Why is this important?
  • A. StandardScaler speeds up DBSCAN
  • B. DBSCAN only accepts values between 0 and 1
  • C. Unscaled features with different ranges distort distance calculations, making eps meaningless
  • D. StandardScaler removes outliers automatically
Answer: C
C is correct. DBSCAN uses Euclidean distance. If income is in thousands and age is in years, the income feature dominates the distance. A single eps value cannot meaningfully capture neighborhood proximity when features have vastly different scales. Scaling ensures all features contribute equally.
MCQ 12
What happens if you increase eps in DBSCAN while keeping min_samples constant?
  • A. More noise points, fewer clusters
  • B. Fewer noise points, clusters may merge into fewer larger clusters
  • C. No effect on results
  • D. More clusters with smaller sizes
Answer: B
B is correct. A larger eps means a wider neighborhood radius. More points become neighbors, more core points form, and previously separate clusters may merge. Fewer points are classified as noise because they now fall within eps of some core point.
MCQ 13
Which of the following is an advantage of DBSCAN over K-Means?
  • A. DBSCAN is always faster
  • B. DBSCAN does not require specifying the number of clusters
  • C. DBSCAN always produces better silhouette scores
  • D. DBSCAN works better in high-dimensional spaces
Answer: B
B is correct. DBSCAN automatically determines the number of clusters based on density. K-Means requires K as input. DBSCAN is not always faster (A), does not always produce better silhouette scores (C), and actually struggles in high dimensions (D) due to the curse of dimensionality.
MCQ 14
Kavitha runs K-Means 10 times with different random seeds on the same data with the same K. She gets different cluster assignments each time. What causes this?
  • A. The data changes between runs
  • B. Different random initializations lead to different local optima
  • C. K-Means has a random component in its distance calculation
  • D. The sklearn implementation has a bug
Answer: B
B is correct. K-Means is sensitive to initialization. Different random starting centroids can lead the algorithm to converge to different local optima. This is why sklearn's KMeans uses n_init=10 by default (runs 10 times, keeps the best result) and init='k-means++' for smarter initialization.
MCQ 15
A dataset has varying density clusters -- one very tight and one very spread out. Which algorithm handles this BEST?
  • A. K-Means
  • B. DBSCAN
  • C. HDBSCAN
  • D. Agglomerative Clustering with single linkage
Answer: C
C is correct. HDBSCAN (Hierarchical DBSCAN) handles varying density by building a hierarchy of DBSCAN solutions at different eps values. Standard DBSCAN (B) uses a single eps and struggles with varying densities. K-Means (A) does not consider density at all. Single linkage (D) can create chaining artifacts.
MCQ 16
What is the time complexity of K-Means clustering?
  • A. O(n)
  • B. O(n * k * i * d) where n=points, k=clusters, i=iterations, d=dimensions
  • C. O(n^2)
  • D. O(n * log n)
Answer: B
B is correct. In each iteration, K-Means computes the distance from each of n points to each of k centroids in d dimensions, then updates centroids. This gives O(n * k * d) per iteration, and with i iterations total: O(n * k * i * d). This is efficient for large datasets, which is why K-Means is so popular.
MCQ 17
Suresh uses silhouette_score() to compare K=2 (score=0.65) and K=5 (score=0.42) on the same dataset. What should he conclude?
  • A. K=5 is better because more clusters means better fit
  • B. K=2 is better because it has a higher silhouette score
  • C. He should also consider the elbow method and domain knowledge before deciding
  • D. The scores are not comparable for different K values
Answer: C
C is correct. While K=2 has a higher silhouette score, the best K depends on multiple factors. The silhouette score favors fewer, well-separated clusters. The elbow method might suggest a different K, and domain knowledge (e.g., the business needs 5 customer segments) should also be considered. No single metric should be the sole decision-maker.
MCQ 18
In agglomerative hierarchical clustering, what is the 'chaining effect'?
  • A. When clusters form a perfect chain-like sequence
  • B. When single linkage connects two clusters through a chain of intermediate points, creating elongated clusters
  • C. When the algorithm runs indefinitely
  • D. When all points end up in one cluster too early
Answer: B
B is correct. The chaining effect is a known problem with single linkage. If a chain of points connects two distinct clusters (each pair close enough), single linkage merges them because it only considers the minimum distance between any two points. Complete or Ward linkage avoids this by considering all or variance-based distances.
MCQ 19
What is the main goal of clustering?
  • A. Predict a target variable
  • B. Group similar data points together without labels
  • C. Reduce the number of features
  • D. Remove outliers from the dataset
Answer: B
B is correct. Clustering groups similar data points together based on their features, without any predefined labels. Prediction (A) is supervised learning. Dimensionality reduction (C) is a separate task. While DBSCAN can detect outliers (D), that is a side effect, not the main goal of clustering.
MCQ 20
Rohit runs K-Means with K=3 and gets inertia=500. He runs it again with K=4 and gets inertia=480. Should he use K=4?
  • A. Yes, lower inertia is always better
  • B. No, the small improvement (4%) suggests K=3 is at or past the elbow point
  • C. He should always use the maximum K possible
  • D. Inertia is not a valid metric for K-Means
Answer: B
B is correct. Inertia always decreases as K increases (trivially, K=n gives inertia=0). The elbow method looks for where the decrease becomes marginal. A drop from 500 to 480 (4% improvement) after the curve has flattened suggests K=3 is at or past the elbow. Adding more clusters gives diminishing returns.
MCQ 21
Which statement about the silhouette score is TRUE?
  • A. It can only be computed for K-Means
  • B. It requires labeled data to compute
  • C. A score of -0.5 indicates most points are closer to a neighboring cluster than their own
  • D. It is always positive for good clusterings
Answer: C
C is correct. A negative silhouette score means b(i) < a(i), meaning the point is closer to the nearest other cluster than to its own cluster -- it is likely misassigned. Silhouette works for any clustering algorithm (A is wrong), does not need labels (B is wrong), and can be negative (D is wrong).

