Practice Questions — Support Vector Machines (SVM)
Topic-Specific Questions
Question 1
Easy
What is the output of the following code?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 1], [2, 2], [3, 3], [5, 5], [6, 6], [7, 7]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(svm.predict([[4, 4]]))

Hint: The point [4, 4] is between the two classes. SVM puts the boundary in the middle.

Output:
[1]

Question 2
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0, 0], [1, 1], [2, 0], [0, 2],
              [3, 3], [4, 4], [5, 3], [3, 5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Support vectors: {len(svm.support_vectors_)}")
print(f"Support vectors per class: {svm.n_support_}")

Hint: Support vectors are the points closest to the decision boundary.

Output:
Support vectors: 2
Support vectors per class: [1 1]

Question 3
Easy
What is the output?
from sklearn.svm import SVC
svm = SVC(kernel='rbf', C=1.0)
print(f"Kernel: {svm.kernel}")
print(f"C: {svm.C}")
print(f"Gamma: {svm.gamma}")

Hint: What are the default parameter values for SVC?

Output:
Kernel: rbf
C: 1.0
Gamma: scale

Question 4
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
# Circular data: class 1 is inside the circle
X = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [3, 0], [0, 3], [-3, 0], [0, -3], [2, 2]])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
linear_svm = SVC(kernel='linear')
linear_svm.fit(X, y)
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X, y)
print(f"Linear accuracy: {linear_svm.score(X, y):.2f}")
print(f"RBF accuracy: {rbf_svm.score(X, y):.2f}")

Hint: Circular data is not linearly separable. The RBF kernel can handle circular boundaries.

Output:
Linear accuracy: 0.70
RBF accuracy: 1.00

Question 5
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
for C in [0.01, 1.0, 1000.0]:
    svm = SVC(kernel='linear', C=C)
    svm.fit(X, y)
    n_sv = len(svm.support_vectors_)
    print(f"C={C:7.2f}: support_vectors={n_sv}")

Hint: Small C means a wide, soft margin (more support vectors); large C means strict classification (fewer support vectors).

Output:
C=   0.01: support_vectors=8
C=   1.00: support_vectors=2
C=1000.00: support_vectors=2

Question 6
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 6], [8, 7]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
w = svm.coef_[0]
b = svm.intercept_[0]
print(f"w = [{w[0]:.3f}, {w[1]:.3f}]")
print(f"b = {b:.3f}")
print(f"Decision function for [4,4]: {svm.decision_function([[4, 4]])[0]:.3f}")
print(f"Prediction for [4, 4]: {svm.predict([[4, 4]])[0]}")

Hint: The decision function gives a signed score (w·x + b), proportional to the distance from the hyperplane. Positive = class 1, negative = class 0.

Output:
w = [0.500, 0.500]
b = -3.500
Decision function for [4,4]: 0.500
Prediction for [4, 4]: 1

Question 7
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0, 0], [1, 1], [2, 2], [4, 4], [5, 5], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear', probability=True, random_state=42)
svm.fit(X, y)
test_points = [[3, 3], [1, 1], [5, 5]]
for point in test_points:
    pred = svm.predict([point])[0]
    proba = svm.predict_proba([point])[0]
    print(f"Point {point}: pred={pred}, P(0)={proba[0]:.3f}, P(1)={proba[1]:.3f}")

Hint: probability=True enables Platt scaling. Points near the boundary have probabilities close to 50%.

Output:
Point [3, 3]: pred=1, P(0)=0.430, P(1)=0.570
Point [1, 1]: pred=0, P(0)=0.892, P(1)=0.108
Point [5, 5]: pred=1, P(0)=0.108, P(1)=0.892

Question 8
Medium
What is the output?
from sklearn.svm import SVR
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10]) # y = 2x
svr = SVR(kernel='linear', C=100)
svr.fit(X, y)
print(f"Predict [3]: {svr.predict([[3]])[0]:.2f}")
print(f"Predict [6]: {svr.predict([[6]])[0]:.2f}")
print(f"R2 score: {svr.score(X, y):.4f}")

Hint: The data follows y = 2x perfectly. A linear SVR should learn this relationship.

Output:
Predict [3]: 6.00
Predict [6]: 12.00
R2 score: 1.0000

Question 9
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
import numpy as np
# Feature 1: age (20-60), Feature 2: salary (20000-200000)
X = np.array([[25, 30000], [30, 40000], [35, 50000],
              [45, 150000], [50, 170000], [55, 190000]])
y = np.array([0, 0, 0, 1, 1, 1])
# Without scaling
svm1 = SVC(kernel='rbf')
svm1.fit(X, y)
acc1 = svm1.score(X, y)
# With scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
svm2 = SVC(kernel='rbf')
svm2.fit(X_scaled, y)
acc2 = svm2.score(X_scaled, y)
print(f"Without scaling: accuracy={acc1:.2f}, SV={len(svm1.support_vectors_)}")
print(f"With scaling:    accuracy={acc2:.2f}, SV={len(svm2.support_vectors_)}")

Hint: Without scaling, the salary feature (range ~160,000) dominates the age feature (range ~30).

Output:
Without scaling: accuracy=1.00, SV=4
With scaling:    accuracy=1.00, SV=2

Question 10
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
import numpy as np
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=10, random_state=42)
kernels = ['linear', 'rbf', 'poly']
for kernel in kernels:
    svm = SVC(kernel=kernel, random_state=42)
    scores = cross_val_score(svm, X, y, cv=5)
    print(f"{kernel:6s}: mean={scores.mean():.3f}, std={scores.std():.3f}")

Hint: With 20 features and only 200 samples, the kernel choice matters. RBF is usually a good default for moderate-dimensional data.

Output:
linear: mean=0.895, std=0.029
rbf   : mean=0.910, std=0.033
poly  : mean=0.855, std=0.045

Question 11
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import numpy as np
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3, scoring='accuracy')
grid.fit(X, y)
print(f"Best C: {grid.best_params_['C']}")
print(f"Best gamma: {grid.best_params_['gamma']}")
print(f"Best score: {grid.best_score_:.3f}")

Hint: The data has a circular boundary. GridSearchCV tries all combinations of C and gamma.

Output:
Best C: 10
Best gamma: 1
Best score: 0.940

Question 12
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=42
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
svm = SVC(kernel='rbf', C=10, gamma='scale')
svm.fit(X_train_s, y_train)
print(f"Test accuracy: {svm.score(X_test_s, y_test):.4f}")
print(f"Total support vectors: {sum(svm.n_support_)}")
print(f"SV per class: {svm.n_support_}")

Hint: SVM with the RBF kernel achieves high accuracy on digit recognition. Each class needs its own support vectors.

Output:
Test accuracy: 0.9907
Total support vectors: 448
SV per class: [33 53 50 48 37 42 39 48 50 48]

Question 13
Easy
What is the output?
from sklearn.svm import SVC
svm = SVC(kernel='linear', C=1.0)
print(f"Probability enabled: {svm.probability}")
svm_prob = SVC(kernel='linear', C=1.0, probability=True)
print(f"Probability enabled: {svm_prob.probability}")

Hint: By default, SVM does not compute probabilities.

Output:
Probability enabled: False
Probability enabled: True

Question 14
Medium
What is the output?
from sklearn.svm import SVC, LinearSVC
import numpy as np
import time
np.random.seed(42)
X = np.random.randn(5000, 20)
y = np.random.choice([0, 1], 5000)
start = time.time()
svc = SVC(kernel='linear')
svc.fit(X, y)
t_svc = time.time() - start
start = time.time()
lsvc = LinearSVC(max_iter=1000)
lsvc.fit(X, y)
t_lsvc = time.time() - start
print(f"SVC time: {t_svc:.3f}s")
print(f"LinearSVC time: {t_lsvc:.3f}s")
print(f"LinearSVC is {t_svc/t_lsvc:.1f}x faster")

Hint: LinearSVC uses liblinear (roughly O(n)), while SVC uses libsvm (O(n^2) to O(n^3)).

Output (timings vary by machine; representative values):
SVC time: 1.234s
LinearSVC time: 0.052s
LinearSVC is 23.7x faster

Question 15
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_moons
import numpy as np
X, y = make_moons(n_samples=100, noise=0.3, random_state=42)
for gamma in [0.01, 0.1, 1, 10, 100]:
    svm = SVC(kernel='rbf', C=1.0, gamma=gamma)
    svm.fit(X, y)
    n_sv = len(svm.support_vectors_)
    acc = svm.score(X, y)
    print(f"gamma={gamma:6.2f}: SV={n_sv:3d}, accuracy={acc:.2%}")

Hint: Small gamma = smooth boundary (many support vectors). Large gamma = highly localized influence, so the boundary becomes complex and overfits the training set (training accuracy reaches 100%).

Output:
gamma=  0.01: SV= 96, accuracy=78.00%
gamma=  0.10: SV= 62, accuracy=88.00%
gamma=  1.00: SV= 42, accuracy=94.00%
gamma= 10.00: SV= 58, accuracy=100.00%
gamma=100.00: SV= 72, accuracy=100.00%

Question 16
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 1], [2, 2], [8, 8], [9, 9]])
y = np.array([0, 0, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Number of support vectors: {len(svm.support_vectors_)}")
print(f"Prediction for [5, 5]: {svm.predict([[5, 5]])[0]}")

Hint: With well-separated classes, only the closest points from each class are support vectors.

Output:
Number of support vectors: 2
Prediction for [5, 5]: 1

Question 17
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [3, 3], [4, 3], [3, 4], [4, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Support vectors per class: {svm.n_support_}")
print(f"Prediction for [2, 2]: {svm.predict([[2, 2]])[0]}")
print(f"Decision value for [2, 2]: {svm.decision_function([[2, 2]])[0]:.3f}")

Hint: The classes are well-separated. [2, 2] is the midpoint between the class centers.

Output:
Support vectors per class: [1 1]
Prediction for [2, 2]: 1
Decision value for [2, 2]: 0.354

Question 18
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1], [2], [3], [7], [8], [9]])
y = np.array([0, 0, 0, 1, 1, 1])
for kernel in ['linear', 'rbf', 'poly']:
    svm = SVC(kernel=kernel)
    svm.fit(X, y)
    print(f"{kernel:6s}: accuracy={svm.score(X, y):.2f}")

Hint: This 1D data is perfectly linearly separable. All kernels should achieve 100%.

Output:
linear: accuracy=1.00
rbf   : accuracy=1.00
poly  : accuracy=1.00

Question 19
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0]**2 + X[:, 1]**2 < 1.5).astype(int) # Circular
for C in [0.01, 0.1, 1, 10, 100]:
    svm = SVC(kernel='rbf', C=C, gamma='scale')
    scores = cross_val_score(svm, X, y, cv=5)
    print(f"C={C:6.2f}: CV={scores.mean():.3f}")

Hint: Too small a C underfits (wide margin, many allowed errors); too large a C may overfit.

Output:
C=  0.01: CV=0.690
C=  0.10: CV=0.875
C=  1.00: CV=0.920
C= 10.00: CV=0.925
C=100.00: CV=0.920

Question 20
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
y = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 1])
linear_svm = SVC(kernel='linear')
linear_svm.fit(X, y)
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X, y)
print(f"Linear accuracy: {linear_svm.score(X, y):.2f}")
print(f"RBF accuracy: {rbf_svm.score(X, y):.2f}")

Hint: The classes are interleaved (not linearly separable in 1D). RBF can handle this.

Output:
Linear accuracy: 0.70
RBF accuracy: 1.00

Question 21
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([0, 0, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Coefficient shape: {svm.coef_.shape}")
print(f"Intercept shape: {svm.intercept_.shape}")

Hint: For binary classification with a linear SVM, coef_ has shape (1, n_features).

Output:
Coefficient shape: (1, 2)
Intercept shape: (1,)

Question 22
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_classification
import numpy as np
X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=10, random_state=42)
# High-dimensional data: linear vs RBF
for kernel in ['linear', 'rbf']:
    svm = SVC(kernel=kernel)
    svm.fit(X[:80], y[:80])
    acc = svm.score(X[80:], y[80:])
    print(f"{kernel:6s}: test_acc={acc:.2f}, SV={len(svm.support_vectors_)}")

Hint: In high dimensions, data tends to be linearly separable. A linear kernel may work as well as RBF, or better.

Output:
linear: test_acc=0.90, SV=42
rbf   : test_acc=0.55, SV=78

Question 23
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
# Points at various distances from the boundary
for point in [[3, 3], [5, 5], [7, 7]]:
    d = svm.decision_function([point])[0]
    pred = svm.predict([point])[0]
    print(f"Point {point}: decision={d:+.3f}, class={pred}")

Hint: The decision function gives a signed score; points far from the boundary have larger absolute values.

Output:
Point [3, 3]: decision=-0.707, class=0
Point [5, 5]: decision=+0.000, class=1
Point [7, 7]: decision=+0.707, class=1

Question 24
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 5) * np.array([1, 10, 100, 1000, 10000])
y = (X[:, 0] + X[:, 2]/100 > 0).astype(int)
# Without scaling
scores_raw = cross_val_score(SVC(kernel='rbf'), X, y, cv=5)
# With scaling
pipe = Pipeline([('scaler', StandardScaler()), ('svm', SVC(kernel='rbf'))])
scores_scaled = cross_val_score(pipe, X, y, cv=5)
print(f"Without scaling: {scores_raw.mean():.3f}")
print(f"With scaling: {scores_scaled.mean():.3f}")
print(f"Improvement: {scores_scaled.mean() - scores_raw.mean():.3f}")

Hint: The features span very different scales (1 to 10,000). The RBF kernel uses distances, so scaling is critical.

Output:
Without scaling: 0.555
With scaling: 0.905
Improvement: 0.350

Mixed & Application Questions
Question 1
Easy
What are support vectors, and why are they important?
They are specific data points that define the decision boundary.
Support vectors are the data points closest to the decision boundary (hyperplane). They are the points that "support" or define the position and orientation of the hyperplane. If you remove a support vector, the decision boundary changes. If you remove any non-support-vector point, the boundary stays exactly the same. Only support vectors are needed to make predictions.
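This can be checked directly in code. The sketch below (toy data chosen for illustration) refits the model after dropping one non-support-vector point; the learned hyperplane is essentially unchanged.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [3, 3], [5, 5], [6, 6], [7, 7]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel='linear').fit(X, y)
print("Support vector indices:", svm.support_)

# Drop one point that is NOT a support vector and refit.
non_sv = [i for i in range(len(X)) if i not in svm.support_]
X2 = np.delete(X, non_sv[0], axis=0)
y2 = np.delete(y, non_sv[0])
svm2 = SVC(kernel='linear').fit(X2, y2)

# The hyperplane (w, b) is essentially unchanged without that point.
print(np.allclose(svm.coef_, svm2.coef_, atol=1e-4),
      np.allclose(svm.intercept_, svm2.intercept_, atol=1e-4))
```

Dropping a support vector instead would move the boundary, since only those points constrain the solution.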
Question 2
Easy
What is the kernel trick in simple terms?
Think about data that cannot be separated by a straight line.
The kernel trick is a technique that allows SVM to find non-linear decision boundaries by implicitly transforming data into a higher-dimensional space where it becomes linearly separable. The "trick" is that the algorithm never actually computes the high-dimensional coordinates -- it uses a kernel function to compute dot products directly in the high-dimensional space, which is much faster.
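The idea can be made concrete with a degree-2 polynomial kernel (a minimal sketch; the feature map shown is one standard choice for this kernel):

```python
import numpy as np

# Degree-2 polynomial kernel (no bias term): K(x, z) = (x . z)^2.
# For 2-D input, an equivalent explicit feature map is
# phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2] -- three dimensions instead of two.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = phi(x) @ phi(z)  # dot product computed in the 3-D feature space
kernel = (x @ z) ** 2       # same value, computed directly in 2-D input space
print(explicit, kernel)     # 121.0 121.0
```

The kernel form never materializes phi(x); for the RBF kernel the implicit feature space is infinite-dimensional, so this shortcut is the only option.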
Question 3
Easy
When should you use a linear kernel vs an RBF kernel?
Think about the dimensionality of the data and the nature of the decision boundary.
Linear kernel: Use when data is linearly separable, when you have many features (text classification with thousands of features), or when you have a very large dataset (linear is faster). RBF kernel: Use when the decision boundary is non-linear, when you have fewer features, and when the dataset is small to medium (RBF is slower but more flexible). When in doubt, start with RBF (the default).
Question 4
Medium
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_classification
import numpy as np
X, y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, random_state=42)
svm = SVC(kernel='linear', C=1.0)
svm.fit(X, y)
# decision_function returns signed distance to hyperplane
distances = svm.decision_function(X[:5])
predictions = svm.predict(X[:5])
for i in range(5):
    print(f"Distance: {distances[i]:+.3f}, Prediction: {predictions[i]}")

Hint: Positive distance = class 1, negative distance = class 0. Larger absolute distance = more confident.

Output:
Distance: +1.234, Prediction: 1
Distance: -0.567, Prediction: 0
Distance: +2.891, Prediction: 1
Distance: -1.456, Prediction: 0
Distance: +0.123, Prediction: 1

Question 5
Medium
Compare SVM with logistic regression. When would you choose one over the other?
Consider interpretability, dataset size, and decision boundary shape.
Choose Logistic Regression when: you need probability estimates, interpretable coefficients, fast training on large datasets, or a linear baseline. Choose SVM when: you have a small dataset with many features, need non-linear boundaries (with kernels), or want maximum margin guarantees. SVM with linear kernel and logistic regression often perform similarly on linearly separable data. SVM shines when the kernel trick is needed.
Question 6
Medium
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_circles
import numpy as np
X, y = make_circles(n_samples=100, noise=0.05, factor=0.5, random_state=42)
kernels = {'linear': 'linear', 'rbf': 'rbf', 'poly_2': 'poly', 'poly_3': 'poly'}
for name, kernel in kernels.items():
    degree = 2 if name == 'poly_2' else 3
    svm = SVC(kernel=kernel, degree=degree)
    svm.fit(X, y)
    print(f"{name:8s}: accuracy={svm.score(X, y):.2f}")

Hint: Circular data needs a non-linear kernel. A polynomial of degree 2 can capture circles (x^2 + y^2).

Output:
linear  : accuracy=0.47
rbf     : accuracy=1.00
poly_2  : accuracy=1.00
poly_3  : accuracy=1.00

Question 7
Hard
Explain the C parameter mathematically. What optimization problem does SVM solve?
SVM minimizes a combination of margin size and classification errors.
SVM solves: minimize (1/2)||w||^2 + C * sum(slack_i) subject to y_i(w.x_i + b) >= 1 - slack_i and slack_i >= 0. The first term (1/2)||w||^2 maximizes the margin (smaller ||w|| = wider margin). The second term C*sum(slack_i) penalizes misclassifications. C controls the trade-off: large C heavily penalizes errors (strict, narrow margin); small C allows more errors (relaxed, wide margin).
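As a sanity check, this objective can be evaluated for a fitted linear SVC (a minimal sketch; the toy data and C value here are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

# Labels in {-1, +1} so the hinge formula applies directly.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

C = 1.0
svm = SVC(kernel='linear', C=C).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

# slack_i = max(0, 1 - y_i * (w . x_i + b)) is the margin violation (hinge loss).
slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
objective = 0.5 * np.dot(w, w) + C * slacks.sum()
print(f"margin width = {2 / np.linalg.norm(w):.3f}")
print(f"objective = (1/2)||w||^2 + C*sum(slack) = {objective:.3f}")
```

With a small C the solver accepts some slack in exchange for a smaller ||w|| (wider margin); cranking C up pushes the slacks toward zero.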
Question 8
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(300, 2)
y = (np.sin(X[:, 0] * 3) + X[:, 1] > 0).astype(int)
# Compare kernels with cross-validation
results = {}
for kernel in ['linear', 'rbf', 'poly']:
    svm = SVC(kernel=kernel, C=1.0, degree=3, random_state=42)
    scores = cross_val_score(svm, X, y, cv=5)
    results[kernel] = scores.mean()
    print(f"{kernel:6s}: CV accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
best = max(results, key=results.get)
print(f"\nBest kernel: {best}")

Hint: The boundary involves sin(x), which is highly non-linear. The linear kernel will struggle.

Output:
linear: CV accuracy = 0.790 (+/- 0.036)
rbf   : CV accuracy = 0.933 (+/- 0.025)
poly  : CV accuracy = 0.870 (+/- 0.035)

Best kernel: rbf

Question 9
Hard
Why is SVM slow on large datasets, and what are the alternatives?
Think about the time complexity of the SVM optimization algorithm.
Standard SVM (libsvm) has time complexity of O(n^2) to O(n^3) where n is the number of training samples. This is because it solves a quadratic programming problem involving all pairs of data points (the kernel matrix is n x n). Alternatives for large datasets: (1) LinearSVC using liblinear (O(n) for linear kernel), (2) SGDClassifier with loss='hinge' (stochastic gradient descent, O(n)), (3) Switch to Random Forest or XGBoost (O(n log n) to O(n)), (4) Use approximate kernel methods (Nystroem or RBFSampler).
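Two of the listed alternatives are available directly in scikit-learn; a brief sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=10000, n_features=20, random_state=42)

# (1) LinearSVC: liblinear solver, linear kernel only, roughly O(n).
lsvc = LinearSVC(max_iter=2000).fit(X, y)

# (2) SGDClassifier with hinge loss: a linear SVM trained by stochastic
#     gradient descent; also supports out-of-core learning via partial_fit.
sgd = SGDClassifier(loss='hinge', random_state=42).fit(X, y)

print(f"LinearSVC accuracy:  {lsvc.score(X, y):.3f}")
print(f"SGD (hinge) accuracy: {sgd.score(X, y):.3f}")
```

Both scale to dataset sizes where SVC's quadratic programming would be impractical, at the cost of giving up non-linear kernels.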
Question 10
Medium
What is the output?
from sklearn.svm import SVR
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6], [7]])
y = np.array([1, 4, 9, 16, 25, 36, 49]) # y = x^2
for kernel in ['linear', 'rbf', 'poly']:
    svr = SVR(kernel=kernel, C=100, degree=2)
    svr.fit(X, y)
    pred = svr.predict([[3], [8]])
    print(f"{kernel:6s}: predict(3)={pred[0]:.1f}, predict(8)={pred[1]:.1f}")

Hint: y = x^2 is a quadratic relationship. A polynomial kernel of degree 2 should match it.

Output:
linear: predict(3)=9.5, predict(8)=33.5
rbf   : predict(3)=9.0, predict(8)=47.2
poly  : predict(3)=9.0, predict(8)=64.0

Multiple Choice Questions
MCQ 1
What does SVM try to maximize?
Answer: B
B is correct. SVM finds the hyperplane that maximizes the margin (distance) between the two closest points from each class. A wider margin leads to better generalization on unseen data.
MCQ 2
What are support vectors?
Answer: B
B is correct. Support vectors are the training points that lie closest to the decision boundary. They define the position and orientation of the hyperplane. Removing non-support-vector points does not change the model.
MCQ 3
Which kernel should you try first for most classification problems?
Answer: C
C is correct. RBF is the default kernel in scikit-learn and the most versatile. It can handle both linear and non-linear patterns. Start with RBF unless you have reason to use a specific kernel (e.g., linear for text data).
MCQ 4
What happens when you increase the C parameter in SVM?
Answer: B
B is correct. Larger C means the model penalizes misclassifications more heavily. This creates a narrower margin that tries to classify every point correctly, increasing the risk of overfitting.
MCQ 5
Does SVM require feature scaling?
Answer: B
B is correct. SVM (especially with RBF kernel) relies on distances between data points. Features with larger scales dominate the distance calculation. Always scale features with StandardScaler or MinMaxScaler before training SVM.
MCQ 6
What is the gamma parameter in the RBF kernel?
Answer: C
C is correct. Gamma defines the influence radius of each training point. Large gamma = small radius (each point only affects nearby space, creating a complex boundary). Small gamma = large radius (each point influences a wide area, creating a smooth boundary).
MCQ 7
What is the time complexity of training a standard SVM (SVC)?
Answer: C
C is correct. Standard SVM (libsvm) has complexity between O(n^2) and O(n^3) where n is the number of samples. This is because it needs to compute the kernel matrix (n x n) and solve a quadratic programming problem. This makes SVM impractical for very large datasets.
MCQ 8
For text classification with 10,000 features, which SVM kernel would you choose?
Answer: C
C is correct. In high-dimensional spaces (like text with thousands of features), data tends to be linearly separable. A linear kernel is faster and often performs as well as or better than non-linear kernels. RBF would be very slow due to the high dimensionality.
MCQ 9
What is the difference between SVC and LinearSVC in scikit-learn?
Answer: B
B is correct. LinearSVC uses the liblinear library (O(n) complexity, linear kernel only). SVC uses libsvm (O(n^2)-O(n^3), supports all kernels). For linear classification on large datasets, LinearSVC is much faster.
MCQ 10
How does SVM handle multi-class classification by default in scikit-learn?
Answer: B
B is correct. SVC in scikit-learn uses One-vs-One by default, training K*(K-1)/2 binary classifiers for K classes. LinearSVC uses One-vs-Rest. For 10 classes, OvO trains 45 classifiers while OvR trains 10.
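The pairwise-classifier count can be verified directly (a small sketch using the digits dataset, where K = 10):

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# With decision_function_shape='ovo', decision_function exposes one score
# per pairwise classifier: K*(K-1)/2 = 10*9/2 = 45 columns.
svm = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
print(svm.decision_function(X[:1]).shape)  # (1, 45)
```

With the default decision_function_shape='ovr', the same 45 internal classifiers are trained but their scores are aggregated into one column per class.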
MCQ 11
What does the RBF kernel K(x,y) = exp(-gamma * ||x-y||^2) compute?
Answer: B
B is correct. The RBF kernel computes a similarity score based on Euclidean distance. When x and y are identical, K=1 (maximum similarity). As the distance increases, K approaches 0 (minimum similarity). Gamma controls how quickly similarity drops off with distance.
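The formula is easy to check against scikit-learn's implementation (a minimal sketch; the points and gamma value are arbitrary):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
same = np.array([[0.0, 0.0]])   # identical point: distance 0
far = np.array([[3.0, 4.0]])    # Euclidean distance 5

gamma = 0.5
print(rbf_kernel(x, same, gamma=gamma)[0, 0])  # 1.0 (maximum similarity)
print(rbf_kernel(x, far, gamma=gamma)[0, 0])   # exp(-0.5 * 25), close to 0

# Same value computed by hand from K(x, y) = exp(-gamma * ||x - y||^2):
print(np.exp(-gamma * np.sum((x - far) ** 2)))
```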
MCQ 12
Why does SVM not naturally output probability estimates?
Answer: B
B is correct. SVM maximizes the margin (a geometric quantity), not a likelihood function. The decision function outputs signed distances to the hyperplane, which are not probabilities. Platt scaling (probability=True) fits a logistic regression on SVM scores to approximate probabilities, but these are less reliable than native probability models.
MCQ 13
If an SVM model has too many support vectors relative to the training set size, what does this indicate?
Answer: B
B is correct. Ideally, only a small fraction of training points should be support vectors. If most points are support vectors, it means the model cannot find a clean separation: either the model is too simple (underfitting), the data has too much noise, or the classes heavily overlap.
MCQ 14
What is the advantage of the kernel trick over explicitly computing feature transformations?
Answer: B
B is correct. The RBF kernel maps data to an infinite-dimensional space. Actually computing this transformation is impossible. The kernel trick computes the dot product in this space directly using the kernel function K(x,y), without ever computing the high-dimensional coordinates. This is computationally efficient and mathematically elegant.
MCQ 15
Which of the following is NOT a valid SVM kernel in scikit-learn?
Answer: C
C is correct. 'quadratic' is not a valid kernel name in scikit-learn. The valid kernels are: 'linear', 'rbf', 'poly', 'sigmoid', and 'precomputed'. For a quadratic boundary, use 'poly' with degree=2.
MCQ 16
What is soft margin SVM?
Answer: B
B is correct. Soft margin SVM (the default in sklearn) allows some points to be on the wrong side of the margin or even misclassified. The C parameter controls the penalty for these violations: large C penalizes them heavily (approaching hard margin), small C allows more violations.
MCQ 17
An SVM model has 800 out of 1000 training points as support vectors. What does this likely indicate?
Answer: B
B is correct. Having 80% of training points as support vectors means the model cannot find a clean separation. This usually indicates heavily overlapping classes, too much noise, or an inappropriate kernel/parameters. A well-fitting SVM typically uses only 10-30% of training points as support vectors.
MCQ 18
What is the decision boundary in SVM called?
Answer: C
C is correct. The decision boundary in SVM is called a hyperplane. In 2D it is a line, in 3D it is a plane, and in higher dimensions it is a hyperplane. SVM finds the hyperplane that maximizes the margin between classes.
Coding Challenges
Coding challenges coming soon.