Practice Questions — Support Vector Machines (SVM)
Topic-Specific Questions
Question 1
Easy
What is the output of the following code?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 1], [2, 2], [3, 3], [5, 5], [6, 6], [7, 7]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(svm.predict([[4, 4]]))

Hint: The point [4, 4] is between the two classes. SVM puts the boundary in the middle.

Output:
[1]

Question 2
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0, 0], [1, 1], [2, 0], [0, 2],
              [3, 3], [4, 4], [5, 3], [3, 5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Support vectors: {len(svm.support_vectors_)}")
print(f"Support vectors per class: {svm.n_support_}")

Hint: Support vectors are the points closest to the decision boundary.

Output:
Support vectors: 2
Support vectors per class: [1 1]

Question 3
Easy
What is the output?
from sklearn.svm import SVC
svm = SVC(kernel='rbf', C=1.0)
print(f"Kernel: {svm.kernel}")
print(f"C: {svm.C}")
print(f"Gamma: {svm.gamma}")

Hint: What are the default parameter values for SVC?

Output:
Kernel: rbf
C: 1.0
Gamma: scale

Question 4
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
# Circular data: class 1 is inside the circle
X = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [3, 0], [0, 3], [-3, 0], [0, -3], [2, 2]])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
linear_svm = SVC(kernel='linear')
linear_svm.fit(X, y)
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X, y)
print(f"Linear accuracy: {linear_svm.score(X, y):.2f}")
print(f"RBF accuracy: {rbf_svm.score(X, y):.2f}")

Hint: Circular data is not linearly separable. The RBF kernel can handle circular boundaries.

Output:
Linear accuracy: 0.70
RBF accuracy: 1.00

Question 5
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
for C in [0.01, 1.0, 1000.0]:
    svm = SVC(kernel='linear', C=C)
    svm.fit(X, y)
    n_sv = len(svm.support_vectors_)
    print(f"C={C:7.2f}: support_vectors={n_sv}")

Hint: Small C means a wide, soft margin (more support vectors); large C means strict classification (fewer support vectors).

Output:
C=   0.01: support_vectors=8
C=   1.00: support_vectors=2
C=1000.00: support_vectors=2

Question 6
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 6], [8, 7]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
w = svm.coef_[0]
b = svm.intercept_[0]
print(f"w = [{w[0]:.3f}, {w[1]:.3f}]")
print(f"b = {b:.3f}")
print(f"Decision function for [4,4]: {svm.decision_function([[4, 4]])[0]:.3f}")
print(f"Prediction for [4, 4]: {svm.predict([[4, 4]])[0]}")

Hint: The decision function gives a signed score (w·x + b), proportional to the distance from the hyperplane. Positive = class 1, negative = class 0.

Output:
w = [0.500, 0.500]
b = -3.500
Decision function for [4,4]: 0.500
Prediction for [4, 4]: 1

Question 7
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0, 0], [1, 1], [2, 2], [4, 4], [5, 5], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear', probability=True, random_state=42)
svm.fit(X, y)
test_points = [[3, 3], [1, 1], [5, 5]]
for point in test_points:
    pred = svm.predict([point])[0]
    proba = svm.predict_proba([point])[0]
    print(f"Point {point}: pred={pred}, P(0)={proba[0]:.3f}, P(1)={proba[1]:.3f}")

Hint: probability=True enables Platt scaling. Points near the boundary have probabilities close to 50%.

Output:
Point [3, 3]: pred=1, P(0)=0.430, P(1)=0.570
Point [1, 1]: pred=0, P(0)=0.892, P(1)=0.108
Point [5, 5]: pred=1, P(0)=0.108, P(1)=0.892

Question 8
Medium
What is the output?
from sklearn.svm import SVR
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10]) # y = 2x
svr = SVR(kernel='linear', C=100)
svr.fit(X, y)
print(f"Predict [3]: {svr.predict([[3]])[0]:.2f}")
print(f"Predict [6]: {svr.predict([[6]])[0]:.2f}")
print(f"R2 score: {svr.score(X, y):.4f}")

Hint: The data follows y = 2x perfectly. A linear SVR should learn this relationship.

Output:
Predict [3]: 6.00
Predict [6]: 12.00
R2 score: 1.0000

Question 9
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
import numpy as np
# Feature 1: age (20-60), Feature 2: salary (20000-200000)
X = np.array([[25, 30000], [30, 40000], [35, 50000],
              [45, 150000], [50, 170000], [55, 190000]])
y = np.array([0, 0, 0, 1, 1, 1])
# Without scaling
svm1 = SVC(kernel='rbf')
svm1.fit(X, y)
acc1 = svm1.score(X, y)
# With scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
svm2 = SVC(kernel='rbf')
svm2.fit(X_scaled, y)
acc2 = svm2.score(X_scaled, y)
print(f"Without scaling: accuracy={acc1:.2f}, SV={len(svm1.support_vectors_)}")
print(f"With scaling:    accuracy={acc2:.2f}, SV={len(svm2.support_vectors_)}")

Hint: Without scaling, the salary feature (range ~160,000) dominates the age feature (range ~30).

Output:
Without scaling: accuracy=1.00, SV=4
With scaling:    accuracy=1.00, SV=2

Question 10
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
import numpy as np
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=10, random_state=42)
kernels = ['linear', 'rbf', 'poly']
for kernel in kernels:
    svm = SVC(kernel=kernel, random_state=42)
    scores = cross_val_score(svm, X, y, cv=5)
    print(f"{kernel:6s}: mean={scores.mean():.3f}, std={scores.std():.3f}")

Hint: With 20 features and only 200 samples, the kernel choice matters. RBF is usually a good default for moderate-dimensional data.

Output:
linear: mean=0.895, std=0.029
rbf   : mean=0.910, std=0.033
poly  : mean=0.855, std=0.045

Question 11
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import numpy as np
np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3, scoring='accuracy')
grid.fit(X, y)
print(f"Best C: {grid.best_params_['C']}")
print(f"Best gamma: {grid.best_params_['gamma']}")
print(f"Best score: {grid.best_score_:.3f}")

Hint: The data has a circular boundary. GridSearchCV tries all combinations of C and gamma.

Output:
Best C: 10
Best gamma: 1
Best score: 0.940

Question 12
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=42
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
svm = SVC(kernel='rbf', C=10, gamma='scale')
svm.fit(X_train_s, y_train)
print(f"Test accuracy: {svm.score(X_test_s, y_test):.4f}")
print(f"Total support vectors: {sum(svm.n_support_)}")
print(f"SV per class: {svm.n_support_}")

Hint: SVM with the RBF kernel achieves high accuracy on digit recognition. Each class needs its own support vectors.

Output:
Test accuracy: 0.9907
Total support vectors: 448
SV per class: [33 53 50 48 37 42 39 48 50 48]

Question 13
Easy
What is the output?
from sklearn.svm import SVC
svm = SVC(kernel='linear', C=1.0)
print(f"Probability enabled: {svm.probability}")
svm_prob = SVC(kernel='linear', C=1.0, probability=True)
print(f"Probability enabled: {svm_prob.probability}")

Hint: By default, SVM does not compute probabilities.

Output:
Probability enabled: False
Probability enabled: True

Question 14
Medium
What is the output?
from sklearn.svm import SVC, LinearSVC
import numpy as np
import time
np.random.seed(42)
X = np.random.randn(5000, 20)
y = np.random.choice([0, 1], 5000)
start = time.time()
svc = SVC(kernel='linear')
svc.fit(X, y)
t_svc = time.time() - start
start = time.time()
lsvc = LinearSVC(max_iter=1000)
lsvc.fit(X, y)
t_lsvc = time.time() - start
print(f"SVC time: {t_svc:.3f}s")
print(f"LinearSVC time: {t_lsvc:.3f}s")
print(f"LinearSVC is {t_svc/t_lsvc:.1f}x faster")

Hint: LinearSVC uses liblinear (roughly O(n)), while SVC uses libsvm (O(n^2) to O(n^3)).

Output (timings vary by machine; representative values):
SVC time: 1.234s
LinearSVC time: 0.052s
LinearSVC is 23.7x faster

Question 15
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_moons
import numpy as np
X, y = make_moons(n_samples=100, noise=0.3, random_state=42)
for gamma in [0.01, 0.1, 1, 10, 100]:
    svm = SVC(kernel='rbf', C=1.0, gamma=gamma)
    svm.fit(X, y)
    n_sv = len(svm.support_vectors_)
    acc = svm.score(X, y)
    print(f"gamma={gamma:6.2f}: SV={n_sv:3d}, accuracy={acc:.2%}")

Hint: Small gamma = smooth boundary (many support vectors). Large gamma = highly localized influence, so the boundary becomes complex and overfits the training set (training accuracy reaches 100%).

Output:
gamma=  0.01: SV= 96, accuracy=78.00%
gamma=  0.10: SV= 62, accuracy=88.00%
gamma=  1.00: SV= 42, accuracy=94.00%
gamma= 10.00: SV= 58, accuracy=100.00%
gamma=100.00: SV= 72, accuracy=100.00%

Question 16
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 1], [2, 2], [8, 8], [9, 9]])
y = np.array([0, 0, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Number of support vectors: {len(svm.support_vectors_)}")
print(f"Prediction for [5, 5]: {svm.predict([[5, 5]])[0]}")

Hint: With well-separated classes, only the closest points from each class are support vectors.

Output:
Number of support vectors: 2
Prediction for [5, 5]: 1

Question 17
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1],
              [3, 3], [4, 3], [3, 4], [4, 4]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Support vectors per class: {svm.n_support_}")
print(f"Prediction for [2, 2]: {svm.predict([[2, 2]])[0]}")
print(f"Decision value for [2, 2]: {svm.decision_function([[2, 2]])[0]:.3f}")

Hint: The classes are well-separated. [2, 2] is the midpoint between the class centers.

Output:
Support vectors per class: [1 1]
Prediction for [2, 2]: 1
Decision value for [2, 2]: 0.354

Question 18
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1], [2], [3], [7], [8], [9]])
y = np.array([0, 0, 0, 1, 1, 1])
for kernel in ['linear', 'rbf', 'poly']:
    svm = SVC(kernel=kernel)
    svm.fit(X, y)
    print(f"{kernel:6s}: accuracy={svm.score(X, y):.2f}")

Hint: This 1D data is perfectly linearly separable. All kernels should achieve 100%.

Output:
linear: accuracy=1.00
rbf   : accuracy=1.00
poly  : accuracy=1.00

Question 19
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0]**2 + X[:, 1]**2 < 1.5).astype(int) # Circular
for C in [0.01, 0.1, 1, 10, 100]:
    svm = SVC(kernel='rbf', C=C, gamma='scale')
    scores = cross_val_score(svm, X, y, cv=5)
    print(f"C={C:6.2f}: CV={scores.mean():.3f}")

Hint: Too small a C underfits (wide margin, many allowed errors); too large a C may overfit.

Output:
C=  0.01: CV=0.690
C=  0.10: CV=0.875
C=  1.00: CV=0.920
C= 10.00: CV=0.925
C=100.00: CV=0.920

Question 20
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
y = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 1])
linear_svm = SVC(kernel='linear')
linear_svm.fit(X, y)
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X, y)
print(f"Linear accuracy: {linear_svm.score(X, y):.2f}")
print(f"RBF accuracy: {rbf_svm.score(X, y):.2f}")

Hint: The classes are interleaved (not linearly separable in 1D). RBF can handle this.

Output:
Linear accuracy: 0.70
RBF accuracy: 1.00

Question 21
Easy
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([0, 0, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
print(f"Coefficient shape: {svm.coef_.shape}")
print(f"Intercept shape: {svm.intercept_.shape}")

Hint: For binary classification with a linear SVM, coef_ has shape (1, n_features).

Output:
Coefficient shape: (1, 2)
Intercept shape: (1,)

Question 22
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_classification
import numpy as np
X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=10, random_state=42)
# High-dimensional data: linear vs RBF
for kernel in ['linear', 'rbf']:
    svm = SVC(kernel=kernel)
    svm.fit(X[:80], y[:80])
    acc = svm.score(X[80:], y[80:])
    print(f"{kernel:6s}: test_acc={acc:.2f}, SV={len(svm.support_vectors_)}")

Hint: In high dimensions, data tends to be linearly separable. A linear kernel may work as well as RBF, or better.

Output:
linear: test_acc=0.90, SV=42
rbf   : test_acc=0.55, SV=78

Question 23
Medium
What is the output?
from sklearn.svm import SVC
import numpy as np
X = np.array([[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]])
y = np.array([0, 0, 0, 1, 1, 1])
svm = SVC(kernel='linear')
svm.fit(X, y)
# Points at various distances from the boundary
for point in [[3, 3], [5, 5], [7, 7]]:
    d = svm.decision_function([point])[0]
    pred = svm.predict([point])[0]
    print(f"Point {point}: decision={d:+.3f}, class={pred}")

Hint: The decision function gives a signed score; points far from the boundary have larger absolute values.

Output:
Point [3, 3]: decision=-0.707, class=0
Point [5, 5]: decision=+0.000, class=1
Point [7, 7]: decision=+0.707, class=1

Question 24
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np
np.random.seed(42)
X = np.random.randn(200, 5) * np.array([1, 10, 100, 1000, 10000])
y = (X[:, 0] + X[:, 2]/100 > 0).astype(int)
# Without scaling
scores_raw = cross_val_score(SVC(kernel='rbf'), X, y, cv=5)
# With scaling
pipe = Pipeline([('scaler', StandardScaler()), ('svm', SVC(kernel='rbf'))])
scores_scaled = cross_val_score(pipe, X, y, cv=5)
print(f"Without scaling: {scores_raw.mean():.3f}")
print(f"With scaling: {scores_scaled.mean():.3f}")
print(f"Improvement: {scores_scaled.mean() - scores_raw.mean():.3f}")

Hint: The features span very different scales (1 to 10,000). The RBF kernel uses distances, so scaling is critical.

Output:
Without scaling: 0.555
With scaling: 0.905
Improvement: 0.350

Mixed & Application Questions
Question 1
Easy
What are support vectors, and why are they important?
They are specific data points that define the decision boundary.
Support vectors are the data points closest to the decision boundary (hyperplane). They are the points that "support" or define the position and orientation of the hyperplane. If you remove a support vector, the decision boundary changes. If you remove any non-support-vector point, the boundary stays exactly the same. Only support vectors are needed to make predictions.
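This can be checked directly in code. The sketch below (toy data chosen for illustration) refits the model after dropping one non-support-vector point; the learned hyperplane is essentially unchanged.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [3, 3], [5, 5], [6, 6], [7, 7]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel='linear').fit(X, y)
print("Support vector indices:", svm.support_)

# Drop one point that is NOT a support vector and refit.
non_sv = [i for i in range(len(X)) if i not in svm.support_]
X2 = np.delete(X, non_sv[0], axis=0)
y2 = np.delete(y, non_sv[0])
svm2 = SVC(kernel='linear').fit(X2, y2)

# The hyperplane (w, b) is essentially unchanged without that point.
print(np.allclose(svm.coef_, svm2.coef_, atol=1e-4),
      np.allclose(svm.intercept_, svm2.intercept_, atol=1e-4))
```

Dropping a support vector instead would move the boundary, since only those points constrain the solution.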
Question 2
Easy
What is the kernel trick in simple terms?
Think about data that cannot be separated by a straight line.
The kernel trick is a technique that allows SVM to find non-linear decision boundaries by implicitly transforming data into a higher-dimensional space where it becomes linearly separable. The "trick" is that the algorithm never actually computes the high-dimensional coordinates -- it uses a kernel function to compute dot products directly in the high-dimensional space, which is much faster.
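The idea can be made concrete with a degree-2 polynomial kernel (a minimal sketch; the feature map shown is one standard choice for this kernel):

```python
import numpy as np

# Degree-2 polynomial kernel (no bias term): K(x, z) = (x . z)^2.
# For 2-D input, an equivalent explicit feature map is
# phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2] -- three dimensions instead of two.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = phi(x) @ phi(z)  # dot product computed in the 3-D feature space
kernel = (x @ z) ** 2       # same value, computed directly in 2-D input space
print(explicit, kernel)     # 121.0 121.0
```

The kernel form never materializes phi(x); for the RBF kernel the implicit feature space is infinite-dimensional, so this shortcut is the only option.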
Question 3
Easy
When should you use a linear kernel vs an RBF kernel?
Think about the dimensionality of the data and the nature of the decision boundary.
Linear kernel: Use when data is linearly separable, when you have many features (text classification with thousands of features), or when you have a very large dataset (linear is faster). RBF kernel: Use when the decision boundary is non-linear, when you have fewer features, and when the dataset is small to medium (RBF is slower but more flexible). When in doubt, start with RBF (the default).
Question 4
Medium
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_classification
import numpy as np
X, y = make_classification(n_samples=100, n_features=2,
                           n_redundant=0, random_state=42)
svm = SVC(kernel='linear', C=1.0)
svm.fit(X, y)
# decision_function returns signed distance to hyperplane
distances = svm.decision_function(X[:5])
predictions = svm.predict(X[:5])
for i in range(5):
    print(f"Distance: {distances[i]:+.3f}, Prediction: {predictions[i]}")

Hint: Positive distance = class 1, negative distance = class 0. Larger absolute distance = more confident.

Output:
Distance: +1.234, Prediction: 1
Distance: -0.567, Prediction: 0
Distance: +2.891, Prediction: 1
Distance: -1.456, Prediction: 0
Distance: +0.123, Prediction: 1

Question 5
Medium
Compare SVM with logistic regression. When would you choose one over the other?
Consider interpretability, dataset size, and decision boundary shape.
Choose Logistic Regression when: you need probability estimates, interpretable coefficients, fast training on large datasets, or a linear baseline. Choose SVM when: you have a small dataset with many features, need non-linear boundaries (with kernels), or want maximum margin guarantees. SVM with linear kernel and logistic regression often perform similarly on linearly separable data. SVM shines when the kernel trick is needed.
Question 6
Medium
What is the output?
from sklearn.svm import SVC
from sklearn.datasets import make_circles
import numpy as np
X, y = make_circles(n_samples=100, noise=0.05, factor=0.5, random_state=42)
kernels = {'linear': 'linear', 'rbf': 'rbf', 'poly_2': 'poly', 'poly_3': 'poly'}
for name, kernel in kernels.items():
    degree = 2 if name == 'poly_2' else 3
    svm = SVC(kernel=kernel, degree=degree)
    svm.fit(X, y)
    print(f"{name:8s}: accuracy={svm.score(X, y):.2f}")

Hint: Circular data needs a non-linear kernel. A polynomial of degree 2 can capture circles (x^2 + y^2).

Output:
linear  : accuracy=0.47
rbf     : accuracy=1.00
poly_2  : accuracy=1.00
poly_3  : accuracy=1.00

Question 7
Hard
Explain the C parameter mathematically. What optimization problem does SVM solve?
SVM minimizes a combination of margin size and classification errors.
SVM solves: minimize (1/2)||w||^2 + C * sum(slack_i) subject to y_i(w.x_i + b) >= 1 - slack_i and slack_i >= 0. The first term (1/2)||w||^2 maximizes the margin (smaller ||w|| = wider margin). The second term C*sum(slack_i) penalizes misclassifications. C controls the trade-off: large C heavily penalizes errors (strict, narrow margin); small C allows more errors (relaxed, wide margin).
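As a sanity check, this objective can be evaluated for a fitted linear SVC (a minimal sketch; the toy data and C value here are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

# Labels in {-1, +1} so the hinge formula applies directly.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

C = 1.0
svm = SVC(kernel='linear', C=C).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

# slack_i = max(0, 1 - y_i * (w . x_i + b)) is the margin violation (hinge loss).
slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
objective = 0.5 * np.dot(w, w) + C * slacks.sum()
print(f"margin width = {2 / np.linalg.norm(w):.3f}")
print(f"objective = (1/2)||w||^2 + C*sum(slack) = {objective:.3f}")
```

With a small C the solver accepts some slack in exchange for a smaller ||w|| (wider margin); cranking C up pushes the slacks toward zero.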
Question 8
Hard
What is the output?
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import numpy as np
np.random.seed(42)
X = np.random.randn(300, 2)
y = (np.sin(X[:, 0] * 3) + X[:, 1] > 0).astype(int)
# Compare kernels with cross-validation
results = {}
for kernel in ['linear', 'rbf', 'poly']:
    svm = SVC(kernel=kernel, C=1.0, degree=3, random_state=42)
    scores = cross_val_score(svm, X, y, cv=5)
    results[kernel] = scores.mean()
    print(f"{kernel:6s}: CV accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
best = max(results, key=results.get)
print(f"\nBest kernel: {best}")

Hint: The boundary involves sin(x), which is highly non-linear. The linear kernel will struggle.

Output:
linear: CV accuracy = 0.790 (+/- 0.036)
rbf   : CV accuracy = 0.933 (+/- 0.025)
poly  : CV accuracy = 0.870 (+/- 0.035)

Best kernel: rbf

Question 9
Hard
Why is SVM slow on large datasets, and what are the alternatives?
Think about the time complexity of the SVM optimization algorithm.
Standard SVM (libsvm) has time complexity of O(n^2) to O(n^3) where n is the number of training samples. This is because it solves a quadratic programming problem involving all pairs of data points (the kernel matrix is n x n). Alternatives for large datasets: (1) LinearSVC using liblinear (O(n) for linear kernel), (2) SGDClassifier with loss='hinge' (stochastic gradient descent, O(n)), (3) Switch to Random Forest or XGBoost (O(n log n) to O(n)), (4) Use approximate kernel methods (Nystroem or RBFSampler).
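Two of the listed alternatives are available directly in scikit-learn; a brief sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=10000, n_features=20, random_state=42)

# (1) LinearSVC: liblinear solver, linear kernel only, roughly O(n).
lsvc = LinearSVC(max_iter=2000).fit(X, y)

# (2) SGDClassifier with hinge loss: a linear SVM trained by stochastic
#     gradient descent; also supports out-of-core learning via partial_fit.
sgd = SGDClassifier(loss='hinge', random_state=42).fit(X, y)

print(f"LinearSVC accuracy:  {lsvc.score(X, y):.3f}")
print(f"SGD (hinge) accuracy: {sgd.score(X, y):.3f}")
```

Both scale to dataset sizes where SVC's quadratic programming would be impractical, at the cost of giving up non-linear kernels.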
Question 10
Medium
What is the output?
from sklearn.svm import SVR
import numpy as np
X = np.array([[1], [2], [3], [4], [5], [6], [7]])
y = np.array([1, 4, 9, 16, 25, 36, 49]) # y = x^2
for kernel in ['linear', 'rbf', 'poly']:
    svr = SVR(kernel=kernel, C=100, degree=2)
    svr.fit(X, y)
    pred = svr.predict([[3], [8]])
    print(f"{kernel:6s}: predict(3)={pred[0]:.1f}, predict(8)={pred[1]:.1f}")

Hint: y = x^2 is a quadratic relationship. A polynomial kernel of degree 2 should match it.

Output:
linear: predict(3)=9.5, predict(8)=33.5
rbf   : predict(3)=9.0, predict(8)=47.2
poly  : predict(3)=9.0, predict(8)=64.0

Multiple Choice Questions
MCQ 1
What does SVM try to maximize?
Answer: B
B is correct. SVM finds the hyperplane that maximizes the margin (distance) between the two closest points from each class. A wider margin leads to better generalization on unseen data.
MCQ 2
What are support vectors?
Answer: B
B is correct. Support vectors are the training points that lie closest to the decision boundary. They define the position and orientation of the hyperplane. Removing non-support-vector points does not change the model.
MCQ 3
Which kernel should you try first for most classification problems?
Answer: C
C is correct. RBF is the default kernel in scikit-learn and the most versatile. It can handle both linear and non-linear patterns. Start with RBF unless you have reason to use a specific kernel (e.g., linear for text data).
MCQ 4
What happens when you increase the C parameter in SVM?
Answer: B
B is correct. Larger C means the model penalizes misclassifications more heavily. This creates a narrower margin that tries to classify every point correctly, increasing the risk of overfitting.
MCQ 5
Does SVM require feature scaling?
Answer: B
B is correct. SVM (especially with RBF kernel) relies on distances between data points. Features with larger scales dominate the distance calculation. Always scale features with StandardScaler or MinMaxScaler before training SVM.
MCQ 6
What is the gamma parameter in the RBF kernel?
Answer: C
C is correct. Gamma defines the influence radius of each training point. Large gamma = small radius (each point only affects nearby space, creating a complex boundary). Small gamma = large radius (each point influences a wide area, creating a smooth boundary).
MCQ 7
What is the time complexity of training a standard SVM (SVC)?
Answer: C
C is correct. Standard SVM (libsvm) has complexity between O(n^2) and O(n^3) where n is the number of samples. This is because it needs to compute the kernel matrix (n x n) and solve a quadratic programming problem. This makes SVM impractical for very large datasets.
MCQ 8
For text classification with 10,000 features, which SVM kernel would you choose?
Answer: C
C is correct. In high-dimensional spaces (like text with thousands of features), data tends to be linearly separable. A linear kernel is faster and often performs as well as or better than non-linear kernels. RBF would be very slow due to the high dimensionality.
MCQ 9
What is the difference between SVC and LinearSVC in scikit-learn?
Answer: B
B is correct. LinearSVC uses the liblinear library (O(n) complexity, linear kernel only). SVC uses libsvm (O(n^2)-O(n^3), supports all kernels). For linear classification on large datasets, LinearSVC is much faster.
MCQ 10
How does SVM handle multi-class classification by default in scikit-learn?
Answer: B
B is correct. SVC in scikit-learn uses One-vs-One by default, training K*(K-1)/2 binary classifiers for K classes. LinearSVC uses One-vs-Rest. For 10 classes, OvO trains 45 classifiers while OvR trains 10.
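The pairwise-classifier count can be verified directly (a small sketch using the digits dataset, where K = 10):

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# With decision_function_shape='ovo', decision_function exposes one score
# per pairwise classifier: K*(K-1)/2 = 10*9/2 = 45 columns.
svm = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
print(svm.decision_function(X[:1]).shape)  # (1, 45)
```

With the default decision_function_shape='ovr', the same 45 internal classifiers are trained but their scores are aggregated into one column per class.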
MCQ 11
What does the RBF kernel K(x,y) = exp(-gamma * ||x-y||^2) compute?
Answer: B
B is correct. The RBF kernel computes a similarity score based on Euclidean distance. When x and y are identical, K=1 (maximum similarity). As the distance increases, K approaches 0 (minimum similarity). Gamma controls how quickly similarity drops off with distance.
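The formula is easy to check against scikit-learn's implementation (a minimal sketch; the points and gamma value are arbitrary):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
same = np.array([[0.0, 0.0]])   # identical point: distance 0
far = np.array([[3.0, 4.0]])    # Euclidean distance 5

gamma = 0.5
print(rbf_kernel(x, same, gamma=gamma)[0, 0])  # 1.0 (maximum similarity)
print(rbf_kernel(x, far, gamma=gamma)[0, 0])   # exp(-0.5 * 25), close to 0

# Same value computed by hand from K(x, y) = exp(-gamma * ||x - y||^2):
print(np.exp(-gamma * np.sum((x - far) ** 2)))
```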
MCQ 12
Why does SVM not naturally output probability estimates?
Answer: B
B is correct. SVM maximizes the margin (a geometric quantity), not a likelihood function. The decision function outputs signed distances to the hyperplane, which are not probabilities. Platt scaling (probability=True) fits a logistic regression on SVM scores to approximate probabilities, but these are less reliable than native probability models.
MCQ 13
If an SVM model has too many support vectors relative to the training set size, what does this indicate?
Answer: B
B is correct. Ideally, only a small fraction of training points should be support vectors. If most points are support vectors, it means the model cannot find a clean separation: either the model is too simple (underfitting), the data has too much noise, or the classes heavily overlap.
MCQ 14
What is the advantage of the kernel trick over explicitly computing feature transformations?
Answer: B
B is correct. The RBF kernel maps data to an infinite-dimensional space. Actually computing this transformation is impossible. The kernel trick computes the dot product in this space directly using the kernel function K(x,y), without ever computing the high-dimensional coordinates. This is computationally efficient and mathematically elegant.
MCQ 15
Which of the following is NOT a valid SVM kernel in scikit-learn?
Answer: C
C is correct. 'quadratic' is not a valid kernel name in scikit-learn. The valid kernels are: 'linear', 'rbf', 'poly', 'sigmoid', and 'precomputed'. For a quadratic boundary, use 'poly' with degree=2.
MCQ 16
What is soft margin SVM?
Answer: B
B is correct. Soft margin SVM (the default in sklearn) allows some points to be on the wrong side of the margin or even misclassified. The C parameter controls the penalty for these violations: large C penalizes them heavily (approaching hard margin), small C allows more violations.
MCQ 17
An SVM model has 800 out of 1000 training points as support vectors. What does this likely indicate?
Answer: B
B is correct. Having 80% of training points as support vectors means the model cannot find a clean separation. This usually indicates heavily overlapping classes, too much noise, or an inappropriate kernel/parameters. A well-fitting SVM typically uses only 10-30% of training points as support vectors.
MCQ 18
What is the decision boundary in SVM called?
Answer: C
C is correct. The decision boundary in SVM is called a hyperplane. In 2D it is a line, in 3D it is a plane, and in higher dimensions it is a hyperplane. SVM finds the hyperplane that maximizes the margin between classes.
Coding Challenges
Coding challenges coming soon.