Chapter 6 Intermediate 58 Questions

Practice Questions — Logistic Regression and Classification

12 Easy
14 Medium
12 Hard

Topic-Specific Questions

Question 1
Easy
What is the output of the following code?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))
When z=0, e^0 = 1, so sigmoid(0) = 1/(1+1).
0.5
Question 2
Easy
What is the output?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(100) > 0.99)
print(sigmoid(-100) < 0.01)
For very large positive z, sigmoid approaches 1. For very large negative z, it approaches 0.
True
True
Question 3
Easy
What is the output?
y_true = 1
y_pred_prob = 0.9

if y_pred_prob >= 0.5:
    y_pred = 1
else:
    y_pred = 0

print(f"Predicted: {y_pred}")
print(f"Correct: {y_pred == y_true}")
0.9 is above the 0.5 threshold, so the prediction is class 1.
Predicted: 1
Correct: True
Question 4
Easy
What is the output?
import numpy as np

# Softmax function
def softmax(z):
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

scores = [2.0, 1.0, 0.5]
probs = softmax(scores)
print(np.round(probs, 3))
print(f"Sum: {np.sum(probs):.1f}")
Softmax converts scores to probabilities that sum to 1.
[0.629 0.231 0.14 ]
Sum: 1.0
Question 5
Easy
What is the output?
# Confusion matrix values
TP = 40
TN = 50
FP = 10
FN = 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
print(f"Accuracy: {accuracy:.2%}")
Accuracy is the ratio of all correct predictions (TP + TN) to total predictions.
Accuracy: 85.71%
Question 6
Easy
What is the output?
TP = 40
FP = 10
FN = 5

precision = TP / (TP + FP)
recall = TP / (TP + FN)
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
Precision = TP/(TP+FP). Recall = TP/(TP+FN).
Precision: 0.80
Recall: 0.89
Question 7
Medium
What is the output?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Simple model: w=2, b=-3
w = 2.0
b = -3.0

for x in [0, 1, 1.5, 2, 3]:
    z = w * x + b
    p = sigmoid(z)
    pred = 1 if p >= 0.5 else 0
    print(f"x={x}, z={z:.1f}, p={p:.3f}, pred={pred}")
The decision boundary is where z=0, i.e., 2*x - 3 = 0, i.e., x = 1.5.
x=0, z=-3.0, p=0.047, pred=0
x=1, z=-1.0, p=0.269, pred=0
x=1.5, z=0.0, p=0.500, pred=1
x=2, z=1.0, p=0.731, pred=1
x=3, z=3.0, p=0.953, pred=1
Question 8
Medium
What is the output?
import numpy as np

def log_loss(y_true, y_pred):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Good vs. bad predictions
y = np.array([1, 0, 1, 0])
p_good = np.array([0.95, 0.05, 0.90, 0.10])
p_bad = np.array([0.10, 0.90, 0.20, 0.80])

print(f"Good predictions loss: {log_loss(y, p_good):.4f}")
print(f"Bad predictions loss:  {log_loss(y, p_bad):.4f}")
Good predictions (close to true labels) have low loss. Bad predictions have high loss.
Good predictions loss: 0.0783
Bad predictions loss:  1.9560
Question 9
Medium
What is the output?
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
Row 0 = actual negatives, Row 1 = actual positives. Column 0 = predicted negative, Column 1 = predicted positive.
[[4 1]
 [2 3]]
Question 10
Medium
What is the output?
precision = 0.8
recall = 0.6

f1 = 2 * precision * recall / (precision + recall)
print(f"F1 Score: {f1:.4f}")
F1 is the harmonic mean of precision and recall.
F1 Score: 0.6857
Question 11
Medium
What is the output?
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression(random_state=42)
model.fit(X, y)

print(model.predict([[4.5]]))
print(model.predict_proba([[4.5]]).round(3))
4.5 is right at the boundary between the two classes (0s are 1-4, 1s are 5-8).
[1]
[[0.472 0.528]]
Question 12
Medium
What is the output?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Verify the derivative property
z = 2.0
s = sigmoid(z)
derivative = s * (1 - s)

# Numerical derivative
h = 1e-7
num_derivative = (sigmoid(z + h) - sigmoid(z)) / h

print(f"Analytical derivative: {derivative:.6f}")
print(f"Numerical derivative:  {num_derivative:.6f}")
print(f"Match: {abs(derivative - num_derivative) < 1e-5}")
The derivative of sigmoid is sigmoid(z) * (1 - sigmoid(z)). The numerical derivative should match.
Analytical derivative: 0.104994
Numerical derivative:  0.104994
Match: True
Question 13
Hard
What is the output?
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))  # subtract max for numerical stability
    return exp_z / np.sum(exp_z)

# Equal scores
probs1 = softmax([1, 1, 1])
print("Equal scores:", np.round(probs1, 3))

# One dominant score
probs2 = softmax([10, 1, 1])
print("Dominant score:", np.round(probs2, 3))

# Very large differences
probs3 = softmax([100, 0, 0])
print("Extreme:", np.round(probs3, 3))
Softmax with equal inputs gives uniform probabilities. Large differences push probabilities toward 0 and 1.
Equal scores: [0.333 0.333 0.333]
Dominant score: [1. 0. 0.]
Extreme: [1. 0. 0.]
Question 14
Hard
What is the output?
from sklearn.metrics import precision_score, recall_score, f1_score

# Scenario: disease detection
# 100 patients: 10 have disease (positive), 90 do not (negative)
y_true = [0]*90 + [1]*10

# Model A: predicts all negative
y_pred_a = [0]*100
print(f"Model A recall: {recall_score(y_true, y_pred_a):.2f}")

# Model B: predicts all positive
y_pred_b = [1]*100
print(f"Model B precision: {precision_score(y_true, y_pred_b):.2f}")
print(f"Model B recall: {recall_score(y_true, y_pred_b):.2f}")
Model A catches no positives. Model B catches all positives but also flags all negatives.
Model A recall: 0.00
Model B precision: 0.10
Model B recall: 1.00
Question 15
Hard
What is the output?
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1, 2], [2, 3], [3, 4], [5, 6], [6, 7], [7, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression(random_state=42)
model.fit(X, y)

# Get coefficients
w1, w2 = model.coef_[0]
b = model.intercept_[0]
print(f"w1={w1:.3f}, w2={w2:.3f}, b={b:.3f}")

# Decision boundary: w1*x1 + w2*x2 + b = 0
# At x1=4, solve for x2
x1 = 4
x2_boundary = -(w1 * x1 + b) / w2
print(f"At x1=4, boundary x2={x2_boundary:.2f}")
The decision boundary is the line w1*x1 + w2*x2 + b = 0. Solve for x2 given x1.
w1=0.596, w2=0.596, b=-4.769
At x1=4, boundary x2=4.00
Question 16
Hard
What is the output?
import numpy as np

# Gradient descent step for logistic regression
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[1, 2], [3, 4]])
y = np.array([0, 1])
w = np.array([0.0, 0.0])
b = 0.0
lr = 0.1

# One gradient descent step
z = X.dot(w) + b
preds = sigmoid(z)
error = preds - y

dw = (1/len(y)) * X.T.dot(error)
db = (1/len(y)) * np.sum(error)

w_new = w - lr * dw
b_new = b - lr * db

print(f"Initial predictions: {np.round(preds, 3)}")
print(f"Errors: {np.round(error, 3)}")
print(f"dw: {np.round(dw, 4)}")
print(f"New weights: {np.round(w_new, 4)}")
Initial weights are zero, so z=0 for all samples, sigmoid(0)=0.5 for all samples.
Initial predictions: [0.5 0.5]
Errors: [ 0.5 -0.5]
dw: [-0.5 -0.5]
New weights: [0.05 0.05]
Question 17
Hard
What is the output?
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression(random_state=42)
model.fit(X, y)

# Predict probabilities for threshold analysis
for threshold in [0.3, 0.5, 0.7]:
    y_pred = (model.predict_proba(X)[:, 1] >= threshold).astype(int)
    tp = sum((y == 1) & (y_pred == 1))
    fp = sum((y == 0) & (y_pred == 1))
    fn = sum((y == 1) & (y_pred == 0))
    prec = tp / (tp + fp) if (tp + fp) > 0 else 0
    rec = tp / (tp + fn) if (tp + fn) > 0 else 0
    print(f"Threshold={threshold}: Precision={prec:.2f}, Recall={rec:.2f}")
Lower threshold predicts more positives (higher recall, lower precision). Higher threshold is more selective.
Threshold=0.3: Precision=0.83, Recall=1.00
Threshold=0.5: Precision=1.00, Recall=1.00
Threshold=0.7: Precision=1.00, Recall=0.80
Question 18
Easy
What is the output?
TP = 30
TN = 60
FP = 10
FN = 0

recall = TP / (TP + FN)
print(f"Recall: {recall:.2f}")
FN = 0 means no actual positives were missed.
Recall: 1.00
Question 19
Medium
What is the output?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Property check: does sigmoid(z) + sigmoid(-z) = 1?
for z in [0, 2, -3, 10]:
    s_pos = sigmoid(z)
    s_neg = sigmoid(-z)
    print(f"sigmoid({z:3d}) + sigmoid({-z:3d}) = {s_pos + s_neg:.4f}")
This is a mathematical property of the sigmoid function: sigma(z) + sigma(-z) = 1.
sigmoid(  0) + sigmoid(  0) = 1.0000
sigmoid(  2) + sigmoid( -2) = 1.0000
sigmoid( -3) + sigmoid(  3) = 1.0000
sigmoid( 10) + sigmoid(-10) = 1.0000
Question 20
Easy
What is the output?
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 0, 1, 1, 1])

model = LogisticRegression(random_state=42)
model.fit(X, y)

print(f"Classes: {model.classes_}")
print(f"Number of features: {model.n_features_in_}")
model.classes_ shows the unique classes, and n_features_in_ shows input feature count.
Classes: [0 1]
Number of features: 1
Question 21
Hard
What is the output?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Binary cross-entropy for perfect vs terrible predictions
y_true = np.array([1, 0, 1, 0])

# Case 1: Perfect predictions
p_perfect = np.array([0.999, 0.001, 0.999, 0.001])
loss_perfect = -np.mean(y_true * np.log(p_perfect) + (1 - y_true) * np.log(1 - p_perfect))

# Case 2: Terrible predictions (confident and wrong)
p_terrible = np.array([0.001, 0.999, 0.001, 0.999])
loss_terrible = -np.mean(y_true * np.log(p_terrible) + (1 - y_true) * np.log(1 - p_terrible))

print(f"Perfect predictions loss: {loss_perfect:.4f}")
print(f"Terrible predictions loss: {loss_terrible:.4f}")
print(f"Ratio: {loss_terrible/loss_perfect:.1f}x")
Log loss heavily penalizes confident wrong predictions. Being confidently wrong is much worse than being unsure.
Perfect predictions loss: 0.0010
Terrible predictions loss: 6.9078
Ratio: 6904.3x
Question 22
Medium
What is the output?
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

np.random.seed(42)
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# C controls regularization strength
for C in [0.001, 0.1, 1, 100]:
    model = LogisticRegression(C=C, random_state=42, max_iter=1000)
    model.fit(X, y)
    acc = model.score(X, y)
    coef_magnitude = np.sum(np.abs(model.coef_))
    print(f"C={C:6.3f}: accuracy={acc:.3f}, |coef|={coef_magnitude:.3f}")
Small C means strong regularization (small coefficients). Large C means weak regularization (larger coefficients).
C= 0.001: accuracy=0.870, |coef|=0.014
C= 0.100: accuracy=0.920, |coef|=1.023
C= 1.000: accuracy=0.930, |coef|=2.456
C=100.000: accuracy=0.930, |coef|=3.012
Question 23
Easy
What is the output?
from sklearn.metrics import classification_report

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 0]

report = classification_report(y_true, y_pred, output_dict=True)
print(f"Class 0 precision: {report['0']['precision']:.2f}")
print(f"Class 1 recall: {report['1']['recall']:.2f}")
print(f"Overall accuracy: {report['accuracy']:.2f}")
For class 0: TP_0=2 (predicted 0, actual 0). For class 1: TP_1=3 (predicted 1, actual 1), FN_1=2 (predicted 0, actual 1).
Class 0 precision: 0.50
Class 1 recall: 0.60
Overall accuracy: 0.62
Question 24
Hard
What is the output?
from sklearn.linear_model import LogisticRegression
import numpy as np

np.random.seed(42)
X = np.random.randn(200, 2)
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int)  # Circular boundary

model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X, y)
print(f"Accuracy: {model.score(X, y):.3f}")
print(f"Can logistic regression capture circular boundaries? {'Yes' if model.score(X, y) > 0.85 else 'No'}")
Logistic regression creates a linear (straight line) decision boundary. A circular boundary is non-linear.
Accuracy: 0.665
Can logistic regression capture circular boundaries? No
Question 25
Medium
What is the output?
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[1, 0], [0, 1], [1, 1],
              [4, 3], [3, 4], [4, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression(random_state=42)
model.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] for each sample
proba = model.predict_proba([[2, 2]])[0]
print(f"P(class 0) + P(class 1) = {proba[0] + proba[1]:.1f}")
print(f"Prediction: {model.predict([[2, 2]])[0]}")
Probabilities from predict_proba always sum to 1.
P(class 0) + P(class 1) = 1.0
Prediction: 0
Question 26
Hard
What is the output?
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
X, y = iris.data, iris.target

model = LogisticRegression(multi_class='multinomial', max_iter=1000, random_state=42)
model.fit(X, y)

print(f"Coefficient shape: {model.coef_.shape}")
print(f"Intercept shape: {model.intercept_.shape}")
print(f"Number of classes: {len(model.classes_)}")
print(f"Accuracy: {model.score(X, y):.4f}")
For K classes with D features, the coefficient matrix is K x D.
Coefficient shape: (3, 4)
Intercept shape: (3,)
Number of classes: 3
Accuracy: 0.9733

Mixed & Application Questions

Question 1
Easy
What is the key difference between linear regression and logistic regression?
Think about what each one predicts.
Linear regression predicts a continuous value (e.g., price, temperature). Logistic regression predicts a probability of belonging to a class (e.g., spam or not spam). Linear regression outputs any real number; logistic regression outputs a value between 0 and 1 using the sigmoid function.
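The difference is easy to see in code. A minimal sketch with made-up toy data (not from the chapter):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4]])

# Linear regression: continuous target, predictions can be any real number.
lin_model = LinearRegression().fit(X, [10.0, 20.0, 30.0, 40.0])
print(lin_model.predict([[10]]))  # extrapolates to 100, far outside the training targets

# Logistic regression: class labels, predict_proba stays strictly in (0, 1).
log_model = LogisticRegression().fit(X, [0, 0, 1, 1])
print(log_model.predict_proba([[10]])[0, 1])  # a probability, never above 1
```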
Question 2
Easy
What does the sigmoid function do, and what is its formula?
It maps any real number to a specific range.
The sigmoid function maps any real number to a value between 0 and 1. Formula: sigma(z) = 1 / (1 + e^(-z)). When z=0, output is 0.5. When z is very positive, output approaches 1. When z is very negative, output approaches 0.
Question 3
Easy
In a confusion matrix, what is the difference between a False Positive (FP) and a False Negative (FN)?
The second word tells you the prediction, the first word tells you if it was correct.
False Positive (FP): The model predicted positive, but the actual label is negative (false alarm). False Negative (FN): The model predicted negative, but the actual label is positive (missed detection).
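A quick way to count both error types (toy labels, chosen for illustration):

```python
import numpy as np

y_true = np.array([1, 0, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 1])

# False Positive: predicted positive, actually negative (false alarm).
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
# False Negative: predicted negative, actually positive (missed detection).
fn = int(np.sum((y_pred == 0) & (y_true == 1)))

print(f"FP={fp}, FN={fn}")  # FP=2, FN=1
```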
Question 4
Medium
What is the output?
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.array([[2, 3], [4, 5], [6, 7], [8, 9]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression(random_state=42)
model.fit(X, y)

proba = model.predict_proba([[5, 6]])[0]
print(f"P(class 0): {proba[0]:.3f}")
print(f"P(class 1): {proba[1]:.3f}")
print(f"Sum: {sum(proba):.1f}")
predict_proba returns probabilities for each class. They always sum to 1.
P(class 0): 0.500
P(class 1): 0.500
Sum: 1.0
Question 5
Medium
What is the output?
from sklearn.metrics import f1_score
import numpy as np

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Model A: predicts all negative
y_pred_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Model B: correctly identifies 2 positives, 1 false positive
y_pred_b = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(f"Model A F1: {f1_score(y_true, y_pred_a):.2f}")
print(f"Model B F1: {f1_score(y_true, y_pred_b):.2f}")
Model A has 0 true positives. Model B has TP=2, FP=1, FN=1.
Model A F1: 0.00
Model B F1: 0.67
Question 6
Medium
Why can we not use Mean Squared Error (MSE) as the loss function for logistic regression?
Think about the shape of the loss surface when sigmoid is combined with MSE.
When MSE is used with the sigmoid function, the loss surface becomes non-convex with many local minima. Gradient descent can get stuck in a local minimum and fail to find the optimal solution. Log loss (binary cross-entropy) creates a convex loss surface for logistic regression, guaranteeing that gradient descent finds the global minimum.
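This can be checked numerically with a midpoint convexity test on a one-sample loss (a sketch; the interval [-6, -2] is just one region where sigmoid-plus-MSE bends the wrong way):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One sample with x=1, y=1, so the prediction is p = sigmoid(w).
def mse_loss(w):
    return (sigmoid(w) - 1) ** 2

def log_loss(w):
    return -np.log(sigmoid(w))

# For a convex function, f(midpoint) <= average of the endpoint values.
a, b = -6.0, -2.0
mid = (a + b) / 2

print("MSE+sigmoid violates convexity:", mse_loss(mid) > (mse_loss(a) + mse_loss(b)) / 2)
print("Log loss satisfies the test:   ", log_loss(mid) <= (log_loss(a) + log_loss(b)) / 2)
```

Both lines print True: the MSE curve lies above its chord on that interval (concave there, hence non-convex overall), while log loss stays below its chord.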
Question 7
Medium
What is the output?
import numpy as np

# Sigmoid derivative property
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z_values = [0, 1, -1, 5, -5]
for z in z_values:
    s = sigmoid(z)
    deriv = s * (1 - s)
    print(f"z={z:2d}: sigmoid={s:.4f}, derivative={deriv:.4f}")
The derivative is maximal at z=0 (where sigmoid=0.5) and approaches 0 for extreme z values.
z= 0: sigmoid=0.5000, derivative=0.2500
z= 1: sigmoid=0.7311, derivative=0.1966
z=-1: sigmoid=0.2689, derivative=0.1966
z= 5: sigmoid=0.9933, derivative=0.0066
z=-5: sigmoid=0.0067, derivative=0.0066
Question 8
Medium
Explain the difference between One-vs-Rest and Softmax for multi-class classification.
One trains multiple binary classifiers; the other trains a single unified model.
One-vs-Rest (OvR): Trains K separate binary classifiers, one per class. Each classifier learns "is this class X or not?" At prediction time, the class with the highest probability wins. Softmax: A single model with K output units. The softmax function converts raw scores to probabilities that sum to 1. All classes are considered simultaneously in a unified model.
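A sketch of both strategies in sklearn (the iris dataset stands in for any 3-class problem; recent sklearn versions use the multinomial formulation by default):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One-vs-Rest: one binary classifier per class, trained independently.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(f"OvR trains {len(ovr.estimators_)} binary models")

# Multinomial (softmax): a single model scoring all classes jointly.
softmax_model = LogisticRegression(max_iter=1000).fit(X, y)
print(f"Softmax probabilities sum to {softmax_model.predict_proba(X[:1]).sum():.1f}")
```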
Question 9
Hard
What is the output?
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
import numpy as np

# Feature 1: study hours (1-10)
# Feature 2: height in cm (150-190) -- irrelevant!
X = np.array([[2, 160], [3, 175], [4, 155], [7, 180], [8, 165], [9, 170]])
y = np.array([0, 0, 0, 1, 1, 1])

# Without scaling
model1 = LogisticRegression(random_state=42, max_iter=1000)
model1.fit(X, y)
print("Without scaling, coefs:", np.round(model1.coef_[0], 4))

# With scaling
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
model2 = LogisticRegression(random_state=42, max_iter=1000)
model2.fit(X_scaled, y)
print("With scaling, coefs:", np.round(model2.coef_[0], 4))
Without scaling, the height coefficient looks tiny partly because height's raw values are large, so raw coefficient sizes are misleading. With scaling, coefficient magnitudes reflect each feature's actual influence.
Without scaling, coefs: [0.3986 0.0013]
With scaling, coefs: [1.4753 0.0398]
Question 10
Hard
What is the output?
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Conservative model (high threshold)
y_pred_conservative = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]

# Aggressive model (low threshold)
y_pred_aggressive = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

print("Conservative:")
print(f"  Precision: {precision_score(y_true, y_pred_conservative):.2f}")
print(f"  Recall: {recall_score(y_true, y_pred_conservative):.2f}")
print("Aggressive:")
print(f"  Precision: {precision_score(y_true, y_pred_aggressive):.2f}")
print(f"  Recall: {recall_score(y_true, y_pred_aggressive):.2f}")
Conservative predicts fewer positives (fewer FP but more FN). Aggressive predicts more positives (fewer FN but more FP).
Conservative:
  Precision: 1.00
  Recall: 0.40
Aggressive:
  Precision: 0.71
  Recall: 1.00
Question 11
Hard
A hospital uses a model to predict if a patient has cancer. Which metric should they prioritize: precision or recall? Why?
Consider the cost of missing a cancer patient vs the cost of a false alarm.
The hospital should prioritize recall. Missing a cancer patient (False Negative) is far more dangerous than a false alarm (False Positive). A missed cancer case means the patient does not receive treatment, which can be fatal. A false alarm means additional tests, which is inconvenient but not life-threatening. High recall ensures the model catches as many actual cancer cases as possible.
Question 12
Hard
What is the output?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Logistic regression prediction
weights = np.array([0.5, -0.3, 0.8])
bias = -0.2

# Two students
student_a = np.array([8, 3, 7])  # [study_hours, absences, assignment_score]
student_b = np.array([2, 8, 3])

for name, student in [('Aarav', student_a), ('Priya', student_b)]:
    z = np.dot(weights, student) + bias
    p = sigmoid(z)
    print(f"{name}: z={z:.2f}, P(pass)={p:.4f}, Prediction={'Pass' if p>=0.5 else 'Fail'}")
Compute z = w1*x1 + w2*x2 + w3*x3 + b for each student, then apply sigmoid.
Aarav: z=8.50, P(pass)=0.9998, Prediction=Pass
Priya: z=0.80, P(pass)=0.6900, Prediction=Pass

Multiple Choice Questions

MCQ 1
What does the sigmoid function output?
  • A. Any real number
  • B. Only 0 or 1
  • C. A value between 0 and 1
  • D. A value between -1 and 1
Answer: C
C is correct. The sigmoid function maps any real number to a value strictly between 0 and 1. It never actually reaches 0 or 1 (only approaches them asymptotically), making it perfect for representing probabilities.
MCQ 2
What type of problem does logistic regression solve?
  • A. Regression (predicting continuous values)
  • B. Classification (predicting categories)
  • C. Clustering (grouping data)
  • D. Dimensionality reduction
Answer: B
B is correct. Despite its name containing "regression", logistic regression is a classification algorithm. It predicts the probability of belonging to a class, not a continuous value.
MCQ 3
What is the default decision threshold in logistic regression?
  • A. 0.25
  • B. 0.5
  • C. 0.75
  • D. 1.0
Answer: B
B is correct. By default, if the predicted probability is >= 0.5, the model predicts class 1; otherwise class 0. This threshold can be adjusted based on the application (e.g., lower for disease detection to increase recall).
MCQ 4
What is a True Positive (TP)?
  • A. Predicted positive, actually negative
  • B. Predicted negative, actually positive
  • C. Predicted positive, actually positive
  • D. Predicted negative, actually negative
Answer: C
C is correct. A True Positive means the model predicted positive and the actual label is also positive. The prediction was correct. In sklearn's confusion matrix layout (rows = actual, columns = predicted, class 0 first), TP is the bottom-right cell.
MCQ 5
Which loss function is used for logistic regression?
  • A. Mean Squared Error (MSE)
  • B. Mean Absolute Error (MAE)
  • C. Log Loss (Binary Cross-Entropy)
  • D. Hinge Loss
Answer: C
C is correct. Log loss (binary cross-entropy) is the standard loss function for logistic regression. MSE would create a non-convex optimization problem with the sigmoid function. Hinge loss is used for SVMs.
MCQ 6
What is sigmoid(0)?
  • A. 0
  • B. 0.5
  • C. 1
  • D. Undefined
Answer: B
B is correct. sigmoid(0) = 1/(1+e^0) = 1/(1+1) = 0.5. This is the decision boundary: when the linear combination equals 0, the model is equally uncertain about both classes.
MCQ 7
In sklearn, which method returns class probabilities instead of class labels?
  • A. predict()
  • B. predict_proba()
  • C. score()
  • D. fit_predict()
Answer: B
B is correct. predict_proba() returns probability estimates for each class. predict() returns hard class labels (0 or 1). score() returns accuracy. fit_predict() is used for clustering, not classification.
MCQ 8
What does Precision measure?
  • A. Of all actual positives, how many did we predict correctly
  • B. Of all positive predictions, how many are actually positive
  • C. The overall percentage of correct predictions
  • D. The ratio of true negatives to total negatives
Answer: B
B is correct. Precision = TP / (TP + FP). It answers: "Of all samples I predicted as positive, what fraction are truly positive?" High precision means few false positives (few false alarms).
MCQ 9
What does Recall measure?
  • A. Of all positive predictions, how many are correct
  • B. Of all actual positives, how many did the model detect
  • C. The overall accuracy of the model
  • D. The specificity of the model
Answer: B
B is correct. Recall = TP / (TP + FN). It answers: "Of all actual positive cases, what fraction did the model catch?" High recall means few false negatives (few missed positives). Also called sensitivity or true positive rate.
MCQ 10
What is the F1 Score?
  • A. The arithmetic mean of precision and recall
  • B. The geometric mean of precision and recall
  • C. The harmonic mean of precision and recall
  • D. The weighted sum of precision and recall
Answer: C
C is correct. F1 = 2 * (Precision * Recall) / (Precision + Recall). The harmonic mean penalizes extreme differences: if precision is 1.0 but recall is 0.0, F1 is 0, not 0.5 (as the arithmetic mean would give).
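The penalty is easy to verify with the extreme case from the explanation:

```python
precision, recall = 1.0, 0.0

arithmetic = (precision + recall) / 2
harmonic = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0

print(f"Arithmetic mean: {arithmetic}")  # 0.5 -- hides the useless recall
print(f"F1 (harmonic):   {harmonic}")    # 0.0 -- exposes it
```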
MCQ 11
In One-vs-Rest multi-class classification with 5 classes, how many binary classifiers are trained?
  • A. 2
  • B. 5
  • C. 10
  • D. 25
Answer: B
B is correct. One-vs-Rest trains one binary classifier per class. With 5 classes, you get 5 classifiers: class 1 vs rest, class 2 vs rest, ..., class 5 vs rest. Each classifier decides if a sample belongs to its class or not.
MCQ 12
What happens when you lower the classification threshold from 0.5 to 0.3?
  • A. Precision increases, recall decreases
  • B. Precision decreases, recall increases
  • C. Both precision and recall increase
  • D. Both precision and recall decrease
Answer: B
B is correct. Lowering the threshold means more samples are classified as positive. This catches more true positives (higher recall) but also introduces more false positives (lower precision). This is the precision-recall trade-off.
MCQ 13
Why is feature scaling important for logistic regression?
  • A. It makes the model more accurate
  • B. It is required by Python syntax
  • C. It helps gradient descent converge faster and gives meaningful coefficients
  • D. It reduces the number of features
Answer: C
C is correct. Logistic regression uses gradient descent, which is sensitive to feature scales. Unscaled features cause the gradient to oscillate, slowing convergence. Scaled features also make coefficients comparable, enabling feature importance analysis.
MCQ 14
What does the softmax function guarantee about its outputs?
  • A. All outputs are between -1 and 1
  • B. All outputs are positive and sum to 1
  • C. The largest output is always 1
  • D. All outputs are integers
Answer: B
B is correct. Softmax converts raw scores into probabilities. Each output is positive (because of the exponential), and they sum to exactly 1. This makes them interpretable as class probabilities for multi-class classification.
MCQ 15
If a model has precision=0.90 and recall=0.60, what is the F1 score?
  • A. 0.75
  • B. 0.72
  • C. 0.80
  • D. 0.65
Answer: B
B is correct. F1 = 2 * (0.90 * 0.60) / (0.90 + 0.60) = 2 * 0.54 / 1.50 = 1.08 / 1.50 = 0.72. The F1 score is closer to the lower value (recall=0.60) than to the higher value (precision=0.90), because the harmonic mean penalizes imbalance.
MCQ 16
What is the gradient of log loss with respect to weights in logistic regression?
  • A. (1/N) * X^T * (y - predictions)
  • B. (1/N) * X^T * (predictions - y)
  • C. X^T * (predictions - y)^2
  • D. (1/N) * (predictions - y)
Answer: B
B is correct. The gradient is dJ/dw = (1/N) * X^T * (predictions - y), where predictions are sigmoid outputs. This has the same form as linear regression's gradient, but predictions are now probabilities from the sigmoid function, not raw linear outputs.
MCQ 17
A logistic regression model with 2 features produces the decision boundary equation: 3*x1 + 2*x2 - 6 = 0. At the point (1, 1), what class does the model predict?
  • A. Class 1 (positive)
  • B. Class 0 (negative)
  • C. Cannot determine without the sigmoid output
  • D. The point is exactly on the boundary
Answer: B
B is correct. At (1,1): z = 3*1 + 2*1 - 6 = 3 + 2 - 6 = -1. Since z < 0, sigmoid(z) < 0.5, and the model predicts class 0 (negative). Points with negative z values are on the negative side of the decision boundary.
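The arithmetic, spelled out with the standard library:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Decision boundary: 3*x1 + 2*x2 - 6 = 0. Evaluate the model at (1, 1).
x1, x2 = 1, 1
z = 3 * x1 + 2 * x2 - 6
p = sigmoid(z)

print(f"z={z}, sigmoid(z)={p:.3f}")  # z=-1, sigmoid(z)=0.269 -> class 0
```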
MCQ 18
Given a confusion matrix: TN=80, FP=5, FN=10, TP=30. What is the accuracy?
  • A. 0.88
  • B. 0.75
  • C. 0.86
  • D. 0.92
Answer: A
A is correct. Accuracy = (TP + TN) / (TP + TN + FP + FN) = (30 + 80) / (30 + 80 + 5 + 10) = 110 / 125 = 0.88. 88% of all predictions are correct.
MCQ 19
Why is log loss preferred over accuracy as a loss function during training?
  • A. Log loss is faster to compute
  • B. Log loss is differentiable and provides gradients for optimization, while accuracy is not differentiable
  • C. Log loss always gives higher values
  • D. Accuracy cannot handle multi-class problems
Answer: B
B is correct. Accuracy is a step function (either correct or incorrect) and is not differentiable. You cannot compute gradients from accuracy, so gradient descent cannot use it. Log loss is a smooth, differentiable function that provides meaningful gradients for parameter optimization.
MCQ 20
In logistic regression, what does a weight coefficient of -2.5 for a feature mean?
  • A. The feature always reduces the output by 2.5
  • B. As the feature increases by 1 unit, the log-odds of the positive class decrease by 2.5
  • C. The feature is 2.5 times less important than other features
  • D. The probability decreases by 2.5 for each unit increase
Answer: B
B is correct. In logistic regression, coefficients represent changes in log-odds. A coefficient of -2.5 means that for every 1-unit increase in that feature, the log-odds of the positive class decrease by 2.5. In practical terms, the feature has a strong negative association with the positive class.
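To make the log-odds interpretation concrete: exponentiating a coefficient gives the multiplicative change in the odds (a small illustration):

```python
import math

coef = -2.5

# Each 1-unit increase in the feature multiplies the odds of the
# positive class by e^coef.
odds_multiplier = math.exp(coef)
print(f"Odds are multiplied by {odds_multiplier:.3f}")  # ~0.082, i.e. a ~92% drop in the odds
```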
