Practice Questions — Mathematics for Machine Learning
Topic-Specific Questions
Question 1
Easy
Compute the dot product of [1, 2, 3] and [4, 5, 6] using NumPy. Show the manual calculation in a comment.
np.dot(a, b) or a @ b. Manual: 1*4 + 2*5 + 3*6.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Manual: 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
print(f"Dot product: {np.dot(a, b)}")
Output: Dot product: 32
Question 2
Easy
Create a 3x3 identity matrix using NumPy and print it.
Use np.eye(3).
import numpy as np
I = np.eye(3)
print(I)
Output:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Question 3
Easy
What is the output?
import numpy as np
A = np.array([[1, 2], [3, 4]])
print(A.T)
.T transposes the matrix: rows become columns.
[[1 3]
 [2 4]]
Question 4
Easy
What does the gradient tell us in Machine Learning?
Think about direction and steepness.
The gradient is a vector of partial derivatives that points in the direction of steepest increase of a function. In ML, we want to minimize the loss function, so we move in the opposite direction of the gradient. The magnitude of the gradient tells us how steep the function is at that point.
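As a small illustrative sketch (the function and numbers here are assumed, not part of the question): for f(w1, w2) = w1^2 + w2^2, stepping against the gradient lowers the function value.

```python
import numpy as np

def f(w):
    # Example loss: f(w1, w2) = w1^2 + w2^2
    return np.sum(w ** 2)

def grad(w):
    # Gradient: vector of partial derivatives [2*w1, 2*w2]
    return 2 * w

w = np.array([3.0, 4.0])
step = w - 0.1 * grad(w)  # move opposite to the gradient
print(f(w), f(step))      # the loss decreases (25.0 -> ~16)
```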
Question 5
Medium
Write NumPy code to multiply two matrices A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]]. What is element [0][1] of the result?
Use A @ B. Element [0][1] = dot product of row 0 of A and column 1 of B.
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B
print(f"A @ B:\n{C}")
print(f"\nElement [0][1] = 1*6 + 2*8 = {C[0, 1]}")
Output:
[[19 22]
 [43 50]]
Element [0][1] = 22
Question 6
Medium
Write code to compute the mean, median, and standard deviation of the array [10, 20, 30, 40, 50, 1000] and explain why mean and median differ significantly.
Use np.mean(), np.median(), np.std(). The value 1000 is an outlier.
import numpy as np
arr = np.array([10, 20, 30, 40, 50, 1000])
print(f"Mean: {np.mean(arr):.1f}")
print(f"Median: {np.median(arr):.1f}")
print(f"Std: {np.std(arr):.1f}")
print("Mean is 191.7, median is 35.0")
print("The outlier 1000 drags the mean up but does not affect the median")
Question 7
Medium
Explain Bayes Theorem with a real-world example. What are prior, likelihood, and posterior?
P(A|B) = P(B|A) * P(A) / P(B). Think of disease testing.
Bayes Theorem: P(A|B) = P(B|A) * P(A) / P(B). Prior P(A) = initial belief before seeing evidence. Likelihood P(B|A) = probability of evidence given the hypothesis. Posterior P(A|B) = updated belief after seeing evidence. Example: P(disease) = 0.01 (prior). P(positive test | disease) = 0.95 (likelihood). P(positive test) = 0.06. P(disease | positive test) = 0.95 * 0.01 / 0.06 = 0.158 (posterior). Even with a positive test, there is only 15.8% chance of disease because the disease is rare.
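A quick numeric check of the example above, using the values given in the answer:

```python
prior = 0.01        # P(disease)
likelihood = 0.95   # P(positive test | disease)
evidence = 0.06     # P(positive test), as given in the example
posterior = likelihood * prior / evidence
print(f"P(disease | positive) = {posterior:.3f}")  # 0.158
```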
Question 8
Medium
Implement one step of gradient descent for the function f(x) = x^2. Starting at x = 5 with learning rate 0.1, what is the new x?
Derivative of x^2 is 2x. New x = old_x - lr * gradient.
x = 5.0
learning_rate = 0.1
gradient = 2 * x # derivative of x^2
x_new = x - learning_rate * gradient
print(f"Old x: {x}")
print(f"Gradient at x=5: {gradient}")
print(f"New x: {x_new}")
print(f"f(old x) = {x**2}, f(new x) = {x_new**2}")
Output: New x: 4.0, f(old x) = 25.0, f(new x) = 16.0
Question 9
Medium
What is the output?
import numpy as np
arr = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print(np.mean(arr))
print(np.var(arr))
Mean = sum/count. Variance = average of squared deviations from mean.
5.0
4.0
Question 10
Hard
Write code to compute the correlation between two arrays using the formula: corr = cov(X,Y) / (std(X) * std(Y)). Verify with np.corrcoef().
Compute covariance manually, then divide by product of standard deviations.
import numpy as np
X = np.array([1, 2, 3, 4, 5])
Y = np.array([2, 4, 5, 4, 5])
# Manual calculation
mean_x, mean_y = np.mean(X), np.mean(Y)
cov_xy = np.mean((X - mean_x) * (Y - mean_y))
std_x, std_y = np.std(X), np.std(Y)
corr_manual = cov_xy / (std_x * std_y)
# NumPy verification
corr_numpy = np.corrcoef(X, Y)[0, 1]
print(f"Manual correlation: {corr_manual:.4f}")
print(f"NumPy correlation: {corr_numpy:.4f}")
Question 11
Hard
What are eigenvalues and eigenvectors? Why are they important in ML (specifically PCA)?
Eigenvectors are directions that do not change when a transformation is applied.
For a matrix A, an eigenvector v satisfies A @ v = lambda * v, where lambda is the eigenvalue. The eigenvector's direction is unchanged by the transformation -- only its magnitude changes by factor lambda. In PCA, the eigenvectors of the covariance matrix represent the principal directions of maximum variance in the data. The eigenvalues indicate how much variance is in each direction. PCA keeps the top-k eigenvectors (those with largest eigenvalues) to reduce dimensionality while preserving the most information.
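A minimal sketch of A @ v = lambda * v with np.linalg.eig, using a simple diagonal matrix as an assumed example:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
# Each column of `eigenvectors` is an eigenvector of A
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))  # True: direction is preserved
print(eigenvalues)
```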
Question 12
Hard
Implement gradient descent to find the minimum of f(x) = (x - 5)^2 + 3. Start at x = 0, learning rate = 0.2, run for 20 steps.
Derivative: f'(x) = 2(x - 5). Minimum is at x = 5, f(5) = 3.
x = 0.0
lr = 0.2
for i in range(20):
    grad = 2 * (x - 5)
    x = x - lr * grad
print(f"Final x: {x:.6f} (expected: 5.0)")
print(f"Final f(x): {(x-5)**2 + 3:.6f} (expected: 3.0)")
Question 13
Hard
What is the output?
import numpy as np
A = np.array([[1, 0], [0, 1]])
v = np.array([3, 7])
print(A @ v)
A is the identity matrix.
[3 7]
Question 14
Easy
What is the difference between variance and standard deviation? Which one is in the same units as the original data?
One is the square of the other.
Variance is the average of squared deviations from the mean. Standard deviation is the square root of variance. Standard deviation is in the same units as the original data (e.g., if data is in cm, std is in cm). Variance is in squared units (cm^2). This is why standard deviation is more commonly used and reported.
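A short numeric check of the square/square-root relationship (sample data assumed for illustration):

```python
import numpy as np

heights_cm = np.array([150, 160, 170, 180, 190])  # data in cm
var = np.var(heights_cm)   # in cm^2
std = np.std(heights_cm)   # in cm, same units as the data
print(var, std)            # variance 200.0, std is its square root (~14.14)
```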
Question 15
Medium
Compute the inverse of matrix A = [[2, 1], [5, 3]] using NumPy and verify that A @ A_inv gives the identity matrix.
Use np.linalg.inv(A) and np.allclose() to verify.
import numpy as np
A = np.array([[2, 1], [5, 3]])
A_inv = np.linalg.inv(A)
print(f"A:\n{A}")
print(f"\nA inverse:\n{A_inv}")
print(f"\nA @ A_inv:\n{np.round(A @ A_inv)}")
print(f"Is identity? {np.allclose(A @ A_inv, np.eye(2))}")
Question 16
Easy
Write Python code to calculate the probability of rolling a 6 on a fair die, and the probability of NOT rolling a 6.
P(6) = 1/6. P(not 6) = 1 - P(6).
p_six = 1 / 6
p_not_six = 1 - p_six
print(f"P(rolling 6): {p_six:.4f} ({p_six*100:.2f}%)")
print(f"P(not rolling 6): {p_not_six:.4f} ({p_not_six*100:.2f}%)")
Question 17
Hard
Use Bayes theorem to calculate: If 2% of people have a disease, a test is 95% accurate for sick people and 90% accurate for healthy people, what is P(disease | positive test)?
P(disease)=0.02, P(positive|disease)=0.95, P(positive|healthy)=0.10.
p_disease = 0.02
p_healthy = 0.98
p_pos_disease = 0.95
p_pos_healthy = 0.10
p_pos = p_pos_disease * p_disease + p_pos_healthy * p_healthy
p_disease_pos = (p_pos_disease * p_disease) / p_pos
print(f"P(disease | positive test): {p_disease_pos:.4f}")
print(f"That's only {p_disease_pos*100:.1f}%!")
Question 18
Medium
What is the output?
import numpy as np
a = np.array([1, 0, 0])
b = np.array([0, 1, 0])
print(np.dot(a, b))
These vectors are perpendicular (orthogonal).
0
Question 19
Hard
Explain the chain rule in calculus and why it is essential for training neural networks (backpropagation).
If y = f(g(x)), then dy/dx = f'(g(x)) * g'(x). Think about layers in a neural network.
The chain rule states that if y = f(g(x)), then dy/dx = f'(g(x)) * g'(x). In neural networks, the output is a composition of many functions (layers): output = f3(f2(f1(x))). To compute how a weight in layer 1 affects the final loss, we need the chain rule: dL/dw1 = dL/df3 * df3/df2 * df2/df1 * df1/dw1. This chaining of derivatives through layers is called backpropagation. Without the chain rule, we could not train deep networks.
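A tiny numeric illustration (the example function is assumed, not from the original): the chain-rule derivative of f(x) = (3x + 1)^2 matches a finite-difference estimate.

```python
def g(x):
    return 3 * x + 1          # inner function

def f(x):
    return g(x) ** 2          # composition: f(x) = (3x + 1)^2

def f_prime(x):
    return 2 * g(x) * 3       # chain rule: outer'(g(x)) * inner'(x)

x, h = 2.0, 1e-6
numeric = (f(x + h) - f(x - h)) / (2 * h)  # finite-difference check
print(f_prime(x), round(numeric, 4))       # both 42.0
```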
Question 20
Medium
Write code to compute and display the correlation matrix for three variables: hours_studied, attendance, and marks.
Use np.corrcoef() with the three arrays stacked together.
import numpy as np
hours = np.array([2, 4, 6, 8, 3, 7, 5])
attendance = np.array([60, 70, 85, 95, 65, 90, 80])
marks = np.array([50, 65, 80, 92, 55, 85, 72])
corr = np.corrcoef([hours, attendance, marks])
labels = ['Hours', 'Attend', 'Marks']
print('Correlation Matrix:')
print(f"{'':>10}", end='')
for l in labels: print(f"{l:>10}", end='')
print()
for i, l in enumerate(labels):
    print(f"{l:>10}", end='')
    for j in range(3):
        print(f"{corr[i][j]:>10.4f}", end='')
    print()
Mixed & Application Questions
Question 1
Easy
Create two NumPy vectors a = [3, 4] and b = [1, 2]. Compute and print: a + b, a - b, a * 2, and the dot product.
Use +, -, *, np.dot() for the operations.
import numpy as np
a = np.array([3, 4])
b = np.array([1, 2])
print(f"a + b = {a + b}")
print(f"a - b = {a - b}")
print(f"a * 2 = {a * 2}")
print(f"dot product = {np.dot(a, b)}")
Question 2
Easy
What is the output?
import numpy as np
print(np.mean([10, 20, 30, 40, 50]))
Mean = sum / count.
30.0
Question 3
Medium
Deepak's dataset has two features: temperature (0-50 Celsius) and income (10000-500000 rupees). Why might this cause problems for KNN, and what should he do?
KNN uses distance. Which feature will dominate the distance calculation?
Income (range: 490000) will completely dominate the distance calculation over temperature (range: 50) in KNN, because KNN uses Euclidean distance. A 1-degree temperature difference would be negligible compared to a 1000-rupee income difference. Deepak should apply feature scaling: either StandardScaler (z-score normalization) or MinMaxScaler (scale to 0-1) to bring both features to the same scale.
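A minimal hand-rolled z-score scaling sketch (the array values here are made up for illustration; sklearn's StandardScaler does the same per feature):

```python
import numpy as np

temperature = np.array([25.0, 30.0, 41.0, 18.0])           # spans tens
income = np.array([20000.0, 450000.0, 80000.0, 300000.0])  # spans lakhs

temp_scaled = (temperature - temperature.mean()) / temperature.std()
income_scaled = (income - income.mean()) / income.std()
print(temp_scaled.std(), income_scaled.std())  # both ~1.0: comparable scales now
```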
Question 4
Medium
Write NumPy code to compute the magnitude (length) of the vector [3, 4]. Verify that it equals 5 (Pythagorean theorem).
Magnitude = sqrt(x^2 + y^2). Use np.linalg.norm().
import numpy as np
v = np.array([3, 4])
mag = np.linalg.norm(v)
print(f"Vector: {v}")
print(f"Magnitude: {mag}")
print(f"Manual: sqrt(3^2 + 4^2) = sqrt(9+16) = sqrt(25) = {np.sqrt(9+16)}")
Output: Magnitude: 5.0
Question 5
Hard
Implement gradient descent to minimize f(w1, w2) = w1^2 + w2^2. Start at (5, 5), learning rate 0.1, 30 steps. Print every 5th step.
Gradients: df/dw1 = 2*w1, df/dw2 = 2*w2. Update both simultaneously.
import numpy as np
w = np.array([5.0, 5.0])
lr = 0.1
for i in range(30):
    grad = 2 * w
    w = w - lr * grad
    if i % 5 == 0:
        print(f"Step {i:2d}: w = [{w[0]:.4f}, {w[1]:.4f}], f(w) = {np.sum(w**2):.6f}")
print(f"\nFinal: w = [{w[0]:.6f}, {w[1]:.6f}]")
print(f"Expected: [0, 0]")
Question 6
Hard
What is the output?
import numpy as np
A = np.array([[2, 0], [0, 3]])
v = np.array([1, 1])
print(A @ v)
A is a diagonal matrix. It scales each component independently.
[2 3]
Question 7
Easy
What is the difference between correlation and covariance?
One is bounded [-1, 1], the other is unbounded.
Covariance measures how two variables change together but is unbounded (its value depends on the scale of the data). Correlation is normalized covariance, bounded between -1 and +1, making it easier to interpret. Correlation = Covariance / (std_X * std_Y). A correlation of 0.9 always means a strong positive relationship, regardless of the data's scale.
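A quick sketch of the scale dependence (illustrative numbers): rescaling one variable changes the covariance but leaves the correlation untouched.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

cov = np.mean((x - x.mean()) * (y - y.mean()))
corr = cov / (x.std() * y.std())

y_rupees = y * 100000  # change of units: scale y up
cov2 = np.mean((x - x.mean()) * (y_rupees - y_rupees.mean()))
corr2 = cov2 / (x.std() * y_rupees.std())
print(cov, cov2)    # covariance grows with the scale of the data
print(corr, corr2)  # correlation is unchanged
```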
Question 8
Hard
Write a Python function that implements the normal equation for linear regression: w = (X^T @ X)^(-1) @ X^T @ y. Test it with simple data.
Use np.linalg.inv() for the inverse, @ for matrix multiplication.
import numpy as np
def normal_equation(X, y):
    return np.linalg.inv(X.T @ X) @ X.T @ y
# Simple data: y = 2*x + 1
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]]) # Bias column + feature
y = np.array([3, 5, 7, 9, 11]) # y = 2x + 1
w = normal_equation(X, y)
print(f"Weights: bias={w[0]:.2f}, slope={w[1]:.2f}")
print(f"Equation: y = {w[1]:.2f}x + {w[0]:.2f}")
Output: Weights: bias=1.00, slope=2.00
Question 9
Medium
What is the 68-95-99.7 rule for the normal distribution?
It describes what percentage of data falls within 1, 2, and 3 standard deviations of the mean.
For a normal distribution: approximately 68% of data falls within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. If marks have mean=70 and std=10, then 68% of students score 60-80, 95% score 50-90, and 99.7% score 40-100.
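The rule can be verified empirically with simulated data (a sketch, using mean=70 and std=10 as in the example):

```python
import numpy as np

np.random.seed(0)
marks = np.random.normal(loc=70, scale=10, size=100_000)
for k in (1, 2, 3):
    frac = np.mean(np.abs(marks - 70) < k * 10)
    print(f"within {k} std: {frac:.3f}")  # ~0.683, ~0.954, ~0.997
```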
Question 10
Medium
Write code to simulate flipping a coin 10000 times and verify that the probability of heads approaches 0.5.
Use np.random.choice(['H', 'T'], size=10000).
import numpy as np
np.random.seed(42)
flips = np.random.choice(['H', 'T'], size=10000)
heads = np.sum(flips == 'H')
p_heads = heads / len(flips)
print(f"Total flips: {len(flips)}")
print(f"Heads: {heads}")
print(f"P(Heads): {p_heads:.4f} (expected: 0.5000)")
Multiple Choice Questions
MCQ 1
What is the dot product of [1, 2, 3] and [4, 5, 6]?
Answer: B
B is correct. Dot product = (1*4) + (2*5) + (3*6) = 4 + 10 + 18 = 32. Option A is element-wise multiplication (not summed). The dot product always returns a single scalar.
MCQ 2
What is the transpose of a 3x2 matrix?
Answer: B
B is correct. Transpose swaps rows and columns. A (3 rows x 2 columns) matrix becomes (2 rows x 3 columns). In general, the transpose of an (m x n) matrix is (n x m).
MCQ 3
What is the derivative of f(x) = x^2?
Answer: B
B is correct. Using the power rule: the derivative of x^n is n * x^(n-1). So the derivative of x^2 is 2 * x^1 = 2x. At x=3, the derivative is 6, meaning the function is increasing at a rate of 6.
MCQ 4
In gradient descent, the update rule is w = w - lr * gradient. What happens if the learning rate is too large?
Answer: C
C is correct. A learning rate that is too large causes gradient descent to take steps that are too big, overshooting the minimum and bouncing back and forth (or diverging to infinity). The loss increases instead of decreasing. Typical good learning rates are 0.01 or 0.001.
MCQ 5
What is the shape of the result when multiplying a (3, 4) matrix with a (4, 2) matrix?
Answer: A
A is correct. For matrix multiplication, (m x n) @ (n x p) = (m x p). The inner dimensions (n=4) must match. The result takes the outer dimensions: (3 x 4) @ (4 x 2) = (3 x 2).
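A one-liner check of the shape rule (illustrative sketch):

```python
import numpy as np

A = np.ones((3, 4))
B = np.ones((4, 2))
print((A @ B).shape)  # (3, 2): inner dimension 4 cancels, outer dims remain
```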
MCQ 6
Bayes theorem states P(A|B) = P(B|A) * P(A) / P(B). What is P(A) called?
Answer: C
C is correct. P(A) is the prior -- our initial belief before seeing evidence. P(B|A) is the likelihood -- probability of evidence given our hypothesis. P(A|B) is the posterior -- updated belief after seeing evidence. P(B) is the evidence.
MCQ 7
If a dataset has mean = 100 and standard deviation = 15, what Z-score does a value of 130 have?
Answer: B
B is correct. Z-score = (value - mean) / std = (130 - 100) / 15 = 30 / 15 = 2.0. This means 130 is exactly 2 standard deviations above the mean. According to the 68-95-99.7 rule, about 97.5% of values are below this point.
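The calculation in code (numbers from the question):

```python
value, mean, std = 130, 100, 15
z = (value - mean) / std  # standard deviations above the mean
print(f"Z-score: {z}")    # 2.0
```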
MCQ 8
What does a correlation of -0.95 between two variables indicate?
Answer: C
C is correct. A correlation of -0.95 indicates a very strong negative linear relationship: as one variable increases, the other strongly decreases. The magnitude (0.95) shows the relationship is very strong. Only a correlation near 0 indicates no linear relationship.
MCQ 9
Which measure of central tendency is most robust to outliers?
Answer: B
B is correct. The median (middle value when sorted) is robust to outliers because it only depends on the position, not the magnitude of extreme values. The mean is heavily influenced by outliers. Variance is a measure of spread, not central tendency.
MCQ 10
In PCA, eigenvalues of the covariance matrix represent:
Answer: B
B is correct. Eigenvalues represent the amount of variance in the direction of their corresponding eigenvectors. Larger eigenvalue = more variance explained. Eigenvectors (not eigenvalues) give the direction. PCA keeps components with the largest eigenvalues.
MCQ 11
What is the partial derivative of f(x, y) = x^2 + 3xy with respect to x?
Answer: A
A is correct. To find the partial derivative with respect to x, treat y as a constant. d/dx(x^2) = 2x. d/dx(3xy) = 3y (y is constant). Total: 2x + 3y.
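A numeric sanity check of the partial derivative via finite differences (the evaluation point is an assumed example):

```python
def f(x, y):
    return x ** 2 + 3 * x * y

x, y, h = 2.0, 5.0, 1e-6
numeric = (f(x + h, y) - f(x - h, y)) / (2 * h)  # hold y fixed, vary x
analytic = 2 * x + 3 * y
print(analytic, round(numeric, 4))  # both 19.0
```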
MCQ 12
For matrix multiplication A @ B to be valid, which dimensions must match?
Answer: B
B is correct. For (m x n) @ (n x p), the inner dimensions (number of columns of A = number of rows of B) must match. The result has shape (m x p). The matrices do NOT need to be square or have the same shape.
MCQ 13
The gradient of f(w1, w2) = w1^2 + w2^2 at point (3, 4) is:
Answer: B
B is correct. The gradient is [df/dw1, df/dw2] = [2*w1, 2*w2]. At (3, 4): gradient = [2*3, 2*4] = [6, 8]. This gradient points in the direction of steepest increase. Gradient descent would move in the opposite direction: [-6, -8] (scaled by learning rate).
MCQ 14
P(A) + P(not A) always equals:
Answer: C
C is correct. An event either happens or it does not. P(A) + P(not A) = 1 is one of the fundamental axioms of probability. If P(rain) = 0.3, then P(no rain) = 0.7, and 0.3 + 0.7 = 1.
MCQ 15
Which NumPy function computes the inverse of a matrix?
Answer: B
B is correct. Matrix operations in NumPy are in the np.linalg module. np.linalg.inv(A) computes the inverse. Other useful functions: np.linalg.det(A) for determinant, np.linalg.eig(A) for eigenvalues/eigenvectors, np.linalg.norm(v) for vector magnitude.
MCQ 16
Why is the chain rule important in deep learning?
Answer: B
B is correct. Neural networks are compositions of functions (layers). To train them, we need to know how each weight affects the final loss. The chain rule allows us to compute these gradients by multiplying derivatives through the chain of layers -- this process is called backpropagation.
Coding Challenges
Challenge 1: Linear Algebra Operations Suite
EasyCreate two 3D vectors a = [2, 3, 5] and b = [1, 4, 6]. Compute: (1) their dot product, (2) element-wise product, (3) magnitude of each, (4) cosine similarity. Print each result with a label.
Sample Input
a = [2, 3, 5], b = [1, 4, 6]
Sample Output
Dot product: 44
Element-wise: [2 12 30]
Mag a: 6.16, Mag b: 7.28
Cosine similarity: 0.9804
Use NumPy for all calculations.
import numpy as np
a = np.array([2, 3, 5])
b = np.array([1, 4, 6])
print(f"Dot product: {np.dot(a, b)}")
print(f"Element-wise: {a * b}")
print(f"Mag a: {np.linalg.norm(a):.2f}, Mag b: {np.linalg.norm(b):.2f}")
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"Cosine similarity: {cosine:.4f}")
Challenge 2: Gradient Descent Visualizer
MediumImplement gradient descent to minimize f(x) = (x - 7)^2 + 2. Start at x = 0, use learning rate 0.15, run 25 steps. Print the step number, current x, f(x), and gradient every 5 steps. Verify convergence to x = 7.
Sample Input
x_init = 0, lr = 0.15, steps = 25
Sample Output
Step 0: x=0.0000, f(x)=51.0000, grad=-14.0000
Step 5: x=5.8235, f(x)=3.3841, grad=-2.3530
...
Final: x=6.9991, f(x)=2.0000
Print every 5th step. Round to 4 decimal places.
x = 0.0
lr = 0.15
for i in range(25):
    fx = (x - 7)**2 + 2
    grad = 2 * (x - 7)
    if i % 5 == 0:
        print(f"Step {i:2d}: x={x:.4f}, f(x)={fx:.4f}, grad={grad:.4f}")
    x = x - lr * grad
print(f"\nFinal: x={x:.4f}, f(x)={(x-7)**2 + 2:.4f}")
print(f"Expected: x=7.0000, f(x)=2.0000")
Challenge 3: Bayes Theorem Calculator
MediumWrite a function bayes(p_a, p_b_given_a, p_b_given_not_a) that computes P(A|B) using Bayes theorem. Test with: (1) Disease testing (P(disease)=0.01, P(pos|disease)=0.99, P(pos|healthy)=0.05), (2) Spam detection (P(spam)=0.4, P(word|spam)=0.7, P(word|not_spam)=0.1).
Sample Input
bayes(0.01, 0.99, 0.05)
Sample Output
P(disease | positive test) = 0.1667
P(spam | contains word) = 0.8235
Function must handle any valid probabilities.
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return (p_b_given_a * p_a) / p_b
# Test 1: Disease testing
result1 = bayes(0.01, 0.99, 0.05)
print(f"P(disease | positive test) = {result1:.4f}")
# Test 2: Spam detection
result2 = bayes(0.4, 0.7, 0.1)
print(f"P(spam | contains word) = {result2:.4f}")
Challenge 4: Statistics Dashboard
MediumGiven marks of 15 students: [45, 67, 89, 92, 34, 78, 56, 91, 73, 82, 65, 88, 54, 71, 96], compute and display: mean, median, mode (use scipy), variance, std deviation, range, Q1, Q3, IQR. Also identify outliers using the IQR method (below Q1-1.5*IQR or above Q3+1.5*IQR).
Sample Input
marks = [45, 67, 89, 92, 34, 78, 56, 91, 73, 82, 65, 88, 54, 71, 96]
Sample Output
Complete statistics dashboard with outlier detection
Use NumPy. Show clear formatting.
import numpy as np
from scipy import stats
marks = np.array([45, 67, 89, 92, 34, 78, 56, 91, 73, 82, 65, 88, 54, 71, 96])
print('=== Statistics Dashboard ===')
print(f'Mean: {np.mean(marks):.2f}')
print(f'Median: {np.median(marks):.2f}')
print(f'Mode: {stats.mode(marks, keepdims=False).mode}')
print(f'Variance: {np.var(marks):.2f}')
print(f'Std Dev: {np.std(marks):.2f}')
print(f'Range: {np.ptp(marks)}')
Q1 = np.percentile(marks, 25)
Q3 = np.percentile(marks, 75)
IQR = Q3 - Q1
print(f'Q1: {Q1}, Q3: {Q3}, IQR: {IQR}')
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
outliers = marks[(marks < lower) | (marks > upper)]
print(f'Outlier bounds: [{lower:.1f}, {upper:.1f}]')
print(f'Outliers: {outliers if len(outliers) > 0 else "None"}')
Challenge 5: Normal Equation for Linear Regression
HardImplement the normal equation w = (X^T X)^(-1) X^T y to solve linear regression. Create data for y = 3x + 7 + noise with 50 points. Add a bias column to X. Compute weights and print the learned equation. Compare with sklearn's LinearRegression.
Sample Input
50 data points from y = 3x + 7 + noise
Sample Output
Normal equation: y = 2.98x + 7.12
sklearn: y = 2.98x + 7.12
Use np.linalg.inv() for the normal equation. Use random_state=42.
import numpy as np
from sklearn.linear_model import LinearRegression
np.random.seed(42)
x = np.random.uniform(0, 10, 50)
y = 3 * x + 7 + np.random.normal(0, 2, 50)
# Normal equation
X = np.column_stack([np.ones(50), x]) # Add bias column
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(f'Normal equation: y = {w[1]:.2f}x + {w[0]:.2f}')
# sklearn comparison
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)
print(f'sklearn: y = {model.coef_[0]:.2f}x + {model.intercept_:.2f}')
print(f'Match: {np.allclose(w[1], model.coef_[0]) and np.allclose(w[0], model.intercept_)}')
Challenge 6: Multivariate Gradient Descent
HardImplement gradient descent for linear regression with 2 features. Use synthetic data: y = 2*x1 + 3*x2 + 5 + noise. Start weights at [0, 0, 0] (bias, w1, w2). Use MSE as loss function. Run 1000 iterations with lr=0.01. Print the learned weights and compare with the true values.
Sample Input
100 data points, y = 2*x1 + 3*x2 + 5 + noise
Sample Output
Learned: y = 2.01*x1 + 3.02*x2 + 4.98
True: y = 2*x1 + 3*x2 + 5
Normalize features before training. Print loss every 200 steps.
import numpy as np
np.random.seed(42)
n = 100
x1 = np.random.uniform(0, 10, n)
x2 = np.random.uniform(0, 10, n)
y = 2 * x1 + 3 * x2 + 5 + np.random.normal(0, 1, n)
# Normalize features
x1_norm = (x1 - x1.mean()) / x1.std()
x2_norm = (x2 - x2.mean()) / x2.std()
X = np.column_stack([np.ones(n), x1_norm, x2_norm])
w = np.zeros(3)
lr = 0.01
for i in range(1000):
    y_pred = X @ w
    error = y_pred - y
    loss = np.mean(error ** 2)
    gradient = (2 / n) * X.T @ error
    w = w - lr * gradient
    if i % 200 == 0:
        print(f'Step {i}: Loss = {loss:.4f}')
# Undo the normalization to report weights on the original feature scale
w1 = w[1] / x1.std()
w2 = w[2] / x2.std()
b = w[0] - w1 * x1.mean() - w2 * x2.mean()
print(f'\nLearned: y = {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f}')
print(f'True: y = 2*x1 + 3*x2 + 5')