Chapter 5 · Beginner · 58 Questions

Practice Questions — Linear Regression: Your First ML Algorithm

10 Easy
11 Medium
7 Hard

Topic-Specific Questions

Question 1
Easy
Write Python code to train a simple linear regression model using scikit-learn on the data: X = [1, 2, 3, 4, 5], y = [3, 5, 7, 9, 11]. Print the slope and intercept.
Use LinearRegression().fit(). Remember to reshape X.
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([3, 5, 7, 9, 11])
model = LinearRegression()
model.fit(X, y)
print(f'Slope: {model.coef_[0]:.2f}')
print(f'Intercept: {model.intercept_:.2f}')
print(f'Equation: y = {model.coef_[0]:.2f}x + {model.intercept_:.2f}')
Output: Slope: 2.00, Intercept: 1.00
Question 2
Easy
What does R-squared = 0.85 mean in simple terms?
Think about how much of the variation in y is explained by the model.
R-squared = 0.85 means the model explains 85% of the variance (variation) in the target variable. The remaining 15% is unexplained variance (due to noise, missing features, or non-linear relationships). Higher R-squared is better, with 1.0 being a perfect fit.
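The number can be reproduced by hand from the definition R^2 = 1 - SS_res/SS_tot; a minimal sketch (the example values here are illustrative, chosen to give a high R^2):

```python
import numpy as np

# R^2 from its definition: 1 - (unexplained variation / total variation)
y = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.2, 6.5, 9.1])

ss_res = np.sum((y - y_pred) ** 2)      # variation the model fails to explain
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(f'R^2: {r2:.4f}')  # 0.9830
```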
Question 3
Easy
A linear regression model has equation y = 3x + 5. What is the predicted value for x = 10?
Substitute x = 10 into y = 3x + 5.
y = 3(10) + 5 = 35
Question 4
Easy
Write code to predict the value for x=6 using a trained linear regression model.
After model.fit(), use model.predict([[6]]).
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([10, 20, 30, 40, 50])
model = LinearRegression()
model.fit(X, y)
prediction = model.predict([[6]])
print(f'Prediction for x=6: {prediction[0]:.2f}')
Output: Prediction for x=6: 60.00
Question 5
Medium
What is the cost function used in linear regression and why do we square the errors?
MSE = mean of (actual - predicted)^2. Think about positive vs negative errors.
The cost function is Mean Squared Error (MSE): MSE = (1/n) * sum((y_actual - y_predicted)^2). We square errors for two reasons: (1) It makes all errors positive (overpredicting by 5 and underpredicting by 5 are equally bad). (2) It penalizes large errors more than small ones (an error of 10 contributes 100, while an error of 2 contributes only 4). This encourages the model to avoid large mistakes.
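The second point is easy to see numerically; a small sketch with made-up errors:

```python
import numpy as np

errors = np.array([2.0, -2.0, 10.0])  # two small errors, one large (made-up values)
print('absolute contributions:', np.abs(errors))  # 2, 2, 10
print('squared contributions: ', errors ** 2)     # 4, 4, 100 -- the large error dominates
print(f'MAE: {np.mean(np.abs(errors)):.2f}')      # 4.67
print(f'MSE: {np.mean(errors ** 2):.2f}')         # 36.00
```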
Question 6
Medium
Write code to calculate MAE, MSE, RMSE, and R-squared for actual values [3, 5, 7, 9] and predicted values [2.8, 5.2, 6.5, 9.1].
Use sklearn.metrics functions or compute manually with numpy.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
y_actual = np.array([3, 5, 7, 9])
y_pred = np.array([2.8, 5.2, 6.5, 9.1])
print(f'MAE: {mean_absolute_error(y_actual, y_pred):.4f}')
print(f'MSE: {mean_squared_error(y_actual, y_pred):.4f}')
print(f'RMSE: {np.sqrt(mean_squared_error(y_actual, y_pred)):.4f}')
print(f'R^2: {r2_score(y_actual, y_pred):.4f}')
Question 7
Medium
Implement one step of gradient descent for the model y = wx + b. Given w=0, b=0, X=[1, 2, 3], y=[2, 4, 6], learning_rate=0.1, compute the new w and b after one update.
Compute predictions, errors, gradients, then update.
import numpy as np
X = np.array([1, 2, 3])
y = np.array([2, 4, 6])
w, b, lr = 0.0, 0.0, 0.1

y_pred = w * X + b
error = y_pred - y
dw = (2/3) * np.sum(X * error)
db = (2/3) * np.sum(error)
w_new = w - lr * dw
b_new = b - lr * db
print(f'Old: w={w}, b={b}')
print(f'Gradients: dw={dw:.4f}, db={db:.4f}')
print(f'New: w={w_new:.4f}, b={b_new:.4f}')
Question 8
Medium
What is the difference between simple and multiple linear regression?
Think about the number of input features.
Simple linear regression has one input feature: y = mx + b (one slope). Multiple linear regression has two or more input features: y = w1*x1 + w2*x2 + ... + b (one weight per feature). Multiple regression captures the combined effect of several features on the target. For example, house price depends on area AND bedrooms AND location, not just one feature.
Question 9
Hard
Implement gradient descent for linear regression from scratch. Train on data y = 2x + 3 + noise for 100 iterations and print the learned equation.
Initialize w=0, b=0. Compute gradients of MSE. Update w and b each iteration.
import numpy as np
np.random.seed(42)
X = np.random.uniform(0, 10, 50)
y = 2 * X + 3 + np.random.normal(0, 1, 50)
X_norm = (X - X.mean()) / X.std()
w, b, lr = 0.0, 0.0, 0.1
for i in range(100):
    y_pred = w * X_norm + b
    dw = (2/50) * np.sum(X_norm * (y_pred - y))
    db = (2/50) * np.sum(y_pred - y)
    w -= lr * dw
    b -= lr * db
w_orig = w / X.std()                      # convert back to the original x scale
b_orig = b - w * X.mean() / X.std()
mse = np.mean((y - (w * X_norm + b))**2)
print(f'Learned equation: y = {w_orig:.4f}x + {b_orig:.4f}, MSE={mse:.4f}')
Question 10
Hard
What is the normal equation and when would you use gradient descent instead?
Normal equation: w = (X^T X)^(-1) X^T y. Think about computational cost.
The normal equation w = (X^T X)^(-1) X^T y computes the optimal weights directly in one step. Use it when: (1) the number of features is small to moderate (< 10,000), (2) the data fits in memory. Use gradient descent when: (1) the dataset is very large, (2) there are many features (matrix inversion is O(n^3) in features), (3) the problem is not purely linear (gradient descent is more general).
Question 11
Medium
Train a multiple linear regression model with 3 features and print the coefficient for each feature to understand which one matters most.
Use model.coef_ to get weights for each feature.
import numpy as np
from sklearn.linear_model import LinearRegression
np.random.seed(42)
X = np.random.randn(100, 3)
y = 5*X[:, 0] + 2*X[:, 1] + 0.1*X[:, 2] + np.random.normal(0, 0.5, 100)
model = LinearRegression()
model.fit(X, y)
features = ['Feature 1', 'Feature 2', 'Feature 3']
for feat, coef in zip(features, model.coef_):
    print(f'{feat}: coefficient = {coef:.4f}')
print(f'Intercept: {model.intercept_:.4f}')
Question 12
Easy
What is MSE if actual values are [10, 20, 30] and predictions are [12, 18, 33]?
MSE = mean of squared differences: ((10-12)^2 + (20-18)^2 + (30-33)^2) / 3.
MSE = ((-2)^2 + (2)^2 + (-3)^2) / 3 = (4 + 4 + 9) / 3 = 17/3 = 5.667
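The arithmetic can be checked in NumPy:

```python
import numpy as np

actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 33])
mse = np.mean((actual - predicted) ** 2)
print(f'MSE: {mse:.3f}')  # 5.667
```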
Question 13
Hard
What are the assumptions of linear regression? What happens when they are violated?
Think about linearity, independence, constant variance, normal errors, no multicollinearity.
(1) Linearity: Relationship must be linear. Violation: curved patterns in residuals, poor predictions. Fix: polynomial features or non-linear model. (2) Independence: Observations should be independent. Violation: autocorrelation in time series. (3) Homoscedasticity: Error variance should be constant. Violation: fan-shaped residual plot. Fix: log-transform target. (4) Normal residuals: Errors should be approximately normal. Mild violation is OK for predictions, affects confidence intervals. (5) No multicollinearity: Features should not be highly correlated. Violation: unstable coefficients. Fix: drop one of correlated features or use Ridge regression.
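A rough way to eyeball assumptions (3) and (4) without plotting is to inspect residuals directly. A sketch on synthetic data (the dataset, split, and comparison below are my own illustrative choices, not a formal statistical test):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(0)
X = np.random.uniform(0, 10, (200, 1))
y = 3 * X.squeeze() + 5 + np.random.normal(0, 1, 200)  # homoscedastic by construction

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Homoscedasticity check: residual spread for low vs high predictions should match
order = np.argsort(model.predict(X))
low_half, high_half = residuals[order[:100]], residuals[order[100:]]
print(f'residual std (low preds):  {low_half.std():.3f}')
print(f'residual std (high preds): {high_half.std():.3f}')
# For OLS with an intercept, the residual mean is 0 up to floating-point error
print(f'residual mean: {residuals.mean():.4f}')
```

A fan-shaped residual plot would show up here as one half having a much larger standard deviation than the other.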
Question 14
Easy
Write code to make predictions for x = [1, 5, 10, 15, 20] using a trained model and display them in a nice format.
Train a model first, then predict an array of values.
from sklearn.linear_model import LinearRegression
import numpy as np
X_train = np.array([2, 4, 6, 8]).reshape(-1, 1)
y_train = np.array([15, 25, 35, 45])
model = LinearRegression()
model.fit(X_train, y_train)
test_values = [1, 5, 10, 15, 20]
for x in test_values:
    pred = model.predict([[x]])[0]
    print(f'x = {x:2d} -> predicted y = {pred:.2f}')
Question 15
Hard
Compare linear regression and polynomial regression (degree 2 and 3) on non-linear data. Print R-squared for each.
Use PolynomialFeatures(degree=n) with LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
np.random.seed(42)
X = np.linspace(0, 5, 50).reshape(-1, 1)
y = X.squeeze()**2 + np.random.normal(0, 2, 50)
for deg in [1, 2, 3]:
    poly = PolynomialFeatures(degree=deg)
    X_poly = poly.fit_transform(X)
    model = LinearRegression()
    model.fit(X_poly, y)
    r2 = r2_score(y, model.predict(X_poly))
    print(f'Degree {deg}: R^2 = {r2:.4f}')
Question 16
Medium
Write code to train a linear regression model to predict salary from years of experience. Use the data: experience=[1,3,5,7,10,12,15], salary=[30,45,60,75,95,110,130] (in thousands). Print the model equation and predict salary for 8 years experience.
Train with LinearRegression, access coef_ and intercept_.
from sklearn.linear_model import LinearRegression
import numpy as np
exp = np.array([1, 3, 5, 7, 10, 12, 15]).reshape(-1, 1)
sal = np.array([30, 45, 60, 75, 95, 110, 130])
model = LinearRegression()
model.fit(exp, sal)
print(f'Equation: salary = {model.coef_[0]:.2f} * experience + {model.intercept_:.2f}')
pred = model.predict([[8]])[0]
print(f'Predicted salary for 8 years: {pred:.2f}K')
Question 17
Medium
What is the difference between MAE and RMSE? When would you prefer one over the other?
Think about sensitivity to large errors.
MAE (Mean Absolute Error) treats all errors equally: an error of 10 counts as 10. RMSE (Root Mean Squared Error) penalizes large errors more: an error of 10 counts as 100 (squared) before averaging and square-rooting. Use MAE when all errors are equally important and you want a straightforward interpretation. Use RMSE when large errors are particularly undesirable (e.g., predicting medical dosages where a large error is dangerous).
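The difference shows up clearly when one prediction is far off; a small sketch with invented values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
small = y_true + np.array([1.0, -1.0, 1.0, -1.0, 1.0])     # even errors of 1
outlier = y_true + np.array([1.0, -1.0, 1.0, -1.0, 20.0])  # one error of 20

for name, y_pred in [('even errors', small), ('one outlier', outlier)]:
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(f'{name}: MAE = {mae:.2f}, RMSE = {rmse:.2f}')
```

MAE rises from 1.0 to 4.8, while RMSE jumps from 1.0 to about 9.0: the single large error dominates RMSE.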
Question 18
Hard
Use the normal equation w = (X^T X)^(-1) X^T y to solve linear regression without sklearn. Compare with sklearn's solution.
Add a column of 1s to X for the bias term. Use np.linalg.inv().
import numpy as np
from sklearn.linear_model import LinearRegression
np.random.seed(42)
x = np.random.uniform(0, 10, 50)
y = 3 * x + 7 + np.random.normal(0, 2, 50)
X = np.column_stack([np.ones(50), x])
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y
model = LinearRegression()
model.fit(x.reshape(-1, 1), y)
print(f'Normal equation: y = {w_normal[1]:.4f}x + {w_normal[0]:.4f}')
print(f'sklearn: y = {model.coef_[0]:.4f}x + {model.intercept_:.4f}')
Question 19
Medium
A linear regression model on house prices has coefficients: area=50, bedrooms=100000, intercept=500000. What is the predicted price for a house with area=1200 sqft and 3 bedrooms?
price = 50 * area + 100000 * bedrooms + 500000.
price = 50 * 1200 + 100000 * 3 + 500000 = 60000 + 300000 + 500000 = 860000
Question 20
Easy
What is the gradient in gradient descent? What direction does it point?
Think about the direction of steepest change.
The gradient is a vector of partial derivatives of the loss function with respect to each model parameter. It points in the direction of steepest increase of the loss function. In gradient descent, we move in the opposite direction (steepest decrease) to minimize the loss: w_new = w_old - learning_rate * gradient.

Mixed & Application Questions

Question 1
Easy
Write code to compute MSE manually for actual=[10, 20, 30] and predicted=[12, 18, 28] using NumPy.
MSE = np.mean((y_actual - y_predicted) ** 2).
import numpy as np
actual = np.array([10, 20, 30])
predicted = np.array([12, 18, 28])
mse = np.mean((actual - predicted) ** 2)
print(f'MSE: {mse:.4f}')
print(f'RMSE: {np.sqrt(mse):.4f}')
Output: MSE: 4.0000, RMSE: 2.0000
Question 2
Medium
Kavitha's model has R^2 = 0.95 on training data but R^2 = 0.60 on test data. What is happening and how should she fix it?
Big gap between train and test performance indicates a specific problem.
The model is overfitting: it memorized the training data (including noise) instead of learning the general pattern. Fixes: (1) Use more training data. (2) Reduce model complexity (fewer features, lower polynomial degree). (3) Use regularization (Ridge or Lasso regression). (4) Use cross-validation to tune hyperparameters.
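Fix (3) can be sketched on synthetic data. The polynomial degree, alpha, and dataset below are illustrative choices, not from Kavitha's actual problem:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

np.random.seed(42)
X = np.random.uniform(-2, 2, (40, 1))
y = X.squeeze() ** 2 + np.random.normal(0, 0.5, 40)  # true pattern is quadratic
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=42)

poly = PolynomialFeatures(degree=10)                 # deliberately too flexible
X_tr_p, X_te_p = poly.fit_transform(X_tr), poly.transform(X_te)

results = {}
for name, model in [('plain', LinearRegression()), ('ridge', Ridge(alpha=1.0))]:
    model.fit(X_tr_p, y_tr)
    results[name] = (r2_score(y_tr, model.predict(X_tr_p)),
                     r2_score(y_te, model.predict(X_te_p)))
    print(f'{name}: train R^2 = {results[name][0]:.3f}, test R^2 = {results[name][1]:.3f}')
```

Ridge trades a little training fit for smaller coefficients, which typically narrows the train-test gap.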
Question 3
Medium
Generate synthetic data y = 5x1 + 2x2 + noise, train a model, and print which feature is more important based on coefficients.
The feature with the larger absolute coefficient is more important (if features are on the same scale).
import numpy as np
from sklearn.linear_model import LinearRegression
np.random.seed(42)
X = np.random.randn(100, 2)
y = 5*X[:, 0] + 2*X[:, 1] + np.random.normal(0, 0.5, 100)
model = LinearRegression()
model.fit(X, y)
for i, coef in enumerate(model.coef_):
    print(f'Feature {i+1}: coefficient = {coef:.4f}')
more_important = 'Feature 1' if abs(model.coef_[0]) > abs(model.coef_[1]) else 'Feature 2'
print(f'More important: {more_important}')
Question 4
Hard
Write code to plot the learning curve: MSE vs iteration number for gradient descent training. Show how MSE decreases over 200 iterations.
Store MSE at each iteration in a list, then describe the trend.
import numpy as np
np.random.seed(42)
X = np.random.uniform(0, 10, 50)
y = 2*X + 5 + np.random.normal(0, 1, 50)
X_norm = (X - X.mean()) / X.std()
w, b, lr = 0.0, 0.0, 0.1
losses = []
for i in range(200):
    y_pred = w*X_norm + b
    mse = np.mean((y - y_pred)**2)
    losses.append(mse)
    dw = (2/50)*np.sum(X_norm*(y_pred - y))
    db = (2/50)*np.sum(y_pred - y)
    w -= lr * dw
    b -= lr * db
print(f'MSE at step 0: {losses[0]:.2f}')
print(f'MSE at step 10: {losses[10]:.2f}')
print(f'MSE at step 50: {losses[50]:.2f}')
print(f'Final MSE (step 199): {losses[-1]:.2f}')
print('Trend: MSE drops steeply at first, then barely changes after ~50 iterations (converged)')
Question 5
Hard
Explain why polynomial regression with degree 20 on 30 data points is a bad idea, even if it gives R^2 = 0.99 on training data.
Think about the number of parameters vs the number of data points.
With degree 20, the model has 21 parameters (coefficients for x^0 through x^20). With only 30 data points, the model has almost as many parameters as data points. It will perfectly fit the training data (memorizing each point) but generalize terribly to new data. This is severe overfitting. The R^2 of 0.99 on training data is misleading -- the test R^2 would likely be much lower or even negative. Use cross-validation to choose the right polynomial degree.
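The last sentence can be sketched with cross_val_score; the dataset and candidate degrees below are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import cross_val_score

np.random.seed(42)
X = np.random.uniform(-3, 3, (30, 1))              # only 30 points
y = X.squeeze() ** 2 + np.random.normal(0, 1, 30)  # true pattern is quadratic

cv_means = {}
for deg in [1, 2, 5, 20]:
    pipe = make_pipeline(PolynomialFeatures(degree=deg), LinearRegression())
    cv_means[deg] = cross_val_score(pipe, X, y, cv=5, scoring='r2').mean()
    print(f'degree {deg:2d}: mean CV R^2 = {cv_means[deg]:.3f}')
```

With this setup, very high degrees score far worse on held-out folds than degree 2, even though they fit the training folds almost perfectly.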
Question 6
Easy
Write code to access and print the coefficient and intercept of a trained sklearn LinearRegression model.
model.coef_ for weights, model.intercept_ for bias.
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4]])
y = np.array([10, 20, 30, 40])
model = LinearRegression()
model.fit(X, y)
print(f'Coefficient (slope): {model.coef_[0]}')
print(f'Intercept: {model.intercept_}')
print(f'Equation: y = {model.coef_[0]}x + {model.intercept_}')
Question 7
Medium
Write code to evaluate a linear regression model on test data and print all four metrics: MAE, MSE, RMSE, R-squared.
Use sklearn.metrics functions on y_test and y_pred.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
np.random.seed(42)
X = np.random.uniform(0, 10, (100, 1))
y = 3*X.squeeze() + 5 + np.random.normal(0, 2, 100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)
y_pred = model.predict(X_te)
print(f'MAE:  {mean_absolute_error(y_te, y_pred):.4f}')
print(f'MSE:  {mean_squared_error(y_te, y_pred):.4f}')
print(f'RMSE: {np.sqrt(mean_squared_error(y_te, y_pred)):.4f}')
print(f'R^2:  {r2_score(y_te, y_pred):.4f}')
Question 8
Easy
What does the intercept (b) represent in the equation y = mx + b?
What is y when x = 0?
The intercept b is the predicted value of y when all features are 0. In the equation y = 3*experience + 25000, the intercept 25000 represents the predicted salary for someone with 0 years of experience (the starting salary). In some contexts, the intercept may not have a meaningful interpretation (e.g., if x=0 is not realistic).

Multiple Choice Questions

MCQ 1
Linear regression is used to predict:
  • A. Categories (like cat or dog)
  • B. Continuous numerical values (like price or temperature)
  • C. True or False values
  • D. Text data
Answer: B
B is correct. Linear regression is a regression algorithm that predicts continuous numerical values. For categorical predictions, use classification algorithms like logistic regression or decision trees.
MCQ 2
What is the equation for simple linear regression?
  • A. y = ax^2 + bx + c
  • B. y = mx + b
  • C. y = e^x
  • D. y = log(x)
Answer: B
B is correct. Simple linear regression fits a straight line y = mx + b, where m is the slope and b is the y-intercept. Option A is quadratic (polynomial), C is exponential, and D is logarithmic.
MCQ 3
What does the slope (m) in y = mx + b represent?
  • A. The value of y when x is 0
  • B. How much y changes when x increases by 1
  • C. The average value of y
  • D. The total error
Answer: B
B is correct. The slope m represents the rate of change: for every 1 unit increase in x, y changes by m units. If m = 3, each unit of x adds 3 to y. Option A describes the intercept b.
MCQ 4
Which cost function does linear regression minimize?
  • A. Cross-Entropy
  • B. Mean Absolute Error
  • C. Mean Squared Error
  • D. Accuracy
Answer: C
C is correct. Linear regression minimizes MSE (Mean Squared Error). Cross-entropy is for classification. MAE could be used but MSE is standard because it is differentiable everywhere and has a unique minimum. Accuracy is a classification metric.
MCQ 5
What does R-squared = 0 mean?
  • A. The model is perfect
  • B. The model is no better than predicting the mean of y for every sample
  • C. All predictions are exactly wrong
  • D. There is no data to evaluate
Answer: B
B is correct. R^2 = 0 means the model explains none of the variance in y -- it performs equally to simply predicting the mean for every sample. R^2 = 1 means perfect. R^2 can be negative for very bad models (worse than predicting the mean).
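This is easy to verify with sklearn:

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 7.0, 5.0, 9.0])     # arbitrary example values
baseline = np.full_like(y, y.mean())   # predict the mean for every sample
print(r2_score(y, baseline))           # 0.0
```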
MCQ 6
In gradient descent, what happens if the learning rate is too small?
  • A. The model diverges
  • B. The model converges very slowly
  • C. The model converges to a wrong answer
  • D. Nothing - small learning rate is always better
Answer: B
B is correct. A very small learning rate means tiny steps toward the minimum. The model will eventually converge to the correct answer but may take thousands or millions of iterations. Too large a learning rate causes divergence (option A).
MCQ 7
What is the purpose of the .reshape(-1, 1) operation before fitting sklearn models?
  • A. To normalize the data
  • B. To convert a 1D array to a 2D array (samples x features)
  • C. To remove missing values
  • D. To shuffle the data
Answer: B
B is correct. sklearn requires feature data (X) to be 2D: (n_samples, n_features). A 1D array has shape (n,) which sklearn cannot interpret. reshape(-1, 1) converts it to (n, 1), meaning n samples with 1 feature.
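A quick shape check:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
print(x.shape)        # (5,)   -- 1D; sklearn rejects this as X
X = x.reshape(-1, 1)
print(X.shape)        # (5, 1) -- 5 samples, 1 feature
```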
MCQ 8
What is multicollinearity and why is it a problem for linear regression?
  • A. When features are independent of each other
  • B. When features are highly correlated with each other, making coefficients unstable
  • C. When the target variable is non-numeric
  • D. When there are too many data points
Answer: B
B is correct. Multicollinearity occurs when features are highly correlated (e.g., area_sqft and area_sqm). This makes individual coefficients unreliable (they can flip signs or have huge values) because the model cannot determine each feature's individual contribution. Fix: remove one of the correlated features or use Ridge regression.
MCQ 9
A model has MAE = 5 and RMSE = 10. What does the large gap between them suggest?
  • A. The model is well-calibrated
  • B. There are some large outlier errors
  • C. The model is underfitting
  • D. The data has no outliers
Answer: B
B is correct. RMSE penalizes large errors more (squaring). If RMSE >> MAE, some predictions have very large errors that inflate RMSE but barely affect MAE. For example, most errors might be 3-5, but a few are 20-30. Those outlier errors contribute 400-900 to MSE (before averaging) but only 20-30 to MAE.
MCQ 10
What is the normal equation in linear regression?
  • A. y = mx + b
  • B. w = (X^T X)^(-1) X^T y
  • C. MSE = mean((y - y_pred)^2)
  • D. w = w - lr * gradient
Answer: B
B is correct. The normal equation w = (X^T X)^(-1) X^T y directly computes the optimal weights without iteration. It gives the exact solution to the least squares problem. Option D is the gradient descent update rule (iterative). Option C is the MSE formula.
MCQ 11
What does model.predict(X) return in sklearn?
  • A. The model's accuracy
  • B. The predicted values for input X
  • C. The model's coefficients
  • D. The training loss
Answer: B
B is correct. predict(X) takes feature data and returns the model's predictions. For linear regression, it computes X @ coef_ + intercept_ for each sample.
MCQ 12
PolynomialFeatures(degree=3) transforms the feature x into:
  • A. [x, x^2, x^3]
  • B. [1, x, x^2, x^3]
  • C. [x^3]
  • D. [x, 2x, 3x]
Answer: B
B is correct. PolynomialFeatures(degree=3) creates [1, x, x^2, x^3] (including the bias term 1 and all powers up to 3). This allows linear regression to fit cubic curves while remaining linear in its parameters.
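This can be verified directly:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0]])
expanded = PolynomialFeatures(degree=3).fit_transform(X)
print(expanded)  # [[1. 2. 4. 8.]] -> [1, x, x^2, x^3] for x = 2
```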
MCQ 13
Which of the following is NOT an assumption of linear regression?
  • A. The relationship between features and target is linear
  • B. Residuals are normally distributed
  • C. All features must be on the same scale
  • D. Features should not be highly correlated with each other
Answer: C
C is correct (it is NOT an assumption). Linear regression does not require features to be on the same scale -- the normal equation and sklearn handle different scales fine. However, scaling IS needed if you use gradient descent. The actual assumptions are: linearity (A), normal residuals (B), and no multicollinearity (D).
MCQ 14
If the coefficient for 'bedrooms' in a house price model is 150000, what does it mean?
  • A. Each bedroom costs 150000 to build
  • B. Each additional bedroom is associated with a 150000 increase in predicted price, holding other features constant
  • C. Houses with bedrooms cost 150000
  • D. 150000 is the R-squared for bedrooms
Answer: B
B is correct. In multiple regression, each coefficient represents the change in the predicted target for a one-unit increase in that feature, while holding all other features constant. So each additional bedroom adds 150000 to the predicted price, assuming area, location, etc. remain the same.
MCQ 15
What is the gradient of MSE with respect to weight w for simple linear regression y = wx + b?
  • A. (2/n) * sum(x * (y_pred - y))
  • B. (2/n) * sum(y_pred - y)
  • C. (1/n) * sum(y - y_pred)
  • D. sum(x^2)
Answer: A
A is correct. The partial derivative of MSE with respect to w is (2/n) * sum(x_i * (y_pred_i - y_i)). This tells us how the loss changes when we adjust w. Option B is the gradient with respect to b (the bias). These gradients are used in gradient descent to update w and b.

Coding Challenges

Challenge 1: Linear Regression from Scratch

Hard
Implement a LinearRegressionGD class with fit(X, y) and predict(X) methods using gradient descent. It should store the loss history. Train it on y = 4x + 10 + noise data (100 points). Print the learned equation and final MSE. Compare with sklearn.
Sample Input
100 data points from y = 4x + 10 + noise
Sample Output
Scratch: y = 3.98x + 10.12, MSE = 4.23
sklearn: y = 3.98x + 10.12
Normalize X before training. Use learning_rate=0.1, iterations=500.
import numpy as np
from sklearn.linear_model import LinearRegression

class LinearRegressionGD:
    def __init__(self, lr=0.1, n_iter=500):
        self.lr = lr
        self.n_iter = n_iter
        self.losses = []
    def fit(self, X, y):
        n = len(X)
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.n_iter):
            y_pred = X @ self.w + self.b
            self.losses.append(np.mean((y - y_pred)**2))
            self.w -= self.lr * (1/n) * X.T @ (y_pred - y)
            self.b -= self.lr * (1/n) * np.sum(y_pred - y)
    def predict(self, X): return X @ self.w + self.b

np.random.seed(42)
x = np.random.uniform(0, 10, 100)
y = 4*x + 10 + np.random.normal(0, 2, 100)
X_norm = ((x - x.mean()) / x.std()).reshape(-1, 1)

my_model = LinearRegressionGD()
my_model.fit(X_norm, y)
# Convert the weights back to the original (unnormalized) x scale
w_orig = my_model.w[0] / x.std()
b_orig = my_model.b - my_model.w[0] * x.mean() / x.std()
print(f'Scratch: y = {w_orig:.2f}x + {b_orig:.2f}, MSE = {my_model.losses[-1]:.4f}')

sk_model = LinearRegression().fit(x.reshape(-1, 1), y)
print(f'sklearn: y = {sk_model.coef_[0]:.2f}x + {sk_model.intercept_:.2f}')

Challenge 2: House Price Predictor

Medium
Create a synthetic house price dataset with features: area (500-3000), bedrooms (1-5), age (0-30), distance_to_city (1-25). Price = 20 + 0.03*area + 8*bedrooms - 0.5*age + noise. Train a model, evaluate it, and predict price for a specific house.
Sample Input
200 synthetic houses
Sample Output
Model equation, R-squared, RMSE, and prediction for [1500, 3, 5, 10]
Use StandardScaler. Print feature coefficients.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_squared_error

np.random.seed(42)
n = 200
df = pd.DataFrame({
    'Area': np.random.uniform(500, 3000, n).round(0),
    'Beds': np.random.randint(1, 6, n),
    'Age': np.random.randint(0, 30, n),
    'Distance': np.random.uniform(1, 25, n).round(1)
})
df['Price'] = (20 + 0.03*df['Area'] + 8*df['Beds'] - 0.5*df['Age'] + np.random.normal(0, 5, n)).round(2)

X = df.drop('Price', axis=1)
y = df['Price']
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)
X_te_s = scaler.transform(X_te)

model = LinearRegression().fit(X_tr_s, y_tr)
y_pred = model.predict(X_te_s)
print(f'R^2: {r2_score(y_te, y_pred):.4f}')
print(f'RMSE: {np.sqrt(mean_squared_error(y_te, y_pred)):.2f}')
for feat, coef in zip(X.columns, model.coef_):
    print(f'{feat}: {coef:.4f}')
new = pd.DataFrame([[1500, 3, 5, 10]], columns=X.columns)  # keep feature names consistent with fit
print(f'Prediction for [1500, 3, 5, 10]: {model.predict(scaler.transform(new))[0]:.2f}')

Challenge 3: Model Comparison: Linear vs Polynomial

Medium
Generate non-linear data: y = 0.5*x^2 - 2*x + 10 + noise. Train linear (degree 1), quadratic (degree 2), and cubic (degree 3) models. Print R-squared for each on test data. Determine the best degree.
Sample Input
80 data points from y = 0.5x^2 - 2x + 10 + noise
Sample Output
Degree 1 R^2: 0.35, Degree 2 R^2: 0.95, Degree 3 R^2: 0.95
Split 80/20. Use random_state=42.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

np.random.seed(42)
X = np.random.uniform(-5, 5, 80).reshape(-1, 1)
y = 0.5*X.squeeze()**2 - 2*X.squeeze() + 10 + np.random.normal(0, 2, 80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

best_r2, best_deg = -1, 0
for deg in [1, 2, 3]:
    poly = PolynomialFeatures(degree=deg)
    X_tr_p = poly.fit_transform(X_tr)
    X_te_p = poly.transform(X_te)
    model = LinearRegression().fit(X_tr_p, y_tr)
    r2 = r2_score(y_te, model.predict(X_te_p))
    print(f'Degree {deg}: R^2 = {r2:.4f}')
    if r2 > best_r2: best_r2, best_deg = r2, deg
print(f'Best: degree {best_deg} (R^2 = {best_r2:.4f})')

Challenge 4: Salary Prediction with Feature Analysis

Easy
Create a dataset mapping years of experience to salary. Train a linear regression model. Print the equation, evaluate with all 4 metrics, and predict salaries for 0, 5, 10, 15, and 20 years experience.
Sample Input
experience = [1,2,3,4,5,6,8,10,12,15], salary in thousands
Sample Output
Equation, metrics, prediction table
Use meaningful salary values (e.g., 25-130K range).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

np.random.seed(42)
exp = np.array([1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 1.5, 3.5, 7, 9, 11, 14, 4.5, 6.5, 8.5, 13]).reshape(-1, 1)
sal = 25 + 7*exp.squeeze() + np.random.normal(0, 3, 20)

X_tr, X_te, y_tr, y_te = train_test_split(exp, sal, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)
y_pred = model.predict(X_te)

print(f'Equation: salary = {model.coef_[0]:.2f} * experience + {model.intercept_:.2f}')
print(f'MAE: {mean_absolute_error(y_te, y_pred):.2f}K')
print(f'RMSE: {np.sqrt(mean_squared_error(y_te, y_pred)):.2f}K')
print(f'R^2: {r2_score(y_te, y_pred):.4f}')
print('\nPredictions:')
for yr in [0, 5, 10, 15, 20]:
    print(f'  {yr} years -> {model.predict([[yr]])[0]:.1f}K')

Challenge 5: Learning Rate Experiment

Hard
Implement gradient descent for linear regression and test 5 different learning rates: 0.001, 0.01, 0.05, 0.1, 0.5. For each, run 200 iterations and report: final MSE, whether it converged, and the number of iterations to reach MSE < 5.
Sample Input
50 data points from y = 3x + 10 + noise
Sample Output
Table comparing all 5 learning rates
Normalize features. Use random_state=42.
import numpy as np

np.random.seed(42)
X = np.random.uniform(0, 10, 50)
y = 3*X + 10 + np.random.normal(0, 2, 50)
X_norm = (X - X.mean()) / X.std()

print(f'{"LR":<8}{"Final MSE":<12}{"Converged":<12}{"Steps to MSE<5"}')
print('-' * 44)
for lr in [0.001, 0.01, 0.05, 0.1, 0.5]:
    w, b = 0.0, 0.0
    steps_to_5 = 'N/A'
    for i in range(200):
        y_pred = w*X_norm + b
        mse = np.mean((y - y_pred)**2)
        if mse < 5 and steps_to_5 == 'N/A':
            steps_to_5 = str(i)
        dw = (2/50)*np.sum(X_norm*(y_pred-y))
        db = (2/50)*np.sum(y_pred-y)
        w -= lr*dw
        b -= lr*db
    final_mse = np.mean((y-(w*X_norm+b))**2)
    converged = 'Yes' if final_mse < 10 else 'No'
    print(f'{lr:<8}{final_mse:<12.4f}{converged:<12}{steps_to_5}')

Challenge 6: End-to-End ML Pipeline: Student Score Predictor

Hard
Build a complete ML pipeline: (1) Generate data with 3 features (hours_studied, attendance, previous_score) and target (final_score). (2) Perform EDA (describe, correlations). (3) Preprocess (handle missing values, scale). (4) Train-test split. (5) Train LinearRegression. (6) Evaluate with all metrics. (7) Predict for a new student.
Sample Input
150 synthetic student records with some missing values
Sample Output
Complete pipeline output: EDA, training, evaluation, prediction
Include missing value handling and feature scaling.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

np.random.seed(42)
n = 150
df = pd.DataFrame({
    'Hours': np.random.uniform(1, 10, n).round(1),
    'Attendance': np.random.uniform(40, 100, n).round(1),
    'Prev_Score': np.random.uniform(30, 90, n).round(1)
})
df['Final'] = (10 + 5*df['Hours'] + 0.3*df['Attendance'] + 0.4*df['Prev_Score'] + np.random.normal(0, 5, n)).round(1).clip(0, 100)
df.loc[np.random.choice(n, 10), 'Hours'] = np.nan
df.loc[np.random.choice(n, 5), 'Attendance'] = np.nan

print('=== EDA ===')
print(f'Shape: {df.shape}')
print(f'Missing: {df.isnull().sum().to_dict()}')
print(f'Correlations with Final:')
print(df.corr()['Final'].drop('Final').round(3))

df['Hours'] = df['Hours'].fillna(df['Hours'].median())
df['Attendance'] = df['Attendance'].fillna(df['Attendance'].median())

X = df.drop('Final', axis=1)
y = df['Final']
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)
X_te_s = scaler.transform(X_te)

model = LinearRegression().fit(X_tr_s, y_tr)
y_pred = model.predict(X_te_s)

print(f'\n=== Evaluation ===')
print(f'MAE: {mean_absolute_error(y_te, y_pred):.2f}')
print(f'RMSE: {np.sqrt(mean_squared_error(y_te, y_pred)):.2f}')
print(f'R^2: {r2_score(y_te, y_pred):.4f}')

for feat, coef in zip(X.columns, model.coef_):
    print(f'{feat}: {coef:.4f}')

new_student = pd.DataFrame([[6, 85, 70]], columns=X.columns)  # keep feature names consistent with fit
print(f'\nNew student (6hrs, 85% att, 70 prev): {model.predict(scaler.transform(new_student))[0]:.1f}')
