Chapter 2 Beginner 62 Questions

Practice Questions — Python for AI - NumPy, Pandas, and Matplotlib

10 Easy

13 Medium

9 Hard

Topic-Specific Questions

Question 1

Easy

What is the output?

import numpy as np
arr = np.array([1, 2, 3]) * 2
print(arr)

NumPy performs element-wise multiplication.

[2 4 6]

Question 2

Easy

What is the output?

import numpy as np
print(np.zeros(3))
print(np.ones((2, 2)))

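np.zeros creates an array filled with zeros and np.ones creates an array filled with ones. Both default to float64, which is why the values print with a trailing dot.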

[0. 0. 0.]

[[1. 1.]
 [1. 1.]]

Question 3

Easy

What is the output?

import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])

NumPy slicing works like Python list slicing: start (inclusive) to end (exclusive).

[20 30 40]

Question 4

Easy

Write code to create a NumPy array of numbers from 0 to 9 and compute the sum, mean, and standard deviation.

Use np.arange(10) to create the array, then np.sum(), np.mean(), np.std().

import numpy as np

arr = np.arange(10)
print(f"Array: {arr}")
print(f"Sum: {np.sum(arr)}")
print(f"Mean: {np.mean(arr)}")
print(f"Std: {np.std(arr):.2f}")

Output: Sum: 45, Mean: 4.5, Std: 2.87

Question 5

Medium

What is the output?

import numpy as np
arr = np.array([10, 25, 5, 40, 15])
print(arr[arr > 12])

This is boolean indexing -- it returns elements where the condition is True.

[25 40 15]

Question 6

Medium

What is the output?

import numpy as np
arr = np.arange(6).reshape(2, 3)
print(arr)
print(arr.shape)

reshape(2, 3) turns a 1D array of 6 elements into a 2x3 matrix.

[[0 1 2]
 [3 4 5]]

(2, 3)

Question 7

Medium

Write code to compute the dot product of vectors [1, 2, 3] and [4, 5, 6] using NumPy. What is the mathematical calculation?

Dot product = (1*4) + (2*5) + (3*6). Use np.dot().

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.dot(a, b)
print(f"Dot product: {result}")
print(f"Calculation: (1*4) + (2*5) + (3*6) = {1*4 + 2*5 + 3*6}")

Output: Dot product: 32

Question 8

Easy

Write Pandas code to create a DataFrame with 3 students (Name, Marks) and print only the student with the highest marks.

Use df[df['Marks'] == df['Marks'].max()] or df.loc[df['Marks'].idxmax()].

import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan'],
    'Marks': [85, 92, 78]
})

top_student = df.loc[df['Marks'].idxmax()]
print(f"Top student: {top_student['Name']} with {top_student['Marks']} marks")

Output: Top student: Priya with 92 marks

Question 9

Medium

Write Pandas code to filter students with marks greater than 80 from this DataFrame: Name=['Aarav','Priya','Rohan','Ananya'], Marks=[85, 42, 91, 38].

Use df[df['Marks'] > 80].

import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Marks': [85, 42, 91, 38]
})

high_scorers = df[df['Marks'] > 80]
print(high_scorers)

Output:

    Name  Marks
0  Aarav     85
2  Rohan     91

Question 10

Medium

Write code to find and fill missing values in a Pandas DataFrame. Create a DataFrame where the 'Salary' column has two NaN values.

Use np.nan to create missing values, isnull().sum() to detect, and fillna() to fill.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Employee': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Salary': [50000, np.nan, 60000, np.nan]
})

print("Before:")
print(df)
print(f"Missing values: {df['Salary'].isnull().sum()}")

df['Salary'] = df['Salary'].fillna(df['Salary'].median())
print("\nAfter filling with median:")
print(df)

Question 11

Medium

What is the output?

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.iloc[0:2])

iloc uses integer position indexing. 0:2 means positions 0 and 1.

   A  B
0  1  4
1  2  5

Question 12

Hard

Write code to group students by their department and calculate the average marks for each department.

Use df.groupby('Department')['Marks'].mean().

import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram', 'Meera'],
    'Department': ['CSE', 'ECE', 'CSE', 'IT', 'ECE', 'CSE'],
    'Marks': [85, 92, 78, 88, 65, 95]
})

avg_by_dept = df.groupby('Department')['Marks'].mean().round(1)
print("Average marks by department:")
print(avg_by_dept)

Output: CSE 86.0, ECE 78.5, IT 88.0

Question 13

Hard

What is the output?

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)

Matrix multiplication: row of A dot column of B.

[[19 22]
 [43 50]]

Question 14

Hard

Write NumPy code to generate 1000 random numbers from a normal distribution with mean=50 and std=10, then find what percentage of values fall between 40 and 60.

Use np.random.normal(). Count values between 40 and 60 by summing a boolean mask.

import numpy as np
np.random.seed(42)

data = np.random.normal(loc=50, scale=10, size=1000)
in_range = np.sum((data >= 40) & (data <= 60))
pct = in_range / len(data) * 100
print(f"Values between 40-60: {in_range} out of 1000")
print(f"Percentage: {pct:.1f}%")

Question 15

Easy

What is the output?

import numpy as np
arr = np.array([5, 10, 15, 20])
print(np.sum(arr))
print(np.mean(arr))

Sum adds all elements. Mean is sum divided by count.

50
12.5

Question 16

Medium

Write Pandas code to add a new column 'Grade' to a DataFrame based on marks: A if marks >= 90, B if >= 75, C if >= 60, else F.

Use df['Marks'].apply(...) with a function (or a lambda) that encodes the conditional logic.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Marks': [95, 78, 62, 45]
})

def assign_grade(marks):
    if marks >= 90: return 'A'
    elif marks >= 75: return 'B'
    elif marks >= 60: return 'C'
    else: return 'F'

df['Grade'] = df['Marks'].apply(assign_grade)
print(df)

Question 17

Hard

What is the output?

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr, axis=0))
print(np.sum(arr, axis=1))

axis=0 collapses the rows, giving one sum per column (summing down). axis=1 collapses the columns, giving one sum per row (summing across).

[5 7 9]
[ 6 15]

Question 18

Hard

Write code to merge two DataFrames: one with student names and IDs, another with IDs and marks. Join them on the 'ID' column.

Use pd.merge(df1, df2, on='ID').

import pandas as pd

students = pd.DataFrame({
    'ID': [101, 102, 103],
    'Name': ['Aarav', 'Priya', 'Rohan']
})

marks = pd.DataFrame({
    'ID': [101, 102, 103],
    'Marks': [85, 92, 78]
})

result = pd.merge(students, marks, on='ID')
print(result)

Output:

    ID   Name  Marks
0  101  Aarav     85
1  102  Priya     92
2  103  Rohan     78

Question 19

Medium

Write Matplotlib code to create a bar chart showing marks of 4 students with student names on x-axis, marks on y-axis, a title, and labels.

Use plt.bar(names, marks), plt.title(), plt.xlabel(), plt.ylabel().

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

names = ['Aarav', 'Priya', 'Rohan', 'Ananya']
marks = [85, 92, 78, 88]

plt.figure(figsize=(8, 5))
plt.bar(names, marks, color=['#a855f7', '#06b6d4', '#f59e0b', '#22c55e'])
plt.title('Student Marks Comparison')
plt.xlabel('Students')
plt.ylabel('Marks')
plt.savefig('bar_chart.png')
print("Bar chart saved")

Question 20

Easy

What is the output?

import pandas as pd
df = pd.DataFrame({'Name': ['Aarav', 'Priya', 'Rohan'], 'Age': [20, 22, 21]})
print(df.shape)

shape returns (rows, columns).

(3, 2)

Mixed & Application Questions

Question 1

Easy

What is the output?

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr + arr)

Adding two arrays performs element-wise addition.

[2 4 6 8]

Question 2

Easy

Write code to create a Pandas Series from a list [10, 20, 30, 40] with custom index ['a', 'b', 'c', 'd'] and print the element at index 'c'.

Use pd.Series(data, index=custom_index).

import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s)
print(f"\nElement at 'c': {s['c']}")

Output: Element at 'c': 30

Question 3

Medium

What is the output?

import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9])
print(np.max(arr))
print(np.argmax(arr))

max returns the maximum value, argmax returns the INDEX of the maximum.

9
5

Question 4

Medium

What is the output?

import pandas as pd
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})
print(df['X'].sum())
print(df.sum())

Sum on a column gives a scalar. Sum on a DataFrame gives column-wise sums.

6

X     6
Y    15
dtype: int64

Question 5

Medium

Write code to create a 3x3 identity matrix using NumPy and verify that multiplying it with any vector gives the same vector.

Use np.eye(3) for the identity matrix. Multiply with the @ operator (or, equivalently here, np.dot()).

import numpy as np

I = np.eye(3)
print("Identity matrix:\n", I)

v = np.array([5, 10, 15])
result = I @ v
print(f"\nI @ {v} = {result}")
print(f"Same as original? {np.array_equal(v, result)}")

Question 6

Medium

Write Pandas code to read a DataFrame with some missing values and create a summary showing: total rows, missing values per column, and percentage missing per column.

Use isnull().sum(), divide by len(df), and multiply by 100.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Age': [25, np.nan, 30, np.nan, 28],
    'Salary': [50000, 60000, np.nan, 70000, 55000],
    'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})

print(f"Total rows: {len(df)}")
print(f"\nMissing values:")
for col in df.columns:
    missing = df[col].isnull().sum()
    pct = missing / len(df) * 100
    print(f"  {col}: {missing} ({pct:.0f}%)")

Question 7

Hard

What is the output?

import numpy as np
arr = np.array([[10, 20, 30], [40, 50, 60]])
print(arr.T)
print(arr.T.shape)

T transposes the matrix (swaps rows and columns).

[[10 40]
 [20 50]
 [30 60]]

(3, 2)

Question 8

Hard

Write a complete mini analysis: create a DataFrame with 5 students and their marks in 3 subjects, compute total and average marks, sort by average descending, and display the result.

Use df['Total'] = df[['Math', 'Science', 'English']].sum(axis=1) for row-wise sum.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram'],
    'Math': [85, 92, 78, 88, 65],
    'Science': [90, 88, 72, 95, 70],
    'English': [78, 95, 80, 82, 75]
})

df['Total'] = df[['Math', 'Science', 'English']].sum(axis=1)
df['Average'] = df['Total'] / 3
df = df.sort_values('Average', ascending=False)
print(df.round(1).to_string(index=False))

Question 9

Easy

Write NumPy code to create a random array of 5 integers between 1 and 100 and print the minimum and maximum values.

Use np.random.randint(1, 101, size=5).

import numpy as np
np.random.seed(42)

arr = np.random.randint(1, 101, size=5)
print(f"Array: {arr}")
print(f"Min: {np.min(arr)}")
print(f"Max: {np.max(arr)}")

Question 10

Hard

What is the output?

import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [10, 20, 30, 40]})
result = df.groupby('A')['B'].agg(['sum', 'mean', 'count'])
print(result)

GroupBy groups rows by column A, then applies sum, mean, and count to column B.

   sum  mean  count
A                   
1   30  15.0      2
2   70  35.0      2

Question 11

Hard

Write code to normalize an array of marks to the range [0, 1] using Min-Max scaling: (x - min) / (max - min).

Find min and max of the array, then apply the formula element-wise.

import numpy as np

marks = np.array([30, 45, 78, 92, 55, 67, 88, 41])
min_val = np.min(marks)
max_val = np.max(marks)
normalized = (marks - min_val) / (max_val - min_val)

print(f"Original marks: {marks}")
print(f"Min: {min_val}, Max: {max_val}")
print(f"Normalized (0-1): {np.round(normalized, 3)}")
print(f"Min normalized: {normalized.min():.1f}, Max normalized: {normalized.max():.1f}")

Question 12

Medium

What is the output?

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
print(df.dtypes)

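Pandas infers data types automatically. Integers become int64. Strings are stored as object in pandas releases before 3.0; newer releases may report a dedicated string dtype instead, so the second line of the output can differ by version.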

A     int64
B    object
dtype: object

Multiple Choice Questions

MCQ 1

What does np.array([1, 2, 3]) * 3 produce?

A. [1, 2, 3, 1, 2, 3, 1, 2, 3]
B. [3, 6, 9]
C. Error
D. [1, 2, 3, 3]

Answer: B
B is correct. NumPy performs element-wise multiplication: each element is multiplied by 3. This is vectorization -- no loops needed. Option A would be the result if this were a Python list ([1,2,3] * 3 repeats the list).
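
The difference between options A and B can be checked side by side. This is a minimal sketch; the variable names are illustrative only.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

repeated = py_list * 3   # list repetition: the sequence is concatenated three times
scaled = np_arr * 3      # element-wise: each element is multiplied by 3

print(repeated)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(scaled)    # [3 6 9]
```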

MCQ 2

What does df.head() do in Pandas?

A. Displays the column names
B. Displays the first 5 rows of the DataFrame
C. Displays the data types
D. Displays summary statistics

Answer: B
B is correct. df.head() returns the first 5 rows by default. You can pass a number like df.head(10) to see more. df.columns shows column names. df.dtypes shows data types. df.describe() shows statistics.

MCQ 3

Which NumPy function creates an array of evenly spaced values between two endpoints?

A. np.arange()
B. np.linspace()
C. np.zeros()
D. np.random.rand()

Answer: B
B is correct. np.linspace(0, 1, 5) creates 5 evenly spaced values between 0 and 1: [0, 0.25, 0.5, 0.75, 1]. np.arange() uses a step size (not a count). np.zeros() creates an array of zeros. np.random.rand() creates random values.
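
A short sketch of the count-versus-step distinction, with illustrative variable names: np.linspace takes the number of points and includes both endpoints, while np.arange takes a step and excludes the stop value.

```python
import numpy as np

by_count = np.linspace(0, 1, 5)   # 5 values, both endpoints included
by_step = np.arange(0, 1, 0.25)   # step of 0.25, stop value (1) excluded

print(by_count.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(by_step.tolist())   # [0.0, 0.25, 0.5, 0.75]
```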

MCQ 4

What is the output of np.arange(0, 10, 2)?

A. [0, 2, 4, 6, 8, 10]
B. [0, 2, 4, 6, 8]
C. [2, 4, 6, 8, 10]
D. [0, 1, 2, 3, 4]

Answer: B
B is correct. np.arange(start, stop, step) creates values from 0 to 10 (exclusive) with step 2. The values are [0, 2, 4, 6, 8]. The stop value (10) is excluded, just like Python's range().

MCQ 5

In Pandas, what is the difference between loc and iloc?

A. loc is for rows, iloc is for columns
B. loc uses label-based indexing, iloc uses integer position-based indexing
C. They are the same thing
D. loc is faster than iloc

Answer: B
B is correct. loc selects data by label (e.g., df.loc['row_name', 'col_name']). iloc selects by integer position (e.g., df.iloc[0, 1]). Important: loc slicing is inclusive on both ends, while iloc slicing excludes the end (like Python).
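
The inclusive-versus-exclusive slicing rule is easy to verify on a small DataFrame. The sketch below assumes a hypothetical DataFrame with string labels 'a' to 'd' as its index.

```python
import pandas as pd

# Hypothetical data: four rows labelled 'a'..'d'
df = pd.DataFrame({'Marks': [85, 92, 78, 88]}, index=['a', 'b', 'c', 'd'])

by_label = df.loc['a':'c']    # label slice: 'c' is included, so 3 rows
by_position = df.iloc[0:2]    # position slice: position 2 is excluded, so 2 rows

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_position.index))  # ['a', 'b']
```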

MCQ 6

What does arr.reshape(-1, 1) do to a 1D NumPy array with 5 elements?

A. Flattens it to shape (5,)
B. Converts it to shape (5, 1) -- a column vector
C. Converts it to shape (1, 5) -- a row vector
D. Raises an error

Answer: B
B is correct. reshape(-1, 1) creates a 2D array with 1 column. The -1 means 'calculate the number of rows automatically' (5 elements / 1 column = 5 rows). This gives shape (5, 1), which is what scikit-learn expects for single-feature input.
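
As a quick check of how -1 is resolved, the sketch below reshapes the same 5-element array into a column and into a row. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(5)           # shape (5,)

col = arr.reshape(-1, 1)     # -1 resolves to 5 rows: shape (5, 1)
row = arr.reshape(1, -1)     # -1 resolves to 5 columns: shape (1, 5)

print(arr.shape, col.shape, row.shape)  # (5,) (5, 1) (1, 5)
```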

MCQ 7

What does df.isnull().sum() return for a Pandas DataFrame?

A. Total number of missing values in the entire DataFrame
B. Number of missing values in each column
C. True or False for each cell
D. Number of non-null values in each column

Answer: B
B is correct. df.isnull() creates a boolean DataFrame (True where values are missing). .sum() sums each column (True counts as 1, False as 0), giving the count of missing values per column. To get the total, use df.isnull().sum().sum().
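
The per-column count and the overall total can be compared on a small example. This sketch uses a hypothetical two-column DataFrame with three missing cells.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing Age, two missing City values
df = pd.DataFrame({
    'Age': [25, np.nan, 30],
    'City': ['Delhi', None, None]
})

per_column = df.isnull().sum()      # Series: one count per column
total = df.isnull().sum().sum()     # scalar: count for the whole DataFrame

print(per_column)
print(f"Total missing: {total}")    # Total missing: 3
```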

MCQ 8

What is the correct way to filter a Pandas DataFrame for rows where age > 20 AND marks > 80?

A. df[df['age'] > 20 and df['marks'] > 80]
B. df[(df['age'] > 20) & (df['marks'] > 80)]
C. df[df['age'] > 20 & df['marks'] > 80]
D. df.filter(age > 20, marks > 80)

Answer: B
B is correct. In Pandas, you must use & (not and) for element-wise AND, and each condition must be in parentheses. Option A uses Python's and, which tries to reduce each Series to a single True/False and raises a ValueError. Option C fails because & binds more tightly than >, so without parentheses Python evaluates 20 & df['marks'] first.
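
The sketch below runs option B and shows what happens with option A, using a hypothetical three-row DataFrame. The column names follow the question; the data values are made up for illustration.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'age': [19, 22, 25], 'marks': [90, 75, 88]})

# Option B: element-wise AND with each condition parenthesized
both = df[(df['age'] > 20) & (df['marks'] > 80)]
print(both)   # only the row with age 25 and marks 88

# Option A: Python's `and` asks for a single truth value of a Series
try:
    df[df['age'] > 20 and df['marks'] > 80]
    and_failed = False
except ValueError:
    and_failed = True   # "The truth value of a Series is ambiguous"

print(f"`and` raised ValueError: {and_failed}")
```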

MCQ 9

What type of plot is best for showing the distribution of a single numerical variable?

A. Scatter plot
B. Bar chart
C. Histogram
D. Pie chart

Answer: C
C is correct. Histograms show the frequency distribution of a single variable by dividing values into bins and counting occurrences. Scatter plots show relationships between two variables. Bar charts compare categories. Pie charts show proportions of a whole.
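
The binning step that a histogram visualizes can be reproduced with np.histogram, without drawing anything. This is a minimal sketch; the marks and the bin edges are made-up values chosen for illustration.

```python
import numpy as np

# Hypothetical marks and bin edges
marks = np.array([55, 62, 68, 71, 74, 78, 83, 91])
bin_edges = [50, 60, 70, 80, 90, 100]

# counts[i] is the number of values falling in [edges[i], edges[i+1])
# (the last bin also includes its right edge)
counts, edges = np.histogram(marks, bins=bin_edges)

print(counts.tolist())  # [1, 2, 3, 1, 1]
```

Passing the same marks and bins to plt.hist() would draw one bar per bin with these heights.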

MCQ 10

What is broadcasting in NumPy?

A. Sending arrays to multiple processors
B. Automatically expanding smaller arrays to match larger arrays for element-wise operations
C. Converting arrays to different data types
D. Distributing data across multiple machines

Answer: B
B is correct. Broadcasting is NumPy's mechanism for performing operations between arrays of different shapes. When you write np.array([1, 2, 3]) + 5, NumPy broadcasts the scalar 5 to match the array shape, effectively adding [5, 5, 5]. This also works between differently shaped arrays following specific rules.
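
Both cases mentioned above, a scalar with an array and two arrays of different shapes, can be sketched as follows. The shapes (3,) and (2, 1) are chosen only for illustration.

```python
import numpy as np

row = np.array([1, 2, 3])        # shape (3,)
col = np.array([[10], [20]])     # shape (2, 1)

plus_scalar = row + 5            # scalar stretched across the array: [6 7 8]
grid = col + row                 # (2, 1) and (3,) broadcast to shape (2, 3)

print(plus_scalar)
print(grid)
print(grid.shape)                # (2, 3)
```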

MCQ 11

What does df.groupby('Department')['Salary'].mean() return?

A. The overall mean salary
B. The mean salary for each department
C. A list of all salaries
D. The Department column

Answer: B
B is correct. groupby('Department') groups the rows by unique values in the Department column. ['Salary'].mean() then computes the average salary within each group. The result is a Series indexed by department names with mean salaries as values.

MCQ 12

What is the shape of np.array([[1, 2, 3], [4, 5, 6]]).T?

A. (2, 3)
B. (3, 2)
C. (6,)
D. (1, 6)

Answer: B
B is correct. The original array has shape (2, 3) -- 2 rows and 3 columns. Transposing (.T) swaps rows and columns, giving shape (3, 2) -- 3 rows and 2 columns. Transpose is essential in ML for operations like computing the normal equation in linear regression: (X.T @ X)^(-1) @ X.T @ y.
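
To make the normal equation concrete, here is a minimal sketch on made-up data that follows y = 1 + 2x exactly, so the recovered coefficients should be close to [1, 2]. The data and the names X, y, and theta are assumptions for illustration.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 1 + 2*x
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # bias column plus one feature: shape (4, 2)
y = 1 + 2 * x

# Normal equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

print(np.round(theta, 2))   # intercept and slope, approximately [1. 2.]
```

Inverting X.T @ X directly is fine for a small demonstration; for real data, np.linalg.lstsq is generally the more numerically stable choice.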

MCQ 13

What does pd.read_csv('data.csv') do?

A. Creates a new CSV file
B. Reads a CSV file into a Pandas DataFrame
C. Reads a CSV file into a NumPy array
D. Opens a CSV file in a text editor

Answer: B
B is correct. pd.read_csv() reads a CSV (Comma Separated Values) file and returns a Pandas DataFrame. This is the most common way to load data in ML projects. Pandas also supports read_excel(), read_json(), and read_sql() for other formats.
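
pd.read_csv() also accepts a file-like object, which makes it easy to try without creating a file on disk. The sketch below feeds it a made-up two-row CSV string through io.StringIO.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = "Name,Marks\nAarav,85\nPriya,92\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.shape)   # (2, 2)
```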

MCQ 14

What is the output of np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))?

A. [4, 10, 18]
B. 32
C. [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
D. Error

Answer: B
B is correct. The dot product of two 1D arrays is a scalar: (1*4) + (2*5) + (3*6) = 4 + 10 + 18 = 32. Option A would be element-wise multiplication (arr1 * arr2). In ML, the dot product is often used as an unnormalized measure of similarity between two vectors.
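
Options A, B, and C correspond to three different NumPy operations on the same pair of vectors. A minimal sketch with illustrative variable names:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

elementwise = a * b        # option A: [4 10 18]
dot = np.dot(a, b)         # option B: 32, the sum of the element-wise products
outer = np.outer(a, b)     # option C: 3x3 matrix with outer[i, j] = a[i] * b[j]

print(elementwise)
print(dot)
print(outer)
```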

MCQ 15

Which Matplotlib function creates a scatter plot?

A. plt.plot()
B. plt.bar()
C. plt.scatter()
D. plt.hist()

Answer: C
C is correct. plt.scatter(x, y) creates a scatter plot showing individual data points. plt.plot() creates line charts. plt.bar() creates bar charts. plt.hist() creates histograms. Scatter plots are essential in ML for visualizing relationships between features.

Coding Challenges

Challenge 1: NumPy Statistics Calculator

Easy

Create a NumPy array of exam marks: [72, 85, 90, 65, 78, 92, 55, 88, 76, 81]. Calculate and print: count, sum, mean, median, standard deviation, minimum, maximum, and range (max - min).

Sample Input

marks = [72, 85, 90, 65, 78, 92, 55, 88, 76, 81]

Sample Output

Count: 10
Sum: 782
Mean: 78.2
Median: 79.5
Std: 11.12
Min: 55
Max: 92
Range: 37

Use NumPy functions only. Do not use Python built-in functions.

import numpy as np

marks = np.array([72, 85, 90, 65, 78, 92, 55, 88, 76, 81])

print(f"Count: {marks.size}")
print(f"Sum: {np.sum(marks)}")
print(f"Mean: {np.mean(marks)}")
print(f"Median: {np.median(marks)}")
print(f"Std: {np.std(marks):.2f}")
print(f"Min: {np.min(marks)}")
print(f"Max: {np.max(marks)}")
print(f"Range: {np.max(marks) - np.min(marks)}")

Challenge 2: Pandas Student Report Card Generator

Medium

Create a DataFrame with 6 students, their marks in Math, Science, and English. Add columns for Total, Average, and Grade (A: avg >= 85, B: >= 70, C: >= 55, F: below 55). Sort by Average descending and print the report.

Sample Input

Students: Aarav, Priya, Rohan, Ananya, Vikram, Meera

Sample Output

Complete report card sorted by average marks

Use Pandas operations. No manual calculations.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram', 'Meera'],
    'Math': [85, 92, 55, 88, 45, 95],
    'Science': [90, 88, 60, 95, 50, 91],
    'English': [78, 95, 52, 82, 48, 88]
})

subjects = ['Math', 'Science', 'English']
df['Total'] = df[subjects].sum(axis=1)
df['Average'] = (df['Total'] / 3).round(1)

def grade(avg):
    if avg >= 85: return 'A'
    if avg >= 70: return 'B'
    if avg >= 55: return 'C'
    return 'F'

df['Grade'] = df['Average'].apply(grade)
df = df.sort_values('Average', ascending=False)
print(df.to_string(index=False))

Challenge 3: Missing Data Handler

Medium

Create a DataFrame with intentional missing values (use np.nan). Write a function that: (1) reports missing values per column, (2) fills numerical columns with median, (3) fills categorical columns with mode, (4) verifies no missing values remain.

Sample Input

DataFrame with Name, Age (2 NaN), Salary (1 NaN), City (2 None)

Sample Output

Missing values report, filled DataFrame, verification

Handle numerical and categorical columns differently.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram'],
    'Age': [25, np.nan, 30, np.nan, 28],
    'Salary': [50000, 60000, np.nan, 70000, 55000],
    'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})

print('Before cleaning:')
print(df)
print(f'\nMissing values:\n{df.isnull().sum()}')

for col in df.select_dtypes(include='number').columns:
    df[col] = df[col].fillna(df[col].median())

for col in df.select_dtypes(exclude='number').columns:
    df[col] = df[col].fillna(df[col].mode()[0])

print('\nAfter cleaning:')
print(df)
print(f'\nMissing values remaining: {df.isnull().sum().sum()}')
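
The challenge asks for a function, so the same steps can also be wrapped into one. The sketch below is one possible shape, not the only one: the name clean_missing is hypothetical, it returns a cleaned copy rather than modifying its input, and it assumes every non-numeric column has at least one non-missing value (otherwise mode()[0] would fail).

```python
import numpy as np
import pandas as pd

def clean_missing(df):
    """Report, fill, and verify missing values; returns a cleaned copy."""
    cleaned = df.copy()

    # (1) Report missing values per column
    print("Missing values per column:")
    print(cleaned.isnull().sum())

    for col in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            # (2) Numerical column: fill with the median
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        else:
            # (3) Categorical column: fill with the most frequent value
            cleaned[col] = cleaned[col].fillna(cleaned[col].mode()[0])

    # (4) Verify that nothing is missing any more
    remaining = int(cleaned.isnull().sum().sum())
    print(f"Missing values remaining: {remaining}")
    return cleaned

sample = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram'],
    'Age': [25, np.nan, 30, np.nan, 28],
    'Salary': [50000, 60000, np.nan, 70000, 55000],
    'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})

result = clean_missing(sample)
print(result)
```

Returning a copy keeps the original DataFrame available for a before/after comparison.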

Challenge 4: Data Visualization Dashboard

Hard

Generate synthetic data for 200 students with: hours_studied (1-10), attendance_pct (40-100), and marks (correlated with hours). Create a 2x2 dashboard with: (1) histogram of marks, (2) scatter plot of hours vs marks, (3) bar chart of average marks by attendance category (Low/Medium/High), (4) box plot of marks. Save as 'dashboard.png'.

Sample Input

np.random.seed(42), 200 synthetic student records

Sample Output

dashboard.png saved with 4 subplots

Use NumPy for data generation, Pandas for manipulation, Matplotlib for plotting.

import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

np.random.seed(42)
n = 200
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(40, 100, n)
marks = 30 + 5 * hours + np.random.normal(0, 8, n)
marks = np.clip(marks, 0, 100)

df = pd.DataFrame({'Hours': hours.round(1), 'Attendance': attendance.round(1), 'Marks': marks.round(1)})
df['Att_Category'] = pd.cut(df['Attendance'], bins=[0, 60, 80, 100], labels=['Low', 'Medium', 'High'])

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Student Performance Dashboard', fontsize=16)

axes[0][0].hist(df['Marks'], bins=20, color='#a855f7', edgecolor='black')
axes[0][0].set_title('Marks Distribution')
axes[0][0].set_xlabel('Marks')

axes[0][1].scatter(df['Hours'], df['Marks'], alpha=0.5, c='#06b6d4', s=30)
axes[0][1].set_title('Study Hours vs Marks')
axes[0][1].set_xlabel('Hours Studied')
axes[0][1].set_ylabel('Marks')

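# observed=False keeps all three categories, so the bars line up with the three colors
avg_by_att = df.groupby('Att_Category', observed=False)['Marks'].mean()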
axes[1][0].bar(avg_by_att.index.astype(str), avg_by_att.values, color=['#ef4444', '#f59e0b', '#22c55e'])
axes[1][0].set_title('Avg Marks by Attendance')

axes[1][1].boxplot(df['Marks'])
axes[1][1].set_title('Marks Box Plot')

plt.tight_layout()
plt.savefig('dashboard.png', dpi=100)
print('Dashboard saved as dashboard.png')

Challenge 5: Matrix Operations for ML

Hard

Implement the following using NumPy: (1) Create a 3x3 matrix A and a 3x1 vector b. (2) Compute A transposed. (3) Compute A @ b (matrix-vector multiplication). (4) Compute the inverse of A using np.linalg.inv(). (5) Verify that A @ A_inv equals the identity matrix (use np.allclose).

Sample Input

A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]], b = [[1], [2], [3]]

Sample Output

Transpose, product, inverse, and identity verification

Use NumPy's linalg module. Round results to 2 decimal places.

import numpy as np

A = np.array([[2, 1, 0], [1, 3, 1], [0, 1, 2]])
b = np.array([[1], [2], [3]])

print('Matrix A:\n', A)
print('\nA transposed:\n', A.T)
print('\nA @ b:\n', A @ b)

A_inv = np.linalg.inv(A)
print('\nA inverse:\n', np.round(A_inv, 2))

identity = A @ A_inv
print('\nA @ A_inv:\n', np.round(identity, 2))
print('\nIs identity matrix?', np.allclose(identity, np.eye(3)))

Challenge 6: Complete Data Analysis: CSV-like Data Pipeline

Hard

Simulate reading a dataset: create a DataFrame with 50 employee records (Name, Department, Salary with some NaN, Experience). Perform a complete analysis: (1) show shape and info, (2) handle missing salaries with department-wise median, (3) add 'Salary_Level' column (Low/Medium/High), (4) groupby department and show average salary and count, (5) find the top 3 highest-paid employees.

Sample Input

50 synthetic employee records with some missing salaries

Sample Output

Complete analysis report with cleaned data and insights

Use Pandas for all operations. Make the output readable.

import pandas as pd
import numpy as np

np.random.seed(42)
n = 50
depts = np.random.choice(['Engineering', 'Marketing', 'Sales', 'HR'], n)
salaries = np.where(depts == 'Engineering', np.random.normal(90000, 15000, n),
           np.where(depts == 'Marketing', np.random.normal(70000, 10000, n),
           np.where(depts == 'Sales', np.random.normal(60000, 12000, n),
           np.random.normal(55000, 8000, n))))

df = pd.DataFrame({
    'Name': [f'Employee_{i}' for i in range(1, n+1)],
    'Department': depts,
    'Salary': salaries.round(0),
    'Experience': np.random.randint(1, 20, n)
})
df.loc[np.random.choice(n, 8, replace=False), 'Salary'] = np.nan

print(f'Shape: {df.shape}')
df.info()
print(f'Missing salaries: {df["Salary"].isnull().sum()}')

df['Salary'] = df.groupby('Department')['Salary'].transform(lambda x: x.fillna(x.median()))
print(f'Missing after fill: {df["Salary"].isnull().sum()}')

df['Salary_Level'] = pd.cut(df['Salary'], bins=[0, 55000, 80000, float('inf')], labels=['Low', 'Medium', 'High'])

print('\nDepartment Summary:')
print(df.groupby('Department').agg(
    Avg_Salary=('Salary', 'mean'),
    Count=('Salary', 'count')
).round(0))

print('\nTop 3 Highest Paid:')
print(df.nlargest(3, 'Salary')[['Name', 'Department', 'Salary']].to_string(index=False))
