Chapter 2 Beginner 62 Questions

Practice Questions — Python for AI - NumPy, Pandas, and Matplotlib

10 Easy
13 Medium
9 Hard

Topic-Specific Questions

Question 1
Easy
What is the output?
import numpy as np
arr = np.array([1, 2, 3]) * 2
print(arr)
NumPy performs element-wise multiplication.
[2 4 6]
Question 2
Easy
What is the output?
import numpy as np
print(np.zeros(3))
print(np.ones((2, 2)))
zeros creates an array of zeros, ones creates an array of ones.
[0. 0. 0.]
[[1. 1.]
 [1. 1.]]
Question 3
Easy
What is the output?
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])
NumPy slicing works like Python list slicing: start (inclusive) to end (exclusive).
[20 30 40]
Question 4
Easy
Write code to create a NumPy array of numbers from 0 to 9 and compute the sum, mean, and standard deviation.
Use np.arange(10) to create the array, then np.sum(), np.mean(), np.std().
import numpy as np

arr = np.arange(10)
print(f"Array: {arr}")
print(f"Sum: {np.sum(arr)}")
print(f"Mean: {np.mean(arr)}")
print(f"Std: {np.std(arr):.2f}")
Output: Sum: 45, Mean: 4.5, Std: 2.87
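By default, np.std divides by n, which gives the population standard deviation (ddof=0). For the sample standard deviation, which divides by n - 1, pass ddof=1. This matters because Pandas' Series.std() defaults to ddof=1, so the two libraries disagree unless you set it explicitly. Here is a short comparison on the same array:

```python
import numpy as np

arr = np.arange(10)

pop_std = np.std(arr)             # ddof=0 (NumPy default): divides by n
sample_std = np.std(arr, ddof=1)  # ddof=1: divides by n - 1

print(f"Population std: {pop_std:.2f}")  # Population std: 2.87
print(f"Sample std: {sample_std:.2f}")   # Sample std: 3.03
```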
Question 5
Medium
What is the output?
import numpy as np
arr = np.array([10, 25, 5, 40, 15])
print(arr[arr > 12])
This is boolean indexing -- it returns elements where the condition is True.
[25 40 15]
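Splitting the one-liner into two steps makes boolean indexing easier to see. The comparison first builds a True/False array of the same length, and indexing with that array keeps only the True positions:

```python
import numpy as np

arr = np.array([10, 25, 5, 40, 15])

mask = arr > 12    # element-wise comparison -> boolean array
print(mask)        # [False  True False  True  True]
print(arr[mask])   # [25 40 15]
print(mask.sum())  # 3 (True counts as 1, so this counts the matches)
```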
Question 6
Medium
What is the output?
import numpy as np
arr = np.arange(6).reshape(2, 3)
print(arr)
print(arr.shape)
reshape(2, 3) turns a 1D array of 6 elements into a 2x3 matrix.
[[0 1 2]
 [3 4 5]]
(2, 3)
Question 7
Medium
Write code to compute the dot product of vectors [1, 2, 3] and [4, 5, 6] using NumPy. What is the mathematical calculation?
Dot product = (1*4) + (2*5) + (3*6). Use np.dot().
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.dot(a, b)
print(f"Dot product: {result}")
print(f"Calculation: (1*4) + (2*5) + (3*6) = {1*4 + 2*5 + 3*6}")
Output: Dot product: 32
Question 8
Easy
Write Pandas code to create a DataFrame with 3 students (Name, Marks) and print only the student with the highest marks.
Use df[df['Marks'] == df['Marks'].max()] or df.loc[df['Marks'].idxmax()].
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan'],
    'Marks': [85, 92, 78]
})

top_student = df.loc[df['Marks'].idxmax()]
print(f"Top student: {top_student['Name']} with {top_student['Marks']} marks")
Output: Top student: Priya with 92 marks
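The two approaches in the hint behave differently when marks are tied. idxmax() returns only the first row that holds the maximum, while the boolean filter returns every tied row. The data below is hypothetical and contains a deliberate tie:

```python
import pandas as pd

# Hypothetical data with a tie for the highest mark
df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan'],
    'Marks': [92, 92, 78]
})

first_top = df.loc[df['Marks'].idxmax(), 'Name']                 # first row with the max only
all_top = df[df['Marks'] == df['Marks'].max()]['Name'].tolist()  # every row with the max

print(first_top)  # Aarav
print(all_top)    # ['Aarav', 'Priya']
```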
Question 9
Medium
Write Pandas code to filter students with marks greater than 80 from this DataFrame: Name=['Aarav','Priya','Rohan','Ananya'], Marks=[85, 42, 91, 38].
Use df[df['Marks'] > 80].
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Marks': [85, 42, 91, 38]
})

high_scorers = df[df['Marks'] > 80]
print(high_scorers)
Output:
    Name  Marks
0  Aarav     85
2  Rohan     91
Question 10
Medium
Write code to find and fill missing values in a Pandas DataFrame. Create a DataFrame where the 'Salary' column has two NaN values.
Use np.nan to create missing values, isnull().sum() to detect, and fillna() to fill.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Employee': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Salary': [50000, np.nan, 60000, np.nan]
})

print("Before:")
print(df)
print(f"Missing values: {df['Salary'].isnull().sum()}")

df['Salary'] = df['Salary'].fillna(df['Salary'].median())
print("\nAfter filling with median:")
print(df)
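Filling is not the only option. When losing rows is acceptable, dropna() removes them instead. The sketch below uses the same data; it keeps only observed values, but here it halves the dataset, which is one reason filling is often preferred for small datasets:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Employee': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Salary': [50000, np.nan, 60000, np.nan]
})

# Drop every row whose Salary is missing (other columns are not checked)
dropped = df.dropna(subset=['Salary'])

print(f"Rows before: {len(df)}, after dropna: {len(dropped)}")  # Rows before: 4, after dropna: 2
print(dropped['Employee'].tolist())                            # ['Aarav', 'Rohan']
```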
Question 11
Medium
What is the output?
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.iloc[0:2])
iloc uses integer position indexing. 0:2 means positions 0 and 1.
   A  B
0  1  4
1  2  5
Question 12
Hard
Write code to group students by their department and calculate the average marks for each department.
Use df.groupby('Department')['Marks'].mean().
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram', 'Meera'],
    'Department': ['CSE', 'ECE', 'CSE', 'IT', 'ECE', 'CSE'],
    'Marks': [85, 92, 78, 88, 65, 95]
})

avg_by_dept = df.groupby('Department')['Marks'].mean().round(1)
print("Average marks by department:")
print(avg_by_dept)
Output: CSE 86.0, ECE 78.5, IT 88.0
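groupby follows a split-apply-combine pattern. Writing the same computation as an explicit loop over the groups shows each step. This is for illustration only; the one-line groupby is the idiomatic form:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram', 'Meera'],
    'Department': ['CSE', 'ECE', 'CSE', 'IT', 'ECE', 'CSE'],
    'Marks': [85, 92, 78, 88, 65, 95]
})

manual = {}
# Split: iterating a GroupBy yields (group key, sub-DataFrame) pairs, keys sorted
for dept, group in df.groupby('Department'):
    # Apply: average the Marks of this sub-DataFrame
    manual[dept] = round(float(group['Marks'].mean()), 1)

# Combine: the built-in version gathers the same numbers into one Series
builtin = df.groupby('Department')['Marks'].mean().round(1).to_dict()

print(manual)  # {'CSE': 86.0, 'ECE': 78.5, 'IT': 88.0}
```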
Question 13
Hard
What is the output?
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)
Matrix multiplication: row of A dot column of B.
[[19 22]
 [43 50]]
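A common mistake is using * when matrix multiplication was intended. On 2D arrays, * multiplies matching positions, while @ takes row-by-column dot products. The comparison below uses the same matrices and checks one entry by hand:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B  # multiplies matching positions
matmul = A @ B       # row of A dotted with column of B

print(elementwise)   # [[ 5 12]
                     #  [21 32]]
# Entry (0, 0) of A @ B: row 0 of A dotted with column 0 of B = 1*5 + 2*7
print(A[0] @ B[:, 0])  # 19
```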
Question 14
Hard
Write NumPy code to generate 1000 random numbers from a normal distribution with mean=50 and std=10, then find what percentage of values fall between 40 and 60.
Use np.random.normal(). Count values between 40 and 60 using boolean indexing.
import numpy as np
np.random.seed(42)

data = np.random.normal(loc=50, scale=10, size=1000)
in_range = np.sum((data >= 40) & (data <= 60))
pct = in_range / len(data) * 100
print(f"Values between 40-60: {in_range} out of 1000")
print(f"Percentage: {pct:.1f}%")
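For a normal distribution with mean 50 and std 10, the range 40 to 60 is one standard deviation on either side of the mean. The simulated percentage should therefore land near the theoretical value of about 68%. The sketch below computes that value with only the standard library's math.erf. The helper name phi is our own:

```python
import math

mean, std = 50, 10
low, high = 40, 60

# Standardize the bounds: z = (x - mean) / std
z_low = (low - mean) / std    # -1.0
z_high = (high - mean) / std  #  1.0

def phi(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

expected_pct = (phi(z_high) - phi(z_low)) * 100
print(f"Expected percentage: {expected_pct:.1f}%")  # Expected percentage: 68.3%
```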
Question 15
Easy
What is the output?
import numpy as np
arr = np.array([5, 10, 15, 20])
print(np.sum(arr))
print(np.mean(arr))
Sum adds all elements. Mean is sum divided by count.
50
12.5
Question 16
Medium
Write Pandas code to add a new column 'Grade' to a DataFrame based on marks: A if marks >= 90, B if >= 75, C if >= 60, else F.
Use df['Marks'].apply() with a function (a named function or a lambda) that contains the conditional logic.
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Marks': [95, 78, 62, 45]
})

def assign_grade(marks):
    if marks >= 90: return 'A'
    elif marks >= 75: return 'B'
    elif marks >= 60: return 'C'
    else: return 'F'

df['Grade'] = df['Marks'].apply(assign_grade)
print(df)
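apply() calls a Python function once per row. A vectorized alternative is pd.cut with explicit bin edges. In the sketch below, right=False makes each bin include its left edge and exclude its right edge, which matches the >= thresholds in the question (90 is an A, 89 is a B):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
    'Marks': [95, 78, 62, 45]
})

# Bins: [-inf, 60) -> F, [60, 75) -> C, [75, 90) -> B, [90, inf) -> A
df['Grade'] = pd.cut(
    df['Marks'],
    bins=[-np.inf, 60, 75, 90, np.inf],
    labels=['F', 'C', 'B', 'A'],
    right=False,
)
print(df['Grade'].tolist())  # ['A', 'B', 'C', 'F']
```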
Question 17
Hard
What is the output?
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr, axis=0))
print(np.sum(arr, axis=1))
axis=0 collapses the rows, summing down each column. axis=1 collapses the columns, summing across each row.
[5 7 9]
[ 6 15]
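A useful way to remember the axis argument is that it names the dimension that disappears. Checking the result shapes makes this concrete:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

col_sums = arr.sum(axis=0)  # axis 0 (rows) disappears -> one value per column
row_sums = arr.sum(axis=1)  # axis 1 (columns) disappears -> one value per row

print(col_sums.shape, row_sums.shape)  # (3,) (2,)
print(arr.mean(axis=0))                # [2.5 3.5 4.5]
```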
Question 18
Hard
Write code to merge two DataFrames: one with student names and IDs, another with IDs and marks. Join them on the 'ID' column.
Use pd.merge(df1, df2, on='ID').
import pandas as pd

students = pd.DataFrame({
    'ID': [101, 102, 103],
    'Name': ['Aarav', 'Priya', 'Rohan']
})

marks = pd.DataFrame({
    'ID': [101, 102, 103],
    'Marks': [85, 92, 78]
})

result = pd.merge(students, marks, on='ID')
print(result)
Output:
    ID   Name  Marks
0  101  Aarav     85
1  102  Priya     92
2  103  Rohan     78
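pd.merge defaults to how='inner', which keeps only IDs present in both tables. The sketch below uses hypothetical tables where the IDs do not fully match, to show what an inner join drops and what how='left' keeps:

```python
import pandas as pd

students = pd.DataFrame({
    'ID': [101, 102, 103],
    'Name': ['Aarav', 'Priya', 'Rohan']
})
# Hypothetical marks table: 103 has no marks, 104 has no student record
marks = pd.DataFrame({
    'ID': [101, 102, 104],
    'Marks': [85, 92, 70]
})

inner = pd.merge(students, marks, on='ID')             # only IDs found in both
left = pd.merge(students, marks, on='ID', how='left')  # every student; no match -> NaN

print(len(inner), len(left))  # 2 3
print(left)                   # Marks becomes float because it now holds NaN
```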
Question 19
Medium
Write Matplotlib code to create a bar chart showing marks of 4 students with student names on x-axis, marks on y-axis, a title, and labels.
Use plt.bar(names, marks), plt.title(), plt.xlabel(), plt.ylabel().
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

names = ['Aarav', 'Priya', 'Rohan', 'Ananya']
marks = [85, 92, 78, 88]

plt.figure(figsize=(8, 5))
plt.bar(names, marks, color=['#a855f7', '#06b6d4', '#f59e0b', '#22c55e'])
plt.title('Student Marks Comparison')
plt.xlabel('Students')
plt.ylabel('Marks')
plt.savefig('bar_chart.png')
print("Bar chart saved")
Question 20
Easy
What is the output?
import pandas as pd
df = pd.DataFrame({'Name': ['Aarav', 'Priya', 'Rohan'], 'Age': [20, 22, 21]})
print(df.shape)
shape returns (rows, columns).
(3, 2)

Mixed & Application Questions

Question 1
Easy
What is the output?
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr + arr)
Adding two arrays performs element-wise addition.
[2 4 6 8]
Question 2
Easy
Write code to create a Pandas Series from a list [10, 20, 30, 40] with custom index ['a', 'b', 'c', 'd'] and print the element at index 'c'.
Use pd.Series(data, index=custom_index).
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s)
print(f"\nElement at 'c': {s['c']}")
Output: Element at 'c': 30
Question 3
Medium
What is the output?
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9])
print(np.max(arr))
print(np.argmax(arr))
max returns the maximum value, argmax returns the INDEX of the maximum.
9
5
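When the extreme value appears more than once, argmax and argmin return the index of the first occurrence. The array here contains 1 twice, so argmin is a good example. To get every position, compare against the extreme value instead:

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9])

print(np.min(arr))     # 1
print(np.argmin(arr))  # 1 (the first 1; another 1 sits at index 3)
print(np.flatnonzero(arr == arr.min()))  # [1 3]
```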
Question 4
Medium
What is the output?
import pandas as pd
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})
print(df['X'].sum())
print(df.sum())
Sum on a column gives a scalar. Sum on a DataFrame gives column-wise sums.
6
X     6
Y    15
dtype: int64
Question 5
Medium
Write code to create a 3x3 identity matrix using NumPy and verify that multiplying it with any vector gives the same vector.
Use np.eye(3) for the identity matrix. Multiply with np.dot() or the @ operator.
import numpy as np

I = np.eye(3)
print("Identity matrix:\n", I)

v = np.array([5, 10, 15])
result = I @ v
print(f"\nI @ {v} = {result}")
print(f"Same as original? {np.array_equal(v, result)}")
Question 6
Medium
Write Pandas code to build a DataFrame with some missing values and create a summary showing: total rows, missing values per column, and percentage missing per column.
Use isnull().sum() to count missing values, then divide by len(df) and multiply by 100.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Age': [25, np.nan, 30, np.nan, 28],
    'Salary': [50000, 60000, np.nan, 70000, 55000],
    'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})

print(f"Total rows: {len(df)}")
print(f"\nMissing values:")
for col in df.columns:
    missing = df[col].isnull().sum()
    pct = missing / len(df) * 100
    print(f"  {col}: {missing} ({pct:.0f}%)")
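The same summary can be built without a loop. Because True counts as 1, the mean() of a boolean column is the fraction of missing values. A sketch that collects both numbers into one small DataFrame:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Age': [25, np.nan, 30, np.nan, 28],
    'Salary': [50000, 60000, np.nan, 70000, 55000],
    'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})

# isnull() -> booleans; sum() counts True, mean() gives the fraction that is True
summary = pd.DataFrame({
    'Missing': df.isnull().sum(),
    'Percent': df.isnull().mean() * 100,
})
print(f"Total rows: {len(df)}")
print(summary)
```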
Question 7
Hard
What is the output?
import numpy as np
arr = np.array([[10, 20, 30], [40, 50, 60]])
print(arr.T)
print(arr.T.shape)
T transposes the matrix (swaps rows and columns).
[[10 40]
 [20 50]
 [30 60]]
(3, 2)
Question 8
Hard
Write a complete mini analysis: create a DataFrame with 5 students and their marks in 3 subjects, compute total and average marks, sort by average descending, and display the result.
Use df['Total'] = df[['Math', 'Science', 'English']].sum(axis=1) for row-wise sum.
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram'],
    'Math': [85, 92, 78, 88, 65],
    'Science': [90, 88, 72, 95, 70],
    'English': [78, 95, 80, 82, 75]
})

df['Total'] = df[['Math', 'Science', 'English']].sum(axis=1)
df['Average'] = df['Total'] / 3
df = df.sort_values('Average', ascending=False)
print(df.round(1).to_string(index=False))
Question 9
Easy
Write NumPy code to create a random array of 5 integers between 1 and 100 and print the minimum and maximum values.
Use np.random.randint(1, 101, size=5). The upper bound is exclusive, so 101 is needed to include 100.
import numpy as np
np.random.seed(42)

arr = np.random.randint(1, 101, size=5)
print(f"Array: {arr}")
print(f"Min: {np.min(arr)}")
print(f"Max: {np.max(arr)}")
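For new code, NumPy's documentation recommends the Generator API (np.random.default_rng) over the legacy np.random.seed functions. Below is a sketch of the same task. The actual numbers will differ from the seeded randint version because the two use different underlying generators:

```python
import numpy as np

rng = np.random.default_rng(42)     # independent, seeded Generator
arr = rng.integers(1, 101, size=5)  # high is exclusive by default -> values 1..100

print(f"Array: {arr}")
print(f"Min: {arr.min()}, Max: {arr.max()}")
```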
Question 10
Hard
What is the output?
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [10, 20, 30, 40]})
result = df.groupby('A')['B'].agg(['sum', 'mean', 'count'])
print(result)
GroupBy groups rows by column A, then applies sum, mean, and count to column B.
   sum  mean  count
A
1   30  15.0      2
2   70  35.0      2
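If you want readable column names instead of 'sum', 'mean', and 'count', Pandas supports named aggregation, written as output_name=(input_column, function). The names total, average, and n below are arbitrary choices:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [10, 20, 30, 40]})

# Named aggregation: new_column=(source_column, aggregation)
result = df.groupby('A').agg(
    total=('B', 'sum'),
    average=('B', 'mean'),
    n=('B', 'count'),
)
print(result)
```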
Question 11
Hard
Write code to normalize an array of marks to the range [0, 1] using Min-Max scaling: (x - min) / (max - min).
Find min and max of the array, then apply the formula element-wise.
import numpy as np

marks = np.array([30, 45, 78, 92, 55, 67, 88, 41])
min_val = np.min(marks)
max_val = np.max(marks)
normalized = (marks - min_val) / (max_val - min_val)

print(f"Original marks: {marks}")
print(f"Min: {min_val}, Max: {max_val}")
print(f"Normalized (0-1): {np.round(normalized, 3)}")
print(f"Min normalized: {normalized.min():.1f}, Max normalized: {normalized.max():.1f}")
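The formula divides by (max - min), which is zero when every value is identical, and NumPy would then produce nan. The helper below (hypothetical name min_max_scale) guards against that case. Mapping a constant array to all zeros is one common choice, not the only one:

```python
import numpy as np

def min_max_scale(x):
    """Hypothetical helper: scale x to [0, 1]; a constant input maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # (x - min) / 0 would give nan, so return zeros instead
        return np.zeros_like(x)
    return (x - x.min()) / span

print(min_max_scale([30, 45, 78, 92]))
print(min_max_scale([50, 50, 50]))  # [0. 0. 0.]
```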
Question 12
Medium
What is the output?
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
print(df.dtypes)
Pandas infers data types automatically: integers become int64 and strings become object. Pandas 3.0 and later may report a dedicated str dtype for the string column instead.
A     int64
B    object
dtype: object

Multiple Choice Questions

MCQ 1
What does np.array([1, 2, 3]) * 3 produce?
  • A. [1, 2, 3, 1, 2, 3, 1, 2, 3]
  • B. [3, 6, 9]
  • C. Error
  • D. [1, 2, 3, 3]
Answer: B
B is correct. NumPy performs element-wise multiplication: each element is multiplied by 3. This is vectorization -- no loops needed. Option A would be the result if this were a Python list ([1,2,3] * 3 repeats the list).
MCQ 2
What does df.head() do in Pandas?
  • A. Displays the column names
  • B. Displays the first 5 rows of the DataFrame
  • C. Displays the data types
  • D. Displays summary statistics
Answer: B
B is correct. df.head() returns the first 5 rows by default. You can pass a number like df.head(10) to see more. df.columns shows column names. df.dtypes shows data types. df.describe() shows statistics.
MCQ 3
Which NumPy function creates an array of evenly spaced values between two endpoints?
  • A. np.arange()
  • B. np.linspace()
  • C. np.zeros()
  • D. np.random.rand()
Answer: B
B is correct. np.linspace(0, 1, 5) creates 5 evenly spaced values between 0 and 1, including both endpoints: [0, 0.25, 0.5, 0.75, 1]. np.arange() uses a step size (not a count) and excludes the stop value. np.zeros() creates an array of zeros. np.random.rand() creates random values.
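Here is a side-by-side comparison of the count-based and step-based approaches. The values are chosen to be exact in floating point, so the printed lists are clean. Note that endpoint=False turns linspace into a count-based version of arange:

```python
import numpy as np

with_end = np.linspace(0, 1, 5)                     # count-based, stop included
without_end = np.linspace(0, 1, 4, endpoint=False)  # count-based, stop excluded
stepped = np.arange(0, 1, 0.25)                     # step-based, stop excluded

print(with_end.tolist())     # [0.0, 0.25, 0.5, 0.75, 1.0]
print(without_end.tolist())  # [0.0, 0.25, 0.5, 0.75]
print(stepped.tolist())      # [0.0, 0.25, 0.5, 0.75]
```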
MCQ 4
What is the output of np.arange(0, 10, 2)?
  • A. [0, 2, 4, 6, 8, 10]
  • B. [0, 2, 4, 6, 8]
  • C. [2, 4, 6, 8, 10]
  • D. [0, 1, 2, 3, 4]
Answer: B
B is correct. np.arange(start, stop, step) creates values from 0 to 10 (exclusive) with step 2. The values are [0, 2, 4, 6, 8]. The stop value (10) is excluded, just like Python's range().
MCQ 5
In Pandas, what is the difference between loc and iloc?
  • A. loc is for rows, iloc is for columns
  • B. loc uses label-based indexing, iloc uses integer position-based indexing
  • C. They are the same thing
  • D. loc is faster than iloc
Answer: B
B is correct. loc selects data by label (e.g., df.loc['row_name', 'col_name']). iloc selects by integer position (e.g., df.iloc[0, 1]). Important: loc slicing is inclusive on both ends, while iloc slicing excludes the end (like Python).
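The inclusive and exclusive slicing rules are easiest to see with a non-integer index. The DataFrame below is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'Marks': [85, 92, 78, 88]}, index=['a', 'b', 'c', 'd'])

by_label = df.loc['a':'c']  # label slice: 'a', 'b' and 'c' (end label included)
by_position = df.iloc[0:2]  # position slice: 0 and 1 (end position excluded)

print(by_label.index.tolist())     # ['a', 'b', 'c']
print(by_position.index.tolist())  # ['a', 'b']
```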
MCQ 6
What does arr.reshape(-1, 1) do to a 1D NumPy array with 5 elements?
  • A. Flattens it to shape (5,)
  • B. Converts it to shape (5, 1) -- a column vector
  • C. Converts it to shape (1, 5) -- a row vector
  • D. Raises an error
Answer: B
B is correct. reshape(-1, 1) creates a 2D array with 1 column. The -1 means 'calculate the number of rows automatically' (5 elements / 1 column = 5 rows). This gives shape (5, 1), which is what scikit-learn expects for single-feature input.
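The -1 placeholder works in either position, and it only succeeds when the element count divides evenly. A short check of both orientations and of a shape that cannot work:

```python
import numpy as np

arr = np.arange(5)        # shape (5,)
col = arr.reshape(-1, 1)  # -1 inferred as 5 -> shape (5, 1), column vector
row = arr.reshape(1, -1)  # -1 inferred as 5 -> shape (1, 5), row vector

print(arr.shape, col.shape, row.shape)  # (5,) (5, 1) (1, 5)

try:
    arr.reshape(2, -1)  # 5 elements cannot be split into 2 equal rows
except ValueError as err:
    print("ValueError:", err)
```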
MCQ 7
What does df.isnull().sum() return for a Pandas DataFrame?
  • A. Total number of missing values in the entire DataFrame
  • B. Number of missing values in each column
  • C. True or False for each cell
  • D. Number of non-null values in each column
Answer: B
B is correct. df.isnull() creates a boolean DataFrame (True where values are missing). .sum() sums each column (True counts as 1, False as 0), giving the count of missing values per column. To get the total, use df.isnull().sum().sum().
MCQ 8
What is the correct way to filter a Pandas DataFrame for rows where age > 20 AND marks > 80?
  • A. df[df['age'] > 20 and df['marks'] > 80]
  • B. df[(df['age'] > 20) & (df['marks'] > 80)]
  • C. df[df['age'] > 20 & df['marks'] > 80]
  • D. df.filter(age > 20, marks > 80)
Answer: B
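B is correct. In Pandas, you must use & (not and) for element-wise AND, and each condition must be wrapped in parentheses. Option A uses Python's and, which tries to reduce a whole Series to a single True/False and raises a ValueError. Option C fails because & binds more tightly than >, so Python evaluates 20 & df['marks'] first, and the resulting chained comparison also raises a ValueError.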
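Besides &, Pandas uses | for OR and ~ for NOT, each with parenthesized conditions. DataFrame.query() is an alternative that accepts the plain and/or keywords inside a string expression. The data below is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'age': [18, 22, 25, 30], 'marks': [90, 85, 70, 95]})

both = df[(df['age'] > 20) & (df['marks'] > 80)]    # AND
either = df[(df['age'] > 20) | (df['marks'] > 80)]  # OR
not_over_20 = df[~(df['age'] > 20)]                 # NOT

# query() parses the string, so 'and' works here
same_as_both = df.query('age > 20 and marks > 80')

print(both.index.tolist())         # [1, 3]
print(either.index.tolist())       # [0, 1, 2, 3]
print(not_over_20.index.tolist())  # [0]
```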
MCQ 9
What type of plot is best for showing the distribution of a single numerical variable?
  • A. Scatter plot
  • B. Bar chart
  • C. Histogram
  • D. Pie chart
Answer: C
C is correct. Histograms show the frequency distribution of a single variable by dividing values into bins and counting occurrences. Scatter plots show relationships between two variables. Bar charts compare categories. Pie charts show proportions of a whole.
MCQ 10
What is broadcasting in NumPy?
  • A. Sending arrays to multiple processors
  • B. Automatically expanding smaller arrays to match larger arrays for element-wise operations
  • C. Converting arrays to different data types
  • D. Distributing data across multiple machines
Answer: B
B is correct. Broadcasting is NumPy's mechanism for performing operations between arrays of different shapes. When you write np.array([1, 2, 3]) + 5, NumPy broadcasts the scalar 5 to match the array shape, effectively adding [5, 5, 5]. It also works between differently shaped arrays: NumPy compares shapes starting from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1 (a missing dimension counts as 1).
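The compatibility rule in action: a (3, 1) column plus a (3,) row stretches both to (3, 3). The second part shows a common ML use, centering each feature column by subtracting its mean. The small X matrix is hypothetical:

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3])          # shape (3,), treated as (1, 3)

grid = col + row  # both stretched to (3, 3)
print(grid)

# Hypothetical data: 2 samples x 2 features
X = np.array([[1.0, 10.0], [3.0, 30.0]])
X_centered = X - X.mean(axis=0)  # (2, 2) - (2,): the column means apply to every row
print(X_centered.tolist())       # [[-1.0, -10.0], [1.0, 10.0]]
```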
MCQ 11
What does df.groupby('Department')['Salary'].mean() return?
  • A. The overall mean salary
  • B. The mean salary for each department
  • C. A list of all salaries
  • D. The Department column
Answer: B
B is correct. groupby('Department') groups the rows by unique values in the Department column. ['Salary'].mean() then computes the average salary within each group. The result is a Series indexed by department names with mean salaries as values.
MCQ 12
What is the shape of np.array([[1, 2, 3], [4, 5, 6]]).T?
  • A. (2, 3)
  • B. (3, 2)
  • C. (6,)
  • D. (1, 6)
Answer: B
B is correct. The original array has shape (2, 3) -- 2 rows and 3 columns. Transposing (.T) swaps rows and columns, giving shape (3, 2) -- 3 rows and 2 columns. Transpose is essential in ML for operations like computing the normal equation in linear regression: (X.T @ X)^(-1) @ X.T @ y.
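The sketch below applies the normal equation mentioned above to hypothetical points that lie exactly on y = 1 + 2x, so the recovered parameters should be [1, 2]. It also shows np.linalg.lstsq, which solves the same least-squares problem without forming an explicit inverse and is generally the safer choice numerically:

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1 + 2 * x

# Design matrix: a column of ones (intercept) next to x -> shape (4, 2)
X = np.column_stack([np.ones_like(x), x])

# Normal equation written as in the text: (X.T @ X)^(-1) @ X.T @ y
theta = np.linalg.inv(X.T @ X) @ X.T @ y

# Least-squares solver for comparison
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(theta, 6))  # [1. 2.]
print(np.allclose(theta, theta_lstsq))
```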
MCQ 13
What does pd.read_csv('data.csv') do?
  • A. Creates a new CSV file
  • B. Reads a CSV file into a Pandas DataFrame
  • C. Reads a CSV file into a NumPy array
  • D. Opens a CSV file in a text editor
Answer: B
B is correct. pd.read_csv() reads a CSV (Comma Separated Values) file and returns a Pandas DataFrame. This is the most common way to load data in ML projects. Pandas also supports read_excel(), read_json(), and read_sql() for other formats.
MCQ 14
What is the output of np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))?
  • A. [4, 10, 18]
  • B. 32
  • C. [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
  • D. Error
Answer: B
B is correct. The dot product of two 1D arrays is a scalar: (1*4) + (2*5) + (3*6) = 4 + 10 + 18 = 32. Option A is element-wise multiplication (arr1 * arr2), and option C is the outer product (np.outer). Because a · b = |a| |b| cos θ, the dot product grows as two vectors point in more similar directions, which is why it is used as a similarity measure.
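The three array results from the options can be produced directly, which shows where each one comes from. The dot product is simply the sum of the element-wise products:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a * b)           # [ 4 10 18]  element-wise product (option A)
print(np.dot(a, b))    # 32          dot product (option B)
print((a * b).sum())   # 32          same value, spelled out
print(np.outer(a, b))  # 3x3 matrix of every a[i] * b[j] (option C)
```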
MCQ 15
Which Matplotlib function creates a scatter plot?
  • A. plt.plot()
  • B. plt.bar()
  • C. plt.scatter()
  • D. plt.hist()
Answer: C
C is correct. plt.scatter(x, y) creates a scatter plot showing individual data points. plt.plot() creates line charts. plt.bar() creates bar charts. plt.hist() creates histograms. Scatter plots are essential in ML for visualizing relationships between features.

Coding Challenges

Challenge 1: NumPy Statistics Calculator

Easy
Create a NumPy array of exam marks: [72, 85, 90, 65, 78, 92, 55, 88, 76, 81]. Calculate and print: count, sum, mean, median, standard deviation, minimum, maximum, and range (max - min).
Sample Input
marks = [72, 85, 90, 65, 78, 92, 55, 88, 76, 81]
Sample Output
Count: 10
Sum: 782
Mean: 78.2
Median: 79.5
Std: 11.12
Min: 55
Max: 92
Range: 37
Use NumPy functions only. Do not use Python built-in functions.
import numpy as np

marks = np.array([72, 85, 90, 65, 78, 92, 55, 88, 76, 81])

print(f"Count: {marks.size}")  # .size is an array attribute, not the built-in len()
print(f"Sum: {np.sum(marks)}")
print(f"Mean: {np.mean(marks)}")
print(f"Median: {np.median(marks)}")
print(f"Std: {np.std(marks):.2f}")
print(f"Min: {np.min(marks)}")
print(f"Max: {np.max(marks)}")
print(f"Range: {np.max(marks) - np.min(marks)}")

Challenge 2: Pandas Student Report Card Generator

Medium
Create a DataFrame with 6 students, their marks in Math, Science, and English. Add columns for Total, Average, and Grade (A: avg >= 85, B: >= 70, C: >= 55, F: below 55). Sort by Average descending and print the report.
Sample Input
Students: Aarav, Priya, Rohan, Ananya, Vikram, Meera
Sample Output
Complete report card sorted by average marks
Use Pandas operations. No manual calculations.
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram', 'Meera'],
    'Math': [85, 92, 55, 88, 45, 95],
    'Science': [90, 88, 60, 95, 50, 91],
    'English': [78, 95, 52, 82, 48, 88]
})

subjects = ['Math', 'Science', 'English']
df['Total'] = df[subjects].sum(axis=1)
df['Average'] = df[subjects].mean(axis=1).round(1)  # row-wise mean, no hard-coded 3

def grade(avg):
    if avg >= 85: return 'A'
    if avg >= 70: return 'B'
    if avg >= 55: return 'C'
    return 'F'

df['Grade'] = df['Average'].apply(grade)
df = df.sort_values('Average', ascending=False)
print(df.to_string(index=False))

Challenge 3: Missing Data Handler

Medium
Create a DataFrame with intentional missing values (use np.nan). Write a function that: (1) reports missing values per column, (2) fills numerical columns with median, (3) fills categorical columns with mode, (4) verifies no missing values remain.
Sample Input
DataFrame with Name, Age (2 NaN), Salary (1 NaN), City (2 None)
Sample Output
Missing values report, filled DataFrame, verification
Handle numerical and categorical columns differently.
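import pandas as pd
import numpy as np

def handle_missing(df):
    """Report, fill, and verify missing values; returns a cleaned copy."""
    df = df.copy()  # leave the caller's DataFrame untouched

    # (1) Report missing values per column
    print(f'Missing values:\n{df.isnull().sum()}')

    # (2) Numerical columns: fill with the median
    for col in df.select_dtypes(include='number').columns:
        df[col] = df[col].fillna(df[col].median())

    # (3) Categorical (non-numeric) columns: fill with the mode (most frequent value)
    for col in df.select_dtypes(exclude='number').columns:
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # (4) Verify that no missing values remain
    remaining = df.isnull().sum().sum()
    print(f'\nMissing values remaining: {remaining}')
    return df

df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram'],
    'Age': [25, np.nan, 30, np.nan, 28],
    'Salary': [50000, 60000, np.nan, 70000, 55000],
    'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})

print('Before cleaning:')
print(df)
print()
clean = handle_missing(df)

print('\nAfter cleaning:')
print(clean)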

Challenge 4: Data Visualization Dashboard

Hard
Generate synthetic data for 200 students with: hours_studied (1-10), attendance_pct (40-100), and marks (correlated with hours). Create a 2x2 dashboard with: (1) histogram of marks, (2) scatter plot of hours vs marks, (3) bar chart of average marks by attendance category (Low/Medium/High), (4) box plot of marks. Save as 'dashboard.png'.
Sample Input
np.random.seed(42), 200 synthetic student records
Sample Output
dashboard.png saved with 4 subplots
Use NumPy for data generation, Pandas for manipulation, Matplotlib for plotting.
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

np.random.seed(42)
n = 200
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(40, 100, n)
marks = 30 + 5 * hours + np.random.normal(0, 8, n)
marks = np.clip(marks, 0, 100)

df = pd.DataFrame({'Hours': hours.round(1), 'Attendance': attendance.round(1), 'Marks': marks.round(1)})
df['Att_Category'] = pd.cut(df['Attendance'], bins=[0, 60, 80, 100], labels=['Low', 'Medium', 'High'])

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Student Performance Dashboard', fontsize=16)

axes[0][0].hist(df['Marks'], bins=20, color='#a855f7', edgecolor='black')
axes[0][0].set_title('Marks Distribution')
axes[0][0].set_xlabel('Marks')

axes[0][1].scatter(df['Hours'], df['Marks'], alpha=0.5, c='#06b6d4', s=30)
axes[0][1].set_title('Study Hours vs Marks')
axes[0][1].set_xlabel('Hours Studied')
axes[0][1].set_ylabel('Marks')

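# observed=False keeps all three categories and silences the pandas FutureWarning for categorical keys
avg_by_att = df.groupby('Att_Category', observed=False)['Marks'].mean()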
axes[1][0].bar(avg_by_att.index.astype(str), avg_by_att.values, color=['#ef4444', '#f59e0b', '#22c55e'])
axes[1][0].set_title('Avg Marks by Attendance')

axes[1][1].boxplot(df['Marks'])
axes[1][1].set_title('Marks Box Plot')

plt.tight_layout()
plt.savefig('dashboard.png', dpi=100)
print('Dashboard saved as dashboard.png')

Challenge 5: Matrix Operations for ML

Hard
Implement the following using NumPy: (1) Create a 3x3 matrix A and a 3x1 vector b. (2) Compute A transposed. (3) Compute A @ b (matrix-vector multiplication). (4) Compute the inverse of A using np.linalg.inv(). (5) Verify that A @ A_inv equals the identity matrix (use np.allclose).
Sample Input
A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]], b = [[1], [2], [3]]
Sample Output
Transpose, product, inverse, and identity verification
Use NumPy's linalg module. Round results to 2 decimal places.
import numpy as np

A = np.array([[2, 1, 0], [1, 3, 1], [0, 1, 2]])
b = np.array([[1], [2], [3]])

print('Matrix A:\n', A)
print('\nA transposed:\n', A.T)
print('\nA @ b:\n', A @ b)

A_inv = np.linalg.inv(A)
print('\nA inverse:\n', np.round(A_inv, 2))

identity = A @ A_inv
print('\nA @ A_inv:\n', np.round(identity, 2))
print('\nIs identity matrix?', np.allclose(identity, np.eye(3)))
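The challenge asks for the inverse, but for solving A @ x = b itself it is generally better to call np.linalg.solve, which avoids explicitly forming the inverse. A sketch that uses the same A and b and checks that both routes agree:

```python
import numpy as np

A = np.array([[2, 1, 0], [1, 3, 1], [0, 1, 2]])
b = np.array([[1], [2], [3]])

# Solve A @ x = b directly
x = np.linalg.solve(A, b)

# Same answer via the explicit inverse, for comparison only
x_via_inv = np.linalg.inv(A) @ b

print(np.round(x, 2))
print('Solutions agree?', np.allclose(x, x_via_inv))
print('A @ x reproduces b?', np.allclose(A @ x, b))
```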

Challenge 6: Complete Data Analysis: CSV-like Data Pipeline

Hard
Simulate reading a dataset: create a DataFrame with 50 employee records (Name, Department, Salary with some NaN, Experience). Perform a complete analysis: (1) show shape and info, (2) handle missing salaries with department-wise median, (3) add 'Salary_Level' column (Low/Medium/High), (4) groupby department and show average salary and count, (5) find the top 3 highest-paid employees.
Sample Input
50 synthetic employee records with some missing salaries
Sample Output
Complete analysis report with cleaned data and insights
Use Pandas for all operations. Make the output readable.
import pandas as pd
import numpy as np

np.random.seed(42)
n = 50
depts = np.random.choice(['Engineering', 'Marketing', 'Sales', 'HR'], n)
salaries = np.where(depts == 'Engineering', np.random.normal(90000, 15000, n),
           np.where(depts == 'Marketing', np.random.normal(70000, 10000, n),
           np.where(depts == 'Sales', np.random.normal(60000, 12000, n),
           np.random.normal(55000, 8000, n))))

df = pd.DataFrame({
    'Name': [f'Employee_{i}' for i in range(1, n+1)],
    'Department': depts,
    'Salary': salaries.round(0),
    'Experience': np.random.randint(1, 20, n)
})
df.loc[np.random.choice(n, 8, replace=False), 'Salary'] = np.nan

print(f'Shape: {df.shape}')
df.info()  # dtypes and non-null counts per column (prints directly)
print(f'Missing salaries: {df["Salary"].isnull().sum()}')

df['Salary'] = df.groupby('Department')['Salary'].transform(lambda x: x.fillna(x.median()))
print(f'Missing after fill: {df["Salary"].isnull().sum()}')

df['Salary_Level'] = pd.cut(df['Salary'], bins=[0, 55000, 80000, float('inf')], labels=['Low', 'Medium', 'High'])
print('\nSalary levels:')
print(df['Salary_Level'].value_counts())

print('\nDepartment Summary:')
print(df.groupby('Department').agg(
    Avg_Salary=('Salary', 'mean'),
    Count=('Salary', 'count')
).round(0))

print('\nTop 3 Highest Paid:')
print(df.nlargest(3, 'Salary')[['Name', 'Department', 'Salary']].to_string(index=False))
