Practice Questions — Python for AI - NumPy, Pandas, and Matplotlib
Topic-Specific Questions
Question 1
Easy
What is the output?
import numpy as np
arr = np.array([1, 2, 3]) * 2
print(arr)
NumPy performs element-wise multiplication.
Output:
[2 4 6]
Question 2
Easy
What is the output?
import numpy as np
print(np.zeros(3))
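print(np.ones((2, 2)))
zeros creates an array of zeros and ones creates an array of ones. Both return float64 arrays by default, which is why the values print with a trailing decimal point.
Output:
[0. 0. 0.]
[[1. 1.]
 [1. 1.]]
Question 3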
Easy
What is the output?
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[1:4])
NumPy slicing works like Python list slicing: start (inclusive) to end (exclusive).
Output:
[20 30 40]
Question 4
Easy
Write code to create a NumPy array of numbers from 0 to 9 and compute the sum, mean, and standard deviation.
Use np.arange(10) to create the array, then np.sum(), np.mean(), np.std().
import numpy as np
arr = np.arange(10)
print(f"Array: {arr}")
print(f"Sum: {np.sum(arr)}")
print(f"Mean: {np.mean(arr)}")
print(f"Std: {np.std(arr):.2f}")
Output: Sum: 45, Mean: 4.5, Std: 2.87
Question 5
Medium
What is the output?
import numpy as np
arr = np.array([10, 25, 5, 40, 15])
print(arr[arr > 12])
This is boolean indexing: it returns the elements where the condition is True.
Output:
[25 40 15]
Question 6
Medium
What is the output?
import numpy as np
arr = np.arange(6).reshape(2, 3)
print(arr)
print(arr.shape)
reshape(2, 3) turns a 1D array of 6 elements into a 2x3 matrix.
Output:
[[0 1 2]
 [3 4 5]]
(2, 3)
Question 7
Medium
Write code to compute the dot product of vectors [1, 2, 3] and [4, 5, 6] using NumPy. What is the mathematical calculation?
Dot product = (1*4) + (2*5) + (3*6). Use np.dot().
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = np.dot(a, b)
print(f"Dot product: {result}")
print(f"Calculation: (1*4) + (2*5) + (3*6) = {1*4 + 2*5 + 3*6}")
Output: Dot product: 32
Question 8
Easy
Write Pandas code to create a DataFrame with 3 students (Name, Marks) and print only the student with the highest marks.
Use df[df['Marks'] == df['Marks'].max()] or df.loc[df['Marks'].idxmax()].
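The two approaches in the hint behave differently when marks are tied. Boolean filtering keeps every row that reaches the maximum and returns a DataFrame, while idxmax() returns only the label of the first maximum, so .loc gives a single row as a Series. A small sketch, using hypothetical data with an extra student ('Meera') added only to create a tie:

```python
import pandas as pd

# Hypothetical data: Priya and Meera tie for the top mark
df = pd.DataFrame({
    'Name': ['Aarav', 'Priya', 'Rohan', 'Meera'],
    'Marks': [85, 92, 78, 92]
})

# Boolean filtering: every row equal to the maximum (a DataFrame)
all_top = df[df['Marks'] == df['Marks'].max()]
print(all_top['Name'].tolist())  # ['Priya', 'Meera']

# idxmax: label of the FIRST maximum only; .loc returns that row as a Series
first_top = df.loc[df['Marks'].idxmax()]
print(first_top['Name'])  # Priya
```

Use the boolean filter when ties matter; idxmax is convenient when one row is enough.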
import pandas as pd
df = pd.DataFrame({
'Name': ['Aarav', 'Priya', 'Rohan'],
'Marks': [85, 92, 78]
})
top_student = df.loc[df['Marks'].idxmax()]
print(f"Top student: {top_student['Name']} with {top_student['Marks']} marks")
Output: Top student: Priya with 92 marks
Question 9
Medium
Write Pandas code to filter students with marks greater than 80 from this DataFrame: Name=['Aarav','Priya','Rohan','Ananya'], Marks=[85, 42, 91, 38].
Use df[df['Marks'] > 80].
import pandas as pd
df = pd.DataFrame({
'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
'Marks': [85, 42, 91, 38]
})
high_scorers = df[df['Marks'] > 80]
print(high_scorers)
Output:
    Name  Marks
0  Aarav     85
2  Rohan     91
Question 10
Medium
Write code to find and fill missing values in a Pandas DataFrame. Create a DataFrame where the 'Salary' column has two NaN values.
Use np.nan to create missing values, isnull().sum() to detect, and fillna() to fill.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Employee': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
'Salary': [50000, np.nan, 60000, np.nan]
})
print("Before:")
print(df)
print(f"Missing values: {df['Salary'].isnull().sum()}")
df['Salary'] = df['Salary'].fillna(df['Salary'].median())
print("\nAfter filling with median:")
print(df)
Question 11
Medium
What is the output?
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df.iloc[0:2])
iloc uses integer position indexing. 0:2 means positions 0 and 1 (the end is excluded).
Output:
   A  B
0  1  4
1  2  5
Question 12
Hard
Write code to group students by their department and calculate the average marks for each department.
Use df.groupby('Department')['Marks'].mean().
import pandas as pd
df = pd.DataFrame({
'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram', 'Meera'],
'Department': ['CSE', 'ECE', 'CSE', 'IT', 'ECE', 'CSE'],
'Marks': [85, 92, 78, 88, 65, 95]
})
avg_by_dept = df.groupby('Department')['Marks'].mean().round(1)
print("Average marks by department:")
print(avg_by_dept)
Output: CSE 86.0, ECE 78.5, IT 88.0
Question 13
Hard
What is the output?
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A @ B)
Matrix multiplication: each entry is a row of A dotted with a column of B. For example, the top-left entry is 1*5 + 2*7 = 19.
Output:
[[19 22]
 [43 50]]
Question 14
Hard
Write NumPy code to generate 1000 random numbers from a normal distribution with mean=50 and std=10, then find what percentage of values fall between 40 and 60.
Use np.random.normal(). Count values between 40 and 60 using boolean indexing.
import numpy as np
np.random.seed(42)
data = np.random.normal(loc=50, scale=10, size=1000)
in_range = np.sum((data >= 40) & (data <= 60))
pct = in_range / len(data) * 100
print(f"Values between 40-60: {in_range} out of 1000")
print(f"Percentage: {pct:.1f}%")
Question 15
Easy
What is the output?
import numpy as np
arr = np.array([5, 10, 15, 20])
print(np.sum(arr))
print(np.mean(arr))
Sum adds all elements (5 + 10 + 15 + 20 = 50). Mean is the sum divided by the count (50 / 4 = 12.5).
Output:
50
12.5
Question 16
Medium
Write Pandas code to add a new column 'Grade' to a DataFrame based on marks: A if marks >= 90, B if >= 75, C if >= 60, else F.
Use df['Marks'].apply(lambda x: ...) with conditional logic.
import pandas as pd
df = pd.DataFrame({
'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya'],
'Marks': [95, 78, 62, 45]
})
def assign_grade(marks):
    if marks >= 90: return 'A'
    elif marks >= 75: return 'B'
    elif marks >= 60: return 'C'
    else: return 'F'
df['Grade'] = df['Marks'].apply(assign_grade)
print(df)
Question 17
Hard
What is the output?
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(np.sum(arr, axis=0))
print(np.sum(arr, axis=1))
axis=0 collapses the rows, summing down each column (one result per column). axis=1 collapses the columns, summing across each row (one result per row).
Output:
[5 7 9]
[ 6 15]
Question 18
Hard
Write code to merge two DataFrames: one with student names and IDs, another with IDs and marks. Join them on the 'ID' column.
Use pd.merge(df1, df2, on='ID').
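pd.merge performs an inner join by default (how='inner'), so IDs that appear in only one DataFrame are dropped. Passing how='left' keeps every row of the left DataFrame and fills unmatched columns with NaN. A minimal sketch, assuming a hypothetical student (ID 104) who has no marks record:

```python
import pandas as pd

# Hypothetical data: ID 104 has no marks, ID 103 has no student row
students = pd.DataFrame({'ID': [101, 102, 104], 'Name': ['Aarav', 'Priya', 'Ananya']})
marks = pd.DataFrame({'ID': [101, 102, 103], 'Marks': [85, 92, 78]})

inner = pd.merge(students, marks, on='ID')             # default how='inner'
left = pd.merge(students, marks, on='ID', how='left')  # keep all students

print(inner)  # only IDs 101 and 102
print(left)   # ID 104 is kept, with Marks = NaN
```

Checking the row count before and after a merge is a quick way to notice silently dropped keys.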
import pandas as pd
students = pd.DataFrame({
'ID': [101, 102, 103],
'Name': ['Aarav', 'Priya', 'Rohan']
})
marks = pd.DataFrame({
'ID': [101, 102, 103],
'Marks': [85, 92, 78]
})
result = pd.merge(students, marks, on='ID')
print(result)
Output:
    ID   Name  Marks
0  101  Aarav     85
1  102  Priya     92
2  103  Rohan     78
Question 19
Medium
Write Matplotlib code to create a bar chart showing marks of 4 students with student names on x-axis, marks on y-axis, a title, and labels.
Use plt.bar(names, marks), plt.title(), plt.xlabel(), plt.ylabel().
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
names = ['Aarav', 'Priya', 'Rohan', 'Ananya']
marks = [85, 92, 78, 88]
plt.figure(figsize=(8, 5))
plt.bar(names, marks, color=['#a855f7', '#06b6d4', '#f59e0b', '#22c55e'])
plt.title('Student Marks Comparison')
plt.xlabel('Students')
plt.ylabel('Marks')
plt.savefig('bar_chart.png')
print("Bar chart saved")
Question 20
Easy
What is the output?
import pandas as pd
df = pd.DataFrame({'Name': ['Aarav', 'Priya', 'Rohan'], 'Age': [20, 22, 21]})
print(df.shape)
shape returns (rows, columns).
Output:
(3, 2)
Mixed & Application Questions
Question 1
Easy
What is the output?
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr + arr)
Adding two arrays performs element-wise addition.
Output:
[2 4 6 8]
Question 2
Easy
Write code to create a Pandas Series from a list [10, 20, 30, 40] with custom index ['a', 'b', 'c', 'd'] and print the element at index 'c'.
Use pd.Series(data, index=custom_index).
import pandas as pd
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s)
print(f"\nElement at 'c': {s['c']}")
Output: Element at 'c': 30
Question 3
Medium
What is the output?
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9])
print(np.max(arr))
print(np.argmax(arr))
max returns the maximum value; argmax returns the INDEX of the maximum (the 9 sits at index 5).
Output:
9
5
Question 4
Medium
What is the output?
import pandas as pd
df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})
print(df['X'].sum())
print(df.sum())
Sum on a column gives a scalar. Sum on a DataFrame gives the column-wise sums as a Series.
Output:
6
X     6
Y    15
dtype: int64
Question 5
Medium
Write code to create a 3x3 identity matrix using NumPy and verify that multiplying it with any vector gives the same vector.
Use np.eye(3) for the identity matrix. Multiply with np.dot() or the @ operator.
import numpy as np
I = np.eye(3)
print("Identity matrix:\n", I)
v = np.array([5, 10, 15])
result = I @ v
print(f"\nI @ {v} = {result}")
print(f"Same as original? {np.array_equal(v, result)}")
Question 6
Medium
Write Pandas code to build a DataFrame with some missing values and create a summary showing: total rows, missing values per column, and percentage missing per column.
Use isnull().sum() for the counts, then divide by len(df) and multiply by 100 for the percentage.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Age': [25, np.nan, 30, np.nan, 28],
'Salary': [50000, 60000, np.nan, 70000, 55000],
'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})
print(f"Total rows: {len(df)}")
print(f"\nMissing values:")
for col in df.columns:
    missing = df[col].isnull().sum()
    pct = missing / len(df) * 100
    print(f" {col}: {missing} ({pct:.0f}%)")
Question 7
Hard
What is the output?
import numpy as np
arr = np.array([[10, 20, 30], [40, 50, 60]])
print(arr.T)
print(arr.T.shape)
T transposes the matrix (swaps rows and columns).
Output:
[[10 40]
 [20 50]
 [30 60]]
(3, 2)
Question 8
Hard
Write a complete mini analysis: create a DataFrame with 5 students and their marks in 3 subjects, compute total and average marks, sort by average descending, and display the result.
Use df['Total'] = df[['Math', 'Science', 'English']].sum(axis=1) for row-wise sum.
import pandas as pd
df = pd.DataFrame({
'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram'],
'Math': [85, 92, 78, 88, 65],
'Science': [90, 88, 72, 95, 70],
'English': [78, 95, 80, 82, 75]
})
df['Total'] = df[['Math', 'Science', 'English']].sum(axis=1)
df['Average'] = df['Total'] / 3
df = df.sort_values('Average', ascending=False)
print(df.round(1).to_string(index=False))
Question 9
Easy
Write NumPy code to create a random array of 5 integers between 1 and 100 and print the minimum and maximum values.
Use np.random.randint(1, 101, size=5).
import numpy as np
np.random.seed(42)
arr = np.random.randint(1, 101, size=5)
print(f"Array: {arr}")
print(f"Min: {np.min(arr)}")
print(f"Max: {np.max(arr)}")
Question 10
Hard
What is the output?
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [10, 20, 30, 40]})
result = df.groupby('A')['B'].agg(['sum', 'mean', 'count'])
print(result)
GroupBy groups rows by column A, then applies sum, mean, and count to column B.
Output:
   sum  mean  count
A
1   30  15.0      2
2   70  35.0      2
Question 11
Hard
Write code to normalize an array of marks to the range [0, 1] using Min-Max scaling: (x - min) / (max - min).
Find min and max of the array, then apply the formula element-wise.
import numpy as np
marks = np.array([30, 45, 78, 92, 55, 67, 88, 41])
min_val = np.min(marks)
max_val = np.max(marks)
normalized = (marks - min_val) / (max_val - min_val)
print(f"Original marks: {marks}")
print(f"Min: {min_val}, Max: {max_val}")
print(f"Normalized (0-1): {np.round(normalized, 3)}")
print(f"Min normalized: {normalized.min():.1f}, Max normalized: {normalized.max():.1f}")
Question 12
Medium
What is the output?
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
print(df.dtypes)
Pandas infers data types automatically. Integers become int64, and strings become object (in pandas 2.x and earlier).
Output:
A     int64
B    object
dtype: object
Multiple Choice Questions
MCQ 1
What does np.array([1, 2, 3]) * 3 produce?
Answer: B
B is correct. NumPy performs element-wise multiplication: each element is multiplied by 3, giving [3 6 9]. This is vectorization, so no loops are needed. Option A would be the result if this were a Python list ([1, 2, 3] * 3 repeats the list).
MCQ 2
What does df.head() do in Pandas?
Answer: B
B is correct. df.head() returns the first 5 rows by default. You can pass a number, as in df.head(10), to see more. df.columns shows column names, df.dtypes shows data types, and df.describe() shows summary statistics.
MCQ 3
Which NumPy function creates an array of evenly spaced values between two endpoints?
Answer: B
B is correct. np.linspace(0, 1, 5) creates 5 evenly spaced values from 0 to 1, including both endpoints: [0, 0.25, 0.5, 0.75, 1]. np.arange() uses a step size (not a count) and excludes the stop value. np.zeros() creates an array of zeros. np.random.rand() creates random values.
MCQ 4
What is the output of np.arange(0, 10, 2)?
Answer: B
B is correct. np.arange(start, stop, step) creates values from 0 up to 10 (exclusive) with step 2, giving [0, 2, 4, 6, 8]. The stop value (10) is excluded, just like in Python's range().
MCQ 5
In Pandas, what is the difference between loc and iloc?
Answer: B
B is correct. loc selects data by label (e.g., df.loc['row_name', 'col_name']). iloc selects by integer position (e.g., df.iloc[0, 1]). Important: loc slicing is inclusive on both ends, while iloc slicing excludes the end (like Python).
MCQ 6
What does arr.reshape(-1, 1) do to a 1D NumPy array with 5 elements?
Answer: B
B is correct. reshape(-1, 1) creates a 2D array with 1 column. The -1 means "calculate the number of rows automatically" (5 elements / 1 column = 5 rows). This gives shape (5, 1), which is what scikit-learn expects for single-feature input.
MCQ 7
What does df.isnull().sum() return for a Pandas DataFrame?
Answer: B
B is correct. df.isnull() creates a boolean DataFrame (True where values are missing). .sum() then sums each column (True counts as 1, False as 0), giving the count of missing values per column. To get the overall total, use df.isnull().sum().sum().
MCQ 8
What is the correct way to filter a Pandas DataFrame for rows where age > 20 AND marks > 80?
Answer: B
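B is correct. In Pandas, you must use & (not and) for element-wise AND, and each condition must be wrapped in parentheses, as in df[(df['age'] > 20) & (df['marks'] > 80)]. Option A uses Python's and, which does not work with Series (it raises a ValueError because the truth value of a Series is ambiguous). Option C has incorrect precedence without parentheses: & binds more tightly than >, so the comparisons are not evaluated first.
MCQ 9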
What type of plot is best for showing the distribution of a single numerical variable?
Answer: C
C is correct. Histograms show the frequency distribution of a single variable by dividing values into bins and counting occurrences. Scatter plots show relationships between two variables. Bar charts compare categories. Pie charts show proportions of a whole.
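A histogram counts how many values fall into each bin. np.histogram performs the same kind of binning that plt.hist draws, returning the counts and the bin edges, so the idea can be inspected without a plot. A sketch with made-up marks (the data, bin count, and range are illustrative assumptions):

```python
import numpy as np

marks = np.array([35, 42, 48, 55, 58, 61, 67, 72, 75, 88])  # hypothetical data

# 4 equal-width bins over [30, 90]: [30, 45), [45, 60), [60, 75), [75, 90]
counts, edges = np.histogram(marks, bins=4, range=(30, 90))
print(counts)  # [2 3 3 2]
print(edges)   # [30. 45. 60. 75. 90.]
```

Every bin is half-open except the last, which also includes its right edge, so each value is counted exactly once.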
MCQ 10
What is broadcasting in NumPy?
Answer: B
B is correct. Broadcasting is NumPy's mechanism for performing operations between arrays of different shapes. When you write np.array([1, 2, 3]) + 5, NumPy broadcasts the scalar 5 to match the array shape, effectively adding [5, 5, 5]. It also works between two arrays: NumPy compares their shapes starting from the last dimension, and each pair of dimensions must be equal or one of them must be 1. For example, shapes (2, 3) and (3,) broadcast to (2, 3).
MCQ 11
What does df.groupby('Department')['Salary'].mean() return?
Answer: B
B is correct. groupby('Department') groups the rows by the unique values in the Department column. ['Salary'].mean() then computes the average salary within each group. The result is a Series indexed by department names, with the mean salaries as values.
MCQ 12
What is the shape of np.array([[1, 2, 3], [4, 5, 6]]).T?
Answer: B
B is correct. The original array has shape (2, 3): 2 rows and 3 columns. Transposing (.T) swaps rows and columns, giving shape (3, 2): 3 rows and 2 columns. Transpose is essential in ML for operations like computing the normal equation in linear regression: (X.T @ X)^(-1) @ X.T @ y.
MCQ 13
What does pd.read_csv('data.csv') do?
Answer: B
B is correct. pd.read_csv() reads a CSV (Comma-Separated Values) file and returns a Pandas DataFrame. This is the most common way to load data in ML projects. Pandas also provides read_excel(), read_json(), and read_sql() for other formats.
MCQ 14
What is the output of np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))?
Answer: B
B is correct. The dot product of two 1D arrays is a scalar: (1*4) + (2*5) + (3*6) = 4 + 10 + 18 = 32. Option A would be element-wise multiplication (arr1 * arr2), which gives [4 10 18]. Because the dot product grows when two vectors point in similar directions, it is used to measure similarity; dividing it by the product of the vectors' lengths gives cosine similarity.
MCQ 15
Which Matplotlib function creates a scatter plot?
Answer: C
C is correct. plt.scatter(x, y) creates a scatter plot showing individual data points. plt.plot() creates line charts, plt.bar() creates bar charts, and plt.hist() creates histograms. Scatter plots are essential in ML for visualizing relationships between features.
Coding Challenges
Challenge 1: NumPy Statistics Calculator
Easy
Create a NumPy array of exam marks: [72, 85, 90, 65, 78, 92, 55, 88, 76, 81]. Calculate and print: count, sum, mean, median, standard deviation, minimum, maximum, and range (max - min).
Sample Input
marks = [72, 85, 90, 65, 78, 92, 55, 88, 76, 81]
Sample Output
Count: 10
Sum: 782
Mean: 78.2
Median: 79.5
Std: 11.12
Min: 55
Max: 92
Range: 37
Use NumPy functions only. Do not use Python built-in functions.
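Which standard deviation is meant matters here. np.std computes the population standard deviation by default (ddof=0, dividing by n); passing ddof=1 divides by n - 1 instead and gives the sample standard deviation, which is larger for the same data. A minimal comparison on the challenge's marks:

```python
import numpy as np

marks = np.array([72, 85, 90, 65, 78, 92, 55, 88, 76, 81])

pop_std = np.std(marks)             # ddof=0: divide by n (NumPy's default)
sample_std = np.std(marks, ddof=1)  # ddof=1: divide by n - 1

print(f"Population std: {pop_std:.2f}")  # Population std: 11.12
print(f"Sample std: {sample_std:.2f}")   # Sample std: 11.72
```

Pandas' Series.std() defaults to ddof=1, so NumPy and Pandas can report different values for the same column.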
import numpy as np
marks = np.array([72, 85, 90, 65, 78, 92, 55, 88, 76, 81])
print(f"Count: {np.size(marks)}")
print(f"Sum: {np.sum(marks)}")
print(f"Mean: {np.mean(marks)}")
print(f"Median: {np.median(marks)}")
print(f"Std: {np.std(marks):.2f}")
print(f"Min: {np.min(marks)}")
print(f"Max: {np.max(marks)}")
print(f"Range: {np.max(marks) - np.min(marks)}")
Challenge 2: Pandas Student Report Card Generator
Medium
Create a DataFrame with 6 students, their marks in Math, Science, and English. Add columns for Total, Average, and Grade (A: avg >= 85, B: >= 70, C: >= 55, F: below 55). Sort by Average descending and print the report.
Sample Input
Students: Aarav, Priya, Rohan, Ananya, Vikram, Meera
Sample Output
Complete report card sorted by average marks
Use Pandas operations. No manual calculations.
import pandas as pd
df = pd.DataFrame({
'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram', 'Meera'],
'Math': [85, 92, 55, 88, 45, 95],
'Science': [90, 88, 60, 95, 50, 91],
'English': [78, 95, 52, 82, 48, 88]
})
subjects = ['Math', 'Science', 'English']
df['Total'] = df[subjects].sum(axis=1)
df['Average'] = (df['Total'] / 3).round(1)
def grade(avg):
    if avg >= 85: return 'A'
    if avg >= 70: return 'B'
    if avg >= 55: return 'C'
    return 'F'
df['Grade'] = df['Average'].apply(grade)
df = df.sort_values('Average', ascending=False)
print(df.to_string(index=False))
Challenge 3: Missing Data Handler
Medium
Create a DataFrame with intentional missing values (use np.nan). Write a function that: (1) reports missing values per column, (2) fills numerical columns with median, (3) fills categorical columns with mode, (4) verifies no missing values remain.
Sample Input
DataFrame with Name, Age (2 NaN), Salary (1 NaN), City (1 None)
Sample Output
Missing values report, filled DataFrame, verification
Handle numerical and categorical columns differently.
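For the categorical fill, note that Series.mode() returns a Series rather than a scalar: several values can tie for most frequent, missing values are ignored, and the result is sorted. Indexing with [0] picks the first of them. A small sketch with hypothetical city values:

```python
import pandas as pd

# Hypothetical data: 'Delhi' and 'Pune' both appear twice
cities = pd.Series(['Delhi', None, 'Pune', 'Delhi', 'Pune', None])

modes = cities.mode()   # a Series holding every tied mode, sorted
print(modes.tolist())   # ['Delhi', 'Pune']

filled = cities.fillna(modes[0])  # fill with the first mode
print(filled.tolist())  # ['Delhi', 'Delhi', 'Pune', 'Delhi', 'Pune', 'Delhi']
```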
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Name': ['Aarav', 'Priya', 'Rohan', 'Ananya', 'Vikram'],
'Age': [25, np.nan, 30, np.nan, 28],
'Salary': [50000, 60000, np.nan, 70000, 55000],
'City': ['Delhi', None, 'Mumbai', 'Pune', None]
})
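def handle_missing(df):
    """Report, fill, and verify missing values; returns a cleaned copy."""
    df = df.copy()
    # (1) Report missing values per column
    print(f'\nMissing values:\n{df.isnull().sum()}')
    # (2) Numerical columns: fill with the median
    for col in df.select_dtypes(include='number').columns:
        df[col] = df[col].fillna(df[col].median())
    # (3) Categorical (non-numeric) columns: fill with the most frequent value
    for col in df.select_dtypes(exclude='number').columns:
        df[col] = df[col].fillna(df[col].mode()[0])
    # (4) Verify that no missing values remain
    print(f'\nMissing values remaining: {df.isnull().sum().sum()}')
    return df
print('Before cleaning:')
print(df)
cleaned = handle_missing(df)
print('\nAfter cleaning:')
print(cleaned)
Challenge 4: Data Visualization Dashboard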
Hard
Generate synthetic data for 200 students with: hours_studied (1-10), attendance_pct (40-100), and marks (correlated with hours). Create a 2x2 dashboard with: (1) histogram of marks, (2) scatter plot of hours vs marks, (3) bar chart of average marks by attendance category (Low/Medium/High), (4) box plot of marks. Save as 'dashboard.png'.
Sample Input
np.random.seed(42), 200 synthetic student records
Sample Output
dashboard.png saved with 4 subplots
Use NumPy for data generation, Pandas for manipulation, Matplotlib for plotting.
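The attendance categories come from binning a continuous column with pd.cut. By default pd.cut uses right-closed intervals, so a value of exactly 60 lands in (0, 60], and values outside the outer edges become NaN. A sketch, assuming bin edges of 0, 60, 80, and 100 with labels Low/Medium/High and a few hypothetical attendance values:

```python
import pandas as pd

attendance = pd.Series([45.0, 60.0, 60.5, 80.0, 95.0])  # hypothetical values

# Right-closed bins: (0, 60] -> Low, (60, 80] -> Medium, (80, 100] -> High
categories = pd.cut(attendance, bins=[0, 60, 80, 100], labels=['Low', 'Medium', 'High'])
print(categories.tolist())  # ['Low', 'Low', 'Medium', 'Medium', 'High']
```

Pass right=False if boundary values should fall into the upper bin instead.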
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
np.random.seed(42)
n = 200
hours = np.random.uniform(1, 10, n)
attendance = np.random.uniform(40, 100, n)
marks = 30 + 5 * hours + np.random.normal(0, 8, n)
marks = np.clip(marks, 0, 100)
df = pd.DataFrame({'Hours': hours.round(1), 'Attendance': attendance.round(1), 'Marks': marks.round(1)})
df['Att_Category'] = pd.cut(df['Attendance'], bins=[0, 60, 80, 100], labels=['Low', 'Medium', 'High'])
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Student Performance Dashboard', fontsize=16)
axes[0][0].hist(df['Marks'], bins=20, color='#a855f7', edgecolor='black')
axes[0][0].set_title('Marks Distribution')
axes[0][0].set_xlabel('Marks')
axes[0][1].scatter(df['Hours'], df['Marks'], alpha=0.5, c='#06b6d4', s=30)
axes[0][1].set_title('Study Hours vs Marks')
axes[0][1].set_xlabel('Hours Studied')
axes[0][1].set_ylabel('Marks')
avg_by_att = df.groupby('Att_Category', observed=False)['Marks'].mean()
axes[1][0].bar(avg_by_att.index.astype(str), avg_by_att.values, color=['#ef4444', '#f59e0b', '#22c55e'])
axes[1][0].set_title('Avg Marks by Attendance')
axes[1][1].boxplot(df['Marks'])
axes[1][1].set_title('Marks Box Plot')
plt.tight_layout()
plt.savefig('dashboard.png', dpi=100)
print('Dashboard saved as dashboard.png')
Challenge 5: Matrix Operations for ML
Hard
Implement the following using NumPy: (1) Create a 3x3 matrix A and a 3x1 vector b. (2) Compute A transposed. (3) Compute A @ b (matrix-vector multiplication). (4) Compute the inverse of A using np.linalg.inv(). (5) Verify that A @ A_inv equals the identity matrix (use np.allclose).
Sample Input
A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]], b = [[1], [2], [3]]
Sample Output
Transpose, product, inverse, and identity verification
Use NumPy's linalg module. Round results to 2 decimal places.
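The verification step uses np.allclose rather than exact equality because floating-point results such as A @ A_inv are rarely exact. allclose compares element-wise within a tolerance (by default rtol=1e-05 and atol=1e-08). A minimal illustration:

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

print(np.array_equal(a, b))  # False: 0.1 + 0.2 is 0.30000000000000004
print(np.allclose(a, b))     # True: equal within tolerance

# Same idea for a matrix inverse: compare the product to the identity with a tolerance
A = np.array([[2, 1, 0], [1, 3, 1], [0, 1, 2]])
print(np.allclose(A @ np.linalg.inv(A), np.eye(3)))  # True
```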
import numpy as np
A = np.array([[2, 1, 0], [1, 3, 1], [0, 1, 2]])
b = np.array([[1], [2], [3]])
print('Matrix A:\n', A)
print('\nA transposed:\n', A.T)
print('\nA @ b:\n', A @ b)
A_inv = np.linalg.inv(A)
print('\nA inverse:\n', np.round(A_inv, 2))
identity = A @ A_inv
print('\nA @ A_inv:\n', np.round(identity, 2))
print('\nIs identity matrix?', np.allclose(identity, np.eye(3)))
Challenge 6: Complete Data Analysis: CSV-like Data Pipeline
Hard
Simulate reading a dataset: create a DataFrame with 50 employee records (Name, Department, Salary with some NaN, Experience). Perform a complete analysis: (1) show shape and info, (2) handle missing salaries with department-wise median, (3) add 'Salary_Level' column (Low/Medium/High), (4) groupby department and show average salary and count, (5) find the top 3 highest-paid employees.
Sample Input
50 synthetic employee records with some missing salaries
Sample Output
Complete analysis report with cleaned data and insights
Use Pandas for all operations. Make the output readable.
import pandas as pd
import numpy as np
np.random.seed(42)
n = 50
depts = np.random.choice(['Engineering', 'Marketing', 'Sales', 'HR'], n)
salaries = np.where(depts == 'Engineering', np.random.normal(90000, 15000, n),
np.where(depts == 'Marketing', np.random.normal(70000, 10000, n),
np.where(depts == 'Sales', np.random.normal(60000, 12000, n),
np.random.normal(55000, 8000, n))))
df = pd.DataFrame({
'Name': [f'Employee_{i}' for i in range(1, n+1)],
'Department': depts,
'Salary': salaries.round(0),
'Experience': np.random.randint(1, 20, n)
})
df.loc[np.random.choice(n, 8, replace=False), 'Salary'] = np.nan
print(f'Shape: {df.shape}')
df.info()
print(f'Missing salaries: {df["Salary"].isnull().sum()}')
df['Salary'] = df.groupby('Department')['Salary'].transform(lambda x: x.fillna(x.median()))
print(f'Missing after fill: {df["Salary"].isnull().sum()}')
df['Salary_Level'] = pd.cut(df['Salary'], bins=[0, 55000, 80000, float('inf')], labels=['Low', 'Medium', 'High'])
print('\nDepartment Summary:')
print(df.groupby('Department').agg(
Avg_Salary=('Salary', 'mean'),
Count=('Salary', 'count')
).round(0))
print('\nTop 3 Highest Paid:')
print(df.nlargest(3, 'Salary')[['Name', 'Department', 'Salary']].to_string(index=False))Need to Review the Concepts?
Go back to the detailed notes for this chapter.
Read Chapter NotesWant to learn AI and ML with a live mentor?
Explore our AI/ML Masterclass
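The missing-salary step in Challenge 6 relies on groupby(...).transform, which returns a result aligned to the original rows (one value per employee), whereas agg returns one row per group. That alignment is what lets each NaN be replaced by its own department's median. A small sketch with hypothetical salaries:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary per department
df = pd.DataFrame({
    'Department': ['HR', 'HR', 'HR', 'Sales', 'Sales'],
    'Salary': [50000, np.nan, 60000, 70000, np.nan],
})

# agg: one median per department (NaN values are skipped)
print(df.groupby('Department')['Salary'].agg('median'))

# transform: one value per original row, so it lines up with df['Salary']
dept_median = df.groupby('Department')['Salary'].transform('median')
df['Salary'] = df['Salary'].fillna(dept_median)
print(df['Salary'].tolist())  # [50000.0, 55000.0, 60000.0, 70000.0, 70000.0]
```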