Chapter 25 Advanced: 50 Questions

Practice Questions — AI Ethics, Responsible AI, and Career Roadmap

9 Easy · 12 Medium · 9 Hard

Topic-Specific Questions

Question 1
Easy
What is the output of the following code?
bias_types = ["Historical", "Representation", "Measurement", "Labeling"]
for i, bias in enumerate(bias_types, 1):
    print(f"{i}. {bias} bias")
enumerate with start=1 gives 1-based indexing.
1. Historical bias
2. Representation bias
3. Measurement bias
4. Labeling bias
Question 2
Easy
What is the output?
roles = {
    "Data Analyst": "4-8 LPA",
    "Data Scientist": "8-20 LPA",
    "ML Engineer": "12-30 LPA",
    "AI Researcher": "20-50+ LPA"
}
for role, salary in roles.items():
    print(f"{role:20s}: {salary}")
Dictionary iteration produces key-value pairs.
Data Analyst        : 4-8 LPA
Data Scientist      : 8-20 LPA
ML Engineer         : 12-30 LPA
AI Researcher       : 20-50+ LPA
Question 3
Easy
What is the output?
principles = ["Transparency", "Accountability", "Privacy", "Safety", "Fairness"]
print(f"Responsible AI has {len(principles)} key principles:")
for p in principles:
    print(f"  - {p}")
len() counts 5 principles. Each is printed with a dash prefix.
Responsible AI has 5 key principles:
  - Transparency
  - Accountability
  - Privacy
  - Safety
  - Fairness
Question 4
Medium
What is the output?
def disparate_impact_ratio(group_a_rate, group_b_rate):
    ratio = min(group_a_rate, group_b_rate) / max(group_a_rate, group_b_rate)
    return round(ratio, 3)

male_rate = 0.75
female_rate = 0.55

ratio = disparate_impact_ratio(male_rate, female_rate)
print(f"Male approval rate: {male_rate}")
print(f"Female approval rate: {female_rate}")
print(f"Disparate impact ratio: {ratio}")
print(f"Passes 80% rule: {ratio >= 0.8}")
Disparate impact = min_rate / max_rate. The 80% rule says this should be >= 0.8.
Male approval rate: 0.75
Female approval rate: 0.55
Disparate impact ratio: 0.733
Passes 80% rule: False
Question 5
Medium
What is the output?
eu_ai_risk_levels = [
    ("Unacceptable", "Banned", "Social scoring"),
    ("High", "Regulated", "Credit scoring, hiring"),
    ("Limited", "Disclosure", "Chatbots, deepfakes"),
    ("Minimal", "No regulation", "Games, spam filters")
]

print("EU AI Act Risk Classification:")
for level, requirement, examples in eu_ai_risk_levels:
    print(f"  {level:15s} | {requirement:15s} | {examples}")
Four risk levels in the EU AI Act, from banned to unregulated.
EU AI Act Risk Classification:
  Unacceptable    | Banned          | Social scoring
  High            | Regulated       | Credit scoring, hiring
  Limited         | Disclosure      | Chatbots, deepfakes
  Minimal         | No regulation   | Games, spam filters
Question 6
Medium
What is the output?
interview_rounds = {
    "ML Theory": ["Bias-variance", "Regularization", "Gradient descent"],
    "Coding": ["Python", "Data structures", "ML from scratch"],
    "System Design": ["Recommendation", "Fraud detection", "Search ranking"],
    "Behavioral": ["Past projects", "Teamwork", "Ethics"]
}

for round_name, topics in interview_rounds.items():
    print(f"{round_name}: {', '.join(topics)}")
print(f"\nTotal rounds: {len(interview_rounds)}")
Dictionary with 4 keys, each having a list of 3 topics.
ML Theory: Bias-variance, Regularization, Gradient descent
Coding: Python, Data structures, ML from scratch
System Design: Recommendation, Fraud detection, Search ranking
Behavioral: Past projects, Teamwork, Ethics

Total rounds: 4
Question 7
Easy
What is proxy discrimination in AI, and why is removing protected attributes from the model not sufficient to prevent bias?
Think about features that correlate with protected attributes.
Proxy discrimination occurs when a model uses features that correlate with protected attributes (race, gender, religion) to make biased predictions, even though the protected attributes are not direct inputs. For example, zip code can proxy for race (due to residential segregation), income can proxy for gender (due to the pay gap), and name patterns can proxy for ethnicity. Simply removing protected attributes from the feature set does not eliminate these proxy relationships. The model can still learn biased patterns through the remaining correlated features. The solution requires actively testing outcomes across protected groups and using fairness-aware training methods.
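The proxy effect can be shown in a few lines of NumPy. This is a synthetic sketch: the group sizes, the "zip region" construction, and the 80% mixing rate are invented for illustration, not real demographic data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Synthetic protected attribute (e.g., group membership), excluded from the model
group = rng.integers(0, 2, size=n)

# A "zip region" feature that correlates with group (residential segregation):
# 80% of the time the region matches the group, 20% of the time it is flipped
zip_region = np.where(rng.random(n) < 0.8, group, 1 - group)

# Even with `group` removed from the feature set, the proxy leaks it
correlation = np.corrcoef(group, zip_region)[0, 1]
print(f"Correlation between protected attribute and proxy: {correlation:.2f}")
```

With an 80/20 mix the correlation comes out near 0.6, so any model trained on `zip_region` can still recover group membership indirectly.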
Question 8
Medium
What is the difference between LIME and SHAP for model explainability? When would Priya use one over the other?
Think about speed, theoretical foundation, and type of explanation.
LIME creates a local linear approximation around a specific prediction by perturbing the input and observing how predictions change. It is fast, simple to understand, and works with any model. However, it can be unstable (different runs may give different explanations) and the local approximation may not capture complex non-linear boundaries. SHAP uses Shapley values from game theory to assign each feature a fair contribution to the prediction. It is theoretically grounded, consistent, and provides both local and global explanations. However, it is slower, especially for non-tree models. Priya should use SHAP when she needs rigorous, consistent explanations (regulatory compliance, audits). She should use LIME for quick, approximate explanations during development or when SHAP is too slow.
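The Shapley attribution that SHAP is built on can be computed exactly for a tiny model by enumerating feature subsets. This is an illustrative sketch (the toy model, baseline, and instance are invented), not the optimized algorithms the `shap` library uses:

```python
from itertools import combinations
from math import factorial

# Toy model with an interaction term, so attributions are not just the raw terms
def model(x0, x1):
    return 2 * x0 + 3 * x1 + x0 * x1

baseline = {"x0": 0.0, "x1": 0.0}   # reference input ("feature absent" value)
instance = {"x0": 1.0, "x1": 2.0}   # prediction to explain

features = list(instance)
n = len(features)

def value(subset):
    # Evaluate the model with subset features at the instance, rest at baseline
    args = {f: (instance[f] if f in subset else baseline[f]) for f in features}
    return model(args["x0"], args["x1"])

shap_values = {}
for f in features:
    others = [g for g in features if g != f]
    phi = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            # Shapley weight for a coalition of size k
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            phi += w * (value(set(subset) | {f}) - value(set(subset)))
    shap_values[f] = phi

print(shap_values)
# Efficiency property: attributions sum to f(instance) - f(baseline)
print(sum(shap_values.values()), value(set(features)) - value(set()))
```

The subset enumeration is exponential in the number of features, which is exactly why SHAP relies on approximations (KernelSHAP) or model-specific shortcuts (TreeSHAP) in practice.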
Question 9
Medium
Describe the AI alignment problem. Why is it considered one of the most important challenges in AI safety?
Think about the gap between what we tell AI to do and what we actually want.
The alignment problem is the challenge of ensuring AI systems pursue goals that are aligned with human intentions and values. As AI becomes more capable, misalignment becomes more dangerous. Key issues: (1) Reward hacking: AI finds unintended ways to maximize its reward (e.g., a cleaning robot puts trash in a closet instead of removing it -- technically "clean" by the metric). (2) Specification gaming: The objective function does not capture the full intent (e.g., optimizing for clicks produces clickbait). (3) Value encoding: Human values are complex, contextual, and often contradictory -- encoding them mathematically is extremely difficult. RLHF is one approach: train a reward model from human preferences, then optimize the AI to maximize this learned reward. But RLHF itself has limitations (reward model may not generalize to new situations).
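The reward-model step of RLHF can be sketched with the standard pairwise preference loss, -log sigmoid(r_chosen - r_rejected). The reward values below are made-up scalars standing in for a reward model's outputs:

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry style loss used to fit a reward model from human rankings:
    the human-preferred response should score higher than the rejected one."""
    return -np.log(1 / (1 + np.exp(-(r_chosen - r_rejected))))

# Low loss when the reward model already ranks responses the way humans did,
# high loss when it ranks the rejected response above the chosen one
print(preference_loss(2.0, 0.0))   # correct ranking  -> ~0.127
print(preference_loss(0.0, 2.0))   # inverted ranking -> ~2.127
```

Minimizing this loss over many human comparisons yields the learned reward that the LLM is then optimized against.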
Question 10
Hard
The impossibility theorem shows that common fairness criteria cannot all be satisfied simultaneously. Explain why, and how Arjun should choose which fairness metric to optimize for a loan approval system.
Think about the mathematical constraints between demographic parity, equal opportunity, and equalized odds.
The impossibility theorem (Chouldechova, 2017; Kleinberg et al., 2016) proves that Demographic Parity, Equal Opportunity, and Predictive Parity cannot all be achieved simultaneously except when either the base rates are equal across groups or the classifier is perfect. In real-world lending, different demographic groups have different historical default rates (due to systemic inequality), so the base rates differ. For Arjun's loan system, the choice depends on the goal: Equal Opportunity (equal TPR) is most appropriate because it ensures that qualified applicants from all groups have the same chance of approval -- it equalizes opportunity for those who would actually repay the loan, without requiring equal approval rates for groups with different qualifications. Demographic Parity would require approving the same percentage from each group, potentially approving unqualified applicants or rejecting qualified ones just to match rates.
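The theorem can be demonstrated numerically: apply a classifier with identical TPR and FPR to two groups with different base rates, and selection rate (Demographic Parity) and PPV (Predictive Parity) diverge even though Equal Opportunity holds. The group sizes and rates below are illustrative:

```python
def rates(n, base_rate, tpr, fpr):
    """Compute the three fairness quantities from group size, base rate,
    and the classifier's true/false positive rates."""
    pos = n * base_rate
    neg = n - pos
    tp, fp = tpr * pos, fpr * neg
    return {
        "selection_rate": (tp + fp) / n,  # Demographic Parity
        "tpr": tpr,                       # Equal Opportunity
        "ppv": tp / (tp + fp),            # Predictive Parity
    }

# Same classifier behavior (TPR=0.8, FPR=0.2) on groups with different base rates
group_a = rates(1000, 0.5, tpr=0.8, fpr=0.2)
group_b = rates(1000, 0.2, tpr=0.8, fpr=0.2)

for metric in ["selection_rate", "tpr", "ppv"]:
    print(f"{metric:15s}: A={group_a[metric]:.2f}  B={group_b[metric]:.2f}")
```

TPR is 0.80 for both groups, but the selection rates (0.50 vs 0.32) and PPVs (0.80 vs 0.50) differ, so Demographic Parity and Predictive Parity fail once base rates differ.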
Question 11
Hard
What is the output?
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_loss(y_true, y_pred):
    epsilon = 1e-8
    return -np.mean(
        y_true * np.log(y_pred + epsilon) +
        (1 - y_true) * np.log(1 - y_pred + epsilon)
    )

# Interview question: compute loss for perfect and imperfect predictions
y_true = np.array([1, 0, 1, 0])

# Perfect predictions
y_perfect = np.array([0.99, 0.01, 0.99, 0.01])
loss_perfect = logistic_loss(y_true, y_perfect)

# Imperfect predictions
y_imperfect = np.array([0.6, 0.4, 0.7, 0.3])
loss_imperfect = logistic_loss(y_true, y_imperfect)

# Worst predictions (opposite)
y_worst = np.array([0.01, 0.99, 0.01, 0.99])
loss_worst = logistic_loss(y_true, y_worst)

print(f"Perfect:   loss = {loss_perfect:.4f}")
print(f"Imperfect: loss = {loss_imperfect:.4f}")
print(f"Worst:     loss = {loss_worst:.4f}")
print(f"\nPerfect < Imperfect < Worst: {loss_perfect < loss_imperfect < loss_worst}")
Binary cross-entropy: lower loss for predictions closer to true labels.
Perfect:   loss = 0.0101
Imperfect: loss = 0.4338
Worst:     loss = 4.6052

Perfect < Imperfect < Worst: True
Question 12
Hard
How should Kavitha build a Kaggle portfolio that stands out to potential employers? What differentiates a good Kaggle profile from a great one?
Think beyond just competition rankings.
A good Kaggle profile has a few competition entries. A great Kaggle profile has: (1) Top rankings in 3-5 competitions (top 10-20% shows real skill). (2) Published notebooks with detailed EDA, feature engineering, and model explanations (Kaggle Notebooks Expert/Master status). (3) Discussion contributions showing domain knowledge and helpfulness. (4) Diverse competition types (tabular, NLP, vision, time series) showing breadth. (5) Blog posts explaining competition approaches (on Medium or personal blog). What employers look for: not just rankings, but the process -- how did Kavitha approach the problem, what features did she engineer, how did she iterate, what did she learn from failures? A well-documented silver-medal solution is more impressive than an undocumented gold medal.
Question 13
Easy
What is the output?
fairness_metrics = ["Demographic Parity", "Equal Opportunity", "Equalized Odds", "Individual Fairness"]
for metric in fairness_metrics:
    print(f"  - {metric}")
print(f"Total: {len(fairness_metrics)} (cannot all be satisfied simultaneously)")
4 fairness metrics, with a note about the impossibility theorem.
  - Demographic Parity
  - Equal Opportunity
  - Equalized Odds
  - Individual Fairness
Total: 4 (cannot all be satisfied simultaneously)
Question 14
Medium
What is the output?
def salary_range(role, experience_years):
    base = {"Data Analyst": 6, "Data Scientist": 12, "ML Engineer": 18, "AI Researcher": 30}
    b = base[role]
    low = b + experience_years * 1.5
    high = b + experience_years * 3
    return f"{low:.0f}-{high:.0f} LPA"

for role in ["Data Analyst", "Data Scientist", "ML Engineer"]:
    print(f"{role:20s}: {salary_range(role, 3)}")
Calculate salary range from base + experience multipliers. Note that f-string rounding ({x:.0f}) rounds half to even, so 16.5 formats as 16 and 22.5 as 22.
Data Analyst        : 10-15 LPA
Data Scientist      : 16-21 LPA
ML Engineer         : 22-27 LPA
Question 15
Hard
What is the difference between individual fairness and group fairness? When might they conflict?
One focuses on treating similar people similarly, the other focuses on equal outcomes for groups.
Group fairness requires statistical equality across demographic groups (e.g., equal approval rates for men and women). Individual fairness requires that similar individuals receive similar predictions regardless of group membership. They can conflict: consider two applicants with identical qualifications except one is from an overrepresented group. Group fairness might require rejecting the overrepresented applicant to balance rates, violating individual fairness (similar people get different outcomes). Conversely, strict individual fairness might perpetuate historical patterns that lead to group-level inequality. The choice depends on context: group fairness is often preferred for systemic change, individual fairness for case-by-case justice.
Question 16
Medium
What is the output?
ml_interview = {
    "ML Theory": 30,
    "Coding": 25,
    "System Design": 25,
    "Behavioral": 20
}
print("Interview Weight Distribution:")
for round_name, weight in ml_interview.items():
    bar = "#" * (weight // 5)
    print(f"  {round_name:15s}: {weight}% {bar}")
print(f"Total: {sum(ml_interview.values())}%")
4 interview rounds with percentage weights totaling 100%.
Interview Weight Distribution:
  ML Theory      : 30% ######
  Coding         : 25% #####
  System Design  : 25% #####
  Behavioral     : 20% ####
Total: 100%
Question 17
Easy
What is the output?
top_companies = ["Google", "Microsoft", "Amazon", "Flipkart", "PhonePe"]
print(f"Top AI hiring companies in India ({len(top_companies)}):")
for c in top_companies:
    print(f"  - {c}")
5 top companies hiring AI talent in India.
Top AI hiring companies in India (5):
  - Google
  - Microsoft
  - Amazon
  - Flipkart
  - PhonePe
Question 18
Hard
What is the EU AI Act's approach to regulating generative AI, and how does it affect developers building AI products?
Think about transparency requirements and risk classification.
The EU AI Act classifies generative AI (foundation models) under specific obligations: (1) Transparency: AI-generated content must be labeled (users must know they are interacting with AI). Deepfakes must be disclosed. (2) Copyright compliance: Providers must publish summaries of copyrighted training data used. (3) Safety testing: High-capability models must undergo red-teaming and adversarial testing before release. (4) Technical documentation: Model cards describing capabilities, limitations, and intended use. For developers, this means: adding AI disclosure labels, documenting training data sources, implementing safety evaluations, and maintaining technical documentation. Non-compliance can result in fines up to 35 million euros or 7% of global revenue.

Mixed & Application Questions

Question 1
Easy
What is the output?
portfolio_items = [
    "Kaggle competitions",
    "GitHub projects",
    "Blog posts",
    "Open source contributions"
]
for item in portfolio_items:
    print(f"  Portfolio: {item}")
print(f"Total: {len(portfolio_items)} types")
Four items in the portfolio list.
  Portfolio: Kaggle competitions
  Portfolio: GitHub projects
  Portfolio: Blog posts
  Portfolio: Open source contributions
Total: 4 types
Question 2
Easy
What is the output?
skills_by_role = {
    "Data Analyst": ["SQL", "Excel", "Tableau"],
    "ML Engineer": ["Python", "Docker", "MLOps"]
}
for role, skills in skills_by_role.items():
    print(f"{role}: {' + '.join(skills)}")
join connects list items with ' + ' separator.
Data Analyst: SQL + Excel + Tableau
ML Engineer: Python + Docker + MLOps
Question 3
Medium
What is the output?
def check_fairness(group_a_tpr, group_b_tpr, threshold=0.1):
    diff = abs(group_a_tpr - group_b_tpr)
    return {
        "difference": round(diff, 3),
        "fair": diff < threshold,
        "metric": "Equal Opportunity"
    }

result = check_fairness(0.85, 0.72)
print(f"Metric: {result['metric']}")
print(f"TPR difference: {result['difference']}")
print(f"Fair: {result['fair']}")
abs(0.85 - 0.72) = 0.13. Is 0.13 < 0.1?
Metric: Equal Opportunity
TPR difference: 0.13
Fair: False
Question 4
Medium
What is the output?
def salary_comparison(india_lpa, us_usd_k):
    # Approximate conversion: 1 LPA = ~$1,200 USD
    india_usd = india_lpa * 1200
    us_usd = us_usd_k * 1000
    ppp_factor = 4.5  # PPP adjustment for India
    india_adjusted = india_usd * ppp_factor
    return {
        "india_nominal": india_usd,
        "us_nominal": us_usd,
        "india_ppp": india_adjusted,
        "ppp_ratio": round(india_adjusted / us_usd, 2)
    }

# ML Engineer salary comparison
result = salary_comparison(20, 150)  # 20 LPA India, $150K US
print(f"India (nominal): ${result['india_nominal']:,}")
print(f"US (nominal): ${result['us_nominal']:,}")
print(f"India (PPP-adjusted): ${result['india_ppp']:,}")
print(f"PPP ratio: {result['ppp_ratio']}")
20 LPA * 1200 = $24,000 nominal. PPP-adjusted = 24,000 * 4.5 = 108,000.0 (a float, because 4.5 is a float, so the formatted output keeps the trailing .0).
India (nominal): $24,000
US (nominal): $150,000
India (PPP-adjusted): $108,000.0
PPP ratio: 0.72
Question 5
Medium
Suresh is asked in an ML interview: 'Design a fraud detection system for a payment company.' How should he structure his answer?
Cover data, features, model, serving, and monitoring.
Suresh should follow the standard ML system design framework: (1) Clarify requirements: Scale (transactions per second), latency (real-time or batch), precision vs recall priority (false positives cost user friction, false negatives cost money). (2) Data: Transaction logs, user history, device info, geolocation. (3) Features: Transaction amount, frequency, time since last transaction, distance from usual location, device fingerprint, velocity features (spending pattern changes). (4) Model: Two-stage: rule-based filters (hard limits) + ML model (gradient boosting or neural network). Handle class imbalance with SMOTE/undersampling. (5) Serving: Real-time inference (< 50ms). Feature store for precomputed features. (6) Monitoring: Precision/recall on flagged transactions, false positive rate, latency, feature drift. Human review loop for edge cases.
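A minimal sketch of the two-stage design from step (4); every threshold, feature name, and the hand-written scoring rule are hypothetical stand-ins for a trained gradient-boosting or neural model:

```python
def rule_filter(txn):
    """Stage 1: hard rule-based limits (hypothetical thresholds)."""
    if txn["amount"] > 100_000:
        return "BLOCK"
    if txn["txns_last_minute"] > 10:
        return "BLOCK"
    return "PASS"

def model_score(txn):
    """Stage 2: stand-in for an ML model's fraud probability.
    A real system would call a trained model on engineered features here."""
    score = 0.0
    score += 0.4 if txn["amount"] > 10_000 else 0.0
    score += 0.3 if txn["distance_from_home_km"] > 500 else 0.0
    score += 0.3 if txn["new_device"] else 0.0
    return score

def decide(txn, threshold=0.5):
    """Rules first (cheap, interpretable), then the model for the rest."""
    if rule_filter(txn) == "BLOCK":
        return "blocked_by_rules"
    return "flagged_for_review" if model_score(txn) >= threshold else "approved"

txn = {"amount": 15_000, "txns_last_minute": 1,
       "distance_from_home_km": 800, "new_device": False}
print(decide(txn))   # score 0.4 + 0.3 = 0.7 >= 0.5 -> flagged_for_review
```

The split mirrors the interview framing: rules enforce non-negotiable limits with zero latency cost, while the model handles the nuanced cases and feeds the human review loop.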
Question 6
Hard
A hospital wants to deploy an AI system for diagnosing skin conditions from photos. What ethical considerations must Meera address before deployment?
Think about bias, explainability, safety, privacy, and human oversight.
Meera must address: (1) Representation bias: Dermatology AI trained mostly on lighter skin tones performs significantly worse on darker skin (proven by multiple studies). She must ensure diverse training data and test accuracy across all skin tones. (2) Explainability: Doctors need to understand why the AI flagged a condition. Use GradCAM or attention maps to highlight which image regions influenced the diagnosis. (3) Human oversight: The AI should assist doctors, not replace them. All AI diagnoses must be reviewed by a dermatologist. (4) Privacy: Medical images are sensitive. Ensure HIPAA/equivalent compliance, data encryption, and no unauthorized storage. (5) Safety: False negatives (missing a melanoma) are life-threatening. Optimize for high recall (sensitivity) even at the cost of more false positives. (6) Informed consent: Patients should know AI is being used and have the option to opt out.
Question 7
Hard
What is the difference between privacy and fairness in AI? Can a system be private but unfair, or fair but not private?
Think about what each concept protects and whether they can conflict.
Privacy and fairness are independent ethical dimensions that can conflict: A system can be private but unfair: A loan approval model uses differential privacy to protect individual data, but its training data has historical racial bias, so it unfairly denies loans to minorities. Data is protected but outcomes are biased. A system can be fair but not private: A model achieves equal approval rates across racial groups, but to do so it requires access to racial information -- collecting and storing this sensitive data creates privacy risks. A system can be both: Using federated learning (privacy) with fairness constraints (fairness). Or neither: An unencrypted database of biased predictions. The tension: testing for fairness often requires collecting protected attributes (race, gender), which raises privacy concerns. Solutions include secure computation, privacy-preserving audits, and differential privacy with fairness constraints.
Question 8
Hard
What is the output?
def bias_variance_tradeoff(model_complexity):
    """ML interview classic: explain bias-variance."""
    if model_complexity < 3:
        return {"bias": "High", "variance": "Low", "error": "Underfitting"}
    elif model_complexity > 7:
        return {"bias": "Low", "variance": "High", "error": "Overfitting"}
    else:
        return {"bias": "Medium", "variance": "Medium", "error": "Good fit"}

for complexity in [1, 5, 9]:
    result = bias_variance_tradeoff(complexity)
    print(f"Complexity={complexity}: {result['error']:12s} | "
          f"Bias={result['bias']:6s} | Variance={result['variance']}")
Low complexity = high bias (underfitting). High complexity = high variance (overfitting).
Complexity=1: Underfitting | Bias=High   | Variance=Low
Complexity=5: Good fit     | Bias=Medium | Variance=Medium
Complexity=9: Overfitting  | Bias=Low    | Variance=High
Question 9
Easy
What is the output?
explain_tools = {"LIME": "Local linear approximation", "SHAP": "Shapley value attribution"}
for tool, method in explain_tools.items():
    print(f"{tool}: {method}")
Two major explainability tools and their underlying methods.
LIME: Local linear approximation
SHAP: Shapley value attribution
Question 10
Medium
What is the output?
def career_skills_overlap(role1_skills, role2_skills):
    common = set(role1_skills) & set(role2_skills)
    return sorted(common)

ds_skills = ["Python", "Statistics", "ML", "SQL", "Communication"]
mle_skills = ["Python", "Docker", "MLOps", "ML", "Cloud", "APIs"]

overlap = career_skills_overlap(ds_skills, mle_skills)
print(f"Data Scientist skills: {len(ds_skills)}")
print(f"ML Engineer skills: {len(mle_skills)}")
print(f"Overlap: {overlap}")
print(f"Overlap count: {len(overlap)}")
Set intersection finds common skills between two roles.
Data Scientist skills: 5
ML Engineer skills: 6
Overlap: ['ML', 'Python']
Overlap count: 2
Question 11
Medium
What is differential privacy and how does it protect individual data in ML?
Think about adding noise to make individual data points unidentifiable.
Differential privacy is a mathematical framework that provides formal privacy guarantees. It works by adding calibrated noise to computations (queries, model training) so that the presence or absence of any single individual's data does not significantly change the output. In ML, differential privacy can be applied during training (DP-SGD: differentially private stochastic gradient descent) by clipping gradients and adding Gaussian noise. The privacy budget (epsilon) controls the noise-utility tradeoff: smaller epsilon means more privacy but lower model accuracy. Apple uses differential privacy for iPhone analytics, and Google uses it for Chrome browser data collection.
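A minimal sketch of the Laplace mechanism applied to a mean query; the salary values and clipping bounds are made up, and DP-SGD applies the same clip-and-add-noise idea to gradients rather than to a query result:

```python
import numpy as np

def dp_mean(values, low, high, epsilon, rng):
    """Differentially private mean via the Laplace mechanism.
    After clipping to [low, high], one individual can shift the mean
    by at most (high - low) / n, which is the query's sensitivity."""
    clipped = np.clip(values, low, high)
    sensitivity = (high - low) / len(clipped)
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(0)
salaries = np.array([12.0, 18.0, 25.0, 9.0, 30.0])  # toy LPA figures
true_mean = salaries.mean()
private_mean = dp_mean(salaries, low=0.0, high=50.0, epsilon=1.0, rng=rng)
print(f"True mean: {true_mean:.2f}, DP mean: {private_mean:.2f}")
```

Smaller epsilon widens the Laplace noise (scale = sensitivity / epsilon), which is exactly the privacy-utility tradeoff described above.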
Question 12
Hard
Rohit is transitioning from a Data Analyst role to an ML Engineer role. What specific skills should he develop, and what projects should he build to demonstrate readiness?
Think about the gap between analysis and production ML systems.
Rohit needs to bridge the gap from analysis to engineering: Technical skills: (1) Advanced Python (classes, decorators, type hints, not just scripts). (2) MLOps tools (Docker, MLflow, CI/CD). (3) API development (FastAPI/Flask). (4) Cloud platforms (AWS SageMaker or GCP Vertex AI -- get a certification). (5) Software engineering practices (Git workflows, testing, code review). Portfolio projects: (1) End-to-end ML project: data pipeline -> model -> FastAPI deployment -> Docker -> cloud hosting. (2) A/B testing framework for model comparison. (3) Automated retraining pipeline triggered by data drift. (4) Contribute to an open-source MLOps tool. Key differentiator: Rohit should emphasize that his analyst background gives him strong data intuition that many ML Engineers lack. Position it as a strength, not a gap.

Multiple Choice Questions

MCQ 1
What is bias in AI training data?
  • A. A type of neural network layer
  • B. Systematic errors in data that lead to unfair model predictions
  • C. A method to speed up training
  • D. The bias term in linear regression
Answer: B
B is correct. AI bias refers to systematic errors or prejudices in training data that cause the model to make unfair predictions. This can arise from historical discrimination, underrepresentation of groups, or biased labeling.
MCQ 2
What tool uses Shapley values from game theory to explain model predictions?
  • A. LIME
  • B. SHAP
  • C. TensorBoard
  • D. MLflow
Answer: B
B is correct. SHAP (SHapley Additive exPlanations) uses Shapley values from cooperative game theory to fairly attribute each feature's contribution to a prediction. It provides both local (per-prediction) and global (overall) explanations.
MCQ 3
What is the first comprehensive AI law in the world?
  • A. US AI Freedom Act
  • B. EU AI Act
  • C. China AI Regulation
  • D. India Digital AI Law
Answer: B
B is correct. The EU AI Act (2024) is the world's first comprehensive AI regulation. It classifies AI systems by risk level and imposes requirements ranging from banning unacceptable-risk AI to requiring transparency for chatbots.
MCQ 4
Which ML career role focuses primarily on deploying and scaling ML systems in production?
  • A. Data Analyst
  • B. Data Scientist
  • C. ML Engineer
  • D. Business Analyst
Answer: C
C is correct. ML Engineers focus on taking ML models from research/development to production. Their skills include Python, Docker, cloud platforms, API development, MLOps, and system design.
MCQ 5
What is demographic parity as a fairness metric?
  • A. All groups have the same number of data points
  • B. The positive prediction rate is equal across all demographic groups
  • C. All features are equally important
  • D. The model has the same accuracy for all groups
Answer: B
B is correct. Demographic parity requires that the rate of positive predictions (e.g., loan approvals) be equal across all demographic groups, regardless of qualifications. P(Y_hat=1|Group A) = P(Y_hat=1|Group B).
MCQ 6
What is the 80% rule (four-fifths rule) for disparate impact?
  • A. Models must be at least 80% accurate
  • B. The selection rate for any group should be at least 80% of the rate for the most selected group
  • C. 80% of features must be significant
  • D. Training data must be 80% accurate
Answer: B
B is correct. The four-fifths (80%) rule states that the selection rate for any protected group should be at least 80% of the rate of the most-selected group. A ratio below 0.8 indicates potential illegal discrimination and warrants investigation.
MCQ 7
How does LIME explain a single prediction?
  • A. By showing the model's weights
  • B. By fitting a simple local model around the prediction using perturbed samples
  • C. By computing gradients
  • D. By retraining the model
Answer: B
B is correct. LIME generates perturbed samples around the instance to explain, gets model predictions for all of them, then fits a simple interpretable model (e.g., linear regression) on these samples. The simple model's coefficients explain which features most influenced the prediction.
MCQ 8
What is the AI alignment problem?
  • A. Making AI models run faster
  • B. Ensuring AI systems pursue goals that match human intentions and values
  • C. Aligning training data with test data
  • D. Making all AI models have the same architecture
Answer: B
B is correct. The alignment problem is ensuring AI systems do what we actually want, not just what we technically specified. Misalignment can lead to reward hacking, specification gaming, and unintended harmful behaviors as AI becomes more capable.
MCQ 9
Which platform is best for showcasing ML competition skills?
  • A. LinkedIn
  • B. Kaggle
  • C. Twitter
  • D. Stack Overflow
Answer: B
B is correct. Kaggle is the premier platform for ML competitions, notebooks, and datasets. Competition rankings, published notebooks, and discussion contributions demonstrate practical ML skills. A strong Kaggle profile (Expert/Master) is highly valued by employers.
MCQ 10
What is the bias-variance tradeoff?
  • A. Choosing between accuracy and speed
  • B. Simple models have high bias (underfit) while complex models have high variance (overfit)
  • C. Training time vs inference time
  • D. The tradeoff between training and test data size
Answer: B
B is correct. Bias is error from oversimplified models (underfitting). Variance is error from overly complex models that are sensitive to training data (overfitting). The goal is to find the complexity level that minimizes total error (bias^2 + variance).
MCQ 11
Why does the impossibility theorem make AI fairness particularly challenging?
  • A. It means AI can never be fair
  • B. It proves that multiple common fairness criteria cannot all be satisfied simultaneously when base rates differ across groups
  • C. It means we need more training data
  • D. It only applies to deep learning models
Answer: B
B is correct. The impossibility theorem (Chouldechova, 2017) proves that Demographic Parity, Equal Opportunity, and Predictive Parity cannot all hold simultaneously when different groups have different base rates. This means choosing which fairness criterion to optimize is an ethical decision, not just a technical one.
MCQ 12
In an ML system design interview for a recommendation system, what is the 'cold start' problem?
  • A. The system runs slowly when first started
  • B. New users or items have no interaction history, making personalized recommendations difficult
  • C. The model takes too long to train
  • D. The database connection times out
Answer: B
B is correct. Cold start occurs when a new user (no history) or new item (no interactions) enters the system. Solutions: content-based recommendations (use item attributes for new items), demographic-based (use user profile for new users), popularity-based fallback, and onboarding surveys.
MCQ 13
What is federated learning, and how does it address privacy concerns?
  • A. Training one large model on a supercomputer
  • B. Training models on decentralized data sources without sharing raw data between them
  • C. Using data federation tools to merge databases
  • D. Training separate models for each user
Answer: B
B is correct. Federated learning trains ML models across multiple devices or institutions without centralizing the data. Each participant trains a local model and only shares model updates (gradients), not raw data. This protects privacy while enabling collaborative learning -- used in healthcare (hospitals collaborate without sharing patient data) and mobile keyboards (Google Gboard).
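The core aggregation step (often called FedAvg) can be sketched in a few lines; the hospital weight vectors and dataset sizes below are invented for illustration:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging: combine locally trained weight vectors,
    weighted by each client's dataset size. Only weights are shared;
    raw data never leaves the clients."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two hospitals train locally and share only their model weights
w_hospital_a = np.array([1.0, 2.0])   # trained on 100 records
w_hospital_b = np.array([3.0, 4.0])   # trained on 300 records
global_w = fed_avg([w_hospital_a, w_hospital_b], [100, 300])
print(global_w)   # weighted average: [2.5, 3.5]
```

The server repeats this round many times, sending the global weights back to clients for further local training.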
MCQ 14
Under the EU AI Act, which of these AI applications is classified as 'high risk'?
  • A. AI in video games
  • B. AI for credit scoring and hiring decisions
  • C. AI-powered spam filters
  • D. AI chatbots for customer service
Answer: B
B is correct. The EU AI Act classifies AI used in credit scoring and hiring as high-risk because these decisions significantly impact people's lives. High-risk AI must undergo conformity assessments, provide transparency, enable human oversight, and maintain detailed documentation.
MCQ 15
What is RLHF (Reinforcement Learning from Human Feedback) used for in modern AI?
  • A. Training image classification models
  • B. Aligning LLMs with human preferences by training a reward model from human rankings
  • C. Generating training data automatically
  • D. Compressing models for deployment
Answer: B
B is correct. RLHF aligns LLMs (like ChatGPT and Claude) with human values by: training a reward model from human preference rankings, then optimizing the LLM using PPO to maximize this learned reward. This makes LLMs more helpful, harmless, and honest.
MCQ 16
What makes a strong AI/ML portfolio project stand out from the crowd?
  • A. Using the Titanic or MNIST dataset
  • B. Solving a unique problem with end-to-end implementation including deployment, testing, and documentation
  • C. Having many Jupyter notebooks
  • D. Using the most complex model available
Answer: B
B is correct. Stand-out projects solve real, unique problems (not standard datasets), include end-to-end implementation (data to deployment), have proper software engineering (modules, tests, Docker), and are well-documented (README, blog post). A deployed sentiment analyzer for a niche domain shows more skill than 10 notebook-only Titanic analyses.
MCQ 17
What is the purpose of LIME in AI explainability?
  • A. To train models faster
  • B. To explain individual predictions by creating a local interpretable model
  • C. To reduce model size
  • D. To generate synthetic data
Answer: B
B is correct. LIME (Local Interpretable Model-agnostic Explanations) explains a single prediction by perturbing the input, observing prediction changes, and fitting a simple local model. The local model's coefficients show which features most influenced the prediction.
MCQ 18
What is proxy discrimination in AI?
  • A. Using a proxy server for model deployment
  • B. When features that correlate with protected attributes cause biased predictions, even without using protected features directly
  • C. A type of model compression
  • D. Using one model as a proxy for another
Answer: B
B is correct. Proxy discrimination occurs when non-protected features (income, zip code) correlate with protected attributes (race, gender) and produce biased outcomes. Simply removing protected features does not eliminate bias if proxy features remain.
MCQ 19
What skills differentiate an ML Engineer from a Data Scientist?
  • A. ML Engineers know more statistics
  • B. ML Engineers focus on deployment, scaling, and MLOps; Data Scientists focus on model building and analysis
  • C. Data Scientists cannot program
  • D. ML Engineers do not use machine learning
Answer: B
B is correct. Data Scientists build and evaluate models (statistics, feature engineering, experimentation). ML Engineers deploy, scale, and maintain models in production (Docker, APIs, cloud, CI/CD, monitoring). Both use Python and ML, but ML Engineers add software engineering and operations skills.
MCQ 20
What does SHAP stand for?
  • A. Simple Heuristic Analysis Protocol
  • B. SHapley Additive exPlanations
  • C. Statistical Hypothesis And Prediction
  • D. Supervised Hierarchical Attention Processing
Answer: B
B is correct. SHAP (SHapley Additive exPlanations) uses Shapley values from cooperative game theory to fairly attribute each feature's contribution to a model prediction. It provides both local and global explanations.

