Practice Questions — MLOps and Model Deployment
Topic-Specific Questions
Question 1
Easy
What is the output of the following code?
pipeline_stages = ["Data", "Preprocess", "Train", "Evaluate", "Deploy", "Monitor"]
for i, stage in enumerate(pipeline_stages, 1):
    print(f"{i}. {stage}")

enumerate with start=1 provides 1-based indexing.

1. Data
2. Preprocess
3. Train
4. Evaluate
5. Deploy
6. Monitor

Question 2
Easy
What is the output?
save_methods = {
    "pickle": ".pkl",
    "joblib": ".joblib",
    "keras": ".keras",
    "torch": ".pt"
}
for method, ext in save_methods.items():
    print(f"{method}: saves as *{ext}")

Dictionary iteration produces key-value pairs.

pickle: saves as *.pkl
joblib: saves as *.joblib
keras: saves as *.keras
torch: saves as *.pt

Question 3
Easy
What is the output?
serving_patterns = [
    ("Real-time", "< 100ms", "Chatbots, search"),
    ("Batch", "Minutes-hours", "Reports, emails"),
    ("Streaming", "Seconds", "Fraud detection")
]
for pattern, latency, use_case in serving_patterns:
    print(f"{pattern:12s} | {latency:15s} | {use_case}")

Three model serving patterns with different latency requirements.

Real-time    | < 100ms         | Chatbots, search
Batch        | Minutes-hours   | Reports, emails
Streaming    | Seconds         | Fraud detection

Question 4
Medium
What is the output?
import json
# Simulated API request and response
request_data = {"features": [0.5, -1.2, 3.4, 0.8]}
response_data = {"prediction": 1, "confidence": 0.92}
print(f"Request: {json.dumps(request_data)}")
print(f"Response: {json.dumps(response_data)}")
print(f"Feature count: {len(request_data['features'])}")
print(f"Prediction: {'positive' if response_data['prediction'] == 1 else 'negative'}")

json.dumps converts a dict to a JSON string. prediction=1 means positive.

Request: {"features": [0.5, -1.2, 3.4, 0.8]}
Response: {"prediction": 1, "confidence": 0.92}
Feature count: 4
Prediction: positive

Question 5
Medium
What is the output?
docker_layers = [
    ("FROM python:3.11-slim", "Base image"),
    ("COPY requirements.txt .", "Dependencies spec"),
    ("RUN pip install -r requirements.txt", "Install deps (cached)"),
    ("COPY app.py .", "Application code"),
    ("COPY model.joblib .", "ML model"),
    ("CMD uvicorn app:app", "Start command")
]
print("Docker Layer Order (optimized for caching):")
for i, (cmd, desc) in enumerate(docker_layers, 1):
    print(f"  {i}. {desc:25s} -> {cmd}")
print(f"\nTotal layers: {len(docker_layers)}")
print("Note: Change to app.py rebuilds only layers 4-6 (fast rebuild)")

Docker rebuilds from the first changed layer. Putting code after deps means dep install is cached.

Docker Layer Order (optimized for caching):
  1. Base image                -> FROM python:3.11-slim
  2. Dependencies spec         -> COPY requirements.txt .
  3. Install deps (cached)     -> RUN pip install -r requirements.txt
  4. Application code          -> COPY app.py .
  5. ML model                  -> COPY model.joblib .
  6. Start command             -> CMD uvicorn app:app

Total layers: 6
Note: Change to app.py rebuilds only layers 4-6 (fast rebuild)

Question 6
Medium
What is the output?
import numpy as np
from scipy import stats
np.random.seed(42)
# Reference distribution (training)
ref = np.random.normal(0, 1, 1000)
# Test 1: Same distribution (no drift)
test_same = np.random.normal(0, 1, 1000)
stat1, p1 = stats.ks_2samp(ref, test_same)
# Test 2: Different distribution (drift!)
test_diff = np.random.normal(0.5, 1.5, 1000)
stat2, p2 = stats.ks_2samp(ref, test_diff)
print(f"Same distribution: KS={stat1:.4f}, p={p1:.4f}, drift={'YES' if p1 < 0.05 else 'No'}")
print(f"Drifted distribution: KS={stat2:.4f}, p={p2:.4f}, drift={'YES' if p2 < 0.05 else 'No'}")

The KS test detects whether two samples come from different distributions. A low p-value means they likely differ.

Same distribution: KS=[small], p=[large, > 0.05], drift=No
Drifted distribution: KS=[large], p=[small, < 0.05], drift=YES

Question 7
Hard
What is the output?
class ModelVersionManager:
    def __init__(self):
        self.versions = []
        self.active = None
    def register(self, name, accuracy):
        version = {"name": name, "accuracy": accuracy, "version": len(self.versions) + 1}
        self.versions.append(version)
        return version["version"]
    def promote(self, version_num):
        for v in self.versions:
            if v["version"] == version_num:
                self.active = v
                return True
        return False
    def get_active(self):
        return self.active

mgr = ModelVersionManager()
mgr.register("rf_v1", 0.89)
mgr.register("rf_v2", 0.92)
v3 = mgr.register("gb_v1", 0.95)
mgr.promote(v3)
active = mgr.get_active()
print(f"Total versions: {len(mgr.versions)}")
print(f"Active: {active['name']} (v{active['version']}, acc={active['accuracy']})")

Three models registered. v3 promoted to active. get_active returns the promoted model.

Total versions: 3
Active: gb_v1 (v3, acc=0.95)

Question 8
Easy
Why should you load the ML model once at startup rather than loading it on every API request?
Think about disk I/O time vs memory access time.
Loading a model from disk involves file I/O, deserialization, and memory allocation -- typically 100-500ms per load. In contrast, a model already in memory can be used for prediction in microseconds. Loading on every request would add this 100-500ms overhead to every API call, making the service unusable for real-time applications. Loading once at startup keeps the model in memory for the entire lifecycle of the application, and all requests share the same fast in-memory model.
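The cost difference can be sketched with a simulated loader; the 100 ms sleep below stands in for joblib.load, and the timings are illustrative:

```python
import time

def load_model_from_disk():
    time.sleep(0.1)                      # stands in for ~100 ms of I/O + deserialization
    return lambda features: sum(features) > 0

# Anti-pattern: load inside the request handler
def predict_slow(features):
    model = load_model_from_disk()       # pays the load cost on EVERY request
    return model(features)

# Pattern: load once at startup, reuse for every request
MODEL = load_model_from_disk()

def predict_fast(features):
    return MODEL(features)               # in-memory call, microseconds

t0 = time.perf_counter(); predict_slow([0.5, -1.2]); slow_ms = (time.perf_counter() - t0) * 1000
t0 = time.perf_counter(); predict_fast([0.5, -1.2]); fast_ms = (time.perf_counter() - t0) * 1000
print(f"per-request load: {slow_ms:.0f} ms, startup load: {fast_ms:.3f} ms")
```

In a FastAPI or Flask app, "startup" means module import or a startup hook, so the `MODEL` global is built exactly once per worker process.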
Question 9
Medium
What is the difference between data drift and concept drift? Give an example of each.
One is about input distribution changes, the other is about the input-output relationship changing.
Data drift occurs when the distribution of input features changes. The model's decision boundary is still correct, but it receives inputs it has not seen before. Example: a loan model trained on ages 25-55 starts receiving applications from 18-year-olds. Concept drift occurs when the relationship between features and the target variable changes, even if the input distribution stays the same. Example: a spam classifier where the same words that used to be benign ("cryptocurrency", "NFT") are now associated with spam due to evolving spam tactics. Data drift affects inputs; concept drift affects the mapping from inputs to outputs.
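Both examples from the answer can be made concrete in a toy sketch (the 2020/2024 label functions are illustrative stand-ins for how ground truth shifts):

```python
import random
random.seed(0)

# Data drift: the input distribution shifts; the learned mapping is still valid.
train_ages = [random.uniform(25, 55) for _ in range(1000)]
prod_ages = [random.uniform(18, 80) for _ in range(1000)]
print(f"train ages: {min(train_ages):.0f}-{max(train_ages):.0f}, "
      f"production ages: {min(prod_ages):.0f}-{max(prod_ages):.0f}")

# Concept drift: the same input now has a different correct label.
def true_label_2020(text):
    return "spam" if "lottery" in text else "ham"

def true_label_2024(text):
    # "NFT" became a spam signal that did not exist at training time
    return "spam" if "lottery" in text or "NFT" in text else "ham"

msg = "claim your free NFT"
print(f"'{msg}' -> 2020: {true_label_2020(msg)}, 2024: {true_label_2024(msg)}")
```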
Question 10
Medium
What advantages does FastAPI have over Flask for ML model serving?
Think about validation, documentation, performance, and developer experience.
FastAPI advantages: (1) Automatic request validation via Pydantic models -- incorrect requests return clear error messages without manual validation code. (2) Auto-generated API docs at /docs (Swagger UI) and /redoc. (3) Async support for handling concurrent requests efficiently. (4) Type hints throughout for better IDE support and self-documentation. (5) Higher performance than Flask due to Starlette and async I/O. Flask advantages: simpler for small projects, larger ecosystem, more tutorials. For ML APIs, FastAPI is generally preferred because input validation (checking feature counts, types, ranges) is crucial and automatic with Pydantic.
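To see what Pydantic automates, here is the validation written by hand with only the standard library; the names `PredictRequest` and `parse_request` are illustrative, not a real API. In FastAPI, declaring a Pydantic model as the request body generates this checking (plus the /docs schema) for free:

```python
from dataclasses import dataclass

@dataclass
class PredictRequest:
    features: list

def parse_request(payload, n_features=4):
    # Manual equivalent of Pydantic's field and length validation
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError("field 'features' is required")
    feats = payload["features"]
    if len(feats) != n_features:
        raise ValueError(f"expected {n_features} features, got {len(feats)}")
    return PredictRequest(features=[float(x) for x in feats])

print(parse_request({"features": [0.5, -1.2, 3.4, 0.8]}))
try:
    parse_request({"features": [1, 2]})
except ValueError as err:
    print("rejected:", err)
```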
Question 11
Hard
Rajesh has deployed a model that works perfectly in staging but fails in production with 'module not found' errors. What is likely the problem and how does Docker solve it?
Think about environment differences between machines.
The likely problem is environment mismatch: production has different Python/library versions, missing dependencies, or different OS libraries than Rajesh's staging machine. Docker solves this by packaging the application, model, and ALL dependencies into a container that includes its own OS libraries, Python installation, and pip packages. This container runs identically on any machine that has Docker installed, eliminating 'works on my machine' problems. The Dockerfile specifies the exact base image (e.g., python:3.11-slim), exact pip packages (from requirements.txt with pinned versions), and the exact startup command. What runs in Docker on Rajesh's laptop will run identically on a production server.
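A hypothetical Dockerfile for this scenario (file names follow the layer example in Question 5; the pinned versions inside requirements.txt are what actually guarantee identical environments):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install pinned dependencies first so this slow layer stays cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Application code and model change more often, so they come last
COPY app.py model.joblib ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```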
Question 12
Hard
Design a monitoring strategy for a production ML model. What metrics would you track and how would you set up alerts?
Think about input data, model performance, and operational health.
Track three categories: (1) Data quality metrics: Feature distributions (KS test against training data), null rates, type mismatches, input volume. Alert when drift p-value < 0.05 or null rate exceeds 5%. (2) Model performance metrics: Accuracy, F1, precision, recall on labeled samples. Average prediction confidence. Prediction distribution. Alert when accuracy drops below threshold or confidence distribution shifts. (3) Operational metrics: API latency (P50, P95, P99), throughput (requests/sec), error rate, memory/CPU usage. Alert when P99 latency exceeds SLA or error rate exceeds 1%. Implementation: log every prediction (input, output, timestamp, latency), run drift checks hourly, collect ground truth labels when available, and send alerts via Slack/email/PagerDuty.
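The logging side of this strategy can be sketched in-process (names are illustrative; production systems would use tools like Prometheus for metrics and the KS test from Question 6 for drift):

```python
prediction_log = []   # one record per request

def record(features, prediction, latency_ms):
    prediction_log.append(
        {"features": features, "prediction": prediction, "latency_ms": latency_ms}
    )

def check_alerts(p99_sla_ms=100, max_null_rate=0.05):
    alerts = []
    latencies = sorted(r["latency_ms"] for r in prediction_log)
    p99 = latencies[int(0.99 * (len(latencies) - 1))]   # crude percentile
    if p99 > p99_sla_ms:
        alerts.append(f"P99 latency {p99:.0f}ms exceeds {p99_sla_ms}ms SLA")
    null_rate = sum(None in r["features"] for r in prediction_log) / len(prediction_log)
    if null_rate > max_null_rate:
        alerts.append(f"null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return alerts

for i in range(100):   # 7 requests with a null feature, 2 slow requests
    record([0.1, None if i < 7 else 0.2], 1, latency_ms=520 if i >= 98 else 20)
print(check_alerts())
```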
Question 13
Easy
What is the output?
checks = ["Data drift", "Concept drift", "Accuracy decay", "Latency spike", "Error rate"]
for c in checks:
    print(f"  Monitor: {c}")
print(f"Total: {len(checks)}")

Five things to monitor for deployed ML models.

  Monitor: Data drift
  Monitor: Concept drift
  Monitor: Accuracy decay
  Monitor: Latency spike
  Monitor: Error rate
Total: 5

Question 14
Medium
What is the output?
import json
def validate_request(data, n_features=4):
    if data is None: return "Invalid JSON"
    if "features" not in data: return "Missing features"
    if len(data["features"]) != n_features: return f"Expected {n_features}, got {len(data['features'])}"
    return "OK"

requests = [{"features":[1,2,3,4]}, {"data":[1]}, {"features":[1,2]}, None]
for req in requests:
    print(f"{str(req)[:35]:35s} -> {validate_request(req)}")

Validate JSON, key existence, and feature count. Note that str() renders a dict with single quotes.

{'features': [1, 2, 3, 4]}          -> OK
{'data': [1]}                       -> Missing features
{'features': [1, 2]}                -> Expected 4, got 2
None                                -> Invalid JSON

Question 15
Hard
What is the output?
def api_capacity(latency_ms, workers):
    return workers * (1000 / latency_ms)

for latency, workers in [(10, 4), (50, 4), (200, 4), (50, 16)]:
    rps = api_capacity(latency, workers)
    print(f"{latency:3d}ms, {workers:2d} workers -> {rps:.0f} RPS")

Each worker handles 1000/latency requests per second.

 10ms,  4 workers -> 400 RPS
 50ms,  4 workers -> 80 RPS
200ms,  4 workers -> 20 RPS
 50ms, 16 workers -> 320 RPS

Question 16
Medium
What is the difference between a Docker image and a Docker container?
Think about the difference between a class and an instance.
A Docker image is a read-only template containing the application, dependencies, and configuration -- like a class definition. A Docker container is a running instance of an image -- like an object instantiated from a class. You can run multiple containers from one image. Images are built from Dockerfiles and stored in registries. Containers are started, stopped, and deleted independently. The image is the blueprint; the container is the running house.
Question 17
Easy
What is the output?
mlops_tools = {"Versioning": "MLflow", "Container": "Docker", "API": "FastAPI", "CI/CD": "GitHub Actions"}
for purpose, tool in mlops_tools.items():
    print(f"{purpose:12s} -> {tool}")

Four MLOps categories and their popular tools.

Versioning   -> MLflow
Container    -> Docker
API          -> FastAPI
CI/CD        -> GitHub Actions

Question 18
Hard
What is model quantization and when would Deepak use it for deployment?
Think about reducing model precision to make inference faster and cheaper.
Model quantization reduces the numerical precision of model weights and activations from 32-bit floats (FP32) to lower precision formats like 16-bit (FP16), 8-bit integers (INT8), or 4-bit (INT4). Benefits: smaller model files (4x smaller at INT8), faster inference (integer operations are cheaper), and lower memory usage (fit larger models on smaller GPUs). Trade-offs: slight accuracy loss (usually < 1% for INT8, 2-5% for INT4). Deepak should use quantization when: deploying on edge devices with limited memory, reducing cloud inference costs, meeting strict latency SLAs, or deploying large language models that do not fit in GPU memory at full precision. Tools: PyTorch quantization, ONNX Runtime, TensorRT, bitsandbytes.
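The core INT8 scheme can be sketched with numpy (a simplified symmetric quantizer for illustration; real toolchains like TensorRT also calibrate activations):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.5, size=1000).astype(np.float32)   # FP32 "layer"

# Symmetric INT8 quantization: map [-max|w|, +max|w|] onto [-127, 127]
scale = float(np.abs(weights).max()) / 127
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale            # used at inference

print(f"FP32: {weights.nbytes} bytes, INT8: {quantized.nbytes} bytes (4x smaller)")
print(f"max reconstruction error: {np.abs(weights - dequantized).max():.5f}")
```

The reconstruction error is bounded by half the scale step, which is why INT8 usually costs well under 1% accuracy.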
Mixed & Application Questions
Question 1
Easy
What is the output?
cloud_options = [
    ("AWS SageMaker", "Amazon"),
    ("Vertex AI", "Google"),
    ("Azure ML", "Microsoft"),
    ("HF Spaces", "Hugging Face")
]
for service, provider in cloud_options:
    print(f"{provider:15s} -> {service}")

List of tuples with cloud providers and their ML platforms.

Amazon          -> AWS SageMaker
Google          -> Vertex AI
Microsoft       -> Azure ML
Hugging Face    -> HF Spaces

Question 2
Easy
What is the output?
http_codes = {
    200: "OK -- prediction returned",
    400: "Bad Request -- invalid input",
    404: "Not Found -- wrong endpoint",
    500: "Server Error -- model failed"
}
for code, meaning in http_codes.items():
    print(f"HTTP {code}: {meaning}")

Common HTTP status codes used in ML APIs.

HTTP 200: OK -- prediction returned
HTTP 400: Bad Request -- invalid input
HTTP 404: Not Found -- wrong endpoint
HTTP 500: Server Error -- model failed

Question 3
Medium
What is the output?
# Simulated file sizes for model artifacts
artifacts = {
    "model.pkl": 5242880,        # 5 MB
    "vectorizer.pkl": 1048576,   # 1 MB
    "config.json": 1024,         # 1 KB
    "requirements.txt": 256      # 256 bytes
}
total = 0
for name, size in artifacts.items():
    mb = size / (1024 * 1024)
    print(f"{name:20s}: {mb:8.2f} MB")
    total += size
print(f"{'Total':20s}: {total/(1024*1024):8.2f} MB")

Convert bytes to MB by dividing by 1024*1024. The two small files round to 0.00 MB.

model.pkl           :     5.00 MB
vectorizer.pkl      :     1.00 MB
config.json         :     0.00 MB
requirements.txt    :     0.00 MB
Total               :     6.00 MB

Question 4
Medium
What is the output?
import numpy as np

def api_latency_report(latencies_ms):
    p50 = np.percentile(latencies_ms, 50)
    p95 = np.percentile(latencies_ms, 95)
    p99 = np.percentile(latencies_ms, 99)
    avg = np.mean(latencies_ms)
    return {"avg": round(avg, 1), "p50": round(p50, 1), "p95": round(p95, 1), "p99": round(p99, 1)}

np.random.seed(42)
latencies = np.random.exponential(scale=15, size=1000)  # Exponential distribution
report = api_latency_report(latencies)
for metric, value in report.items():
    sla_ok = value < 100
    print(f"{metric:4s}: {value:6.1f}ms {'OK' if sla_ok else 'BREACH'}")

An exponential distribution with scale=15 has most values near 15ms but a long tail.

Four lines print (avg, p50, p95, p99), each with an OK/BREACH flag. The p99 value is the highest because of the long tail, but for this distribution the theoretical p99 is about 15 * ln(100) ≈ 69ms, so every metric stays under the 100ms SLA and all lines read OK.
Question 5
Hard
What is the output?
class MockMLflowRun:
    def __init__(self, params, metrics, model_name):
        self.params = params
        self.metrics = metrics
        self.model_name = model_name

runs = [
    MockMLflowRun({"n_estimators": "100", "max_depth": "5"}, {"accuracy": 0.89, "f1": 0.87}, "rf_v1"),
    MockMLflowRun({"n_estimators": "200", "max_depth": "10"}, {"accuracy": 0.93, "f1": 0.91}, "rf_v2"),
    MockMLflowRun({"n_estimators": "50", "learning_rate": "0.1"}, {"accuracy": 0.95, "f1": 0.94}, "gb_v1")
]
print("MLflow Experiment Comparison:")
print(f"{'Model':8s} | {'Accuracy':10s} | {'F1':10s} | {'Params':30s}")
print("-" * 65)
for run in runs:
    params_str = ", ".join(f"{k}={v}" for k, v in run.params.items())
    print(f"{run.model_name:8s} | {run.metrics['accuracy']:10.4f} | {run.metrics['f1']:10.4f} | {params_str}")
best = max(runs, key=lambda r: r.metrics["accuracy"])
print(f"\nBest model: {best.model_name} (accuracy={best.metrics['accuracy']})")

Compare three models logged in MLflow (mocked here, so no mlflow import is needed). Find the one with the highest accuracy.

MLflow Experiment Comparison:
Model    | Accuracy   | F1         | Params
-----------------------------------------------------------------
rf_v1    |     0.8900 |     0.8700 | n_estimators=100, max_depth=5
rf_v2    |     0.9300 |     0.9100 | n_estimators=200, max_depth=10
gb_v1    |     0.9500 |     0.9400 | n_estimators=50, learning_rate=0.1

Best model: gb_v1 (accuracy=0.95)

Question 6
Medium
Anita trained a model in a Jupyter notebook on her laptop. What steps does she need to take to deploy it as a production API service?
Think about the full path from notebook to production.
Steps: (1) Save the model -- serialize with joblib/pickle. (2) Write API code -- create a FastAPI or Flask app with /predict and /health endpoints. (3) Add input validation -- validate feature count, types, ranges. (4) Pin dependencies -- create requirements.txt with exact library versions. (5) Containerize -- write a Dockerfile, build and test the Docker image. (6) Add monitoring -- log predictions, latency, and feature distributions. (7) Deploy -- push Docker image to a registry, deploy to cloud (SageMaker, Vertex AI) or self-hosted server. (8) Set up CI/CD -- automate testing and deployment on code changes.
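Step (1) can be sketched with the stdlib pickle module; joblib.dump/joblib.load work the same way and are preferred for scikit-learn models with large arrays. `ThresholdModel` is an illustrative stand-in for Anita's trained estimator:

```python
import os
import pickle
import tempfile

class ThresholdModel:            # stand-in for a trained estimator
    def predict(self, rows):
        return [1 if sum(r) > 0 else 0 for r in rows]

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(ThresholdModel(), f)   # step (1): serialize to disk

# Later, in the API process, load once at startup (steps 2 onward build on this):
with open(path, "rb") as f:
    model = pickle.load(f)

print(model.predict([[0.5, 1.2], [-3.0, 1.0]]))  # [1, 0]
```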
Question 7
Hard
What is the difference between A/B testing and canary deployment for ML models? When would you use each?
Think about how traffic is split between old and new models.
A/B testing: Split traffic evenly (e.g., 50/50) between the old and new model. Measure statistical differences in key metrics (accuracy, revenue, user engagement). Purpose: determine which model is better based on real-world performance. Use when you want to rigorously compare two models. Canary deployment: Route a small percentage (e.g., 5%) of traffic to the new model while 95% goes to the old one. Gradually increase the new model's traffic if no issues arise. Purpose: catch problems early with minimal user impact. Use when deploying any model update to production. Canary focuses on safety (catching errors), A/B testing focuses on comparison (which is better).
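The routing logic behind a canary split can be sketched deterministically (illustrative; real deployments do this at the load balancer or service mesh). Hashing the request id makes routing sticky, so the same user keeps hitting the same model:

```python
import hashlib

def route(request_id, canary_share=0.05):
    # Hash the request id into [0, 1) and compare against the canary share
    h = int(hashlib.md5(str(request_id).encode()).hexdigest(), 16)
    return "new_model" if (h % 10_000) / 10_000 < canary_share else "old_model"

counts = {"old_model": 0, "new_model": 0}
for i in range(10_000):
    counts[route(i)] += 1
print(counts)   # roughly 5% of traffic reaches the canary
```

The same function with canary_share=0.5 gives an A/B split; the difference is the intent (safety check vs statistical comparison), not the mechanism.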
Question 8
Hard
Why is CI/CD important for ML systems, and how does it differ from standard software CI/CD?
Think about what can change in ML beyond just code.
CI/CD for ML is important because ML systems have more failure modes than regular software: code changes, data changes, model performance degradation, and dependency updates can all break the system. ML CI/CD differs from standard CI/CD in several ways: (1) Data validation tests -- check that training data meets schema, distribution, and quality expectations. (2) Model performance tests -- verify accuracy/F1/latency meet minimum thresholds before deployment. (3) Model artifact management -- version and store trained models alongside code. (4) Training pipeline triggers -- CI/CD may trigger retraining when data or code changes. (5) Longer pipeline times -- model training can take hours, unlike typical code builds. Standard software CI/CD focuses on code correctness. ML CI/CD must also validate data quality, model performance, and reproducibility.
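Point (2), a model performance gate, can be sketched as a pipeline step that fails the build when thresholds are missed (the thresholds here are illustrative):

```python
THRESHOLDS = {"accuracy": 0.90, "f1": 0.88}
MAX_P99_LATENCY_MS = 100

def performance_gate(metrics):
    # Collect every threshold the candidate model fails to clear
    failures = [f"{name} {metrics[name]} < required {req}"
                for name, req in THRESHOLDS.items() if metrics[name] < req]
    if metrics["p99_latency_ms"] > MAX_P99_LATENCY_MS:
        failures.append(f"p99 latency {metrics['p99_latency_ms']}ms > {MAX_P99_LATENCY_MS}ms")
    if failures:
        # SystemExit gives the CI runner a non-zero exit code
        raise SystemExit("model gate FAILED: " + "; ".join(failures))
    return "model gate passed"

print(performance_gate({"accuracy": 0.93, "f1": 0.91, "p99_latency_ms": 45}))
```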
Question 9
Easy
What is the output?
deployment_checklist = [
    "Model serialized (joblib/pickle)",
    "API endpoint created (/predict)",
    "Input validation added",
    "Health check endpooint (/health)",
    "Dependencies pinned (requirements.txt)",
    "Dockerized",
    "Monitoring configured"
]
print(f"Deployment Checklist ({len(deployment_checklist)} items):")
for i, item in enumerate(deployment_checklist, 1):
    print(f"  [{i}] {item}")

Seven items in the deployment checklist.

Deployment Checklist (7 items):
  [1] Model serialized (joblib/pickle)
  [2] API endpoint created (/predict)
  [3] Input validation added
  [4] Health check endpoint (/health)
  [5] Dependencies pinned (requirements.txt)
  [6] Dockerized
  [7] Monitoring configured
Question 10
Medium
What is the output?
def compare_frameworks(name, features):
    return f"{name}: {', '.join(features)}"

flask = compare_frameworks("Flask", ["Simple", "Synchronous", "Manual validation"])
fastapi = compare_frameworks("FastAPI", ["Modern", "Async", "Auto validation", "Auto docs"])
print(flask)
print(fastapi)
print(f"\nFlask features: 3")
print(f"FastAPI features: 4")
print(f"Winner for ML APIs: FastAPI")

FastAPI has more built-in features useful for ML APIs.

Flask: Simple, Synchronous, Manual validation
FastAPI: Modern, Async, Auto validation, Auto docs

Flask features: 3
FastAPI features: 4
Winner for ML APIs: FastAPI

Question 11
Easy
What is Docker and why is it important for ML deployment?
Think about the 'works on my machine' problem.
Docker is a containerization platform that packages an application along with its dependencies, libraries, and runtime into a portable container. For ML deployment, Docker solves the 'works on my machine' problem: a model that runs on a data scientist's laptop (with specific Python/library versions) may fail on a production server with different versions. Docker ensures the exact same environment runs everywhere -- development, staging, and production. The Dockerfile specifies the base image, dependencies, application code, and startup command, making deployments reproducible and portable across any cloud provider.
Question 12
Hard
What is model versioning with MLflow and why is it important for production ML systems?
Think about what happens when a new model performs worse than the old one.
Model versioning tracks every model artifact (weights, hyperparameters, metrics, training data version) with a unique identifier. MLflow's Model Registry enables: (1) Experiment tracking: compare multiple model versions side by side on metrics. (2) Reproducibility: recreate any past model from logged parameters and data. (3) Rollback: quickly revert to a previous model version if the new one degrades in production. (4) Stage management: move models through stages (Staging -> Production -> Archived) with approval workflows. (5) Audit trail: know who trained which model, when, and with what data. Without versioning, teams cannot compare experiments, reproduce results, or recover from bad deployments.
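Point (4), stage management, can be mirrored in plain Python (a sketch of the Staging -> Production -> Archived lifecycle; real code would call MLflow's registry client instead):

```python
class Registry:
    def __init__(self):
        self.stages = {}                      # version name -> stage

    def register(self, version):
        self.stages[version] = "Staging"

    def promote(self, version):
        for v, stage in self.stages.items():  # archive the old production model
            if stage == "Production":
                self.stages[v] = "Archived"
        self.stages[version] = "Production"

reg = Registry()
reg.register("rf_v1"); reg.promote("rf_v1")
reg.register("gb_v1"); reg.promote("gb_v1")   # rf_v1 kept as a rollback target
print(reg.stages)
```

Because the archived version is retained rather than deleted, rollback is just promoting it again.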
Multiple Choice Questions
MCQ 1
What does MLOps stand for?
Answer: B
B is correct. MLOps (Machine Learning Operations) is the set of practices that combine ML development with operations to deploy, monitor, and maintain ML models in production reliably.
MCQ 2
Which Python library is best for serializing scikit-learn models with large numpy arrays?
Answer: C
C is correct. Joblib is optimized for efficiently serializing objects with large numpy arrays, which is common in scikit-learn models. It is faster than pickle for these use cases and supports compression.
MCQ 3
What is Docker?
Answer: B
B is correct. Docker packages applications, their dependencies, and runtime into portable containers that run identically on any machine. This eliminates 'works on my machine' problems in ML deployment.
MCQ 4
What HTTP method is typically used for the /predict endpoint in an ML API?
Answer: B
B is correct. POST is used for /predict because prediction requests send data (features) in the request body. GET is used for /health endpoints that do not require request data.
MCQ 5
What is data drift?
Answer: B
B is correct. Data drift occurs when production data has different statistical properties than training data. For example, a model trained on ages 25-55 receiving inputs from ages 18-80. This can degrade model performance.
MCQ 6
What advantage does FastAPI have over Flask for ML APIs?
Answer: B
B is correct. FastAPI provides automatic request validation via Pydantic, native async support for concurrency, and auto-generated Swagger documentation at /docs. These features are particularly valuable for ML APIs that need input validation.
MCQ 7
Why should you copy requirements.txt and install dependencies BEFORE copying application code in a Dockerfile?
Answer: B
B is correct. Docker rebuilds all layers from the first changed layer onward. By installing dependencies first, changing application code only triggers a rebuild from the COPY app.py step, not the slow pip install step. This dramatically speeds up development iterations.
MCQ 8
What does MLflow track in an ML experiment?
Answer: B
B is correct. MLflow tracks hyperparameters (log_param), performance metrics (log_metric), model artifacts (log_model), and metadata for each experiment run. This enables experiment comparison, reproducibility, and model versioning.
MCQ 9
What is concept drift?
Answer: B
B is correct. Concept drift occurs when the underlying patterns the model learned change. For example, spam patterns evolve, customer preferences shift, or economic conditions change. The same inputs now map to different correct outputs than during training.
MCQ 10
When should batch serving be used instead of real-time serving?
Answer: B
B is correct. Batch serving processes large volumes of data at scheduled intervals (hourly, daily). It is ideal for non-time-critical tasks like generating daily recommendation emails, weekly reports, or overnight data processing. Real-time serving is needed when users expect immediate predictions.
MCQ 11
What statistical test is commonly used to detect data drift between two distributions?
Answer: B
B is correct. The Kolmogorov-Smirnov (KS) test measures the maximum distance between two cumulative distribution functions. A low p-value indicates the two samples likely come from different distributions. It is non-parametric and works for any continuous distribution.
MCQ 12
What is the purpose of a health check endpoint (/health) in a deployed ML API?
Answer: B
B is correct. The /health endpoint is a lightweight check that returns 200 OK if the service is running and the model is loaded. Load balancers use it to route traffic only to healthy instances. Monitoring systems use it to trigger alerts when services go down.
MCQ 13
What is canary deployment for ML models?
Answer: B
B is correct. Canary deployment routes a small fraction (e.g., 5%) of production traffic to the new model. If the new model performs well (no errors, good predictions, acceptable latency), its traffic share is gradually increased. If problems are detected, traffic is immediately reverted to the old model with minimal user impact.
MCQ 14
Why is pinning exact dependency versions critical for ML deployments?
Answer: B
B is correct. Models serialized with pickle/joblib encode the exact class structures of the library version used during training. Loading a model saved with scikit-learn 1.3 using scikit-learn 1.5 may fail or produce incorrect predictions due to internal API changes. Pinning versions ensures the deployment environment exactly matches the training environment.
MCQ 15
In an ML CI/CD pipeline, what additional tests are needed compared to standard software CI/CD?
Answer: B
B is correct. ML CI/CD adds: data validation (schema, distribution, quality checks), model performance gates (accuracy/F1 must exceed thresholds), training reproducibility (same data + code = same model), and model artifact versioning. Standard CI/CD focuses on code correctness; ML CI/CD must also validate data and model quality.
MCQ 16
Where should you load the ML model in a Flask/FastAPI application?
Answer: B
B is correct. Load the model once at startup and keep it in memory. Loading on every request adds 100-500ms of disk I/O overhead per request, making the API unusable for real-time predictions.
MCQ 17
What file specifies the Docker image build instructions?
Answer: B
B is correct. A Dockerfile contains instructions for building a Docker image: the base image (FROM), dependency installation (RUN), file copying (COPY), and startup command (CMD). Docker reads this file to create the image.
MCQ 18
What is the purpose of the /health endpoint in a deployed ML API?
Answer: B
B is correct. The /health endpoint returns a simple 200 OK if the service is operational. Load balancers use it to route traffic to healthy instances. Monitoring systems use it to detect and alert on service outages.
MCQ 19
What is A/B testing in the context of ML model deployment?
Answer: B
B is correct. A/B testing splits production traffic (e.g., 50/50) between the current model and a new version. By measuring real-world metrics (accuracy, CTR, revenue) on both, you can statistically determine which model is better before fully deploying the new one.
MCQ 20
Why is model reproducibility important in MLOps?
Answer: B
B is correct. Reproducibility means given the same data, code, hyperparameters, and random seeds, you get the same trained model. This is essential for debugging (reproduce a bug), auditing (prove the model was trained correctly), and regulatory compliance (demonstrate model lineage). MLflow and similar tools enable this.
Coding Challenges
Coding challenges coming soon.
Need to Review the Concepts?
Go back to the detailed notes for this chapter.
Read Chapter Notes