Practice Questions — Deep Learning with TensorFlow and Keras
Topic-Specific Questions
Question 1
Easy
What does the following code output for the shape?
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='softmax')
])
print(model.output_shape)
The last layer has 10 units, so the output has 10 values per sample.
(None, 10)
Question 2
Easy
What is the total number of trainable parameters in this model?
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
    Dense(4, input_shape=(3,)),
    Dense(2)
])
model.summary()
Parameters = (inputs x units + biases) for each layer.
Layer 1: (3 x 4) + 4 = 16 parameters. Layer 2: (4 x 2) + 2 = 10 parameters. Total: 26 parameters.
Question 3
Easy
What activation function should be used in the output layer for a binary classification problem?
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='???')
])
Binary classification outputs a probability between 0 and 1.
sigmoid
Question 4
Easy
What happens when this code runs?
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
print("Compiled successfully")
compile() configures the model but does not train it.
Compiled successfully
Question 5
Easy
What will model.predict(X) return for a model with Dense(10, activation='softmax') as the output layer, given one input sample?
import numpy as np
X = np.random.rand(1, 784)
predictions = model.predict(X)
print(predictions.shape)
print(f"Sum of probabilities: {predictions.sum():.2f}")
Softmax outputs probabilities that sum to 1.
(1, 10)
Sum of probabilities: 1.00
Question 6
Easy
What loss function should be used here?
# Labels are: [0, 3, 7, 2, 9, 1, 4, ...] (integers, not one-hot)
model.compile(optimizer='adam', loss='???', metrics=['accuracy'])
Integer labels need a specific variant of categorical crossentropy.
sparse_categorical_crossentropy
Question 7
Easy
What is the difference between model.evaluate() and model.predict()?
One needs true labels, the other does not.
model.evaluate(X, y) takes both inputs and true labels, computes the loss and metrics, and returns them. model.predict(X) takes only inputs and returns the model's output predictions without computing any loss. Use evaluate to measure performance; use predict to get predictions for new data.
Question 8
Medium
What is the output shape after each layer?
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
model = Sequential([
    Dense(256, activation='relu', input_shape=(784,)),
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
for layer in model.layers:
    print(f"{layer.name}: {layer.output_shape}")
Dropout does not change the shape of the data.
dense: (None, 256)
dropout: (None, 256)
dense_1: (None, 128)
dense_2: (None, 10)
Question 9
Medium
What does history.history.keys() return after this training?
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=5, validation_split=0.2)
print(list(history.history.keys()))
Training and validation versions of both loss and each metric.
['loss', 'accuracy', 'val_loss', 'val_accuracy']
Question 10
Medium
What happens when EarlyStopping triggers?
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)
# Suppose val_loss over epochs is:
# Epoch 1: 0.50, Epoch 2: 0.45, Epoch 3: 0.40,
# Epoch 4: 0.42, Epoch 5: 0.43, Epoch 6: 0.44
# At which epoch does training stop?
Patience counts epochs without improvement after the best one.
Training stops after Epoch 6. The best val_loss was 0.40 at Epoch 3. Patience is 3, meaning it waits 3 more epochs (4, 5, 6) without improvement, then stops. Weights are restored to Epoch 3.
Question 11
Medium
How many parameters does this layer have?
from tensorflow.keras.layers import Dense
layer = Dense(64, input_shape=(128,))
layer.build(input_shape=(None, 128))
print(f"Weights shape: {layer.kernel.shape}")
print(f"Bias shape: {layer.bias.shape}")
print(f"Total params: {layer.count_params()}")
Weights = input_dim x units, Biases = units.
Weights shape: (128, 64)
Bias shape: (64,)
Total params: 8256
Question 12
Medium
What is the purpose of validation_split in model.fit()? How is it different from using a separate test set?
Validation data is used during training to monitor overfitting but not for weight updates.
validation_split reserves a fraction of the training data (e.g., 20%) as a validation set. The model trains on the remaining 80% and evaluates on the validation set at the end of each epoch. This helps detect overfitting in real-time. It is different from a test set because: (1) validation data is split from training data automatically, (2) it is used to monitor training and trigger callbacks like EarlyStopping, (3) the test set should be completely held out and used only for final evaluation.
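As a rough sketch of what the split does (Keras takes the validation samples from the end of the arrays, before any shuffling), the equivalent manual slicing in numpy would be:

```python
import numpy as np

# Hypothetical dataset: 100 samples, 10 features
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, 100)

# validation_split=0.2 behaves roughly like this manual split:
split = int(len(X) * (1 - 0.2))        # index where the validation slice begins
X_train, X_val = X[:split], X[split:]  # Keras takes the LAST 20%, pre-shuffle
y_train, y_val = y[:split], y[split:]

print(X_train.shape, X_val.shape)      # (80, 10) (20, 10)
```

Because the slice is taken from the end, data that is ordered (e.g., by class) should be shuffled before calling fit() with validation_split.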
Question 13
Medium
What does this code print?
import tensorflow as tf
from tensorflow.keras.layers import Dropout
# Dropout behavior check
layer = Dropout(0.5)
# Create input tensor
x = tf.constant([[1.0, 2.0, 3.0, 4.0]])
# During training
out_train = layer(x, training=True)
print(f"Training output has zeros: {(out_train.numpy() == 0).any()}")
# During inference
out_infer = layer(x, training=False)
print(f"Inference output: {out_infer.numpy()}")
Dropout zeroes out neurons during training but passes everything through during inference.
Training output has zeros: True (almost always; each value is dropped independently with probability 0.5, so there is a small chance that none happen to be zeroed on a given call)
Inference output: [[1. 2. 3. 4.]]
Question 14
Medium
Why should you place BatchNormalization between the linear transformation and the activation function (Dense without activation, then BatchNormalization, then Activation)?
Normalization works better on the raw linear output, before the non-linearity is applied.
BatchNormalization normalizes values to zero mean and unit variance. If placed after ReLU, negative values have already been zeroed out, so the normalization operates on a skewed distribution. Placing it before the activation function normalizes the full range of linear outputs, which helps the activation function operate in its most useful range. For ReLU, this means about half the values will be positive (active) and half negative (zeroed), giving better gradient flow.
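A minimal sketch of this ordering in Keras (assuming tf.keras is available; the layer sizes are illustrative):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation

# Linear transformation first, then normalization, then the non-linearity
model = Sequential([
    Dense(64, input_shape=(100,)),  # no activation here: raw linear output
    BatchNormalization(),           # normalize the full range of that output
    Activation('relu'),             # non-linearity applied last
    Dense(10, activation='softmax'),
])

# Dense: 100*64+64 = 6464, BatchNorm: 4*64 = 256, Dense: 64*10+10 = 650
print(model.count_params())         # 7370
```

Compare this with writing Dense(64, activation='relu') directly, where the normalization would have to go after the ReLU.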
Question 15
Hard
What is the output?
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization
model = Sequential([
    Dense(64, input_shape=(100,)),
    BatchNormalization(),
    Dense(10, activation='softmax')
])
total_params = model.count_params()
trainable = sum([layer.count_params() for layer in model.layers
                 if layer.trainable])
non_trainable = 0
for layer in model.layers:
    if isinstance(layer, BatchNormalization):
        non_trainable = sum([
            tf.size(w).numpy() for w in layer.non_trainable_weights
        ])
print(f"BatchNorm has {non_trainable} non-trainable params")
BatchNormalization has 4 parameter vectors, each with one value per feature: gamma and beta (trainable), and moving_mean and moving_variance (non-trainable).
BatchNorm has 128 non-trainable params
Question 16
Hard
Explain the difference between the Sequential API and the Functional API in Keras. When would you be forced to use the Functional API?
Think about model architectures that are not a simple linear stack of layers.
The Sequential API stacks layers in a single linear chain: each layer has exactly one input and one output. The Functional API allows any directed acyclic graph (DAG) of layers. You must use the Functional API when: (1) your model has multiple inputs (e.g., combining numeric data and text), (2) your model has multiple outputs (e.g., predicting both class and bounding box), (3) you need skip/residual connections (like in ResNet), or (4) you need to share layers between different paths. The Functional API creates layers by calling them on tensor objects: x = Dense(64)(input_tensor).
Question 17
Hard
What is the effect of reducing the learning rate here?
from tensorflow.keras.callbacks import ReduceLROnPlateau
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=2,
    min_lr=1e-6
)
# Suppose initial learning rate = 0.001
# val_loss: E1=0.5, E2=0.4, E3=0.41, E4=0.42, E5=0.38, E6=0.39, E7=0.40
# What is the learning rate at each epoch?
After patience epochs without improvement, LR is multiplied by factor.
E1-E2: lr=0.001 (improving). E3-E4: no improvement for 2 epochs, lr reduced to 0.0005 after E4. E5: lr=0.0005 (improves to 0.38). E6-E7: no improvement for 2 epochs, lr reduced to 0.00025 after E7.
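The callback's bookkeeping can be replayed in plain Python (a simplified sketch that ignores min_delta and cooldown):

```python
# Simplified replay of ReduceLROnPlateau(factor=0.5, patience=2)
val_losses = [0.50, 0.40, 0.41, 0.42, 0.38, 0.39, 0.40]
lr, factor, patience, min_lr = 0.001, 0.5, 2, 1e-6

best, wait, history = float('inf'), 0, []
for loss in val_losses:
    history.append(lr)                 # lr in effect for this epoch
    if loss < best:
        best, wait = loss, 0           # improvement: reset the counter
    else:
        wait += 1
        if wait >= patience:           # patience exhausted: reduce lr
            lr = max(lr * factor, min_lr)
            wait = 0

print(history)  # [0.001, 0.001, 0.001, 0.001, 0.0005, 0.0005, 0.0005]
```

The reduction after E4 takes effect from E5 onward, and the second reduction (to 0.00025) happens after E7, matching the trace above.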
Question 18
Hard
What is internal covariate shift, and how does Batch Normalization address it?
The distribution of inputs to each layer changes as the preceding layers' weights change during training.
Internal covariate shift is the phenomenon where the distribution of inputs to a hidden layer changes during training because the preceding layer's weights are being updated. Layer 3 expects inputs with a certain distribution, but after layer 2's weights change, the distribution shifts. Batch Normalization addresses this by normalizing the inputs to each layer to have zero mean and unit variance (computed per mini-batch). This stabilizes the input distribution, allowing each layer to learn more effectively without constantly adapting to shifting inputs.
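The normalization itself is only a few lines; a numpy sketch of the per-batch computation (gamma and beta start at 1 and 0; the data here is illustrative):

```python
import numpy as np

# A mini-batch of 4 samples with 3 features on very different scales
x = np.array([[10.0, 0.1, -5.0],
              [12.0, 0.3, -4.0],
              [ 9.0, 0.2, -6.0],
              [11.0, 0.4, -5.0]])

mean = x.mean(axis=0)                    # per-feature batch mean
var = x.var(axis=0)                      # per-feature batch variance
eps = 1e-5                               # small constant for numerical stability
x_hat = (x - mean) / np.sqrt(var + eps)  # ~zero mean, ~unit variance per feature

gamma, beta = 1.0, 0.0                   # learnable scale and shift (initial values)
out = gamma * x_hat + beta

print(np.allclose(out.mean(axis=0), 0))            # True
print(np.allclose(out.std(axis=0), 1, atol=1e-3))  # True
```

Whatever distribution the previous layer produces, the next layer always sees inputs on this standardized scale; gamma and beta let the network undo the normalization if that turns out to be useful.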
Mixed & Application Questions
Question 1
Easy
What does this code do?
model.save('my_model.keras')
loaded = keras.models.load_model('my_model.keras')
print(type(loaded))
save() stores the entire model; load_model() restores it.
<class 'keras.src.models.sequential.Sequential'> (or a similar Keras model class)
Question 2
Easy
What is the predicted class?
import numpy as np
predictions = np.array([[0.05, 0.02, 0.85, 0.03, 0.05]])
predicted_class = predictions.argmax(axis=1)
print(predicted_class)
argmax returns the index of the maximum value.
[2]
Question 3
Easy
What normalization does this perform?
import numpy as np
X = np.array([0, 51, 102, 153, 204, 255])
X_normalized = X.astype('float32') / 255.0
print(X_normalized)
Dividing by 255 maps [0, 255] to [0, 1].
[0. 0.2 0.4 0.6 0.8 1. ]
Question 4
Easy
How many epochs will this training run if there is no early stopping?
model.fit(X_train, y_train, epochs=50, batch_size=32)
epochs controls the number of complete passes through the data.
50 epochs
Question 5
Easy
What is the role of the optimizer in model.compile()?
It determines how weights are updated during training.
The optimizer controls how the model's weights are updated based on the computed gradients during training. It determines the update rule (e.g., SGD uses basic gradient descent, Adam uses adaptive learning rates with momentum). The optimizer adjusts weights to minimize the loss function. Different optimizers converge at different speeds and may reach different solutions.
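The core update rule every optimizer builds on is plain gradient descent; a one-parameter sketch:

```python
# Minimize loss(w) = (w - 3)^2 by following the negative gradient
w = 0.0
lr = 0.1                   # learning rate: how far each step moves

for _ in range(100):
    grad = 2 * (w - 3)     # dL/dw, computed analytically for this toy loss
    w -= lr * grad         # the update rule: step against the gradient

print(round(w, 4))         # 3.0 -- converged to the minimum
```

SGD is essentially this rule applied to every weight; optimizers like Adam modify the step using the history of past gradients.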
Question 6
Medium
What will the model output shape be?
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Concatenate
input_a = Input(shape=(10,))
input_b = Input(shape=(20,))
x = Concatenate()([input_a, input_b])
x = Dense(32, activation='relu')(x)
output = Dense(5, activation='softmax')(x)
model = Model(inputs=[input_a, input_b], outputs=output)
print(model.output_shape)
Concatenating (10,) and (20,) gives (30,), then Dense layers transform it.
(None, 5)
Question 7
Medium
What happens here?
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    ModelCheckpoint('best.keras', monitor='val_accuracy', save_best_only=True, mode='max'),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)
]
model.fit(X_train, y_train, epochs=100, validation_split=0.2, callbacks=callbacks)
All three callbacks can work together during training.
Three callbacks work simultaneously: (1) EarlyStopping monitors val_loss and stops training if it does not improve for 5 epochs, restoring the best weights. (2) ModelCheckpoint saves the model to 'best.keras' whenever val_accuracy improves (saves only the best). (3) ReduceLROnPlateau reduces the learning rate by half if val_loss does not improve for 3 epochs. Training runs up to 100 epochs but will likely stop early.
Question 8
Medium
What is the total number of parameters?
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout
model = Sequential([
    Dense(128, activation='relu', input_shape=(50,)),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid')
])
print(model.count_params())
Dropout has zero parameters.
Layer 1: (50 x 128) + 128 = 6528. Layer 2: (128 x 64) + 64 = 8256. Layer 3: (64 x 1) + 1 = 65. Total: 14849.
Question 9
Medium
Why is Adam the most commonly used optimizer? What advantages does it have over plain SGD?
Adam combines momentum and adaptive learning rates.
Adam (Adaptive Moment Estimation) combines two ideas: (1) momentum -- it maintains a running average of past gradients to smooth updates and overcome local minima, and (2) adaptive learning rates -- it maintains per-parameter learning rates that adapt based on the history of gradients. This means parameters that receive large gradients get smaller learning rates, and vice versa. Compared to plain SGD, Adam converges faster, requires less learning rate tuning, and handles sparse gradients well. SGD can outperform Adam with careful tuning but Adam works well out of the box.
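A minimal single-parameter sketch of the Adam update on the toy loss (w - 3)^2 (simplified from the full algorithm; the learning rate is raised from the usual 0.001 default so the toy converges quickly):

```python
import math

lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-7  # standard beta/eps defaults

w = 0.0        # parameter to optimize
m = v = 0.0    # running estimates of the gradient mean and squared gradient

for t in range(1, 2001):
    g = 2 * (w - 3)                      # gradient of (w - 3)^2
    m = beta1 * m + (1 - beta1) * g      # momentum: smoothed gradient
    v = beta2 * v + (1 - beta2) * g * g  # per-parameter scale estimate
    m_hat = m / (1 - beta1 ** t)         # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive, momentum-smoothed step

print(f"w = {w:.3f}")  # close to 3.0
```

Dividing by sqrt(v_hat) is what makes the step size adapt per parameter: consistently large gradients shrink the effective step, small ones enlarge it.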
Question 10
Medium
What happens if you call model.fit() twice?
model.fit(X_train, y_train, epochs=5)
model.fit(X_train, y_train, epochs=5)
print("Done")
fit() continues from the current state of the model.
The model trains for 5 epochs, then trains for 5 more epochs starting from where the first fit() ended. The total effect is similar to training for 10 epochs. The optimizer state (momentum, etc.) carries over between calls. Output:
Done
Question 11
Medium
What is wrong with this code?
model = Sequential([
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(X_train, y_train, epochs=5)
The first layer is missing something.
The first Dense layer has no input_shape. Nothing crashes: model.fit() builds the model lazily, inferring the input dimension from X_train on the first batch. However, calling model.summary() or model.count_params() before training would fail because the weights do not exist yet, and the expected input size cannot be read from the code. Clearer: Dense(128, activation='relu', input_shape=(784,)) (replace 784 with your actual input dimension).
Question 12
Hard
Explain L1 vs L2 regularization. When would you prefer L1 over L2?
L1 drives weights to exactly zero; L2 drives weights toward zero but not exactly.
L1 regularization adds the sum of absolute weight values to the loss. It drives many weights to exactly zero, producing sparse models (feature selection). L2 regularization adds the sum of squared weight values to the loss. It drives weights toward zero but rarely exactly zero, producing small but non-zero weights. Prefer L1 when you suspect many features are irrelevant and want automatic feature selection. Prefer L2 when all features are potentially useful but you want to prevent any single weight from becoming too large. L1_L2 combines both.
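The two penalty terms are simple to compute; a numpy sketch (the lambda value is illustrative):

```python
import numpy as np

w = np.array([0.5, -0.2, 0.0, 1.5, -0.8])  # example weight vector
lam = 0.01                                  # regularization strength (illustrative)

l1_penalty = lam * np.sum(np.abs(w))  # L1: sum of |w|, added to the loss
l2_penalty = lam * np.sum(w ** 2)     # L2: sum of w^2, added to the loss

print(round(float(l1_penalty), 6))  # 0.03
print(round(float(l2_penalty), 6))  # 0.0318
```

In Keras these are attached per layer, e.g. Dense(64, kernel_regularizer=tf.keras.regularizers.l2(0.01)), and regularizers.l1_l2(l1=0.01, l2=0.01) combines both.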
Question 13
Hard
How many parameters does this Functional API model have?
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Concatenate
input_a = Input(shape=(5,))
input_b = Input(shape=(3,))
x1 = Dense(4, activation='relu')(input_a)
x2 = Dense(4, activation='relu')(input_b)
merged = Concatenate()([x1, x2])
output = Dense(1, activation='sigmoid')(merged)
model = Model(inputs=[input_a, input_b], outputs=output)
print(model.count_params())
Compute parameters for each Dense layer separately. Concatenate has no parameters.
Branch A: (5 x 4) + 4 = 24. Branch B: (3 x 4) + 4 = 16. Output layer: (8 x 1) + 1 = 9. Total: 49.
Question 14
Hard
Arjun trained a model with Dropout(0.5) and noticed that training accuracy was 78% but test accuracy was 85%. How is this possible? Is this a problem?
Dropout is active during training but disabled during inference.
This is expected behavior with Dropout, not a bug. During training, 50% of neurons are randomly deactivated, making the model work with only half its capacity. This naturally lowers training accuracy. During testing/prediction, all neurons are active (dropout is disabled), so the model uses its full capacity and achieves higher accuracy. This gap is normal when using high dropout rates. It is not a problem -- it actually shows that dropout is effectively preventing overfitting.
Question 15
Hard
What will happen when this code runs?
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
# Model expects 784 inputs
model = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# But data has shape (1000, 28, 28) -- not flattened
import numpy as np
X = np.random.rand(1000, 28, 28)
y = np.random.randint(0, 10, 1000)
model.fit(X, y, epochs=1)
The model expects flat (784,) input but receives (28, 28) shaped data.
This raises a ValueError: "Input 0 of layer 'dense' is incompatible with the layer: expected axis -1 of input shape to have value 784 but received input with shape (None, 28, 28)" (exact wording varies by TensorFlow version). Fix: flatten the data with X.reshape(1000, 784), or add a Flatten layer to the model.
Question 16
Hard
What is the difference between batch_size=1, batch_size=32, and batch_size=len(X_train) during training? How does batch size affect training?
Think about the frequency of weight updates and the quality of gradient estimates.
batch_size=1 (Stochastic Gradient Descent): updates weights after every single sample. Very noisy gradients, can escape local minima, very slow per epoch (many updates). batch_size=32 (Mini-batch): updates after every 32 samples. Good balance between noise and efficiency. The standard choice. batch_size=len(X_train) (Full Batch): updates once per epoch using the entire dataset. Very accurate gradients but computationally expensive, may get stuck in sharp minima, and requires the entire dataset to fit in memory. Smaller batches generalize better but train slower; larger batches converge faster but may generalize worse.
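The update-frequency difference is easy to see numerically; a quick sketch:

```python
import math

n_samples = 1000  # hypothetical training-set size

for batch_size in (1, 32, 1000):
    # One weight update per batch; the last batch may be smaller
    updates_per_epoch = math.ceil(n_samples / batch_size)
    print(f"batch_size={batch_size}: {updates_per_epoch} updates per epoch")
# batch_size=1 -> 1000 updates, batch_size=32 -> 32 updates, batch_size=1000 -> 1 update
```

More updates per epoch means noisier but more frequent progress; fewer updates means smoother but coarser-grained training.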
Multiple Choice Questions
MCQ 1
Which library is the high-level API built into TensorFlow for building neural networks?
Answer: B
B is correct. Keras is the high-level API integrated into TensorFlow as tf.keras. It provides a simple interface for defining, training, and evaluating neural networks.
MCQ 2
What does model.fit() do in Keras?
Answer: C
C is correct. model.fit() trains the model by performing forward passes, computing loss, calculating gradients, and updating weights for the specified number of epochs.
MCQ 3
Which activation function is most commonly used for the hidden layers of a deep neural network?
Answer: C
C is correct. ReLU (Rectified Linear Unit) is the standard activation for hidden layers. It is computationally efficient and helps with the vanishing gradient problem. Sigmoid is used for binary output, softmax for multi-class output.
MCQ 4
What does the 'epochs' parameter in model.fit() control?
Answer: C
C is correct. One epoch means the model has seen every training sample once. Training for 10 epochs means 10 complete passes through the entire training dataset.
MCQ 5
Which loss function is correct for multi-class classification with integer labels?
Answer: C
C is correct. sparse_categorical_crossentropy is used when labels are integers (0, 1, 2, ...). categorical_crossentropy requires one-hot encoded labels. binary_crossentropy is for two-class problems.
MCQ 6
What does Dropout(0.3) do during training?
Answer: B
B is correct. Dropout randomly deactivates 30% of the neurons in that layer during each training step. This prevents co-adaptation and reduces overfitting. During prediction, all neurons are active.
MCQ 7
What does EarlyStopping with patience=5 do?
Answer: B
B is correct. EarlyStopping monitors a metric (e.g., val_loss). If the metric does not improve for 5 consecutive epochs (patience=5), training is stopped. This prevents overfitting by stopping at the right time.
MCQ 8
In model.compile(), what role does the loss function play?
Answer: B
B is correct. The loss function quantifies the error between predictions and true labels. The optimizer uses the gradient of this loss to update the model's weights. Lower loss means better predictions.
MCQ 9
What is the purpose of validation_split=0.2 in model.fit()?
Answer: B
B is correct. validation_split=0.2 holds out 20% of the training data as a validation set. The model trains on the remaining 80% and evaluates on the 20% at the end of each epoch. This helps detect overfitting.
MCQ 10
Which of the following is NOT a valid Keras callback?
Answer: D
D is correct. WeightDecay is not a Keras callback. It is a concept related to regularization (and can be set as an optimizer parameter). EarlyStopping, ModelCheckpoint, and ReduceLROnPlateau are all built-in Keras callbacks.
MCQ 11
What does the Functional API allow that the Sequential API does not?
Answer: C
C is correct. The Functional API supports models with multiple inputs, multiple outputs, shared layers, and skip connections. The Sequential API only supports a linear stack of layers. Both support Dense layers, backpropagation, and all optimizers.
MCQ 12
What does Batch Normalization do?
Answer: B
B is correct. Batch Normalization normalizes the output of each layer to have approximately zero mean and unit variance, computed per mini-batch. This stabilizes training, allows higher learning rates, and has a mild regularization effect.
MCQ 13
If a Dense layer has 256 inputs and 128 outputs, how many total parameters does it have (including biases)?
Answer: B
B is correct. Weights: 256 x 128 = 32768. Biases: 128. Total: 32768 + 128 = 32896. Each input connects to each output (weights), and each output neuron has one bias.
MCQ 14
Which model saving format is recommended in TensorFlow 2.16+?
Answer: C
C is correct. The .keras format is the recommended format in TensorFlow 2.16+. It saves the complete model (architecture, weights, optimizer state) in a single file. The older .h5 format is still supported but not recommended for new code.
MCQ 15
What happens to Dropout during model.predict()?
Answer: B
B is correct. Dropout is only active during training (model.fit()). During prediction (model.predict()) and evaluation (model.evaluate()), dropout is automatically disabled. All neurons contribute to the output, and their outputs are scaled appropriately.
MCQ 16
ReduceLROnPlateau with factor=0.5 and patience=3 starts with lr=0.01. After 3 epochs of no improvement, what is the new learning rate?
Answer: B
B is correct. The learning rate is multiplied by the factor: 0.01 x 0.5 = 0.005. After another 3 epochs without improvement, it would become 0.005 x 0.5 = 0.0025, and so on until it reaches min_lr.
MCQ 17
In the Functional API, what does Concatenate()([x1, x2]) do?
Answer: C
C is correct. Concatenate() joins tensors along the last axis (by default). If x1 has shape (None, 32) and x2 has shape (None, 64), the result has shape (None, 96). Use Add() for element-wise addition (requires same shapes).
MCQ 18
What does model.summary() display in Keras?
Answer: B
B is correct. model.summary() prints a table showing each layer's name, output shape, and number of parameters. It also shows the total trainable and non-trainable parameter counts.
MCQ 19
What is the purpose of the learning rate in an optimizer?
Answer: B
B is correct. The learning rate controls how much the weights are adjusted in response to the computed gradient. Too high = unstable training (divergence). Too low = very slow convergence. Typical default for Adam: 0.001.
MCQ 20
What does verbose=0 mean in model.fit()?
Answer: C
C is correct. verbose=0 suppresses all output during training. verbose=1 shows a progress bar per epoch. verbose=2 shows one line per epoch. Use verbose=0 when running experiments programmatically or in loops.
MCQ 21
What happens if you compile a model with 'mse' loss but the task is multi-class classification?
Answer: C
C is correct. MSE (mean squared error) is designed for regression, not classification. Using it for multi-class classification will not crash, but the model will converge slowly and achieve worse accuracy than with the appropriate loss (categorical or sparse categorical crossentropy).
MCQ 22
What activation should the output layer use for a regression task?
Answer: D
D is correct. Regression outputs can be any real number, so no activation (linear) is used. Softmax constrains output to probabilities (classification). Sigmoid constrains to [0,1] (binary classification). ReLU constrains to non-negative values.
MCQ 23
How does ExponentialDecay learning rate schedule work?
Answer: B
B is correct. ExponentialDecay reduces the learning rate by multiplying it by decay_rate every decay_steps training steps. This gradually decreases the learning rate, allowing the model to make large updates initially and fine-grained adjustments later.
Coding Challenges
Coding challenges coming soon.