What Is It?
What Is TensorFlow and Keras?
TensorFlow is an open-source deep learning framework developed by Google Brain. It provides the entire ecosystem for building, training, and deploying machine learning models -- from research prototypes to production systems. TensorFlow handles the heavy mathematical computations (tensor operations, automatic differentiation, GPU acceleration) so you can focus on designing your model architecture.
import tensorflow as tf
print(tf.__version__) # 2.16.x or later
print("GPUs available:", len(tf.config.list_physical_devices('GPU')))

Keras is the high-level API of TensorFlow. Since TensorFlow 2.x, Keras is fully integrated as tf.keras. It provides a clean, intuitive interface for defining layers, compiling models, training, and evaluation. You do not need to manage low-level tensor operations or gradient computation -- Keras handles all of that behind the scenes.
from tensorflow import keras
from tensorflow.keras import layers
# This is all you need to build a neural network
model = keras.Sequential([
layers.Dense(128, activation='relu', input_shape=(784,)),
layers.Dense(10, activation='softmax')
])

Think of TensorFlow as the engine and Keras as the steering wheel. TensorFlow powers the computation; Keras gives you a simple way to drive.
The TensorFlow Ecosystem
TensorFlow is not just a training library. The ecosystem includes:
- TensorFlow Core -- the main library for building and training models
- TensorBoard -- visualization tool for monitoring training (loss curves, model graphs, histograms)
- TensorFlow Lite -- deploy models on mobile and edge devices
- TensorFlow Serving -- serve models in production via REST/gRPC APIs
- TensorFlow Hub -- repository of pre-trained models you can reuse
- TensorFlow Datasets -- ready-to-use datasets with consistent API
Why Does It Matter?
Why Learn TensorFlow and Keras?
1. Industry Standard
TensorFlow is one of the two dominant deep learning frameworks (alongside PyTorch). Google, Uber, Airbnb, Intel, Twitter, and thousands of companies use TensorFlow in production. If Priya wants to work as an ML engineer, TensorFlow proficiency is practically required.
2. Keras Makes Deep Learning Accessible
Before Keras, building neural networks required writing hundreds of lines of boilerplate code for gradient computation, weight updates, and training loops. Keras reduces a complete model to a few lines. Rohan can build, train, and evaluate a deep learning model in under 20 lines of code. This does not mean Keras is a toy -- it powers production systems at Google scale.
3. Seamless CPU-to-GPU Transition
TensorFlow automatically detects and uses available GPUs. The same code that runs on a laptop CPU will run on an NVIDIA GPU or a Google TPU without any changes. Training that takes hours on a CPU can finish in minutes on a GPU. This hardware abstraction is critical for real-world deep learning.
4. Production Deployment
Unlike some frameworks that are research-only, TensorFlow was built for production from day one. You can train a model in Python and deploy it as a REST API (TensorFlow Serving), on a mobile phone (TensorFlow Lite), or in a browser (TensorFlow.js). This end-to-end pipeline is why industry prefers TensorFlow.
5. Complete Tooling
TensorBoard provides real-time visualization of training metrics, model architecture, and weight distributions. Callbacks give you fine-grained control over the training process. Model saving and loading is straightforward. These tools make debugging and iterating on models practical rather than painful.
Detailed Explanation
1. Building Models with the Sequential API
The Sequential API is the simplest way to build a neural network in Keras. You stack layers one after another in a linear sequence. Each layer has exactly one input tensor and one output tensor.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
model = Sequential([
Dense(256, activation='relu', input_shape=(784,)),
Dense(128, activation='relu'),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])
model.summary()

Key parameters of Dense layers:
- units -- number of neurons in the layer (first argument)
- activation -- activation function applied after the linear transformation. Common: 'relu' (hidden layers), 'softmax' (multi-class output), 'sigmoid' (binary output)
- input_shape -- required only for the first layer, specifies the shape of one input sample (not including the batch dimension)
In the model above, a 784-dimensional input passes through three hidden layers (256, 128, 64 neurons with ReLU) and outputs a 10-class probability distribution via softmax.
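The parameter counts that model.summary() reports can be reproduced by hand: a Dense layer with n inputs and m units stores n*m weights plus m biases. A quick sketch in plain Python (dense_params is a hypothetical helper, not a Keras function):

```python
# Each Dense layer stores (inputs * units) weights + units biases.
def dense_params(n_inputs, units):
    return n_inputs * units + units

# Layer sizes of the model above: 784 -> 256 -> 128 -> 64 -> 10
layer_sizes = [784, 256, 128, 64, 10]
params = [dense_params(n, m) for n, m in zip(layer_sizes, layer_sizes[1:])]

print(params)       # [200960, 32896, 8256, 650]
print(sum(params))  # 242762 total trainable parameters
```

These are exactly the per-layer "Param #" values model.summary() would print for this architecture.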
2. Compiling the Model
Before training, you must compile the model by specifying the optimizer, loss function, and metrics:
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)

Optimizer controls how the model updates its weights during training:
- 'sgd' -- Stochastic Gradient Descent (basic, often slow)
- 'adam' -- Adaptive Moment Estimation (most popular default, works well in most cases)
- 'rmsprop' -- good for RNNs and noisy gradients

You can also set the learning rate explicitly:
keras.optimizers.Adam(learning_rate=0.001)
Loss function measures how wrong the model's predictions are:
- 'sparse_categorical_crossentropy' -- multi-class classification with integer labels (0, 1, 2, ...)
- 'categorical_crossentropy' -- multi-class with one-hot encoded labels
- 'binary_crossentropy' -- binary classification (two classes)
- 'mse' -- mean squared error for regression

Metrics are values you want to monitor during training, but they do not affect weight updates:
- 'accuracy' -- classification accuracy
- 'mae' -- mean absolute error (for regression)
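To make the sparse-vs-one-hot distinction concrete, here is a NumPy sketch (not the Keras implementation) showing that both cross-entropy variants compute the same quantity from differently formatted labels:

```python
import numpy as np

# Softmax outputs for 2 samples, 3 classes (illustrative values)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])

y_int = np.array([0, 2])        # integer labels -> sparse_categorical_crossentropy
y_onehot = np.eye(3)[y_int]     # one-hot labels -> categorical_crossentropy

# Sparse version: pick the probability of the true class directly
sparse_ce = -np.mean(np.log(probs[np.arange(len(y_int)), y_int]))

# One-hot version: dot each label row with the log-probabilities
onehot_ce = -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

print(np.isclose(sparse_ce, onehot_ce))  # True: same loss, different label format
```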
3. Training with model.fit()
The fit() method trains the model on your data:
history = model.fit(
X_train, y_train,
epochs=20,
batch_size=32,
validation_split=0.2,
verbose=1
)

Key parameters:
- epochs -- number of complete passes through the training data. One epoch means every training sample has been seen once.
- batch_size -- number of samples processed before one weight update. Smaller batches (32) give noisier but more frequent updates. Larger batches (256) give smoother but fewer updates.
- validation_split -- fraction of training data reserved for validation. The model trains on 80% and validates on 20%. This helps detect overfitting.
- verbose -- 0 (silent), 1 (progress bar), 2 (one line per epoch)
The fit() method returns a History object containing the training and validation loss/metrics for each epoch. This is invaluable for plotting learning curves.
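history.history is a plain dict mapping metric names to per-epoch lists, so you can inspect it directly. A sketch with made-up numbers (not real training output) showing how rising validation loss signals overfitting:

```python
# Illustrative values only -- in practice these come from history.history
history_dict = {
    'loss':     [0.90, 0.55, 0.40, 0.30, 0.24],
    'val_loss': [0.85, 0.60, 0.50, 0.55, 0.65],
}

# Epoch with the lowest validation loss (0-indexed)
best_epoch = min(range(len(history_dict['val_loss'])),
                 key=history_dict['val_loss'].__getitem__)

# Validation loss climbing after its minimum is the classic overfitting signature
overfitting = history_dict['val_loss'][-1] > history_dict['val_loss'][best_epoch]

print(f"Best epoch: {best_epoch + 1}, overfitting after it: {overfitting}")
```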
4. Evaluation and Prediction
# Evaluate on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")
# Predict on new data
predictions = model.predict(X_new)
predicted_class = predictions.argmax(axis=1)  # Get class with highest probability

model.evaluate() returns the loss value and metric values for the model on a dataset. model.predict() generates output predictions for the input samples. For classification with softmax output, each prediction is a probability array -- use argmax to get the predicted class.
5. Callbacks -- Controlling the Training Process
Callbacks are functions that execute at specific points during training (after each epoch, after each batch, etc.). They give you powerful control over the training loop without writing custom training code.
EarlyStopping
from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(
monitor='val_loss', # Watch validation loss
patience=5, # Wait 5 epochs for improvement
restore_best_weights=True # Go back to the best epoch
)
model.fit(X_train, y_train, epochs=100, callbacks=[early_stop])

EarlyStopping monitors a metric and stops training when it stops improving. Without it, you might train for 100 epochs when the model peaked at epoch 23 and started overfitting. patience=5 means training continues for 5 more epochs after the last improvement, giving the model a chance to recover from temporary plateaus.
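The patience logic can be sketched in a few lines of plain Python (a simplified model of what the callback does, not the Keras source):

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the epoch index at which training would stop, or None if it runs out."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0     # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch         # no improvement for `patience` epochs
    return None                      # ran to completion

# Loss improves until epoch 4 (value 0.30), then never beats it again
losses = [0.9, 0.6, 0.4, 0.3, 0.35, 0.34, 0.33, 0.36, 0.31]
print(early_stop_epoch(losses, patience=5))  # 8
```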
ModelCheckpoint
from tensorflow.keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint(
'best_model.keras', # File path
monitor='val_accuracy', # Metric to watch
save_best_only=True, # Only save when metric improves
mode='max' # 'max' for accuracy, 'min' for loss
)
model.fit(X_train, y_train, epochs=50, callbacks=[checkpoint])

ModelCheckpoint saves the model at the best epoch. If the model achieves 95% accuracy at epoch 12 and never beats it again, epoch 12's weights are saved. You can load this later with keras.models.load_model('best_model.keras').
ReduceLROnPlateau
from tensorflow.keras.callbacks import ReduceLROnPlateau
reduce_lr = ReduceLROnPlateau(
monitor='val_loss',
factor=0.5, # Multiply learning rate by 0.5
patience=3, # Wait 3 epochs with no improvement
min_lr=1e-6 # Do not reduce below this
)
model.fit(X_train, y_train, epochs=100, callbacks=[reduce_lr])

When the model is stuck on a plateau (loss not decreasing), reducing the learning rate often helps it converge to a better minimum. This callback automates what practitioners used to do manually.
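The reduce-on-plateau behavior can also be sketched in plain Python (a simplified model of the callback, not the Keras source):

```python
def reduce_on_plateau(lr, val_losses, factor=0.5, patience=3, min_lr=1e-6):
    """Simplified sketch: shrink lr each time the loss stalls for `patience` epochs."""
    best = float('inf')
    wait = 0
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # reduce, but never below min_lr
                wait = 0
    return lr

# Loss stalls for 6 epochs after the initial drop -> two halvings
losses = [0.5, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46]
print(reduce_on_plateau(0.001, losses))  # 0.00025
```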
You can combine multiple callbacks:
model.fit(
X_train, y_train,
epochs=100,
callbacks=[early_stop, checkpoint, reduce_lr]
)

6. The Functional API
The Sequential API only supports linear stacks of layers. For more complex architectures (multiple inputs, multiple outputs, shared layers, skip connections), use the Functional API:
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Concatenate
# Two separate inputs
numeric_input = Input(shape=(10,), name='numeric')
text_input = Input(shape=(100,), name='text')
# Process each input with its own layers
x1 = Dense(64, activation='relu')(numeric_input)
x2 = Dense(128, activation='relu')(text_input)
x2 = Dense(64, activation='relu')(x2)
# Merge the two paths
combined = Concatenate()([x1, x2])
output = Dense(32, activation='relu')(combined)
output = Dense(1, activation='sigmoid')(output)
model = Model(inputs=[numeric_input, text_input], outputs=output)
model.summary()

The Functional API treats each layer as a function that takes a tensor and returns a tensor. This lets you build any architecture -- ResNet-style skip connections, multi-input models, models with branching paths.
7. Regularization -- Preventing Overfitting
Dropout
from tensorflow.keras.layers import Dropout
model = Sequential([
Dense(256, activation='relu', input_shape=(784,)),
Dropout(0.3), # Randomly turn off 30% of neurons during training
Dense(128, activation='relu'),
Dropout(0.3),
Dense(10, activation='softmax')
])

Dropout randomly deactivates a fraction of neurons during each training step. This prevents neurons from co-adapting (relying too heavily on specific other neurons). During prediction (inference), dropout is disabled and all neurons are active; Keras compensates by scaling the surviving activations up during training, so no adjustment is needed at inference. Dropout is one of the most effective regularization techniques in deep learning.
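The mechanics can be sketched in NumPy. This mirrors the "inverted dropout" scheme, where surviving activations are scaled up during training so that the expected value is preserved and inference needs no adjustment:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.3, training=True):
    if not training:
        return x                          # inference: no-op, all neurons active
    mask = rng.random(x.shape) >= rate    # keep ~70% of units
    return x * mask / (1.0 - rate)        # scale up the survivors

x = np.ones(10000)
train_out = dropout(x, rate=0.3, training=True)
infer_out = dropout(x, rate=0.3, training=False)

# Expected value is preserved during training; inference is untouched
print(train_out.mean())        # close to 1.0
print((infer_out == x).all())  # True
```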
L1 and L2 Regularization
from tensorflow.keras.regularizers import l1, l2, l1_l2
model = Sequential([
Dense(256, activation='relu', input_shape=(784,),
kernel_regularizer=l2(0.01)), # L2 penalty on weights
Dense(128, activation='relu',
kernel_regularizer=l1_l2(l1=0.001, l2=0.01)), # Both L1 and L2
Dense(10, activation='softmax')
])

L2 regularization adds a penalty proportional to the square of the weight values, pushing weights toward zero but not exactly zero. L1 regularization adds a penalty proportional to the absolute weight values, which can push weights to exactly zero (sparse weights). Smaller weights mean the model is less likely to overfit to noise in the training data.
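What the penalty terms actually add to the loss can be computed directly in NumPy (illustrative weight values, not from a trained model):

```python
import numpy as np

w = np.array([0.5, -0.2, 0.0, 1.5])  # example weight vector

# Penalties as used above: l1(0.001) and l2(0.01)
l1_penalty = 0.001 * np.sum(np.abs(w))  # 0.001 * 2.2
l2_penalty = 0.01 * np.sum(w ** 2)      # 0.01 * 2.54

print(l1_penalty)  # ~0.0022
print(l2_penalty)  # ~0.0254
```

Note how the L2 term grows quadratically: the single large weight 1.5 contributes most of the penalty, which is why L2 pushes large weights down hardest.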
8. Batch Normalization
from tensorflow.keras.layers import BatchNormalization
model = Sequential([
Dense(256, input_shape=(784,)),
BatchNormalization(),
keras.layers.Activation('relu'),
Dense(128),
BatchNormalization(),
keras.layers.Activation('relu'),
Dense(10, activation='softmax')
])

Batch Normalization normalizes the outputs of a layer to have zero mean and unit variance for each mini-batch. Benefits:
- Stabilizes training by reducing internal covariate shift
- Allows higher learning rates (faster training)
- Has a slight regularization effect (reduces overfitting)
- Helps with vanishing/exploding gradient problems
Note the pattern: Dense (without activation) then BatchNormalization then Activation. This is a common best practice because normalization should happen before the activation function.
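The core normalization step can be sketched in NumPy. This shows only the training-time statistics; the real layer also learns a scale (gamma) and shift (beta) per feature and tracks moving averages for inference:

```python
import numpy as np

def batch_norm(x, eps=1e-3):
    # Normalize each feature across the batch dimension
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))  # batch of 64, 8 features

out = batch_norm(x)
print(out.mean(axis=0).round(6))  # ~0 for every feature
print(out.std(axis=0).round(3))   # ~1 for every feature
```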
9. Saving and Loading Models
# Save the entire model (architecture + weights + optimizer state)
model.save('my_model.keras')
# Load it back
loaded_model = keras.models.load_model('my_model.keras')
# Save only the weights
model.save_weights('model_weights.weights.h5')
# Load weights into a model with the same architecture
new_model = build_model() # Must have same architecture
new_model.load_weights('model_weights.weights.h5')

The .keras format (TensorFlow 2.16+) is the recommended format. It saves everything needed to restore the model exactly. Saving weights only is useful when you want to share weights between models or when the architecture is defined in code.
10. TensorBoard for Visualization
from tensorflow.keras.callbacks import TensorBoard
import datetime
log_dir = "logs/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = TensorBoard(
log_dir=log_dir,
histogram_freq=1, # Log weight histograms every epoch
write_graph=True # Visualize the model graph
)
model.fit(
X_train, y_train,
epochs=20,
validation_split=0.2,
callbacks=[tensorboard_callback]
)
# In terminal: tensorboard --logdir logs/
# Open http://localhost:6006 in browser

TensorBoard provides interactive dashboards showing:
- Scalars -- loss and metric curves for training and validation
- Graphs -- visual representation of the model architecture
- Histograms -- weight and bias distributions over time
- Images -- sample inputs/outputs for debugging
Code Examples
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
import numpy as np
# Load Fashion-MNIST dataset
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
# Class names for interpretation
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(f"Training data shape: {X_train.shape}") # (60000, 28, 28)
print(f"Test data shape: {X_test.shape}") # (10000, 28, 28)
print(f"Labels range: {y_train.min()} to {y_train.max()}")
# Preprocess: flatten 28x28 to 784 and normalize to [0, 1]
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0
# Build model
model = keras.Sequential([
Dense(512, activation='relu', input_shape=(784,)),
BatchNormalization(),
Dropout(0.3),
Dense(256, activation='relu'),
BatchNormalization(),
Dropout(0.3),
Dense(128, activation='relu'),
Dropout(0.2),
Dense(10, activation='softmax')
])
model.compile(
optimizer=keras.optimizers.Adam(learning_rate=0.001),
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
model.summary()
# Callbacks
callbacks = [
EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6)
]
# Train
history = model.fit(
X_train, y_train,
epochs=50,
batch_size=64,
validation_split=0.15,
callbacks=callbacks,
verbose=1
)
# Evaluate
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest accuracy: {test_acc:.4f}")
print(f"Test loss: {test_loss:.4f}")
# Predict on a few samples
predictions = model.predict(X_test[:5])
for i in range(5):
    predicted = class_names[predictions[i].argmax()]
    actual = class_names[y_test[i]]
    confidence = predictions[i].max() * 100
    print(f"Predicted: {predicted} ({confidence:.1f}%), Actual: {actual}")

import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Concatenate, Dropout
import numpy as np
# Scenario: predict house price using numeric features AND neighborhood text embedding
# Numeric input (5 features: area, bedrooms, age, distance_to_metro, floor)
numeric_input = Input(shape=(5,), name='numeric_features')
x1 = Dense(32, activation='relu')(numeric_input)
x1 = Dense(16, activation='relu')(x1)
# Text embedding input (pre-computed 50-dim embedding of neighborhood description)
text_input = Input(shape=(50,), name='text_embedding')
x2 = Dense(32, activation='relu')(text_input)
x2 = Dense(16, activation='relu')(x2)
# Merge both paths
combined = Concatenate()([x1, x2])
x = Dense(32, activation='relu')(combined)
x = Dropout(0.2)(x)
output = Dense(1, activation='linear', name='price')(x) # Regression output
model = Model(inputs=[numeric_input, text_input], outputs=output)
model.compile(
optimizer='adam',
loss='mse',
metrics=['mae']
)
model.summary()
# Generate dummy data for demonstration
np.random.seed(42)
X_numeric = np.random.rand(1000, 5)
X_text = np.random.rand(1000, 50)
y_price = np.random.rand(1000) * 100 # Price in lakhs
# Train
history = model.fit(
[X_numeric, X_text], y_price,
epochs=10,
batch_size=32,
validation_split=0.2,
verbose=1
)
# Predict
sample_numeric = np.array([[0.8, 0.6, 0.3, 0.2, 0.5]])
sample_text = np.random.rand(1, 50)
prediction = model.predict([sample_numeric, sample_text])
print(f"Predicted price: {prediction[0][0]:.2f} lakhs")

This example merges two input branches with Concatenate. Each layer is called like a function on its input tensor. The Model class takes a list of inputs and outputs. This pattern is common in real-world applications where you combine different data types.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
import matplotlib.pyplot as plt
import numpy as np
# Load and preprocess MNIST
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0
# Simple model
model = keras.Sequential([
Dense(128, activation='relu', input_shape=(784,)),
Dense(64, activation='relu'),
Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Train and capture history
history = model.fit(X_train, y_train, epochs=15, batch_size=32,
validation_split=0.2, verbose=0)
# Plot loss curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Loss plot
axes[0].plot(history.history['loss'], label='Training Loss', linewidth=2)
axes[0].plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
axes[0].set_title('Loss Over Epochs')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Accuracy plot
axes[1].plot(history.history['accuracy'], label='Training Accuracy', linewidth=2)
axes[1].plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
axes[1].set_title('Accuracy Over Epochs')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Accuracy')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('training_curves.png', dpi=100)
plt.show()
print(f"Final training accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"Final validation accuracy: {history.history['val_accuracy'][-1]:.4f}")

The History object returned by model.fit() stores the loss and metric values for every epoch. Plotting these learning curves is essential for diagnosing problems: if training loss keeps decreasing but validation loss starts increasing, the model is overfitting. If both losses are high, the model is underfitting. This visual analysis guides decisions about model complexity, regularization, and training duration.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
import numpy as np
# Method 1: Using a learning rate schedule
initial_lr = 0.01
decay_steps = 1000
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
initial_learning_rate=initial_lr,
decay_steps=decay_steps,
decay_rate=0.9,
staircase=True # Decay in discrete steps, not continuously
)
optimizer = keras.optimizers.Adam(learning_rate=lr_schedule)
# Method 2: Using a custom callback
class PrintLearningRate(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        lr = self.model.optimizer.learning_rate
        if hasattr(lr, 'numpy'):
            current_lr = lr.numpy()
        else:
            current_lr = float(tf.keras.backend.get_value(lr))
        print(f" -> Learning rate after epoch {epoch + 1}: {current_lr:.6f}")
# Build and train with scheduled LR
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0
model = keras.Sequential([
Dense(128, activation='relu', input_shape=(784,)),
Dense(10, activation='softmax')
])
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=5, batch_size=128,
validation_split=0.1, verbose=1)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest accuracy: {test_acc:.4f}")

ExponentialDecay multiplies the learning rate by decay_rate every decay_steps steps. staircase=True makes the decay happen in discrete steps rather than continuously. The custom callback demonstrates how to inspect and print the current learning rate. Proper learning rate management often makes the difference between a mediocre model and an excellent one.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.regularizers import l2
import numpy as np
# Create a small dataset that is easy to overfit
np.random.seed(42)
X = np.random.randn(200, 20)
y = (X[:, 0] + X[:, 1] * 0.5 + np.random.randn(200) * 0.1 > 0).astype(int)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]
def build_model(name, regularization='none'):
    layers_list = []
    if regularization == 'l2':
        layers_list.append(Dense(128, activation='relu', input_shape=(20,),
                                 kernel_regularizer=l2(0.01)))
        layers_list.append(Dense(64, activation='relu', kernel_regularizer=l2(0.01)))
    else:
        layers_list.append(Dense(128, activation='relu', input_shape=(20,)))
        if regularization == 'dropout':
            layers_list.append(Dropout(0.5))
        layers_list.append(Dense(64, activation='relu'))
        if regularization == 'dropout':
            layers_list.append(Dropout(0.5))
    layers_list.append(Dense(1, activation='sigmoid'))
    model = keras.Sequential(layers_list)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
results = {}
for reg_type in ['none', 'dropout', 'l2']:
    model = build_model(reg_type, regularization=reg_type)
    history = model.fit(X_train, y_train, epochs=100, batch_size=16,
                        validation_data=(X_test, y_test), verbose=0)
    train_acc = history.history['accuracy'][-1]
    val_acc = history.history['val_accuracy'][-1]
    results[reg_type] = {'train': train_acc, 'val': val_acc}
print(f"{'Model':<12} {'Train Acc':>10} {'Val Acc':>10} {'Gap':>8}")
print("-" * 42)
for name, r in results.items():
    gap = r['train'] - r['val']
    print(f"{name:<12} {r['train']:>10.4f} {r['val']:>10.4f} {gap:>8.4f}")

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
import numpy as np
# Load data
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0
# Build and train for 5 epochs
model = keras.Sequential([
Dense(128, activation='relu', input_shape=(784,)),
Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
print("=== Phase 1: Training for 5 epochs ===")
model.fit(X_train, y_train, epochs=5, batch_size=64, verbose=1)
# Save the complete model
model.save('checkpoint_epoch5.keras')
print("\nModel saved.")
# Simulate loading in a new session
print("\n=== Phase 2: Loading and resuming training ===")
loaded_model = keras.models.load_model('checkpoint_epoch5.keras')
# Verify it works
loss, acc = loaded_model.evaluate(X_test, y_test, verbose=0)
print(f"Loaded model test accuracy: {acc:.4f}")
# Continue training for 5 more epochs
loaded_model.fit(X_train, y_train, epochs=5, batch_size=64, verbose=1)
# Final evaluation
final_loss, final_acc = loaded_model.evaluate(X_test, y_test, verbose=0)
print(f"\nFinal test accuracy after 10 total epochs: {final_acc:.4f}")
# Save only weights
loaded_model.save_weights('final_weights.weights.h5')
print("Weights saved separately.")

model.save() saves the complete model (architecture, weights, optimizer state) so you can resume training exactly where you left off. save_weights() saves only the learned parameters, which is useful when you define the architecture in code. The .keras format is recommended for TensorFlow 2.16+. Loading a model and calling fit() again continues from the last optimizer state, including learning rate and momentum.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import TensorBoard, EarlyStopping
import datetime
import numpy as np
# Load data
(X_train, y_train), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0
# Create a unique log directory for each run
log_dir = "logs/fashion_" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Build model
model = keras.Sequential([
Dense(256, activation='relu', input_shape=(784,)),
Dropout(0.3),
Dense(128, activation='relu'),
Dropout(0.2),
Dense(10, activation='softmax')
])
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# TensorBoard callback with histogram logging
tb_callback = TensorBoard(
log_dir=log_dir,
histogram_freq=1, # Log weight histograms every epoch
write_graph=True, # Log the computation graph
write_images=False,
update_freq='epoch',
profile_batch=0 # Disable profiling for speed
)
early_stop = EarlyStopping(monitor='val_loss', patience=5,
restore_best_weights=True)
# Train
history = model.fit(
X_train, y_train,
epochs=30,
batch_size=64,
validation_split=0.15,
callbacks=[tb_callback, early_stop],
verbose=1
)
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest accuracy: {test_acc:.4f}")
print(f"\nTo view TensorBoard, run in terminal:")
print(f" tensorboard --logdir {log_dir}")
print(f"Then open http://localhost:6006")

With the TensorBoard callback, all training metrics (loss, accuracy, learning rate) are logged to disk. Setting histogram_freq=1 additionally logs weight and bias distributions for every layer at every epoch, which is useful for detecting vanishing or exploding gradients. Each run gets a unique directory based on the timestamp, so you can compare multiple experiments side by side in the TensorBoard UI.

Common Mistakes
Forgetting to Normalize Input Data
# Pixel values are 0-255 integers
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784) # No normalization!
model = keras.Sequential([
Dense(128, activation='relu', input_shape=(784,)),
Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)  # Very slow convergence, poor accuracy

X_train = X_train.reshape(-1, 784).astype('float32') / 255.0  # Normalize to [0, 1]
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0
# Now training converges much faster
model.fit(X_train, y_train, epochs=10)

Using Wrong Loss Function for the Task
# Multi-class classification with integer labels
model.compile(
optimizer='adam',
loss='categorical_crossentropy', # Wrong! Expects one-hot labels
metrics=['accuracy']
)
# y_train contains [0, 3, 7, 2, ...] not [[1,0,0,...], [0,0,0,1,...], ...]

# Option 1: Use sparse_categorical_crossentropy for integer labels
model.compile(
optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
# Option 2: One-hot encode labels and use categorical_crossentropy
from tensorflow.keras.utils import to_categorical
y_train_onehot = to_categorical(y_train, num_classes=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Use sparse_categorical_crossentropy when labels are integers (0, 1, 2, ...) and categorical_crossentropy when labels are one-hot encoded. For binary classification (two classes), use binary_crossentropy with a sigmoid output, not softmax. Mismatching the loss function and label format is one of the most common beginner errors.

Not Using restore_best_weights in EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=5)
# Stops at epoch 25, but best was epoch 20
# Without restore_best_weights, you keep epoch 25's weights (worse)

early_stop = EarlyStopping(
monitor='val_loss',
patience=5,
restore_best_weights=True # Go back to the best epoch's weights
)

By default, restore_best_weights is False. This means that if training stops at epoch 25 but the best validation loss was at epoch 20, you end up with the inferior epoch 25 weights. Always set restore_best_weights=True to ensure you keep the best model.

Mismatching input_shape and Actual Data Shape
# Data is 28x28 images (2D)
(X_train, y_train), _ = keras.datasets.mnist.load_data()
# X_train.shape is (60000, 28, 28)
model = keras.Sequential([
Dense(128, activation='relu', input_shape=(784,)), # Expects flat 784
Dense(10, activation='softmax')
])
model.fit(X_train, y_train, epochs=5)  # Error!

# Flatten the data before feeding to Dense layers
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
# Or use a Flatten layer
from tensorflow.keras.layers import Flatten
model = keras.Sequential([
Flatten(input_shape=(28, 28)), # Automatically flattens 28x28 to 784
Dense(128, activation='relu'),
Dense(10, activation='softmax')
])

The fix: flatten the data manually or use a Flatten layer. The input_shape must exactly match the shape of one sample (excluding the batch dimension).

Applying Dropout During Evaluation/Prediction
# Incorrect manual implementation of dropout-like behavior
import numpy as np
weights = model.get_weights()
# Manually zeroing out neurons during prediction -- wrong approach
for i in range(len(weights)):
    mask = np.random.binomial(1, 0.7, size=weights[i].shape)
    weights[i] *= mask

# Keras automatically handles dropout correctly
# During training: neurons are randomly dropped
# During prediction: all neurons are active (no extra scaling needed)
# Just call predict normally
predictions = model.predict(X_test) # Dropout is automatically OFF
# model.evaluate also turns dropout off
test_loss, test_acc = model.evaluate(X_test, y_test)

During model.fit(), dropout randomly deactivates neurons. During model.predict() and model.evaluate(), dropout is disabled and all neurons are active. You do not need to manage this manually. The layer uses the training flag internally.

Summary
- TensorFlow is Google's open-source deep learning framework. Keras, integrated as tf.keras, is its high-level API that simplifies building, training, and evaluating neural networks. Together they provide the complete pipeline from model definition to production deployment.
- The Sequential API builds models by stacking layers linearly. Use Dense(units, activation, input_shape) for fully connected layers. The first layer needs input_shape; subsequent layers infer it automatically. Common activations: 'relu' for hidden layers, 'softmax' for multi-class output, 'sigmoid' for binary output.
- Compiling a model requires three things: an optimizer (adam is the default choice), a loss function (sparse_categorical_crossentropy for integer class labels, binary_crossentropy for binary classification, mse for regression), and metrics (accuracy for classification, mae for regression).
- model.fit() trains the model. Key parameters: epochs (passes through data), batch_size (samples per weight update, typically 32 or 64), validation_split (fraction held out for validation). It returns a History object with loss and metric values per epoch for plotting learning curves.
- Callbacks control the training loop: EarlyStopping stops when validation metric stops improving (always set restore_best_weights=True), ModelCheckpoint saves the best model to disk, ReduceLROnPlateau reduces learning rate when progress stalls. Combine multiple callbacks for robust training.
- The Functional API handles complex architectures (multiple inputs/outputs, skip connections, branching paths) by treating layers as functions. Use Input() to define entry points, call layers on tensors, and create a Model with specified inputs and outputs.
- Regularization prevents overfitting: Dropout randomly deactivates neurons during training (automatically disabled during prediction), L2 regularization penalizes large weights, L1 produces sparse weights. Batch Normalization stabilizes training and allows higher learning rates.
- Always normalize input data to [0,1] or [-1,1] range before training. Match the loss function to your label format: sparse_categorical_crossentropy for integer labels, categorical_crossentropy for one-hot, binary_crossentropy for binary classification.
- Save complete models with model.save('name.keras') and load with keras.models.load_model(). Save only weights with save_weights(). The .keras format preserves architecture, weights, and optimizer state for resuming training.
- TensorBoard provides real-time visualization of training metrics, model architecture, and weight distributions. Add a TensorBoard callback with a unique log directory per run to compare experiments. Launch with tensorboard --logdir and view at localhost:6006.