What Is It?
What Are Ensemble Methods?
An ensemble method combines multiple individual models (called base learners or weak learners) to produce a single, stronger prediction. The idea is simple: a group of average students working together can outperform a single excellent student.
Why does this work? Individual models have different strengths and weaknesses. Some overpredict, some underpredict, some fail on certain data regions. By combining them, the errors tend to cancel out, producing more accurate and robust predictions.
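To make the error-cancellation argument concrete, here is a minimal numpy simulation (an illustration, not from the text): ten unbiased but noisy predictors of the same quantity, compared with their average.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 10.0
n_models, n_trials = 10, 5000

# Each column is one "weak" model: unbiased, but noisy (std = 2).
predictions = true_value + rng.normal(0, 2.0, size=(n_trials, n_models))

single_rmse = np.sqrt(np.mean((predictions[:, 0] - true_value) ** 2))
ensemble_rmse = np.sqrt(np.mean((predictions.mean(axis=1) - true_value) ** 2))

print(f"Single model RMSE:  {single_rmse:.3f}")    # close to 2.0
print(f"Average-of-10 RMSE: {ensemble_rmse:.3f}")  # roughly 2.0 / sqrt(10)
```

The gain depends on the models' errors being at least partly independent, which is exactly why bagging injects randomness through bootstrap samples and random feature subsets.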
Two Main Ensemble Strategies
- Bagging (Bootstrap Aggregating): Train multiple models independently on random subsets of data, then average their predictions. Example: Random Forest (many decision trees on random samples). Reduces variance (overfitting).
- Boosting: Train models sequentially, where each new model focuses on correcting the errors of the previous ones. Example: XGBoost, LightGBM, CatBoost. Reduces bias (underfitting) and variance.
Boosting is generally more powerful than bagging and produces the best results in most structured/tabular data problems. XGBoost, LightGBM, and CatBoost -- the three dominant boosting libraries -- have won the majority of Kaggle competitions involving tabular data.
Why Does It Matter?
Why Are Boosting Algorithms So Dominant?
1. They Win Competitions
In Kaggle competitions (the Olympics of ML), boosting algorithms -- particularly XGBoost and LightGBM -- have won more competitions on tabular data than any other method, including deep learning. When the data is structured (rows and columns, not images or text), gradient boosting is almost always the top choice.
2. They Handle Messy Real-World Data
XGBoost can handle missing values natively (learns the best direction for missing data at each split). CatBoost handles categorical features without manual encoding. LightGBM handles large datasets efficiently with histogram-based splitting. These practical advantages make boosting ideal for production ML systems.
3. Built-in Regularization
Unlike plain decision trees that overfit easily, gradient boosting has multiple regularization mechanisms: learning rate, max depth, subsampling, column sampling, L1/L2 regularization on leaf weights. This makes it robust against overfitting even with many trees.
4. Feature Importance for Free
Boosting models automatically compute feature importance, telling you which features drive predictions. This is invaluable for understanding the model and guiding feature engineering.
5. Fast and Scalable
LightGBM and XGBoost support parallel processing, GPU acceleration, and efficient memory usage. They can train on millions of rows in minutes -- far faster than deep learning models on equivalent tabular tasks.
Detailed Explanation
1. Bagging vs Boosting
| Aspect | Bagging | Boosting |
|---|---|---|
| Training | Models trained independently in parallel | Models trained sequentially, each correcting previous errors |
| Focus | Reduces variance (overfitting) | Reduces both bias and variance |
| Weighting | Equal weight for all models (simple average) | Unequal: AdaBoost weights models by accuracy; gradient boosting scales every tree by the learning rate |
| Example | Random Forest | XGBoost, LightGBM, CatBoost |
| Sensitivity to noise | Less sensitive (averaging smooths noise) | More sensitive (can boost on noisy samples) |
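The contrast in the table can be seen directly by training one representative of each family on the same synthetic data (a rough illustration; exact scores depend on the dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Bagging: 100 trees trained independently on bootstrap samples, then averaged.
bagging = RandomForestClassifier(n_estimators=100, random_state=42)
bagging.fit(X_train, y_train)

# Boosting: 100 shallow trees trained sequentially on the previous errors.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=42)
boosting.fit(X_train, y_train)

bag_acc = bagging.score(X_test, y_test)
boost_acc = boosting.score(X_test, y_test)
print(f"Bagging  (Random Forest):            {bag_acc:.4f}")
print(f"Boosting (sklearn GradientBoosting): {boost_acc:.4f}")
```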
2. How Gradient Boosting Works
Gradient boosting builds trees sequentially, where each new tree tries to correct the errors (residuals) of all previous trees combined.
1. Start with a simple prediction: For regression, predict the mean of the target. For classification, predict the log-odds.
2. Compute residuals: Calculate the difference between actual values and current predictions. These residuals are the errors.
3. Fit a new tree to the residuals: Train a small decision tree that predicts these errors.
4. Update predictions: Add the new tree's predictions (multiplied by a learning rate) to the running total: prediction = prediction + learning_rate * new_tree(X)
5. Repeat steps 2-4 for a specified number of trees (n_estimators).
The learning rate (eta) controls how much each tree contributes. Smaller learning rate means each tree contributes less, requiring more trees but often producing better results. Typical values: 0.01 to 0.3.
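The steps above can be written out as a from-scratch sketch for regression, using small sklearn trees as the weak learners (a toy illustration, not a production implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)

learning_rate, n_trees = 0.1, 100
prediction = np.full_like(y, y.mean())            # step 1: start with the mean

for _ in range(n_trees):
    residuals = y - prediction                    # step 2: current errors
    tree = DecisionTreeRegressor(max_depth=2)     # step 3: small tree on residuals
    tree.fit(X, residuals)
    prediction += learning_rate * tree.predict(X) # step 4: scaled update

mse_start = np.mean((y - y.mean()) ** 2)
mse_final = np.mean((y - prediction) ** 2)
print(f"MSE of the mean-only model:  {mse_start:.4f}")
print(f"MSE after {n_trees} boosted trees: {mse_final:.4f}")
```

Each tree only needs to model what the ensemble so far still gets wrong, which is why very shallow trees suffice as base learners.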
3. XGBoost (eXtreme Gradient Boosting)
XGBoost is the most popular gradient boosting implementation. It brought several innovations:
- Regularization: L1 and L2 regularization on leaf weights prevents overfitting. Plain gradient boosting does not have this.
- Parallel processing: Parallelizes the tree building process at the feature level (not the tree level -- trees are still sequential).
- Handling missing values: Learns the optimal direction for missing values at each split during training. You do not need to impute missing data.
- Column subsampling: Like Random Forest, can use a random subset of features at each tree or split, reducing overfitting.
- Efficient split finding: Uses a weighted quantile sketch to find split points efficiently for large datasets.
4. LightGBM (Light Gradient Boosting Machine)
LightGBM by Microsoft is designed for speed and efficiency on large datasets.
- Histogram-based splitting: Instead of sorting all values to find the best split (O(n log n)), LightGBM bins continuous features into discrete histograms (O(n)). This is much faster, especially for large datasets.
- Leaf-wise tree growth: While XGBoost grows trees level-by-level (depth-wise), LightGBM always splits the leaf with the largest loss reduction. This produces deeper, unbalanced trees that reach the same accuracy with fewer leaves.
- GOSS (Gradient-based One-Side Sampling): Keeps all data points with large gradients (large errors) but only samples a fraction of small-gradient points. This speeds up training while maintaining accuracy.
- EFB (Exclusive Feature Bundling): Bundles mutually exclusive features (sparse features that are rarely non-zero simultaneously) to reduce dimensionality.
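The histogram-based splitting idea can be sketched in plain numpy (a toy version, not LightGBM's actual implementation): bin the feature once, then scan only the bin boundaries using per-bin counts and target sums.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 10_000)
y = (x > 6.5).astype(float) + rng.normal(0, 0.1, 10_000)  # true boundary at 6.5

# Bin the feature once into 32 buckets (LightGBM's default max_bin is 255).
n_bins = 32
edges = np.linspace(x.min(), x.max(), n_bins + 1)
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)

# Per-bin sufficient statistics: counts and target sums.
counts = np.bincount(bin_idx, minlength=n_bins)
sums = np.bincount(bin_idx, weights=y, minlength=n_bins)

# Scan only the 31 bin boundaries. For a constant prediction on each side,
# minimizing squared error equals maximizing sum^2/count on each side.
left_n = np.cumsum(counts)[:-1]
left_s = np.cumsum(sums)[:-1]
right_n = counts.sum() - left_n
right_s = sums.sum() - left_s
gains = left_s**2 / left_n + right_s**2 / right_n

best_edge = edges[np.argmax(gains) + 1]
print(f"Best split found near x = {best_edge:.2f} (true boundary: 6.5)")
```

Scanning 31 candidate boundaries instead of up to 10,000 unique values is where the speedup comes from; the per-bin statistics are computed in a single O(n) pass.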
5. CatBoost (Categorical Boosting)
CatBoost by Yandex specializes in handling categorical features.
- Native categorical handling: Uses ordered target encoding internally. No need for manual one-hot encoding or label encoding.
- Ordered boosting: Uses a permutation-driven approach that reduces prediction shift (a form of target leakage in gradient boosting).
- Fast inference: Symmetric (oblivious) trees make prediction very fast, often faster than XGBoost and LightGBM, which is ideal for real-time applications.
- Works well out of the box: Default hyperparameters are well-tuned, often requiring less tuning than XGBoost.
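The ordered target-statistics idea can be approximated in a few lines of pandas (a simplified sketch of the concept, not CatBoost's internals; CatBoost also averages over several permutations): each row's category is encoded using only the target values of earlier rows, so a row never sees its own label.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    'city': rng.choice(['Mumbai', 'Delhi', 'Chennai'], 12),
    'target': rng.integers(0, 2, 12),
})

prior = df['target'].mean()  # global mean used as a smoothing prior
df = df.iloc[rng.permutation(len(df))].reset_index(drop=True)  # random "time" order

grp = df.groupby('city')['target']
prev_sum = grp.cumsum() - df['target']  # sum of earlier same-category targets only
prev_cnt = grp.cumcount()               # number of earlier same-category rows
df['city_encoded'] = (prev_sum + prior) / (prev_cnt + 1)

print(df[['city', 'target', 'city_encoded']])
```

The first occurrence of each category gets the prior; later occurrences converge toward that category's running target mean, without ever leaking the current row's label.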
6. Comparing the Three
| Aspect | XGBoost | LightGBM | CatBoost |
|---|---|---|---|
| Training speed | Medium | Fastest | Slow (but fast inference) |
| Categorical features | Requires encoding | Native support (via categorical_feature) | Native handling, best-in-class |
| Missing values | Native handling | Native handling | Native handling |
| Tree growth | Depth-wise (level) | Leaf-wise | Symmetric (balanced) |
| Best for | General purpose | Large datasets, speed | Datasets with many categoricals |
| Overfitting risk | Medium | Higher (deep trees) | Lower (ordered boosting) |
7. Key Hyperparameters for XGBoost
- n_estimators (100-10000): Number of boosting rounds (trees). More trees = more capacity but slower and risk of overfitting. Use early stopping.
- max_depth (3-10): Maximum depth of each tree. Deeper trees capture more complex patterns but overfit more. Default: 6.
- learning_rate / eta (0.01-0.3): Step size at each boosting round. Smaller = more robust but needs more trees.
- subsample (0.5-1.0): Fraction of training samples used per tree. Values < 1 add randomness and reduce overfitting.
- colsample_bytree (0.3-1.0): Fraction of features used per tree. Like Random Forest's max_features.
- reg_alpha (0-10): L1 regularization on leaf weights. Encourages sparsity.
- reg_lambda (0-10): L2 regularization on leaf weights. Prevents large leaf values.
- min_child_weight (1-10): Minimum sum of instance weights in a leaf. Higher values prevent the tree from learning overly specific patterns.
Code Examples
Basic XGBoost Classification
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42)
# Train XGBoost
xgb = XGBClassifier(
n_estimators=200,
max_depth=5,
learning_rate=0.1,
subsample=0.8,
colsample_bytree=0.8,
reg_lambda=1.0,
random_state=42,
eval_metric='logloss'
)
xgb.fit(X_train, y_train)
# Evaluate
y_pred = xgb.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))
# Feature importance (top 5)
importances = xgb.feature_importances_
top5 = np.argsort(importances)[-5:][::-1]
print("Top 5 features:")
for idx in top5:
    print(f"  {data.feature_names[idx]}: {importances[idx]:.4f}")

Early Stopping with XGBoost
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Load and split data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42)
# Train with early stopping
xgb = XGBClassifier(
n_estimators=1000, # Set high, early stopping will find the right number
max_depth=5,
learning_rate=0.05, # Small learning rate with many trees
subsample=0.8,
colsample_bytree=0.8,
random_state=42,
eval_metric='logloss',
early_stopping_rounds=20 # Stop if no improvement for 20 rounds
)
xgb.fit(
X_train, y_train,
eval_set=[(X_test, y_test)],
verbose=False
)
print(f"Best iteration: {xgb.best_iteration}")
print(f"Best score: {xgb.best_score:.4f}")
print(f"Trees used: {xgb.best_iteration + 1} out of 1000")
print(f"Accuracy: {xgb.score(X_test, y_test):.4f}")
print(f"\nEarly stopping skipped {1000 - xgb.best_iteration - 1} unnecessary trees.")

LightGBM on a Larger Dataset
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
import time
# Generate a larger dataset
X, y = make_classification(n_samples=50000, n_features=50,
n_informative=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42)
# LightGBM with sklearn API
start = time.time()
lgbm = lgb.LGBMClassifier(
n_estimators=500,
max_depth=-1, # No limit (leaf-wise growth controls depth)
num_leaves=31, # Controls tree complexity
learning_rate=0.05,
subsample=0.8,
colsample_bytree=0.8,
min_child_samples=20,
random_state=42,
verbose=-1
)
lgbm.fit(
X_train, y_train,
eval_set=[(X_test, y_test)],
callbacks=[lgb.early_stopping(20), lgb.log_evaluation(0)]
)
train_time = time.time() - start
y_pred = lgbm.predict(X_test)
y_prob = lgbm.predict_proba(X_test)[:, 1]
print(f"Training time: {train_time:.2f}s")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_prob):.4f}")
print(f"Best iteration: {lgbm.best_iteration_}")
print(f"Number of leaves per tree: {lgbm.get_params()['num_leaves']}")

CatBoost with Categorical Features
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Create dataset with categorical features
np.random.seed(42)
n = 1000
df = pd.DataFrame({
'age': np.random.randint(18, 65, n),
'income': np.random.normal(50000, 15000, n).astype(int),
'city': np.random.choice(['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Kolkata'], n),
'education': np.random.choice(['High School', 'Bachelors', 'Masters', 'PhD'], n),
'employment': np.random.choice(['Salaried', 'Self-Employed', 'Freelancer'], n),
'loan_approved': np.random.randint(0, 2, n)
})
# Identify categorical columns
cat_features = ['city', 'education', 'employment']
cat_indices = [df.columns.get_loc(c) for c in cat_features]
X = df.drop('loan_approved', axis=1)
y = df['loan_approved']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# CatBoost handles categorical features natively!
model = CatBoostClassifier(
iterations=500,
depth=6,
learning_rate=0.05,
cat_features=cat_indices, # Tell CatBoost which columns are categorical
verbose=0,
random_state=42
)
model.fit(X_train, y_train, eval_set=(X_test, y_test), early_stopping_rounds=30)
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.4f}")
print(f"Best iteration: {model.get_best_iteration()}")
print(f"\nFeature importance:")
for name, imp in sorted(zip(X.columns, model.feature_importances_), key=lambda x: -x[1]):
    print(f"  {name}: {imp:.2f}")
print(f"\nNote: No encoding needed for {cat_features}!")

Comparing XGBoost, LightGBM, and CatBoost
import numpy as np
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from xgboost import XGBClassifier
import lightgbm as lgb
from catboost import CatBoostClassifier
# Generate dataset
X, y = make_classification(n_samples=10000, n_features=30,
n_informative=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
models = {
'XGBoost': XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1,
random_state=42, eval_metric='logloss'),
'LightGBM': lgb.LGBMClassifier(n_estimators=200, max_depth=-1, num_leaves=31,
learning_rate=0.1, random_state=42, verbose=-1),
'CatBoost': CatBoostClassifier(iterations=200, depth=6, learning_rate=0.1,
random_state=42, verbose=0)
}
print(f"{'Model':<12} {'Accuracy':<12} {'CV Mean':<12} {'CV Std':<10} {'Time (s)':<10}")
print('-' * 56)
for name, model in models.items():
    start = time.time()
    model.fit(X_train, y_train)
    train_time = time.time() - start
    acc = model.score(X_test, y_test)
    cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    print(f"{name:<12} {acc:<12.4f} {cv_scores.mean():<12.4f} {cv_scores.std():<10.4f} {train_time:<10.2f}")
print("\nAll three achieve similar accuracy on this dataset.")
print("LightGBM is typically fastest. CatBoost shines with categoricals.")

Hyperparameter Tuning with Grid Search
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score
# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42)
# Define parameter grid
param_grid = {
'max_depth': [3, 5, 7],
'learning_rate': [0.01, 0.05, 0.1],
'n_estimators': [100, 200, 300],
'subsample': [0.8, 1.0],
'colsample_bytree': [0.8, 1.0]
}
# Grid search with cross-validation
xgb = XGBClassifier(random_state=42, eval_metric='logloss')
grid = GridSearchCV(xgb, param_grid, cv=5, scoring='accuracy',
verbose=0, n_jobs=-1)
grid.fit(X_train, y_train)
print(f"Best parameters:")
for param, value in grid.best_params_.items():
print(f" {param}: {value}")
print(f"\nBest CV accuracy: {grid.best_score_:.4f}")
print(f"Test accuracy: {grid.score(X_test, y_test):.4f}")
# Top 3 configurations
results = grid.cv_results_
top3 = np.argsort(results['rank_test_score'])[:3]
print("\nTop 3 configurations:")
for idx in top3:
    print(f"  Rank {results['rank_test_score'][idx]}: "
          f"accuracy={results['mean_test_score'][idx]:.4f}, "
          f"params={results['params'][idx]}")

End-to-End Example: Churn Prediction
import numpy as np
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, classification_report,
roc_auc_score, confusion_matrix)
from sklearn.preprocessing import LabelEncoder
# Simulate telecom churn dataset
np.random.seed(42)
n = 2000
df = pd.DataFrame({
'tenure_months': np.random.randint(1, 72, n),
'monthly_charges': np.random.normal(65, 20, n).round(2),
'total_charges': np.random.normal(3000, 1500, n).round(2),
'contract_type': np.random.choice(['Month-to-month', 'One year', 'Two year'], n,
p=[0.5, 0.3, 0.2]),
'internet_service': np.random.choice(['DSL', 'Fiber optic', 'No'], n),
'online_security': np.random.choice(['Yes', 'No'], n),
'tech_support': np.random.choice(['Yes', 'No'], n),
'num_support_tickets': np.random.poisson(2, n),
})
# Generate target with realistic patterns
churn_prob = (
0.3 * (df['contract_type'] == 'Month-to-month').astype(float) +
0.2 * (df['tenure_months'] < 12).astype(float) +
0.15 * (df['num_support_tickets'] > 3).astype(float) +
0.1 * (df['online_security'] == 'No').astype(float) +
np.random.normal(0, 0.1, n)
).clip(0, 1)
df['churned'] = (np.random.random(n) < churn_prob).astype(int)
# Feature Engineering
df['avg_monthly_charge'] = df['total_charges'] / (df['tenure_months'] + 1)
df['charge_increase'] = df['monthly_charges'] - df['avg_monthly_charge']
df['is_new_customer'] = (df['tenure_months'] <= 6).astype(int)
df['high_support'] = (df['num_support_tickets'] > 3).astype(int)
# Encode categoricals
le_cols = ['contract_type', 'internet_service', 'online_security', 'tech_support']
for col in le_cols:
    df[col] = LabelEncoder().fit_transform(df[col])
print(f"Dataset shape: {df.shape}")
print(f"Churn rate: {df['churned'].mean():.2%}")
# Split
X = df.drop('churned', axis=1)
y = df['churned']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
# Train XGBoost with early stopping
xgb = XGBClassifier(
n_estimators=500,
max_depth=5,
learning_rate=0.05,
subsample=0.8,
colsample_bytree=0.8,
reg_lambda=1.0,
scale_pos_weight=len(y_train[y_train==0]) / len(y_train[y_train==1]), # Handle imbalance
random_state=42,
eval_metric='auc',
early_stopping_rounds=20
)
xgb.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
# Evaluate
y_pred = xgb.predict(X_test)
y_prob = xgb.predict_proba(X_test)[:, 1]
print(f"\nResults:")
print(f" Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f" ROC-AUC: {roc_auc_score(y_test, y_prob):.4f}")
print(f" Trees used: {xgb.best_iteration + 1}")
# Confusion matrix
cm = confusion_matrix(y_test, y_pred)
print(f"\nConfusion Matrix:")
print(f" TN={cm[0,0]}, FP={cm[0,1]}")
print(f" FN={cm[1,0]}, TP={cm[1,1]}")
# Feature importance
print(f"\nFeature Importance (Top 8):")
imps = pd.Series(xgb.feature_importances_, index=X.columns).sort_values(ascending=False)
for feat, imp in imps.head(8).items():
    bar = '#' * int(imp * 40)
    print(f"  {feat:25s}: {imp:.4f} {bar}")

Common Mistakes
Not Using Early Stopping with Gradient Boosting
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42)
# WRONG: Fixed n_estimators without early stopping
xgb = XGBClassifier(n_estimators=5000, max_depth=10, learning_rate=0.01)
xgb.fit(X_train, y_train)  # Trains all 5000 trees, likely overfitting

# CORRECT: Let early stopping pick the number of trees
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42)
xgb = XGBClassifier(n_estimators=5000, max_depth=5, learning_rate=0.01,
eval_metric='logloss', early_stopping_rounds=30)
xgb.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
print(f"Optimal trees: {xgb.best_iteration + 1}")

Setting Learning Rate Too High with Many Trees
from xgboost import XGBClassifier
# WRONG: High learning rate + many trees = severe overfitting
xgb = XGBClassifier(n_estimators=1000, learning_rate=0.5, max_depth=10)
# Each tree contributes 50% of its prediction, making the model
# memorize training data quickly and overfit badly

from xgboost import XGBClassifier
# CORRECT: Lower learning rate requires more trees but generalizes better
xgb = XGBClassifier(
n_estimators=1000,
learning_rate=0.05, # Small learning rate
max_depth=5, # Moderate depth
early_stopping_rounds=30 # Let early stopping find the right number
)

Not Handling Class Imbalance in Boosting
from xgboost import XGBClassifier
import numpy as np
# Imbalanced dataset: 95% class 0, 5% class 1
np.random.seed(42)
y = np.array([0]*950 + [1]*50)
X = np.random.randn(1000, 10)
xgb = XGBClassifier(n_estimators=100, random_state=42)
xgb.fit(X, y)
print(f"Predicts all as class 0: {sum(xgb.predict(X) == 0)}/{len(X)}")

from xgboost import XGBClassifier
import numpy as np
np.random.seed(42)
y = np.array([0]*950 + [1]*50)
X = np.random.randn(1000, 10)
# Method 1: scale_pos_weight
xgb = XGBClassifier(
n_estimators=100,
scale_pos_weight=950/50, # Ratio of negative to positive
random_state=42
)
xgb.fit(X, y)
print(f"Predictions: class 0={sum(xgb.predict(X)==0)}, class 1={sum(xgb.predict(X)==1)}")

Using LightGBM max_depth Without Understanding Leaf-Wise Growth
import lightgbm as lgb
# WRONG: Setting max_depth without adjusting num_leaves
lgbm = lgb.LGBMClassifier(
max_depth=15,
num_leaves=31 # Default: 31 leaves, but max_depth=15 allows 2^15=32768 leaves
)
# num_leaves is the real constraint here, not max_depth

import lightgbm as lgb
# CORRECT: Control complexity primarily through num_leaves
lgbm = lgb.LGBMClassifier(
num_leaves=63, # Primary complexity control
max_depth=-1, # No limit (let num_leaves control)
min_child_samples=20, # Prevent overly specific leaves
learning_rate=0.05,
verbose=-1
)
# Rule of thumb: num_leaves should be less than 2^max_depth
# If max_depth=6, num_leaves should be <= 64

Manually Encoding Categoricals Before CatBoost
from catboost import CatBoostClassifier
from sklearn.preprocessing import LabelEncoder
import pandas as pd
df = pd.DataFrame({'city': ['Mumbai', 'Delhi', 'Mumbai', 'Chennai'],
'target': [1, 0, 1, 0]})
# WRONG: Manually encoding defeats the purpose of CatBoost
le = LabelEncoder()
df['city_encoded'] = le.fit_transform(df['city'])
model = CatBoostClassifier(verbose=0)
model.fit(df[['city_encoded']], df['target'])
# Label encoding creates an arbitrary ordinal relationship (Mumbai > Delhi > Chennai)

from catboost import CatBoostClassifier
import pandas as pd
df = pd.DataFrame({'city': ['Mumbai', 'Delhi', 'Mumbai', 'Chennai'],
'target': [1, 0, 1, 0]})
# CORRECT: Pass raw categorical columns to CatBoost
model = CatBoostClassifier(cat_features=[0], verbose=0)
model.fit(df[['city']], df['target'])  # Raw string values!

Summary
- Ensemble methods combine multiple models to produce better predictions. Bagging (Random Forest) reduces variance by averaging independent models. Boosting (XGBoost, LightGBM, CatBoost) reduces both bias and variance by training models sequentially to correct errors.
- Gradient boosting builds trees sequentially: each new tree predicts the residuals (errors) of the current ensemble. The learning rate controls how much each tree contributes. Smaller learning rate + more trees = better generalization.
- XGBoost introduced regularization (L1/L2 on leaf weights), parallel processing, native missing value handling, and column subsampling. It is the most widely used boosting implementation.
- LightGBM is optimized for speed: histogram-based splitting (faster than sorting), leaf-wise tree growth (more accurate per tree), GOSS (smart sampling), and EFB (feature bundling). Use for large datasets.
- CatBoost handles categorical features natively without manual encoding. Uses ordered boosting to reduce prediction shift. Works well out of the box with minimal tuning.
- Always use early stopping with gradient boosting. Set n_estimators high and let early_stopping_rounds find the optimal number of trees. This prevents overfitting and saves computation.
- Key XGBoost hyperparameters: learning_rate (0.01-0.3), max_depth (3-10), n_estimators (with early stopping), subsample (0.5-1.0), colsample_bytree (0.3-1.0), reg_lambda (L2), reg_alpha (L1).
- For class imbalance, use scale_pos_weight in XGBoost, is_unbalance in LightGBM, or auto_class_weights in CatBoost. This weights the minority class higher during training.
- Feature importance from boosting models shows which features drive predictions. Use it to understand your model, guide feature engineering, and communicate results to stakeholders.
- On tabular/structured data, gradient boosting (XGBoost/LightGBM/CatBoost) consistently outperforms or matches deep learning while being faster, more interpretable, and easier to deploy.