```{python}
import pandas as pd
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.datasets import fetch_openml, load_breast_cancer
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, LeaveOneOut, cross_val_score, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.metrics import accuracy_score
import plotly.express as px
```
Exercise 12: Logistic Regression, Regularization, and Cross-Validation
In this exercise, we’ll practice classification using a logistic regression model. Since this model is prone to overfit, we’ll learn how to tame its excesses with regularization, and estimate how well it will generalize to new data using cross-validation.
Completing this exercise will help you be able to:
- Fit a logistic regression model to a dataset using scikit-learn
- Use regularization to prevent overfitting
- Use cross-validation to estimate how well a model will generalize to new data
Along the way, we’ll also see a technique for handling missing data, and get more practice with how to build pipelines in scikit-learn.
Code will be provided for this exercise. You should retype the code on your own and try to understand what each part does. There will also be some conceptual questions for you to answer.
Getting Started
We’ll use a dataset of who survived the Titanic shipwreck. To help make the point of this exercise more clear, we’ll also augment the dataset with some irrelevant (noise) features. We’ll keep the original goal, though: to predict whether a passenger survived or not.1
This task is a classic in machine learning, but I’ve never seen anyone mention how fake this prediction task is! There is no real-world scenario where you get to see whether some passengers survived and some didn’t, and then you have to predict whether other passengers survived or not. When you’re doing projects, please think about whether the task you’re trying to solve is (a) a task that anyone would actually care about and (b) a task that’s doable using data that you’d actually have access to.
Here are the imports you’ll need.
We’ll load the data from a site called OpenML. OpenML is a repository of datasets that are useful for machine learning. Specifically we’ll load the Titanic dataset. This dataset contains information about passengers on the Titanic, including whether they survived or not. The Kaggle page for this dataset has more information about the features.
```{python}
titanic_all = fetch_openml(
data_id=40945, as_frame=True, parser="pandas"
)
titanic = titanic_all.frame
```
Now we’ll add some noise features. Just copy-paste this code; don’t worry about how it works.
= np.random.RandomState(0)
rng = titanic.shape[0]
n_samples = 500
n_features = [f"noise_{i}" for i in range(n_features)]
noise_feature_names = pd.DataFrame(
noise_features =(n_samples, n_features)),
rng.normal(size=noise_feature_names
columns
)= pd.concat([titanic, noise_features], axis=1)
titanic titanic.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Columns: 514 entries, pclass to noise_499
dtypes: category(3), float64(503), int64(3), object(5)
memory usage: 5.1+ MB
We’ll use two numeric features (in addition to the noise features):
age
: the age of the passengerfare
: the fare paid by the passenger
And three categorical features:
embarked
: the port where the passenger embarked (C
= Cherbourg,Q
= Queenstown,S
= Southampton)sex
pclass
: the passenger class (1
= first class,2
= second class,3
= third class)
Note that pclass
is encoded as a number, but it’s really a categorical feature. We’ll treat it as categorical.
```{python}
# Select the features and target
target_column = "survived"
numeric_features = ["age", "fare"] + noise_feature_names
categorical_features = ["embarked", "sex", "pclass"]
feature_columns = numeric_features + categorical_features
```
Let’s hold out a test set. We’ll use a large test set so that our estimates of generalization error have less variance.
```{python}
titanic_train, titanic_test = train_test_split(
titanic, test_size=.5, random_state=0
)
print(f"Training set size: {titanic_train.shape[0]}")
print(f"Test set size: {titanic_test.shape[0]}")
```
Training set size: 654
Test set size: 655
The numeric features will need some preprocessing. We’ll first scale them to be zero mean and unit variance, and then impute missing values. Note: at this step we’re just defining what processing will happen; the actual processing will be done later when we fit the pipeline.
```{python}
numeric_transformer = make_pipeline(
StandardScaler(),
SimpleImputer(strategy="median")
)
```
Pause to think:
- What is
strategy="median"
doing? What alternative strategies could we use for imputing missing values? Do you think the median would be a good guess for the missing values? - Why do we need to impute missing values after scaling the features? What would happen if we did it the other way around?
The categorical features need to be one-hot-encoded.
```{python}
categorical_transformer = make_pipeline(
SimpleImputer(strategy="most_frequent"),
OneHotEncoder(handle_unknown="ignore", sparse_output=False)
)
```
Pause to think: How many columns will the categorical transformer produce? What will each column represent? Write a brief explanation.
We can combine the numeric and categorical transformers into a single transformer that will apply each one to the appropriate columns.
```{python}
preprocessor = make_column_transformer(
(numeric_transformer, numeric_features),
(categorical_transformer, categorical_features)
).set_output(transform='pandas')
```
Let’s show an example of what this preprocessor will do to our data. Check your results from above.
= preprocessor.fit_transform(titanic_train[feature_columns])
example_preprocessed_data
example_preprocessed_data.info() example_preprocessed_data.head()
<class 'pandas.core.frame.DataFrame'>
Index: 654 entries, 293 to 684
Columns: 510 entries, pipeline-1__age to pipeline-2__pclass_3
dtypes: float64(510)
memory usage: 2.5 MB
pipeline-1__age | pipeline-1__fare | pipeline-1__noise_0 | pipeline-1__noise_1 | pipeline-1__noise_2 | pipeline-1__noise_3 | pipeline-1__noise_4 | pipeline-1__noise_5 | pipeline-1__noise_6 | pipeline-1__noise_7 | ... | pipeline-1__noise_498 | pipeline-1__noise_499 | pipeline-2__embarked_C | pipeline-2__embarked_Q | pipeline-2__embarked_S | pipeline-2__sex_female | pipeline-2__sex_male | pipeline-2__pclass_1 | pipeline-2__pclass_2 | pipeline-2__pclass_3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
293 | -0.140868 | 0.405374 | 0.534576 | -0.178911 | 1.530273 | 0.527998 | -1.732251 | 0.694583 | 0.769149 | -1.213062 | ... | 0.115889 | -0.166542 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
76 | 0.649024 | 1.070188 | -0.250568 | -0.502139 | 0.217859 | -1.081952 | -1.166470 | 0.074495 | 0.160803 | -1.291101 | ... | -1.461193 | -0.095610 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
48 | 1.654342 | -0.118530 | -0.283955 | 1.668724 | -0.704216 | -1.267655 | 0.303582 | -0.202661 | 0.905217 | -0.310839 | ... | 0.240206 | 0.980616 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 |
689 | -0.571718 | -0.549441 | -0.739937 | -0.022050 | -1.425322 | 2.253745 | 0.053555 | 0.068434 | -0.809622 | 1.080123 | ... | -1.503977 | -0.938596 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
1195 | -0.140868 | -0.538772 | 0.097809 | -0.949562 | 0.559721 | 0.118893 | -0.004897 | 0.996909 | -0.111381 | -0.672056 | ... | -1.530474 | 2.113191 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 |
5 rows × 510 columns
Notice that there are no missing values and that there are separate columns for each of the possible values of the categorical variables.
Now we can put all of those elements together to train a basic logistic regression model.
```{python}
model = make_pipeline(
preprocessor,
LogisticRegression(penalty=None)
)
```
Your turn: fit and evaluate this model (using model.fit(...)
). What is the accuracy on the training set? What is the accuracy on the test set?
Accuracy on training set: 1.000
Accuracy on test set: 0.652
What do you notice about the accuracy of the model on the training set vs the test set?
Regularization
The model was able to get a perfect score on the training set. But to do that, it used some of the noise features, which clearly will not generalize to new data. We can see this by looking at the coefficients of the model:
def get_coefs_df(model, feature_names):
= model.named_steps['logisticregression']
logreg assert logreg.coef_.shape[0] == 1
assert logreg.coef_.shape[1] == len(feature_names)
= pd.DataFrame(dict(
coefs_df =feature_names,
name=logreg.coef_[0]
coef
))'abs_coef'] = coefs_df['coef'].abs()
coefs_df[return coefs_df.sort_values('abs_coef', ascending=False)
= get_coefs_df(model, example_preprocessed_data.columns)
coefs_df 10) coefs_df.head(
name | coef | abs_coef | |
---|---|---|---|
506 | pipeline-2__sex_male | -12.828261 | 12.828261 |
505 | pipeline-2__sex_female | 11.729937 | 11.729937 |
509 | pipeline-2__pclass_3 | -7.393828 | 7.393828 |
1 | pipeline-1__fare | 5.932705 | 5.932705 |
381 | pipeline-1__noise_379 | -5.126387 | 5.126387 |
507 | pipeline-2__pclass_1 | 4.860676 | 4.860676 |
473 | pipeline-1__noise_471 | -4.829957 | 4.829957 |
104 | pipeline-1__noise_102 | 4.396529 | 4.396529 |
193 | pipeline-1__noise_191 | -4.325752 | 4.325752 |
484 | pipeline-1__noise_482 | -4.149559 | 4.149559 |
'is_noise'] = coefs_df['name'].str.contains('noise')
coefs_df[print("Total noise coefficients: {:.2f}".format(
'is_noise')['coef'].abs().sum())
coefs_df.query( )
Total noise coefficients: 617.16
If we could get the model to put less weight on the noise features, it might generalize better. But in a real example, we don’t know which features are noise. And some features may be partly noise, partly signal. So we can’t just remove the noise features.
Instead, we can use regularization to encourage the model to put less weight on any individual feature. The hope is that this will encourage the model to put more weight on the features that are signal, and less weight on the features that are noise.
```{python}
model = make_pipeline(
preprocessor,
LogisticRegression(C=1e-1, penalty='l1', solver='liblinear')
).fit(
X=titanic_train[feature_columns],
y=titanic_train[target_column]
)
print("Accuracy on training set:", model.score(
X=titanic_train[feature_columns],
y=titanic_train[target_column]
))
print("Accuracy on test set:", model.score(
X=titanic_test[feature_columns],
y=titanic_test[target_column]
))
```
Accuracy on training set: 0.8669724770642202
Accuracy on test set: 0.76793893129771
= get_coefs_df(model, example_preprocessed_data.columns)
coefs_df 10) coefs_df.head(
name | coef | abs_coef | |
---|---|---|---|
506 | pipeline-2__sex_male | -1.213080 | 1.213080 |
505 | pipeline-2__sex_female | 0.934112 | 0.934112 |
509 | pipeline-2__pclass_3 | -0.544729 | 0.544729 |
507 | pipeline-2__pclass_1 | 0.422453 | 0.422453 |
473 | pipeline-1__noise_471 | -0.216718 | 0.216718 |
193 | pipeline-1__noise_191 | -0.190734 | 0.190734 |
381 | pipeline-1__noise_379 | -0.167910 | 0.167910 |
389 | pipeline-1__noise_387 | -0.161352 | 0.161352 |
484 | pipeline-1__noise_482 | -0.152020 | 0.152020 |
296 | pipeline-1__noise_294 | -0.138615 | 0.138615 |
'is_noise'] = coefs_df['name'].str.contains('noise')
coefs_df[print("Total weight on noise coefficients: {:.2f}".format(
'is_noise')['coef'].abs().sum())
coefs_df.query( )
Total weight on noise coefficients: 5.27
Your turn: what do you notice about the model’s accuracy scores now? What do you notice about the coefficients?
Your turn: Try adjusting the value of C
in the model. Answer:
- How does the training accuracy change as
C
changes? - What value of
C
gives the best training accuracy? - How does the test accuracy change as
C
changes? - What value of
C
gives the best test accuracy? - How does the amount of weight that the model puts on the noise features change as
C
changes? - Does a high value of
C
seem to correspond to regularizing a lot or a little?
This plot might help you answer some of those questions:
# Make a plot of train and test accuracy vs C
= [1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]
C_values_to_try = []
train_accuracies = []
test_accuracies for C in C_values_to_try:
= make_pipeline(
model
preprocessor,=C, penalty='l1', solver='liblinear')
LogisticRegression(C
).fit(=titanic_train[feature_columns],
X=titanic_train[target_column]
y
)
train_accuracies.append(model.score(=titanic_train[feature_columns],
X=titanic_train[target_column]
y
))
test_accuracies.append(model.score(=titanic_test[feature_columns],
X=titanic_test[target_column]
y
))= pd.DataFrame(dict(
accuracies_df =C_values_to_try,
C=train_accuracies,
train_accuracy=test_accuracies
test_accuracy='C', var_name='dataset', value_name='accuracy') )).melt(id_vars
='C', y='accuracy', color='dataset', log_x=True) px.line(accuracies_df, x
Cross-Validation
We’ve seen that regularization can help prevent overfitting. But how do we know how much regularization to use? We could try different values of the regularization parameter C
and see which one gives the best accuracy on the test set. But that would be a waste of data. We’d be using the test set to choose the model, which would make it less reliable as an estimate of generalization error.
Instead, we can use cross-validation to estimate how well the model will generalize to new data. The idea is to hold out one of the data points, train the model on the rest, and then see how well it predicts the held-out data point. We can repeat this process for each data point, and then average the scores to get an estimate of how well the model will generalize.
Buuut… that requires training the model as many times as there are data points. That’s a lot of training! (I tried it and got tired of waiting.) So, instead of holding out one data point at a time, we can hold out a bunch of points at a time. These groups of points are called folds. We then test the model on each fold, using the other folds as the training set. We can then average the scores on each fold to get an estimate of how well the model will generalize.
Your turn: If we have 100 data points and we use 10-fold cross-validation, how many data points will be in each fold? How many times will we train a model? How big will the training set be each time? How big will the test set be each time?
To use cross-validation to find a good value of C
, we can use scikit-learn’s GridSearchCV
class. This class will try each value of C
in C_values_to_try
, and use cross-validation to estimate how well the model will generalize.
```{python}
C_values_to_try = [1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]
model = make_pipeline(
preprocessor,
LogisticRegression(C=C, penalty='l1', solver='liblinear')
)
search = GridSearchCV(
model,
param_grid=dict(
logisticregression__C=C_values_to_try
),
cv=5,
verbose=4
).fit(
X=titanic_train[feature_columns],
y=titanic_train[target_column]
)
```
Fitting 5 folds for each of 7 candidates, totalling 35 fits
[CV 1/5] END ......logisticregression__C=0.0001;, score=0.618 total time= 0.0s
[CV 2/5] END ......logisticregression__C=0.0001;, score=0.618 total time= 0.1s
[CV 3/5] END ......logisticregression__C=0.0001;, score=0.618 total time= 0.1s
[CV 4/5] END ......logisticregression__C=0.0001;, score=0.618 total time= 0.1s
[CV 5/5] END ......logisticregression__C=0.0001;, score=0.615 total time= 0.1s
[CV 1/5] END .......logisticregression__C=0.001;, score=0.618 total time= 0.1s
[CV 2/5] END .......logisticregression__C=0.001;, score=0.618 total time= 0.1s
[CV 3/5] END .......logisticregression__C=0.001;, score=0.618 total time= 0.1s
[CV 4/5] END .......logisticregression__C=0.001;, score=0.618 total time= 0.1s
[CV 5/5] END .......logisticregression__C=0.001;, score=0.615 total time= 0.1s
[CV 1/5] END ........logisticregression__C=0.01;, score=0.618 total time= 0.1s
[CV 2/5] END ........logisticregression__C=0.01;, score=0.618 total time= 0.1s
[CV 3/5] END ........logisticregression__C=0.01;, score=0.618 total time= 0.1s
[CV 4/5] END ........logisticregression__C=0.01;, score=0.618 total time= 0.1s
[CV 5/5] END ........logisticregression__C=0.01;, score=0.615 total time= 0.1s
[CV 1/5] END .........logisticregression__C=0.1;, score=0.748 total time= 0.1s
[CV 2/5] END .........logisticregression__C=0.1;, score=0.740 total time= 0.1s
[CV 3/5] END .........logisticregression__C=0.1;, score=0.771 total time= 0.1s
[CV 4/5] END .........logisticregression__C=0.1;, score=0.771 total time= 0.1s
[CV 5/5] END .........logisticregression__C=0.1;, score=0.738 total time= 0.1s
[CV 1/5] END ...........logisticregression__C=1;, score=0.626 total time= 0.1s
[CV 2/5] END ...........logisticregression__C=1;, score=0.687 total time= 0.1s
[CV 3/5] END ...........logisticregression__C=1;, score=0.679 total time= 0.1s
[CV 4/5] END ...........logisticregression__C=1;, score=0.649 total time= 0.1s
[CV 5/5] END ...........logisticregression__C=1;, score=0.623 total time= 0.1s
[CV 1/5] END ..........logisticregression__C=10;, score=0.611 total time= 0.1s
[CV 2/5] END ..........logisticregression__C=10;, score=0.679 total time= 0.1s
[CV 3/5] END ..........logisticregression__C=10;, score=0.679 total time= 0.1s
[CV 4/5] END ..........logisticregression__C=10;, score=0.626 total time= 0.1s
[CV 5/5] END ..........logisticregression__C=10;, score=0.631 total time= 0.1s
[CV 1/5] END .........logisticregression__C=100;, score=0.588 total time= 0.1s
[CV 2/5] END .........logisticregression__C=100;, score=0.695 total time= 0.1s
[CV 3/5] END .........logisticregression__C=100;, score=0.672 total time= 0.1s
[CV 4/5] END .........logisticregression__C=100;, score=0.626 total time= 0.1s
[CV 5/5] END .........logisticregression__C=100;, score=0.615 total time= 0.1s
Let’s see what the cross validation process estimated for each value of C
.
= pd.DataFrame(search.cv_results_)
cv_results cv_results
mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_logisticregression__C | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.050302 | 0.005591 | 0.013186 | 0.002130 | 0.0001 | {'logisticregression__C': 0.0001} | 0.618321 | 0.618321 | 0.618321 | 0.618321 | 0.615385 | 0.617733 | 0.001174 | 5 |
1 | 0.053057 | 0.000240 | 0.014282 | 0.001332 | 0.001 | {'logisticregression__C': 0.001} | 0.618321 | 0.618321 | 0.618321 | 0.618321 | 0.615385 | 0.617733 | 0.001174 | 5 |
2 | 0.053784 | 0.000100 | 0.013579 | 0.000115 | 0.01 | {'logisticregression__C': 0.01} | 0.618321 | 0.618321 | 0.618321 | 0.618321 | 0.615385 | 0.617733 | 0.001174 | 5 |
3 | 0.061919 | 0.001214 | 0.013345 | 0.000388 | 0.1 | {'logisticregression__C': 0.1} | 0.748092 | 0.740458 | 0.770992 | 0.770992 | 0.738462 | 0.753799 | 0.014402 | 1 |
4 | 0.101603 | 0.013110 | 0.009218 | 0.000028 | 1 | {'logisticregression__C': 1} | 0.625954 | 0.687023 | 0.679389 | 0.648855 | 0.623077 | 0.652860 | 0.026450 | 2 |
5 | 0.112736 | 0.007271 | 0.009206 | 0.000023 | 10 | {'logisticregression__C': 10} | 0.610687 | 0.679389 | 0.679389 | 0.625954 | 0.630769 | 0.645238 | 0.028662 | 3 |
6 | 0.096419 | 0.003384 | 0.009305 | 0.000087 | 100 | {'logisticregression__C': 100} | 0.587786 | 0.694656 | 0.671756 | 0.625954 | 0.615385 | 0.639107 | 0.038785 | 4 |
Let’s reshape that data frame to make it easier to plot.
= cv_results.melt(
cv_results_melted ='param_logisticregression__C',
id_vars=['split{}_test_score'.format(i) for i in range(5)],
value_vars='fold',
var_name='score'
value_name=dict(
).rename(columns='C'
param_logisticregression__C
)) cv_results_melted
C | fold | score | |
---|---|---|---|
0 | 0.0001 | split0_test_score | 0.618321 |
1 | 0.001 | split0_test_score | 0.618321 |
2 | 0.01 | split0_test_score | 0.618321 |
3 | 0.1 | split0_test_score | 0.748092 |
4 | 1 | split0_test_score | 0.625954 |
5 | 10 | split0_test_score | 0.610687 |
6 | 100 | split0_test_score | 0.587786 |
7 | 0.0001 | split1_test_score | 0.618321 |
8 | 0.001 | split1_test_score | 0.618321 |
9 | 0.01 | split1_test_score | 0.618321 |
10 | 0.1 | split1_test_score | 0.740458 |
11 | 1 | split1_test_score | 0.687023 |
12 | 10 | split1_test_score | 0.679389 |
13 | 100 | split1_test_score | 0.694656 |
14 | 0.0001 | split2_test_score | 0.618321 |
15 | 0.001 | split2_test_score | 0.618321 |
16 | 0.01 | split2_test_score | 0.618321 |
17 | 0.1 | split2_test_score | 0.770992 |
18 | 1 | split2_test_score | 0.679389 |
19 | 10 | split2_test_score | 0.679389 |
20 | 100 | split2_test_score | 0.671756 |
21 | 0.0001 | split3_test_score | 0.618321 |
22 | 0.001 | split3_test_score | 0.618321 |
23 | 0.01 | split3_test_score | 0.618321 |
24 | 0.1 | split3_test_score | 0.770992 |
25 | 1 | split3_test_score | 0.648855 |
26 | 10 | split3_test_score | 0.625954 |
27 | 100 | split3_test_score | 0.625954 |
28 | 0.0001 | split4_test_score | 0.615385 |
29 | 0.001 | split4_test_score | 0.615385 |
30 | 0.01 | split4_test_score | 0.615385 |
31 | 0.1 | split4_test_score | 0.738462 |
32 | 1 | split4_test_score | 0.623077 |
33 | 10 | split4_test_score | 0.630769 |
34 | 100 | split4_test_score | 0.615385 |
px.box(
cv_results_melted,="C", y="score", log_x=True,
x="Cross-validation scores", labels=dict(
title="Regularization strength (C)",
C="Accuracy"
score ))
It will then choose the value of C
that gives the best cross-validation score, and train a model using that value of C
on the entire training set. The grid-search process estimated that the best value of C
is:
search.best_params_
{'logisticregression__C': 0.1}
We can use that model to make predictions on the test set.
print("Accuracy on training set:", search.score(
=titanic_train[feature_columns],
X=titanic_train[target_column]
y
))print("Accuracy on test set:", search.score(
=titanic_test[feature_columns],
X=titanic_test[target_column]
y ))
Accuracy on training set: 0.8669724770642202
Accuracy on test set: 0.76793893129771
Your turn: How did that test set accuracy compare with the accuracy estimated by cross validation?
Gradient Boosting
Just to leave you one last good example: gradient boosting classifiers are often a good choice for many problems. Let’s try one on this dataset.
from sklearn.ensemble import HistGradientBoostingClassifier
Since gradient boosting models can handle missing data and don’t require scaling, let’s use an even simpler pipeline.
= make_pipeline(
model
make_column_transformer('passthrough', numeric_features),
(="ignore", sparse_output=False), categorical_features)
(OneHotEncoder(handle_unknown
),
HistGradientBoostingClassifier()
).fit(=titanic_train[feature_columns],
X=titanic_train[target_column]
y )
print("Accuracy on training set:", model.score(
=titanic_train[feature_columns],
X=titanic_train[target_column]
y ))
Accuracy on training set: 1.0
print("Accuracy on test set:", model.score(
=titanic_test[feature_columns],
X=titanic_test[target_column]
y ))
Accuracy on test set: 0.783206106870229
How did it do that? Read about gradient boosted trees in the sklearn docs!
Note that gradient boosting can also be used for regression tasks (HistGradientBoostingRegressor
).
Footnotes
This exercise is based in part on this example from the sklearn docs.↩︎