import pandas as pd
import numpy as np
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_white"
Exercise 10: Thresholds and Metrics
The goal of this exercise is to practice with metrics of classifier performance.
Getting started
We’ll start today with an interactive activity.
We’ll focus today on the one-group example; we’ll return to the part about blue vs orange groups later.
Explore
Spend about 10 minutes playing with the “Threshold Decision” demonstration. Discuss the following questions with your partner:
What are the “scores”? What real-life concept does this capture? (Do you know your score?)
Why might a bank want to use “score” to decide whether to grant a loan? (Why don’t banks grant all loan applications? Why do they ever grant loans?)
What sort of predictions are being made here? What constitutes a “correct” prediction?
Slowly sweep the threshold from 0 to 100. On scrap paper, sketch how Correct, True Positive Rate, and Positive Rate change as the threshold changes.
Implement
Now let’s implement those metrics ourselves to check our understanding.
We’ve made a dataset that approximately mimics the dataset from the article. Download people_onegroup.csv and put it in your data folder.
people = pd.read_csv("data/people_onegroup.csv")
people
| | score | repay |
|---|---|---|
| 0 | 37 | True |
| 1 | 39 | True |
| 2 | 41 | True |
| 3 | 43 | True |
| 4 | 44 | True |
| ... | ... | ... |
| 193 | 56 | False |
| 194 | 58 | False |
| 195 | 59 | False |
| 196 | 61 | False |
| 197 | 63 | False |
198 rows × 2 columns
people.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 198 entries, 0 to 197
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   score   198 non-null    int64
 1   repay   198 non-null    bool
dtypes: bool(1), int64(1)
memory usage: 1.9 KB
The article includes dotplots of the people. We can approximate that using a histogram.
px.histogram(people, x="score", color="repay", nbins=100)
Part 1: EDA
Part 1a: Counts
How many people repaid? What fraction of all the people is that?
Try doing this in two ways. First, use the usual grouping and counting pattern, but add a column for the fraction of people in each group. (Divide the size column by the total number of people in the dataset.)
| | repay | size | fraction |
|---|---|---|---|
| 0 | False | 99 | 0.5 |
| 1 | True | 99 | 0.5 |
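As a rough sketch of the grouping-and-counting pattern described above, here is how it might look on a tiny made-up dataframe. The name `toy` and its values are invented for illustration and are not the exercise data; `as_index=False` is one way to keep `repay` as a regular column.

```python
import pandas as pd

# Made-up data for illustration only (not people_onegroup.csv).
toy = pd.DataFrame({
    "score": [30, 40, 50, 50, 60, 70],
    "repay": [False, False, False, True, True, True],
})

# Count the rows in each group, then divide by the total number of rows.
counts = toy.groupby("repay", as_index=False).size()
counts["fraction"] = counts["size"] / len(toy)
print(counts)
```

Note that the count column has to be accessed as `counts["size"]`, because `counts.size` is a built-in dataframe attribute.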
Alternative way: think about what this does:
people['repay'].sum()
99
What if we changed .sum to .mean? (I call this the “sum-as-count pattern”.)
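To see why summing a boolean column counts things, it may help to try it on a short made-up Series first. The values below are invented for illustration; the key fact is that pandas treats True as 1 and False as 0 in arithmetic.

```python
import pandas as pd

# Made-up boolean Series for illustration.
repay = pd.Series([True, False, True, True])

# Summing adds 1 for each True, so the sum is a count of True values.
n_repaid = repay.sum()

# The mean is that count divided by the length, i.e. the fraction of True values.
frac_repaid = repay.mean()

print(n_repaid, frac_repaid)
```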
Part 1b: Mean scores
What was the mean score for people who repaid? What was the mean score for people who didn’t?
| | repay | score |
|---|---|---|
| 0 | False | 40.010101 |
| 1 | True | 60.000000 |
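One possible shape for a per-group mean, sketched on made-up data (the `toy` dataframe is an assumption for illustration, not the exercise dataset):

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "score": [30, 40, 50, 50, 60, 70],
    "repay": [False, False, False, True, True, True],
})

# Group by the outcome, then average the score within each group.
mean_scores = toy.groupby("repay", as_index=False)["score"].mean()
print(mean_scores)
```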
Part 2: Thresholds
- On the website, pick a threshold that results in all 4 colors being visible; it’s especially visible in the Positive Rate pie.
- Assign that threshold to a variable in your qmd.
- Add a new column, granted, to the people dataframe that indicates if the bank grants the loan. Like the website, grant a loan if the score exceeds the threshold.
- How many loans were granted?
Here’s an example for a threshold of 64.
| | score | repay | granted |
|---|---|---|---|
| 0 | 37 | True | False |
| 1 | 39 | True | False |
| 2 | 41 | True | False |
| 3 | 43 | True | False |
| 4 | 44 | True | False |
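The thresholding step can be sketched as a single vectorized comparison. The dataframe `toy` and the threshold of 55 below are made up for illustration; the strict `>` follows the "score exceeds the threshold" wording above.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "score": [30, 40, 50, 50, 60, 70],
    "repay": [False, False, False, True, True, True],
})

threshold = 55  # arbitrary value chosen for this sketch

# Comparing a column to a number gives a boolean column, one value per row.
toy["granted"] = toy["score"] > threshold

# Sum-as-count: each True contributes 1.
n_granted = toy["granted"].sum()
print(n_granted)
```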
Part 3: Metrics
We’ll start by making a confusion matrix for the classifier. You’ll find the following helpful (again, I’m showing the result for a threshold of 64):
crosstab = pd.crosstab(people['granted'], people['repay'])
crosstab
| repay | False | True |
|---|---|---|
| granted | | |
| False | 99 | 67 |
| True | 0 | 32 |
Now pull out which is which:
(a, b), (c, d) = crosstab.values
print(f"a: {a}, b: {b}, c: {c}, d: {d}")
a: 99, b: 67, c: 0, d: 32
In your document, replace a, b, c, and d with names like false_positive and true_negative (or fp and tn).
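As a sketch of how the renaming might go, the block below rebuilds the 2×2 array from the threshold-64 counts shown above rather than from the dataframe. It assumes that "positive" means the loan was granted, and that the rows are granted (False, True) and the columns are repay (False, True), as in the crosstab above.

```python
import numpy as np

# Counts copied from the threshold-64 crosstab shown above.
# Rows: granted = False, True. Columns: repay = False, True.
values = np.array([[99, 67],
                   [0, 32]])

# Assumption: "positive" means the loan was granted.
# Row 0 (not granted): true negatives, then false negatives.
# Row 1 (granted): false positives, then true positives.
(tn, fn), (fp, tp) = values
print(f"tn: {tn}, fn: {fn}, fp: {fp}, tp: {tp}")
```

This unpacking only works when both rows and both columns are present; a threshold that grants every loan or no loans would produce a crosstab with a single row.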
- Compute and show the Positive Rate, the True Positive Rate, and Correctness. Check all of your results against the webapp; the numbers should be close although they may differ slightly because the data was not constructed identically (and because the webapp doesn’t show the full precision of the threshold or the metrics).
Positive Rate: 0.162
True Positive Rate: 0.323
Correctness: 0.662
- Compute and show the precision and recall of this classifier at the threshold you’ve selected.
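The metrics in the list above can be sketched from the four confusion-matrix counts. The counts below are copied from the threshold-64 example, and the definitions are the standard ones, assuming "positive" means the loan was granted; check them against how the webapp describes each number.

```python
# Counts from the threshold-64 example above (positive = loan granted).
tn, fn, fp, tp = 99, 67, 0, 32
total = tn + fn + fp + tp

# Fraction of all applicants who were granted a loan.
positive_rate = (tp + fp) / total

# Of the people who would repay, the fraction who were granted a loan.
true_positive_rate = tp / (tp + fn)

# Fraction of all decisions that matched the outcome.
correctness = (tp + tn) / total

# Of the loans granted, the fraction that were repaid.
precision = tp / (tp + fp)

# Recall is another name for the true positive rate.
recall = true_positive_rate

print(f"Positive Rate: {positive_rate:.3f}")
print(f"True Positive Rate: {true_positive_rate:.3f}")
print(f"Correctness: {correctness:.3f}")
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}")
```

Precision is undefined when no loans are granted (tp + fp is 0), so a very high threshold needs special handling.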
Part 4: Trade-offs
Adjust the threshold to maximize the Correct rate. What is the True Positive Rate then?
What trade-off do we have to make if we want to maximize True Positive Rate instead?
Give specific examples of thresholds that achieve these objectives.
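One way to explore the trade-off systematically is to compute the metrics over a grid of thresholds and compare them side by side. This is a sketch on made-up data; `toy`, the step size of 10, and the column names in `sweep` are all assumptions for illustration.

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    "score": [30, 40, 50, 50, 60, 70],
    "repay": [False, False, False, True, True, True],
})

rows = []
for threshold in range(0, 101, 10):
    granted = toy["score"] > threshold
    # Correct: the decision matches the outcome.
    correct = (granted == toy["repay"]).mean()
    # True positive rate: granted loans among the people who would repay.
    tpr = (granted & toy["repay"]).sum() / toy["repay"].sum()
    rows.append({"threshold": threshold, "correct": correct, "tpr": tpr})

sweep = pd.DataFrame(rows)

# idxmax returns the first row that reaches the maximum.
best = sweep.loc[sweep["correct"].idxmax()]
print(sweep)
print(best)
```

In this toy data the lowest thresholds grant every loan, so the true positive rate is 1 while only half the decisions are correct, which is one face of the trade-off the questions above ask about.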