Case Importance#
Objectives: what you will take away#
Definitions and an understanding of the case importance metrics and the situations in which they can be retrieved.
Prerequisites: before you begin#
You’ve successfully installed Howso Engine
You have an understanding of Howso’s basic workflow.
Data#
Our example dataset for this recipe is the well-known Adult dataset. It is accessible via the pmlb package installed earlier. We use the fetch_data() function to retrieve the dataset in the complete code below.
Concepts & Terminology#
How-To Guide#
Case importance is similar to feature importance in that it comprises two metrics: case mean decrease in accuracy (MDA) and case contribution. Influential and similar cases describe the influence of cases on a single case or prediction, whereas case importance describes how important a case is to the overall predictions on a group of cases. Case importance shares its underlying methodology with Feature Importance. Unlike feature contributions, case contributions are calculated only locally. Conceptually, local metrics use either a specific subset of the cases that are trained into the Trainee or a set of new cases.
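To make the idea of a case-level decrease in accuracy concrete, the sketch below removes one training case at a time from a tiny nearest-neighbor model and measures how much accuracy on a small evaluation group drops. This is only a conceptual illustration under stated assumptions: it is not Howso Engine's algorithm, and the functions knn_predict, accuracy, and leave_one_out_case_mda, as well as the toy data, are hypothetical names introduced here.

```python
from collections import Counter


def knn_predict(train, query, k=1):
    """Majority vote among the k training cases nearest to query (squared Euclidean distance)."""
    nearest = sorted(
        train,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, evaluation, k=1):
    """Fraction of evaluation cases whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return hits / len(evaluation)


def leave_one_out_case_mda(train, evaluation, k=1):
    """For each training case, the drop in accuracy when that case is removed."""
    baseline = accuracy(train, evaluation, k)
    scores = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]
        scores.append(baseline - accuracy(reduced, evaluation, k))
    return scores


# Toy data (hypothetical): one numeric feature and a class label per case.
train = [((0.0,), 'a'), ((1.0,), 'a'), ((5.0,), 'b'), ((6.0,), 'b')]
evaluation = [((0.2,), 'a'), ((5.4,), 'b'), ((2.6,), 'a')]

scores = leave_one_out_case_mda(train, evaluation)
most_important = scores.index(max(scores))
```

In this toy setup, removing the case at index 1 is the only removal that changes a prediction, so it receives the largest score. A positive score means the group's predictions get worse without that case.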
Setup#
This user guide assumes you have created and set up a Trainee as demonstrated in basic workflow. The Trainee will be referenced as trainee in the sections below.
Case Contributions#
Case contributions can be retrieved by setting case_contributions_robust or case_contributions_full to True in the details dictionary passed to react.
details = {'case_contributions_robust': True}
Case MDA#
Case MDA can be retrieved by setting case_mda_robust or case_mda_full to True in the details dictionary passed to react.
details = {'case_mda_robust': True}
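If both metrics are of interest, the two flags can be collected in one dictionary. The helper below is a minimal sketch: build_case_importance_details is a hypothetical name, not part of Howso Engine, and the sketch assumes that both flags may be requested together in a single details dictionary. Only the four key names documented above are used.

```python
def build_case_importance_details(robust=True, contributions=True, mda=True):
    """Build a details dict from the documented case importance flags.

    Hypothetical helper: it only assembles key names; it does not call Howso Engine.
    """
    suffix = 'robust' if robust else 'full'
    details = {}
    if contributions:
        details[f'case_contributions_{suffix}'] = True
    if mda:
        details[f'case_mda_{suffix}'] = True
    return details


# Request both robust metrics (assumes they can be combined in one request).
details = build_case_importance_details(robust=True)
```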
React#
Since case importance is a local metric, cases or case indices must be provided as well as an action feature.
results = trainee.react(
    test_case[context_features],
    context_features=context_features,
    action_features=action_features,
    details=details,  # request the case importance metrics selected above
)
Results#
The metrics can be retrieved from the details section of the results.
# Each key matches the detail flag that was requested (robust variants shown here)
case_contributions = pd.DataFrame(results['details']['case_contributions_robust'][0])
case_mda = pd.DataFrame(results['details']['case_mda_robust'][0])
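Once the metrics are extracted, a common next step is to rank cases by the magnitude of their score. The sketch below shows one way to do that on a plain list of records. The record layout, the keys case_id and score, and the helper top_cases are all hypothetical and are used only for illustration; inspect the actual columns returned by your Trainee and pass the appropriate key.

```python
def top_cases(records, value_key, n=3):
    """Return the n records with the largest absolute value stored under value_key."""
    return sorted(records, key=lambda r: abs(r[value_key]), reverse=True)[:n]


# Hypothetical records standing in for per-case importance output.
sample = [
    {'case_id': 0, 'score': 0.02},
    {'case_id': 1, 'score': -0.31},
    {'case_id': 2, 'score': 0.12},
]

ranked = top_cases(sample, value_key='score', n=2)
ranked_ids = [r['case_id'] for r in ranked]
```

Ranking by absolute value treats strongly negative and strongly positive scores as equally noteworthy. Sort on the raw value instead if only one direction matters.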
Complete Code#
The code from all of the steps in this guide is combined below:
import pandas as pd
from pmlb import fetch_data
from howso.engine import Trainee
from howso.utilities import infer_feature_attributes
# import data
df = fetch_data('adult')
# Subsample the data to ensure the example runs quickly
df = df.sample(1000)
# Split out the last row for a prediction set and drop the Action Feature
test_case = df.iloc[[-1]].copy()
df.drop(df.index[-1], inplace=True)
test_case = test_case.drop('target', axis=1)
features = infer_feature_attributes(df)
action_features = ['target']
context_features = features.get_names(without=action_features)
trainee = Trainee(features=features)
trainee.train(df)
trainee.analyze(context_features=context_features, action_features=action_features)
details = {'case_contributions_robust': True}
results = trainee.react(
    test_case[context_features],
    context_features=context_features,
    action_features=action_features,
    details=details,
)
case_contributions = pd.DataFrame(results['details']['case_contributions_robust'][0])