Feature Importance#

Objectives: what you will take away#

How to retrieve the different types of feature importance metrics across several categories: global vs. local, and robust vs. non-robust (full), for both Prediction Contributions and Accuracy Contributions.

Prerequisites: before you begin#

You have successfully installed Howso Engine.
You have an understanding of Howso’s basic workflow.

Notebook Recipe#

The following recipe supplements the content of this guide and covers some additional functionality:

Feature Importance

Concepts & Terminology#

The main concept this guide introduces is Feature Importance. To understand it, we recommend being familiar with the two metrics available for feature importance: feature Prediction Contributions and feature Accuracy Contributions (AC).

Robust vs Non-Robust (Full)#

Robust metrics are recommended: they use a greater variety of feature combinations, and they offer better calculation performance as the number of features increases.
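The naming pattern that separates the two variants can be sketched with a small helper. This is only an illustration, not part of Howso Engine: the function name `importance_details` is hypothetical, and the only library-specific strings it produces are the four detail keys named in this guide.

```python
from typing import Dict


def importance_details(robust: bool = True) -> Dict[str, bool]:
    """Build a details dictionary requesting both feature importance metrics.

    Hypothetical helper (not a Howso API); it only assembles the detail
    keys named in this guide.
    """
    variant = "robust" if robust else "full"
    return {
        f"feature_{variant}_prediction_contributions": True,
        f"feature_{variant}_accuracy_contributions": True,
    }


robust_details = importance_details()            # recommended variant
full_details = importance_details(robust=False)  # non-robust (full) variant
```

Keeping the variant in one place makes it harder to mix robust and full keys by accident when switching between them.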

How-To Guide#

Setup#

This guide assumes you have created and set up a Trainee as demonstrated in the basic workflow. The created Trainee is referenced as trainee in the sections below.

Global Feature Importance#

To get global feature importance metrics, call Trainee.react_aggregate() on a trained and analyzed Trainee. Trainee.react_aggregate() calls react internally on the cases already trained into the Trainee and calculates the metrics. The desired metrics are selected through the details parameter: each metric is named individually as a key, and setting it to True calculates and returns that metric. For example, feature_robust_accuracy_contributions and feature_robust_prediction_contributions calculate the robust versions of Accuracy Contributions and Prediction Contributions, while feature_full_accuracy_contributions and feature_full_prediction_contributions calculate the non-robust (full) versions.

An action feature must be specified. feature_influences_action_feature is recommended for feature influence metrics such as Prediction Contributions and Accuracy Contributions, especially when prediction stats are retrieved in the same call. action_feature can also be used, but it sets the action feature for both the influence metrics and the prediction stats. Since often only the influence metrics' action feature is intended to be set, feature_influences_action_feature is the more precise parameter.

feature_robust_prediction_contributions = trainee.react_aggregate(
    context_features=context_features,
    feature_influences_action_feature=action_features[0],
    details={'feature_robust_prediction_contributions': True}
)

feature_robust_accuracy_contributions = trainee.react_aggregate(
    context_features=context_features,
    feature_influences_action_feature=action_features[0],
    details={'feature_robust_accuracy_contributions': True}
)
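The choice between the two action feature parameters can be sketched as a keyword-argument builder. This is a minimal sketch under stated assumptions: `aggregate_kwargs` and its `influences_only` flag are hypothetical names introduced for illustration, the feature names are placeholders, and only the parameter names `context_features`, `feature_influences_action_feature`, `action_feature`, and `details` come from this guide.

```python
from typing import Any, Dict, List


def aggregate_kwargs(
    context_features: List[str],
    action_feature: str,
    details: Dict[str, bool],
    influences_only: bool = True,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a react_aggregate() call.

    Hypothetical helper, not a Howso API. When influences_only is True the
    action feature is scoped to the influence metrics; otherwise the broader
    action_feature parameter is used, which also applies to prediction stats.
    """
    action_key = (
        "feature_influences_action_feature" if influences_only else "action_feature"
    )
    return {
        "context_features": context_features,
        action_key: action_feature,
        "details": details,
    }


# Placeholder feature names, for illustration only.
kwargs = aggregate_kwargs(
    ["age", "education"],
    "target",
    {"feature_robust_prediction_contributions": True},
)
# With a trained and analyzed Trainee: trainee.react_aggregate(**kwargs)
```

Defaulting to the narrower parameter mirrors the recommendation above, so the broader one has to be chosen deliberately.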

Local Feature Importance#

To get local feature importance metrics, first call Trainee.react() on a trained and analyzed Trainee. The desired metrics, feature_robust_prediction_contributions and feature_robust_accuracy_contributions, are passed to the details parameter as key-value pairs in a dictionary. Each metric is named individually, and setting it to True calculates that metric. The metric names used here select the robust versions.

details = {
    'feature_robust_prediction_contributions': True,
    'feature_robust_accuracy_contributions': True,
}

results = trainee.react(
    df,
    context_features=context_features,
    action_features=action_features,
    details=details
)

The calculated metrics can be retrieved from the Trainee.react() output dictionary. They are stored under the details key, under the name of each metric. Whether these metrics are robust or non-robust (full) is determined when they are calculated in Trainee.react() in the previous step.

feature_robust_prediction_contributions = results['details']['feature_robust_prediction_contributions']
feature_robust_accuracy_contributions = results['details']['feature_robust_accuracy_contributions']
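The retrieval step can be wrapped so that a metric that was never requested fails with a readable message. This is a sketch only: `extract_metrics` is a hypothetical helper, not a Howso API, and it assumes nothing about the result beyond the dictionary-style access shown above. The stand-in result uses empty lists as placeholders, not real output.

```python
from typing import Any, Dict, Iterable, Mapping


def extract_metrics(
    results: Mapping[str, Any],
    metric_names: Iterable[str],
    key: str = "details",
) -> Dict[str, Any]:
    """Collect requested metrics from a react()-style result mapping.

    Hypothetical helper, not a Howso API. Raises KeyError with a readable
    message when a metric is missing from the result.
    """
    container = results[key]
    extracted = {}
    for name in metric_names:
        if name not in container:
            raise KeyError(f"{name!r} not found; was it set to True in details?")
        extracted[name] = container[name]
    return extracted


# Stand-in for a react() result; the empty lists are placeholders.
fake_results = {
    "details": {
        "feature_robust_prediction_contributions": [],
        "feature_robust_accuracy_contributions": [],
    }
}
metrics = extract_metrics(fake_results, ["feature_robust_prediction_contributions"])
```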

Warning

Accuracy Contributions and Prediction Contributions are also metrics for cases, not just features, so keep this in mind when reading other guides that use those terms.

Combined Code#

import pandas as pd
from pmlb import fetch_data

from howso.engine import Trainee
from howso.utilities import infer_feature_attributes

df = fetch_data('adult', local_cache_dir="data/adult")

# Subsample the data to ensure the example runs quickly
df = df.sample(1000, random_state=0).reset_index(drop=True)

# Hold out the last row as a test case and remove it from the training data
test_case = df.iloc[[-1]].copy()
df.drop(df.index[-1], inplace=True)

# Auto detect features
features = infer_feature_attributes(df)

# Specify Context and Action Features
action_features = ['target']
context_features = features.get_names(without=action_features)

# Create a new Trainee, specify features
trainee = Trainee(features=features)

# Train and analyze
trainee.train(df)
trainee.analyze()

feature_robust_prediction_contributions = trainee.react_aggregate(
    context_features=context_features,
    feature_influences_action_feature=action_features[0],
    details={"feature_robust_prediction_contributions": True}
)

feature_robust_accuracy_contributions = trainee.react_aggregate(
    context_features=context_features,
    feature_influences_action_feature=action_features[0],
    details={"feature_robust_accuracy_contributions": True}
)

details = {
    'feature_robust_prediction_contributions': True,
    'feature_robust_accuracy_contributions': True,
}

results = trainee.react(
    df,
    context_features=context_features,
    action_features=action_features,
    details=details
)

feature_robust_prediction_contributions = results['details']['feature_robust_prediction_contributions']
feature_robust_accuracy_contributions = results['details']['feature_robust_accuracy_contributions']