Predictions
Objectives: what you will take away
Definitions & an understanding of basic regression, classification, continuous vs. categorical/nominal Action Features, Trainee, react(), and react_aggregate().
How-To: perform a basic regression or classification analysis using the Howso Engine to predict the Highway MPG or Fuel Type based on vehicle Context Features.
Prerequisites: before you begin
You’ve successfully installed Howso Engine
You have an understanding of Howso’s basic workflow.
Data
Download the vehicles dataset: 23,606 vehicles from 1984 to 2022, including make, model, MPG, drive type, size, class, and fuel type.
Concepts & Terminology
Regression - Models the relationship between one or more Context Features and a continuous numeric Action Feature; in this guide, it predicts the Highway MPG of a vehicle from its physical characteristics and year manufactured.
Classification - Models the relationship between one or more Context Features and a categorical/nominal Action Feature; in this guide, it predicts the FuelType of a vehicle from its physical characteristics and year manufactured. With Howso Engine, the action feature may be left in string format and does not need to be converted to a numeric format.
Trainee and React - In this simple example, we will create a Trainee that we can use to react to new case data, such as a new car we might be looking to build.
Train and Analyze - To create a Trainee, we will first load the data, define the Feature Attributes of the data, and train the Trainee. A Trainee can be used for many tasks, but because we know exactly what we want to do, we will analyze it to improve its performance on that task by defining the specific set of Context Features we want to use to predict an Action Feature. The action feature in this example will be Highway MPG.
Evaluating the Trainee - To understand how accurate the Trainee is for our task, we can use the built-in Trainee.react_aggregate() method. Since we are not using a train-test split in this example, react_aggregate() performs a react() on each of the cases trained into the model using a leave-one-out approach, so each case is predicted from the other trained cases.
That method returns prediction stats that let us evaluate regression accuracy with statistics such as:
R-Squared - \(R^2\) represents how well the predictions fit the data; the closer to 1.0, the better the fit.
Mean Absolute Error (MAE) - the average absolute difference between actual and predicted values over the whole dataset, expressed in the units of the quantity being measured.
Root Mean Square Error (RMSE) - the square root of the mean of the squared errors over the whole dataset; like MAE, it is expressed in the units of the quantity being measured, but it penalizes large errors more heavily.
Or classification metrics, including those derived from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts:
Accuracy - Describes the model performance across all classes: the ratio of the number of correct predictions to the total number of predictions. - (TP+TN)/(TP+FP+FN+TN).
Precision - Describes what proportion of positive predictions were correct. - TP/(TP+FP).
Recall - Describes what proportion of actual positives were predicted correctly. - TP/(TP+FN).
Mean Absolute Error (MAE) - the average absolute error between the actual values and the predicted Categorical Action Probabilities (CAP) over the whole dataset. CAP is the prediction probability for each class of the action feature.
React to New Cases - Lastly, we will simply ask the Trainee to react() to new cases we present to it, giving us predictions of what the Highway MPG would be.
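To make the metric definitions above concrete, here is a minimal sketch that computes them by hand on small, made-up numbers. It is not how Howso computes its prediction stats: the helpers regression_metrics and classification_metrics are hypothetical, \(R^2\) is computed here as the coefficient of determination 1 - SS_res/SS_tot (one common definition), and the classification counts treat a single class as "positive".

```python
import math


def regression_metrics(actual, predicted):
    """MAE, RMSE, and R^2 (as 1 - SS_res / SS_tot) for paired numeric values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # sum of squared errors
    rmse = math.sqrt(ss_res / n)  # square root of the mean squared error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts for one 'positive' class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Made-up Highway MPG values: actual vs. predicted.
print(regression_metrics([30, 25, 40, 35], [29, 26, 38, 35]))
# Made-up counts, e.g. treating "Premium" as the positive fuel type.
print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

With these made-up values, MAE is 1.0 while RMSE is about 1.22: squaring gives the single 2 MPG miss more weight, which is why RMSE is never smaller than MAE. Precision is 40/45 (about 0.89) and recall is 40/50 (0.8).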
How-To Guide
We want to predict the Highway MPG and the Fuel Type of a new vehicle based on a Trainee we create from the vehicles dataset. In this guide, we show the code for Highway MPG prediction directly and include the code for Fuel Type as comments wherever it differs.
Step 1 - Load Libraries
import pandas as pd
import matplotlib.pyplot as plt
from howso.engine import Trainee
from howso.utilities import infer_feature_attributes
Step 2 - Load Data
Using a pandas DataFrame, load the vehicles dataset from the CSV file. We drop the Make and Model features because relying on them would be a bit like cheating: together they largely identify the specific vehicle. Make sure the data is what you expect: take a quick look at a few rows and use describe() to check that the summary statistics look reasonable.
df = pd.read_csv("./data/vehicle_predict.csv")
df = df.drop(['Make', 'Model'], axis=1)
df.describe()
Step 3 - Define Features
Howso can infer feature attributes from the data using infer_feature_attributes(), but it is a best practice to review them and adjust them where needed. In this tutorial, we proceed as if some features were not detected the way we want, so we will make the necessary adjustments.
Note
Howso automatically determines whether to perform a regression or a classification task from the feature attributes of the action feature you are trying to predict, specifically its feature type as set below. It is therefore very important to make sure the feature types are correct.
# Auto detect features
features = infer_feature_attributes(df)
# For Regression, we will set `HighwayMPG` feature type to continuous
features['HighwayMPG']['type'] = 'continuous'
# For Classification, we will set `FuelType` feature type to nominal
features['FuelType']['type'] = 'nominal'
# We will also set these context features to continuous
features['CityMPG']['type'] = 'continuous'
features['Year']['type'] = 'continuous'
features['PassengerVolume']['type'] = 'continuous'
features['LuggageVolume']['type'] = 'continuous'
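Because the action feature's type decides between regression and classification, a quick check before training can catch a mis-typed action feature. The sketch below assumes the feature attributes can be read like a nested dict, as the indexing above does; it uses a plain dict as a stand-in with illustrative types, and check_action_types is a hypothetical helper, not part of Howso.

```python
# Stand-in for the output of infer_feature_attributes(): only the 'type'
# entries are shown, and the types listed here are illustrative.
features = {
    "Year": {"type": "continuous"},
    "DriveType": {"type": "nominal"},
    "FuelType": {"type": "nominal"},
    "VehicleClass": {"type": "nominal"},
    "CityMPG": {"type": "continuous"},
    "HighwayMPG": {"type": "continuous"},
    "PassengerVolume": {"type": "continuous"},
    "LuggageVolume": {"type": "continuous"},
}

# The rule from this guide: the action feature's type selects the task.
TASK_FOR_TYPE = {"continuous": "regression", "nominal": "classification"}

# Expected types for the two action features used in this guide.
EXPECTED_ACTION_TYPES = {"HighwayMPG": "continuous", "FuelType": "nominal"}


def check_action_types(features, expected):
    """Return (name, actual_type, expected_type) for every mismatch."""
    mismatches = []
    for name, want in expected.items():
        got = features.get(name, {}).get("type")
        if got != want:
            mismatches.append((name, got, want))
    return mismatches


print(check_action_types(features, EXPECTED_ACTION_TYPES))  # []
for name in EXPECTED_ACTION_TYPES:
    print(name, "->", TASK_FOR_TYPE[features[name]["type"]])
```

An empty list means both action features carry the type that leads to the intended task.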
Step 4 - Create a Trainee and Train
Next, we create a Trainee and train() it on the data we loaded into the DataFrame from vehicle_predict.csv.
# Create a new Trainee, specify features
trainee = Trainee(features=features)
# Train trainee
trainee.train(df)
Step 5 - Analyze Trainee, Set Context & Action Features
We know the specific task we want our Trainee to react() to: predicting Highway MPG (the action feature) from the context features Year, DriveType, FuelType, CityMPG, PassengerVolume, LuggageVolume, and VehicleClass. We can use analyze() to improve the model's performance for this specific target.
action_features = ['HighwayMPG']
# Code for `FuelType` prediction
# action_features = ['FuelType']
context_features = features.get_names(without=action_features)
trainee.analyze(context_features=context_features, action_features=action_features)
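As used here, features.get_names(without=action_features) yields every feature name except the action features. The sketch below reproduces that idea with a plain list so you can see what changes when you switch to the Fuel Type prediction: HighwayMPG moves into the context features and FuelType moves out. The helper context_names is hypothetical, and the real method's ordering may differ.

```python
# Feature names remaining after Make and Model are dropped (per this guide).
feature_names = ["Year", "DriveType", "FuelType", "VehicleClass", "CityMPG",
                 "HighwayMPG", "PassengerVolume", "LuggageVolume"]


def context_names(names, action_features):
    """Every feature that is not an action feature, keeping the original order."""
    excluded = set(action_features)
    return [name for name in names if name not in excluded]


print(context_names(feature_names, ["HighwayMPG"]))  # regression contexts
print(context_names(feature_names, ["FuelType"]))  # classification contexts
```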
Step 6 - Generate Accuracy Metrics
Review the accuracy of the Trainee by using the built-in react_aggregate() method, which performs a react() on each of the cases trained into the model. Since this is a regression task, we can then evaluate accuracy with the returned R-Squared (\(R^2\)), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) metrics, as well as the Spearman rank correlation coefficient (spearman_coeff).
# Recommended metrics
stats = trainee.react_aggregate(
action_feature=action_features[0],
details={
'prediction_stats': True,
'selected_prediction_stats': ['rmse', 'spearman_coeff', 'r2', 'mae']
}
)
stats
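Howso's react_aggregate() internals are not shown in this guide. As a conceptual sketch only, the code below swaps in a plain k-nearest-neighbor average (Euclidean distance, with hypothetical helpers knn_predict and leave_one_out_mae) to show what "leave-one-out" means: every trained case is predicted from all the other cases, so no case helps predict itself. The data are made up.

```python
import math


def knn_predict(train_rows, train_targets, query, k=3):
    """Average the targets of the k training rows closest to `query` (Euclidean distance)."""
    ranked = sorted(
        (math.dist(row, query), target)
        for row, target in zip(train_rows, train_targets)
    )
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def leave_one_out_mae(rows, targets, k=3):
    """Predict each case from all *other* cases, then average the absolute errors."""
    errors = []
    for i in range(len(rows)):
        # Hold case i out so it cannot be used to predict itself.
        other_rows = rows[:i] + rows[i + 1:]
        other_targets = targets[:i] + targets[i + 1:]
        prediction = knn_predict(other_rows, other_targets, rows[i], k)
        errors.append(abs(targets[i] - prediction))
    return sum(errors) / len(errors)


# Made-up cases: one context feature (City MPG) and a Highway MPG target.
rows = [(21.0,), (22.0,), (18.0,), (30.0,), (25.0,)]
targets = [29.0, 30.0, 25.0, 38.0, 33.0]

print(leave_one_out_mae(rows, targets, k=1))  # 2.8
```

With k=1, each held-out car is predicted by its single nearest neighbor among the other four, giving absolute errors of 1, 1, 4, 5, and 3 MPG.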
Step 7 - Review Accuracy Metrics
The Trainee has a very good fit for predicting Highway MPG, with an \(R^2\) of 0.99, which suggests it should be effective at predicting Highway MPG for new cases.
rmse 1.20
spearman_coeff 0.96
r2 0.99
mae 0.72
Name: HighwayMPG, dtype: float64
Step 8 - React to New Case
We have a new vehicle for which we want to predict Highway MPG. The test case is a 2022 all-wheel-drive midsize car that uses premium fuel, has a PassengerVolume of 95 and a LuggageVolume of 23, and gets a City MPG of 21.
The Trainee can react() to this new case and make a prediction.
data = {
'Year': [2022],
'DriveType': ['All-Wheel Drive'],
'FuelType': ['Premium'],
'VehicleClass': ['Midsize Cars'],
'CityMPG': [21],
'PassengerVolume': [95],
'LuggageVolume': [23]
}
test_case = pd.DataFrame(data)
result = trainee.react(
test_case,
action_features=action_features,
context_features=context_features
)
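Before calling react(), it can help to confirm that the new case supplies every context feature. The hypothetical helper compare_case_to_contexts below does this in plain Python on the same column layout as the data dict, with the context lists written out by hand. It also shows that the test case as written is shaped for the Highway MPG task: if you switch the action feature to FuelType, HighwayMPG becomes a context the case does not provide, and FuelType becomes a column that is not a context.

```python
# Same column layout as the `data` dict used for the test case.
data = {
    "Year": [2022],
    "DriveType": ["All-Wheel Drive"],
    "FuelType": ["Premium"],
    "VehicleClass": ["Midsize Cars"],
    "CityMPG": [21],
    "PassengerVolume": [95],
    "LuggageVolume": [23],
}

regression_contexts = ["Year", "DriveType", "FuelType", "VehicleClass",
                       "CityMPG", "PassengerVolume", "LuggageVolume"]
classification_contexts = ["Year", "DriveType", "VehicleClass", "CityMPG",
                           "HighwayMPG", "PassengerVolume", "LuggageVolume"]


def compare_case_to_contexts(case_columns, context_features):
    """Return (missing, extra): contexts the case lacks, and case columns that are not contexts."""
    missing = [name for name in context_features if name not in case_columns]
    extra = [name for name in case_columns if name not in context_features]
    return missing, extra


print(compare_case_to_contexts(data, regression_contexts))  # ([], [])
print(compare_case_to_contexts(data, classification_contexts))  # (['HighwayMPG'], ['FuelType'])
```

For the Fuel Type prediction, you would likely want to add a HighwayMPG value to the test case so that it supplies every context feature.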
Note
The method Trainee.predict() can also be used for predictions instead of Trainee.react(). Trainee.predict() is a convenience function that omits the extra output when all you want is the prediction.
Step 9 - Review Prediction
The prediction shows a HighwayMPG of 29.
result['action']
HighwayMPG
29
Combined Code
import pandas as pd
import matplotlib.pyplot as plt
from howso.engine import Trainee
from howso.utilities import infer_feature_attributes
df = pd.read_csv("./data/vehicle_predict.csv")
df = df.drop(['Make', 'Model'], axis=1)
# Auto detect features
features = infer_feature_attributes(df)
# For Regression, we will set `HighwayMPG` feature type to continuous
features['HighwayMPG']['type'] = 'continuous'
# For Classification, we will set `FuelType` feature type to nominal
features['FuelType']['type'] = 'nominal'
# We will also set these context features to continuous
features['CityMPG']['type'] = 'continuous'
features['Year']['type'] = 'continuous'
features['PassengerVolume']['type'] = 'continuous'
features['LuggageVolume']['type'] = 'continuous'
# Create a new Trainee, specify features
trainee = Trainee(features=features)
# Train trainee
trainee.train(df)
action_features = ['HighwayMPG']
# Code for `FuelType` prediction
# action_features = ['FuelType']
context_features = features.get_names(without=action_features)
trainee.analyze(context_features=context_features, action_features=action_features)
# Recommended metrics
stats = trainee.react_aggregate(
action_feature=action_features[0],
details={
'prediction_stats': True,
'selected_prediction_stats': ['rmse', 'spearman_coeff', 'r2', 'mae']
}
)
print(stats)
data = {
'Year': [2022],
'DriveType': ['All-Wheel Drive'],
'FuelType': ['Premium'],
'VehicleClass': ['Midsize Cars'],
'CityMPG': [21],
'PassengerVolume': [95],
'LuggageVolume': [23]
}
test_case = pd.DataFrame(data)
result = trainee.react(
test_case,
action_features=action_features,
context_features=context_features
)
# Show the predicted action value(s)
print(result['action'])