FeatureAttributes#

class FeatureAttributes#

The mapping of attributes for a single feature.

type: str#

The type of the feature.

  • continuous: A continuous numeric value. (e.g. Temperature or humidity)

  • nominal: A number or string value with no ordering. (e.g. The name of a fruit)

  • ordinal: A nominal number value with ordering (e.g. rating scale, 1-5 stars). Howso assumes ordinals have equal intervals. In the star rating example, this means that the differences between the stars are the same, so a jump from 1 to 2 stars represents the same magnitude as a jump from 4 to 5 stars. If a different magnitude is desired, the feature should be preprocessed to a different scale (e.g. 1, 2, 4, 7, 8).
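The ordinal rescaling mentioned above can be done before training. The following is a minimal sketch in plain Python; the mapping values come from the example scale in the text, while the variable names and the sample ratings are hypothetical.

```python
# Hypothetical preprocessing step: map 1-5 star ratings onto an unevenly
# spaced scale so that the gaps between ratings carry different magnitudes.
star_to_scale = {1: 1, 2: 2, 3: 4, 4: 7, 5: 8}

# Sample ratings, invented for illustration only.
ratings = [5, 3, 1, 4]

# Replace each star rating with its rescaled value.
rescaled = [star_to_scale[r] for r in ratings]
```

The rescaled column would then be mapped as an ordinal feature in place of the raw star values.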

Discrete features are another type of data often used in statistics and ML. They generally, though not exclusively, can take on only certain values (often integers). Examples of discrete features include age (if only given in years) and the number of children a person has.

You can encode discrete features in Howso’s feature attributes dictionary by mapping them as continuous values with decimals of 0 (if they are integer count values) or as ordinals. The difference between these two mappings lies in the behavior when the feature is the action feature, that is, the feature being predicted. If the feature is mapped as continuous with 0 decimals, then the feature may be predicted as any integer. If a feature is mapped as ordinal, then, as with nominals, the predicted value only comes from the pool of existing values in the training data. If the feature is only used as a context feature, then there is no distinction between these two mapping options.
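As an illustration of the two mappings, here is a minimal sketch of a feature attributes dictionary written as a plain Python dict. The feature names are hypothetical; the attribute keys (type, decimal_places) are the ones documented on this page.

```python
# Hypothetical feature names; attribute keys follow this page.
features = {
    # Integer count mapped as continuous with 0 decimal places:
    # when predicted, any integer may be produced.
    "num_children": {"type": "continuous", "decimal_places": 0},
    # The same kind of data mapped as ordinal: when predicted,
    # values come only from those seen in the training data.
    "age_in_years": {"type": "ordinal"},
}
```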

auto_derive_on_train: FeatureAutoDeriveOnTrain | None#
bounds: FeatureBounds | None#
cycle_length: int | None#

Cyclic features start at 0 and have a range of [0, cycle_length). The cycle_length is the maximum value (exclusive) of the cycle. Values exceeding the cycle length are normalized to the original cycle (e.g., cycles with length 360 for degrees will evaluate a 370 as 10 and 360 as 0). Negative values are not supported in cyclic features. Only applicable to continuous or ordinal features.

Examples:

  • degrees: values should be 0-359, cycle_length = 360

  • days: values should be 0-6, cycle_length = 7

  • hours: values should be 0-23, cycle_length = 24
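The normalization described above behaves like a modulo operation. The sketch below shows that arithmetic in plain Python; the function name normalize_cyclic is hypothetical and is not part of the Howso API.

```python
def normalize_cyclic(value, cycle_length):
    """Wrap a non-negative value into the range [0, cycle_length)."""
    if value < 0:
        # Negative values are not supported in cyclic features.
        raise ValueError("cyclic feature values must be non-negative")
    return value % cycle_length


# A feature attributes entry for a compass heading (hypothetical feature name).
features = {"heading": {"type": "continuous", "cycle_length": 360}}

wrapped = normalize_cyclic(370, features["heading"]["cycle_length"])
```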

data_type: str | None#

Specify the data type for features with a type of nominal or continuous. Default is string for nominals and number for continuous.

Valid values include:

  • string, number, json, amalgam, yaml: Valid for both nominal and continuous.

  • string_mixable: Valid only when type is continuous (predicted values may result in interpolated strings containing a combination of characters from multiple original values).

  • formatted_date_time: Valid only when type is continuous. Used for string datetimes and paired with date_time_format. Defaults to ISO8601 if no date_time_format is provided. For epoch datetimes, please specify type: continuous and data_type: number.

  • boolean: Valid only for nominals.

date_time_format: str | None#

If specified, feature values should match the date format specified by this string. Only applicable to continuous features.
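The sketch below shows how data_type and date_time_format might be combined in a feature attributes dictionary. The feature names are hypothetical, and the format string assumes strftime-style directives, which this page does not confirm. The helper matches_format is also hypothetical; it only checks sample values against the format with Python's standard library.

```python
from datetime import datetime

# Hypothetical feature names; attribute keys follow this page.
features = {
    # String datetimes paired with an explicit format (assumed strftime-style).
    "measured_at": {
        "type": "continuous",
        "data_type": "formatted_date_time",
        "date_time_format": "%Y-%m-%d %H:%M:%S",
    },
    # Epoch datetimes are plain continuous numbers.
    "epoch_seconds": {"type": "continuous", "data_type": "number"},
    # boolean is valid only for nominals.
    "is_active": {"type": "nominal", "data_type": "boolean"},
}


def matches_format(value, fmt):
    """Return True if the string value parses with the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


fmt = features["measured_at"]["date_time_format"]
ok = matches_format("2024-01-31 08:15:00", fmt)
bad = matches_format("31/01/2024", fmt)
```

Checking a few sample values this way before training can catch mismatches between the data and the declared format.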

decimal_places: int | None#

Decimal places to round to. Default is no rounding. If significant_digits is also specified, the number will be rounded to the specified number of significant digits first, then rounded to the number of decimal places specified by this parameter.
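The order of operations can be sketched in plain Python as follows. This illustrates the documented order only (significant digits first, then decimal places) and is not Howso's implementation; both function names are hypothetical, and Python's built-in round may differ from the engine in tie-breaking behavior.

```python
import math


def round_significant(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 2 for 123.456.
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - magnitude)


def apply_rounding(value, significant_digits=None, decimal_places=None):
    """Apply significant-digit rounding first, then decimal-place rounding."""
    if significant_digits is not None:
        value = round_significant(value, significant_digits)
    if decimal_places is not None:
        value = round(value, decimal_places)
    return value


# 123.456 -> 123.46 (5 significant digits) -> 123.5 (1 decimal place)
both = apply_rounding(123.456, significant_digits=5, decimal_places=1)
```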

dependent_features: List[str] | None#

A list of the names of other features that this feature depends on or that depend on this feature. Should be used when there are multi-type value features that tightly depend on values based on other multi-type value features.

derived_feature_code: str | None#

Code defining how the value for this feature could be derived if this feature is specified as a derived_context_feature or a derived_action_feature during react flows. For react_series, the data referenced is the accumulated series data (as a list of rows), and for non-series reacts, the data is a single row. Each row contains all of the combined context and action features. Referencing data in these rows uses 0-based indexing, where the current row index is 0, the previous row’s is 1, etc. The specified code may do simple logic and numeric operations on feature values referenced via feature name and row offset.

Examples:

  • "#x 1": Use the value for feature ‘x’ from the previously processed row (offset of 1, one lag value).

  • "(- #y 0 #x 1)": Feature ‘y’ value from current (offset 0) row minus feature ‘x’ value from previous (offset 1) row.
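To make the offset semantics concrete, the sketch below attaches the two example expressions to hypothetical derived features and then emulates in plain Python what they refer to. The emulation is not how Howso evaluates the code; the helper value_at, the feature names, and the sample rows are assumptions for illustration, and the rows are assumed to be stored oldest first.

```python
# Hypothetical derived features using the example expressions above.
features = {
    "x_lag_1": {"type": "continuous", "derived_feature_code": "#x 1"},
    "y_minus_prev_x": {
        "type": "continuous",
        "derived_feature_code": "(- #y 0 #x 1)",
    },
}


def value_at(rows, feature, offset):
    """Look up a feature value by row offset.

    rows is in processing order (oldest first), so offset 0 is the
    current (last) row and offset 1 is the row before it.
    """
    return rows[-1 - offset][feature]


# Accumulated series data so far, invented for illustration.
rows = [{"x": 1.0, "y": 10.0}, {"x": 2.0, "y": 20.0}]

# "#x 1": x from the previous row.
x_lag_1 = value_at(rows, "x", 1)
# "(- #y 0 #x 1)": current y minus previous x.
y_minus_prev_x = value_at(rows, "y", 0) - value_at(rows, "x", 1)
```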

dropna: bool | None#

DEPRECATED - When true, samples where the feature value is NaN are removed.

id_feature: bool | None#

Set to true for nominal features containing nominal IDs, specifying that this feature should be used to compute case weights for ID-based privacy. For time series, this feature will be used as the ID for each time series generation.

locale: str | None#

The date time format locale. If unspecified, uses platform default locale.

non_sensitive: bool | None#

Flag a categorical nominal feature as non-sensitive. It is recommended that all nominal features be represented with either an int-id subtype or another available nominal subtype using the subtype attribute. However, if the nominal feature is non-sensitive, setting this parameter to true will bypass the subtype requirement. Only applicable to nominal features.
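A minimal sketch of how these nominal attributes might appear together is shown below. The feature names are hypothetical; int-id is the subtype named in the text above. The helper missing_subtype is also hypothetical: it simply lists nominal features that have neither a subtype nor the non_sensitive flag.

```python
# Hypothetical feature names; attribute keys follow this page.
features = {
    # ID column: flagged as the id feature and given the int-id subtype.
    "patient_id": {"type": "nominal", "id_feature": True, "subtype": "int-id"},
    # Non-sensitive category: bypasses the subtype requirement.
    "blood_type": {"type": "nominal", "non_sensitive": True},
    # Continuous features are not subject to the subtype recommendation.
    "heart_rate": {"type": "continuous"},
}


def missing_subtype(feature_attributes):
    """List nominal features with neither a subtype nor non_sensitive set."""
    return [
        name
        for name, attrs in feature_attributes.items()
        if attrs.get("type") == "nominal"
        and not attrs.get("subtype")
        and not attrs.get("non_sensitive")
    ]


unresolved = missing_subtype(features)
```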

null_is_dependent: bool | None#

Modify how dependent features with nulls are treated during a react, specifically when they use null as a context value. Only applicable to dependent features. When false (default), the feature will be treated as a non-dependent context feature. When true for nominal types, null is treated as an individual dependent class value; only cases that also have nulls as this feature’s value will be considered. When true for continuous types, only the cases with the same dependent feature values as the cases that also have nulls as this feature’s value will be considered.

observational_error: float | None#

Specifies the observational mean absolute error for this feature. Use when the error value is already known. Defaults to 0.

original_type: FeatureOriginalType | None#
original_format: Dict[object] | None#

Original data formats used by clients. Automatically populated by clients to store client language specific context about features.

post_process: str | None#

Custom Amalgam code that is called on resulting values of this feature during react operations.

significant_digits: int | None#

Round to the specified number of significant digits. Default is no rounding.

subtype: str | None#

The type used in novel nominal substitution.

time_series: FeatureTimeSeries | None#
unique: bool | None#

Flag feature as only having unique values. Only applicable to nominal features.