Selector

class Selector(self, algorithm: Literal['kbest', 'kbest_chi2', 'pca', 'wrapper', 'sequential', 'select_model', 'rfe', 'rfecv'] = 'kbest', num_features: int = 10, estimator=LinearSVC(dual=False, penalty='l1'), **kwargs)

Wrapper class for feature selection algorithms (parent class: Data)

Parameters

algorithm: {"kbest", "kbest_chi2", "pca", "wrapper", "sequential", "select_model", "rfe", "rfecv"}, default="kbest"

which selection algorithm to use:
- 'kbest': SelectKBest
- 'kbest_chi2': SelectKBest with score_func=chi2 (only non-negative values)
- 'pca': PCA (new column names after transformation)
- 'wrapper': uses p-values of Ordinary Linear Model from statsmodels library (no num_features parameter -> problems with too many features)
- 'sequential': SequentialFeatureSelector
- 'select_model': SelectFromModel (meta-transformer for selecting features based on importance weights)
- 'rfe': RFE (recursive feature elimination)
- 'rfecv': RFECV (recursive feature elimination with cross-validation)
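The difference between 'kbest' and 'kbest_chi2' can be illustrated with scikit-learn directly. This is a minimal sketch, not the sam_ml implementation: it assumes 'kbest' leaves SelectKBest on its scikit-learn default score function (f_classif), and the variable names are illustrative only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, f_classif

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

# 'kbest' (assumed): SelectKBest with scikit-learn's default score function
kbest = SelectKBest(score_func=f_classif, k=2).fit(X, y)
kbest_features = list(X.columns[kbest.get_support()])

# 'kbest_chi2': the same selector scored with chi2
kbest_chi2 = SelectKBest(score_func=chi2, k=2).fit(X, y)
chi2_features = list(X.columns[kbest_chi2.get_support()])

# chi2 only accepts non-negative values: centered data contains
# negative entries, so fitting raises a ValueError
try:
    SelectKBest(score_func=chi2, k=2).fit(X - X.mean(), y)
    chi2_failed = False
except ValueError:
    chi2_failed = True

print(kbest_features)
print(chi2_features)
print("chi2 rejected negative values:", chi2_failed)
```

If the data has been standardized or centered beforehand, 'kbest_chi2' is therefore not an option without rescaling to a non-negative range first.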

num_features: int, default=10

number of features to select

estimator: estimator instance, default=LinearSVC(dual=False, penalty='l1')

estimator used by SequentialFeatureSelector, SelectFromModel, RFE, and RFECV; the other algorithms do not need it
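To show what the estimator is used for, here is a minimal scikit-learn sketch of two of the estimator-based selectors with the default LinearSVC. How sam_ml maps num_features onto each selector is not stated here, so passing it as max_features and n_features_to_select is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True, as_frame=True)

# same estimator as the documented default
estimator = LinearSVC(dual=False, penalty="l1")

# SelectFromModel keeps features whose importance weights (here: the
# L1-regularized coefficients) exceed a threshold, capped at max_features
sfm = SelectFromModel(estimator, max_features=2).fit(X, y)
sfm_features = list(X.columns[sfm.get_support()])

# RFE repeatedly refits the estimator and drops the weakest feature
# until exactly n_features_to_select remain
rfe = RFE(estimator, n_features_to_select=2).fit(X, y)
rfe_features = list(X.columns[rfe.get_support()])

print(sfm_features)
print(rfe_features)
```

Both selectors rely on the fitted estimator exposing coef_ or feature_importances_, which is why an L1-penalized linear model is a natural default.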

**kwargs:

additional parameters for selector

Attributes

algorithm: str

name of the used algorithm

num_features: int

number of features to select

selected_features: list[str]

list with selected feature names

transformer: transformer instance

the underlying feature selection transformer (e.g. SelectKBest)

Example

>>> from sam_ml.data.preprocessing import Selector
>>>
>>> model = Selector()
>>> print(model)
Selector()

Methods

Method        Description
get_params    Function to get the parameters of the transformer instance
params        Function to get the possible/recommended parameter values for the class
select        Select the best features from data
set_params    Function to set the parameters of the transformer instance

Selector.get_params(deep: bool = True) → dict

Function to get the parameters of the transformer instance

Parameters

deep: bool, default=True

If True, will return the parameters for this estimator and contained sub-objects that are estimators

Returns

params: dict

parameter names mapped to their values
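The effect of deep can be seen on a scikit-learn transformer that wraps an estimator. This sketch assumes get_params forwards to the transformer's own get_params, as the description suggests; RFE is used here only because it contains a sub-estimator.

```python
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

transformer = RFE(LinearSVC(dual=False, penalty="l1"), n_features_to_select=10)

# deep=False: only the transformer's own constructor parameters
shallow = transformer.get_params(deep=False)

# deep=True: additionally the parameters of contained estimators,
# keyed as "<component>__<parameter>"
deep = transformer.get_params(deep=True)

nested_keys = sorted(key for key in deep if key not in shallow)
print(sorted(shallow))
print(nested_keys)
```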

static Selector.params() → dict

Function to get the possible/recommended parameter values for the class

Returns

param: dict

possible values for the parameter “algorithm” and recommended values for “estimator”

Examples

>>> # get possible/recommended parameters
>>> from sam_ml.data.preprocessing import Selector
>>>
>>> # first way without class object
>>> params1 = Selector.params()
>>> print(params1)
{"algorithm": ["kbest", ...], "estimator": [LinearSVC(penalty="l1", dual=False), ...]}
>>> # second way with class object
>>> model = Selector()
>>> params2 = model.params()
>>> print(params2)
{"algorithm": ["kbest", ...], "estimator": [LinearSVC(penalty="l1", dual=False), ...]}

Selector.select(X: DataFrame, y: DataFrame | None = None, train_on: bool = True) → DataFrame

Select the best features from data

Parameters

X: pd.DataFrame

X data to use for feature selection

y: pd.Series, default=None

y data to use for feature selection. Only needed when train_on=True

train_on: bool, default=True

If True, the estimator instance will be trained to select the best features for the given y. Otherwise, it just selects the correct columns from X.

Returns

X_selected: pd.DataFrame

X with only the selected columns
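The two modes of train_on correspond to a fit-then-reuse pattern. The following is a minimal sketch with plain scikit-learn and pandas, assuming the default 'kbest' algorithm; it is not the sam_ml implementation, and the variable selected merely stands in for the selected_features attribute.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, random_state=42
)

# train_on=True: fit the selector on X and y, then remember the column names
selector = SelectKBest(k=2).fit(x_train, y_train)
selected = list(x_train.columns[selector.get_support()])
x_train_selected = x_train[selected]

# train_on=False: no fitting and no y needed, just pick the stored columns
x_test_selected = x_test[selected]

print(selected)
print(x_train_selected.shape, x_test_selected.shape)
```

Fitting only on the training split and reusing the stored columns on the test split keeps information from the test labels out of the selection step.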

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> df = load_iris()
>>> X, y = pd.DataFrame(df.data, columns=df.feature_names), pd.Series(df.target)
>>> x_train, x_test, y_train, y_test = train_test_split(X,y, train_size=0.80, random_state=42)
>>>
>>> # select features
>>> from sam_ml.data.preprocessing import Selector
>>>
>>> model = Selector(num_features=2)
>>> x_train_selected = model.select(x_train, y_train) # train selector
>>> x_test_selected = model.select(x_test, train_on=False) # select test data
>>> print("all feature names:")
>>> print(list(x_train.columns))
>>> print()
>>> print("selected features:")
>>> print(list(x_train_selected.columns))
all feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

selected features:
['petal length (cm)', 'petal width (cm)']

Selector.set_params(**params)

Function to set the parameters of the transformer instance

Parameters

**params: dict

Estimator parameters

Returns

self: estimator instance

Estimator instance
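As an illustration of the set_params convention, here is a short sketch on a scikit-learn SelectKBest transformer. It assumes Selector.set_params passes its keyword arguments through to the underlying transformer, which the description suggests but does not spell out.

```python
from sklearn.feature_selection import SelectKBest

transformer = SelectKBest(k=10)

# set_params updates the named parameters in place and returns the same
# instance, so calls can be chained
returned = transformer.set_params(k=5)

print(returned is transformer)
print(transformer.get_params()["k"])
```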