Selector
- class Selector(self, algorithm: Literal['kbest', 'kbest_chi2', 'pca', 'wrapper', 'sequential', 'select_model', 'rfe', 'rfecv'] = 'kbest', num_features: int = 10, estimator=LinearSVC(dual=False, penalty='l1'), **kwargs)
Wrapper class for feature selection algorithms. Parent class: Data
Parameters
Attributes
Example
>>> from sam_ml.data.preprocessing import Selector
>>>
>>> model = Selector()
>>> print(model)
Selector()
Methods
Method | Description |
---|---|
get_params | Function to get the parameters of the transformer instance |
params | Function to get the possible/recommended parameter values for the class |
select | Select the best features from data |
set_params | Function to set the parameters of the transformer instance |
- Selector.get_params(deep: bool = True) → dict
Function to get the parameters of the transformer instance
Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained sub-objects that are estimators
Returns
- params: dict
parameter names mapped to their values
- static Selector.params() → dict
Function to get the possible/recommended parameter values for the class
Returns
- param: dict
possible values for the parameter “algorithm” and recommended values for “estimator”
Examples
>>> # get possible/recommended parameters
>>> from sam_ml.data.preprocessing import Selector
>>>
>>> # first way without class object
>>> params1 = Selector.params()
>>> print(params1)
{"algorithm": ["kbest", ...], "estimator": [LinearSVC(penalty="l1", dual=False), ...]}
>>> # second way with class object
>>> model = Selector()
>>> params2 = model.params()
>>> print(params2)
{"algorithm": ["kbest", ...], "estimator": [LinearSVC(penalty="l1", dual=False), ...]}
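One way the returned dict might be used is to enumerate candidate configurations for a search. The sketch below is only an illustration: `params` is a hypothetical stand-in for the output of `Selector.params()`, with the algorithm names taken from the class signature and placeholder strings in place of the real estimator objects (the actual list is elided in the example above).

```python
from itertools import product

# Hypothetical stand-in for the dict returned by Selector.params().
# Algorithm names come from the class signature; the estimator entries
# are placeholder strings, not real estimator objects.
params = {
    "algorithm": ["kbest", "kbest_chi2", "pca", "wrapper",
                  "sequential", "select_model", "rfe", "rfecv"],
    "estimator": ["estimator_a", "estimator_b"],
}

# Build one keyword-argument dict per (algorithm, estimator) combination.
keys = list(params)
candidates = [dict(zip(keys, values)) for values in product(*params.values())]

print(len(candidates))
print(candidates[0])
```

With real estimator objects in the dict, each candidate could presumably be passed on as `Selector(**candidate)`. Whether every algorithm actually makes use of the `estimator` argument is not stated here, so some combinations may be redundant.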
- Selector.select(X: DataFrame, y: DataFrame | None = None, train_on: bool = True) → DataFrame
Select the best features from data
Parameters
- X: pd.DataFrame
X data to use for feature selection
- y: pd.Series, default=None
y data to use for feature selection. Only needed when train_on=True
- train_on: bool, default=True
If True, the estimator instance will be trained to select the best features for the given y. Otherwise, it only selects the previously chosen columns from X.
Returns
- X_selected: pd.DataFrame
X with only the selected columns
Examples
>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> df = load_iris()
>>> X, y = pd.DataFrame(df.data, columns=df.feature_names), pd.Series(df.target)
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=42)
>>>
>>> # select features
>>> from sam_ml.data.preprocessing import Selector
>>>
>>> model = Selector(num_features=2)
>>> x_train_selected = model.select(x_train, y_train) # train selector
>>> x_test_selected = model.select(x_test, train_on=False) # select test data
>>> print("all feature names:")
>>> print(list(x_train.columns))
>>> print()
>>> print("selected features:")
>>> print(list(x_train_selected.columns))
all feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

selected features:
['petal length (cm)', 'petal width (cm)']
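The two-step pattern above (fit on the training data, then reuse the chosen columns with train_on=False) can be sketched with scikit-learn directly. This is a minimal illustration, not the sam_ml implementation: it assumes a univariate `SelectKBest` with `f_classif` as the scoring function, which may differ from what `algorithm='kbest'` does internally, and the variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

# load data (replace with own data)
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, random_state=42
)

# "train_on=True" step: fit the selector on the training data only
selector = SelectKBest(score_func=f_classif, k=2)  # assumed scoring function
selector.fit(x_train, y_train)
selected_columns = list(x_train.columns[selector.get_support()])

# "train_on=False" step: no refitting, just pick the same columns again
x_train_selected = x_train[selected_columns]
x_test_selected = x_test[selected_columns]

print(selected_columns)
```

Fitting only on the training split and reusing the resulting column list on the test split keeps information from the test targets out of the selection step, which is presumably why `select` does not need `y` when train_on=False.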