Sampler

class Sampler(algorithm: Literal['SMOTE', 'BSMOTE', 'rus', 'ros', 'tl', 'nm', 'cc', 'oss'] = 'ros', random_state: int = 42, sampling_strategy: str | float = 'auto', **kwargs)

Wrapper class for sampling algorithms (parent class: Data)

Parameters

algorithm : {"SMOTE", "BSMOTE", "rus", "ros", "tl", "nm", "cc", "oss"}, default="ros"

which sampling algorithm to use:

- SMOTE: Synthetic Minority Oversampling Technique (upsampling)
- BSMOTE: BorderlineSMOTE (upsampling)
- ros: RandomOverSampler (upsampling) (default)
- rus: RandomUnderSampler (downsampling)
- tl: TomekLinks (cleaning downsampling)
- nm: NearMiss (downsampling)
- cc: ClusterCentroids (downsampling)
- oss: OneSidedSelection (cleaning downsampling)

random_state : int, default=42

seed for random sampling

sampling_strategy : str or float, default="auto"

size of the minority class relative to the size of the majority class after sampling, given as a ratio (float) or as a named strategy (str) such as "auto"

**kwargs:

additional parameters for sampler
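
To make the combination of algorithm="ros" and a float sampling_strategy concrete, here is a minimal pure-Python sketch of random oversampling for a binary target. It is an illustration only, not the code that Sampler runs: the helper name random_oversample is hypothetical, and the rule that the minority class is grown to int(ratio * n_majority) rows is an assumption based on the usual reading of a minority-to-majority ratio.

```python
import random
from collections import Counter


def random_oversample(labels, ratio=1.0, seed=42):
    """Return row indices of a randomly oversampled binary dataset.

    Hypothetical helper (binary targets only): random minority rows are
    duplicated until the minority class has int(ratio * n_majority) rows.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    target = int(ratio * n_major)  # assumed minority size after sampling
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority]
    # draw with replacement from the minority rows to fill the gap
    extra = [rng.choice(minority_idx) for _ in range(max(0, target - n_minor))]
    return list(range(len(labels))) + extra


labels = [0] * 8 + [1] * 2  # 8 majority rows, 2 minority rows
idx = random_oversample(labels, ratio=0.5)
resampled = [labels[i] for i in idx]
print(Counter(resampled))  # minority class grown from 2 to 4 rows
```

With ratio=1.0 the same helper balances both classes at 8 rows each. All original rows are kept, and only duplicates of minority rows are appended.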

Attributes

algorithm : str

name of the used algorithm

transformer : transformer instance

underlying sampler instance (e.g. RandomOverSampler)

Example

>>> from sam_ml.data.preprocessing import Sampler
>>>
>>> model = Sampler()
>>> print(model)
Sampler()

Methods

Method        Description
get_params    Function to get the parameters of the transformer instance
params        Function to get the possible parameter values for the class
sample        Function for up- and downsampling
set_params    Function to set the parameters of the transformer instance

Sampler.get_params(deep: bool = True) -> dict

Function to get the parameters of the transformer instance

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained sub-objects that are estimators

Returns

params : dict

parameter names mapped to their values

static Sampler.params() -> dict

Function to get the possible parameter values for the class

Returns

param : dict

possible values for the parameter “algorithm”

Examples

>>> # get possible parameters
>>> from sam_ml.data.preprocessing import Sampler
>>>
>>> # first way without class object
>>> params1 = Sampler.params()
>>> print(params1)
{"algorithm": ["ros", ...]}
>>> # second way with class object
>>> model = Sampler()
>>> params2 = model.params()
>>> print(params2)
{"algorithm": ["ros", ...]}

Sampler.sample(x_train: DataFrame, y_train: Series) -> tuple[DataFrame, Series]

Function for up- and downsampling

Parameters

x_train, y_train : pd.DataFrame, pd.Series

data to sample

Returns

x_train_sampled : pd.DataFrame

sampled x data

y_train_sampled : pd.Series

sampled y data

Notes

Sample ONLY the training data, NEVER the full dataset. If you sample everything first and then split randomly, copies of the same sample can end up in both the train and the test data.
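
The effect described in this note can be reproduced without any library. The following sketch uses toy (row_id, label) tuples and hypothetical helpers (split, oversample, shared_ids) to compare both orders of operation. It only illustrates the leakage and does not reflect how Sampler is implemented.

```python
import random


def split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def oversample(rows, seed=0):
    """Duplicate random minority rows (label 1) until both classes match."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[1] == 1]
    majority = [r for r in rows if r[1] == 0]
    if not minority:  # nothing to duplicate
        return rows[:]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


def shared_ids(train, test):
    """Row ids that occur in both train and test."""
    return {r[0] for r in train} & {r[0] for r in test}


# toy data: 90 majority rows (label 0) and 10 minority rows (label 1)
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]

# wrong order: sample everything, then split
train_bad, test_bad = split(oversample(data))

# right order: split first, then sample only the training part
train, test = split(data)
train_good = oversample(train)

print("shared ids, sample-then-split:", len(shared_ids(train_bad, test_bad)))
print("shared ids, split-then-sample:", len(shared_ids(train_good, test)))
```

Splitting first guarantees that no row id appears on both sides, because duplicates are only ever drawn from rows that are already in the training part.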

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> iris = load_iris()
>>> X, y = pd.DataFrame(iris.data, columns=iris.feature_names), pd.Series(iris.target)
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=42)
>>>
>>> # sample data
>>> from sam_ml.data.preprocessing import Sampler
>>>
>>> model = Sampler()
>>> x_train_sampled, y_train_sampled = model.sample(x_train, y_train)
>>> print("before sampling:")
>>> print(y_train.value_counts())
>>> print()
>>> print("after sampling:")
>>> print(y_train_sampled.value_counts())
before sampling:
1    41
0    40
2    39
Name: count, dtype: int64

after sampling:
0    41
1    41
2    41
Name: count, dtype: int64

Sampler.set_params(**params)

Function to set the parameters of the transformer instance

Parameters

**params : dict

Estimator parameters

Returns

self : estimator instance

Estimator instance
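
Because set_params returns the instance itself, calls can be chained with get_params. The sketch below shows that contract on a hypothetical stand-in class, TinyWrapper, which is not part of sam_ml. It assumes the scikit-learn style convention that the signatures above suggest and ignores the deep argument.

```python
class TinyWrapper:
    """Hypothetical stand-in with scikit-learn style parameter access."""

    def __init__(self, random_state=42, sampling_strategy="auto"):
        self.random_state = random_state
        self.sampling_strategy = sampling_strategy

    def get_params(self, deep=True):
        # parameter names mapped to their current values
        # (deep is accepted for signature compatibility but unused here)
        return {
            "random_state": self.random_state,
            "sampling_strategy": self.sampling_strategy,
        }

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"unknown parameter: {name!r}")
            setattr(self, name, value)
        return self  # returning self allows call chaining


wrapper = TinyWrapper()
updated = wrapper.set_params(random_state=0).get_params()
print(updated)  # {'random_state': 0, 'sampling_strategy': 'auto'}
```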