Sampler
- class Sampler(self, algorithm: Literal['SMOTE', 'BSMOTE', 'rus', 'ros', 'tl', 'nm', 'cc', 'oss'] = 'ros', random_state: int = 42, sampling_strategy: str | float = 'auto', **kwargs)
Wrapper class for sampling algorithms (parent class: Data)
Parameters |
|
Attributes |
|
Example
>>> from sam_ml.data.preprocessing import Sampler
>>>
>>> model = Sampler()
>>> print(model)
Sampler()
Methods
Method | Description |
---|---|
get_params | Function to get the parameter from the transformer instance |
params | Function to get the possible parameter values for the class |
sample | Function for up- and downsampling |
set_params | Function to set the parameter of the transformer instance |
- Sampler.get_params(deep: bool = True) -> dict
Function to get the parameter from the transformer instance
Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained sub-objects that are estimators
Returns
- params: dict
parameter names mapped to their values
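The description of `deep` follows the scikit-learn `get_params` convention, where parameters of nested estimators appear under `<name>__<sub_name>` keys. The sketch below illustrates that convention with plain-Python stand-ins. `ParamsMixin`, `ToySampler` and `ToyPipeline` are hypothetical names introduced for illustration only and are not part of sam_ml; how `Sampler` implements `get_params` internally is not shown in this documentation.

```python
import inspect


class ParamsMixin:
    """Hypothetical stand-in that mimics the scikit-learn get_params convention."""

    def get_params(self, deep=True):
        # constructor argument names (without self) double as parameter names
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        params = {name: getattr(self, name) for name in names}
        if deep:
            # add parameters of contained sub-objects as "<name>__<sub_name>"
            for name, value in list(params.items()):
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub_name}"] = sub_value
        return params


class ToySampler(ParamsMixin):
    # hypothetical class, only mirrors two of the Sampler constructor arguments
    def __init__(self, algorithm="ros", random_state=42):
        self.algorithm = algorithm
        self.random_state = random_state


class ToyPipeline(ParamsMixin):
    # hypothetical container holding another object that has get_params
    def __init__(self, sampler=None, verbose=False):
        self.sampler = sampler
        self.verbose = verbose


pipe = ToyPipeline(sampler=ToySampler(algorithm="rus"))
shallow = pipe.get_params(deep=False)  # only the top-level parameters
deep = pipe.get_params(deep=True)      # additionally the nested sampler parameters
print(sorted(shallow))
print(sorted(deep))
```

For an object without nested estimators, `deep=True` and `deep=False` return the same dictionary under this convention.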
- static Sampler.params() -> dict
Function to get the possible parameter values for the class
Returns
- param: dict
possible values for the parameter “algorithm”
Examples
>>> # get possible parameters
>>> from sam_ml.data.preprocessing import Sampler
>>>
>>> # first way without class object
>>> params1 = Sampler.params()
>>> print(params1)
{"algorithm": ["ros", ...]}
>>> # second way with class object
>>> model = Sampler()
>>> params2 = model.params()
>>> print(params2)
{"algorithm": ["ros", ...]}
- Sampler.sample(x_train: DataFrame, y_train: Series) -> tuple[DataFrame, Series]
Function for up- and downsampling
Parameters
- x_train, y_train: pd.DataFrame, pd.Series
data to sample
Returns
- x_train_sampled: pd.DataFrame
sampled x data
- y_train_sampled: pd.Series
sampled y data
Notes
Sample ONLY the train data, NEVER the full dataset. If you sample all data first and split randomly afterwards, some samples will end up in the train data as well as in the test data.
Examples
>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> df = load_iris()
>>> X, y = pd.DataFrame(df.data, columns=df.feature_names), pd.Series(df.target)
>>> x_train, x_test, y_train, y_test = train_test_split(X,y, train_size=0.80, random_state=42)
>>>
>>> # sample data
>>> from sam_ml.data.preprocessing import Sampler
>>>
>>> model = Sampler()
>>> x_train_sampled, y_train_sampled = model.sample(x_train, y_train)
>>> print("before sampling:")
>>> print(y_train.value_counts())
>>> print()
>>> print("after sampling:")
>>> print(y_train_sampled.value_counts())
before sampling:
1    41
0    40
2    39
Name: count, dtype: int64

after sampling:
0    41
1    41
2    41
Name: count, dtype: int64
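The note about sampling only the train data can be checked with a small dependency-free sketch. It assumes that the default `ros` algorithm behaves like random oversampling, i.e. minority rows are duplicated until the classes are balanced. `random_oversample` and `split` are hypothetical helpers written for this illustration, not sam_ml functions. Rows are represented by unique ids, so any id found in both train and test is a leaked sample.

```python
import random


def random_oversample(rows, labels, seed=42):
    """Duplicate random minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extras = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extras)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def split(rows, labels, train_size=0.8, seed=42):
    """Random train/test split by position (toy replacement for train_test_split)."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(len(idx) * train_size)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


# imbalanced toy data: 100 unique row ids, 90 of class 0 and 10 of class 1
rows = list(range(100))
labels = [0] * 90 + [1] * 10

# wrong order: sample all data, then split
all_rows, all_labels = random_oversample(rows, labels)
x_tr, x_te, _, _ = split(all_rows, all_labels)
leaked_wrong = set(x_tr) & set(x_te)

# right order: split first, then sample only the train part
x_tr, x_te, y_tr, y_te = split(rows, labels)
x_tr_s, y_tr_s = random_oversample(x_tr, y_tr)
leaked_right = set(x_tr_s) & set(x_te)

print("row ids in train and test (sample, then split):", len(leaked_wrong))
print("row ids in train and test (split, then sample):", len(leaked_right))
```

With the second order the test set stays untouched, so no duplicated row can cross the split. With the first order, copies of the same minority row are free to land on both sides.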