SamplerPipeline
- class SamplerPipeline(algorithm: str | list[sam_ml.data.preprocessing.sampling.Sampler] = 'SMOTE_rus_20_50')
Class that chains multiple up- and down-sampling algorithms instead of using only one - parent class Data
Parameters
- algorithm : str | list[Sampler], default='SMOTE_rus_20_50'
combination of up- and down-sampling algorithms to apply in sequence
Example
>>> from sam_ml.data.preprocessing import SamplerPipeline
>>>
>>> model = SamplerPipeline()
>>> print(model)
SamplerPipeline(Sampler(algorithm='SMOTE', sampling_strategy=0.2, ), Sampler(algorithm='rus', sampling_strategy=0.5, ))
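As the printed representation above shows, the default string 'SMOTE_rus_20_50' encodes a two-step pipeline: SMOTE up-sampling with sampling_strategy=0.2 followed by random under-sampling ('rus') with sampling_strategy=0.5. A minimal sketch of how such a string could be decoded (the helper name and parsing logic are assumptions for illustration, not the library's actual implementation):

```python
def parse_algorithm_string(algorithm: str) -> list[tuple[str, float]]:
    """Split e.g. 'SMOTE_rus_20_50' into [('SMOTE', 0.2), ('rus', 0.5)].

    Hypothetical helper: the first half of the underscore-separated
    tokens are sampler names, the second half their sampling
    strategies given as percentages.
    """
    tokens = algorithm.split("_")
    half = len(tokens) // 2
    names, ratios = tokens[:half], tokens[half:]
    return [(name, int(ratio) / 100) for name, ratio in zip(names, ratios)]

print(parse_algorithm_string("SMOTE_rus_20_50"))
# [('SMOTE', 0.2), ('rus', 0.5)]
```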
Methods
Method | Description
---|---
get_params | Function to get the parameters of the transformer instance
params | Function to get the recommended parameter values for the class
sample | Function for up- and down-sampling
set_params | Function to set the parameters of the transformer instance
- SamplerPipeline.get_params(deep: bool = True)
Function to get the parameters of the transformer instance
Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained sub-objects that are estimators
Returns
- params : dict
parameter names mapped to their values
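The `deep` flag follows the scikit-learn convention: with `deep=True`, the parameters of contained sub-estimators are returned as well, prefixed with the sub-object's name and a double underscore. A minimal pure-Python sketch of that convention (the class and attribute names are illustrative, not sam_ml internals):

```python
class MiniSampler:
    """Stand-in for a single sampler with one parameter."""

    def __init__(self, sampling_strategy: float):
        self.sampling_strategy = sampling_strategy

    def get_params(self, deep: bool = True) -> dict:
        return {"sampling_strategy": self.sampling_strategy}


class MiniPipeline:
    """Stand-in for a pipeline holding one sub-estimator."""

    def __init__(self):
        self.sampler = MiniSampler(sampling_strategy=0.2)

    def get_params(self, deep: bool = True) -> dict:
        params = {"sampler": self.sampler}
        if deep:
            # prefix nested parameters with the sub-object's name
            for key, value in self.sampler.get_params().items():
                params[f"sampler__{key}"] = value
        return params


pipe = MiniPipeline()
print(sorted(pipe.get_params(deep=True)))
# ['sampler', 'sampler__sampling_strategy']
```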
- static SamplerPipeline.params() → dict
Function to get the recommended parameter values for the class
Returns
- param : dict
recommended values for the parameter 'algorithm'
Examples
>>> # get possible parameters
>>> from sam_ml.data.preprocessing import SamplerPipeline
>>>
>>> # first way without class object
>>> params1 = SamplerPipeline.params()
>>> print(params1)
{"algorithm": ["SMOTE_rus_20_50", ...]}
>>> # second way with class object
>>> model = SamplerPipeline()
>>> params2 = model.params()
>>> print(params2)
{"algorithm": ["SMOTE_rus_20_50", ...]}
- SamplerPipeline.sample(x_train: DataFrame, y_train: Series) → tuple[DataFrame, Series]
Function for up- and down-sampling
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
data to sample
Returns
- x_train_sampled : pd.DataFrame
sampled x data
- y_train_sampled : pd.Series
sampled y data
Notes
ONLY sample the train data, NEVER the whole dataset: if you sample before a random train/test split, copies of the same samples can end up in both the train and the test set.
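To see why, here is a small self-contained sketch (using plain random duplication as a stand-in for an over-sampling algorithm): when the minority class is over-sampled before a random split, duplicates of the same sample land in both train and test, so the test score is inflated by data the model has effectively already seen.

```python
import random

random.seed(42)

# toy imbalanced dataset: 90 majority samples (label 0), 10 minority (label 1);
# the first tuple element is a unique sample id
data = [(i, 0) for i in range(90)] + [(i + 90, 1) for i in range(10)]

# WRONG order: over-sample the minority class by duplication BEFORE splitting
minority = [row for row in data if row[1] == 1]
oversampled = data + [random.choice(minority) for _ in range(80)]

# random 80/20 train/test split AFTER over-sampling
random.shuffle(oversampled)
split = int(0.8 * len(oversampled))
train, test = oversampled[:split], oversampled[split:]

# the same sample ids now appear in both train and test -> leakage
leaked = {row[0] for row in train} & {row[0] for row in test}
print(f"samples present in BOTH train and test: {len(leaked)}")
```

Splitting first and then sampling only `x_train`/`y_train`, as `sample` is meant to be used, avoids this leakage entirely.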
Examples
>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=3000, n_features=4, n_classes=2, weights=[0.9], random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(y)
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=42)
>>>
>>> # sample data
>>> from sam_ml.data.preprocessing import SamplerPipeline
>>> model = SamplerPipeline()
>>> x_train_sampled, y_train_sampled = model.sample(x_train, y_train)
>>> print("before sampling:")
>>> print(y_train.value_counts())
>>> print()
>>> print("after sampling:")
>>> print(y_train_sampled.value_counts())
before sampling:
0    2140
1     260
Name: count, dtype: int64

after sampling:
0    856
1    428
Name: count, dtype: int64