Pipeline Factory

The create_pipeline function dynamically builds a machine learning pipeline around the input model. All methods of the model, including model-specific ones such as plot_tree from DTC, remain available on the pipeline. The pipeline works with both Classifier and Regressor models.

Parameters

model (Classifier or Regressor class object): model used in the pipeline
vectorizer (str, Embeddings_builder, or None): Embeddings_builder object, or the name of an Embeddings_builder algorithm, used to vectorize string columns automatically (None for no vectorizing)
scaler (str, Scaler, or None): Scaler object, or the name of a Scaler algorithm, used to scale the data (None for no scaling)
selector (str, Selector, or None): Selector object, or the name of a Selector algorithm, used for feature selection (None for no selection)
sampler (str, Sampler, SamplerPipeline, or None): Sampler / SamplerPipeline object, or the name of a sampling algorithm, used to sample the train data (None for no sampling). Always None for Regressor models (sampling for regression will be implemented in the future).
model_name (str): name of the model
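
The str / object / None convention shared by the vectorizer, scaler, selector, and sampler parameters can be illustrated with a small sketch. This is not the sam_ml implementation: ToyScaler and resolve_component are hypothetical names invented here, and the sketch assumes that a string is simply passed on as the algorithm name of the component.

```python
class ToyScaler:
    """Hypothetical stand-in for a component class such as Scaler (not the sam_ml API)."""

    def __init__(self, algorithm="standard"):
        self.algorithm = algorithm


def resolve_component(value, component_cls):
    """Turn a str / object / None argument into a component instance or None.

    Hypothetical helper: shows one way such a parameter could be handled.
    """
    if value is None:
        return None  # None means the step is skipped
    if isinstance(value, str):
        return component_cls(algorithm=value)  # build the component from its algorithm name
    if isinstance(value, component_cls):
        return value  # already a configured object, use it unchanged
    raise TypeError(f"expected str, {component_cls.__name__}, or None, got {type(value).__name__}")


skipped = resolve_component(None, ToyScaler)
by_name = resolve_component("minmax", ToyScaler)
custom = ToyScaler(algorithm="robust")
by_object = resolve_component(custom, ToyScaler)
```

Accepting all three forms keeps the common case short (a string) while still allowing a fully configured object to be passed in.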

Returns

DynamicPipeline object that inherits from the model's parent class and from BasePipeline
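
To illustrate how an object can inherit from a class that is only known at runtime, here is a minimal sketch using Python's built-in type(). It is not the library's code: BasePipelineSketch, ToyTreeModel, and make_pipeline_sketch are hypothetical names, and using the model's own class as a base (rather than its parent class) is a simplifying assumption made only for this example.

```python
class BasePipelineSketch:
    """Hypothetical mixin that holds the preprocessing steps."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def describe(self):
        return f"{type(self).__name__} with steps {self.steps}"


class ToyTreeModel:
    """Hypothetical model with a model-specific method."""

    def __init__(self, depth=3):
        self.depth = depth

    def plot_tree(self):
        return f"tree of depth {self.depth}"


def make_pipeline_sketch(model, steps=None):
    """Create an object whose class is built at runtime from the model's class."""
    model_cls = type(model)
    # type(name, bases, namespace) creates a new class dynamically.
    dynamic_cls = type("DynamicPipelineSketch", (BasePipelineSketch, model_cls), {})
    pipeline = dynamic_cls.__new__(dynamic_cls)  # allocate without calling __init__
    pipeline.__dict__.update(model.__dict__)     # copy the model's state
    BasePipelineSketch.__init__(pipeline, steps) # add the pipeline state
    return pipeline


pipe = make_pipeline_sketch(ToyTreeModel(depth=5), steps=["standard"])
print(pipe.describe())
print(pipe.plot_tree())
```

Because the dynamically created class has the model's class among its bases, model-specific methods such as plot_tree in this toy example are found through normal attribute lookup.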

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=3000, n_features=4, n_classes=2, weights=[0.9], random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(y)
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.80, random_state=42)
>>> 
>>> # train and evaluate model pipeline
>>> from sam_ml.models import create_pipeline
>>> from sam_ml.models.classifier import LR
>>>
>>> model = create_pipeline(LR(), scaler="standard", sampler="SMOTE_rus_20_50")
>>> model.train(x_train, y_train)
>>> scores = model.evaluate(x_test, y_test)
Train score: 0.9625 - Train time: 0:00:00
accuracy: 0.9583333333333334
precision: 0.8563762626262625
recall: 0.9377241446156828
s_score: 0.9603691957893064
l_score: 0.9989822522866367

classification report: 
                precision   recall  f1-score    support

        0       0.99        0.96    0.98        543
        1       0.72        0.91    0.81        57

accuracy                            0.96        600
macro avg       0.86        0.94    0.89        600
weighted avg    0.97        0.96    0.96        600