Pipeline Factory
The create_pipeline function dynamically creates a machine learning pipeline based on the input model. All methods of the model (including model-specific ones such as plot_tree from DTC) can be used with the pipeline. The pipeline works with both Classifier and Regressor models.
- sam_ml.models.create_pipeline(model: Classifier | Regressor, vectorizer: str | Embeddings_builder | None = None, scaler: str | Scaler | None = None, selector: str | tuple[str, int] | Selector | None = None, sampler: str | Sampler | SamplerPipeline | None = None, model_name: str = 'pipe') -> BasePipeline
Parameters
- model : Classifier or Regressor class object
  Model used in the pipeline (Classifier or Regressor).
- vectorizer : str, Embeddings_builder, or None
  Object or algorithm of the Embeddings_builder class, used to automatically vectorize string columns (None for no vectorizing).
- scaler : str, Scaler, or None
  Object or algorithm of the Scaler class, used to scale the data (None for no scaling).
- selector : str, tuple[str, int], Selector, or None
  Object or algorithm of the Selector class, used for feature selection (None for no selecting).
- sampler : str, Sampler, SamplerPipeline, or None
  Object or algorithm of the Sampler/SamplerPipeline class, used to sample the train data (None for no sampling). For a Regressor model, always None (will be implemented in the future).
- model_name : str
  Name of the model.
Returns
DynamicPipeline object which inherits from the model parent class and BasePipeline
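To make the idea of a dynamically created class that inherits from both a pipeline base and a model class more concrete, here is a minimal sketch in plain Python. It is not the sam_ml implementation: the BasePipeline and ToyModel classes below are hypothetical stand-ins, make_dynamic_pipeline is an illustrative helper, and the use of the built-in type() to build the class is an assumption about one way such a factory could work. The sketch also does not copy any state from the model instance that is passed in.

```python
# Hypothetical stand-ins for illustration only; these are NOT the sam_ml classes.
class BasePipeline:
    """Stand-in pipeline base: applies preprocessing steps before training."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def preprocess(self, rows):
        # Apply every step, in order, to each value.
        for step in self.steps:
            rows = [step(value) for value in rows]
        return rows


class ToyModel:
    """Stand-in model with a train method and one model-specific method."""

    def train(self, rows):
        self.mean_ = sum(rows) / len(rows)
        return self.mean_

    def describe(self):
        return f"ToyModel(mean={self.mean_})"


def make_dynamic_pipeline(model, steps=None):
    """Build a class at runtime that inherits from BasePipeline and type(model)."""
    model_cls = type(model)

    def train(self, rows):
        # Preprocess first, then delegate to the model class's own train().
        return model_cls.train(self, self.preprocess(rows))

    # Three-argument type() creates a new class: name, bases, namespace.
    dynamic_cls = type("DynamicPipeline", (BasePipeline, model_cls), {"train": train})
    return dynamic_cls(steps=steps)


pipe = make_dynamic_pipeline(ToyModel(), steps=[lambda v: v * 2])
pipe.train([1, 2, 3])

# Model-specific methods stay available because the dynamic class inherits them.
print(pipe.describe())
print(isinstance(pipe, ToyModel), isinstance(pipe, BasePipeline))
```

Because the generated class lists the model class among its bases, any model-specific method (describe here, analogous to plot_tree from DTC) can be called directly on the pipeline object without extra wrapper code.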
Examples
>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=3000, n_features=4, n_classes=2, weights=[0.9], random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(y)
>>> x_train, x_test, y_train, y_test = train_test_split(X,y, train_size=0.80, random_state=42)
>>>
>>> # train and evaluate model pipeline
>>> from sam_ml.models import create_pipeline
>>> from sam_ml.models.classifier import LR
>>>
>>> model = create_pipeline(LR(), scaler="standard", sampler="SMOTE_rus_20_50")
>>> model.train(x_train, y_train)
>>> scores = model.evaluate(x_test, y_test)
Train score: 0.9625 - Train time: 0:00:00
accuracy: 0.9583333333333334
precision: 0.8563762626262625
recall: 0.9377241446156828
s_score: 0.9603691957893064
l_score: 0.9989822522866367

classification report:
              precision    recall  f1-score   support

           0       0.99      0.96      0.98       543
           1       0.72      0.91      0.81        57

    accuracy                           0.96       600
   macro avg       0.86      0.94      0.89       600
weighted avg       0.97      0.96      0.96       600