AutoML class

class AutoML(self, models: str | list, vectorizer: str | sam_ml.data.preprocessing.embeddings.Embeddings_builder | None | list[str | sam_ml.data.preprocessing.embeddings.Embeddings_builder | None], scaler: str | sam_ml.data.preprocessing.scaler.Scaler | None | list[str | sam_ml.data.preprocessing.scaler.Scaler | None], selector: str | tuple[str, int] | sam_ml.data.preprocessing.feature_selection.Selector | None | list[str | tuple[str, int] | sam_ml.data.preprocessing.feature_selection.Selector | None], sampler: str | sam_ml.data.preprocessing.sampling.Sampler | sam_ml.data.preprocessing.sampling_pipeline.SamplerPipeline | None | list[str | sam_ml.data.preprocessing.sampling.Sampler | sam_ml.data.preprocessing.sampling_pipeline.SamplerPipeline | None])

Auto-ML parent class {abstract}. Child classes provide the concrete model sets through the abstract model_combs method.

Parameters

models : str or list

  • string naming a model set from the model_combs method

  • list of Wrapperclass models from the sam_ml library

vectorizer : str, Embeddings_builder, None, or list

object or algorithm of the Embeddings_builder class used to vectorize string columns automatically (None for no vectorizing)

scaler : str, Scaler, None, or list

object or algorithm of the Scaler class used to scale the data (None for no scaling)

selector : str, tuple[str, int], Selector, None, or list

object, tuple of algorithm and number of features, or algorithm of the Selector class used for feature selection (None for no feature selection)

sampler : str, Sampler, SamplerPipeline, None, or list

object or algorithm of the Sampler / SamplerPipeline class used to sample the training data (None for no sampling)

Attributes

models : dict

dictionary with model names as keys and model instances as values

scores : dict[str, dict]

dictionary containing one score dictionary per model

Note

If a list is provided for one or more of the preprocessing steps, every combination of model and preprocessing steps is added as a pipeline.
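The combination rule in the note above can be pictured as a Cartesian product over the model list and every preprocessing option. The sketch below is illustrative only. It uses plain strings in place of sam_ml model and preprocessing objects, and the helper name build_pipeline_grid, the algorithm string "kbest", and the tuple representation of a pipeline are all hypothetical. They are not the library's internals.

```python
from itertools import product


def as_list(step):
    """Wrap a single option in a list so that every step can be iterated.

    Only real lists are treated as multiple options. A tuple such as
    ("kbest", 10) is one selector option (algorithm, number of features).
    """
    return step if isinstance(step, list) else [step]


def build_pipeline_grid(models, vectorizer, scaler, selector, sampler):
    """Return one (model, vectorizer, scaler, selector, sampler) tuple per combination.

    Hypothetical helper: sam_ml builds pipeline objects instead of tuples.
    """
    return list(
        product(
            as_list(models),
            as_list(vectorizer),
            as_list(scaler),
            as_list(selector),
            as_list(sampler),
        )
    )


# two scalers x two selector options -> four pipelines for the single model
grid = build_pipeline_grid(
    models=["RFC"],
    vectorizer=None,
    scaler=["standard", "minmax"],
    selector=[None, ("kbest", 10)],
    sampler=None,
)
for combo in grid:
    print(combo)
```

The number of pipelines is the product of the list lengths. Passing several lists at once can therefore multiply the search time quickly.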

Methods

Method                        Description
_AutoML__finish_sound         Little helper function that plays a microwave sound
_AutoML__sort_dict            Function to sort a dict by a given list of keys
add_model                     Function for adding a model to self.models
eval_models                   Function to train and evaluate every model
eval_models_cv                Function to run a cross validation on every model
find_best_model_mass_search   Function to run a successive halving hyperparameter search for every model
find_best_model_randomCV      Function to run a random cross validation hyperparameter search for every model
find_best_model_smac          Function to run hyperparameter tuning with the SMAC library's HyperparameterOptimizationFacade for every model [can only be used in the sam_ml version with swig]
model_combs                   Function for mapping a string to a set of models
output_scores_as_pd           Function to output self.scores as pd.DataFrame
remove_model                  Function for deleting a model from self.models

static AutoML._AutoML__finish_sound()

Little helper function that plays a microwave sound.

static AutoML._AutoML__sort_dict(scores: dict, sort_by: list[str]) → DataFrame

Function to sort a dict by a given list of keys

Parameters

scores : dict

dictionary with scores

sort_by : list[str]

keys to sort the scores by. Keys that are not in scores may also be provided; they are filtered out.

Returns

scores_df : pd.DataFrame

sorted dataframe of scores
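The plain-Python sketch below reproduces only the filtering and sorting logic. The documented method returns a pd.DataFrame, whereas this sketch returns a sorted list of (model name, scores) pairs. Three details are assumptions: the function name sort_scores, the descending order (higher score is better), and the rule that models missing a metric sort last.

```python
def sort_scores(scores, sort_by):
    """Sort a {model_name: {metric: value}} dict by the given metric keys.

    Keys in sort_by that no model reports are filtered out. Sorting is
    descending (an assumption of this sketch). A model that lacks a
    metric gets -inf for it and therefore sorts last.
    """
    available = {metric for model_scores in scores.values() for metric in model_scores}
    keys = [key for key in sort_by if key in available]
    return sorted(
        scores.items(),
        key=lambda item: tuple(item[1].get(key, float("-inf")) for key in keys),
        reverse=True,
    )


scores = {
    "model_a": {"accuracy": 0.81, "precision": 0.70},
    "model_b": {"accuracy": 0.90, "precision": 0.65},
    "model_c": {"accuracy": 0.81, "precision": 0.75},
}
# "not_a_metric" is silently dropped; ties on accuracy are broken by precision
ranking = sort_scores(scores, ["accuracy", "precision", "not_a_metric"])
print([name for name, _ in ranking])  # ['model_b', 'model_c', 'model_a']
```

For loss-type metrics, where lower is better, the order would have to be reversed for those keys.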

AutoML.add_model(model)

Function for adding a model to self.models

Parameters

model : estimator instance

model instance to add to self.models

AutoML.eval_models(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, scoring: str | Callable, **kwargs) → dict[str, dict]

Function to train and evaluate every model

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

scoring : str or callable (custom score)

  • str: metric used to evaluate the models

  • callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

**kwargs:

additional parameters from the child class for the evaluate method of the models
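A callable passed as scoring must follow the signature score_func(y, y_pred, **kwargs). The loss below is a hypothetical example of that shape for binary labels 0/1. The function name and the fn_weight keyword argument are assumptions of this sketch, and it does not show how sam_ml forwards keyword arguments to the function.

```python
def weighted_error_loss(y, y_pred, **kwargs):
    """Hypothetical custom loss with signature score_func(y, y_pred, **kwargs).

    Averages misclassification costs. A false negative (true 1, predicted 0)
    costs fn_weight and any other error costs 1. Lower is better.
    """
    fn_weight = kwargs.get("fn_weight", 2.0)  # assumed keyword, default 2.0
    total = 0.0
    for true, pred in zip(y, y_pred):
        if true != pred:
            total += fn_weight if (true == 1 and pred == 0) else 1.0
    return total / len(y)


# two false negatives out of four predictions -> (2.0 + 2.0) / 4
print(weighted_error_loss([1, 0, 1, 1], [0, 0, 1, 0]))  # 1.0
```

Such a function would be passed as scoring=weighted_error_loss. Because it is a loss, lower values are better.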

Returns

scores : dict[str, dict]

dictionary containing one score dictionary per model

also saves the metrics in self.scores

Notes

If you interrupt the run of eval_models with a keyboard interrupt (Ctrl+C), the interim results are returned.
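The interim-result behavior described in the note follows a common pattern: catch KeyboardInterrupt around the loop and return whatever has finished. The sketch below is not sam_ml's code. The names eval_all and fake_evaluate are hypothetical, and fake_evaluate raises KeyboardInterrupt itself to simulate Ctrl+C.

```python
def eval_all(models, evaluate):
    """Evaluate models one by one and return the interim results on Ctrl+C.

    models: dict of model name -> model object
    evaluate: callable(model) -> dict of metric scores (hypothetical)
    """
    scores = {}
    try:
        for name, model in models.items():
            scores[name] = evaluate(model)
    except KeyboardInterrupt:
        # keep every result that finished before the interrupt
        print(f"interrupted: returning {len(scores)} of {len(models)} results")
    return scores


def fake_evaluate(model):
    """Stand-in evaluation that simulates a Ctrl+C on the model named "interrupt"."""
    if model == "interrupt":
        raise KeyboardInterrupt
    return {"accuracy": 0.9}


result = eval_all({"a": "ok", "b": "interrupt", "c": "ok"}, fake_evaluate)
print(result)  # {'a': {'accuracy': 0.9}}
```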

AutoML.eval_models_cv(X: DataFrame, y: Series, cv_num: int, small_data_eval: bool, custom_score: Callable | None, **kwargs) → dict[str, dict]

Function to run a cross validation on every model

Parameters

X, y : pd.DataFrame, pd.Series

Data to cross validate on

cv_num : int

number of different random splits (only used when small_data_eval=False)

small_data_eval : bool

if True, cross_validation_small_data is used (leave-one-out evaluation). Otherwise, a random split cross validation is run.

custom_score : callable or None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

If None, no custom score is calculated and the key “custom_score” does not exist in the returned dictionary.

**kwargs:

additional parameters from the child class for the cross validation methods of the models

Returns

scores : dict[str, dict]

dictionary containing one score dictionary per model

also saves the metrics in self.scores

Notes

If you interrupt the run of eval_models_cv with a keyboard interrupt (Ctrl+C), the interim results are returned.
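The two evaluation modes selected by small_data_eval differ in how the data is split. The index-level sketch below contrasts them. Its assumptions are that "random split cross validation" means cv_num independent shuffled train/test splits and that the test fraction is 0.2 with a fixed seed. Neither value is documented above, and the function names are hypothetical.

```python
import random


def random_splits(n_samples, cv_num, test_size=0.2, seed=42):
    """Yield cv_num random (train_idx, test_idx) splits (small_data_eval=False).

    test_size and seed are assumptions of this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_test = max(1, int(n_samples * test_size))
    for _ in range(cv_num):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


def leave_one_out_splits(n_samples):
    """Yield n_samples splits that each hold out one data point (small_data_eval=True).

    cv_num plays no role here: the number of splits equals the number of data points.
    """
    for i in range(n_samples):
        train_idx = [j for j in range(n_samples) if j != i]
        yield train_idx, [i]


print(list(leave_one_out_splits(3)))  # [([1, 2], [0]), ([0, 2], [1]), ([0, 1], [2])]
```

Leave-one-out trains one model per data point. This is why it is only practical for small datasets.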

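AutoML.find_best_model_mass_search(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, scoring: str | Callable, leave_loadbar: bool, save_result_path: str | None, **kwargs) → tuple[str, dict[str, float]]

Function to run a successive halving hyperparameter search for every model

It uses the warm_start parameter of the models and is a custom implementation. It is recommended as a fast way to narrow down combinations of preprocessing steps and models, but find_best_model_smac or find_best_model_randomCV return better results.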
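Successive halving evaluates all candidates with a small budget, keeps the better half, doubles the budget, and repeats until one candidate is left. The generic sketch below shows that scheme. The budget is an abstract number that stands in for the amount of warm-started training. How sam_ml sizes the budget or forms the candidate pool is not documented here, so the names successive_halving and min_budget are hypothetical.

```python
import math


def successive_halving(candidates, evaluate, min_budget=1):
    """Minimal successive halving over a list of candidates.

    evaluate: callable(candidate, budget) -> score (higher is better)
    Each round scores the survivors with the current budget, keeps the
    better half (rounded up), and doubles the budget.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    survivors = list(candidates)
    budget = min_budget
    history = []
    while len(survivors) > 1:
        scores = {c: evaluate(c, budget) for c in survivors}
        history.append((budget, dict(scores)))
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
        budget *= 2
    return survivors[0], history


# toy score: candidate "quality" times budget
best, history = successive_halving([3, 1, 4, 2], lambda c, budget: c * budget)
print(best, [budget for budget, _ in history])  # 4 [1, 2]
```

The trade-off is speed against reliability. Weak candidates are discarded after cheap evaluations, which saves time but can drop a candidate that only performs well with a larger budget.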

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train and optimise the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

n_trails : int

maximum number of parameter sets to test for each model

scoring : str or callable (custom score)

  • str: metric used to evaluate the models

  • callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

leave_loadbar : bool

whether the loading bar of the model training during the different splits stays visible after training (True: the loading bar remains visible)

save_result_path : str or None

path for saving the results after each step. If None, no results are saved.

**kwargs:

additional parameters from the child class for the train_warm_start, evaluate, and evaluate_score methods of the models

Returns

best_model_name : str

name of the best model in the search

score : dict[str, float]

scores of the best model

AutoML.find_best_model_randomCV(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, cv_num: int, scoring: str | Callable, small_data_eval: bool, leave_loadbar: bool, **kwargs) → dict[str, dict]

Function to run a random cross validation hyperparameter search for every model
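A random cross validation hyperparameter search draws n_trails parameter sets at random, cross-validates each one, and keeps the set with the best mean score. The sketch below is a minimal, generic version of that idea and not the randomCVsearch implementation. The function name random_search_cv, the dict-of-lists parameter space, and the cv_score callable are assumptions. Sampling is with replacement, so the same parameter set can be drawn more than once.

```python
import random
import statistics


def random_search_cv(param_space, n_trails, cv_score, seed=0):
    """Random hyperparameter search: sample n_trails parameter sets, keep the best.

    param_space: dict of parameter name -> list of candidate values
    cv_score: callable(params) -> list of fold scores (higher is better)
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trails):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = statistics.mean(cv_score(params))  # average over the CV folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


space = {"n_estimators": [10, 50, 100], "max_depth": [2, 4]}
best_params, best_score = random_search_cv(
    space,
    n_trails=20,
    # toy fold scores that depend on the parameters
    cv_score=lambda p: [p["n_estimators"] / 100 + p["max_depth"] / 10] * 3,
)
print(best_params, best_score)
```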

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train and optimise the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

n_trails : int

maximum number of parameter sets to test

cv_num : int

number of different random splits (only used when small_data_eval=False)

scoring : str or callable (custom score)

  • str: metric used to evaluate the models

  • callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

small_data_eval : bool

if True, each model is trained on all data points except one, and this is repeated for every data point (leave-one-out; recommended for datasets with fewer than 150 data points)

leave_loadbar : bool

whether the loading bar of the randomCVsearch of each individual model stays visible after training (True: the loading bar remains visible)

**kwargs:

additional parameters from the child class for the randomCVsearch and evaluate methods of the models

Returns

scores : dict[str, dict]

dictionary containing one score dictionary per model

also saves the metrics in self.scores

Notes

If you interrupt the randomCVsearch of a model with a keyboard interrupt (Ctrl+C), the interim result for this model is used and the next model starts.

AutoML.find_best_model_smac(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, cv_num: int, scoring: str | Callable, small_data_eval: bool, walltime_limit_per_modeltype: int, smac_log_level: int, **kwargs) → dict[str, dict]

Function to run hyperparameter tuning with the SMAC library's HyperparameterOptimizationFacade for every model [can only be used in the sam_ml version with swig]

The smac_search method searches your hyperparameter space more “intelligently” than randomCVsearch and returns the best hyperparameter set. In addition to the n_trails parameter, it also takes a walltime_limit parameter (walltime_limit_per_modeltype in this method) that defines the maximum time in seconds the search may take.
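The search stops as soon as either budget is used up: n_trails evaluations or the walltime limit in seconds. The sketch below shows only this dual-budget stopping rule. It does not call SMAC. propose is a hypothetical stand-in for SMAC's model-based proposal step, and the injectable clock exists only to make the sketch testable. The sketch treats lower values as better because SMAC minimizes a cost.

```python
import time


def budgeted_search(propose, evaluate, n_trails, walltime_limit, clock=time.monotonic):
    """Stop after n_trails evaluations or walltime_limit seconds, whichever comes first.

    propose: callable(history) -> next candidate (stand-in for SMAC's proposal step)
    evaluate: callable(candidate) -> cost (lower is better)
    """
    start = clock()
    history = []
    while len(history) < n_trails and clock() - start < walltime_limit:
        candidate = propose(history)
        history.append((candidate, evaluate(candidate)))
    best = min(history, key=lambda item: item[1]) if history else None
    return best, history


# trial budget binds: 10 candidates 0..9, cost minimal at 3
best, history = budgeted_search(
    propose=lambda h: len(h),
    evaluate=lambda c: (c - 3) ** 2,
    n_trails=10,
    walltime_limit=60,
)
print(best, len(history))  # (3, 0) 10
```

With a generous n_trails and a tight walltime_limit_per_modeltype, the time limit is usually the budget that ends the search for each model.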

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train and optimise the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

n_trails : int

maximum number of parameter sets to test for each model

cv_num : int

number of different random splits (only used when small_data_eval=False)

scoring : str or callable (custom score)

  • str: metric used to evaluate the models

  • callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

small_data_eval : bool

if True, each model is trained on all data points except one, and this is repeated for every data point (leave-one-out; recommended for datasets with fewer than 150 data points)

walltime_limit_per_modeltype : int

the maximum time in seconds that SMAC is allowed to run for each model

smac_log_level : int

10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL (SMAC3 library log levels)

**kwargs:

additional parameters from the child class for the smac_search and evaluate methods of the models
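The numeric values accepted by smac_log_level are the same as the standard level constants of Python's logging module. The named constants can therefore be used instead of bare numbers, as this short example shows.

```python
import logging

# smac_log_level values map onto Python's standard logging levels
for level in (10, 20, 30, 40, 50):
    print(level, logging.getLevelName(level))
# 10 DEBUG, 20 INFO, 30 WARNING, 40 ERROR, 50 CRITICAL

# e.g. show only SMAC warnings and errors
smac_log_level = logging.WARNING  # == 30
```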

Returns

scores : dict[str, dict]

dictionary containing one score dictionary per model

also saves the metrics in self.scores

abstract static AutoML.model_combs(kind: str) → list

Function for mapping string to set of models

Parameters

kind : str

which kind of model set to use:

  • “all”:

    use all models

  • …

Returns

models : list

list of model instances
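Because model_combs is an abstract static method, each child class must supply its own mapping from a kind string to a list of models. The sketch below shows one way to express that contract with Python's abc module. AutoMLBase, ToyAutoML, the "fast" kind, and the model-name strings are hypothetical stand-ins for the real classes and model instances. The "all" entry mirrors the documented kind.

```python
from abc import ABC, abstractmethod


class AutoMLBase(ABC):
    """Hypothetical stand-in for the abstract AutoML parent class."""

    @staticmethod
    @abstractmethod
    def model_combs(kind: str) -> list:
        """Map a model-set name to a list of model instances."""


class ToyAutoML(AutoMLBase):
    """Hypothetical child class; strings stand in for sam_ml model instances."""

    @staticmethod
    def model_combs(kind: str) -> list:
        model_sets = {
            "all": ["LogisticRegression", "RandomForest", "SVC"],
            "fast": ["LogisticRegression"],  # hypothetical extra kind
        }
        if kind not in model_sets:
            raise ValueError(f"unknown model set: {kind!r}")
        return model_sets[kind]


print(ToyAutoML.model_combs("all"))  # ['LogisticRegression', 'RandomForest', 'SVC']
```

Calling AutoMLBase() raises TypeError, because model_combs has no implementation there. This matches the {abstract} marker on the parent class.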

AutoML.output_scores_as_pd(sort_by: str | list[str], console_out: bool) → DataFrame

Function to output self.scores as pd.DataFrame

Parameters

sort_by : str or list[str]

key(s) to sort the scores by. Keys that are not in self.scores may also be provided; they are filtered out.

console_out : bool

whether to print the DataFrame to the console

Returns

scores : pd.DataFrame

sorted DataFrame of self.scores

AutoML.remove_model(model_name: str)

Function for deleting a model from self.models

Parameters

model_name : str

name of the model in self.models
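add_model and remove_model manage the models dictionary, which maps model names to model instances. The sketch below shows that bookkeeping. It assumes that each model exposes a model_name attribute used as its key. ModelRegistry and DummyModel are hypothetical. Raising KeyError for an unknown name is a choice of this sketch, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class DummyModel:
    """Hypothetical model with the assumed model_name attribute."""

    model_name: str


class ModelRegistry:
    """Hypothetical sketch of the self.models bookkeeping."""

    def __init__(self):
        self.models = {}  # model name -> model instance

    def add_model(self, model):
        # a model with the same name replaces the earlier entry
        self.models[model.model_name] = model

    def remove_model(self, model_name: str):
        # raises KeyError if model_name is unknown (choice of this sketch)
        del self.models[model_name]


registry = ModelRegistry()
registry.add_model(DummyModel("RFC"))
registry.add_model(DummyModel("SVC"))
registry.remove_model("RFC")
print(list(registry.models))  # ['SVC']
```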