AutoML class

class AutoML(self, models: str | list, vectorizer: str | sam_ml.data.preprocessing.embeddings.Embeddings_builder | None | list[str | sam_ml.data.preprocessing.embeddings.Embeddings_builder | None], scaler: str | sam_ml.data.preprocessing.scaler.Scaler | None | list[str | sam_ml.data.preprocessing.scaler.Scaler | None], selector: str | tuple[str, int] | sam_ml.data.preprocessing.feature_selection.Selector | None | list[str | tuple[str, int] | sam_ml.data.preprocessing.feature_selection.Selector | None], sampler: str | sam_ml.data.preprocessing.sampling.Sampler | sam_ml.data.preprocessing.sampling_pipeline.SamplerPipeline | None | list[str | sam_ml.data.preprocessing.sampling.Sampler | sam_ml.data.preprocessing.sampling_pipeline.SamplerPipeline | None])

Abstract parent class for Auto-ML (parent class: object)

Parameters

models : str or list

str: name of a model set from the model_combs method

list: Wrapperclass models from the sam_ml library

vectorizer : str, Embeddings_builder, or None: object or algorithm name of the Embeddings_builder class used for automatic vectorizing of string columns (None for no vectorizing)
scaler : str, Scaler, or None: object or algorithm name of the Scaler class used for scaling the data (None for no scaling)
selector : str, tuple[str, int], Selector, or None: object, algorithm name, or tuple of algorithm name and number of features of the Selector class used for feature selection (None for no feature selection)
sampler : str, Sampler, SamplerPipeline, or None: object or algorithm name of the Sampler / SamplerPipeline class used for sampling the train data (None for no sampling)

Attributes

models : dict: dictionary with model names as keys and model instances as values
scores : dict[str, float]: dictionary with the scores of every model, each stored as a dictionary

Note

If a list is provided for one or more of the preprocessing steps, every combination of model and preprocessing steps is added as a pipeline.
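The note above can be pictured with a small standalone sketch. It is not the sam_ml implementation: the helper name expand_pipelines and the option strings ("standard", "kbest", "model_a") are hypothetical and only illustrate how list-valued options could multiply into one pipeline per combination.

```python
from itertools import product


def expand_pipelines(models, vectorizer, scaler, selector, sampler):
    """Return one (model, vectorizer, scaler, selector, sampler) tuple per combination.

    Hypothetical helper: options that are not lists are treated as single-element lists.
    """
    def as_list(option):
        return option if isinstance(option, list) else [option]

    steps = [as_list(vectorizer), as_list(scaler), as_list(selector), as_list(sampler)]
    return [(model, *combo) for model in models for combo in product(*steps)]


combos = expand_pipelines(
    models=["model_a", "model_b"],   # placeholder model names
    vectorizer=None,                 # single option
    scaler=["standard", None],       # two options
    selector=[("kbest", 5), None],   # two options; a tuple counts as one option
    sampler=None,                    # single option
)
print(len(combos))  # 2 models x 2 scalers x 2 selectors = 8
```

The number of pipelines grows multiplicatively with every list, so long option lists can make the evaluation methods noticeably slower.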

Methods

Method	Description
`_AutoML__finish_sound`	little function to play a microwave sound
`_AutoML__sort_dict`	Function to sort a dict by a given list of keys
`add_model`	Function for adding model in self.models
`eval_models`	Function to train and evaluate every model
`eval_models_cv`	Function to run a cross validation on every model
`find_best_model_mass_search`	Function to run a successive halving hyperparameter search for every model
`find_best_model_randomCV`	Function to run a random cross validation hyperparameter search for every model
`find_best_model_smac`	Function to run hyperparameter tuning with the SMAC library's HyperparameterOptimizationFacade for every model [only available in the sam_ml version with swig]
`model_combs`	Function for mapping string to set of models
`output_scores_as_pd`	Function to output self.scores as pd.DataFrame
`remove_model`	Function for deleting model in self.models

static AutoML._AutoML__finish_sound(): little function to play a microwave sound

static AutoML._AutoML__sort_dict(scores: dict, sort_by: list[str]) → DataFrame

Function to sort a dict by a given list of keys

Parameters

scores : dict: dictionary with scores
sort_by : list[str]: keys to sort the scores by. Keys that are not in scores may also be provided; they are filtered out.

Returns

scores_df : pd.DataFrame: sorted DataFrame of scores
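As a rough illustration of the key filtering described above, the sketch below sorts a scores dictionary in plain Python. It is an assumption-laden analogue, not the library code: the function name sort_scores is hypothetical, it returns a list of (model name, scores) pairs instead of a DataFrame, and the descending sort order is an assumption.

```python
def sort_scores(scores, sort_by):
    """Sort model scores by the given keys, ignoring keys that no model reports.

    Assumption of this sketch: higher values rank first.
    """
    known_keys = {key for model_scores in scores.values() for key in model_scores}
    valid_keys = [key for key in sort_by if key in known_keys]  # unknown keys are dropped
    return sorted(
        scores.items(),
        key=lambda item: tuple(item[1].get(key, float("-inf")) for key in valid_keys),
        reverse=True,
    )


scores = {
    "model_a": {"accuracy": 0.90, "train_time": 1.2},
    "model_b": {"accuracy": 0.95, "train_time": 3.4},
}
ranked = sort_scores(scores, ["not_a_metric", "accuracy"])
print([name for name, _ in ranked])
```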

AutoML.add_model(model)

Function for adding model in self.models

Parameters

model : estimator instance: model instance to add to self.models

AutoML.eval_models(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, scoring: str | Callable, **kwargs) → dict[str, dict]

Function to train and evaluate every model

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

scoring : str or callable (custom score)

str: metric used to evaluate the models

callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

**kwargs:

additional parameters from the child class for the evaluate method of the models
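A custom score function only has to follow the signature score_func(y, y_pred, **kwargs) given for scoring. The sketch below is one possible example in plain Python; the function name weighted_accuracy and the keyword positive_weight are hypothetical, and whether the library forwards extra keyword arguments to the function is not stated here.

```python
def weighted_accuracy(y, y_pred, **kwargs):
    """Accuracy in which samples of the positive class (label 1) weigh more.

    `positive_weight` is a hypothetical keyword argument used only in this sketch.
    """
    positive_weight = kwargs.get("positive_weight", 2.0)
    total = 0.0
    correct = 0.0
    for true_label, predicted_label in zip(y, y_pred):
        weight = positive_weight if true_label == 1 else 1.0
        total += weight
        if true_label == predicted_label:
            correct += weight
    return correct / total if total else 0.0


# Weights are 3, 1, 3, 1; the third sample is misclassified, so 5 of 8 weight units are correct.
score = weighted_accuracy([1, 0, 1, 0], [1, 0, 0, 0], positive_weight=3.0)
print(score)  # 0.625
```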

Returns

scores : dict[str, dict]: dictionary with the scores of every model, each stored as a dictionary

also saves metrics in self.scores

Notes

If you trigger a keyboard interrupt while eval_models is running, the interim result will be returned.
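The interrupt behaviour described in the note can be sketched as a loop that catches KeyboardInterrupt and returns whatever has been collected so far. This is a minimal illustration under assumptions, not the sam_ml source; evaluate_all and fake_evaluate are hypothetical names.

```python
def evaluate_all(models, evaluate):
    """Evaluate models one by one; return the finished scores if interrupted."""
    scores = {}
    try:
        for name, model in models.items():
            scores[name] = evaluate(model)
    except KeyboardInterrupt:
        # Scores of models that finished before the interrupt are kept.
        print("interrupted, returning interim result")
    return scores


def fake_evaluate(model):
    if model == "slow":
        raise KeyboardInterrupt  # stands in for the user pressing Ctrl+C
    return {"accuracy": 0.9}


interim = evaluate_all({"fast": "fast", "slow": "slow", "never": "never"}, fake_evaluate)
print(interim)  # only the model evaluated before the interrupt is present
```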

AutoML.eval_models_cv(X: DataFrame, y: Series, cv_num: int, small_data_eval: bool, custom_score: Callable | None, **kwargs) → dict[str, dict]

Function to run a cross validation on every model

Parameters

X, y : pd.DataFrame, pd.Series

Data to cross validate on

cv_num : int

number of different random splits (only used when small_data_eval=False)

small_data_eval : bool

if True, cross_validation_small_data will be used (one-vs-all evaluation). Otherwise, random split cross validation is used

custom_score : callable or None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

If None, no custom score is calculated and the key “custom_score” is absent from the returned dictionary.

**kwargs:

additional parameters from the child class for the cross validation methods of the models
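The difference between the two evaluation modes selected by small_data_eval can be sketched with index splits. The code below is an illustration only: the helper names, the test_fraction of 0.2, and the fixed seed are assumptions, and the actual splitting strategy inside sam_ml may differ.

```python
import random


def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) with exactly one held-out point per split."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]


def random_splits(n_samples, cv_num, test_fraction=0.2, seed=0):
    """Yield cv_num random train/test splits (test_fraction and seed are assumptions)."""
    rng = random.Random(seed)
    n_test = max(1, int(n_samples * test_fraction))
    for _ in range(cv_num):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


loo = list(leave_one_out_splits(5))      # one split per datapoint
rnd = list(random_splits(10, cv_num=3))  # cv_num splits regardless of dataset size
print(len(loo), len(rnd))  # 5 3
```

Leave-one-out needs as many model fits as there are datapoints, which is why it suits small datasets, whereas the cost of random splits is controlled by cv_num.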

Returns

scores : dict[str, dict]: dictionary with the scores of every model, each stored as a dictionary

also saves metrics in self.scores

Notes

If you trigger a keyboard interrupt while eval_models_cv is running, the interim result will be returned.

AutoML.find_best_model_mass_search(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, scoring: str | Callable, leave_loadbar: bool, save_results_path: str | None, **kwargs) → tuple[str, dict[str, float]]

Function to run a successive halving hyperparameter search for every model

It uses the warm_start parameter of the model and is a custom implementation. It is recommended as a fast method to narrow down different preprocessing steps and model combinations, but find_best_model_smac or randomCVsearch return better results.
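For readers unfamiliar with successive halving, the general idea can be sketched as follows. This is a generic, simplified version of the technique and not the library's own implementation: the function names, the halving factor of 2, and the toy scorer are assumptions. In a warm-start setting, a larger budget would correspond to continuing the training of the surviving candidates.

```python
def successive_halving(configs, score_at_budget, min_budget=1, factor=2):
    """Keep the best 1/factor of the configs each round while multiplying the budget."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda cfg: score_at_budget(cfg, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // factor)]
        budget *= factor
    return survivors[0]


def toy_score(cfg, budget):
    # Toy scorer: best at depth 8. The budget is ignored here; a real scorer
    # would train longer (for example via warm start) as the budget grows.
    return 1.0 - abs(cfg["depth"] - 8) / 128


configs = [{"depth": depth} for depth in (1, 2, 4, 8, 16, 32, 64, 128)]
best = successive_halving(configs, toy_score)
print(best)  # {'depth': 8}
```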

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train and optimise the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

n_trails : int

max number of parameter sets to test for each model

scoring : str or callable (custom score)

str: metric used to evaluate the models

callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

leave_loadbar : bool

whether the loading bar of the model training during the different splits stays visible after training (True: the loading bar remains visible)

save_results_path : str or None

path used for saving the results after each step. If None, no results are saved

**kwargs:

additional parameters from the child class for the train_warm_start, evaluate, and evaluate_score methods of the models

Returns

best_model_name : str: name of the best model in the search
score : dict[str, float]: scores of the best model

AutoML.find_best_model_randomCV(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, cv_num: int, scoring: str | Callable, small_data_eval: bool, leave_loadbar: bool, **kwargs) → dict[str, dict]

Function to run a random cross validation hyperparameter search for every model

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train and optimise the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

n_trails : int

max number of parameter sets to test

cv_num : int

number of different random splits (only used when small_data_eval=False)

scoring : str or callable (custom score)

str: metric used to evaluate the models

callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

small_data_eval : bool

if True, the model is trained on all datapoints except one, repeated for every datapoint (recommended for datasets with fewer than 150 datapoints)

leave_loadbar : bool

whether the loading bar of the randomCVsearch of each individual model stays visible after training (True: the loading bar remains visible)

**kwargs:

additional parameters from the child class for the randomCVsearch and evaluate methods of the models
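The role of n_trails in a random hyperparameter search can be illustrated with a generic sketch. It is not the randomCVsearch implementation: random_search, the parameter space, and the evaluate callback (which stands in for a cross-validated score) are all hypothetical.

```python
import random


def random_search(param_space, evaluate, n_trails, seed=0):
    """Sample n_trails random parameter sets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trails):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = evaluate(params)  # stand-in for a cross validation score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


param_space = {"depth": [2, 4, 8], "learning_rate": [0.01, 0.1]}


def toy_evaluate(params):
    # Toy objective with its optimum at depth=4, learning_rate=0.1.
    return -abs(params["depth"] - 4) - abs(params["learning_rate"] - 0.1)


best_params, best_score = random_search(param_space, toy_evaluate, n_trails=20)
print(best_params, best_score)
```

Because every trial is drawn independently, a larger n_trails raises the chance of hitting a good parameter set but never guarantees it.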

Returns

scores : dict[str, dict]: dictionary with the scores of every model, each stored as a dictionary

also saves metrics in self.scores

Notes

If you trigger a keyboard interrupt while randomCVsearch is running for a model, the interim result for this model will be used and the next model starts.

AutoML.find_best_model_smac(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, cv_num: int, scoring: str | Callable, small_data_eval: bool, walltime_limit_per_modeltype: int, smac_log_level: int, **kwargs) → dict[str, dict]

Function to run hyperparameter tuning with the SMAC library's HyperparameterOptimizationFacade for every model [only available in the sam_ml version with swig]

The smac_search method searches the hyperparameter space more “intelligently” than randomCVsearch and returns the best hyperparameter set. In addition to the n_trails parameter, it takes a walltime_limit parameter that defines the maximum time in seconds that the search may take.
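How a trial budget and a wall-time budget can interact is sketched below. This is not SMAC and not the smac_search method: budgeted_search is a hypothetical helper, and the propose callback merely stands in for SMAC's model-based suggestion of the next parameter set.

```python
import time


def budgeted_search(propose, evaluate, n_trails, walltime_limit):
    """Stop after n_trails evaluations or walltime_limit seconds, whichever comes first."""
    start = time.monotonic()
    best_params, best_score = None, float("-inf")
    trials_run = 0
    while trials_run < n_trails and time.monotonic() - start < walltime_limit:
        params = propose(trials_run)  # placeholder for an optimizer's suggestion
        score = evaluate(params)
        trials_run += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, trials_run


def propose(trial_index):
    return {"x": trial_index}


def evaluate(params):
    return -((params["x"] - 3) ** 2)  # toy objective with its optimum at x=3


best_params, best_score, trials_run = budgeted_search(propose, evaluate, n_trails=10, walltime_limit=60)
print(best_params, best_score, trials_run)  # {'x': 3} 0 10
```

With a generous time limit the trial count is the binding constraint; with a tight limit the search may end before all n_trails parameter sets have been tested.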

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train and optimise the models

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate the models

n_trails : int

max number of parameter sets to test for each model

cv_num : int

number of different random splits (only used when small_data_eval=False)

scoring : str or callable (custom score)

str: metric used to evaluate the models

callable: custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

small_data_eval : bool

if True, the model is trained on all datapoints except one, repeated for every datapoint (recommended for datasets with fewer than 150 datapoints)

walltime_limit_per_modeltype : int

the maximum time in seconds that SMAC is allowed to run for each model

smac_log_level : int

10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL (SMAC3 library log levels)

**kwargs:

additional parameters from the child class for the smac_search and evaluate methods of the models
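The integer values listed for smac_log_level coincide with the constants of Python's standard logging module, so the constants can be used in place of bare numbers. The snippet below only demonstrates that correspondence; the variable at the end is an illustrative choice, not a library default.

```python
import logging

# Print the standard logging constants next to their numeric values.
for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
    print(name, getattr(logging, name))

# Illustrative choice: 30 hides DEBUG and INFO messages.
smac_log_level = logging.WARNING
```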

Returns

scores : dict[str, dict]: dictionary with the scores of every model, each stored as a dictionary

also saves metrics in self.scores

abstract static AutoML.model_combs(kind: str) → list

Function for mapping string to set of models

Parameters

kind : str

which kind of model set to use:

“all”:
use all models
…

Returns

models : list: list of model instances

AutoML.output_scores_as_pd(sort_by: str | list[str], console_out: bool) → DataFrame

Function to output self.scores as pd.DataFrame

Parameters

sort_by : str or list[str]: key(s) to sort the scores by. Keys that are not in self.scores may also be provided; they are filtered out.
console_out : bool: whether the DataFrame is printed to the console

Returns

scores : pd.DataFrame: sorted DataFrame of self.scores

AutoML.remove_model(model_name: str)

Function for deleting model in self.models

Parameters

model_name : str: name of model in self.models