AutoML class
- class AutoML(models: str | list, vectorizer: str | sam_ml.data.preprocessing.embeddings.Embeddings_builder | None | list[str | sam_ml.data.preprocessing.embeddings.Embeddings_builder | None], scaler: str | sam_ml.data.preprocessing.scaler.Scaler | None | list[str | sam_ml.data.preprocessing.scaler.Scaler | None], selector: str | tuple[str, int] | sam_ml.data.preprocessing.feature_selection.Selector | None | list[str | tuple[str, int] | sam_ml.data.preprocessing.feature_selection.Selector | None], sampler: str | sam_ml.data.preprocessing.sampling.Sampler | sam_ml.data.preprocessing.sampling_pipeline.SamplerPipeline | None | list[str | sam_ml.data.preprocessing.sampling.Sampler | sam_ml.data.preprocessing.sampling_pipeline.SamplerPipeline | None])
Abstract Auto-ML parent class.
Parameters
- models : str or list
- vectorizer : str, Embeddings_builder, None, or list of these
- scaler : str, Scaler, None, or list of these
- selector : str, tuple[str, int], Selector, None, or list of these
- sampler : str, Sampler, SamplerPipeline, None, or list of these
Attributes
Note
If a list is provided for one or more of the preprocessing steps, every combination of model and preprocessing steps will be added as a pipeline.
Methods

Method | Description
---|---
_AutoML__finish_sound | little function to play a microwave sound
_AutoML__sort_dict | Function to sort a dict by a given list of keys
add_model | Function for adding a model to self.models
eval_models | Function to train and evaluate every model
eval_models_cv | Function to run a cross validation on every model
find_best_model_mass_search | Function to run a successive halving hyperparameter search for every model
find_best_model_randomCV | Function to run a random cross validation hyperparameter search for every model
find_best_model_smac | Function to run hyperparameter tuning with the SMAC library HyperparameterOptimizationFacade for every model [can only be used in the sam_ml version with swig]
model_combs | Function for mapping string to set of models
output_scores_as_pd | Function to output self.scores as pd.DataFrame
remove_model | Function for deleting a model from self.models
- static AutoML._AutoML__finish_sound()
little function to play a microwave sound
- static AutoML._AutoML__sort_dict(scores: dict, sort_by: list[str]) → DataFrame
Function to sort a dict by a given list of keys
Parameters
- scores : dict
dictionary with scores
- sort_by : list[str]
keys to sort the scores by. You can also provide keys that are not in scores; they will be filtered out.
Returns
- scores_df : pd.DataFrame
sorted dataframe of scores
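The sorting described above can be sketched in plain pandas. The function name and the descending sort order below are assumptions for illustration, not the library's actual implementation:

```python
import pandas as pd

def sort_scores(scores: dict, sort_by: list[str]) -> pd.DataFrame:
    # Build a DataFrame with one row per model from the nested score dict
    scores_df = pd.DataFrame(scores).transpose()
    # Drop requested sort keys that are not actual columns
    keys = [k for k in sort_by if k in scores_df.columns]
    return scores_df.sort_values(by=keys, ascending=False)

scores = {
    "LogisticRegression": {"accuracy": 0.91, "s_score": 0.88},
    "RandomForest": {"accuracy": 0.95, "s_score": 0.93},
}
# "not_a_metric" is silently filtered out, as documented
df = sort_scores(scores, ["accuracy", "not_a_metric"])
print(df.index[0])  # RandomForest
```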
- AutoML.add_model(model)
Function for adding a model to self.models
Parameters
- model : estimator instance
model instance to add to self.models
- AutoML.eval_models(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, scoring: str | Callable, **kwargs) → dict[str, dict]
Function to train and evaluate every model
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train the models
- x_test, y_test : pd.DataFrame, pd.Series
Data to evaluate the models
- scoring : str or callable (custom score)
metrics to evaluate the models
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
- **kwargs:
additional parameters from child-class for the evaluate method of the models
Returns
- scores : dict[str, dict]
dictionary with a score dictionary for every model
also saves metrics in self.scores
Notes
If you interrupt the run of eval_models from the keyboard (e.g. Ctrl+C), the interim results will be returned.
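A callable passed as scoring only needs the documented signature score_func(y, y_pred, **kwargs). A minimal sketch with a hand-rolled accuracy (the metric itself is arbitrary and just for illustration):

```python
# Hypothetical custom score with the documented signature
def score_func(y, y_pred, **kwargs):
    # Plain accuracy: fraction of matching labels
    correct = sum(1 for a, b in zip(y, y_pred) if a == b)
    return correct / len(y)

print(score_func([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Such a function would then be passed directly, e.g. eval_models(..., scoring=score_func).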
- AutoML.eval_models_cv(X: DataFrame, y: Series, cv_num: int, small_data_eval: bool, custom_score: Callable | None, **kwargs) → dict[str, dict]
Function to run a cross validation on every model
Parameters
- X, y : pd.DataFrame, pd.Series
Data to cross validate on
- cv_num : int
number of different random splits (only used when small_data_eval=False)
- small_data_eval : bool
if True, cross_validation_small_data will be used (one-vs-all evaluation). Otherwise, random split cross validation
- custom_score : callable or None
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
If None, no custom score will be calculated and the key "custom_score" will not exist in the returned dictionary.
- **kwargs:
additional parameters from child-class for the cross validation methods of the models
Returns
- scores : dict[str, dict]
dictionary with a score dictionary for every model
also saves metrics in self.scores
Notes
If you interrupt the run of eval_models_cv from the keyboard, the interim results will be returned.
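The one-vs-all evaluation used when small_data_eval=True amounts to leave-one-out cross validation: each datapoint is held out once while the model trains on all others. A sketch of the split logic (illustrative names, not the library's code):

```python
def leave_one_out_splits(n_samples: int):
    # Yield (train indices, test index) pairs: every sample is
    # the test set exactly once
    for i in range(n_samples):
        train_idx = [j for j in range(n_samples) if j != i]
        yield train_idx, [i]

splits = list(leave_one_out_splits(4))
print(len(splits))  # 4 splits for 4 datapoints
print(splits[0])    # ([1, 2, 3], [0])
```

This is why the approach is recommended only for small datasets: it requires one full training run per datapoint.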
- AutoML.find_best_model_mass_search(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, scoring: str | Callable, leave_loadbar: bool, save_results_path: str | None, **kwargs) → tuple[str, dict[str, float]]
Function to run a successive halving hyperparameter search for every model
It uses the warm_start parameter of the model and is the library's own implementation. Recommended as a fast method to narrow down different preprocessing steps and model combinations, but find_best_model_smac or randomCVsearch return better results.
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train and optimise the models
- x_test, y_test : pd.DataFrame, pd.Series
Data to evaluate the models
- n_trails : int
max number of parameter sets to test for each model
- scoring : str or callable (custom score)
metrics to evaluate the models
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
- leave_loadbar : bool
whether the loading bar of the model training during the different splits shall stay visible after training (True - load bar will still be visible)
- save_results_path : str or None
path to use for saving the results after each step. If None, no results will be saved
- **kwargs:
additional parameters from child-class for the train_warm_start, evaluate, and evaluate_score methods of the models
Returns
- best_model_name : str
name of the best model in search
- score : dict[str, float]
scores of the best model
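Successive halving in general works by starting with many candidate configurations on a small budget, keeping the best half each round, and doubling the budget for the survivors. A generic sketch of that scheme (an assumption about the standard technique, not sam_ml's exact implementation; names are illustrative):

```python
def successive_halving(candidates, evaluate, budget=1, rounds=3):
    for _ in range(rounds):
        # Score every remaining candidate at the current budget
        scored = sorted(candidates, key=lambda c: evaluate(c, budget), reverse=True)
        candidates = scored[: max(1, len(scored) // 2)]  # keep the best half
        budget *= 2                                      # survivors get more budget
    return candidates[0]

# Toy objective: a higher parameter value always scores better
best = successive_halving(candidates=[1, 5, 3, 4, 2, 8, 7, 6],
                          evaluate=lambda c, b: c * b)
print(best)  # 8
```

The warm_start mechanism mentioned above is what lets a surviving model continue training with a larger budget instead of starting from scratch.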
- AutoML.find_best_model_randomCV(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, cv_num: int, scoring: str | Callable, small_data_eval: bool, leave_loadbar: bool, **kwargs) → dict[str, dict]
Function to run a random cross validation hyperparameter search for every model
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train and optimise the models
- x_test, y_test : pd.DataFrame, pd.Series
Data to evaluate the models
- n_trails : int
max number of parameter sets to test
- cv_num : int
number of different random splits (only used when small_data_eval=False)
- scoring : str or callable (custom score)
metrics to evaluate the models
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
- small_data_eval : bool
if True, trains the model on all datapoints except one and repeats this for every datapoint (recommended for datasets with less than 150 datapoints)
- leave_loadbar : bool
whether the loading bar of the randomCVsearch of each individual model shall stay visible after training (True - load bar will still be visible)
- **kwargs:
additional parameters from child-class for the randomCVsearch and evaluate methods of the models
Returns
- scores : dict[str, dict]
dictionary with a score dictionary for every model
also saves metrics in self.scores
Notes
If you interrupt the randomCVsearch of a model from the keyboard, the interim result for this model will be used and the next model starts.
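A random hyperparameter search of this kind can be sketched as follows: draw up to n_trails random parameter sets from a grid and keep the best-scoring one. Function and parameter names here are illustrative, not sam_ml's internal randomCVsearch implementation:

```python
import random

def random_search(grid: dict, n_trails: int, evaluate, seed=42):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trails):
        # Draw one random value per hyperparameter
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"max_depth": [2, 4, 8], "n_estimators": [50, 100, 200]}
# Toy objective that simply prefers deeper trees
params, score = random_search(grid, n_trails=10, evaluate=lambda p: p["max_depth"])
print(params, score)
```

In the real method, evaluating a parameter set means running the cross validation described above rather than a toy objective.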
- AutoML.find_best_model_smac(x_train: DataFrame, y_train: Series, x_test: DataFrame, y_test: Series, n_trails: int, cv_num: int, scoring: str | Callable, small_data_eval: bool, walltime_limit_per_modeltype: int, smac_log_level: int, **kwargs) → dict[str, dict]
Function to run hyperparameter tuning with the SMAC library HyperparameterOptimizationFacade for every model [can only be used in the sam_ml version with swig]
The smac_search method searches your hyperparameter space more "intelligently" than randomCVsearch and returns the best hyperparameter set. In addition to the n_trails parameter, it also takes a walltime_limit parameter that defines the maximum time in seconds that the search will take.
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train and optimise the models
- x_test, y_test : pd.DataFrame, pd.Series
Data to evaluate the models
- n_trails : int
max number of parameter sets to test for each model
- cv_num : int
number of different random splits (only used when small_data_eval=False)
- scoring : str or callable (custom score)
metrics to evaluate the models
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
- small_data_eval : bool
if True, trains the model on all datapoints except one and repeats this for every datapoint (recommended for datasets with less than 150 datapoints)
- walltime_limit_per_modeltype : int
the maximum time in seconds that SMAC is allowed to run for each model
- smac_log_level : int
10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL (SMAC3 library log levels)
- **kwargs:
additional parameters from child-class for the smac_search and evaluate methods of the models
Returns
- scores : dict[str, dict]
dictionary with a score dictionary for every model
also saves metrics in self.scores
- abstract static AutoML.model_combs(kind: str) → list
Function for mapping string to set of models
Parameters
- kind : str
which kind of model set to use:
- "all":
use all models
…
Returns
- models : list
list of model instances
- AutoML.output_scores_as_pd(sort_by: str | list[str], console_out: bool) → DataFrame
Function to output self.scores as pd.DataFrame
Parameters
- sort_by : str or list[str]
key(s) to sort the scores by. You can also provide keys that are not in self.scores; they will be filtered out.
- console_out : bool
whether the DataFrame shall be printed out
Returns
- scores : pd.DataFrame
sorted DataFrame of self.scores