Model class
- class Model(model_object, model_name: str, model_type: str, grid: ConfigSpace.configuration_space.ConfigurationSpace)
Model parent class (abstract) - parent class object
Methods
Method | Description
---|---
_changed_parameters | Function to get parameters that differ from the default ones
_get_all_scores | Function to create multiple scores for given y_true-y_pred pairs
_get_score | Calculate a score for given y true and y prediction values
_make_cv_scores | Function to create a dictionary from the crossvalidation results
_make_scorer | Function to create a dictionary with scorers for the crossvalidation
_print_scores | Function to print out the values of a dictionary
cross_validation | Random split crossvalidation
cross_validation_small_data | One-vs-all cross validation for small datasets
evaluate | Function to create multiple scores with the predict function of the model
evaluate_score | Function to create a score with self.__get_score of the model
feature_importance | Function to generate a matplotlib plot of the top 45 feature importances of the model
fit | Function to fit the model
fit_warm_start | Function to warm_start fit the model
get_deepcopy | Function to create a deepcopy of the object
get_params | Function to get the parameters from the model object
get_random_config | Function to generate one grid configuration
get_random_configs | Function to generate grid configurations
load_model | Function to load a pickled model class object
predict | Function to predict with the predict method of the model object
predict_proba | Function to predict with the predict_proba method of the model object
randomCVsearch | Hyperparameter tuning with randomCVsearch
replace_grid | Function to replace self.grid
save_model | Function to pickle and save the class object
set_params | Function to set the parameters of the model object
smac_search | Hyperparameter tuning with the SMAC library HyperparameterOptimizationFacade [can only be used in the sam_ml version with swig]
train | Function to train the model
train_warm_start | Function to warm_start train the model
- Model._changed_parameters()
Function to get parameters that differ from the default ones
Returns
dictionary of model parameters that are different from the default values
- abstract Model._get_all_scores(y_test: Series, pred: list, custom_score: Callable, **kwargs) -> dict[str, float]
Function to create multiple scores for given y_true-y_pred pairs
Parameters
- y_test, pred : pd.Series, list
Data to evaluate model
- custom_score : callable or None
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
If None, no custom score will be calculated and the key "custom_score" will not appear in the returned dictionary.
- **kwargs:
additional parameters from the child class
Returns
- scores : dict
dictionary with score names as keys and score values as values
- abstract Model._get_score(scoring: str, y_test: Series, pred: list, **kwargs) -> float
Calculate a score for given y true and y prediction values
Parameters
- scoring : {"accuracy", "precision", "recall", "s_score", "l_score"} or callable (custom score), default="accuracy"
metrics to evaluate the model;
a custom score function (or loss function) must have the signature score_func(y, y_pred, **kwargs)
- y_test, pred : pd.Series, pd.Series
Data to evaluate model
Returns
- score : float
metrics score value
- abstract Model._make_cv_scores(score: dict, custom_score: Callable | None) -> dict[str, float]
Function to create a dictionary from the crossvalidation results
Parameters
- score : dict
crossvalidation average column results
- custom_score : callable or None
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
If None, no custom score will be calculated and the key "custom_score" will not appear in the returned dictionary.
Returns
- cv_scores : dict
restructured dictionary
- abstract Model._make_scorer(custom_score: Callable | None, **kwargs) -> dict[str, Callable]
Function to create a dictionary with scorers for the crossvalidation
Parameters
- custom_score : callable or None
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
If None, no custom score will be calculated and the key "custom_score" will not appear in the returned dictionary.
- **kwargs:
additional parameters from the child class
Returns
- scorer : dict[str, Callable]
dictionary with scorer functions
- Model._print_scores(scores: dict, y_test: Series, pred: list)
Function to print out the values of a dictionary
Parameters
- scores: dict
dictionary with score names and values
- y_test, pred : pd.Series, list
Data to evaluate model
Returns
key-value pairs in console, format:
key1: value1
key2: value2
…
- Model.cross_validation(X: DataFrame, y: Series, cv_num: int, console_out: bool, custom_score: Callable | None, **kwargs) -> dict[str, float]
Random split crossvalidation
Parameters
- X, y : pd.DataFrame, pd.Series
Data to cross validate on
- cv_num : int
number of different random splits
- console_out : bool
whether the result dataframe with the scores of the different runs shall be printed
- custom_score : callable or None
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
If None, no custom score will be calculated and the key "custom_score" will not appear in the returned dictionary.
- **kwargs:
additional parameters from the child class for the make_scorer method
Returns
- scores : dict
dictionary in the format of the self._make_cv_scores function
The scores are also saved in self.cv_scores.
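An illustrative sketch (not from the library documentation): it reuses the iris data and the LR classifier that appear in the other examples on this page and passes a custom score with the documented signature score_func(y, y_pred, **kwargs); the parameter values are explicit choices, not library defaults.
>>> # load example data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> from sklearn.metrics import f1_score
>>> from sam_ml.models.classifier import LR
>>>
>>> df = load_iris()
>>> X, y = pd.DataFrame(df.data, columns=df.feature_names), pd.Series(df.target)
>>>
>>> # custom score with the signature score_func(y, y_pred, **kwargs)
>>> def macro_f1(y, y_pred, **kwargs):
...     return f1_score(y, y_pred, average="macro")
>>>
>>> model = LR()
>>> scores = model.cross_validation(X, y, cv_num=5, console_out=False, custom_score=macro_f1)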
- Model.cross_validation_small_data(X: DataFrame, y: Series, leave_loadbar: bool, console_out: bool, custom_score: Callable | None, **kwargs) -> dict[str, float]
One-vs-all cross validation for small datasets
In the cross_validation_small_data-method, the model will be trained on all datapoints except one and then tested on this last one. This will be repeated for all datapoints so that we have our predictions for all datapoints.
Advantage: optimal use of information for training
Disadvantage: long train time
This concept is very useful for small datasets (recommended: datapoints < 150) because the long train time is still not too long and especially with a small amount of information for the model, it is important to use all the information one has for the training.
Parameters
- X, y : pd.DataFrame, pd.Series
Data to cross validate on
- leave_loadbar : bool
whether the loading bar of the training shall stay visible after training (True - load bar will still be visible)
- console_out : bool
whether the result of the different scores and a classification_report shall be printed into the console
- custom_score : callable or None
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
If None, no custom score will be calculated and the key "custom_score" will not appear in the returned dictionary.
- **kwargs:
additional parameters from the child class for the _get_all_scores method
Returns
- scores : dict
dictionary in the format of the self._get_all_scores function
The scores are also saved in self.cv_scores.
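An illustrative sketch under the same assumptions as the cross_validation sketch above (X and y are the iris features and labels); the slice to 100 rows only mimics a small dataset.
>>> X_small, y_small = X.iloc[:100], y.iloc[:100]
>>>
>>> model = LR()
>>> scores = model.cross_validation_small_data(X_small, y_small, leave_loadbar=False, console_out=False, custom_score=None)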
- Model.evaluate(x_test: DataFrame, y_test: Series, console_out: bool, custom_score: Callable, **kwargs) -> dict[str, float]
Function to create multiple scores with the predict function of the model
Parameters
- x_test, y_test : pd.DataFrame, pd.Series
Data to evaluate model
- console_out : bool
whether the result of the different scores and a classification_report shall be printed into the console
- custom_score : callable or None
custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)
If None, no custom score will be calculated and the key "custom_score" will not appear in the returned dictionary.
- **kwargs:
additional parameters from the child class for the _get_all_scores method
Returns
- scores : dict
dictionary in the format of the self._get_all_scores function
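An illustrative sketch, assuming X and y are the iris features and labels from the cross_validation sketch above; the train/test split is an arbitrary choice for demonstration.
>>> from sklearn.model_selection import train_test_split
>>>
>>> x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=42)
>>> model = LR()
>>> model.train(x_train, y_train, console_out=False)
>>> scores = model.evaluate(x_test, y_test, console_out=True, custom_score=None)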
- Model.evaluate_score(scoring: str | Callable, x_test: DataFrame, y_test: Series, **kwargs) -> float
Function to create a score with self.__get_score of the model
Parameters
- scoring : str or callable (custom score)
metrics to evaluate the model;
a custom score function (or loss function) must have the signature score_func(y, y_pred, **kwargs)
- x_test, y_test : pd.DataFrame, pd.Series
Data for evaluating the model
- **kwargs:
additional parameters from the child class for the _get_score method
Returns
- score : float
metrics score value
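An illustrative sketch, continuing the trained model and test split from the evaluate sketch directly above; "accuracy" is one of the scoring options listed for _get_score.
>>> score = model.evaluate_score("accuracy", x_test, y_test)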
- Model.feature_importance() -> show
Function to generate a matplotlib plot of the top 45 feature importances of the model. You can only use this method after training the model.
Returns
plt.show object
Examples
>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import load_iris
>>> df = load_iris()
>>> X, y = pd.DataFrame(df.data, columns=df.feature_names), pd.Series(df.target)
>>>
>>> # train and plot features of model
>>> from sam_ml.models.classifier import LR
>>>
>>> model = LR()
>>> model.train(X, y)
>>> model.feature_importance()
- Model.fit(x_train: DataFrame, y_train: Series, **kwargs)
Function to fit the model
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train model
- **kwargs:
additional parameters from the child class for the fit method
Returns
- self : estimator instance
Estimator instance
- Model.fit_warm_start(x_train: DataFrame, y_train: Series, **kwargs)
Function to warm_start fit the model
This function only differs from the train method for pipeline objects (with preprocessing): for those, it trains the preprocessing steps only the first time and afterwards just uses them to preprocess.
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train model
- **kwargs:
additional parameters from the child class for the fit method
Returns
- self : estimator instance
Estimator instance
- Model.get_deepcopy()
Function to create a deepcopy of object
Returns
- self : estimator instance
deepcopy of estimator instance
- Model.get_params(deep: bool = True) -> dict
Function to get the parameters from the model object
Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained sub-objects that are estimators
Returns
- params: dict
parameter names mapped to their values
- Model.get_random_config() -> dict
Function to generate one grid configuration
Returns
- config : dict
dictionary of random parameter configuration from grid
Examples
>>> from sam_ml.models.classifier import LR
>>>
>>> model = LR()
>>> model.get_random_config()
{'C': 0.31489116479568624, 'penalty': 'elasticnet', 'solver': 'saga', 'l1_ratio': 0.6026718993550663}
- Model.get_random_configs(n_trails: int) -> list[dict]
Function to generate grid configurations
Parameters
- n_trails : int
number of grid configurations
Returns
- configs : list
list with sets of random parameters from the grid
Notes
duplicates are filtered out -> the result can contain fewer than n_trails configurations
Examples
>>> from sam_ml.models.classifier import LR
>>>
>>> model = LR()
>>> model.get_random_configs(3)
[Configuration(values={
    'C': 1.0,
    'penalty': 'l2',
    'solver': 'lbfgs',
}), Configuration(values={
    'C': 2.5378155082656657,
    'penalty': 'l2',
    'solver': 'saga',
}), Configuration(values={
    'C': 2.801635158716261,
    'penalty': 'l2',
    'solver': 'lbfgs',
})]
- static Model.load_model(path: str)
Function to load a pickled model class object
Parameters
- path : str
path to the saved model with suffix '.pkl'
Returns
- model : estimator instance
estimator instance
- Model.predict(x_test: DataFrame) -> list
Function to predict with the predict method of the model object
Parameters
- x_test : pd.DataFrame
Data for prediction
Returns
- prediction : list
list with predicted class numbers for data
- Model.predict_proba(x_test: DataFrame) -> ndarray
Function to predict with the predict_proba method of the model object
Parameters
- x_test : pd.DataFrame
Data for prediction
Returns
- prediction : np.ndarray
np.ndarray with probability for every class per datapoint
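An illustrative sketch, assuming the iris X and y and the trained LR model from the sketches above; predicting on the training data here is only meant to show the call mechanics.
>>> model = LR()
>>> model.train(X, y, console_out=False)
>>> predictions = model.predict(X)
>>> probabilities = model.predict_proba(X)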
- Model.randomCVsearch(x_train: DataFrame, y_train: Series, n_trails: int, cv_num: int, scoring: str | Callable, small_data_eval: bool, leave_loadbar: bool, **kwargs) -> tuple[dict, float]
Hyperparameter tuning with randomCVsearch
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to cross validate on
- n_trails : int
max number of parameter sets to test
- cv_num : int
number of different random splits
- scoring : str or callable (custom score)
metrics to evaluate the model;
a custom score function (or loss function) must have the signature score_func(y, y_pred, **kwargs)
- small_data_eval : bool
if True: trains the model on all datapoints except one and does this for all datapoints (recommended for datasets with less than 150 datapoints)
- leave_loadbar : bool
whether the loading bar of the different parameter sets shall stay visible after training (True - load bar will still be visible)
- **kwargs:
additional parameters from the child class for the cross validation methods
Returns
- best_hyperparameters : dict
best hyperparameter set
- best_score : float
the score of the best hyperparameter set
Notes
if you interrupt randomCVsearch with the keyboard during the run, the interim result will be returned
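An illustrative sketch, assuming the x_train and y_train split from the evaluate sketch above; the parameter values are arbitrary, and applying the best set via set_params and retraining is a plausible follow-up, not a prescribed workflow.
>>> model = LR()
>>> best_hyperparameters, best_score = model.randomCVsearch(
...     x_train, y_train, n_trails=10, cv_num=5, scoring="accuracy",
...     small_data_eval=False, leave_loadbar=False)
>>> model.set_params(**best_hyperparameters)
>>> model.train(x_train, y_train, console_out=False)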
- Model.replace_grid(new_grid: ConfigurationSpace)
Function to replace self.grid
See ConfigurationSpace documentation.
Parameters
- new_grid : ConfigurationSpace
new grid to replace the old one with
Returns
changes self.grid variable
Examples
>>> from ConfigSpace import ConfigurationSpace, Categorical, Float
>>> from sam_ml.models.classifier import LDA
>>>
>>> model = LDA()
>>> new_grid = ConfigurationSpace(
...     seed=42,
...     space={
...         "solver": Categorical("solver", ["lsqr", "eigen"]),
...         "shrinkage": Float("shrinkage", (0, 0.5)),
...     })
>>> model.replace_grid(new_grid)
- Model.save_model(path: str, only_estimator: bool = False)
Function to pickle and save the class object
Parameters
- path : str
path to save the model with suffix '.pkl'
- only_estimator : bool, default=False
If True, only the estimator of the class object will be saved
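An illustrative sketch of saving and loading; the file name is an arbitrary placeholder, and load_model is called on the class (here the LR child class from the examples above) since it is a static method.
>>> model = LR()
>>> model.save_model("lr_model.pkl")
>>> loaded_model = LR.load_model("lr_model.pkl")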
- Model.set_params(**params)
Function to set the parameter of the model object
Parameters
- **params : dict
Estimator parameters
Returns
- self : estimator instance
Estimator instance
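An illustrative sketch; C and penalty are assumed to be parameters of the logistic regression wrapped by the LR class used in the other examples on this page.
>>> model = LR()
>>> model.set_params(C=0.5, penalty="l2")
>>> params = model.get_params(deep=True)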
- Model.smac_search(x_train: DataFrame, y_train: Series, scoring: str | Callable, n_trails: int, cv_num: int, small_data_eval: bool, walltime_limit: int, log_level: int, **kwargs) -> Configuration
Hyperparameter tuning with the SMAC library HyperparameterOptimizationFacade [can only be used in the sam_ml version with swig]
The smac_search method searches your hyperparameter space more "intelligently" than randomCVsearch and returns the best hyperparameter set. In addition to the n_trails parameter, it also takes a walltime_limit parameter that defines the maximum time in seconds that the search may take.
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to cross validate on
- scoring : str or callable (custom score)
metrics to evaluate the model;
a custom score function (or loss function) must have the signature score_func(y, y_pred, **kwargs)
- n_trails : int
max number of parameter sets to test
- cv_num : int
number of different random splits
- small_data_eval : bool
if True: trains the model on all datapoints except one and does this for all datapoints (recommended for datasets with less than 150 datapoints)
- walltime_limit : int
the maximum time in seconds that SMAC is allowed to run
- log_level : int
10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL (SMAC3 library log levels)
- **kwargs:
additional parameters from the child class for the cross validation methods
Returns
- incumbent : ConfigSpace.Configuration
ConfigSpace.Configuration with the best hyperparameters (can be used like a dict)
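An illustrative sketch that only works in the sam_ml version with swig, as noted above; it assumes the x_train and y_train split from the evaluate sketch, the parameter values are arbitrary, and unpacking the returned Configuration like a dict relies on the note on the return value.
>>> model = LR()
>>> incumbent = model.smac_search(
...     x_train, y_train, scoring="accuracy", n_trails=20, cv_num=5,
...     small_data_eval=False, walltime_limit=600, log_level=30)
>>> model.set_params(**incumbent)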
- Model.train(x_train: DataFrame, y_train: Series, console_out: bool = True, **kwargs) -> tuple[float, str]
Function to train the model
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train model
- console_out : bool, default=True
whether the score and time shall be printed out
- **kwargs:
additional parameters from the child class for the evaluate_score method
Returns
- train_score : float
train score value
- train_time : str
train time in format: "0:00:00" (hours:minutes:seconds)
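An illustrative sketch, assuming the iris X and y from the cross_validation sketch above.
>>> model = LR()
>>> train_score, train_time = model.train(X, y, console_out=False)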
- Model.train_warm_start(x_train: DataFrame, y_train: Series, console_out: bool = True, **kwargs) -> tuple[float, str]
Function to warm_start train the model
This function only differs from the train method for pipeline objects (with preprocessing): for those, it trains the preprocessing steps only the first time and afterwards just uses them to preprocess.
Parameters
- x_train, y_train : pd.DataFrame, pd.Series
Data to train model
- console_out : bool, default=True
whether the score and time shall be printed out
- **kwargs:
additional parameters from the child class for the evaluate_score method
Returns
- train_score : float
train score value
- train_time : str
train time in format: "0:00:00" (hours:minutes:seconds)