Regressor class

class Regressor(model_object, model_name: str, model_type: str, grid: ConfigSpace.configuration_space.ConfigurationSpace)

Regressor parent class (child class of Model)

Parameters

model_object : regressor object

model with ‘fit’, ‘predict’, ‘set_params’, and ‘get_params’ methods (see sklearn API)

model_name : str

name of the model

model_type : str

kind of estimator (e.g. ‘RFR’ for RandomForestRegressor)

grid : ConfigurationSpace

hyperparameter grid for the model
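
For illustration, a minimal sketch of constructing a Regressor around a scikit-learn estimator (direct instantiation of this parent class is an assumption; in practice you would usually use a child class such as RFR):

>>> from ConfigSpace import ConfigurationSpace, Float
>>> from sklearn.linear_model import ElasticNet
>>>
>>> grid = ConfigurationSpace(space={"alpha": Float("alpha", (0.01, 10.0))})
>>> model = Regressor(ElasticNet(), "ElasticNet", "EN", grid)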

Attributes

cv_scores : dict[str, float]

dictionary with cross validation results

feature_names : list[str]

names of all the features that the model saw during training. Is empty if the model has not been fitted yet.

grid : ConfigurationSpace

hyperparameter tuning grid of the model

model : model object

model with ‘fit’, ‘predict’, ‘set_params’, and ‘get_params’ methods (see sklearn API)

model_name : str

name of the model. Used in loading bars and dictionaries as the identifier of the model

model_type : str

kind of estimator (e.g. ‘RFR’ for RandomForestRegressor)

rCVsearch_results : pd.DataFrame or None

results from randomCV hyperparameter tuning. Is None if randomCVsearch has not been used yet.

train_score : float

train score value

train_time : str

train time in format: “0:00:00” (hours:minutes:seconds)

Methods

_changed_parameters : Function to get parameters that differ from the default ones
_get_all_scores : Calculate r2, rmse, d2_tweedie, and optional custom_score metrics
_get_score : Calculate a score for given y true and y prediction values
_make_cv_scores : Function to create a dictionary from the cross validation results
_make_scorer : Function to create a dictionary with scorers for the cross validation
_print_scores : Function to print out the values of a dictionary
cross_validation : Random split cross validation
cross_validation_small_data : One-vs-all cross validation for small datasets
evaluate : Function to create multiple scores with the predict function of the model
evaluate_score : Function to create a score with the predict function of the model
feature_importance : Function to generate a matplotlib plot of the top 45 feature importances of the model
fit : Function to fit the model
fit_warm_start : Function to warm_start fit the model
get_deepcopy : Function to create a deepcopy of the object
get_params : Function to get the parameters of the model object
get_random_config : Function to generate one grid configuration
get_random_configs : Function to generate grid configurations
load_model : Function to load a pickled model class object
predict : Function to predict with the predict method of the model object
predict_proba : Function to predict with the predict_proba method of the model object
randomCVsearch : Hyperparameter tuning with randomCVsearch
replace_grid : Function to replace self.grid
save_model : Function to pickle and save the class object
set_params : Function to set the parameters of the model object
smac_search : Hyperparameter tuning with the SMAC library HyperparameterOptimizationFacade [can only be used in the sam_ml version with swig]
train : Function to train the model
train_warm_start : Function to warm_start train the model

Regressor._changed_parameters()

Function to get parameters that differ from the default ones

Returns

dictionary of model parameters that differ from the default values

Regressor._get_all_scores(y_test: Series, pred: list, custom_score: Callable[[list[float], list[float]], float] | None) -> dict[str, float]

Calculate r2, rmse, d2_tweedie, and optional custom_score metrics

Parameters

y_test, pred : pd.Series, pd.Series

Data to evaluate model

custom_score : callable or None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs); see the sketch at the end of this entry

If None, no custom score will be calculated and the key “custom_score” will not exist in the returned dictionary.

Returns

scores : dict

dictionary of format:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …,}

or if custom_score != None:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …, ‘custom_score’: …,}

Notes

d2_tweedie is only defined for y_test >= 0 and y_pred > 0 values. Otherwise, d2_tweedie is set to -1.
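
Examples

A hedged sketch of a valid custom_score function (the name mae is arbitrary; assumes a trained model and test data as in the evaluate example below):

>>> from sklearn.metrics import mean_absolute_error
>>>
>>> def mae(y, y_pred, **kwargs):  # matches the expected signature score_func(y, y_pred, **kwargs)
...     return mean_absolute_error(y, y_pred)
>>>
>>> scores = model.evaluate(x_test, y_test, custom_score=mae)  # result gains the key 'custom_score'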

Regressor._get_score(scoring: Literal['r2', 'rmse', 'd2_tweedie'] | Callable[[list[float], list[float]], float], y_test: Series, pred: list) -> float

Calculate a score for given y true and y prediction values

Parameters

scoring : {“r2”, “rmse”, “d2_tweedie”} or callable (custom score)

metrics to evaluate the models

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

y_test, pred : pd.Series, pd.Series

Data to evaluate model

Returns

score : float

metrics score value

Regressor._make_cv_scores(score: dict, custom_score: Callable[[list[float], list[float]], float] | None) -> dict[str, float]

Function to create a dictionary from the cross validation results

Parameters

score : dict

cross validation average column results

custom_score : callable or None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

If None, no custom score will be calculated and the key “custom_score” will not exist in the returned dictionary.

Returns

cv_scores : dict

restructured dictionary

Regressor._make_scorer(y_values: Series, custom_score: Callable[[list[float], list[float]], float] | None) -> dict[str, Callable]

Function to create a dictionary with scorers for the cross validation

Parameters

y_values : pd.Series

y data for testing if d2_tweedie is allowed

custom_score : callable or None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

If None, no custom score will be calculated and the key “custom_score” will not exist in the returned dictionary.

Returns

scorer : dict[str, Callable]

dictionary with scorer functions

Regressor._print_scores(scores: dict, y_test: Series, pred: list)

Function to print out the values of a dictionary

Parameters

scores : dict

dictionary with score names and values

y_test, pred : pd.Series, list

Data to evaluate model

Returns

key-value pairs in console, format:

key1: value1

key2: value2

Regressor.cross_validation(X: DataFrame, y: Series, cv_num: int = 10, console_out: bool = True, custom_score: Callable[[list[float], list[float]], float] | None = None) -> dict[str, float]

Random split crossvalidation

Parameters

X, y : pd.DataFrame, pd.Series

Data to cross validate on

cv_num : int, default=10

number of different random splits

console_out : bool, default=True

whether the result dataframe with the scores of the different runs should be printed

custom_score : callable or None, default=None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

If None, no custom score will be calculated and the key “custom_score” will not exist in the returned dictionary.

Returns

scores : dict

dictionary of format:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …, ‘train_time’: …, ‘train_score’: …,}

or if custom_score != None:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …, ‘train_time’: …, ‘train_score’: …, ‘custom_score’: …,}

The scores are also saved in self.cv_scores.

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # cross validate model
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> scores = model.cross_validation(X, y, cv_num=3)

                            0          1          2         average
fit_time                    0.772634   0.903580   0.769893  0.815369
score_time                  0.097742   0.126724   0.108220  0.110895
test_r2 score               0.930978   0.935554   0.950584  0.939039
train_r2 score              0.992086   0.992418   0.991672  0.992059
test_rmse                   13.122513  12.076931  10.936810 12.045418
train_rmse                  4.306834   4.318027   4.457605  4.360822
test_d2 tweedie score       0.916618   0.909032   0.919350  0.915000
train_d2 tweedie score      0.982802   0.983685   0.983286  0.983257

Regressor.cross_validation_small_data(X: DataFrame, y: Series, leave_loadbar: bool = True, console_out: bool = True, custom_score: Callable[[list[float], list[float]], float] | None = None) -> dict[str, float]

One-vs-all cross validation for small datasets

In the cross_validation_small_data method, the model is trained on all datapoints except one and then tested on that remaining one. This is repeated for every datapoint so that we end up with predictions for all datapoints.

Advantage: optimal use of information for training

Disadvantage: long train time

This concept (also known as leave-one-out cross validation) is very useful for small datasets (recommended: fewer than 150 datapoints): the long train time stays manageable, and with little information available to the model, it is important to use all the information one has for training.

Parameters

X, y : pd.DataFrame, pd.Series

Data to cross validate on

leave_loadbar : bool, default=True

whether the loading bar of the training should remain visible after training (True: load bar stays visible)

console_out : bool, default=True

whether the results of the different scores should be printed to the console

custom_score : callable or None, default=None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

If None, no custom score will be calculated and the key “custom_score” will not exist in the returned dictionary.

Returns

scores : dict

dictionary of format:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …, ‘train_time’: …, ‘train_score’: …,}

or if custom_score != None:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …, ‘train_time’: …, ‘train_score’: …, ‘custom_score’: …,}

The scores are also saved in self.cv_scores.

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=150, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # cross validate model
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> scores = model.cross_validation_small_data(X, y)
r2: 0.5914164661854215
rmse: 50.2870203230133
d2_tweedie: 0.58636121702529
train_time: 0:00:00
train_score: 0.9425178468662095

Regressor.evaluate(x_test: DataFrame, y_test: Series, console_out: bool = True, custom_score: Callable[[list[float], list[float]], float] | None = None) -> dict[str, float]

Function to create multiple scores with the predict function of the model

Parameters

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate model

console_out : bool, default=True

whether the results of the different scores should be printed to the console

custom_score : callable or None, default=None

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

If None, no custom score will be calculated and the key “custom_score” will not exist in the returned dictionary.

Returns

scores : dict

dictionary of format:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …,}

or if custom_score != None:

{‘r2’: …, ‘rmse’: …, ‘d2_tweedie’: …, ‘custom_score’: …,}

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>> x_train, x_test, y_train, y_test = train_test_split(X,y, train_size=0.80, random_state=42)
>>>
>>> # train and evaluate model
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> model.train(x_train, y_train)
>>> scores = model.evaluate(x_test, y_test)
Train score: 0.9938023719617127 - Train time: 0:00:01
r2: 0.9471767309072388
rmse: 11.46914444113609
d2_tweedie: 0.9214227488752569

Regressor.evaluate_score(x_test: DataFrame, y_test: Series, scoring: Literal['r2', 'rmse', 'd2_tweedie'] | Callable[[list[float], list[float]], float] = 'r2') -> float

Function to create a score with the predict function of the model

Parameters

x_test, y_test : pd.DataFrame, pd.Series

Data to evaluate model

scoring : {“r2”, “rmse”, “d2_tweedie”} or callable (custom score), default=”r2”

metrics to evaluate the models

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

Returns

score : float

metrics score value

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>> x_train, x_test, y_train, y_test = train_test_split(X,y, train_size=0.80, random_state=42)
>>>
>>> # train and evaluate model
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> model.fit(x_train, y_train)
>>> rmse = model.evaluate_score(x_test, y_test, scoring="rmse")
>>> print(f"rmse: {rmse}")
rmse: 11.46914444113609

Regressor.feature_importance() -> show

Function to generate a matplotlib plot of the top 45 feature importances of the model. This method can only be used after the model has been trained.

Returns

plt.show object

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # train and plot feature importances of model
>>> from sam_ml.models.regressor import RFR
>>>
>>> model = RFR()
>>> model.train(X, y)
>>> model.feature_importance()

Regressor.fit(x_train: DataFrame, y_train: Series, **kwargs)

Function to fit the model

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train model

**kwargs:

additional parameters from child-class for fit method

Returns

self : estimator instance

Estimator instance
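
Examples

A short sketch following the data setup used in the other examples on this page:

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # fit model
>>> from sam_ml.models.regressor import RFR
>>>
>>> model = RFR()
>>> model.fit(X, y)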

Regressor.fit_warm_start(x_train: DataFrame, y_train: Series, **kwargs)

Function to warm_start fit the model

This function only differs from the fit method for pipeline objects (with preprocessing). For pipeline objects, it trains the preprocessing steps only the first time and afterwards only uses them to preprocess.

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train model

**kwargs:

additional parameters from child-class for fit method

Returns

self : estimator instance

Estimator instance

Regressor.get_deepcopy()

Function to create a deepcopy of the object

Returns

self : estimator instance

deepcopy of estimator instance
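
Examples

A short sketch (assuming RFR as in the other examples on this page):

>>> from sam_ml.models.regressor import RFR
>>>
>>> model = RFR()
>>> model_copy = model.get_deepcopy()  # independent copy; changing model_copy does not affect model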

Regressor.get_params(deep: bool = True) -> dict

Function to get the parameters of the model object

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained sub-objects that are estimators

Returns

params : dict

parameter names mapped to their values
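
Examples

A short usage sketch (assuming RFR as in the other examples on this page):

>>> from sam_ml.models.regressor import RFR
>>>
>>> model = RFR()
>>> params = model.get_params()  # inspect the current hyperparameters of the wrapped model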

Regressor.get_random_config() -> dict

Function to generate one grid configuration

Returns

config : dict

dictionary of random parameter configuration from grid

Examples

>>> from sam_ml.models.classifier import LR
>>> 
>>> model = LR()
>>> model.get_random_config()
{'C': 0.31489116479568624,
'penalty': 'elasticnet',
'solver': 'saga',
'l1_ratio': 0.6026718993550663}

Regressor.get_random_configs(n_trails: int) -> list[dict]

Function to generate grid configurations

Parameters

n_trails : int

number of grid configurations

Returns

configs : list

list with sets of random parameters from the grid

Notes

duplicates are filtered out, so the result can contain fewer than n_trails configurations

Examples

>>> from sam_ml.models.classifier import LR
>>> 
>>> model = LR()
>>> model.get_random_configs(3)
[Configuration(values={
    'C': 1.0,
    'penalty': 'l2',
    'solver': 'lbfgs',
}),
Configuration(values={
    'C': 2.5378155082656657,
    'penalty': 'l2',
    'solver': 'saga',
}),
Configuration(values={
    'C': 2.801635158716261,
    'penalty': 'l2',
    'solver': 'lbfgs',
})]

static Regressor.load_model(path: str)

Function to load a pickled model class object

Parameters

path : str

path to the saved model file (with suffix ‘.pkl’)

Returns

model : estimator instance

estimator instance

Regressor.predict(x_test: DataFrame) -> list

Function to predict with the predict method of the model object

Parameters

x_test : pd.DataFrame

Data for prediction

Returns

prediction : list

list with predicted values for the data
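
Examples

A short sketch, assuming train/test data prepared as in the evaluate example above:

>>> from sam_ml.models.regressor import RFR
>>>
>>> model = RFR()
>>> model.train(x_train, y_train)
>>> predictions = model.predict(x_test)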

Regressor.predict_proba(x_test: DataFrame) -> ndarray

Function to predict with the predict_proba method of the model object

Parameters

x_test : pd.DataFrame

Data for prediction

Returns

prediction : np.ndarray

np.ndarray with the probability for every class per datapoint

Regressor.randomCVsearch(x_train: DataFrame, y_train: Series, n_trails: int = 10, cv_num: int = 5, scoring: Literal['r2', 'rmse', 'd2_tweedie'] | Callable[[list[float], list[float]], float] = 'r2', small_data_eval: bool = False, leave_loadbar: bool = True) -> tuple[dict, float]

Hyperparameter tuning with randomCVsearch

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to cross validate on

n_trails : int, default=10

max number of parameter sets to test

cv_num : int, default=5

number of different random splits

scoring : {“r2”, “rmse”, “d2_tweedie”} or callable (custom score), default=”r2”

metrics to evaluate the models

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

small_data_eval : bool, default=False

if True: trains the model on all datapoints except one and does this for all datapoints (recommended for datasets with fewer than 150 datapoints)

leave_loadbar : bool, default=True

whether the loading bars of the different parameter sets should remain visible after training (True: load bar stays visible)

Returns

best_hyperparameters : dict

best hyperparameter set

best_score : float

the score of the best hyperparameter set

Notes

if you interrupt randomCVsearch with a keyboard interrupt, the interim result will be returned

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # use randomCVsearch
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> best_hyperparam, best_score = model.randomCVsearch(X, y, n_trails=20, cv_num=5, scoring="r2")
>>> print(f"best hyperparameters: {best_hyperparam}, best score: {best_score}")
best hyperparameters: {'bootstrap': True, 'criterion': 'friedman_mse', 'max_depth': 9, 'min_samples_leaf': 4, 'min_samples_split': 7, 'min_weight_fraction_leaf': 0.015714592843367126, 'n_estimators': 117}, best score: 0.6880857784416011
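
A common follow-up (a sketch, not part of the output above) is to apply the found hyperparameters with set_params and retrain:

>>> model.set_params(**best_hyperparam)
>>> model.train(X, y)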

Regressor.replace_grid(new_grid: ConfigurationSpace)

Function to replace self.grid

See ConfigurationSpace documentation.

Parameters

new_grid : ConfigurationSpace

new grid to replace the old one with

Returns

changes self.grid variable

Examples

>>> from ConfigSpace import ConfigurationSpace, Categorical, Float
>>> from sam_ml.models.classifier import LDA
>>>
>>> model = LDA()
>>> new_grid = ConfigurationSpace(
...     seed=42,
...     space={
...         "solver": Categorical("solver", ["lsqr", "eigen"]),
...         "shrinkage": Float("shrinkage", (0, 0.5)),
...     })
>>> model.replace_grid(new_grid)

Regressor.save_model(path: str, only_estimator: bool = False)

Function to pickle and save the class object

Parameters

path : str

path to save the model with suffix ‘.pkl’

only_estimator : bool, default=False

If True, only the estimator of the class object will be saved
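
Examples

A hedged round-trip sketch combining save_model and load_model (the file path is arbitrary; assumes X, y prepared as in the other examples):

>>> from sam_ml.models.regressor import RFR
>>>
>>> model = RFR()
>>> model.train(X, y)
>>> model.save_model("rfr_model.pkl")
>>> loaded_model = RFR.load_model("rfr_model.pkl")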

Regressor.set_params(**params)

Function to set the parameters of the model object

Parameters

**params : dict

Estimator parameters

Returns

self : estimator instance

Estimator instance

Regressor.smac_search(x_train: DataFrame, y_train: Series, n_trails: int = 50, cv_num: int = 5, scoring: Literal['r2', 'rmse', 'd2_tweedie'] | Callable[[list[float], list[float]], float] = 'r2', small_data_eval: bool = False, walltime_limit: int = 600, log_level: int = 20) -> Configuration

Hyperparameter tuning with the SMAC library HyperparameterOptimizationFacade [can only be used in the sam_ml version with swig]

The smac_search method searches your hyperparameter space more “intelligently” than randomCVsearch and returns the best hyperparameter set. In addition to the n_trails parameter, it also takes a walltime_limit parameter that defines the maximum time in seconds the search is allowed to take.

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to cross validate on

n_trails : int, default=50

max number of parameter sets to test

cv_num : int, default=5

number of different random splits

scoring : {“r2”, “rmse”, “d2_tweedie”} or callable (custom score), default=”r2”

metrics to evaluate the models

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

small_data_eval : bool, default=False

if True: trains the model on all datapoints except one and does this for all datapoints (recommended for datasets with fewer than 150 datapoints)

walltime_limit : int, default=600

the maximum time in seconds that SMAC is allowed to run

log_level : int, default=20

10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL (SMAC3 library log levels)

Returns

incumbent : ConfigSpace.Configuration

ConfigSpace.Configuration with best hyperparameters (can be used like dict)

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # use smac_search
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> best_hyperparam = model.smac_search(X, y, n_trails=20, cv_num=5, scoring="rmse")
>>> print(f"best hyperparameters: {best_hyperparam}")
[INFO][abstract_initial_design.py:147] Using 5 initial design configurations and 0 additional configurations.
[INFO][abstract_intensifier.py:305] Using only one seed for deterministic scenario.
[INFO][abstract_intensifier.py:515] Added config 7373ff as new incumbent because there are no incumbents yet.
[INFO][abstract_intensifier.py:590] Added config 06e4dc and rejected config 7373ff as incumbent because it is not better than the incumbents on 1 instances:
[INFO][abstract_intensifier.py:590] Added config 162148 and rejected config 06e4dc as incumbent because it is not better than the incumbents on 1 instances:
[INFO][abstract_intensifier.py:590] Added config 97eecc and rejected config 162148 as incumbent because it is not better than the incumbents on 1 instances:
[INFO][smbo.py:327] Configuration budget is exhausted:
[INFO][smbo.py:328] --- Remaining wallclock time: 582.9456326961517
[INFO][smbo.py:329] --- Remaining cpu time: inf
[INFO][smbo.py:330] --- Remaining trials: 0
best hyperparameters: Configuration(values={
    'bootstrap': False,
    'criterion': 'friedman_mse',
    'max_depth': 10,
    'min_samples_leaf': 3,
    'min_samples_split': 9,
    'min_weight_fraction_leaf': 0.22684614269623157,
    'n_estimators': 28,
})

Regressor.train(x_train: DataFrame, y_train: Series, scoring: Literal['r2', 'rmse', 'd2_tweedie'] | Callable[[list[float], list[float]], float] = 'r2', console_out: bool = True) -> tuple[float, str]

Function to train the model

Every regressor has a train and a fit method. Both use the fit method of the wrapped model, but the train method additionally returns the train time and the train score of the model.

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train model

scoring : {“r2”, “rmse”, “d2_tweedie”} or callable (custom score), default=”r2”

metrics to evaluate the models

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

console_out : bool, default=True

whether the score and time should be printed out

Returns

train_score : float

train score value

train_time : str

train time in format: “0:00:00” (hours:minutes:seconds)

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # train model
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> model.train(X, y)
Train score: 0.9938023719617127 - Train time: 0:00:01

Regressor.train_warm_start(x_train: DataFrame, y_train: Series, scoring: Literal['r2', 'rmse', 'd2_tweedie'] | Callable[[list[float], list[float]], float] = 'r2', console_out: bool = True) -> tuple[float, str]

Function to warm_start train the model

This function only differs from the train method for pipeline objects (with preprocessing). For pipeline objects, it trains the preprocessing steps only the first time and afterwards only uses them to preprocess.

Parameters

x_train, y_train : pd.DataFrame, pd.Series

Data to train model

scoring : {“r2”, “rmse”, “d2_tweedie”} or callable (custom score), default=”r2”

metrics to evaluate the models

custom score function (or loss function) with signature score_func(y, y_pred, **kwargs)

console_out : bool, default=True

whether the score and time should be printed out

Returns

train_score : float

train score value

train_time : str

train time in format: “0:00:00” (hours:minutes:seconds)

Examples

>>> # load data (replace with own data)
>>> import pandas as pd
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=3000, n_features=4, noise=1, random_state=42)
>>> X, y = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"]), pd.Series(abs(y))
>>>
>>> # train model
>>> from sam_ml.models.regressor import RFR
>>> 
>>> model = RFR()
>>> model.train_warm_start(X, y)
Train score: 0.9938023719617127 - Train time: 0:00:01