Embeddings_builder
- class Embeddings_builder(self, algorithm: Literal['bert', 'count', 'tfidf'] = 'tfidf', **kwargs)
Vectorizer wrapper class (parent class: Data)
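Parameters
- algorithm: {'bert', 'count', 'tfidf'}, default='tfidf'
which algorithm to use for vectorizing the text
- **kwargs
additional keyword arguments
Attributes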
Example
>>> from sam_ml.data.preprocessing import Embeddings_builder
>>>
>>> model = Embeddings_builder()
>>> print(model)
Embeddings_builder()
Methods
Method | Description |
---|---|
create_parallel_bert_embeddings | Function to create embeddings of the given strings in parallel with a BERT model |
get_params | Function to get the parameters of the transformer instance |
params | Function to get the possible parameter values for the class |
set_params | Function to set the parameters of the transformer instance |
vectorize | Function to vectorize a text data column |
- Embeddings_builder.create_parallel_bert_embeddings(content: list[str]) → list
Function to create embeddings of the given strings in parallel with a BERT model
Parameters
- content: list[str]
list of strings that shall be embedded
Returns
- content_embeddings: list
list of embedding vectors, one for each string in content
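The parallel embedding step can be pictured with the standard library's `concurrent.futures`. This is a minimal sketch, not sam_ml's implementation: `toy_embed` and `create_parallel_embeddings` are hypothetical names, `toy_embed` is a stand-in for a BERT encoder (it only buckets character codes into a fixed-size vector), and a thread pool is just one way to parallelize; the real method may use another backend or batch its inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_embed(text: str, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for a BERT encoder: counts character codes modulo `dim`."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec


def create_parallel_embeddings(content: list[str], max_workers: int = 4) -> list[list[float]]:
    """Hypothetical helper: embed every string in `content` concurrently.

    Executor.map yields results in input order, so the i-th vector
    always belongs to the i-th string.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(toy_embed, content))


embeddings = create_parallel_embeddings(["Hallo world!", "Goodbye Island"])
print(len(embeddings), len(embeddings[0]))  # 2 4
```

Order preservation is the property that keeps each embedding aligned with its source string; whether threads or processes actually speed things up depends on whether the encoder releases the GIL while it runs.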
- Embeddings_builder.get_params(deep: bool = True) → dict
Function to get the parameters of the transformer instance
Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained sub-objects that are estimators
Returns
- params: dict
parameter names mapped to their values
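In scikit-learn, `deep=True` means the parameters of nested estimators are reported as well, under `<name>__<param>` keys. The sketch below shows that convention on two hypothetical classes (`ToyVectorizer`, `ToyWrapper`); whether `Embeddings_builder.get_params` exposes the parameters of its internal vectorizer in exactly this way is an assumption, not something this page states.

```python
from typing import Optional


class ToyVectorizer:
    """Hypothetical inner estimator with two parameters."""

    def __init__(self, lowercase: bool = True, max_features: Optional[int] = None):
        self.lowercase = lowercase
        self.max_features = max_features

    def get_params(self, deep: bool = True) -> dict:
        # no nested estimators here, so `deep` changes nothing
        return {"lowercase": self.lowercase, "max_features": self.max_features}


class ToyWrapper:
    """Hypothetical wrapper that owns an inner estimator."""

    def __init__(self, algorithm: str = "tfidf"):
        self.algorithm = algorithm
        self.vectorizer = ToyVectorizer()

    def get_params(self, deep: bool = True) -> dict:
        params = {"algorithm": self.algorithm}
        if deep:
            # expose nested parameters as "<attribute>__<param>", the scikit-learn convention
            for key, value in self.vectorizer.get_params(deep=True).items():
                params[f"vectorizer__{key}"] = value
        return params


wrapper = ToyWrapper()
print(wrapper.get_params(deep=False))  # {'algorithm': 'tfidf'}
print(wrapper.get_params())
# {'algorithm': 'tfidf', 'vectorizer__lowercase': True, 'vectorizer__max_features': None}
```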
- static Embeddings_builder.params() → dict
Function to get the possible parameter values for the class
Returns
- param: dict
possible values for the parameter “algorithm”
Examples
>>> # get possible parameters
>>> from sam_ml.data.preprocessing import Embeddings_builder
>>>
>>> # first way without class object
>>> params1 = Embeddings_builder.params()
>>> print(params1)
{"algorithm": ["tfidf", ...]}
>>> # second way with class object
>>> model = Embeddings_builder()
>>> params2 = model.params()
>>> print(params2)
{"algorithm": ["tfidf", ...]}
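Because `algorithm` is annotated as a `Literal` in the class signature, the allowed values can be read from the annotation itself with `typing.get_args`, and a `staticmethod` can be called both on the class and on an instance, as in the example above. This is a minimal sketch of one way such a `params()` could be written, using a hypothetical `AlgorithmName` alias and `ToyBuilder` class; `get_args` keeps the annotation's order (bert, count, tfidf), so the ordering shown in the library's own output is not reproduced here.

```python
from typing import Literal, get_args

# hypothetical alias mirroring the `algorithm` annotation in the class signature
AlgorithmName = Literal["bert", "count", "tfidf"]


class ToyBuilder:
    """Hypothetical class illustrating a static params() method."""

    @staticmethod
    def params() -> dict:
        # read the allowed values straight from the Literal annotation
        return {"algorithm": list(get_args(AlgorithmName))}


print(ToyBuilder.params())    # {'algorithm': ['bert', 'count', 'tfidf']}
print(ToyBuilder().params())  # same result when called on an instance
```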
- Embeddings_builder.set_params(**params)
Function to set the parameters of the transformer instance
Parameters
- **params: dict
Estimator parameters
Returns
- self: estimator instance
Estimator instance
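Returning the instance itself is what allows a call such as `set_params(...)` to be chained with further method calls. The sketch below illustrates that contract on a hypothetical `ToyBuilder` class that also rejects unknown parameter names and unsupported algorithm values; the validation rules are assumptions for illustration, and a real wrapper may additionally need to rebuild its underlying vectorizer when `algorithm` changes, which is omitted here.

```python
class ToyBuilder:
    """Hypothetical stand-in illustrating the set_params contract."""

    _param_names = ("algorithm",)
    _allowed_algorithms = ("bert", "count", "tfidf")

    def __init__(self, algorithm: str = "tfidf"):
        self.algorithm = algorithm

    def set_params(self, **params):
        # validate everything first so a bad argument leaves the instance unchanged
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError(f"invalid parameter {name!r}")
            if name == "algorithm" and value not in self._allowed_algorithms:
                raise ValueError(f"algorithm must be one of {self._allowed_algorithms}")
        for name, value in params.items():
            setattr(self, name, value)
        # returning the instance itself lets calls be chained
        return self


builder = ToyBuilder().set_params(algorithm="count")
print(builder.algorithm)  # count
```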
- Embeddings_builder.vectorize(data: Series, train_on: bool = True) → DataFrame
Function to vectorize a text data column
Parameters
- data: pd.Series
column with text to vectorize
- train_on: bool, default=True
If True, the estimator instance is first trained on data to build the embeddings and then used to vectorize it. Otherwise, the already trained instance is used for vectorizing.
Returns
- emb_df: pd.DataFrame
pandas DataFrame with the vectorized data
Examples
>>> import pandas as pd
>>> x_train = pd.Series(["Hallo world!", "Goodbye Island", "Greetings Berlin"], name="text")
>>> x_test = pd.Series(["Goodbye world!", "Greetings Island"], name="text")
>>>
>>> # vectorize data
>>> from sam_ml.data.preprocessing import Embeddings_builder
>>>
>>> model = Embeddings_builder()
>>> x_train = model.vectorize(x_train) # train vectorizer
>>> x_test = model.vectorize(x_test, train_on=False) # vectorize test data
>>> print("x_train:")
>>> print(x_train)
>>> print()
>>> print("x_test:")
>>> print(x_test)
x_train:
     0_text    1_text    2_text    3_text    4_text    5_text
0  0.000000  0.000000  0.000000  0.707107  0.000000  0.707107
1  0.000000  0.707107  0.000000  0.000000  0.707107  0.000000
2  0.707107  0.000000  0.707107  0.000000  0.000000  0.000000

x_test:
   0_text    1_text    2_text  3_text    4_text    5_text
0     0.0  0.707107  0.000000     0.0  0.000000  0.707107
1     0.0  0.000000  0.707107     0.0  0.707107  0.000000
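The numbers in this example can be reproduced by hand. Assuming the default 'tfidf' algorithm behaves like scikit-learn's `TfidfVectorizer` with its default settings (lowercasing, tokens of two or more word characters, smoothed IDF, L2-normalized rows), every sentence contains two distinct terms with identical IDF, so each nonzero entry is 1/√2 ≈ 0.707107. The standard-library sketch below mirrors the fit-then-transform split behind `train_on`; the helper names `tokenize`, `fit` and `transform` are hypothetical, and the `<index>_<column name>` naming is copied from the printed output rather than from sam_ml's code.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token pattern


def tokenize(text: str) -> list[str]:
    """Hypothetical helper: lowercase, then keep tokens of 2+ word characters."""
    return TOKEN_RE.findall(text.lower())


def fit(texts: list[str]) -> tuple[list[str], list[float]]:
    """Hypothetical helper: learn sorted vocabulary and smoothed IDF (like train_on=True)."""
    docs = [set(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    idf = []
    for term in vocab:
        df = sum(term in d for d in docs)
        idf.append(math.log((1 + n) / (1 + df)) + 1)
    return vocab, idf


def transform(texts: list[str], vocab: list[str], idf: list[float], column: str = "text"):
    """Hypothetical helper: vectorize with a fitted vocabulary (like train_on=False)."""
    index = {term: i for i, term in enumerate(vocab)}
    rows = []
    for text in texts:
        # words not seen during fitting are simply dropped
        counts = Counter(t for t in tokenize(text) if t in index)
        row = [0.0] * len(vocab)
        for term, tf in counts.items():
            row[index[term]] = tf * idf[index[term]]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # L2 normalization
        rows.append([v / norm for v in row])
    columns = [f"{i}_{column}" for i in range(len(vocab))]
    return columns, rows


x_train = ["Hallo world!", "Goodbye Island", "Greetings Berlin"]
x_test = ["Goodbye world!", "Greetings Island"]
vocab, idf = fit(x_train)
columns, train_rows = transform(x_train, vocab, idf)
_, test_rows = transform(x_test, vocab, idf)
print(vocab)  # ['berlin', 'goodbye', 'greetings', 'hallo', 'island', 'world']
print([round(v, 6) for v in train_rows[0]])  # [0.0, 0.0, 0.0, 0.707107, 0.0, 0.707107]
```

Fitting once, on the training column only, keeps the vocabulary fixed: the test rows get the same six columns as the training rows, and any word that never appeared during training contributes nothing.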