dialogy.plugins.text.classification package
Submodules
dialogy.plugins.text.classification.mlp module
This module provides a trainable MLP classifier.
- class MLPMultiClass(model_dir, dest=None, guards=None, debug=False, threshold=0.1, score_round_off=5, purpose='train', fallback_label='_error_', data_column='data', label_column='labels', args_map=None, skip_labels=None, kwargs=None)
Bases: dialogy.base.plugin.Plugin
This plugin provides a classifier based on sklearn’s MLPClassifier.
- static get_formatted_gridparams(params)
Gets the valid parameters for the gridsearch.
- Parameters
params – The values to be validated.
- Returns
The valid parameters.
- Return type
List[Any]
- get_gridsearch_grid(pipeline, **kwargs)
Gets gridsearch hyperparameters for the model in the proper grid params format.
- Raises
ValueError – If a gridsearch parameter doesn’t exist in sklearn’s TFIDF and MLPClassifier modules.
- Return type
List[Dict[str, List[Any]]]
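The grid format described above can be illustrated with a minimal sketch. It assumes a pipeline whose steps are named `tfidf` and `mlp`, and it relies on sklearn's `step__parameter` naming convention for grid keys. The helper `to_param_grid`, the `ALLOWED` whitelist, and the step names are hypothetical and are not taken from dialogy.

```python
from typing import Any, Dict, List, Set

# Hypothetical whitelist of parameter names per pipeline step (an assumption).
# A real implementation would more likely inspect the estimators' get_params().
ALLOWED: Dict[str, Set[str]] = {
    "tfidf": {"ngram_range", "min_df", "max_df"},
    "mlp": {"hidden_layer_sizes", "alpha", "activation"},
}


def to_param_grid(params: Dict[str, Dict[str, List[Any]]]) -> List[Dict[str, List[Any]]]:
    """Flatten {step: {param: values}} into a list of `step__param` grids."""
    grid: Dict[str, List[Any]] = {}
    for step, step_params in params.items():
        for name, values in step_params.items():
            # Reject parameters that the step does not accept.
            if name not in ALLOWED.get(step, set()):
                raise ValueError(f"Unknown parameter {name!r} for step {step!r}")
            grid[f"{step}__{name}"] = list(values)
    return [grid]


grid = to_param_grid(
    {"tfidf": {"ngram_range": [(1, 1), (1, 2)]}, "mlp": {"alpha": [1e-4, 1e-3]}}
)
```

The resulting list of dictionaries has the shape `List[Dict[str, List[Any]]]` and could be passed as `param_grid` to a search class such as GridSearchCV.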
- inference(texts)
Predict the intent of a list of texts.
- Parameters
texts (List[str]) – A list of strings, derived from ASR transcripts.
- Raises
AttributeError – If the model isn’t an sklearn Pipeline or GridSearchCV instance.
- Returns
A list of intents corresponding to texts.
- Return type
List[Intent]
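How `threshold`, `score_round_off`, and `fallback_label` from the constructor interact is not spelled out here. The following is a minimal sketch of one plausible reading: scores are rounded, labels are ranked, and the fallback label is returned when the best score falls below the threshold. The function `rank_intents` and the fallback score of 1.0 are assumptions, not dialogy's documented behavior.

```python
from typing import Dict, List, Tuple


def rank_intents(
    probabilities: Dict[str, float],
    threshold: float = 0.1,
    score_round_off: int = 5,
    fallback_label: str = "_error_",
) -> List[Tuple[str, float]]:
    """Rank labels by rounded score; use the fallback if the best score is too low."""
    ranked = sorted(
        ((label, round(score, score_round_off)) for label, score in probabilities.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Assumption: a low-confidence prediction is replaced by the fallback label.
    if not ranked or ranked[0][1] < threshold:
        return [(fallback_label, 1.0)]
    return ranked


confident = rank_intents({"greet": 0.7123456, "bye": 0.2})
uncertain = rank_intents({"greet": 0.05, "bye": 0.04})
```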
- init_model(param_search=<class 'sklearn.model_selection._search.GridSearchCV'>)
Initialize the model if artifacts are available.
- Return type
Dict[str, Any]
- save()
Save the plugin artifacts.
- Raises
ValueError – If the MLP model is not trained.
- Return type
None
- train(training_data, param_search=<class 'sklearn.model_selection._search.GridSearchCV'>)
Train an intent-classifier on the provided training data.
The training is skipped if the data format is not valid.
- Parameters
training_data (pd.DataFrame) – A pandas dataframe containing at least a list of strings and corresponding labels.
- Return type
None
- property valid_mlpmodel: bool
- Return type
bool
dialogy.plugins.text.classification.retain_intent module
Transforms may be applied to predicted intents, which makes it hard to track the impact of classifiers. This module tracks the original intent, the one produced by a classifier.
- class RetainOriginalIntentPlugin(replace_output=False, dest='output.original_intent', guards=None, debug=False)
Bases: dialogy.base.plugin.Plugin
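The idea of retaining the classifier's intent before later transforms can be sketched as follows. This is an illustration only: the function `retain_original_intent` and the dictionary representation of an intent (with `name` and `score` keys) are assumptions, and the real plugin works with dialogy's own types.

```python
from typing import Any, Dict, List


def retain_original_intent(intents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Copy the top intent's name and score so later transforms can't alter them."""
    if not intents:
        return {}
    top = intents[0]
    return {"name": top["name"], "score": top["score"]}


predicted = [{"name": "confirm", "score": 0.92}, {"name": "deny", "score": 0.05}]
original = retain_original_intent(predicted)

# A later transform renames the top intent; the retained copy is unaffected.
predicted[0]["name"] = "confirm_booking"
```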
dialogy.plugins.text.classification.tokenizers module
dialogy.plugins.text.classification.xlmr module
This module provides a trainable XLMR classifier. Read more: https://arxiv.org/abs/1911.02116
- class XLMRMultiClass(model_dir, dest=None, guards=None, debug=False, threshold=0.1, use_cuda=False, score_round_off=5, purpose='train', fallback_label='_error_', use_state=False, data_column='data', label_column='labels', state_column='state', args_map=None, skip_labels=None, kwargs=None)
Bases: dialogy.base.plugin.Plugin
This plugin provides a classifier based on XLM-RoBERTa (https://arxiv.org/abs/1911.02116).
The use_state flag enables using the state variable as part of the text input.
- inference(texts, state=None)
Predict the intent of a list of texts. If the model was trained with state features, it expects each text to have the state token appended; otherwise the predictions will be spurious.
- Parameters
texts (List[str]) – A list of strings, derived from ASR transcripts.
state (List[str]) – States mapped to the ASR transcripts; defaults to None.
- Raises
AttributeError – In case the labelencoder is not available.
- Returns
A list of intents corresponding to texts.
- Return type
List[Intent]
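Appending the state token to each text, as described above, might look like the sketch below. The helper `attach_state` and the single-space separator are assumptions; the exact token format the plugin uses is not stated here.

```python
from typing import List, Optional


def attach_state(
    texts: List[str], state: Optional[List[str]] = None, sep: str = " "
) -> List[str]:
    """Append the matching state token to each text; pass texts through if no state."""
    if state is None:
        return list(texts)
    if len(texts) != len(state):
        raise ValueError("texts and state must have the same length")
    return [f"{text}{sep}{token}" for text, token in zip(texts, state)]


with_state = attach_state(["yes please", "no"], ["CONFIRM_BOOKING", "CONFIRM_BOOKING"])
without_state = attach_state(["yes please"])
```

Whatever format is used, it has to be the same at training and inference time, since a model trained on state-augmented text sees a different input distribution otherwise.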
- init_model(label_count=None)
Initialize the model if artifacts are available.
- Parameters
label_count (Optional[int], optional) – Number of labels to train on or predict; defaults to None.
- Raises
ValueError – In case label_count is not provided or can’t be calculated.
- Return type
None
- save()
Save the plugin artifacts.
- Raises
ValueError – In case the labelencoder is not trained.
- Return type
None
- train(training_data)
Train an intent-classifier on the provided training data.
The training is skipped if the data format is not valid. When training with the use_state flag set to True, make sure that the state column is part of the training_data dataframe.
- Parameters
training_data (pd.DataFrame) – A pandas dataframe containing at least a list of strings and corresponding labels.
- Return type
None
- property valid_labelencoder: bool
- Return type
bool
- validate(training_data)
Validate that the training data is in the appropriate format.
- Parameters
training_data (pd.DataFrame) – A pandas dataframe containing at least a list of strings and corresponding labels. It should also contain a state column if use_state=True was set when initializing the object.
- Returns
True if the dataframe is valid, False otherwise.
- Return type
bool
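A column check of the kind described above can be sketched without pandas. The function `missing_columns` is hypothetical. Its defaults mirror the `data_column`, `label_column`, and `state_column` constructor arguments, and it accepts any iterable of column names, so a dataframe's `columns` attribute could be passed in. The real `validate` may check more than column presence.

```python
from typing import Iterable, List


def missing_columns(
    columns: Iterable[str],
    data_column: str = "data",
    label_column: str = "labels",
    state_column: str = "state",
    use_state: bool = False,
) -> List[str]:
    """Return the required column names that are absent from `columns`."""
    required = [data_column, label_column]
    if use_state:
        # The state column is only required when state features are enabled.
        required.append(state_column)
    present = set(columns)
    return [name for name in required if name not in present]


ok = missing_columns(["data", "labels"])
needs_state = missing_columns(["data", "labels"], use_state=True)
```

Under this reading, a dataframe would count as valid when the returned list is empty.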