dialogy.plugins.text.classification package

Submodules

dialogy.plugins.text.classification.mlp module

This module provides a trainable MLP classifier.

class MLPMultiClass(model_dir, dest=None, guards=None, debug=False, threshold=0.1, score_round_off=5, purpose='train', fallback_label='_error_', data_column='data', label_column='labels', args_map=None, skip_labels=None, kwargs=None)

Bases: dialogy.base.plugin.Plugin

This plugin provides a classifier based on sklearn’s MLPClassifier.

static get_formatted_gridparams(params)

Gets the valid parameters for the grid search.

Parameters: params – The values to be validated.
Returns: The valid parameters.

Return type: List[Any]

get_gridsearch_grid(pipeline, **kwargs)

Gets the grid-search hyperparameters for the model, formatted as a parameter grid.

Raises: ValueError – If a grid-search parameter does not exist in sklearn's TF-IDF vectorizer or MLPClassifier.

Return type: List[Dict[str, List[Any]]]
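scikit-learn addresses a pipeline step's hyperparameters as `<step>__<param>`, and `GridSearchCV` accepts its `param_grid` as a list of dicts mapping those names to lists of candidate values, which matches the return type above. The sketch below shows one way such a grid could be assembled and validated. The step names `tfidf` and `mlp`, the allowed-parameter sets, and the function `build_grid` are hypothetical and are not dialogy's implementation.

```python
from typing import Any, Dict, List, Set

# Hypothetical: the parameters each pipeline step accepts. With scikit-learn,
# a real list could be read from step.get_params().keys().
ALLOWED_PARAMS: Dict[str, Set[str]] = {
    "tfidf": {"ngram_range", "min_df", "max_df"},
    "mlp": {"hidden_layer_sizes", "activation", "alpha"},
}


def build_grid(**kwargs: Dict[str, List[Any]]) -> List[Dict[str, List[Any]]]:
    """Flatten {step: {param: values}} into scikit-learn's "step__param" grid format."""
    grid: Dict[str, List[Any]] = {}
    for step, params in kwargs.items():
        if step not in ALLOWED_PARAMS:
            raise ValueError(f"Unknown pipeline step: {step}")
        for name, values in params.items():
            # Reject names the step does not know, like the documented ValueError.
            if name not in ALLOWED_PARAMS[step]:
                raise ValueError(f"{name} is not a valid parameter of {step}")
            grid[f"{step}__{name}"] = list(values)
    return [grid]


grid = build_grid(
    tfidf={"ngram_range": [(1, 1), (1, 2)]},
    mlp={"hidden_layer_sizes": [(64,), (128,)], "alpha": [1e-4]},
)
print(grid)
```

A grid in this shape can be passed directly as `param_grid` to `GridSearchCV`.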

inference(texts)

Predict the intent of a list of texts.

Parameters: texts (List[str]) – A list of strings, derived from ASR transcripts.
Raises: AttributeError – If the model is not an sklearn Pipeline or GridSearchCV instance.
Returns: A list of intents corresponding to texts.
Return type: List[Intent]
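The constructor exposes `threshold`, `score_round_off` and `fallback_label`, which suggests how class probabilities become a ranked list of intents. The sketch below is a guess at that post-processing, not dialogy's code: it assumes a top score below `threshold` yields the fallback label (with a score of 1.0, also an assumption), and it uses a minimal stand-in for dialogy's `Intent` type.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Intent:
    # Minimal stand-in for dialogy's Intent type: only a name and a score.
    name: str
    score: float


def rank_intents(
    labels: Sequence[str],
    probabilities: Sequence[float],
    threshold: float = 0.1,
    score_round_off: int = 5,
    fallback_label: str = "_error_",
) -> List[Intent]:
    """Turn one row of class probabilities into intents, best first."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    # Assumption: an empty row or a weak top score falls back to fallback_label.
    if not ranked or ranked[0][1] < threshold:
        return [Intent(name=fallback_label, score=1.0)]
    return [Intent(name=name, score=round(score, score_round_off)) for name, score in ranked]


intents = rank_intents(["greet", "bye", "ask_time"], [0.2, 0.7, 0.1])
```

With scikit-learn, `labels` would come from the fitted classifier's `classes_` attribute and `probabilities` from one row of its `predict_proba` output.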

init_model(param_search=sklearn.model_selection.GridSearchCV)

Initialize the model if artifacts are available.

Return type: Dict[str, Any]

load()

Load the plugin artifacts.

Return type: None

save()

Save the plugin artifacts.

Raises: ValueError – If the MLP model has not been trained.
Return type: None
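The save/load pair can be pictured as follows. This is a minimal sketch, assuming artifacts are pickled into a single file under `model_dir`; the class `ArtifactStore` and the file name `model.pkl` are hypothetical, and only the "refuse to save an untrained model" rule comes from the documentation above.

```python
import os
import pickle
import tempfile
from typing import Any, Optional


class ArtifactStore:
    """Hypothetical helper that persists a trained model under model_dir."""

    def __init__(self, model_dir: str, file_name: str = "model.pkl") -> None:
        # Assumption: one pickle file per plugin; the file name is illustrative.
        self.path = os.path.join(model_dir, file_name)
        self.model: Optional[Any] = None

    def save(self) -> None:
        # Mirrors the documented contract: refuse to save an untrained model.
        if self.model is None:
            raise ValueError("The model is not trained.")
        with open(self.path, "wb") as handle:
            pickle.dump(self.model, handle)

    def load(self) -> None:
        with open(self.path, "rb") as handle:
            self.model = pickle.load(handle)


# Usage: save a stand-in "trained" model, then restore it into a fresh store.
with tempfile.TemporaryDirectory() as model_dir:
    trained = ArtifactStore(model_dir)
    trained.model = {"weights": [0.5, -1.2]}
    trained.save()
    restored = ArtifactStore(model_dir)
    restored.load()
    restored_model = restored.model
```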

train(training_data, param_search=sklearn.model_selection.GridSearchCV)

Train an intent classifier on the provided training data.

Training is skipped if the data format is not valid.

Parameters: training_data (pd.DataFrame) – A pandas DataFrame containing at least a list of strings and the corresponding labels.

Return type: None

utility(input, _)

An abstract method that describes the plugin’s functionality.

Parameters

input (Input) – The workflow’s input.
output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any

property valid_mlpmodel: bool

Return type: bool

validate(training_data)

Validate that the training data is in the appropriate format.

Parameters: training_data (pd.DataFrame) – A pandas DataFrame containing at least a list of strings and the corresponding labels.
Returns: True if the dataframe is valid, False otherwise.
Return type: bool

dialogy.plugins.text.classification.retain_intent module

Transforms may be applied to predicted intents, which makes it hard to track the impact of the classifier itself. This module keeps track of the original intent, the one produced by the classifier.

class RetainOriginalIntentPlugin(replace_output=False, dest='output.original_intent', guards=None, debug=False)

Bases: dialogy.base.plugin.Plugin

retain(intents)

Return type: Dict[str, Union[str, float]]
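The return type suggests that `retain` summarises one intent as a flat dictionary of strings and floats. A minimal sketch, assuming it keeps the name and score of the top-ranked intent; the key names, the empty-list behaviour, and the `Intent` stand-in are guesses rather than dialogy's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Intent:
    # Minimal stand-in for dialogy's Intent type.
    name: str
    score: float


def retain(intents: List[Intent]) -> Dict[str, Union[str, float]]:
    """Record the classifier's top intent before later transforms can change it."""
    if not intents:
        # Assumption: nothing to retain when the classifier produced no intents.
        return {}
    original = intents[0]
    return {"name": original.name, "score": original.score}


snapshot = retain([Intent("order", 0.9), Intent("cancel", 0.1)])
```

Because `snapshot` is a plain dictionary, a later plugin that renames or rescores the intent objects leaves this record unchanged.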

utility(_, output)

An abstract method that describes the plugin’s functionality.

Parameters

input (Input) – The workflow’s input.
output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any

dialogy.plugins.text.classification.tokenizers module

identity_tokenizer(text)

Return type: str

dialogy.plugins.text.classification.xlmr module

This module provides a trainable XLMR classifier (see https://arxiv.org/abs/1911.02116).

class XLMRMultiClass(model_dir, dest=None, guards=None, debug=False, threshold=0.1, use_cuda=False, score_round_off=5, purpose='train', fallback_label='_error_', use_state=False, data_column='data', label_column='labels', state_column='state', args_map=None, skip_labels=None, kwargs=None)

Bases: dialogy.base.plugin.Plugin

This plugin provides a classifier based on XLM-RoBERTa (https://arxiv.org/abs/1911.02116).

The use_state flag enables the use of the state variable as part of the text input.

inference(texts, state=None)

Predict the intent of a list of texts. If the model was trained with state features, each text is expected to have its state token appended; otherwise, the predictions will be unreliable.

Parameters

texts (List[str]) – A list of strings, derived from ASR transcripts.
state (List[str]) – The states corresponding to the ASR transcripts.

Raises

AttributeError – If the label encoder is not available.

Returns

A list of intents corresponding to texts.

Return type

List[Intent]
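Appending the state token to each text before inference can be sketched as below. The `<STATE>` wrapping and the helper `attach_state` are hypothetical; whatever format was used when building the training data must be reproduced exactly at inference time.

```python
from typing import List, Optional


def attach_state(texts: List[str], state: Optional[List[str]] = None) -> List[str]:
    """Append each state as an extra token, the way the model saw it in training."""
    if state is None:
        # No state features: texts are passed through unchanged.
        return texts
    if len(state) != len(texts):
        raise ValueError("Expected one state per text.")
    # Hypothetical token format; it must match the training data's format.
    return [f"{text} <{s}>" for text, s in zip(texts, state)]


model_inputs = attach_state(["book a table for two"], ["COLLECT_DATE"])
```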

init_model(label_count=None)

Initialize the model if artifacts are available.

Parameters: label_count (Optional[int], optional) – The number of labels to train on or predict; defaults to None.
Raises: ValueError – If label_count is not provided and cannot be calculated.
Return type: None
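One plausible way to resolve the label count is sketched below: use the explicit value when given, otherwise count the distinct training labels, and raise `ValueError` when neither is available. Counting distinct labels is an assumption (a fitted label encoder's classes would serve the same purpose), and `resolve_label_count` is a hypothetical helper.

```python
from typing import Optional, Sequence


def resolve_label_count(
    label_count: Optional[int] = None,
    labels: Optional[Sequence[str]] = None,
) -> int:
    """Return the number of output classes the classifier head needs."""
    if label_count is not None:
        return label_count
    if labels:
        # Assumption: one output class per distinct training label.
        return len(set(labels))
    raise ValueError("label_count was not provided and cannot be calculated.")


n_labels = resolve_label_count(labels=["greet", "bye", "greet", "ask_time"])
```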

load()

Load the plugin artifacts.

Return type: None

save()

Save the plugin artifacts.

Raises: ValueError – If the label encoder has not been trained.
Return type: None

train(training_data)

Train an intent classifier on the provided training data.

Training is skipped if the data format is not valid. When use_state is True, the training_data DataFrame must also contain the state column.

Parameters: training_data (pd.DataFrame) – A pandas DataFrame containing at least a list of strings and the corresponding labels.

Return type: None

utility(input, _)

An abstract method that describes the plugin’s functionality.

Parameters

input (Input) – The workflow’s input.
output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any

property valid_labelencoder: bool

Return type: bool

validate(training_data)

Validate that the training data is in the appropriate format.

Parameters: training_data (pd.DataFrame) – A pandas DataFrame containing at least a list of strings and the corresponding labels. If the object was initialized with use_state=True, it must also contain the state column.
Returns: True if the dataframe is valid, False otherwise.
Return type: bool
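The column requirements described for validate can be sketched as a set check. The defaults mirror the constructor's data_column, label_column and state_column defaults; the helper `has_required_columns` is hypothetical, and the real method may check more than column presence (for example, value types).

```python
from typing import Iterable


def has_required_columns(
    columns: Iterable[str],
    data_column: str = "data",
    label_column: str = "labels",
    use_state: bool = False,
    state_column: str = "state",
) -> bool:
    """Return True if every column the plugin needs is present."""
    required = {data_column, label_column}
    if use_state:
        # The state column is only mandatory when use_state=True.
        required.add(state_column)
    return required.issubset(set(columns))


ok_plain = has_required_columns(["data", "labels"])
ok_with_state = has_required_columns(["data", "labels"], use_state=True)
```

With a pandas DataFrame `df`, the same check would be `has_required_columns(df.columns, use_state=True)`.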
