dialogy.plugins.text.classification package

Submodules

dialogy.plugins.text.classification.mlp module

This module provides a trainable MLP classifier.

class MLPMultiClass(model_dir, dest=None, guards=None, debug=False, threshold=0.1, score_round_off=5, purpose='train', fallback_label='_error_', data_column='data', label_column='labels', args_map=None, skip_labels=None, kwargs=None)[source]

Bases: dialogy.base.plugin.Plugin

This plugin provides a classifier based on sklearn’s MLPClassifier.
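
A minimal construction sketch (hedged): the import path and the artifact directory below are assumptions, and dest is a hypothetical destination in the workflow output; only keyword arguments documented in the signature above are used.

    from dialogy.plugins import MLPMultiClass  # import path is an assumption

    mlp_clf = MLPMultiClass(
        model_dir="intent_classifiers/",  # hypothetical directory for saved artifacts
        dest="output.intents",            # hypothetical destination within the workflow output
        purpose="train",                  # documented default
        data_column="data",
        label_column="labels",
    )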

static get_formatted_gridparams(params)[source]

Gets the valid parameters for the gridsearch.

Parameters

params – The values to be validated.

Returns

The valid parameters.

Return type

List[Any]

get_gridsearch_grid(pipeline, **kwargs)[source]

Gets gridsearch hyperparameters for the model in proper grid params format.

Raises

ValueError – If a gridsearch parameter doesn’t exist in sklearn’s TFIDF and MLPClassifier modules.

Return type

List[Dict[str, List[Any]]]
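
The grid params format referenced above is the list-of-dicts shape accepted by sklearn’s GridSearchCV, keyed by pipeline step parameters. A hedged sketch; the step names ("tfidf", "mlp") and the candidate values are assumptions for illustration, not the plugin’s defaults.

    param_grid = [
        {
            "tfidf__ngram_range": [(1, 1), (1, 2)],
            "mlp__hidden_layer_sizes": [(64,), (128,)],
            "mlp__activation": ["relu", "tanh"],
        }
    ]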

inference(texts)[source]

Predict the intent of a list of texts.

Parameters

texts (List[str]) – A list of strings, derived from ASR transcripts.

Raises

AttributeError – In case the model isn’t an sklearn Pipeline or GridSearchCV.

Returns

A list of intents corresponding to texts.

Return type

List[Intent]
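
A hedged usage sketch, reusing mlp_clf from the construction sketch above and assuming its artifacts have been loaded; the Intent attribute names in the comment are assumed from dialogy’s Intent type.

    transcripts = ["i want to book a flight", "i want to look at flights"]
    intents = mlp_clf.inference(transcripts)
    for intent in intents:
        print(intent.name, intent.score)  # attribute names assumed from dialogy's Intent type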

init_model(param_search=<class 'sklearn.model_selection._search.GridSearchCV'>)[source]

Initialize the model if artifacts are available.

Return type

Dict[str, Any]
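
For orientation, the kind of model this module describes (an sklearn Pipeline of TF-IDF features feeding MLPClassifier, optionally wrapped in the param_search class) can be sketched as below; the step names, grid, and cv value are assumptions, not the plugin’s internals.

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import GridSearchCV

    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("mlp", MLPClassifier())])
    search = GridSearchCV(pipeline, param_grid={"mlp__activation": ["relu", "tanh"]}, cv=3)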

load()[source]

Load the plugin artifacts.

Return type

None

save()[source]

Save the plugin artifacts.

Raises

ValueError – In case the mlp model is not trained.

Return type

None

train(training_data, param_search=<class 'sklearn.model_selection._search.GridSearchCV'>)[source]

Train an intent-classifier on the provided training data.

The training is skipped if the data format is not valid.

Parameters

training_data (pd.DataFrame) – A pandas dataframe containing at least a list of strings and corresponding labels.

Return type

None
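
A minimal training sketch (hedged), reusing mlp_clf from the construction sketch above and assuming the default data and label column names documented in the constructor; the rows and label names are illustrative.

    import pandas as pd

    training_data = pd.DataFrame(
        {
            "data": ["book a flight to mumbai", "cancel my ticket", "is my booking confirmed"],
            "labels": ["book_flight", "cancel_ticket", "confirm_booking"],
        }
    )
    mlp_clf.train(training_data)
    mlp_clf.save()  # persists the trained artifacts to model_dir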

utility(input, _)[source]

An abstract method that describes the plugin’s functionality.

Parameters
  • input (Input) – The workflow’s input.

  • output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any

property valid_mlpmodel: bool
Return type

bool

validate(training_data)[source]

Validate that the training data is in the appropriate format.

Parameters

training_data (pd.DataFrame) – A pandas dataframe containing at least a list of strings and corresponding labels.

Returns

True if the dataframe is valid, False otherwise.

Return type

bool

dialogy.plugins.text.classification.retain_intent module

We may apply transforms over predicted intents, which makes it hard to track a classifier’s impact. This module tracks the original intent, i.e. the one produced by the classifier, before any such transforms are applied.

class RetainOriginalIntentPlugin(replace_output=False, dest='output.original_intent', guards=None, debug=False)[source]

Bases: dialogy.base.plugin.Plugin

retain(intents)[source]
Return type

Dict[str, Union[str, float]]
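
A hedged sketch of how this plugin might be set up and what the retained payload could look like; the import path, keys, and values are assumptions consistent with the Dict[str, Union[str, float]] return type above.

    from dialogy.plugins import RetainOriginalIntentPlugin  # import path is an assumption

    retain_plugin = RetainOriginalIntentPlugin(dest="output.original_intent")
    # Conceptually, retaining an intent named "book_flight" with score 0.87 would yield
    # something shaped like {"name": "book_flight", "score": 0.87} (keys assumed).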

utility(_, output)[source]

An abstract method that describes the plugin’s functionality.

Parameters
  • input (Input) – The workflow’s input.

  • output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any

dialogy.plugins.text.classification.tokenizers module

identity_tokenizer(text)[source]
Return type

str
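
A hedged sketch of the usual role of an identity tokenizer: it hands the input to sklearn’s TfidfVectorizer unchanged instead of letting the vectorizer re-tokenize it. The vectorizer settings below are assumptions, not this module’s configuration.

    from sklearn.feature_extraction.text import TfidfVectorizer

    def identity_tokenizer(text):
        # return the input unchanged so TfidfVectorizer does not re-tokenize it
        return text

    vectorizer = TfidfVectorizer(tokenizer=identity_tokenizer, lowercase=False)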

dialogy.plugins.text.classification.xlmr module

This module provides a trainable XLMR classifier. Read more: https://arxiv.org/abs/1911.02116

class XLMRMultiClass(model_dir, dest=None, guards=None, debug=False, threshold=0.1, use_cuda=False, score_round_off=5, purpose='train', fallback_label='_error_', use_state=False, data_column='data', label_column='labels', state_column='state', args_map=None, skip_labels=None, kwargs=None)[source]

Bases: dialogy.base.plugin.Plugin

This plugin provides a classifier based on XLM-Roberta (https://arxiv.org/abs/1911.02116).

The use_state flag in the XLMRMultiClass plugin enables the use of the state variable as part of the text input.
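
A minimal construction sketch (hedged): the import path and artifact directory are assumptions, dest is a hypothetical destination in the workflow output, and only keyword arguments documented in the signature above are used.

    from dialogy.plugins import XLMRMultiClass  # import path is an assumption

    xlmr_clf = XLMRMultiClass(
        model_dir="xlmr_classifier/",  # hypothetical directory for saved artifacts
        dest="output.intents",         # hypothetical destination within the workflow output
        use_state=True,                # the state becomes part of the text input
        data_column="data",
        label_column="labels",
        state_column="state",
    )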

inference(texts, state=None)[source]

Predict the intent of a list of texts. If the model has been trained using state features, each text is expected to have the state token appended; otherwise the predictions would be spurious.

Parameters
  • texts (List[str]) – A list of strings, derived from ASR transcripts.

  • state (List[str]) – States mapped to the ASR transcripts.

Raises

AttributeError – In case the labelencoder is not available.

Returns

A list of intents corresponding to texts.

Return type

List[Intent]
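
A hedged usage sketch, reusing xlmr_clf from the construction sketch above; the state names are illustrative, with one state per transcript.

    intents = xlmr_clf.inference(
        texts=["yes please", "yes go ahead"],
        state=["CONFIRM_BOOKING", "CONFIRM_BOOKING"],  # illustrative state names
    )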

init_model(label_count=None)[source]

Initialize the model if artifacts are available.

Parameters

label_count (Optional[int], optional) – number of labels to train on or predict, defaults to None

Raises

ValueError – In case label_count is not provided or can’t be calculated.

Return type

None

load()[source]

Load the plugin artifacts.

Return type

None

save()[source]

Save the plugin artifacts.

Raises

ValueError – In case the labelencoder is not trained.

Return type

None

train(training_data)[source]

Train an intent-classifier on the provided training data.

The training is skipped if the data format is not valid. While training with the use_state flag set to True, make sure that the state column is part of the training_data dataframe.

Parameters

training_data (pd.DataFrame) – A pandas dataframe containing at least a list of strings and corresponding labels.

Return type

None
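
A minimal training sketch (hedged), reusing xlmr_clf from the construction sketch above; the column names follow the documented defaults, the state column is needed only because use_state=True, and the rows and labels are illustrative.

    import pandas as pd

    training_data = pd.DataFrame(
        {
            "data": ["yes please", "no not that one"],
            "labels": ["confirm", "cancel"],
            "state": ["CONFIRM_BOOKING", "CONFIRM_BOOKING"],
        }
    )
    xlmr_clf.train(training_data)
    xlmr_clf.save()  # persists the trained artifacts and label encoder to model_dir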

utility(input, _)[source]

An abstract method that describes the plugin’s functionality.

Parameters
  • input (Input) – The workflow’s input.

  • output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any

property valid_labelencoder: bool
Return type

bool

validate(training_data)[source]

Validate that the training data is in the appropriate format.

Parameters

training_data (pd.DataFrame) – A pandas dataframe containing at least a list of strings and corresponding labels. It should also contain a state column if use_state=True was passed while initializing the object.

Returns

True if the dataframe is valid, False otherwise.

Return type

bool

Module contents