dialogy.plugins.text.calibration package

Submodules

dialogy.plugins.text.calibration.xgb module

Trains a calibration model. It contains two models:
  • Vectorizer: TF-IDF
  • Classifier: XGBoost regressor
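The two-stage design can be sketched with scikit-learn. This is a minimal sketch, not the plugin's implementation. It assumes each alternative contributes its transcript plus AM and LM scores, and it uses made-up training data with a target of 1.0 for good alternatives and 0.0 for bad ones. It also swaps in scikit-learn's GradientBoostingRegressor for XGBoost's regressor so it runs without xgboost installed. The helper `score` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up training data: one transcript per alternative, its [AM, LM] scores,
# and a target of 1.0 (good alternative) or 0.0 (bad alternative).
texts = [
    "i want to change my address",
    "i want to range my dress",
    "change my address please",
    "chain my adder please",
]
am_lm = np.array([[-1.2, -3.4], [-2.9, -8.1], [-1.0, -2.7], [-3.3, -9.0]])
labels = np.array([1.0, 0.0, 1.0, 0.0])

# Stage 1: TF-IDF turns each transcript into a vector of term weights.
vectorizer = TfidfVectorizer()
text_features = vectorizer.fit_transform(texts).toarray()

# Text features and AM/LM scores are concatenated column-wise.
X = np.hstack([text_features, am_lm])

# Stage 2: a gradient-boosted regressor (stand-in for XGBoost) learns a score.
regressor = GradientBoostingRegressor(n_estimators=50, random_state=0)
regressor.fit(X, labels)


def score(new_texts, new_am_lm):
    """Score unseen alternatives with the fitted vectorizer and regressor."""
    features = np.hstack([vectorizer.transform(new_texts).toarray(), new_am_lm])
    return regressor.predict(features)


predictions = score(texts, am_lm)
```

A score near 1.0 marks an alternative the model trusts. Comparing that score with the plugin's `threshold` turns the regressor into a keep/drop classifier.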

class CalibrationModel(threshold, dest=None, guards=None, debug=False, input_column='alternatives', output_column='alternatives', use_transform=False, model_name='calibration.pkl')

Bases: dialogy.base.plugin.Plugin

This plugin provides a calibration model that sits between ASR and SLU. It trains a model that learns to classify ASR alternatives from their text and their AM (acoustic model) and LM (language model) scores. Bad alternatives are removed both before training the SLU model and during inference.

filter_asr_output(utterances)

Filters outputs from ASR based on calibration model prediction.

Parameters

utterances – Output dictionary from ASR. It should have an 'alternatives' key.

Returns

Filtered alternatives, in the same format as input.
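A minimal sketch of the filtering step follows. It assumes the ASR output is a dictionary whose 'alternatives' value is a list of dictionaries with hypothetical `transcript`, `am_score` and `lm_score` keys. It also assumes an alternative is kept when its predicted score is at least the plugin's `threshold`. `filter_asr_output_sketch` and `toy_predict` are hypothetical stand-ins, not dialogy's API.

```python
from typing import Any, Callable, Dict, List

Alternative = Dict[str, Any]


def filter_asr_output_sketch(
    asr_output: Dict[str, Any],
    predict: Callable[[List[Alternative]], List[float]],
    threshold: float,
) -> Dict[str, Any]:
    """Keep only the alternatives whose predicted score reaches the threshold."""
    alternatives = asr_output["alternatives"]
    scores = predict(alternatives)
    kept = [alt for alt, score in zip(alternatives, scores) if score >= threshold]
    # Every other key is copied unchanged, so the output keeps the input's format.
    return {**asr_output, "alternatives": kept}


def toy_predict(alternatives: List[Alternative]) -> List[float]:
    # Hypothetical scorer: trusts alternatives with a high acoustic-model score.
    return [1.0 if alt["am_score"] > -2.0 else 0.0 for alt in alternatives]


asr_output = {
    "alternatives": [
        {"transcript": "change my address", "am_score": -1.0, "lm_score": -2.5},
        {"transcript": "chain my adder", "am_score": -4.2, "lm_score": -9.1},
    ],
}
filtered = filter_asr_output_sketch(asr_output, toy_predict, threshold=0.5)
```

Building a new dictionary instead of editing `asr_output` in place leaves the caller's ASR output intact. This is useful when the unfiltered alternatives are still needed for logging.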

Return type

Dict[str, Any]

inference(transcripts, utterances)
Return type

List[str]

predict(alternatives)
Return type

Any

save(fname)
Return type

None

train(df)

Trains the calibration pipeline.

Parameters
  • df (pd.DataFrame) – Dataframe to train on. It should be a valid transcription tagging job.

  • model_name (str) – Saves the pipeline as {model_name}.pkl

Return type

None
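Saving the trained pipeline can be sketched with Python's `pickle` module. Pickle as the serialisation format is an assumption suggested by the `.pkl` suffix, and `save_pipeline` and `load_pipeline` are hypothetical helpers. The constructor's default `model_name` ('calibration.pkl') already ends in `.pkl`, so the sketch appends the suffix only when it is missing.

```python
import os
import pickle
import tempfile
from typing import Any


def save_pipeline(pipeline: Any, model_name: str) -> str:
    """Pickle `pipeline` to disk and return the path it was written to."""
    # Avoid writing "calibration.pkl.pkl" when the name already has the suffix.
    fname = model_name if model_name.endswith(".pkl") else f"{model_name}.pkl"
    with open(fname, "wb") as handle:
        pickle.dump(pipeline, handle)
    return fname


def load_pipeline(fname: str) -> Any:
    """Load a pipeline previously written by save_pipeline."""
    with open(fname, "rb") as handle:
        return pickle.load(handle)


# Usage: round-trip a stand-in object through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_pipeline({"threshold": 0.5}, os.path.join(tmp_dir, "calibration"))
    restored = load_pipeline(path)
```

Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.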

transform(training_data)

Transform data for a plugin in the workflow.

Return type

DataFrame

utility(input, _)

An abstract method that describes the plugin’s functionality.

Parameters
  • input (Input) – The workflow’s input.

  • _ (Output) – The workflow's output.

Returns

The value returned by the plugin.

Return type

Any

validate(df)

Returns True if df is a valid transcription tagging job, and False otherwise (for example, for intent tagging jobs). A valid transcription tag looks like: '{"text": "I want to change and set my <INAUDIBLE>", "type": "TRANSCRIPT"}'

Sharp bits:
  • All rows in df should have the same format. Only the first row is used for sanity checks.

Parameters

df (pd.DataFrame) – Input dataframe.

Returns

(bool) True if the dataframe is valid for training the calibration model, otherwise False.

Return type

bool
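The first-row check can be sketched in plain Python. This minimal sketch makes three assumptions. The rows are passed as a list of dicts, as `df.to_dict("records")` would produce. The tag sits in a hypothetical `tag` column as a JSON string. A row counts as a transcription job when that JSON is an object whose `type` is "TRANSCRIPT". `is_transcription_job` is hypothetical.

```python
import json
from typing import Any, Dict, List


def is_transcription_job(rows: List[Dict[str, Any]], tag_column: str = "tag") -> bool:
    """Check only the first row, mirroring the sharp bit noted above."""
    if not rows:
        return False
    raw = rows[0].get(tag_column)
    try:
        tag = json.loads(raw)
    except (TypeError, ValueError):
        # Missing or non-JSON tags cannot be transcription tags.
        return False
    return isinstance(tag, dict) and tag.get("type") == "TRANSCRIPT"


transcript_rows = [
    {"tag": '{"text": "I want to change and set my <INAUDIBLE>", "type": "TRANSCRIPT"}'}
]
```

Checking a single row keeps validation cheap, at the cost of missing a dataframe whose later rows use a different format.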

class FeatureExtractor

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

features(alternatives)
Return type

List[List[float]]

fit(df, y=None)
Return type

Any

transform(df)
Return type

Tuple[Any, Any]
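Subclassing BaseEstimator and TransformerMixin gives the class scikit-learn's transformer protocol: `fit` returns the estimator, and `fit_transform` chains `fit` and `transform`. A minimal sketch of that shape follows. It takes a list of alternatives instead of a dataframe and assumes hypothetical `transcript`, `am_score` and `lm_score` keys. Its features (AM score, LM score, word count) are also hypothetical. Returning the raw texts alongside the numeric features matches the `Tuple[Any, Any]` return type. That the texts then feed the TF-IDF vectorizer is an assumption.

```python
from typing import Any, Dict, List, Tuple

from sklearn.base import BaseEstimator, TransformerMixin


class FeatureExtractorSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for FeatureExtractor."""

    def features(self, alternatives: List[Dict[str, Any]]) -> List[List[float]]:
        # One row per alternative: [AM score, LM score, number of words].
        return [
            [float(alt["am_score"]), float(alt["lm_score"]), float(len(alt["transcript"].split()))]
            for alt in alternatives
        ]

    def fit(self, alternatives: List[Dict[str, Any]], y: Any = None) -> "FeatureExtractorSketch":
        # Nothing to learn: the features are computed directly from each row.
        return self

    def transform(self, alternatives: List[Dict[str, Any]]) -> Tuple[List[str], List[List[float]]]:
        texts = [alt["transcript"] for alt in alternatives]
        return texts, self.features(alternatives)


alternatives = [
    {"transcript": "change my address", "am_score": -1.0, "lm_score": -2.5},
    {"transcript": "chain my adder please", "am_score": -4.2, "lm_score": -9.1},
]
texts, numeric = FeatureExtractorSketch().fit_transform(alternatives)
```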

Module contents