dialogy.plugins.text.calibration package
Submodules
dialogy.plugins.text.calibration.xgb module
Trains a calibration model. This contains two models:
- Vectorizer: TfIdf
- Classifier: XGBoostRegressor
- class CalibrationModel(threshold, dest=None, guards=None, debug=False, input_column='alternatives', output_column='alternatives', use_transform=False, model_name='calibration.pkl')
Bases: dialogy.base.plugin.Plugin
This plugin provides a calibration model that sits between ASR and SLU. It trains a model that learns to classify alternatives from their text and their AM and LM scores. Bad alternatives are removed before training SLU and during inference.
- filter_asr_output(utterances)
Filters outputs from ASR based on calibration model prediction.
- Parameters
asr_output – output dictionary from ASR. Should have an _alternatives_ key.
- Returns
Filtered alternatives, in the same format as input.
- Return type
Dict[str, Any]
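The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: the function name `filter_alternatives`, the scorer `toy_score`, the field names `transcript`, `am_score` and `lm_score`, the flat list under `alternatives`, and the "keep when score >= threshold" direction are all assumptions made for the example.

```python
from typing import Any, Callable, Dict

Alternative = Dict[str, Any]


def filter_alternatives(
    asr_output: Dict[str, Any],
    score_fn: Callable[[Alternative], float],
    threshold: float,
) -> Dict[str, Any]:
    """Keep only the alternatives whose score is at least `threshold`.

    Hypothetical helper: the real plugin scores alternatives with its
    trained calibration model; here any callable can stand in for it.
    """
    kept = [alt for alt in asr_output["alternatives"] if score_fn(alt) >= threshold]
    # Copy every other key unchanged so the output keeps the input's format.
    return {**asr_output, "alternatives": kept}


def toy_score(alt: Alternative) -> float:
    """Stand-in scorer (assumption): sum of the AM and LM scores."""
    return alt["am_score"] + alt["lm_score"]


sample = {
    "alternatives": [
        {"transcript": "yes", "am_score": -10.0, "lm_score": -5.0},
        {"transcript": "yep", "am_score": -80.0, "lm_score": -40.0},
    ]
}
filtered = filter_alternatives(sample, toy_score, threshold=-50.0)
print([alt["transcript"] for alt in filtered["alternatives"]])
```

Passing the scorer in as a callable keeps the filtering logic independent of how the score is produced, so the same sketch works with a trained model's prediction function.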
- train(df)
Trains the calibration pipeline.
- Parameters
df (pd.DataFrame) – dataframe to train on. Should be a valid transcription tagging job.
model_name (str) – Saves the pipeline as {model_name}.pkl
- Return type
None
- transform(training_data)
Transform data for a plugin in the workflow.
- Return type
DataFrame
- validate(df)
Returns whether df is a valid transcription tagging job. Should return False for intent tagging jobs.
Example: '{"text": "I want to change and set my <INAUDIBLE>", "type": "TRANSCRIPT"}'
Sharp bits:
- All rows in df should have the same format. Only the first row is considered for sanity checks.
- Parameters
df (pd.DataFrame) – Input dataframe.
- Returns
True if the dataframe is valid for training the calibration model, False otherwise.
- Return type
bool
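A first-row check of this kind might look like the sketch below. It is only an illustration under stated assumptions: rows are represented as plain dictionaries rather than a pd.DataFrame to keep the example dependency-free, and the function name `looks_like_transcription_job` and the column name `tag` are hypothetical, not taken from the library.

```python
import json
from typing import Any, Dict, List


def looks_like_transcription_job(
    rows: List[Dict[str, Any]], column: str = "tag"
) -> bool:
    """Check only the first row, mirroring the 'sharp bits' note above.

    `column` is a hypothetical name for the field holding the JSON string.
    """
    if not rows:
        return False
    raw = rows[0].get(column)
    if not isinstance(raw, str):
        return False
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False
    # A transcription row is expected to carry text and the TRANSCRIPT type.
    return (
        isinstance(payload, dict)
        and payload.get("type") == "TRANSCRIPT"
        and "text" in payload
    )


transcript_rows = [
    {"tag": '{"text": "I want to change and set my <INAUDIBLE>", "type": "TRANSCRIPT"}'}
]
print(looks_like_transcription_job(transcript_rows))
```

Because only the first row is inspected, a dataframe with mixed formats would pass this check, which is the caveat the note above warns about.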