dialogy.plugins.text.merge_asr_output package
Module contents
We use classifiers to predict intents, and utterances from an ASR are prominent features for the task. We featurize the utterances by concatenating all transcripts.
In [1]: from dialogy.workflow import Workflow
...: from dialogy.plugins import MergeASROutputPlugin
...: from dialogy.base import Input
...:
In [2]: merge_asr_output_plugin = MergeASROutputPlugin(dest="input.clf_feature")
...: workflow = Workflow([merge_asr_output_plugin])
...:
In [3]: input_, _ = workflow.run(Input(utterances=["we will come by 7 pm", " will come by 7 pm"]))
In [4]: input_
Out[4]:
{'utterances': ['we will come by 7 pm', ' will come by 7 pm'],
'reference_time': None,
'latent_entities': False,
'transcripts': ['we will come by 7 pm', ' will come by 7 pm'],
'best_transcript': 'we will come by 7 pm',
'clf_feature': ['<s> we will come by 7 pm </s> <s> will come by 7 pm </s>'],
'lang': 'en',
'locale': 'en_IN',
'timezone': 'UTC',
'slot_tracker': None,
'current_state': None,
'expected_slots': None,
'previous_intent': None,
'history': None}
- class MergeASROutputPlugin(input_column='alternatives', output_column=None, use_transform=False, dest=None, guards=None, debug=False)
Bases: dialogy.base.plugin.Plugin
Working details are covered in MergeASROutputPlugin.
- Parameters
Plugin ([type]) – [description]
- merge_asr_output(utterances)
Join ASR output into a single string.
This function provides a merging strategy for n-best ASR transcripts by joining each transcript, such that:
each sentence end is marked by " </s>", and
each sentence start is marked by " <s>".
When the ASR output contains dictionaries, each is expected to have the key "transcript", whose value this function operates on.
The normalization is done by normalize.
- Parameters
utterances (Any) –
A structure representing ASR output. We support only:
List[str]
List[List[str]]
List[List[Dict[str, str]]]
List[Dict[str, str]]
- Returns
Concatenated string, with <s> and </s> marking the start and end of each sentence.
- Return type
List[str]
- Raises
TypeError if transcript is missing in cases of List[List[Dict[str, str]]] or List[Dict[str, str]].
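The merging strategy described above can be illustrated with a minimal sketch. This is not the library's implementation: the helper names extract_transcripts and merge_transcripts are hypothetical, and plain whitespace collapsing stands in for the library's own normalize step, whose exact behaviour is not reproduced here. It only shows how the four supported shapes could be flattened into transcripts and joined with <s> and </s> markers.

```python
from typing import Any, List


def extract_transcripts(utterances: Any, key: str = "transcript") -> List[str]:
    """Flatten ASR output into a flat list of transcript strings.

    Hypothetical helper, not part of dialogy. It accepts the four shapes
    listed above (and, as a side effect of recursion, deeper nesting).
    """
    if not isinstance(utterances, list):
        raise TypeError(f"Expected a list, got {type(utterances).__name__}")
    transcripts: List[str] = []
    for item in utterances:
        if isinstance(item, str):
            transcripts.append(item)
        elif isinstance(item, dict):
            # Dict-based shapes must carry the "transcript" key.
            if key not in item:
                raise TypeError(f"Missing key {key!r} in {item!r}")
            transcripts.append(item[key])
        elif isinstance(item, list):
            transcripts.extend(extract_transcripts(item, key))
        else:
            raise TypeError(f"Unsupported item type: {type(item).__name__}")
    return transcripts


def merge_transcripts(utterances: Any) -> List[str]:
    """Wrap each transcript in <s> ... </s> and join them into one string."""
    transcripts = extract_transcripts(utterances)
    # Assumption: whitespace collapsing stands in for the real `normalize`.
    marked = [f"<s> {' '.join(t.split())} </s>" for t in transcripts]
    # A single-element list, matching the documented return type List[str].
    return [" ".join(marked)]


print(merge_transcripts(["we will come by 7 pm", " will come by 7 pm"]))
```

Under these assumptions, the two utterances from the session at the top of this page yield the same clf_feature value shown there.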