dialogy.base.input package¶
Module contents¶
The Input class creates immutable instances that describe the inputs of a single turn of a conversation.
Some attributes, such as the slot_tracker or the entire history, may aggregate data from previous turns.
Why do I see Input and Output as inputs to all Plugins?
It is common for plugins to require both as arguments. The nomenclature can be confusing: the names Input and Output, and the separation between them, are meaningful from the perspective of the SLU API, not from that of Plugins.
Updates¶
While writing plugins, we may need to update the attributes of an Input. Since instances are immutable, plain assignment doesn’t work. Start with an instance:
In [1]: from dialogy.base import Input
...: from dialogy.utils import make_unix_ts
...:
In [2]: # Check the attributes in the object logged below.
...: input_x = Input(utterances="hello world", lang="hi", timezone="Asia/Kolkata")
...:
In [3]: input_x
Out[3]: Input(utterances='hello world', reference_time=None, latent_entities=False, transcripts=['hello world'], best_transcript='hello world', clf_feature=[], lang='hi', locale='en_IN', timezone='Asia/Kolkata', slot_tracker=None, current_state=None, expected_slots=None, previous_intent=None, history=None)
Issues with Frozen Instance Update¶
Now if we try the following:
In [4]: input_x.utterances = "hello"
---------------------------------------------------------------------------
FrozenInstanceError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 input_x.utterances = "hello"
File ~/miniconda3/envs/dialogy/lib/python3.10/site-packages/attr/_make.py:553, in _frozen_setattrs(self, name, value)
549 def _frozen_setattrs(self, name, value):
550 """
551 Attached to frozen classes as __setattr__.
552 """
--> 553 raise FrozenInstanceError()
FrozenInstanceError:
We can see re-assigning values to attributes isn’t allowed.
Updating a frozen instance¶
We have to create a new instance instead, and Input.from_dict provides the syntax for it:
In [5]: input_x = Input(utterances="hello world", lang="hi", timezone="Asia/Kolkata")
In [6]: input_y = Input.from_dict({"utterances": "hello"})
In [7]: input_y
Out[7]: Input(utterances='hello', reference_time=None, latent_entities=False, transcripts=['hello'], best_transcript='hello', clf_feature=[], lang='en', locale='en_IN', timezone='UTC', slot_tracker=None, current_state=None, expected_slots=None, previous_intent=None, history=None)
But by doing this, we lost the lang and timezone attributes to system defaults.
Reusing an existing instance¶
We can reuse an existing instance to create a new one. This way, we don’t have to restate every property of the previous Input.
In [8]: input_x = Input(utterances="hello world", lang="hi", timezone="Asia/Kolkata")
In [9]: input_y = Input.from_dict({"utterances": "hello"}, reference=input_x)
In [10]: input_y
Out[10]: Input(utterances='hello', reference_time=None, latent_entities=False, transcripts=['hello'], best_transcript='hello', clf_feature=[], lang='hi', locale='en_IN', timezone='Asia/Kolkata', slot_tracker=None, current_state=None, expected_slots=None, previous_intent=None, history=None)
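To make the mechanics of this pattern concrete, here is a minimal sketch built on the standard library's frozen dataclasses rather than on dialogy itself. TurnInput and its from_dict are hypothetical stand-ins with only three fields; dialogy's actual Input is built with attrs and its implementation may differ.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TurnInput:
    """Hypothetical stand-in for Input; not dialogy's implementation."""

    utterances: str
    lang: str = "en"
    timezone: str = "UTC"

    @classmethod
    def from_dict(
        cls, d: Dict[str, Any], reference: Optional["TurnInput"] = None
    ) -> "TurnInput":
        if reference is None:
            # No reference: keys missing from d fall back to class defaults.
            return cls(**d)
        # With a reference: copy it, overriding only the keys present in d.
        return replace(reference, **d)


input_x = TurnInput(utterances="hello world", lang="hi", timezone="Asia/Kolkata")
input_y = TurnInput.from_dict({"utterances": "hello"}, reference=input_x)
input_z = TurnInput.from_dict({"utterances": "hello"})

try:
    input_x.utterances = "hello"  # direct assignment is rejected
except FrozenInstanceError:
    assignment_rejected = True
else:
    assignment_rejected = False
```

In this sketch input_y keeps the lang and timezone of input_x, input_z falls back to the defaults, and input_x itself is never modified.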
Serialization¶
If there is a need to represent an Input as a dict, we can do the following:
In [1]: input_y.json()
Out[1]:
{'utterances': 'hello',
'reference_time': None,
'latent_entities': False,
'transcripts': ['hello'],
'best_transcript': 'hello',
'clf_feature': [],
'lang': 'hi',
'locale': 'en_IN',
'timezone': 'Asia/Kolkata',
'slot_tracker': None,
'current_state': None,
'expected_slots': None,
'previous_intent': None,
'history': None}
- class Input(transcripts=None, best_transcript=None, *, utterances, reference_time=None, latent_entities=False, clf_feature=NOTHING, lang='en', locale='en_IN', timezone='UTC', slot_tracker=None, current_state=None, expected_slots=NOTHING, previous_intent=None, history=None)[source]¶
Bases: object
Represents the inputs of the SLU API.
- best_transcript: str¶
A derived attribute. Contains the best alternative selected out of the utterances.
- clf_feature: Optional[List[str]]¶
Placeholder for the features of an intent classifier.
["<s> I want to book a flight </s> <s> I want to book flights </s> <s> I want to book a flight to Paris </s>"]
- current_state: Optional[str]¶
Points at the active state (or node) within the conversation graph.
- expected_slots: Set[str]¶
The names of the slots that are expected to be filled in a given turn.
- find_entities_in_history(intent_names=None, slot_names=None)[source]¶
- Return type
Optional[List[Dict[str, Any]]]
- history: Optional[List[Dict[str, Any]]]¶
- json()[source]¶
Serialize Input to a JSON-like dict.
- Returns
A dictionary that represents an Input instance.
- Return type
Dict[str, Any]
- lang: str¶
Expected language of the input. This is needed for language-dependent plugins. Use an ISO 639-1 code from https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes; for example, the code for “English” is “en”.
- latent_entities: bool¶
A switch to turn on/off the production of latent entities via the Duckling API. If you need to parse “4” as 4 am or 4 pm (note the absence of “am” or “pm” in the utterance), you might need to set this to True. It may be helpful to keep it False unless clearly required.
- locale: str¶
The locale identifier consists of at least a language code and a country/region code. We keep “en_IN” as our default. This is used by Duckling for parsing patterns as per the locale. If locale is missing, i.e. None, we may fall back to lang instead.
- previous_intent: Optional[str]¶
The name of the intent that was predicted by the classifier in (exactly) the previous turn.
- reference_time: Optional[int]¶
The time that should be used for parsing relative time values, given as a Unix timestamp. utils/datetime.py has make_unix_ts to help convert a date in ISO 8601 format to a Unix timestamp in milliseconds.
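As an illustration of the conversion involved, the following sketch turns an ISO 8601 string into a Unix timestamp in milliseconds using only the standard library. iso_to_unix_ms is a hypothetical helper written for this example; it is not dialogy's make_unix_ts, whose signature and behavior may differ.

```python
from datetime import datetime


def iso_to_unix_ms(iso_string: str) -> int:
    """Convert an ISO 8601 string with a UTC offset to Unix milliseconds.

    Hypothetical helper for illustration; not part of dialogy.
    """
    moment = datetime.fromisoformat(iso_string)
    if moment.tzinfo is None:
        # Without an offset the result would depend on the machine's locale.
        raise ValueError("expected a UTC offset, e.g. +05:30")
    return int(moment.timestamp() * 1000)


reference_time = iso_to_unix_ms("2021-10-15T00:00:00+05:30")
# reference_time is 1634236200000 (2021-10-14T18:30:00Z)
```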
- slot_tracker: Optional[List[Dict[str, Any]]]¶
This data structure tracks the slots that were filled in previous turns. This may come in handy if we want to filter or reduce entities depending on our history. We use this in our CombineDateTimeOverSlots plugin.
[{
    "name": "_callback_",                # the intent name
    "slots": [{
        "name": "callback_datetime",     # the slot name
        "type": [                        # can fill entities of these types.
            "time",
            "date",
            "datetime"
        ],
        "values": [{                     # entities that were filled previously
            "alternative_index": 0,
            "body": "tomorrow",
            "entity_type": "date",
            "grain": "day",
            "parsers": ["duckling"],
            "range": {
                "end": 8,
                "start": 0
            },
            "score": None,
            "type": "value",
            "value": "2021-10-15T00:00:00+05:30"
        }]
    }]
}]
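A short sketch of how a plugin might read this structure follows. The helper filled_values is hypothetical and only assumes the nesting shown in the example (intents, then slots, then values); it is not a function provided by dialogy.

```python
from typing import Any, Dict, List

# Trimmed copy of the slot_tracker example, kept to the keys used here.
slot_tracker: List[Dict[str, Any]] = [{
    "name": "_callback_",
    "slots": [{
        "name": "callback_datetime",
        "type": ["time", "date", "datetime"],
        "values": [{
            "alternative_index": 0,
            "body": "tomorrow",
            "entity_type": "date",
            "value": "2021-10-15T00:00:00+05:30",
        }],
    }],
}]


def filled_values(
    tracker: List[Dict[str, Any]], intent_name: str, slot_name: str
) -> List[Dict[str, Any]]:
    """Collect entities previously filled for one slot of one intent.

    Hypothetical helper for illustration; not part of dialogy.
    """
    return [
        value
        for intent in tracker
        if intent["name"] == intent_name
        for slot in intent["slots"]
        if slot["name"] == slot_name
        for value in slot["values"]
    ]


previous = filled_values(slot_tracker, "_callback_", "callback_datetime")
```

An empty list means nothing was filled for that intent and slot in previous turns.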
- timezone: str¶
A timezone name from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones. Used by Duckling or any other date/time parsing plugins.
- transcripts: List[str]¶
A derived attribute. We take the cross product of the utterances and return the result as a list of strings. We use this to normalize utterances.
- utterances: List[List[Dict[str, Optional[Union[str, float]]]]]¶
ASRs produce utterances. Each utterance contains N hypotheses. Each hypothesis is a dict with the keys "transcript" and "confidence".
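The shape described above can be sketched as follows. The two helpers are hypothetical and written only for this example: simple flattening and picking the highest-confidence hypothesis are assumptions, and dialogy's own derivation of transcripts and best_transcript may work differently.

```python
from typing import Dict, List, Optional, Union

Hypothesis = Dict[str, Optional[Union[str, float]]]

# One utterance with two hypotheses, in the documented shape.
utterances: List[List[Hypothesis]] = [
    [
        {"transcript": "I want to book a flight", "confidence": 0.9},
        {"transcript": "I want to book flights", "confidence": 0.6},
    ]
]


def flatten_transcripts(utterances: List[List[Hypothesis]]) -> List[str]:
    """List every transcript string in order (assumed normalization)."""
    return [
        hypothesis["transcript"]
        for utterance in utterances
        for hypothesis in utterance
        if isinstance(hypothesis.get("transcript"), str)
    ]


def most_confident(utterances: List[List[Hypothesis]]) -> str:
    """Pick the transcript with the highest confidence (assumed rule)."""
    hypotheses = [h for utterance in utterances for h in utterance]
    if not hypotheses:
        return ""
    # A missing or None confidence is treated as 0.0.
    best = max(hypotheses, key=lambda h: h.get("confidence") or 0.0)
    return str(best.get("transcript") or "")


all_transcripts = flatten_transcripts(utterances)
best = most_confident(utterances)
```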