dialogy.base.input package

Module contents

The Input class creates immutable instances that describe the inputs of a single turn of a conversation. Some attributes, such as slot_tracker or history, may aggregate information from previous turns.

Why do I see Input and Output as inputs to all Plugins?

It is a common pattern for all plugins to take both as arguments. The nomenclature can be confusing: the separation between Input and Output is meaningful for the SLU API, not for plugins.

Updates

While writing plugins, we need to update the attributes of Input. The following doesn’t work!

In [1]: from dialogy.base import Input
   ...: from dialogy.utils import make_unix_ts
   ...: 

In [2]: # Check the attributes in the object logged below. 
   ...: input_x = Input(utterances="hello world", lang="hi", timezone="Asia/Kolkata")
   ...: 

In [3]: input_x
Out[3]: Input(utterances='hello world', reference_time=None, latent_entities=False, transcripts=['hello world'], best_transcript='hello world', clf_feature=[], lang='hi', locale='en_IN', timezone='Asia/Kolkata', slot_tracker=None, current_state=None, expected_slots=None, previous_intent=None, history=None)

Issues with Frozen Instance Update

Now if we try the following:

In [4]: input_x.utterances = "hello"
---------------------------------------------------------------------------
FrozenInstanceError                       Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 input_x.utterances = "hello"

File ~/miniconda3/envs/dialogy/lib/python3.10/site-packages/attr/_make.py:553, in _frozen_setattrs(self, name, value)
    549 def _frozen_setattrs(self, name, value):
    550     """
    551     Attached to frozen classes as __setattr__.
    552     """
--> 553     raise FrozenInstanceError()

FrozenInstanceError: 

We can see re-assigning values to attributes isn’t allowed.

Updating a frozen instance

We have to create new instances instead, and there is some syntax for it:

In [5]: input_x = Input(utterances="hello world", lang="hi", timezone="Asia/Kolkata")

In [6]: input_y = Input.from_dict({"utterances": "hello"})

In [7]: input_y
Out[7]: Input(utterances='hello', reference_time=None, latent_entities=False, transcripts=['hello'], best_transcript='hello', clf_feature=[], lang='en', locale='en_IN', timezone='UTC', slot_tracker=None, current_state=None, expected_slots=None, previous_intent=None, history=None)

However, by doing this, the lang and timezone attributes are reset to the system defaults.

Reusing an existing instance

We can reuse an existing instance to create a new one. This way, we don’t have to rewrite every property of the previous Input.

In [8]: input_x = Input(utterances="hello world", lang="hi", timezone="Asia/Kolkata")

In [9]: input_y = Input.from_dict({"utterances": "hello"}, reference=input_x)

In [10]: input_y
Out[10]: Input(utterances='hello', reference_time=None, latent_entities=False, transcripts=['hello'], best_transcript='hello', clf_feature=[], lang='hi', locale='en_IN', timezone='Asia/Kolkata', slot_tracker=None, current_state=None, expected_slots=None, previous_intent=None, history=None)
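
The merge behaviour shown above can be approximated with the standard library. The following is a minimal sketch, not dialogy's implementation: TurnInput is a hypothetical stand-in that models only three of Input's attributes, and it assumes that from_dict copies every attribute from reference unless d overrides it.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TurnInput:
    """Hypothetical stand-in for Input; only three attributes are modelled."""

    utterances: str
    lang: str = "en"
    timezone: str = "UTC"

    @classmethod
    def from_dict(
        cls, d: Dict[str, Any], reference: Optional["TurnInput"] = None
    ) -> "TurnInput":
        # Start from the reference's attributes (if any), then let d override them.
        base = asdict(reference) if reference is not None else {}
        return cls(**{**base, **d})


input_x = TurnInput(utterances="hello world", lang="hi", timezone="Asia/Kolkata")

# Without a reference, unspecified attributes fall back to the defaults.
input_y = TurnInput.from_dict({"utterances": "hello"})

# With a reference, lang and timezone are carried over from input_x.
input_z = TurnInput.from_dict({"utterances": "hello"}, reference=input_x)
```

The real class also recomputes derived attributes such as transcripts, as the output above shows; this sketch does not model that step.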

Serialization

If we need to represent an Input as a dict, we can do the following:

In [1]: input_y.json()
Out[1]: 
{'utterances': 'hello',
 'reference_time': None,
 'latent_entities': False,
 'transcripts': ['hello'],
 'best_transcript': 'hello',
 'clf_feature': [],
 'lang': 'hi',
 'locale': 'en_IN',
 'timezone': 'Asia/Kolkata',
 'slot_tracker': None,
 'current_state': None,
 'expected_slots': None,
 'previous_intent': None,
 'history': None}

class Input(transcripts=None, best_transcript=None, *, utterances, reference_time=None, latent_entities=False, clf_feature=NOTHING, lang='en', locale='en_IN', timezone='UTC', slot_tracker=None, current_state=None, expected_slots=NOTHING, previous_intent=None, history=None)[source]

Bases: object

Represents the inputs of the SLU API.

best_transcript: str

A derived attribute. Contains the best alternative selected out of the utterances.

clf_feature: Optional[List[str]]

Placeholder for the features of an intent classifier.

["<s> I want to book a flight </s> <s> I want to book flights </s> <s> I want to book a flight to Paris </s>"]

current_state: Optional[str]

Points at the active state (or node) within the conversation graph.

expected_slots: Set[str]

The set of slots that are expected to be filled in a given turn.

find_entities_in_history(intent_names=None, slot_names=None)[source]
Return type

Optional[List[Dict[str, Any]]]

classmethod from_dict(d, reference=None)[source]

Create a new Input instance from a dictionary.

Parameters
  • d (Dict[str, Any]) – A dictionary such that keys are a subset of Input attributes.

  • reference (Optional[Input], optional) – An existing Input instance whose attributes are reused for keys missing from d. Defaults to None.

Returns

A new Input instance.

Return type

Input

history: Optional[List[Dict[str, Any]]]

json()[source]

Serialize Input to a JSON-like dict.

Returns

A dictionary that represents an Input instance.

Return type

Dict[str, Any]
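
Because the result is a plain dict, it can be handed to the standard json module. The following is a small sketch that uses a hand-written dict with a few keys copied from the output shown in the Serialization section; it does not call dialogy.

```python
import json

# Hand-copied subset of the dict shown earlier; not produced by Input.json().
payload = {
    "utterances": "hello",
    "reference_time": None,
    "latent_entities": False,
    "transcripts": ["hello"],
    "lang": "hi",
    "timezone": "Asia/Kolkata",
}

# None becomes null and False becomes false in the JSON text.
encoded = json.dumps(payload, sort_keys=True)

# Decoding returns an equal dict, so this representation round-trips.
decoded = json.loads(encoded)
```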

lang: str

Expected language of the input. This is needed for language-dependent plugins. Language codes are listed at https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes; for example, the code for “English” is “en”.

latent_entities: bool

A switch to turn on/off production of latent entities via the Duckling API. You might need to set this to True if you need to parse “4” as 4 am or 4 pm, that is, when “am” or “pm” is absent from the utterance. It may be helpful to keep it False unless clearly required.

locale: str

The locale identifier consists of at least a language code and a country/region code. We keep “en_IN” as our default. This is used by Duckling for parsing patterns as per the locale. If locale is missing, i.e. None, we may fall back to lang instead.

previous_intent: Optional[str]

The name of the intent that was predicted by the classifier in (exactly) the previous turn.

reference_time: Optional[int]

The time that should be used for parsing relative time values. This is a Unix timestamp in milliseconds. utils/datetime.py has make_unix_ts to help convert a date in ISO 8601 format to a Unix ms timestamp.
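
The ISO 8601 to Unix ms conversion that the text attributes to make_unix_ts can be illustrated with the standard library alone. This is only a sketch: iso_to_unix_ms is a hypothetical helper, and its signature and behaviour are not claimed to match make_unix_ts.

```python
from datetime import datetime


def iso_to_unix_ms(iso_date: str) -> int:
    """Hypothetical helper: ISO 8601 string with a UTC offset to Unix ms."""
    parsed = datetime.fromisoformat(iso_date)
    if parsed.tzinfo is None:
        # A naive datetime would be interpreted in the machine's local timezone.
        raise ValueError("expected a UTC offset, e.g. +05:30")
    # timestamp() returns seconds since the epoch; scale to milliseconds.
    return int(parsed.timestamp() * 1000)


reference_time = iso_to_unix_ms("2021-10-15T00:00:00+05:30")
```

A millisecond timestamp for a present-day date has 13 digits, which is a quick way to tell it apart from a timestamp in seconds (10 digits).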

slot_tracker: Optional[List[Dict[str, Any]]]

This data structure tracks the slots that were filled in previous turns. This may come in handy if we want to filter or reduce entities depending on our history. We use this in our CombineDateTimeOverSlots plugin.

[{
    "name": "_callback_",               # the intent name
    "slots": [{
        "name": "callback_datetime",    # the slot name
        "type": [                       # can fill entities of these types.
            "time",
            "date",
            "datetime"
        ],
        "values": [{                    # entities that were filled previously
            "alternative_index": 0,
            "body": "tomorrow",
            "entity_type": "date",
            "grain": "day",
            "parsers": ["duckling"],
            "range": {
                "end": 8,
                "start": 0
            },
            "score": None,
            "type": "value",
            "value": "2021-10-15T00:00:00+05:30"
        }]
    }]
}]
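
To make the nesting concrete, here is a minimal sketch that reads previously filled entities back out of such a structure. filled_values is a hypothetical helper written for this illustration and is not part of dialogy; the data is a trimmed copy of the example above.

```python
from typing import Any, Dict, List

# Trimmed copy of the structure above.
slot_tracker: List[Dict[str, Any]] = [{
    "name": "_callback_",
    "slots": [{
        "name": "callback_datetime",
        "type": ["time", "date", "datetime"],
        "values": [{
            "body": "tomorrow",
            "entity_type": "date",
            "value": "2021-10-15T00:00:00+05:30",
        }],
    }],
}]


def filled_values(
    tracker: List[Dict[str, Any]], intent_name: str, slot_name: str
) -> List[Dict[str, Any]]:
    """Hypothetical helper: entities already filled for one slot of one intent."""
    return [
        value
        for intent in tracker
        if intent["name"] == intent_name
        for slot in intent["slots"]
        if slot["name"] == slot_name
        for value in slot["values"]
    ]


previous = filled_values(slot_tracker, "_callback_", "callback_datetime")
missing = filled_values(slot_tracker, "_callback_", "unknown_slot")
```
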
timezone: str

Timezone names as listed at https://en.wikipedia.org/wiki/List_of_tz_database_time_zones. Used by Duckling or any other date/time parsing plugins.

transcripts: List[str]

A derived attribute. We take the cross product of the utterances and return a list of strings. We use this to normalize utterances.

utterances: List[List[Dict[str, Optional[Union[str, float]]]]]

ASRs produce utterances. Each utterance contains N hypotheses. Each hypothesis is a dict with keys "transcript" and "confidence".
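
The nested shape can be easier to follow with a small example. The sketch below builds illustrative utterances (made-up transcripts and confidences) and combines one hypothesis from each utterance into a string. That is one plausible reading of the cross product mentioned for transcripts; it is an assumption, not dialogy's actual normalization code, and cross_product_transcripts is a hypothetical name.

```python
from itertools import product
from typing import Dict, List, Optional, Union

Hypothesis = Dict[str, Optional[Union[str, float]]]

# Two utterances: the first has two hypotheses, the second has one.
utterances: List[List[Hypothesis]] = [
    [
        {"transcript": "hello", "confidence": 0.9},
        {"transcript": "hallo", "confidence": 0.5},
    ],
    [{"transcript": "world", "confidence": 0.8}],
]


def cross_product_transcripts(utterances: List[List[Hypothesis]]) -> List[str]:
    """Assumed reading of "cross product": one hypothesis per utterance, joined."""
    per_utterance = [
        [str(h["transcript"]) for h in utterance] for utterance in utterances
    ]
    return [" ".join(combo) for combo in product(*per_utterance)]


transcripts = cross_product_transcripts(utterances)
```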