dialogy.base.plugin package

Module contents

Abstract Plugin

A Plugin transforms the data within the workflow. We use plugins to produce intents and entities, fill slots, and prepare intermediate data for other plugins. This package ships the abstract base class from which other plugins are created. The base class provides a few features that are already implemented, such as book-keeping and guards.

Writing Plugins

Attention

We will learn plugin creation by exploring some examples. It is highly advised to read the Input and Output sections first.

Keyword Intent

Say we have to predict an intent _greeting_ if the first item in the transcripts contains "hello". This assumes that the transcripts are ranked by confidence score in descending order.

In [1]: from typing import List
   ...: from dialogy.workflow import Workflow
   ...: from dialogy.base import Input, Plugin, Output
   ...: from dialogy.types import Intent
   ...: 

In [2]: class Transcript2IntentPlugin(Plugin):
   ...:     def __init__(self, dest=None, **kwargs):
   ...:         super().__init__(dest=dest, **kwargs)
   ...:         
   ...:     def greet(self, transcripts: List[str]) -> List[Intent]:
   ...:         if not transcripts:
   ...:             return []
   ...: 
   ...:         best_transcript, *rest = transcripts
   ...: 
   ...:         if "hello" in best_transcript:
   ...:             return [Intent(name="_greeting_", score=1.0)]
   ...:         else:
   ...:             return []
   ...: 
   ...:     def utility(self, input_: Input, output: Output) -> List[Intent]:
   ...:         return self.greet(input_.transcripts)
   ...: 

In [3]: transcript2intent   = Transcript2IntentPlugin(dest="output.intents")
   ...: workflow            = Workflow([transcript2intent])
   ...: input_, output      = workflow.run(Input(utterances=[[{"transcript": "hello"}]]))
   ...: # The last expression provides us a snapshot of the workflow's 
   ...: # input and output after all plugins have been executed.
   ...: # These aren't the Input and Output objects, but their `dict` equivalents.
   ...: 

In [4]: # we can see the utterance that we set 
   ...: # and the derived field `transcripts` that we used within the plugin.
   ...: # Other values are defaults.
   ...: input_
   ...: 
Out[4]: 
{'utterances': [[{'transcript': 'hello'}]],
 'reference_time': None,
 'latent_entities': False,
 'transcripts': ['hello'],
 'best_transcript': 'hello',
 'clf_feature': [],
 'lang': 'en',
 'locale': 'en_IN',
 'timezone': 'UTC',
 'slot_tracker': None,
 'current_state': None,
 'expected_slots': None,
 'previous_intent': None,
 'history': None}

In [5]: # we can see that the _greeting_ intent has been set.
   ...: output
   ...: 
Out[5]: 
{'intents': [{'name': '_greeting_',
   'alternative_index': None,
   'score': 1.0,
   'parsers': [],
   'slots': []}],
 'entities': [],
 'original_intent': {}}

Heuristic based Intent

This problem requires us to predict _greeting_ if the word "hello" is present at least 6 times across all utterances. To help us with this, we will import an existing plugin: MergeASROutputPlugin.

In [6]: from collections import Counter
   ...: from dialogy.plugins import MergeASROutputPlugin
   ...: 

In [7]: class HeuristicBasedIntentPlugin(Plugin):
   ...:     def __init__(self, threshold=0, dest=None, **kwargs):
   ...:         super().__init__(dest=dest, **kwargs)
   ...:         self.threshold = threshold
   ...: 
   ...:     def greet(self, clf_features: List[str]) -> List[Intent]:
   ...:         if not clf_features:
   ...:             return []
   ...: 
   ...:         feature = clf_features[0]
   ...:         word_frequency = Counter(feature.split())
   ...:         if word_frequency.get("hello", 0) >= self.threshold:
   ...:             return [Intent(name="_greeting_", score=1.0)]
   ...:         else:
   ...:             return []
   ...: 
   ...:     def utility(self, input_: Input, output: Output) -> List[Intent]:
   ...:         return self.greet(input_.clf_feature)
   ...: 

In [8]: heuristic_based_intent  = HeuristicBasedIntentPlugin(threshold=6, dest="output.intents")
   ...: merge_asr_output_plugin = MergeASROutputPlugin(dest="input.clf_feature")
   ...: workflow                = Workflow([merge_asr_output_plugin, heuristic_based_intent])
   ...: input_, output          = workflow.run(
   ...:    Input(utterances=[
   ...:        [{"transcript": "hello is anyone there"},
   ...:         {"transcript": "yellow is anyone here"},
   ...:         {"transcript": "hello is one here"},
   ...:         {"transcript": "hello is one there"},
   ...:         {"transcript": "hello if one here"},
   ...:         {"transcript": "hello in one here"},
   ...:         {"transcript": "hello ip one here"}]
   ...:    ])
   ...: )
   ...: 

In [9]: # let's check the snapshots again.
   ...: # Pay close attention to `clf_feature`
   ...: # it wasn't set in the previous example.
   ...: # This time, it was set by the `MergeASROutputPlugin` plugin 
   ...: # as it had the dest = "input.clf_feature"
   ...: input_
   ...: 
Out[9]: 
{'utterances': [[{'transcript': 'hello is anyone there'},
   {'transcript': 'yellow is anyone here'},
   {'transcript': 'hello is one here'},
   {'transcript': 'hello is one there'},
   {'transcript': 'hello if one here'},
   {'transcript': 'hello in one here'},
   {'transcript': 'hello ip one here'}]],
 'reference_time': None,
 'latent_entities': False,
 'transcripts': ['hello is anyone there',
  'yellow is anyone here',
  'hello is one here',
  'hello is one there',
  'hello if one here',
  'hello in one here',
  'hello ip one here'],
 'best_transcript': 'hello is anyone there',
 'clf_feature': ['<s> hello is anyone there </s> <s> yellow is anyone here </s> <s> hello is one here </s> <s> hello is one there </s> <s> hello if one here </s> <s> hello in one here </s> <s> hello ip one here </s>'],
 'lang': 'en',
 'locale': 'en_IN',
 'timezone': 'UTC',
 'slot_tracker': None,
 'current_state': None,
 'expected_slots': None,
 'previous_intent': None,
 'history': None}

In [10]: output
Out[10]: 
{'intents': [{'name': '_greeting_',
   'alternative_index': None,
   'score': 1.0,
   'parsers': [],
   'slots': []}],
 'entities': [],
 'original_intent': {}}
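
The threshold check can also be verified in isolation, without a workflow. The sketch below is plain Python and uses only the merged clf_feature string from the snapshot above; count_word is a hypothetical helper for illustration and is not part of dialogy.

```python
from collections import Counter

# The merged feature string, copied from the `clf_feature` snapshot.
feature = (
    "<s> hello is anyone there </s> <s> yellow is anyone here </s> "
    "<s> hello is one here </s> <s> hello is one there </s> "
    "<s> hello if one here </s> <s> hello in one here </s> "
    "<s> hello ip one here </s>"
)


def count_word(text: str, word: str) -> int:
    """Count whitespace-delimited occurrences of `word` in `text` (hypothetical helper)."""
    return Counter(text.split())[word]


hello_count = count_word(feature, "hello")
meets_threshold = hello_count >= 6
print(hello_count, meets_threshold)
```

Because the count is over whole whitespace-separated tokens, "yellow" does not contribute, and a token such as "hello," with attached punctuation would not match either.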

Guarding plugins

It is not always useful to run every plugin. If we know the current state of the conversation, we can decide to skip our naive plugin, for example in a state where we are expecting numbers. A guard is a function of the input and output, and the plugin is skipped when a guard returns True. Let’s look at an example where we prevent a plugin from executing using guards.

In [11]: heuristic_based_intent  = HeuristicBasedIntentPlugin(
   ....:     threshold=6,
   ....:     dest="output.intents",
   ....:     guards=[lambda i, o: i.current_state == "STATE_EXPECTING_NUMBERS"]
   ....: )
   ....: merge_asr_output_plugin = MergeASROutputPlugin(dest="input.clf_feature")
   ....: workflow                = Workflow([merge_asr_output_plugin, heuristic_based_intent])
   ....: in_                     = Input(utterances=[[
   ....:                                 {"transcript": "hello is anyone there"},
   ....:                                 {"transcript": "yellow is anyone here"},
   ....:                                 {"transcript": "hello is one here"},
   ....:                                 {"transcript": "hello is one there"},
   ....:                                 {"transcript": "hello if one here"},
   ....:                                 {"transcript": "hello in one here"},
   ....:                                 {"transcript": "hello ip one here"}]]) 
   ....: input_, output          = workflow.run(in_)
   ....: 

In [12]: # Oops, we forgot to set the current state within the Input, so the guard did not fire!
   ....: output
   ....: 
Out[12]: 
{'intents': [{'name': '_greeting_',
   'alternative_index': None,
   'score': 1.0,
   'parsers': [],
   'slots': []}],
 'entities': [],
 'original_intent': {}}

In [13]: workflow            = Workflow([merge_asr_output_plugin, heuristic_based_intent])
   ....: in_.current_state   = "STATE_EXPECTING_NUMBERS"
   ....: # We get a FrozenInstanceError because Input and Output instances are immutable.
   ....: 
---------------------------------------------------------------------------
FrozenInstanceError                       Traceback (most recent call last)
Input In [13], in <cell line: 2>()
      1 workflow            = Workflow([merge_asr_output_plugin, heuristic_based_intent])
----> 2 in_.current_state   = "STATE_EXPECTING_NUMBERS"

File ~/miniconda3/envs/dialogy/lib/python3.10/site-packages/attr/_make.py:553, in _frozen_setattrs(self, name, value)
    549 def _frozen_setattrs(self, name, value):
    550     """
    551     Attached to frozen classes as __setattr__.
    552     """
--> 553     raise FrozenInstanceError()

FrozenInstanceError: 

In [14]: # We can use the following to create a new instance of Input instead.
   ....: in_ = Input.from_dict({"current_state": "STATE_EXPECTING_NUMBERS"}, reference=in_)
   ....: input_, output  = workflow.run(in_)
   ....: 

In [15]: # In this case, we don't see anything set as the 
   ....: # HeuristicBasedIntentPlugin was prevented
   ....: # by its guarding conditions.
   ....: output
   ....: 
Out[15]: {'intents': [], 'entities': [], 'original_intent': {}}

In [16]: # However, `clf_feature` is set since we never wrote guards for 
   ....: # the MergeASROutputPlugin.
   ....: input_
   ....: 
Out[16]: 
{'utterances': [[{'transcript': 'hello is anyone there'},
   {'transcript': 'yellow is anyone here'},
   {'transcript': 'hello is one here'},
   {'transcript': 'hello is one there'},
   {'transcript': 'hello if one here'},
   {'transcript': 'hello in one here'},
   {'transcript': 'hello ip one here'}]],
 'reference_time': None,
 'latent_entities': False,
 'transcripts': ['hello is anyone there',
  'yellow is anyone here',
  'hello is one here',
  'hello is one there',
  'hello if one here',
  'hello in one here',
  'hello ip one here'],
 'best_transcript': 'hello is anyone there',
 'clf_feature': ['<s> hello is anyone there </s> <s> yellow is anyone here </s> <s> hello is one here </s> <s> hello is one there </s> <s> hello if one here </s> <s> hello in one here </s> <s> hello ip one here </s>'],
 'lang': 'en',
 'locale': 'en_IN',
 'timezone': 'UTC',
 'slot_tracker': None,
 'current_state': 'STATE_EXPECTING_NUMBERS',
 'expected_slots': None,
 'previous_intent': None,
 'history': None}
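
The FrozenInstanceError above is raised by a frozen attrs class. The same copy-with-changes pattern can be sketched with the standard library's frozen dataclasses, used here only as an analogue. FrozenInput is a hypothetical stand-in and not dialogy's Input; dataclasses.replace plays the role that Input.from_dict(..., reference=in_) plays in the session above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FrozenInput:
    """Hypothetical stand-in for an immutable input object."""

    best_transcript: str
    current_state: Optional[str] = None


original = FrozenInput(best_transcript="hello is anyone there")

try:
    # Assigning to a field of a frozen instance is rejected.
    original.current_state = "STATE_EXPECTING_NUMBERS"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Build a new instance that copies every field except the one we override.
updated = replace(original, current_state="STATE_EXPECTING_NUMBERS")
```

The original instance is left untouched, so anything still holding a reference to it keeps seeing the old state.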

Update Plans

Besides plugins that generate an Intent or a BaseEntity, there is also a category of plugins that modify existing values, such as the RuleBasedSlotFillerPlugin or CombineDateTimeOverSlots.

The former updates the intents and the latter updates time entities. In these cases, the plugins also take other values into account.

For example, CombineDateTimeOverSlots:

  1. Separates time entities from other entity types.

  2. Combines them over slot presence.

  3. Rebuilds a full entity list.

This means:

  1. Some plugins need to replace the entire set of intents or entities.

  2. Others need to append their results to what the workflow has already produced.

If your plugin belongs to the first category, set replace_output=True in its constructor.

Note

Refer to CombineDateTimeOverSlots. Its combine_time_entities_from_slots method shows where we may need to use replace_output=True.
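
The difference between the two update plans can be sketched in plain Python. Everything below is illustrative: the dictionary-based entities, store, and combine_time_entities are hypothetical and do not reproduce dialogy's implementation. In particular, the sketch merges time entities unconditionally, whereas the real plugin combines them over slot presence.

```python
from typing import Dict, List

Entity = Dict[str, str]  # hypothetical, simplified entity representation
TIME_TYPES = ("date", "time")


def combine_time_entities(entities: List[Entity]) -> List[Entity]:
    """Separate time entities, merge them, and rebuild the full list."""
    time_entities = [e for e in entities if e["type"] in TIME_TYPES]
    others = [e for e in entities if e["type"] not in TIME_TYPES]
    if len(time_entities) < 2:
        return list(entities)
    merged = {
        "type": "datetime",
        "value": " ".join(e["value"] for e in time_entities),
    }
    return others + [merged]


def store(existing: List[Entity], produced: List[Entity], replace_output: bool) -> List[Entity]:
    """Return the entity list under either update plan (illustrative only)."""
    if replace_output:
        return list(produced)
    return existing + produced


entities = [
    {"type": "date", "value": "2022-01-01"},
    {"type": "time", "value": "10:00"},
    {"type": "number", "value": "4"},
]
rebuilt = combine_time_entities(entities)
replaced = store(entities, rebuilt, replace_output=True)
appended = store(entities, rebuilt, replace_output=False)
```

Appending the rebuilt list would leave the stale date and time entities next to the merged one, which is why a plugin that rebuilds the full list wants the replacing plan.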

class Plugin(input_column='alternatives', output_column=None, use_transform=False, replace_output=False, dest=None, guards=None, debug=False)

Bases: abc.ABC

Abstract class to be implemented by all plugins.

Parameters
  • input_column (str) – Transforms data in this column for a given dataframe, defaults to const.ALTERNATIVES

  • output_column (Optional[str]) – Saves transformation in this column for a given dataframe, defaults to None

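  • use_transform (bool) – Whether the transformation should be applied while training, defaults to False

  • replace_output (bool) – Whether the plugin's result should replace the existing intents or entities instead of being appended to them, defaults to False

  • dest (Optional[str]) – The path where plugin output should be saved, defaults to None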

  • guards (Optional[List[Guard]]) – A list of functions that evaluate to bool, defaults to None

  • debug (bool, optional) – Whether to print debug logs, defaults to False

prevent(input_, output)

Decide if the plugin should execute.

If this method returns True, the plugin’s utility method will not be called.

Returns

True if the plugin’s execution should be prevented.

Return type

bool
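
A minimal sketch of how guards and prevent might fit together, under stated assumptions: GuardedPlugin is a hypothetical, simplified stand-in rather than dialogy's Plugin, the input is a plain dict, and the rule that any single guard returning True blocks execution is an assumption of this sketch.

```python
from typing import Any, Callable, List, Optional

Guard = Callable[[Any, Any], bool]


class GuardedPlugin:
    """Hypothetical, simplified stand-in for a plugin with guards."""

    def __init__(self, guards: Optional[List[Guard]] = None) -> None:
        self.guards = guards or []

    def prevent(self, input_: Any, output: Any) -> bool:
        # Assumption: one guard returning True is enough to block execution.
        return any(guard(input_, output) for guard in self.guards)

    def utility(self, input_: Any, output: Any) -> Any:
        return "ran"

    def __call__(self, input_: Any, output: Any) -> Any:
        # Skip `utility` entirely when the plugin is prevented.
        if self.prevent(input_, output):
            return None
        return self.utility(input_, output)


plugin = GuardedPlugin(
    guards=[lambda i, o: i.get("current_state") == "STATE_EXPECTING_NUMBERS"]
)
without_state = plugin({"current_state": None}, {})
with_state = plugin({"current_state": "STATE_EXPECTING_NUMBERS"}, {})
unguarded = GuardedPlugin()({"current_state": "STATE_EXPECTING_NUMBERS"}, {})
```

With no guards configured, prevent returns False and the plugin always runs, which matches the behaviour of MergeASROutputPlugin in the session above.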

train(_)

Train a plugin.

Return type

Any

transform(training_data)

Transform data for a plugin in the workflow.

Return type

Any

abstract utility(input_, output)

An abstract method that describes the plugin’s functionality.

Parameters
  • input_ (Input) – The workflow’s input.

  • output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any
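
Because utility is abstract, a subclass that does not override it cannot be instantiated. The sketch below shows this with the standard abc module; BasePlugin, Incomplete, and Echo are hypothetical names for illustration and are not dialogy classes.

```python
from abc import ABC, abstractmethod
from typing import Any


class BasePlugin(ABC):
    """Hypothetical stand-in for an abstract plugin base class."""

    @abstractmethod
    def utility(self, input_: Any, output: Any) -> Any:
        """Describe what the plugin does; subclasses must override this."""


class Incomplete(BasePlugin):
    """Does not override `utility`, so it stays abstract."""


class Echo(BasePlugin):
    def utility(self, input_: Any, output: Any) -> Any:
        # Return the input unchanged, ignoring the output.
        return input_


try:
    Incomplete()
    instantiated = True
except TypeError:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False

echoed = Echo().utility("hello", None)
```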