dialogy.plugins.text.list_entity_plugin package
Module contents
Regex Search
Certain keywords often gain significance in an SLU project. These keywords are
easy to extract via patterns and are often used to create entities. The ListEntityPlugin
helps with this task. It requires a pattern map, which we call candidates.
In [1]: from dialogy.workflow import Workflow
...: from dialogy.plugins import ListEntityPlugin
...: from dialogy.base import Input
...:
In [2]: candidates = {
...: "colours": {
...: "red": ["red", "crimson", "ruby", "raspberry"],
...: "blue": ["blue", "azure", "indigo", "navy"],
...: "green": ["green", "emerald", "jade", "teal"],
...: },
...: }
...:
In [3]: list_entity_plugin = ListEntityPlugin(
...: candidates=candidates,
...: style="regex",
...: dest="output.entities"
...: )
...: workflow = Workflow([list_entity_plugin])
...: _, output = workflow.run(Input(utterances="I want an emerald green shirt"))
...:
In [4]: output
Out[4]:
{'intents': [],
'entities': [{'range': {'start': 18, 'end': 23},
'body': 'green',
'type': 'colours',
'parsers': ['ListEntityPlugin'],
'score': 1.0,
'alternative_index': 0,
'value': 'green',
'entity_type': 'colours',
'_meta': {}}],
'original_intent': {}}
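The idea of searching an utterance with a pattern map can be sketched with Python's re module alone. This is a minimal illustration, not dialogy's implementation: the helper name search_candidates is hypothetical, and the sketch reports every raw match without reproducing any scoring or aggregation the plugin performs, so its output can differ from the plugin's.

```python
import re
from typing import Dict, List

# Same shape as the candidates map above: entity type -> value -> patterns.
candidates = {
    "colours": {
        "red": ["red", "crimson", "ruby", "raspberry"],
        "blue": ["blue", "azure", "indigo", "navy"],
        "green": ["green", "emerald", "jade", "teal"],
    },
}


def search_candidates(text: str, pattern_map: Dict[str, Dict[str, List[str]]]) -> List[dict]:
    """Hypothetical helper: return one dict per regex match, ordered by position."""
    matches = []
    for entity_type, value_map in pattern_map.items():
        for value, patterns in value_map.items():
            for pattern in patterns:
                for match in re.finditer(pattern, text):
                    matches.append({
                        "range": {"start": match.start(), "end": match.end()},
                        "body": match.group(),   # the text that matched
                        "type": entity_type,     # e.g. "colours"
                        "value": value,          # e.g. "blue" for the pattern "navy"
                    })
    return sorted(matches, key=lambda m: m["range"]["start"])


found = search_candidates("a navy jacket and a ruby scarf", candidates)
```

Here "navy" resolves to the value "blue" and "ruby" to "red", which mirrors how the pattern map ties several surface forms to one entity value.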
Numeric Values
We can also process numeric entities like so:
In [5]: candidates = {
...: "phone-number": {
...: "__value__": [r"\d{10}"],
...: },
...: }
...:
In [6]: list_entity_plugin = ListEntityPlugin(
...: candidates=candidates,
...: style="regex",
...: dest="output.entities"
...: )
...: workflow = Workflow([list_entity_plugin])
...: _, output = workflow.run(Input(utterances="my number is 9999999999"))
...:
In [7]: output
Out[7]:
{'intents': [],
'entities': [{'range': {'start': 13, 'end': 23},
'body': '9999999999',
'type': 'phone-number',
'parsers': ['ListEntityPlugin'],
'score': 1.0,
'alternative_index': 0,
'value': '9999999999',
'entity_type': 'phone-number',
'_meta': {}}],
'original_intent': {}}
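The range in the output above reads as a pair of character offsets into the utterance. A small sketch using only Python's re module (independent of dialogy) shows the same ten-digit pattern written as a raw string, and how the offsets slice back to the matched body.

```python
import re

utterance = "my number is 9999999999"

# A raw string keeps the backslash in \d intact for the regex engine.
pattern = re.compile(r"\d{10}")

match = pattern.search(utterance)
start, end = match.span()

# Slicing the utterance with the offsets recovers the matched text.
body = utterance[start:end]
```

With this utterance, start and end come out as 13 and 23, matching the range shown in the plugin output.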
spaCy NER
We also allow using spaCy’s NER. The loaded spaCy model has to be passed via the
spacy_nlp argument.
In [1]: import spacy
...: from dialogy.workflow import Workflow
...: from dialogy.plugins import ListEntityPlugin
...: from dialogy.base import Input
...:
In [2]: nlp = spacy.load("en_core_web_sm")
In [3]: list_entity_plugin = ListEntityPlugin(
...: spacy_nlp=nlp,
...: style="spacy",
...: dest="output.entities"
...: )
...: workflow = Workflow([list_entity_plugin])
...: _, output = workflow.run(Input(utterances="Need a place to stay in New Delhi."))
...:
In [4]: output
Out[4]:
{'intents': [],
'entities': [{'range': {'start': 24, 'end': 33},
'body': 'New Delhi',
'type': 'GPE',
'parsers': ['ListEntityPlugin'],
'score': 1.0,
'alternative_index': 0,
'value': 'New Delhi',
'entity_type': 'GPE',
'_meta': {}}],
'original_intent': {}}
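When only some of spaCy's labels are of interest, the labels parameter listed in the class reference restricts which ones are extracted. The effect can be pictured as a filter over entity dicts shaped like the output above. This is only a sketch on plain data: the helper keep_labels is hypothetical, and the DATE entry is invented for illustration rather than taken from a real model run.

```python
from typing import Dict, Iterable, List

# Entities shaped like the plugin output above.
# The DATE entry is made up for illustration; it is not real model output.
entities = [
    {"body": "New Delhi", "type": "GPE"},
    {"body": "tomorrow", "type": "DATE"},
]


def keep_labels(items: List[Dict[str, str]], labels: Iterable[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: keep only entities whose type is in labels."""
    wanted = set(labels)
    return [item for item in items if item["type"] in wanted]


only_places = keep_labels(entities, ["GPE"])
```

Filtering on the type field keeps the rest of each entity untouched, so downstream code sees the same structure with fewer items.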
- class ListEntityPlugin(style=None, candidates=None, spacy_nlp=None, dest=None, guards=None, labels=None, threshold=None, input_column='alternatives', output_column=None, use_transform=True, flags=RegexFlag.None, debug=False)
Bases: dialogy.base.entity_extractor.EntityScoringMixin, dialogy.base.plugin.Plugin
A Plugin for extracting entities using spacy or a list of regex patterns.
- Parameters
style (Optional[str]) – One of [“regex”, “spacy”]
candidates (Optional[Dict[str, List[str]]]) – Required if style is “regex”. A dict that maps entity values to their patterns.
spacy_nlp (Any) – Required if style is “spacy”. A loaded spaCy model.
labels (Optional[List[str]]) – Optional; applies when style is “spacy”. Use it to extract only a few labels out of all the available labels.
debug (bool) – A flag to enable debugging on the plugin methods.
- get_entities(transcripts)
Parse entities using regex or spaCy NER.
- Parameters
transcripts (List[str]) – A list of strings within which to search for entities.
- Returns
List of entities from regex matches or spacy ner.
- Return type
List[KeywordEntity]
- ner_search(transcript)
Wrapper over spaCy’s NER search.
- Parameters
transcript (str) – A string to search entities within.
- Returns
NER parsing via spacy.
- Return type
MatchType
- regex_search(transcript)
Wrapper over regex searches.
- Parameters
transcript (str) – A string to search entities within.
- Returns
Parsing via regex.
- Return type
MatchType