dialogy.plugins.text.list_search_plugin package

Module contents

This module needs refactoring: all search strategies are currently bundled as methods rather than split into separate SearchStrategy classes.

Within dialogy, we extract entities using Duckling, pattern lists, and spaCy. We could ship an individual plugin for each, but these tools and services differ mainly in configuration. The other difference is the intermediate structure that the DucklingPlugin expects. That structure must not leak into the other entities, so that their from_dict(...) methods stay pristine and involve no shape hacking.

class ListSearchPlugin(fuzzy_dp_config, threshold=None, dest=None, guards=None, input_column='alternatives', output_column=None, use_transform=True, flags=RegexFlag.None, debug=False, fuzzy_threshold=0.1)[source]

Bases: dialogy.base.entity_extractor.EntityScoringMixin, dialogy.base.plugin.Plugin

A Plugin for extracting entities using spacy or a list of regex patterns.

Parameters
  • style (Optional[str]) – One of [“regex”, “spacy”]

  • candidates (Optional[Dict[str, List[str]]]) – Required if style is “regex”. A dict mapping entity values to their patterns.

  • spacy_nlp (Any) – Required if style is “spacy”, this is a spacy model.

  • labels (Optional[List[str]]) – Required if style is “spacy”. Use this to extract only a few labels out of all the labels available.

  • debug (bool) – A flag to set debugging on the plugin methods

Return type

Tuple[str, str, str, Tuple[int, int], float]
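
The candidates mapping and the tuple return type above can be illustrated with a minimal standalone sketch. It uses only the standard re module and does not call dialogy. The function name search_candidates, the sample patterns, and the field order (value, entity type, matched text, span, score) are assumptions made for illustration; the reference above does not document what each tuple field means.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical field order: (value, entity_type, matched_text, span, score).
Match = Tuple[str, str, str, Tuple[int, int], float]

# Assumed shape, following the "candidates" description:
# entity value -> regex patterns that identify that value.
candidates: Dict[str, List[str]] = {
    "savings": [r"\bsavings?\b", r"\bdeposit account\b"],
    "credit": [r"\bcredit card\b"],
}


def search_candidates(
    query: str, entity_type: str, flags: int = re.IGNORECASE
) -> List[Match]:
    """Return one tuple per regex hit found in the query."""
    matches: List[Match] = []
    for value, patterns in candidates.items():
        for pattern in patterns:
            for hit in re.finditer(pattern, query, flags):
                # A regex hit is exact, so this sketch assigns a score of 1.0.
                matches.append((value, entity_type, hit.group(), hit.span(), 1.0))
    return matches


result = search_candidates(
    "I lost my Credit Card and need my savings balance", "account"
)
```

Each tuple keeps the character span of the hit, so a caller could map a match back to its position in the transcript.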

fuzzy_init()[source]

Initializes the parameters for the fuzzy dp search.

Return type

None
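
To make the idea of a thresholded fuzzy search concrete, here is a minimal sketch. It assumes that "fuzzy dp" refers to dynamic-programming approximate matching and that fuzzy_threshold behaves like a normalized edit distance cutoff; neither is confirmed by this reference. The helpers edit_distance and fuzzy_match are hypothetical and are not part of dialogy, which may rely on a different matcher entirely.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(token: str, candidates: List[str], threshold: float = 0.1) -> List[str]:
    """Keep candidates whose normalized distance to the token is within the threshold.

    Assumption: the threshold is a fraction of the longer string's length.
    """
    hits = []
    for candidate in candidates:
        longest = max(len(token), len(candidate), 1)
        distance = edit_distance(token.lower(), candidate.lower())
        if distance / longest <= threshold:
            hits.append(candidate)
    return hits
```

Under this assumption, a threshold of 0.1 tolerates roughly one edit per ten characters, so short tokens must match exactly.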

get_entities(transcripts, lang)[source]

Parse entities using regex and spacy ner.

Parameters

transcripts (List[str]) – A list of strings within which to search for entities.

Returns

List of entities from regex matches or spacy ner.

Return type

List[KeywordEntity]

Search for entities in the transcripts within a defined list search space.

Parameters
  • transcripts (List[str]) – A list of transcripts.

  • lang (str) – Language code of the transcripts.

Returns

Token matches with the transcript.

Return type

List[MatchType]

get_words_from_nlp(nlp, query)[source]

Return type

List[Dict[str, Any]]

search_regex(query, entity_type='', entity_patterns=None, match_dict=None)[source]

Return type

Tuple[str, str, str, Tuple[int, int], float]

utility(input_, _)[source]

An abstract method that describes the plugin’s functionality.

Parameters
  • input (Input) – The workflow’s input.

  • output (Output) – The workflow’s output.

Returns

The value returned by the plugin.

Return type

Any