dialogy.base.entity_extractor package
Module contents
- class EntityScoringMixin
Bases: object
Mixin for scoring and aggregation of entities over a set of transcripts.
- aggregate_entities(entity_type_value_group, input_size)
Reduce entities that share the same type and value.
Entities with the same type and value are considered identical even if their other metadata differs. These entities form a group.
We track the transcript indices for every entity in a group.
We select the minimum of these indices, because the 0th transcript has the highest confidence.
We pick one entity per group, set its index to that minimum, and assign it the score aggregated over the group.
The picked entity is added to a list of aggregates.
The above is repeated for every group.
- Parameters
entity_type_value_group (Dict[Tuple[str, Any], List[BaseEntity]]) – A data structure that groups entities by type and value.
- Returns
A list of de-duplicated entities.
- Return type
List[BaseEntity]
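The aggregation steps described for aggregate_entities can be sketched as follows. This is a minimal illustration, not the library's implementation: the Entity dataclass is a hypothetical stand-in for BaseEntity, the field names alternative_index and score are assumptions, and the aggregated score is assumed to be the fraction of transcripts in which the entity appears.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for BaseEntity; field names are assumptions."""

    type: str
    value: Any
    alternative_index: int  # index of the transcript the entity came from
    score: Optional[float] = None


def aggregate_entities(
    groups: Dict[Tuple[str, Any], List[Entity]], input_size: int
) -> List[Entity]:
    aggregates = []
    for group in groups.values():
        # Track the transcript indices of every entity in the group.
        indices = {entity.alternative_index for entity in group}
        # Assumption: the group's score is the share of transcripts
        # that produced this entity.
        score = len(indices) / input_size
        # Pick one representative and give it the lowest index,
        # since the 0th transcript has the highest confidence.
        representative = replace(
            group[0], alternative_index=min(indices), score=score
        )
        aggregates.append(representative)
    return aggregates


groups = {
    ("number", 3): [Entity("number", 3, 2), Entity("number", 3, 0)],
    ("city", "pune"): [Entity("city", "pune", 1)],
}
result = aggregate_entities(groups, input_size=4)
```

Here the two "number" entities collapse into one entity with index 0, and each group contributes exactly one entry to the result.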
- apply_filters(entities)
Filter out entities whose score is less than the threshold.
- Parameters
entities (List[BaseEntity]) – A list of entities.
- Returns
A list of entities, at most as long as the input entities.
- Return type
List[BaseEntity]
- entity_consensus(entities, input_size)
Combine entities by type and value.
Issue https://github.com/Vernacular-ai/dialogy/issues/52 describes the problem this addresses: multiple identical entities can be returned, depending on the number of transcripts that contain the same body.
- Parameters
entities (List[BaseEntity]) – A list of entities which may have duplicates.
- Returns
A list of entities scored and unique by type and value.
- Return type
List[BaseEntity]
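Combining entities by type and value starts with a grouping step, which might look like the sketch below. The Entity tuple and the helper name group_by_type_and_value are hypothetical and do not come from the library; the sketch also assumes entity values are hashable so they can be used in a dictionary key.

```python
from collections import defaultdict
from typing import Any, Dict, List, NamedTuple, Tuple


class Entity(NamedTuple):
    """Hypothetical stand-in for BaseEntity; field names are assumptions."""

    type: str
    value: Any
    alternative_index: int


def group_by_type_and_value(
    entities: List[Entity],
) -> Dict[Tuple[str, Any], List[Entity]]:
    groups = defaultdict(list)
    for entity in entities:
        # Entities with the same (type, value) key land in the same group,
        # regardless of which transcript produced them.
        groups[(entity.type, entity.value)].append(entity)
    return dict(groups)


entities = [
    Entity("number", 3, 0),
    Entity("city", "pune", 0),
    Entity("number", 3, 1),
]
grouped = group_by_type_and_value(entities)
```

The resulting dictionary has the same shape as the Dict[Tuple[str, Any], List[BaseEntity]] structure that aggregate_entities accepts.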
- static make_transform_values(transcript)
Make transcripts from a string/json-string.
- Parameters
transcript (str) – A string to search entities within.
- Returns
List of transcripts.
- Return type
List[str]
- remove_low_scoring_entities(entities)
Remove entities with a lower score than the threshold.
This doesn’t apply to entities with score=None.
- Parameters
entities (List[BaseEntity]) – A list of entities.
- Returns
A list of entities whose score is higher than the configured threshold.
- Return type
List[BaseEntity]
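The thresholding behaviour described for remove_low_scoring_entities, including the exemption for score=None, can be illustrated with a small sketch. The Entity dataclass and the function name remove_low_scoring are hypothetical. Two behaviours are assumptions rather than documented facts: a threshold of None disables filtering, and a score exactly equal to the threshold is kept.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for BaseEntity; field names are assumptions."""

    value: str
    score: Optional[float] = None


def remove_low_scoring(
    entities: List[Entity], threshold: Optional[float]
) -> List[Entity]:
    # Assumption: with no threshold configured, nothing is removed.
    if threshold is None:
        return list(entities)
    # Entities without a score are exempt from the check.
    # Assumption: a score equal to the threshold is kept.
    return [
        entity
        for entity in entities
        if entity.score is None or entity.score >= threshold
    ]


entities = [Entity("a", 0.9), Entity("b", 0.1), Entity("c", None)]
kept = remove_low_scoring(entities, threshold=0.5)
unfiltered = remove_low_scoring(entities, threshold=None)
```

With a threshold of 0.5, only the entity scored 0.1 is dropped; the unscored entity passes through untouched.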
- threshold: Optional[float] = None
Value to compare against an entity’s score.