dialogy.base.entity_extractor package

Module contents

class EntityScoringMixin

Bases: object

Mixin for scoring and aggregation of entities over a set of transcripts.

aggregate_entities(entity_type_value_group, input_size)

Reduce entities that share the same type and value.

  • Entities with the same type and value are considered identical even if their other metadata differs. These entities form a group.

  • We track the transcript indices for every entity in a group.

  • We select the minimum of all the indices, because the 0th transcript has the highest confidence.

  • We pick one entity per group, set its index to that minimum, and assign it the score aggregated over the group.

  • The picked entity is added to a list of aggregates.

The above is repeated for every group.

Parameters

entity_type_value_group (Dict[Tuple[str, Any], List[BaseEntity]]) – A data structure that groups entities by type and value.

Returns

A list of de-duplicated entities.

Return type

List[BaseEntity]
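
The aggregation steps above can be sketched as follows. This is a minimal illustration, not the library's implementation: SketchEntity is a hypothetical stand-in for BaseEntity, the field names alternative_index and score are assumptions, and the scoring rule (share of transcripts in which the entity appears) is only one plausible reading of "score is aggregated for the group".

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SketchEntity:
    """Hypothetical stand-in for BaseEntity; field names are assumptions."""

    type: str
    value: Any
    alternative_index: Optional[int] = None
    score: Optional[float] = None


def aggregate_entities_sketch(
    entity_type_value_group: Dict[Tuple[str, Any], List[SketchEntity]],
    input_size: int,
) -> List[SketchEntity]:
    aggregates = []
    for entities in entity_type_value_group.values():
        # Track the transcript indices of every entity in the group.
        indices = [
            e.alternative_index for e in entities if e.alternative_index is not None
        ]
        # Pick one entity to represent the whole group.
        representative = entities[0]
        # The lowest index wins because transcript 0 has the highest confidence.
        representative.alternative_index = min(indices) if indices else None
        # Assumed scoring rule: fraction of transcripts containing the entity.
        representative.score = len(indices) / input_size
        aggregates.append(representative)
    return aggregates


group = {
    ("number", 3): [
        SketchEntity("number", 3, alternative_index=2),
        SketchEntity("number", 3, alternative_index=0),
    ],
    ("number", 7): [SketchEntity("number", 7, alternative_index=1)],
}
result = aggregate_entities_sketch(group, input_size=4)
```

In this sketch each group collapses to a single entity, so the result has one entry per (type, value) key. It assumes every group is non-empty and that input_size is greater than zero.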

apply_filters(entities)

Filter out entities whose score is less than the threshold.

Parameters

entities (List[BaseEntity]) – A list of entities.

Returns

A filtered list of entities, at most as long as the input entities.

Return type

List[BaseEntity]

entity_consensus(entities, input_size)

Combine entities by type and value.

The issue https://github.com/Vernacular-ai/dialogy/issues/52 describes the problem this addresses: multiple identical entities can be returned, depending on how many transcripts contain the same body.

Parameters

entities (List[BaseEntity]) – A list of entities which may have duplicates.

Returns

A list of scored entities, unique by type and value.

Return type

List[BaseEntity]
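
Combining by type and value starts with a grouping step, which might look like the sketch below. Plain dicts stand in for BaseEntity here, and the keys "type", "value" and "alternative_index" are illustrative assumptions rather than the library's attribute names. The grouping key must be hashable, so this sketch only covers entities whose value is a hashable object.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_by_type_and_value(
    entities: List[dict],
) -> Dict[Tuple[str, Any], List[dict]]:
    groups = defaultdict(list)
    for entity in entities:
        # Entities with the same (type, value) pair land in the same group,
        # regardless of any other metadata they carry.
        groups[(entity["type"], entity["value"])].append(entity)
    return dict(groups)


entities = [
    {"type": "number", "value": 3, "alternative_index": 0},
    {"type": "number", "value": 3, "alternative_index": 1},
    {"type": "date", "value": "2021-01-01", "alternative_index": 0},
]
groups = group_by_type_and_value(entities)
```

The resulting dictionary has the same shape as the entity_type_value_group argument of aggregate_entities: one key per distinct (type, value) pair, mapping to the duplicates found across transcripts.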

static make_transform_values(transcript)

Make a list of transcripts from a plain string or a JSON string.

Parameters

transcript (str) – A string to search entities within.

Returns

List of transcripts.

Return type

List[str]
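
One way to read "string/json-string" is sketched below. The expected JSON layout is not stated here, so the sketch assumes the simplest case, a JSON array of strings, and falls back to treating the input as a single transcript otherwise. The function name is hypothetical and the real method may accept a richer structure.

```python
import json
from typing import List


def make_transform_values_sketch(transcript: str) -> List[str]:
    try:
        parsed = json.loads(transcript)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all: treat the input as one plain transcript.
        return [transcript]
    # Assumption: a JSON array of strings is a list of alternative transcripts.
    if isinstance(parsed, list) and all(isinstance(item, str) for item in parsed):
        return parsed
    # Any other JSON value (number, object, ...) is kept as the raw string.
    return [transcript]


plain = make_transform_values_sketch("book a table for two")
alternatives = make_transform_values_sketch('["book a table", "look a table"]')
```

Either way the caller gets a list, so the entity search can loop over transcripts without checking the input type first.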

remove_low_scoring_entities(entities)

Remove entities whose score is lower than the threshold.

Entities with score=None are not removed.

Parameters

entities (List[BaseEntity]) – A list of entities.

Returns

A list of entities whose score is not below the configured threshold, plus any entities with score=None.

Return type

List[BaseEntity]
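
The filtering rule can be illustrated with a short sketch. Plain dicts with a "score" key stand in for BaseEntity, and two behaviours are assumptions: a threshold of None (the default) disables filtering, and a score exactly equal to the threshold is kept.

```python
from typing import List, Optional


def remove_low_scoring_sketch(
    entities: List[dict], threshold: Optional[float]
) -> List[dict]:
    # Assumption: with no threshold configured, nothing is filtered.
    if threshold is None:
        return entities
    # Entities without a score are always kept; scored ones must reach the threshold.
    return [
        entity
        for entity in entities
        if entity["score"] is None or entity["score"] >= threshold
    ]


candidates = [
    {"value": "a", "score": 0.9},
    {"value": "b", "score": 0.2},
    {"value": "c", "score": None},
]
kept = remove_low_scoring_sketch(candidates, threshold=0.5)
```

Here the low-scoring entity "b" is dropped, while "a" and the unscored "c" remain.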

threshold: Optional[float] = None

Value to compare against an entity’s score.

entity_scoring(presence, input_size)

Return type

float