dialogy.utils package¶
Submodules¶
dialogy.utils.datetime module¶
- dt2timestamp(date_time)[source]¶
Converts a python datetime object to unix-timestamp.
- Parameters
date_time (datetime) – An instance of datetime.
- Returns
Unix timestamp integer.
- Return type
int
- is_unix_ts(ts)[source]¶
Check if the input is a unix timestamp.
- Parameters
ts (int) – A unix timestamp (13-digit).
- Returns
True if ts is a unix timestamp, else False.
- Return type
bool
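The two helpers above can be pictured with a small stdlib-only sketch. It assumes that timestamps are counted in milliseconds, which is what gives 13 digits for present-day dates. The names `to_unix_ms` and `looks_like_unix_ms` are hypothetical and are not part of dialogy; the library's own checks may differ.

```python
from datetime import datetime, timezone


def to_unix_ms(date_time: datetime) -> int:
    """Hypothetical helper: milliseconds since the Unix epoch."""
    # datetime.timestamp() returns seconds as a float; scale to milliseconds.
    return int(date_time.timestamp() * 1000)


def looks_like_unix_ms(ts: int) -> bool:
    """Hypothetical check: an int with exactly 13 digits."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(ts, bool) or not isinstance(ts, int):
        return False
    return len(str(abs(ts))) == 13


moment = datetime(2022, 2, 7, 14, 9, 39, tzinfo=timezone.utc)
moment_ms = to_unix_ms(moment)  # 1644242979000
```

A timestamp in seconds for the same moment has only 10 digits, so this kind of check would reject it.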
- make_unix_ts(tz='UTC')[source]¶
Convert date in ISO 8601 format to unix ms timestamp.
In [1]: from dialogy.utils.datetime import make_unix_ts
In [2]: ts = make_unix_ts("Asia/Kolkata")("2022-02-07T19:39:39.537827")
In [3]: ts == 1644241599537
Out[3]: True
- Parameters
tz (Optional[str], optional) – A timezone string, defaults to “UTC”
- Returns
A callable that converts a date in ISO 8601 format to unix ms timestamp.
- Return type
Callable[[str], int]
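The "function that returns a converter" shape described above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's implementation: `make_ms_converter` is a hypothetical name, and it accepts a `tzinfo` object where make_unix_ts accepts a timezone name string.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Callable

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def make_ms_converter(tz: tzinfo = timezone.utc) -> Callable[[str], int]:
    """Hypothetical factory: fix the timezone once, convert many dates."""

    def to_ms(iso_date: str) -> int:
        parsed = datetime.fromisoformat(iso_date)
        # Assumption: a date without an offset is read as local time in tz.
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=tz)
        # Integer division of timedeltas avoids float rounding.
        return (parsed - EPOCH) // timedelta(milliseconds=1)

    return to_ms


utc_to_ms = make_ms_converter()
ts_utc = utc_to_ms("2022-02-07T19:39:39.537827")  # 1644262779537
```

Returning a callable lets the timezone be configured once and the converter be passed around, for example to `map` over a column of date strings.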
dialogy.utils.file_handler module¶
- create_timestamps_path(directory, file_name, timestamp=None, dry_run=False)[source]¶
- Return type
str
- load_file(file_path=None, mode='r', loader=None)[source]¶
Safely load a file.
- Parameters
file_path ([type]) – The path to the file to load.
mode (str, optional) – The mode to use when opening the file, defaults to “r”
- Returns
The file contents.
- Return type
Any
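A minimal sketch of how a loader callable might plug into such a function, assuming the loader receives the open file handle (as `json.load` does) and that the raw contents are returned when no loader is given. `load_file_sketch` is a hypothetical stand-in; the library's behaviour may differ.

```python
import json
import os
import tempfile
from typing import IO, Any, Callable, Optional


def load_file_sketch(
    file_path: str,
    mode: str = "r",
    loader: Optional[Callable[[IO[Any]], Any]] = None,
) -> Any:
    """Hypothetical stand-in for load_file; not the library's code."""
    # The context manager closes the handle even if the loader raises.
    with open(file_path, mode) as handle:
        return loader(handle) if loader is not None else handle.read()


# Usage: write a small JSON file, then read it back raw and parsed.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")
    with open(path, "w") as handle:
        handle.write('{"lang": "en"}')
    raw_text = load_file_sketch(path)
    parsed = load_file_sketch(path, loader=json.load)
```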
- save_file(file_path=None, content=None, mode='w', encoding='utf-8', newline='\\n', writer=None)[source]¶
Save a file.
- Parameters
file_path (str) – The path to the file to save.
content (Any) – The content to save.
mode (str, optional) – The mode to use when opening the file, defaults to “w”
encoding (str, optional) – The encoding to use when writing the file, defaults to “utf-8”
newline (str, optional) – The newline character to use when writing the file, defaults to “\n”
- Return type
None
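The parameters above map closely onto the built-in `open`. The sketch below assumes that writer, when given, is called as `writer(content, handle)` (the calling convention of `json.dump`) and that content is written directly otherwise. `save_file_sketch` is a hypothetical name, not the library's code.

```python
import json
import os
import tempfile
from typing import IO, Any, Callable, Optional


def save_file_sketch(
    file_path: str,
    content: Any,
    mode: str = "w",
    encoding: str = "utf-8",
    newline: str = "\n",
    writer: Optional[Callable[[Any, IO[Any]], Any]] = None,
) -> None:
    """Hypothetical stand-in for save_file; not the library's code."""
    with open(file_path, mode, encoding=encoding, newline=newline) as handle:
        if writer is not None:
            writer(content, handle)  # e.g. json.dump(content, handle)
        else:
            handle.write(content)


# Usage: save plain text and a JSON document, then read both back.
with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, "note.txt")
    json_path = os.path.join(tmp_dir, "note.json")
    save_file_sketch(text_path, "hello\n")
    save_file_sketch(json_path, {"lang": "en"}, writer=json.dump)
    with open(text_path, encoding="utf-8") as handle:
        saved_text = handle.read()
    with open(json_path, encoding="utf-8") as handle:
        saved_json = json.load(handle)
```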
dialogy.utils.logger module¶
Module provides access to logger.
Use it sparingly; prefer raising specific exceptions instead.
dialogy.utils.misc module¶
Module provides utility functions for entities.
- Import functions:
traverse_dict
validate_type
- traverse_dict(obj, properties)[source]¶
Traverse a dictionary for a given list of properties.
This is useful for traversing a deeply nested dictionary. Instead of recursion, it uses reduce to follow the path one property at a time. Missing properties lead to KeyErrors, and an out-of-range list index leads to an IndexError.
In [1]: from dialogy.utils import traverse_dict
In [2]: input_ = {
   ...:     "planets": {
   ...:         "mars": [{
   ...:             "name": "",
   ...:             "languages": [{
   ...:                 "beep": {"speakers": 11},
   ...:             }, {
   ...:                 "bop": {"speakers": 30},
   ...:             }]
   ...:         }]
   ...:     }
   ...: }
   ...:
In [3]: traverse_dict(input_, ["planets", "mars", 0 , "languages", 1, "bop"])
Out[3]: {'speakers': 30}
# element with index 3 doesn't exist!
In [4]: traverse_dict(input_, ["planets", "mars", 0 , "languages", 3, "bop"])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 traverse_dict(input_, ["planets", "mars", 0 , "languages", 3, "bop"])
File ~/Programs/pythoncode/dialogy/dialogy/utils/misc.py:52, in traverse_dict(obj, properties)
     13 """
     14 Traverse a dictionary for a given list of properties.
     15
   (...)
     49 :raises TypeError: Properties don't describe a path due to possible type error.
     50 """
     51 try:
---> 52     return reduce(lambda o, k: o[k], properties, obj)
     53 except KeyError as key_error:
     54     raise KeyError(
     55         f"Missing property {key_error} in {obj}. Check the types. Failed for path {properties}"
     56     ) from key_error
File ~/Programs/pythoncode/dialogy/dialogy/utils/misc.py:52, in traverse_dict.<locals>.<lambda>(o, k)
     13 """
     14 Traverse a dictionary for a given list of properties.
     15
   (...)
     49 :raises TypeError: Properties don't describe a path due to possible type error.
     50 """
     51 try:
---> 52     return reduce(lambda o, k: o[k], properties, obj)
     53 except KeyError as key_error:
     54     raise KeyError(
     55         f"Missing property {key_error} in {obj}. Check the types. Failed for path {properties}"
     56     ) from key_error
IndexError: list index out of range
- Parameters
obj (Dict[Any, Any]) – The dict to traverse.
properties (List[Union[str, int]]) – Dict keys and list indices, in order, that describe the path to follow within the dict.
- Returns
A value within a deeply nested dict.
- Return type
Any
- Raises
KeyError – Missing property in the dictionary.
TypeError – Properties don’t describe a path due to possible type error.
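The core of the traversal is the `reduce` call visible in the traceback above. The sketch below isolates that technique; `traverse` is a hypothetical name and the error message is shortened, so treat it as an illustration, not a copy of the library function.

```python
from functools import reduce
from typing import Any, Dict, List, Union


def traverse(obj: Dict[Any, Any], properties: List[Union[str, int]]) -> Any:
    """Follow a path of keys and indices through nested dicts and lists."""
    try:
        # Each step indexes the previous result with the next property.
        return reduce(lambda current, key: current[key], properties, obj)
    except KeyError as key_error:
        raise KeyError(
            f"Missing property {key_error}. Failed for path {properties}"
        ) from key_error


config = {"asr": {"alternatives": [{"transcript": "hello"}]}}
first_transcript = traverse(config, ["asr", "alternatives", 0, "transcript"])
```

An empty path returns the input object unchanged, because `reduce` falls back to its initial value when there is nothing to fold.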
- validate_type(obj, obj_type)[source]¶
Raise TypeError on object type mismatch.
This is syntactic sugar for instance type checks.
The check is by exclusion of types. Wraps exception raising logic.
- Parameters
obj (Any) – An object available for type assertion.
obj_type (Union[type, Tuple[type]]) – This must match the type of the object.
- Raises
TypeError – If the type obj_type doesn’t match the type of obj.
- Return type
None
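A guard of this kind is usually a thin wrapper around `isinstance`. The following is a minimal sketch under that assumption; `validate_type_sketch` is a hypothetical name and the error message is illustrative.

```python
from typing import Any, Tuple, Union


def validate_type_sketch(obj: Any, obj_type: Union[type, Tuple[type, ...]]) -> None:
    """Raise TypeError unless obj is an instance of obj_type."""
    if not isinstance(obj, obj_type):
        raise TypeError(f"{obj!r} should be an instance of {obj_type}")


# Passing checks return None, so they can sit at the top of a function body.
validate_type_sketch("hello", str)
validate_type_sketch(3, (int, float))
```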
dialogy.utils.naive_lang_detect module¶
dialogy.utils.normalize_utterance module¶
This module was created in response to https://github.com/Vernacular-ai/dialogy/issues/9. It ships functions that assist in normalizing ASR output, which we refer to as Utterances.
- dict_get(prop, obj)[source]¶
Get value of prop within obj.
This simple function exists to facilitate a partial function defined here.
- Parameters
prop (str) – A property within a dict.
obj (Dict[str, Any]) – A dict.
- Returns
Value of a property within a dict.
- Return type
Any
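Taking the property before the dict is what makes the function convenient for `functools.partial`: the key can be fixed once and the resulting callable mapped over many dicts. A sketch of that idea, assuming a plain `obj[prop]` lookup (`dict_get_sketch` is a hypothetical name):

```python
from functools import partial
from typing import Any, Dict


def dict_get_sketch(prop: str, obj: Dict[str, Any]) -> Any:
    """Return obj[prop]; prop comes first so it can be bound by partial."""
    return obj[prop]


# Bind the key once, then apply the one-argument callable to each dict.
get_transcript = partial(dict_get_sketch, "transcript")
alternatives = [{"transcript": "this"}, {"transcript": "works"}]
transcripts = list(map(get_transcript, alternatives))  # ['this', 'works']
```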
- get_best_transcript(transcripts)[source]¶
Select the best transcript from a list of transcripts. The best transcript is the first transcript given by ASR (20220803).
- Parameters
transcripts (List[str]) – List of transcripts
- Returns
A string containing the best transcript
- Return type
str
- is_each_element(type_, input_, transform=<function <lambda>>)[source]¶
Check if each element in a list is of a given type.
- Parameters
type_ (Type) – Expected Type of each element in input_, which is a list.
input_ (List[Any]) – A list.
transform (Callable[[Any], Any]) – A transform applied to each element before the check, for example to check whether a certain key in a Dict holds the expected type. If this is not needed, leave the argument unset and an identity transform is applied. Defaults to lambda x: x.
- Returns
True if every element in the list matches type_; False if any element fails the check.
- Return type
bool
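The role of transform is easiest to see in code. The sketch below assumes the check is an `isinstance` test on the transformed element; `is_each_element_sketch` is a hypothetical name, not the library function.

```python
from typing import Any, Callable, List


def is_each_element_sketch(
    type_: type,
    input_: List[Any],
    transform: Callable[[Any], Any] = lambda x: x,
) -> bool:
    """True if transform(element) is an instance of type_ for every element."""
    return all(isinstance(transform(element), type_) for element in input_)


all_strings = is_each_element_sketch(str, ["this", "works"])
# The transform pulls out the value to check from each dict.
all_transcripts_are_strings = is_each_element_sketch(
    str,
    [{"transcript": "this"}, {"transcript": "works"}],
    transform=lambda alternative: alternative["transcript"],
)
```

With the default identity transform the elements themselves are checked, so a mixed list such as `["this", 1]` fails.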
- is_list(input_)[source]¶
Check if input_ is a list.
- Parameters
input_ (Any) – Any arbitrary input.
- Returns
True if input_ is a list, else False.
- Return type
bool
- is_list_of_string(maybe_utterance)[source]¶
Check input to be of List[str].
In [1]: from dialogy.utils.normalize_utterance import is_list_of_string
In [2]: is_list_of_string(["this", "works"])
Out[2]: True
- Parameters
maybe_utterance (Any) – Arbitrary input.
- Returns
True if maybe_utterance is a List[str], else False.
- Return type
bool
- is_string(maybe_utterance)[source]¶
Check input’s type is str.
- Parameters
maybe_utterance (Any) – Arbitrary type input.
- Returns
True if maybe_utterance is a str, else False.
- Return type
bool
- is_unsqueezed_utterance(maybe_utterance, key='transcript')[source]¶
Check input to be of List[Dict].
In [1]: from dialogy.utils.normalize_utterance import is_unsqueezed_utterance
# 1. This fails
In [2]: is_unsqueezed_utterance([[{"transcript": "this"}, {"transcript": "works"}]])
Out[2]: False
# 2. key is configurable
In [3]: is_unsqueezed_utterance([{"text": "this"}, {"text": "works"}], key="text")
Out[3]: True
- Parameters
maybe_utterance (Any) – Arbitrary type input.
key (str) – The key within which the transcription string resides. Defaults to const.TRANSCRIPT.
- Returns
True if the input is of type List[Dict[str, Any]], else False.
- Return type
bool
- is_utterance(maybe_utterance, key='transcript')[source]¶
Check input to be of List[List[Dict]].
In [1]: from dialogy.utils.normalize_utterance import is_utterance
# 1. List[List[Dict[str, str]]]
In [2]: is_utterance([[{"transcript": "this"}, {"transcript": "works"}]])
Out[2]: True
# 2. key is configurable
In [3]: is_utterance([[{"text": "this"}, {"text": "works"}]], key="text")
Out[3]: True
# 3. Hope for everything else... you have a mastercard.
# Or use this lib, works just fine 🍷.
In [4]: is_utterance([{"transcript": "this"}, {"transcript": "doesn't"}, {"transcript": "work"}])
Out[4]: False
- Parameters
maybe_utterance (Any) – Arbitrary input.
key (str) – The key within which the transcription string resides. Defaults to const.TRANSCRIPT.
- Returns
True if the input is List[List[Dict[str, str]]], else False.
- Return type
bool
- normalize(maybe_utterance, key='transcript')[source]¶
Adapt various non-standard ASR alternative forms.
The output will be a list of strings since models will expect that.
In [1]: from dialogy.utils.normalize_utterance import normalize
# A popular case
In [2]: normalize([[{"transcript": "this"}, {"transcript": "works"}]])
Out[2]: ['this', 'works']
# A case with multiple utterances
In [3]: normalize([
   ...:     [{"transcript": "hello hello?", "transcript": "yellow yellow?"}],
   ...:     [{"transcript": "I wanted to check"}],
   ...:     [{"transcript": "if you have space for us?"}]
   ...: ])
Out[3]: ['yellow yellow? I wanted to check if you have space for us?']
In [4]: normalize([{"transcript": "I wanted to know umm hello?"}])
Out[4]: ['I wanted to know umm hello?']
In [5]: normalize(["I wanted to know umm hello?"])
Out[5]: ['I wanted to know umm hello?']
In [6]: normalize("I wanted to know umm hello?")
Out[6]: ['I wanted to know umm hello?']
- Parameters
maybe_utterance (Any) – Arbitrary input.
key (str) – The key to look up in List[List[Dict[str, str]]] and List[Dict[str, str]] type inputs.
- Returns
A flattened list of strings parsed from various formats.
- Return type
List[str]
- Raises
TypeError – If maybe_utterance is none of the expected types.
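The dispatch over input shapes can be sketched as below. This is an approximation for illustration only: `normalize_sketch` is a hypothetical name, and pairing alternatives by rank with `zip` is an assumption, since the documented examples do not show how utterances with differing numbers of alternatives are combined.

```python
from typing import Any, List


def _has_text(item: Any, key: str) -> bool:
    # A usable alternative is a dict whose `key` holds a string.
    return isinstance(item, dict) and isinstance(item.get(key), str)


def normalize_sketch(maybe_utterance: Any, key: str = "transcript") -> List[str]:
    """Flatten the supported ASR shapes into a list of strings."""
    if isinstance(maybe_utterance, str):
        return [maybe_utterance]
    if isinstance(maybe_utterance, list):
        if all(isinstance(el, str) for el in maybe_utterance):
            return list(maybe_utterance)
        if all(_has_text(el, key) for el in maybe_utterance):
            return [el[key] for el in maybe_utterance]
        if all(
            isinstance(utt, list) and all(_has_text(alt, key) for alt in utt)
            for utt in maybe_utterance
        ):
            per_utterance = [[alt[key] for alt in utt] for utt in maybe_utterance]
            # Assumption: the i-th alternatives of all utterances are joined.
            return [" ".join(parts) for parts in zip(*per_utterance)]
    raise TypeError(f"Unsupported utterance format: {type(maybe_utterance)}")


single = normalize_sketch([[{"transcript": "this"}, {"transcript": "works"}]])
joined = normalize_sketch(
    [[{"transcript": "I wanted to check"}], [{"transcript": "if you have space?"}]]
)
```

Checking the flat shapes before the nested one keeps each branch simple, and anything that matches no branch falls through to the TypeError.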