skit_pipelines.components package¶
Subpackages¶
- skit_pipelines.components.evaluate_slu_from_repo package
- skit_pipelines.components.fetch_gpt_intent_prediction package
- skit_pipelines.components.fetch_tagged_data_label_store package
- skit_pipelines.components.final_conversation_generator package
- skit_pipelines.components.identify_compliance_breaches_llm package
- skit_pipelines.components.invalidate_situations_in_db package
- skit_pipelines.components.modify_entities package
- skit_pipelines.components.preprocess package
- Subpackages
- skit_pipelines.components.preprocess.create_true_transcript_column package
- skit_pipelines.components.preprocess.create_utterance_column package
- skit_pipelines.components.preprocess.extract_true_transcript_labels_to_txt package
- skit_pipelines.components.preprocess.process_true_transcript_column package
- Module contents
- skit_pipelines.components.retrain_slu_from_repo package
- skit_pipelines.components.sample_conversations_generator package
- skit_pipelines.components.tag_calls package
- skit_pipelines.components.upload_conv_to_labelstudio package
- skit_pipelines.components.upload_conversation_data_to_metrics_db package
- skit_pipelines.components.upload_for_call_and_slot_tagging package
- skit_pipelines.components.validate_and_add_situations_to_db package
Submodules¶
skit_pipelines.components.asr_transcription module¶
- audio_transcription(audios_dir_path: InputPath, config_path: InputPath, output_path: OutputPath, concurrency: int) → None [source]¶
- audio_transcription_op(audios_dir: str, config: str, concurrency: int)¶
Audio transcription
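These factories behave like ordinary kfp v1 component ops, so they compose inside a @dsl.pipeline function. A minimal sketch, assuming the audio directory and ASR config live on S3; the pipeline name and parameters below are hypothetical:

```python
from kfp import dsl

from skit_pipelines.components import (
    audio_transcription_op,
    download_directory_from_s3_op,
    download_file_from_s3_op,
)


@dsl.pipeline(name="demo-audio-transcription")
def transcription_pipeline(audios_s3_path: str, config_s3_path: str, concurrency: int = 8):
    # Pull the audio directory and the transcription config down from S3.
    audios = download_directory_from_s3_op(storage_path=audios_s3_path)
    config = download_file_from_s3_op(storage_path=config_s3_path)
    # Transcribe the downloaded audios; concurrency presumably caps how many
    # files are processed in parallel.
    audio_transcription_op(
        audios_dir=audios.output,
        config=config.output,
        concurrency=concurrency,
    )
```

Under kfp v1, such a function can be submitted with kfp.Client().create_run_from_pipeline_func(transcription_pipeline, arguments={...}).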
skit_pipelines.components.asr_tune module¶
- asr_tune(corpus_path: InputPath, val_corpus_path: InputPath, augment_wordlist_path: InputPath, remove_wordlist_path: InputPath, base_model_path: InputPath, general_lm_path: InputPath, output_path: OutputPath, lang: str) → None [source]¶
- asr_tune_op(corpus: str, val_corpus: str, augment_wordlist: str, remove_wordlist: str, base_model: str, general_lm: str, lang: str)¶
ASR tune
skit_pipelines.components.audio_download module¶
- download_audio_wavs(audio_data_path: InputPath, audio_sample_rate: str, audio_download_workers: int, output_path: OutputPath) → None [source]¶
- download_audio_wavs_op(audio_data: str, audio_sample_rate: str, audio_download_workers: int)¶
Download audio wavs
skit_pipelines.components.auth module¶
- org_auth_token_op(org_id: str, url: str = None)¶
Org auth token
skit_pipelines.components.create_mr module¶
- create_mr(git_host_name: str, repo_name: str, project_path: str, target_branch: str, source_branch: str, mr_title: str, s3_description_paths: str) → str [source]¶
- create_mr_op(git_host_name: str, repo_name: str, project_path: str, target_branch: str, source_branch: str, mr_title: str, s3_description_paths: str)¶
Create MR
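A sketch of raising a merge request from inside a pipeline run; the host name, branch names, MR title, and S3 path below are hypothetical placeholders, not values the component mandates:

```python
from kfp import dsl

from skit_pipelines.components import create_mr_op


@dsl.pipeline(name="demo-create-mr")
def create_mr_pipeline(repo_name: str, project_path: str):
    # Open an MR from a work branch into the mainline; s3_description_paths
    # appears to reference description fragments stored on S3.
    create_mr_op(
        git_host_name="gitlab",
        repo_name=repo_name,
        project_path=project_path,
        target_branch="master",
        source_branch="retrained-model",
        mr_title="Retrained SLU model",
        s3_description_paths="s3://example-bucket/mr-notes.md",
    )
```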
skit_pipelines.components.download_from_s3 module¶
- download_csv_from_s3(*, storage_path: str, empty_possible: bool = False, output_path: OutputPath) → None [source]¶
- download_csv_from_s3_op(storage_path: str, empty_possible: bool = False)¶
Download CSV from S3
- download_directory_from_s3(*, storage_path: str, output_path: OutputPath) → None [source]¶
- download_directory_from_s3_op(storage_path: str)¶
Download directory from S3
- download_file_from_s3(*, storage_path: str, storage_options: str = '', empty_possible: bool = False, output_path: OutputPath) → None [source]¶
- download_file_from_s3_op(storage_path: str, storage_options: str = '', empty_possible: bool = False)¶
Download file from S3
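For instance, a tagged-data export that may legitimately contain zero rows can be fetched without failing the run. A minimal sketch; the S3 path is a placeholder:

```python
from kfp import dsl

from skit_pipelines.components import download_csv_from_s3_op


@dsl.pipeline(name="demo-download-csv")
def download_pipeline(storage_path: str = "s3://example-bucket/data/calls.csv"):
    # empty_possible=True tolerates an empty CSV instead of erroring out;
    # the task's .output carries the local path handed to downstream steps.
    download_csv_from_s3_op(storage_path=storage_path, empty_possible=True)
```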
skit_pipelines.components.download_repo module¶
- download_repo(*, git_host_name: str, repo_name: str, project_path: str, repo_path: OutputPath) → None [source]¶
- download_repo_op(git_host_name: str, repo_name: str, project_path: str)¶
Download repo
skit_pipelines.components.download_yaml module¶
- download_yaml(git_host_name: str, yaml_path: str, output_path: OutputPath)[source]¶
- download_yaml_op(git_host_name: str, yaml_path: str)¶
Download YAML
skit_pipelines.components.fetch_calls module¶
- fetch_calls(*, lang: str, start_date: str, end_date: Optional[str] = None, client_id: Optional[str] = None, start_date_offset: int = 0, end_date_offset: int = 0, start_time_offset: int = 0, end_time_offset: int = 0, call_quantity: int = 200, call_type: Optional[str] = None, timezone: Optional[str] = None, ignore_callers: Optional[str] = None, reported: bool = False, template_id: Optional[str] = None, use_case: Optional[str] = None, flow_name: Optional[str] = None, min_duration: Optional[str] = None, asr_provider: Optional[str] = None, intents: Optional[str] = None, states: Optional[str] = None, calls_file_s3_path: Optional[str] = None, use_fsm_url: bool = False, remove_empty_audios: bool = True, flow_ids: Optional[str] = None) → str [source]¶
- fetch_calls_op(lang: str, start_date: str, end_date: str = None, client_id: str = None, start_date_offset: int = 0, end_date_offset: int = 0, start_time_offset: int = 0, end_time_offset: int = 0, call_quantity: int = 200, call_type: str = None, timezone: str = None, ignore_callers: str = None, reported: bool = False, template_id: str = None, use_case: str = None, flow_name: str = None, min_duration: str = None, asr_provider: str = None, intents: str = None, states: str = None, calls_file_s3_path: str = None, use_fsm_url: bool = False, remove_empty_audios: bool = True, flow_ids: str = None)¶
Fetch calls
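A sketch of fetching calls and handing them to the audio downloader. The quantity, sample rate, and worker count are illustrative guesses, not recommended values:

```python
from kfp import dsl

from skit_pipelines.components import download_audio_wavs_op, fetch_calls_op


@dsl.pipeline(name="demo-fetch-calls")
def fetch_calls_pipeline(client_id: str, lang: str = "en", start_date: str = "2022-01-01"):
    # Fetch up to 500 calls for the client starting from the given date.
    calls = fetch_calls_op(
        lang=lang,
        start_date=start_date,
        client_id=client_id,
        call_quantity=500,
    )
    # Download the fetched calls' audio as wavs ("8000" is a guess at a
    # typical telephony sample rate).
    download_audio_wavs_op(
        audio_data=calls.output,
        audio_sample_rate="8000",
        audio_download_workers=10,
    )
```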
skit_pipelines.components.fetch_tagged_dataset module¶
- fetch_tagged_dataset(output_path: OutputPath, job_id: Optional[str] = None, project_id: Optional[str] = None, task_type: str = 'conversation', timezone: Optional[str] = None, start_date: Optional[str] = None, end_date: Optional[str] = None, start_date_offset: Optional[int] = None, end_date_offset: Optional[int] = None, empty_possible: bool = False)[source]¶
- fetch_tagged_dataset_op(job_id: str = None, project_id: str = None, task_type: str = 'conversation', timezone: str = None, start_date: str = None, end_date: str = None, start_date_offset: int = None, end_date_offset: int = None, empty_possible: bool = False)¶
Fetch tagged dataset
skit_pipelines.components.file_contents_to_markdown_s3 module¶
- file_contents_to_markdown_s3(ext: str, path_on_disk: InputPath, file_title: str = '') → str [source]¶
- file_contents_to_markdown_s3_op(ext: str, path_on_disk: str, file_title: str = '')¶
File contents to markdown S3
skit_pipelines.components.gen_asr_metrics module¶
- gen_asr_metrics(data_path: InputPath, output_path: OutputPath, true_label_column: str = 'transcript_y', pred_label_column: str = 'utterances')[source]¶
- gen_asr_metrics_op(data: str, true_label_column: str = 'transcript_y', pred_label_column: str = 'utterances')¶
Gen ASR metrics
skit_pipelines.components.merge_transcription module¶
- overlay_transcription_csv(sqlite_path: InputPath, original_csv_path: InputPath, output_path: OutputPath) → None [source]¶
- overlay_transcription_csv_op(sqlite: str, original_csv: str)¶
Overlay transcription CSV
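A sketch of the intended chaining, assuming the sqlite artifact holds fresh transcriptions keyed to the original call CSV (both S3 paths are placeholders):

```python
from kfp import dsl

from skit_pipelines.components import (
    download_file_from_s3_op,
    gen_asr_metrics_op,
    overlay_transcription_csv_op,
)


@dsl.pipeline(name="demo-asr-metrics")
def asr_metrics_pipeline(sqlite_s3_path: str, csv_s3_path: str):
    sqlite_file = download_file_from_s3_op(storage_path=sqlite_s3_path)
    original_csv = download_file_from_s3_op(storage_path=csv_s3_path)
    # Overlay the new transcriptions onto the original call data, then score
    # them against the tagged transcripts using the default columns.
    merged = overlay_transcription_csv_op(
        sqlite=sqlite_file.output,
        original_csv=original_csv.output,
    )
    gen_asr_metrics_op(data=merged.output)
```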
skit_pipelines.components.modify_tagged_entities module¶
- modify_entity_dataset(data_path: InputPath, output_path: OutputPath, tog_job_id: Optional[str] = None, labelstudio_project_id: Optional[str] = None, timezone: str = 'Asia/Kolkata')[source]¶
Takes an entity dataset and, 1) hits the Duckling service for inference on the ground truth, 2) modifies the predicted entity structure to be consistent.
- modify_entity_dataset_op(data: str, tog_job_id: str = None, labelstudio_project_id: str = None, timezone: str = 'Asia/Kolkata')¶
Modify entity dataset Takes an entity dataset and,
skit_pipelines.components.notification module¶
- slack_notification(message: str, code_block: str = '', channel: str = '', cc: str = '', thread_id: str = '', file_title: str = '', file_content: str = '') → None [source]¶
Send a message on any channel.
- slack_notification_op(message: str, code_block: str = '', channel: str = '', cc: str = '', thread_id: str = '', file_title: str = '', file_content: str = '')¶
Slack notification Send a message on any channel.
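Since the op wrapper mirrors the function's arguments, a pipeline can report progress like this. A minimal sketch; the channel, cc handle, and message text are hypothetical:

```python
from kfp import dsl

from skit_pipelines.components import slack_notification_op


@dsl.pipeline(name="demo-notify")
def notify_pipeline(channel: str = "#demo-channel"):
    # Post a summary message with an attached code block; cc presumably
    # mentions the given handles alongside the message.
    slack_notification_op(
        message="Pipeline run finished.",
        code_block="accuracy: 0.92",
        channel=channel,
        cc="@oncall",
    )
```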
skit_pipelines.components.push_compliance_report_to_postgres module¶
- push_compliance_report_to_postgres_op(s3_file_path: str)¶
Push compliance report to Postgres
skit_pipelines.components.re_presign_s3_urls module¶
- re_presign_s3_urls(audio_data_path: InputPath, output_path: OutputPath) → None [source]¶
- re_presign_s3_urls_op(audio_data: str)¶
Re-presign S3 URLs
skit_pipelines.components.read_json_key module¶
- read_json_key(req_value: str, input_file: InputPath) → Any [source]¶
- read_json_key_op(req_value: str, input: str)¶
Read JSON key
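A sketch of extracting a single key from a JSON artifact produced earlier in the run; the "model_version" key is a hypothetical example:

```python
from kfp import dsl

from skit_pipelines.components import download_file_from_s3_op, read_json_key_op


@dsl.pipeline(name="demo-read-json-key")
def read_key_pipeline(config_s3_path: str):
    config = download_file_from_s3_op(storage_path=config_s3_path)
    # Pull one key out of the downloaded JSON; the task's .output can then be
    # passed to any downstream op expecting a plain value.
    read_json_key_op(req_value="model_version", input=config.output)
```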
skit_pipelines.components.retrain_slu_from_repo_old module¶
- retrain_slu_from_repo(*, s3_data_path: InputPath, annotated_job_data_path: InputPath, slu_path: InputPath, intent_alias_path: InputPath, bucket: str, repo_name: str, branch: str, remove_intents: str = '', use_previous_dataset: bool = True, train_split_percent: int = 85, stratify: bool = False, epochs: int = 10, initial_training: bool = False, job_ids: str = '', labelstudio_project_ids: str = '', s3_paths: str = '', validate_setup: bool = False, output_classification_report_path: OutputPath, output_confusion_matrix_path: OutputPath, customization_repo_name: str = '', customization_repo_branch: str = '') → str [source]¶
- retrain_slu_from_repo_op_old(s3_data: str, annotated_job_data: str, slu: str, intent_alias: str, bucket: str, repo_name: str, branch: str, remove_intents: str = '', use_previous_dataset: bool = True, train_split_percent: int = 85, stratify: bool = False, epochs: int = 10, initial_training: bool = False, job_ids: str = '', labelstudio_project_ids: str = '', s3_paths: str = '', validate_setup: bool = False, customization_repo_name: str = '', customization_repo_branch: str = '')¶
Retrain SLU from repo
skit_pipelines.components.upload2s3 module¶
- upload2s3(path_on_disk: InputPath, reference: str = '', file_type: str = '', bucket: str = '', ext: str = '.csv', output_path: str = '', storage_options: str = '', upload_as_directory: bool = False) → str [source]¶
- upload2s3_op(path_on_disk: str, reference: str = '', file_type: str = '', bucket: str = '', ext: str = '.csv', output_path: str = '', storage_options: str = '', upload_as_directory: bool = False)¶
Upload2s3
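A sketch of persisting an artifact back to S3; the bucket, reference, and file_type values are placeholders, and the → str return presumably carries the destination path:

```python
from kfp import dsl

from skit_pipelines.components import gen_asr_metrics_op, upload2s3_op


@dsl.pipeline(name="demo-upload")
def upload_pipeline(data_path: str, bucket: str = "example-bucket"):
    metrics = gen_asr_metrics_op(data=data_path)
    # Persist the metrics artifact; ext controls the uploaded file extension.
    upload2s3_op(
        path_on_disk=metrics.output,
        reference="demo",
        file_type="asr-metrics",
        bucket=bucket,
        ext=".csv",
    )
```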
skit_pipelines.components.utils module¶
- create_dataset_path(data_type, dataset_type)¶
- evaluate(test_dataset_path, project_config_local_path, core_slu_repo_name, repo_name)[source]¶
To evaluate a model on a test set.
- execute_cli(cmd, split=True)¶
skit_pipelines.components.utils_slu module¶
- handle_dvc_and_data_paths(repo, project_config_local_path, bucket, repo_name, initial_training, core_slu_repo_name, use_previous_dataset)[source]¶
- prepare_data(tagged_data_path, core_slu_repo_name, project_config_local_path, repo_name, custom_test_dataset_present, use_previous_dataset, train_split_percent, stratify)[source]¶
Prepare training and testing datasets.
skit_pipelines.components.zip_files_and_notify module¶
- zip_file_and_notify(path_on_disk: InputPath, message: str, channel: str = '', thread_id: str = '', file_title: str = '', file_name: str = '', notify: str = '', display_sample: bool = False)[source]¶
Zip a file or folder and upload the same on Slack.
Parameters:
- message – the Slack message to be sent
- channel – the channel in which the message is to be sent
- thread_id – the thread to which the message must be added
- file_title – title for the file
- file_name – name of the file
- notify – whether to send a Slack notification
- display_sample – set to true to display the value in the file
- zip_file_and_notify_op(path_on_disk: str, message: str, channel: str = '', thread_id: str = '', file_title: str = '', file_name: str = '', notify: str = '', display_sample: bool = False)¶
Zip file and notify Zip a file or folder and upload the same on Slack
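A sketch of bundling a results directory and posting it to Slack. The channel, file names, and the notify value are hypothetical; notify is typed str above, so a non-empty value presumably enables the notification:

```python
from kfp import dsl

from skit_pipelines.components import zip_file_and_notify_op


@dsl.pipeline(name="demo-zip-notify")
def zip_notify_pipeline(artifacts_dir: str):
    # Zip the directory and attach it to a Slack message; display_sample=True
    # additionally previews the file contents in the message.
    zip_file_and_notify_op(
        path_on_disk=artifacts_dir,
        message="Attached: latest evaluation artifacts.",
        channel="#demo-channel",
        file_title="Evaluation artifacts",
        file_name="artifacts.zip",
        notify="true",
        display_sample=True,
    )
```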
Module contents¶
- asr_tune_op(corpus: str, val_corpus: str, augment_wordlist: str, remove_wordlist: str, base_model: str, general_lm: str, lang: str)¶
ASR tune
- audio_transcription_op(audios_dir: str, config: str, concurrency: int)¶
Audio transcription
- create_mr_op(git_host_name: str, repo_name: str, project_path: str, target_branch: str, source_branch: str, mr_title: str, s3_description_paths: str)¶
Create MR
- create_true_transcript_labels_op(data: str, true_label_column: str)¶
Create true transcript labels
- create_utterances_op(data: str)¶
Create utterances
- download_audio_wavs_op(audio_data: str, audio_sample_rate: str, audio_download_workers: int)¶
Download audio wavs
- download_csv_from_s3_op(storage_path: str, empty_possible: bool = False)¶
Download CSV from S3
- download_directory_from_s3_op(storage_path: str)¶
Download directory from S3
- download_file_from_s3_op(storage_path: str, storage_options: str = '', empty_possible: bool = False)¶
Download file from S3
- download_repo_op(git_host_name: str, repo_name: str, project_path: str)¶
Download repo
- download_yaml_op(git_host_name: str, yaml_path: str)¶
Download YAML
- evalution_slu_from_repo_op(s3_data: str, annotated_job_data: str, intent_alias: str, bucket: str, repo_name: str, compare_branch: str, branch: str, remove_intents: str = '', validate_setup: bool = False, customization_repo_name: str = '', customization_repo_branch: str = '', core_slu_repo_name: str = '', core_slu_repo_branch: str = '')¶
Evaluate SLU from repo
- extract_true_transcript_labels_to_txt_op(data: str, true_label_column: str)¶
Extract true transcript labels to txt
- fetch_calls_for_slots_op(untagged_records_path: str, org_id: str = '', language_code='', start_date='', end_date='')¶
Fetch calls for slots
- fetch_calls_op(lang: str, start_date: str, end_date: str = None, client_id: str = None, start_date_offset: int = 0, end_date_offset: int = 0, start_time_offset: int = 0, end_time_offset: int = 0, call_quantity: int = 200, call_type: str = None, timezone: str = None, ignore_callers: str = None, reported: bool = False, template_id: str = None, use_case: str = None, flow_name: str = None, min_duration: str = None, asr_provider: str = None, intents: str = None, states: str = None, calls_file_s3_path: str = None, use_fsm_url: bool = False, remove_empty_audios: bool = True, flow_ids: str = None)¶
Fetch calls
- fetch_gpt_intent_prediction_op(s3_file_path: str, use_assisted_annotation: bool)¶
Fetch GPT intent prediction
- fetch_tagged_data_label_store_op(start_date: str, flow_id: str, end_date: str = None, limit: int = 200, data_labels: str = '')¶
Fetch tagged data label store
- fetch_tagged_dataset_op(job_id: str = None, project_id: str = None, task_type: str = 'conversation', timezone: str = None, start_date: str = None, end_date: str = None, start_date_offset: int = None, end_date_offset: int = None, empty_possible: bool = False)¶
Fetch tagged dataset
- file_contents_to_markdown_s3_op(ext: str, path_on_disk: str, file_title: str = '')¶
File contents to markdown S3
- final_conversation_generator_op(situation_info_list: List[Dict[str, str]], s3_links_to_prompts: str, n_iter: int, n_choice: int, temperature: float, model: str, llm_trainer_repo_name: str, llm_trainer_repo_branch: str)¶
Final conversation generator
- gen_asr_metrics_op(data: str, true_label_column: str = 'transcript_y', pred_label_column: str = 'utterances')¶
Gen ASR metrics
- identify_compliance_breaches_llm_op(s3_file_path: str)¶
Identify compliance breaches LLM Groups turns into calls and pushes them to an LLM (via the OpenAI chat completion API) to identify compliance breaches
- invalidate_situations_in_db_op(situation_id)¶
Invalidate situations in DB Check if the situation exists in the DB; if it exists, return the id, else insert the situation into the DB and return the id
- modify_entity_dataset_op(data: str, tog_job_id: str = None, labelstudio_project_id: str = None, timezone: str = 'Asia/Kolkata')¶
Modify entity dataset Takes an entity dataset and,
- org_auth_token_op(org_id: str, url: str = None)¶
Org auth token
- overlay_transcription_csv_op(sqlite: str, original_csv: str)¶
Overlay transcription csv
- process_true_transcript_labels_op(data: str, true_label_column: str)¶
Process true transcript labels
- push_compliance_report_to_postgres_op(s3_file_path: str)¶
Push compliance report to Postgres
- re_presign_s3_urls_op(audio_data: str)¶
Re-presign S3 URLs
- read_json_key_op(req_value: str, input: str)¶
Read JSON key
- retrain_slu_from_repo_op(s3_data: str, custom_test_s3_data: str, annotated_job_data: str, intent_alias: str, bucket: str, repo_name: str, branch: str, remove_intents: str = '', use_previous_dataset: bool = True, train_split_percent: int = 85, stratify: bool = False, epochs: int = 10, initial_training: bool = False, labelstudio_project_ids: str = '', s3_paths: str = '', validate_setup: bool = False, customization_repo_name: str = '', customization_repo_branch: str = '', core_slu_repo_name: str = '', core_slu_repo_branch: str = '')¶
Retrain SLU from repo
- retrain_slu_from_repo_op_old(s3_data: str, annotated_job_data: str, slu: str, intent_alias: str, bucket: str, repo_name: str, branch: str, remove_intents: str = '', use_previous_dataset: bool = True, train_split_percent: int = 85, stratify: bool = False, epochs: int = 10, initial_training: bool = False, job_ids: str = '', labelstudio_project_ids: str = '', s3_paths: str = '', validate_setup: bool = False, customization_repo_name: str = '', customization_repo_branch: str = '')¶
Retrain SLU from repo
- sample_conversations_generator_op(filename: str, prompt_file_path: str, n_iter: int, n_choice: int, temperature: float, model: str, llm_trainer_repo_name: str, llm_trainer_repo_branch: str, situation_file_path: str = '', situations: str = None)¶
Sample conversations generator
- slack_notification_op(message: str, code_block: str = '', channel: str = '', cc: str = '', thread_id: str = '', file_title: str = '', file_content: str = '')¶
Slack notification Send a message on any channel.
- tag_calls_op(input_file: str, data_label: str = '', project_id: str = None, call_project_id: str = None)¶
Tag calls
- upload2s3_op(path_on_disk: str, reference: str = '', file_type: str = '', bucket: str = '', ext: str = '.csv', output_path: str = '', storage_options: str = '', upload_as_directory: bool = False)¶
Upload2s3
- upload_conv_to_label_studio_op(project_id: str, conversations_dir: str, data_label: str, situations_id_info: List[Dict[str, str]])¶
Upload conv to label studio
- upload_conversation_data_to_metrics_db_op(situations_id_info: List[Dict[str, str]], client_id: str, template_id: str, generated_conversations_s3_link: str, prompt_links_in_s3: str, conv_directory: str)¶
Upload conversation data to metrics DB Upload the conversation data to the metrics DB
- validate_and_add_situations_to_db_op(situations: str, scenario: str, scenario_category: str)¶
Validate and add situations to DB Check if the situation exists in the DB; if it exists, return the id, else insert the situation into the DB and return the id
- zip_file_and_notify_op(path_on_disk: str, message: str, channel: str = '', thread_id: str = '', file_title: str = '', file_name: str = '', notify: str = '', display_sample: bool = False)¶
Zip file and notify Zip a file or folder and upload the same on Slack