skit_pipelines.pipelines.retrain_slu_old package
Module contents
- retrain_slu_old(*, repo_name: str, repo_branch: str = 'master', job_ids: str = '', dataset_path: str = '', labelstudio_project_ids: str = '', job_start_date: str = '', job_end_date: str = '', remove_intents: str = '', alias_yaml_path: str = '', initial_training: bool = False, use_previous_dataset: bool = True, epochs: int = 10, train_split_percent: int = 85, stratify: bool = False, target_mr_branch: str = 'sandbox', notify: str = '', channel: str = '', slack_thread: str = '', customization_repo_name: str = 'slu-customization', customization_repo_branch: str = 'master')
A pipeline to retrain an existing SLU model.
Example payloads to invoke the pipeline via the Slack integration:
A minimal example:
@charon run retrain_slu
{ "repo_name": "slu_repo_name", "labelstudio_project_ids": "10,13" }
A fuller example showing more of the available parameters:
@charon run retrain_slu
{ "repo_name": "slu_repo_name", "repo_branch": "master", "dataset_path": "s3://bucket-name/path1/to1/data1.csv,s3://bucket-name/path2/to2/data2.csv", "job_ids": "4011,4012", "labelstudio_project_ids": "10,13", "job_start_date": "2022-08-01", "job_end_date": "2022-09-19", "remove_intents": "_confirm_,_oos_,audio_speech_unclear,ood", "alias_yaml_path": "intents/oppo/alias.yaml", "use_previous_dataset": True, "train_split_percent": 85, "stratify": False, "epochs": 10, }
An example of training an SLU model for the first time:
@charon run retrain_slu
{ "repo_name": "slu_repo_name", "repo_branch": "master", "labelstudio_project_ids": "10,13", "initial_training": True }
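The payloads above pass multi-valued fields as comma-separated strings and dates as YYYY-MM-DD. The following is a minimal sketch of assembling such a payload programmatically. The helpers `join_csv` and `build_payload` are hypothetical and not part of skit_pipelines, and how the Slack integration parses the final message is not covered here.

```python
from datetime import datetime


def join_csv(values):
    """Join values into the comma-separated string form used by the payloads."""
    return ",".join(str(value) for value in values)


def build_payload(repo_name, labelstudio_project_ids=(), job_ids=(),
                  dataset_paths=(), job_start_date="", job_end_date="", **extra):
    """Hypothetical helper: build a payload dict, leaving out empty fields."""
    for day in (job_start_date, job_end_date):
        if day:
            # Raises ValueError if the date is not in YYYY-MM-DD form.
            datetime.strptime(day, "%Y-%m-%d")
    payload = {
        "repo_name": repo_name,
        "labelstudio_project_ids": join_csv(labelstudio_project_ids),
        "job_ids": join_csv(job_ids),
        "dataset_path": join_csv(dataset_paths),
        "job_start_date": job_start_date,
        "job_end_date": job_end_date,
        **extra,  # any other documented parameter, e.g. initial_training=True
    }
    # Drop empty strings so that the pipeline's own defaults apply.
    return {key: value for key, value in payload.items() if value != ""}


payload = build_payload("slu_repo_name", labelstudio_project_ids=[10, 13])
print(payload)  # {'repo_name': 'slu_repo_name', 'labelstudio_project_ids': '10,13'}
```

Keeping the lists as Python sequences until the last step avoids hand-editing comma-separated strings, which is where stray spaces and trailing commas tend to creep in.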
- Parameters
repo_name (str) – SLU repository name under the /vernacularai/ai/clients org in GitLab. Required.
repo_branch (str, optional) – The branch of the SLU repository to use, defaults to master.
dataset_path (str, optional) – The S3 URI or the S3 key for the tagged dataset (can be multiple - comma separated).
job_ids (str) – The job ids as per tog. Optional if labelstudio_project_ids is provided.
labelstudio_project_ids (str, optional) – The labelstudio project ids (numeric; can be multiple, comma separated), defaults to “”.
epochs (int, optional) – Number of epochs to train the model, defaults to 10.
job_start_date (str, optional) – Start of the date range (YYYY-MM-DD) used to filter tagged data.
job_end_date (str, optional) – End of the date range (YYYY-MM-DD) used to filter tagged data.
remove_intents (str, optional) – Comma separated list of intents to remove from the dataset before training.
alias_yaml_path (str, optional) – Path to eevee’s intent_report alias.yaml (see the eevee docs). Upload your yaml to the eevee-yamls repository and pass the path of the yaml relative to the base of that repository.
initial_training (bool, optional) – Set to true only if you’re training a model for the first time, defaults to False.
use_previous_dataset (bool, optional) – Whether to combine the new dataset with the last dataset the model was trained on before retraining, defaults to True.
train_split_percent (int, optional) – Percentage of the new data to train the model on, defaults to 85.
stratify (bool, optional) – Whether to use stratified splitting when dividing the dataset into train and test sets, defaults to False.
notify (str, optional) – Whether to send a Slack notification, defaults to “”.
channel (str, optional) – The Slack channel to send the notification to, defaults to “”.
slack_thread (str, optional) – The Slack thread to send the notification to, defaults to “”.
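To illustrate how train_split_percent and stratify interact, here is a rough standalone sketch of a percentage-based split with optional stratification. It is not the pipeline's actual implementation: `split_dataset`, `label_key` and `seed` are hypothetical names, and rows are assumed to be dicts carrying an intent label.

```python
import random
from collections import defaultdict


def split_dataset(rows, train_split_percent=85, stratify=False,
                  label_key="intent", seed=0):
    """Hypothetical sketch: split rows into (train, test) lists."""
    rng = random.Random(seed)
    # With stratify=True each label is split separately, so every label keeps
    # roughly the same train/test ratio; otherwise all rows form one group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key] if stratify else None].append(row)

    train, test = [], []
    for group in groups.values():
        shuffled = list(group)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_split_percent / 100)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


rows = [{"intent": "a"}] * 80 + [{"intent": "b"}] * 20
train, test = split_dataset(rows, train_split_percent=85, stratify=True)
print(len(train), len(test))  # 85 15
```

With stratification the rarer intent "b" contributes 17 of its 20 rows to the train set, whereas an unstratified split could place more or fewer of them there by chance.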