skit_pipelines.pipelines.retrain_slu_old package

Module contents

retrain_slu_old(*, repo_name: str, repo_branch: str = 'master', job_ids: str = '', dataset_path: str = '', labelstudio_project_ids: str = '', job_start_date: str = '', job_end_date: str = '', remove_intents: str = '', alias_yaml_path: str = '', initial_training: bool = False, use_previous_dataset: bool = True, epochs: int = 10, train_split_percent: int = 85, stratify: bool = False, target_mr_branch: str = 'sandbox', notify: str = '', channel: str = '', slack_thread: str = '', customization_repo_name: str = 'slu-customization', customization_repo_branch: str = 'master')

A pipeline to retrain an existing SLU model.

Example payloads for invoking this pipeline via the Slack integration:

A minimal example:

@charon run retrain_slu

{
    "repo_name": "slu_repo_name",
    "labelstudio_project_ids": "10,13"
}

A fuller example using most of the available parameters:

@charon run retrain_slu

{
    "repo_name": "slu_repo_name",
    "repo_branch": "master",
    "dataset_path": "s3://bucket-name/path1/to1/data1.csv,s3://bucket-name/path2/to2/data2.csv",
    "job_ids": "4011,4012",
    "labelstudio_project_ids": "10,13",
    "job_start_date": "2022-08-01",
    "job_end_date": "2022-09-19",
    "remove_intents": "_confirm_,_oos_,audio_speech_unclear,ood",
    "alias_yaml_path": "intents/oppo/alias.yaml",
    "use_previous_dataset": true,
    "train_split_percent": 85,
    "stratify": false,
    "epochs": 10
}
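The payloads above are written as JSON. Assuming the Slack integration parses the text after `@charon run retrain_slu` as JSON (an assumption; the parser is not described on this page), one way to avoid Python-style `True`/`False` literals or trailing commas is to build the payload as a Python dict and serialize it with `json.dumps`. This is a minimal sketch: `build_charon_message` is a hypothetical helper, not part of skit_pipelines.

```python
import json


def build_charon_message(pipeline: str, params: dict) -> str:
    """Hypothetical helper: render a Slack message that invokes a pipeline via @charon.

    Assumption: the integration expects a JSON body after the command line.
    json.dumps writes Python booleans as JSON true/false and never emits
    trailing commas, so the body is always valid JSON.
    """
    payload = json.dumps(params, indent=4)
    return f"@charon run {pipeline}\n\n{payload}"


message = build_charon_message(
    "retrain_slu",
    {
        "repo_name": "slu_repo_name",
        "labelstudio_project_ids": "10,13",
        "use_previous_dataset": True,
        "stratify": False,
        "epochs": 10,
    },
)
print(message)
```

Printing `message` shows `"use_previous_dataset": true` and `"stratify": false` in the body, ready to paste into Slack.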

An example of training an SLU model for the first time:

@charon run retrain_slu

{
    "repo_name": "slu_repo_name",
    "repo_branch": "master",
    "labelstudio_project_ids": "10,13",
    "initial_training": true
}

Parameters

repo_name (str) – SLU repository name under the /vernacularai/ai/clients org in GitLab.
repo_branch (str, optional) – The branch of the SLU repository to use, defaults to master.
dataset_path (str, optional) – The S3 URI or S3 key of the tagged dataset; multiple datasets can be passed comma-separated.
job_ids (str) – Comma-separated job ids as per tog. Optional if labelstudio_project_ids is provided.
labelstudio_project_ids (str, optional) – Comma-separated Label Studio project ids (numbers), defaults to “”.
epochs (int, optional) – Number of epochs to train the model for, defaults to 10.
job_start_date (str, optional) – Start of the date range (YYYY-MM-DD) used to filter tagged data.
job_end_date (str, optional) – End of the date range (YYYY-MM-DD) used to filter tagged data.
remove_intents (str, optional) – Comma-separated list of intents to remove from the dataset used for training.
alias_yaml_path (str, optional) – Path to an alias.yaml for eevee’s intent_report (refer to the eevee docs). Upload your YAML to the eevee-yamls repository and pass its path relative to the root of that repository.
initial_training (bool, optional) – Set to True only if you are training a model for the first time, defaults to False.
use_previous_dataset (bool, optional) – If True, combines the new dataset with the dataset the model was last trained on before retraining, defaults to True.
train_split_percent (int, optional) – Percentage of the new data used for training; the rest is used as the test set, defaults to 85.
stratify (bool, optional) – If True, uses stratified splitting of the dataset into train and test sets, defaults to False.
notify (str, optional) – Whether to send a Slack notification, defaults to “”.
channel (str, optional) – The Slack channel to send the notification to, defaults to “”.
slack_thread (str, optional) – The Slack thread to send the notification to, defaults to “”.
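Several string parameters above take comma-separated values, while job_start_date/job_end_date, remove_intents, train_split_percent and stratify describe how the tagged data is filtered and split. The sketch below illustrates one plausible reading of these semantics using only the Python standard library. It is not the skit_pipelines implementation: the helper names (`split_csv`, `filter_records`, `train_test_split`), the record fields (`intent`, `date`) and the inclusive date bounds are all hypothetical assumptions.

```python
import random
from collections import defaultdict
from datetime import date


def split_csv(value: str) -> list[str]:
    # Hypothetical: "10,13" -> ["10", "13"]; the empty-string default yields [].
    return [item.strip() for item in value.split(",") if item.strip()]


def filter_records(records, remove_intents="", job_start_date="", job_end_date=""):
    # Hypothetical filter: drop listed intents and keep rows whose date falls
    # inside the (assumed inclusive) YYYY-MM-DD range; empty bounds are ignored.
    removed = set(split_csv(remove_intents))
    start = date.fromisoformat(job_start_date) if job_start_date else None
    end = date.fromisoformat(job_end_date) if job_end_date else None
    kept = []
    for record in records:
        day = date.fromisoformat(record["date"])
        if record["intent"] in removed:
            continue
        if start is not None and day < start:
            continue
        if end is not None and day > end:
            continue
        kept.append(record)
    return kept


def train_test_split(records, train_split_percent=85, stratify=False, seed=0):
    # Hypothetical split: train_split_percent of the rows go to train, the
    # rest to test. With stratify=True each intent is split on its own, so
    # intent proportions are roughly preserved in both sets.
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record["intent"] if stratify else None].append(record)
    train, test = [], []
    for group in groups.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_split_percent / 100)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


# Usage with made-up records.
records = [{"intent": "greet", "date": "2022-08-15"} for _ in range(20)]
records += [{"intent": "_oos_", "date": "2022-08-15"} for _ in range(5)]
records.append({"intent": "greet", "date": "2022-10-01"})  # outside the range

kept = filter_records(
    records,
    remove_intents="_confirm_,_oos_,audio_speech_unclear,ood",
    job_start_date="2022-08-01",
    job_end_date="2022-09-19",
)
train, test = train_test_split(kept, train_split_percent=85, stratify=True)
print(len(kept), len(train), len(test))  # 20 17 3
```

Splitting each intent separately is a simple way to express stratification without extra dependencies; a real pipeline might instead rely on a library such as scikit-learn.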
