skit_pipelines.pipelines.retrain_slu_old package

Module contents

retrain_slu_old(*, repo_name: str, repo_branch: str = 'master', job_ids: str = '', dataset_path: str = '', labelstudio_project_ids: str = '', job_start_date: str = '', job_end_date: str = '', remove_intents: str = '', alias_yaml_path: str = '', initial_training: bool = False, use_previous_dataset: bool = True, epochs: int = 10, train_split_percent: int = 85, stratify: bool = False, target_mr_branch: str = 'sandbox', notify: str = '', channel: str = '', slack_thread: str = '', customization_repo_name: str = 'slu-customization', customization_repo_branch: str = 'master')

A pipeline to retrain an existing SLU model.

Example payloads to invoke the pipeline via the Slack integration:

A minimal example:

@charon run retrain_slu

{
    "repo_name": "slu_repo_name",
    "labelstudio_project_ids": "10,13"
}

An example with all available parameters:

@charon run retrain_slu

{
    "repo_name": "slu_repo_name",
    "repo_branch": "master",
    "dataset_path": "s3://bucket-name/path1/to1/data1.csv,s3://bucket-name/path2/to2/data2.csv",
    "job_ids": "4011,4012",
    "labelstudio_project_ids": "10,13",
    "job_start_date": "2022-08-01",
    "job_end_date": "2022-09-19",
    "remove_intents": "_confirm_,_oos_,audio_speech_unclear,ood",
    "alias_yaml_path": "intents/oppo/alias.yaml",
    "use_previous_dataset": True,
    "train_split_percent": 85,
    "stratify": False,
    "epochs": 10,
}
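
Several fields in the payload above (dataset_path, job_ids, labelstudio_project_ids, remove_intents) are single strings holding comma-separated values. The sketch below shows one way to assemble such a payload from Python lists before pasting it into Slack. The helper names join_csv and build_payload are hypothetical and are not part of skit_pipelines.

```python
def join_csv(values):
    """Join an iterable of values into the comma-separated string form used in the payload."""
    return ",".join(str(value) for value in values)


def build_payload(repo_name, *, labelstudio_project_ids=(), job_ids=(),
                  dataset_paths=(), remove_intents=(), **extra):
    """Hypothetical helper: build a payload dict, skipping empty list fields.

    Any other documented parameter (epochs, stratify, ...) can be passed
    through **extra and is copied into the payload unchanged.
    """
    payload = {"repo_name": repo_name}
    list_fields = {
        "labelstudio_project_ids": labelstudio_project_ids,
        "job_ids": job_ids,
        "dataset_path": dataset_paths,
        "remove_intents": remove_intents,
    }
    for key, values in list_fields.items():
        if values:
            payload[key] = join_csv(values)
    payload.update(extra)
    return payload


payload = build_payload(
    "slu_repo_name",
    labelstudio_project_ids=[10, 13],
    job_ids=[4011, 4012],
    epochs=10,
)
```

Here payload["labelstudio_project_ids"] is the string "10,13", matching the form shown in the examples.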

An example for training an SLU model for the first time:

@charon run retrain_slu

{
    "repo_name": "slu_repo_name",
    "repo_branch": "master",
    "labelstudio_project_ids": "10,13",
    "initial_training": True
}

Parameters
  • repo_name (str) – SLU repository name under the /vernacularai/ai/clients org in GitLab. Required.

  • repo_branch (str, optional) – The branch of the SLU repository to use, defaults to master.

  • dataset_path (str, optional) – The S3 URI or the S3 key for the tagged dataset (can be multiple - comma separated).

  • job_ids (str, optional) – The job IDs as per tog (can be multiple - comma separated). Optional if labelstudio_project_ids is provided.

  • labelstudio_project_ids (str, optional) – The labelstudio project IDs, which are numbers (can be multiple - comma separated), defaults to “”.

  • epochs (int, optional) – Number of epochs to train the model, defaults to 10.

  • job_start_date (str, optional) – The start date range (YYYY-MM-DD) to filter tagged data.

  • job_end_date (str, optional) – The end date range (YYYY-MM-DD) to filter tagged data.

  • remove_intents (str, optional) – Comma separated list of intents to remove from dataset while training.

  • alias_yaml_path (str, optional) – Path to eevee’s intent_report alias.yaml. Upload your YAML to the eevee-yamls repository and pass the relative path of the YAML from the base of the repository.

  • initial_training (bool, optional) – Set to true only if you’re training a model for the first time, defaults to False.

  • use_previous_dataset (bool, optional) – Whether to combine the new dataset with the last dataset the model was trained on before retraining, defaults to True.

  • train_split_percent (int, optional) – Percentage of new data one should train the model on, defaults to 85.

  • stratify (bool, optional) – For stratified splitting of dataset into train and test set, defaults to False.

  • notify (str, optional) – Whether to send a Slack notification, defaults to “”.

  • channel (str, optional) – The Slack channel to send the notification to, defaults to “”.

  • slack_thread (str, optional) – The Slack thread to send the notification to, defaults to “”.
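
The constraints listed above (a required repo_name, dates in YYYY-MM-DD form, a percentage for train_split_percent) can be checked on the caller's side before a run is triggered. The following is a minimal sketch of such a check. The function check_payload is hypothetical and not part of skit_pipelines, and both the 1 to 100 bound on train_split_percent and the start-before-end rule are assumptions, not documented behaviour of the pipeline.

```python
from datetime import datetime


def check_payload(payload):
    """Hypothetical pre-flight check; returns a list of problems (empty if none found)."""
    problems = []

    # repo_name has no default in the signature, so it must be supplied.
    if not payload.get("repo_name"):
        problems.append("repo_name is required")

    # Dates are documented as YYYY-MM-DD strings; empty means "no filter".
    dates = {}
    for key in ("job_start_date", "job_end_date"):
        value = payload.get(key, "")
        if value:
            try:
                dates[key] = datetime.strptime(value, "%Y-%m-%d")
            except ValueError:
                problems.append(f"{key} must look like YYYY-MM-DD, got {value!r}")

    # Assumption: a start date later than the end date is a mistake.
    if len(dates) == 2 and dates["job_start_date"] > dates["job_end_date"]:
        problems.append("job_start_date is after job_end_date")

    # Assumption: a usable percentage lies between 1 and 100.
    percent = payload.get("train_split_percent", 85)
    if not isinstance(percent, int) or not 0 < percent <= 100:
        problems.append(f"train_split_percent should be an int in 1..100, got {percent!r}")

    return problems


issues = check_payload({
    "repo_name": "slu_repo_name",
    "job_start_date": "2022-08-01",
    "job_end_date": "2022-09-19",
})
```

With the values from the full example, issues is an empty list. Returning a list of problems rather than raising on the first one lets all mistakes be fixed in a single pass.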