skit_pipelines.pipelines.transcription_pipeline package

Module contents

transcription_pipeline(*, data_s3_path: str, config_s3_path: str, audio_sample_rate: str = '8k', audio_download_workers: int = 30, transcription_concurrency: int = 8, notify: str = '', channel: str = '', slack_thread: str = '')

A pipeline that transcribes the audio files in a dataset using different ASR (automatic speech recognition) systems.

Example payload to invoke the pipeline via the Slack integration:

@charon run transcription_pipeline

{

}
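
The payload body above is empty in the source. As a rough illustration only, the sketch below assembles the keyword arguments from the documented signature into a JSON-ready dictionary. The helper `build_payload` and both S3 paths are hypothetical, and whether the Slack integration accepts exactly this JSON shape is an assumption, not something this page confirms.

```python
import json

# Defaults copied from the documented signature of transcription_pipeline.
DEFAULTS = {
    "audio_sample_rate": "8k",
    "audio_download_workers": 30,
    "transcription_concurrency": 8,
}


def build_payload(data_s3_path, config_s3_path, **overrides):
    """Hypothetical helper: collect pipeline arguments into one dict."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject names that are not optional parameters listed in DEFAULTS.
        raise ValueError(f"unexpected parameters: {sorted(unknown)}")
    payload = {"data_s3_path": data_s3_path, "config_s3_path": config_s3_path}
    payload.update(DEFAULTS)
    payload.update(overrides)
    return payload


payload = build_payload(
    "s3://example-bucket/dataset.csv",        # placeholder path (assumption)
    "s3://example-bucket/blaze-config.yaml",  # placeholder path (assumption)
    transcription_concurrency=4,
)
print(json.dumps(payload, indent=2))
```

Only the two S3 paths are required by the signature; every other value falls back to its documented default unless overridden.
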
Parameters
  • data_s3_path (str) – S3 path of the dataset in CSV format.

  • config_s3_path (str) – S3 path of the config YAML to be used by blaze. See https://github.com/skit-ai/blaze#config for more info.

  • audio_sample_rate (str) – sample rate (frequency) of the output audio files (default “8k”).

  • audio_download_workers (int) – maximum number of workers used to download the audio files (default 30).

  • transcription_concurrency (int) – maximum number of workers used to transcribe the audio files (default 8).