skit_pipelines.pipelines.transcription_pipeline package

Module contents

transcription_pipeline(*, data_s3_path: str, config_s3_path: str, audio_sample_rate: str = '8k', audio_download_workers: int = 30, transcription_concurrency: int = 8, notify: str = '', channel: str = '', slack_thread: str = '')

A pipeline to transcribe the audio files in a dataset using different ASR (automatic speech recognition) systems.

Example payload to invoke the pipeline via the Slack integration:

@charon run transcription_pipeline
{

}
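The payload body in the example above is empty. As a sketch, assuming the JSON object that follows the command line takes the pipeline's keyword arguments as keys, a message could be composed as below. The bucket names and file paths are hypothetical placeholders, and `build_charon_message` is an illustrative helper, not part of skit_pipelines.

```python
import json

# Hypothetical placeholder values; substitute real S3 paths for your dataset and blaze config.
params = {
    "data_s3_path": "s3://example-bucket/datasets/calls.csv",
    "config_s3_path": "s3://example-bucket/configs/blaze-config.yaml",
    "audio_sample_rate": "8k",
    "audio_download_workers": 30,
    "transcription_concurrency": 8,
}


def build_charon_message(pipeline_name: str, payload: dict) -> str:
    """Hypothetical helper: the Slack command line followed by a JSON body (assumed format)."""
    return f"@charon run {pipeline_name}\n{json.dumps(payload, indent=2)}"


message = build_charon_message("transcription_pipeline", params)
print(message)
```

Keeping the payload as a Python dict and serializing it with `json.dumps` avoids hand-writing JSON (quoting, trailing commas) when pasting into Slack.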

Parameters

data_s3_path (str) – S3 path of the dataset, stored as a CSV file.
config_s3_path (str) – S3 path of the config YAML to be used by blaze. See https://github.com/skit-ai/blaze#config for details.
audio_sample_rate (str) – sample rate (frequency) of the output audio files (default “8k”).
audio_download_workers (int) – maximum number of workers used to download the audio files (default 30).
transcription_concurrency (int) – maximum number of workers used to transcribe the audio files (default 8).
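The two worker limits cap how many downloads and transcriptions run in parallel. The pipeline's internals are not documented here, so the following is only a generic sketch of that idea using Python's standard `concurrent.futures.ThreadPoolExecutor`; `fake_download`, `run_bounded`, `ConcurrencyTracker`, and the URLs are hypothetical stand-ins, not skit_pipelines code.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records the peak number of calls running inside the `with` block at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        return False  # do not suppress exceptions


tracker = ConcurrencyTracker()


def fake_download(url: str) -> str:
    """Hypothetical stand-in for downloading one audio file; returns a local file name."""
    with tracker:
        time.sleep(0.01)  # simulate network latency
    return url.rsplit("/", 1)[-1]


def run_bounded(fn, items, max_workers: int) -> list:
    """Apply fn to every item with at most max_workers calls in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


urls = [f"https://example.com/audio/{i}.wav" for i in range(100)]
paths = run_bounded(fake_download, urls, max_workers=30)  # analogous to audio_download_workers=30
print(paths[:3], tracker.peak <= 30)
```

The same pattern with `max_workers=8` would correspond to the default `transcription_concurrency`. Whether the pipeline actually uses threads, processes, or async tasks for either step is an assumption of this sketch.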