# Speech Recognition

`eevee` reports the following metrics for evaluating ASR transcriptions:
| Metric | Description |
|---|---|
| WER | Word Error Rate |
| Utterance False Positive Rate (uFPR) | Ratio of cases where non-speech utterances were transcribed. |
| Utterance False Negative Rate (uFNR) | Ratio of cases where utterances were transcribed as silence. |
| SER | Sentence Error Rate |
| Min 3 WER | The minimum Word Error Rate when considering only the first three alternatives |
| Min WER | The minimum Word Error Rate out of all the alternatives |
| Short Utterance WER | WER of utterances with a ground truth length of 1 or 2 words |
| Long Utterance WER | WER of utterances with at least 3 words in the ground truth |
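To make the WER row concrete: WER is the word-level edit distance (substitutions, deletions, insertions) between the predicted and ground-truth transcriptions, divided by the ground-truth length. Below is a minimal illustrative sketch of that definition, not eevee's internal implementation:

```python
# Illustrative WER: Levenshtein distance over words / reference length.
# Shown only to make the definition concrete; eevee computes WER itself.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution / match
                d[i - 1][j] + 1,                               # deletion
                d[i][j - 1] + 1,                               # insertion
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("iya iya iya", "iya iya"))  # one deletion over three words ≈ 0.33
```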
## Data schema
We expect:

- `tagged.transcriptions.csv` to have columns called `id` and `transcription`, where `transcription` can have only one string as its value for each row. If no transcription is present, leave the cell empty as it is; it'll get parsed as `NaN`.
- `predicted.transcriptions.csv` to have columns called `id` and `utterances`, where each value in the `utterances` column looks like this:

```
'[[
  {"confidence": 0.94847125, "transcript": "iya iya iya iya iya"},
  {"confidence": 0.9672866, "transcript": "iya iya iya iya"},
  {"confidence": 0.8149829, "transcript": "iya iya iya iya iya iya"}
]]'
```

As you might have noticed, it is expected to be in JSON format. Each `transcript` represents one alternative from the ASR, and `confidence` represents the ASR's confidence for that particular alternative. If no such `utterances` are present for a particular `id`, leave the value as `'[]'` (the `json.dumps` of an empty list `[]`). A sketch of producing both files follows below.

Note: Please remove the `transcription` column from `predicted.transcriptions.csv` (if it exists) before using `eevee`.
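For concreteness, here is a minimal sketch of preparing both files with pandas. The rows are made up for illustration; only the column names (`id`, `transcription`, `utterances`) and the `json.dumps` convention come from the schema above:

```python
import json
import pandas as pd

# Ground truth: one (possibly empty) transcription per id.
tagged = pd.DataFrame({
    "id": [1, 2],
    "transcription": ["iya iya iya iya", None],  # empty cell gets parsed as NaN
})
tagged.to_csv("data/tagged.transcriptions.csv", index=False)

# Predictions: `utterances` holds a JSON string of ASR alternatives.
alternatives = [[
    {"confidence": 0.94847125, "transcript": "iya iya iya iya iya"},
    {"confidence": 0.9672866, "transcript": "iya iya iya iya"},
]]
predicted = pd.DataFrame({
    "id": [1, 2],
    "utterances": [json.dumps(alternatives), json.dumps([])],  # '[]' when no ASR output
})
predicted.to_csv("data/predicted.transcriptions.csv", index=False)
```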
## Usage

### Command Line

Use the sub-command `asr` as shown below:
```
eevee asr ./data/tagged.transcriptions.csv ./data/predicted.transcriptions.csv
```
```
                        Value  Support
Metric
WER                  0.571429        6
Utterance FPR        0.500000        2
Utterance FNR        0.250000        4
SER                  0.666667        6
Min 3 WER            0.571429        6
Min WER              0.571429        6
Short Utterance WER  0.000000        1
Long Utterance WER   0.809524        3
```
For users who want utterance-level metrics or edit operations, add the `--dump` flag like:
```
eevee asr ./data/tagged.transcriptions.csv ./data/predicted.transcriptions.csv --dump
```
This will add two CSV files:

- `predicted.transcriptions-dump.csv`: contains utterance-level metrics.
- `predicted.transcriptions-ops.csv`: contains dataset-level edit operations.

The filenames are based on the prediction filename given by the user.
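Since the dump files are plain CSVs, they can be inspected with pandas. A quick look, assuming the filenames produced by the command above (the exact columns are whatever eevee writes):

```python
import pandas as pd

# Per-utterance metrics; column names depend on eevee's output.
dump = pd.read_csv("./data/predicted.transcriptions-dump.csv")
print(dump.head())

# Dataset-level edit operations.
ops = pd.read_csv("./data/predicted.transcriptions-ops.csv")
print(ops.head())
```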
For users who want ASR metrics reported separately on `noisy` and `non-noisy` subsets of the audio, use the `--noisy` flag like:
```
eevee asr ./data/tagged.transcriptions.csv ./data/predicted.transcriptions.csv --noisy
```
Results are reported in two DataFrames, one each for the `noisy` and `non-noisy` subsets, in that order. An important note here: the transcriptions in `tagged.transcriptions.csv` are expected to contain info tags, like `<audio_silent>`, `<inaudible>`, etc., which aren't expected when not using the `--noisy` flag.
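As an illustration, a hypothetical tagged file for a `--noisy` run might carry these tags in the `transcription` column. The rows are invented; only the tag names come from the text above:

```python
import pandas as pd

# Hypothetical tagged file for --noisy runs: info tags mark noisy audio.
tagged = pd.DataFrame({
    "id": [1, 2, 3],
    "transcription": ["iya iya iya iya", "<audio_silent>", "<inaudible>"],
})
tagged.to_csv("data/tagged.transcriptions.csv", index=False)
```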
### Python module
```python
>>> import pandas as pd
>>> from eevee.metrics.asr import asr_report
>>>
>>> true_df = pd.read_csv("data/tagged.transcriptions.csv", usecols=["id", "transcription"])
>>> pred_df = pd.read_csv("data/predicted.transcriptions.csv", usecols=["id", "utterances"])
>>>
>>> asr_report(true_df, pred_df)
                        Value  Support
Metric
WER                  0.571429        6
Utterance FPR        0.500000        2
Utterance FNR        0.250000        4
SER                  0.666667        6
Min 3 WER            0.571429        6
Min WER              0.571429        6
Short Utterance WER  0.000000        1
Long Utterance WER   0.809524        3
```
For ASR metrics segregated by "noisy":
```python
>>> import pandas as pd
>>> from eevee.metrics.asr import asr_report, process_noise_info
>>>
>>> true_df = pd.read_csv("data/tagged.transcriptions.csv", usecols=["id", "transcription"])
>>> pred_df = pd.read_csv("data/predicted.transcriptions.csv", usecols=["id", "utterances"])
>>>
>>> noisy_dict, not_noisy_dict = process_noise_info(true_df, pred_df)
>>>
>>> asr_report(noisy_dict["true"], noisy_dict["pred"])
                        Value  Support
Metric
WER                  0.571429        6
Utterance FPR        0.500000        2
Utterance FNR        0.250000        4
SER                  0.666667        6
Min 3 WER            0.571429        6
Min WER              0.571429        6
Short Utterance WER  0.000000        1
Long Utterance WER   0.809524        3
>>> asr_report(not_noisy_dict["true"], not_noisy_dict["pred"])
                        Value  Support
Metric
WER                  0.571429        6
Utterance FPR        0.500000        2
Utterance FNR        0.250000        4
SER                  0.666667        6
Min 3 WER            0.571429        6
Min WER              0.571429        6
Short Utterance WER  0.000000        1
Long Utterance WER   0.809524        3
```