Code Mixing in NLP and Speech
This seminar is closed. A short summary of the papers covered can be found here. Some interesting metrics that we came across are documented here.
Members - Jaivarsan, Kriti, Shahid, Swaraj, Shashank, Shangeth
Goal: Learn the different approaches to working with code-mixed data in speech and language technology. Here, code-mixing is used in an expansive sense that includes code-switching.
Timeline
The seminar had 6 sessions: 2 each on NLP, Speech Synthesis, and Speech Recognition. Format:
Session 1 - NLP
- GLUECoS : An Evaluation Benchmark for Code-Switched NLP [2020] [shashank]
Session 2 - Speech Synthesis
Session 3 - Speech Recognition
Session 4 - NLP
Session 5 - Speech Recognition
- Learning to recognize code-switched speech without forgetting monolingual speech recognition [2020] [shahid]
Session 6 - Speech Synthesis
- Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora [shangeth]
Reading List
Speech Recognition
- Phone Merging for Code-switched Speech Recognition [2018]
- Exploiting Monolingual Speech Corpora for Code-mixed Speech Recognition [2019]
- Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages [2020]
- Learning to recognize code-switched speech without forgetting monolingual speech recognition [2020]
Speech Synthesis
Current approaches to handling code-switching fall into three broad categories: phone mapping, multilingual synthesis, and polyglot synthesis.
- Code-switching in Indic Speech Synthesisers [2018]
- Building Multilingual End-to-End Speech Synthesisers for Indian Languages [2019]
- One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech [2020]
- On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model [2020]
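As a rough illustration of the phone mapping category mentioned above, the sketch below replaces phones that a primary-language voice does not have with stand-in phones from its own inventory. The mapping table, the phone labels, and the function name `map_phones` are hypothetical and chosen only for illustration; they are not taken from any of the papers listed here.

```python
from typing import Dict, List

# Hypothetical mapping from a few second-language phones to stand-in phones
# in the primary-language inventory. Real systems derive such tables from
# linguistic knowledge or data; these entries are illustrative only.
PHONE_MAP: Dict[str, str] = {
    "AE": "e",
    "TH": "th",
    "Z": "j",
    "W": "v",
}


def map_phones(phones: List[str], phone_map: Dict[str, str]) -> List[str]:
    """Replace each phone found in phone_map; pass all other phones through."""
    return [phone_map.get(p, p) for p in phones]


if __name__ == "__main__":
    # Phones absent from the table are assumed to be shared by both inventories.
    print(map_phones(["TH", "AE", "NG", "K", "S"], PHONE_MAP))
```

The trade-off of this approach is that the synthesised second-language words take on the accent of the primary-language voice, since no new phones are ever introduced.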
NLP/Misc
- BERTologiCoMix : How does Code-Mixing interact with Multilingual BERT? [2021]
- Challenges of Computational Processing of Code-Switching [2016]
- Metrics for modeling code-switching across corpora [2017]
- A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning [2020]
- Are Multilingual Models Effective in Code-Switching? [2021]
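To make the idea of corpus-level code-switching metrics concrete, here is a minimal sketch of two measures often used in this line of work: the M-index (how balanced the languages are) and the I-index (how often the language changes between adjacent tokens). Whether these are the metrics the seminar documented is not stated above. The sketch assumes each token already carries a language tag, and it takes the number of languages to be the number of distinct tags observed, which is a simplifying assumption.

```python
from collections import Counter
from typing import List


def m_index(tags: List[str]) -> float:
    """M-index: 0.0 for monolingual text, 1.0 when all languages are equally used.

    Computed as (1 - sum(p_j^2)) / ((k - 1) * sum(p_j^2)), where p_j is the
    share of tokens in language j and k is the number of languages.
    Assumption: k is taken as the number of distinct tags seen in `tags`.
    """
    counts = Counter(tags)
    k = len(counts)
    if k < 2:
        return 0.0
    total = len(tags)
    sum_sq = sum((c / total) ** 2 for c in counts.values())
    return (1 - sum_sq) / ((k - 1) * sum_sq)


def i_index(tags: List[str]) -> float:
    """I-index: fraction of adjacent token pairs whose language tags differ."""
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    return switches / (len(tags) - 1)


if __name__ == "__main__":
    tags = ["hi", "hi", "en", "en"]  # hypothetical per-token language tags
    print(m_index(tags), i_index(tags))
```

The two numbers capture different things: a text can be perfectly balanced between two languages (high M-index) while switching only once (low I-index).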
References:
- Survey Paper
- A Survey of Code-switched Speech and Language Processing [2020] - this paper also lists many Indic code-mixed datasets
- Projects/Research Groups
- Benchmarks and Datasets
- Awesome GitHub Compilations