Code Mixing in NLP and Speech

This seminar is closed. A short summary of the papers that were covered can be found here. Some interesting metrics that we came across are documented here.

Members - Jaivarsan, Kriti, Shahid, Swaraj, Shashank, Shangeth

Goal : Learn the different approaches taken to work with code-mixed data in speech and language tech. Here code-mixing is used in an expansive sense and includes code-switching as well.


The seminar had 6 sessions, 2 each on NLP, Speech Synthesis and Speech Recognition.

Session 1 - NLP

Session 2 - Speech Synthesis

Session 3 - Speech Recognition

Session 4 - NLP

Session 5 - Speech Recognition

Session 6 - Speech Synthesis

Reading List

Speech Recognition

  • Phone Merging for Code-switched Speech Recognition [2018]
  • Exploiting Monolingual Speech Corpora for Code-mixed Speech Recognition [2019]
  • Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages [2020]
  • Learning to recognize code-switched speech without forgetting monolingual speech recognition [2020]

Speech Synthesis

Current approaches to handling code-switching fall into three broad categories: phone mapping, multilingual synthesis, and polyglot synthesis.

  • Code-switching in Indic Speech Synthesisers [2018]
  • Building Multilingual End-to-End Speech Synthesisers for Indian Languages [2019]
  • One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech [2020]
  • On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model [2020]

NLP


  • BERTologiCoMix : How does Code-Mixing interact with Multilingual BERT? [2021]
  • Challenges of Computational Processing of Code-Switching [2016]
  • Metrics for modeling code-switching across corpora [2017]
  • A Semi-supervised Approach to Generate the Code-Mixed Text using Pre-trained Encoder and Transfer Learning [2020]
  • Are Multilingual Models Effective in Code-Switching? [2021]