Topic: Language Research Challenges for the Next Billion Users
With the proliferation of mobile devices and internet access, we are witnessing a new generation of internet users. This next wave of users (the "next billion users" for Google) has unique requirements of web search, web applications, and the Internet in general. In this talk we explore three important and distinctive research challenges that arise from the usage patterns of the next billion users. First, translation is an essential tool for interacting with a web that is predominantly in English. We focus on methods for developing high-quality translation models for low-resource languages, where training data is sparse and noisy. Second, code-mixing, in which two or more languages are used in the same utterance or sentence, is a common phenomenon among multilingual users and is highly diverse across user groups. We focus on developing robust, controllable generation models that can produce diverse code-mixed text with precise control. Finally, we present the observation that a multilingual user's L1 (native) language influences their spelling patterns in an L2 (second) language. We present methods to measure and improve the robustness of language models to such inter-language phonetic influence.
Dr. Aravindan Raghuveer is a Principal Engineer at Google Research India. Before that, he worked as a Research Scientist at Facebook and as a Principal Research Engineer at Yahoo Labs. Aravindan received his M.S. in Computer Engineering and his Ph.D. in Computer Science, both from the University of Minnesota, Twin Cities. He received his B.E. in Computer Science from the University of Madras, India, in 2001. His research interests include large-scale machine learning and natural language processing.