Learning With Limited, Partial, and Noisy Data

An area of significant interest in data science is its application to the very large data sets being generated by advances in computing and the Internet of Things (IoT). While this opens a new and exciting arena of possibilities, there is a wide range of real-world contexts, particularly in third-world environments, where such an abundance of useful data is not the norm. This research studies the use of machine learning in two such contexts: one where the data are significantly inadequate in quantity, and the other where they are inadequate in quality. On the quantity front, we look at experimentation and information sharing as means of generating data when none is available. To this end, we intend to study offline experimentation (inspired by traditional statistical designed experiments) and online experimentation (inspired by the bandit framework from reinforcement learning). Specifically, we seek to understand real-world idiosyncrasies that may entail the use of principles from both fields. We also seek to study the effect of information sharing and how it can help agents acquire data more efficiently. On the quality front, we look at a frequently recurring problem of limited and asymmetric noise. A few of our previous studies have indicated that in many sources of data that carry implications for incentives or assessment, human intervention or behavior tends to introduce a bias, so that the data do not truly reflect the underlying phenomena. Broadly, we seek to study machine learning algorithms that account for this bias, both through the modeling of latent variables and through other approaches that explicitly model the asymmetric noise in the variable of interest.