Third RBCDSAI LatentView Colloquium by Dr. Ramanathan Guha

Watch the stream on YouTube


DataCommons: Publicly available data from open sources are a vital resource for students and researchers in a variety of disciplines. Unfortunately, processing these datasets to make them useful (scraping, cleaning, normalizing, joining) is tedious and error-prone, and has to be repeated by every group. DataCommons attempts to alleviate some of this pain by synthesizing a single Knowledge Graph from many different data sources. It links references to the same entities (such as cities, counties and organizations) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources. Like the Web, the DataCommons graph is open: any user can contribute data or build applications powered by the graph. We are jump-starting the graph with data from publicly available sources such as the CDC, Census, BLS and FBI, and are looking to engage with the broader community to take it further.


Guha is the founder and lead for DataCommons, a platform which synthesizes a wide range of datasets into a single knowledge graph for use by students and researchers. He is the creator of widely used web standards such as RSS and RDF, and of products such as Google Custom Search. A co-founder of Alpiri, he is currently a Google Fellow and Vice President at Google. He has a Ph.D. in Computer Science from Stanford University, a Master of Science from the University of California, Berkeley, and a Bachelor of Technology in Mechanical Engineering from IIT Madras.