Data Source

Facebook Data for Good

Facebook’s mobility data represents people who use Facebook in this region and have location services enabled. Data is aggregated at varying levels of spatial resolution and vectors (lines) are drawn connecting all areas that share high levels of mobility.

Note: This data can be skewed and may not be representative of the whole population, since it covers only Facebook users who have location services enabled. However, insights extracted from the mobility of this subset of the population can still help in making and assessing the policy measures planned by the government.

covid19india

covid19india is a volunteer-driven database for COVID-19 statistics and patient tracing in India. The volunteers use state bulletins and official handles to update the daily COVID-19 data, curating and verifying information from several sources and updating the database several times per day. The validated data is published to a Google Sheet and through an API, which is publicly available at api.covid19india.org. Apart from collecting information about new cases, the volunteers also keep track of past cases and update their status once it has been officially confirmed. The dataset contains key features such as the age bracket and nationality of the infected person, the district in which the person is infected, and at least two official sources confirming the person's status.
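As a rough illustration of the record structure described above, the following Python sketch models a single case with the listed features and checks the "at least two official sources" rule. The class name, field names, and example values are hypothetical and are not taken from the covid19india schema or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatientRecord:
    """Hypothetical container for one case; field names are illustrative only."""
    age_bracket: Optional[str]
    nationality: Optional[str]
    district: Optional[str]
    status: str
    sources: List[str] = field(default_factory=list)

    def is_verified(self) -> bool:
        # Count distinct, non-empty sources; the text asks for at least two.
        distinct = {s.strip() for s in self.sources if s.strip()}
        return len(distinct) >= 2


# Made-up example values, used only to exercise the check.
example = PatientRecord(
    age_bracket="20-29",
    nationality=None,
    district=None,
    status="Hospitalized",
    sources=["state-bulletin-placeholder", "official-handle-placeholder"],
)
print(example.is_verified())
```

Counting distinct sources rather than raw entries is a design assumption here, so that the same bulletin listed twice does not count as two confirmations.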

Property prediction task

For all the datasets, a training pair is represented by a molecular structure (SMILES string) and an activity measurement.
• E. coli: This dataset consists of 2,335 pairs, with a binary activity measurement indicating E. coli inhibition. Of these, 120 molecules inhibit E. coli growth. The size, quality, and distributional properties of this set are a good proxy for the SARS-CoV-2 screening data that will eventually be available.
• SARS-CoV 3CLpro: This dataset consists of 290,726 pairs obtained via an assay that measures activity against the SARS-CoV 3CLpro target, which is highly homologous to the corresponding protease in SARS-CoV-2. There are 405 molecules in this dataset which are active against the 3CLpro target.
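
Both datasets are heavily imbalanced toward inactive molecules, which matters when choosing a loss weighting or an evaluation metric. The Python sketch below works this out from the counts quoted above; the class and attribute names are illustrative assumptions, not part of any released code for these datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSummary:
    """Counts quoted in the text; names here are illustrative."""
    name: str
    n_pairs: int    # total (SMILES, activity) pairs
    n_active: int   # pairs labeled active

    @property
    def active_fraction(self) -> float:
        # Share of molecules labeled active.
        return self.n_active / self.n_pairs

    @property
    def inactive_to_active_ratio(self) -> float:
        # Number of inactive molecules per active molecule.
        return (self.n_pairs - self.n_active) / self.n_active


ECOLI = DatasetSummary("E. coli", 2335, 120)
SARS_3CLPRO = DatasetSummary("SARS-CoV 3CLpro", 290_726, 405)

for ds in (ECOLI, SARS_3CLPRO):
    print(
        f"{ds.name}: {ds.active_fraction:.2%} active, "
        f"about {ds.inactive_to_active_ratio:.0f} inactive per active"
    )
```

Roughly 5% of the E. coli molecules and about 0.14% of the SARS-CoV 3CLpro molecules are active, so plain accuracy would be a poor metric for either task.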

Specific training, validation, and test splits for the above datasets are available here.