Computational Social Sciences
23 Aug 2022
-- Prof. Ponnurangam Kumaraguru --

The sciences and the social sciences are often considered separate, but Prof. Ponnurangam Kumaraguru, Professor of Computer Science at IIIT-Hyderabad and Associate Researcher at the Robert Bosch Centre for Data Science and AI (RBCDSAI), believes that technology has much to offer the social sciences. He is therefore carrying out research in an emerging area called Computational Social Science. To learn more about his research in this field, RBCDSAI hosted Prof. Kumaraguru for a talk on Computational Social Science on 1st August 2022.

Prof. Kumaraguru began his talk with some background on his education and research interests. He said that his interest in computational social science started nearly two years ago, and that since then he has been working in areas where the computational sciences and the social sciences intersect, such as social network analysis, cybersecurity, privacy, legal AI, fake news, hate speech, code-mixed analysis, knowledge graphs, natural language processing and mental health. He added that the huge amount of data now available from the internet makes it possible to study problems at the population level, which was not possible earlier. He then discussed two projects in detail and gave an overview of several other computational social science projects under way in his lab.

Talking about how AI can be leveraged for law, he said that there is a hefty backlog of pending cases in the lower courts (around 40 million as of 2021), where local languages are used for document filing. His team looked specifically at data from Uttar Pradesh and was able to collect around 900 thousand case documents written in Hindi. To analyse this corpus, they extracted the raw documents, applied OCR to obtain raw text, segmented each document into header and body, cleaned and collated the data into a Hindi legal documents corpus, and finally applied natural language processing techniques. From this analysis, they found that there are 300 different types of court cases and that bail cases are the most numerous among them.

The team therefore decided to build an AI-based tool that can help predict bail decisions. To develop the bail prediction model, they used bail documents containing the facts, the arguments, the judge's summary and the final decision. They formulated bail prediction as a binary classification problem (whether bail will be granted or not). Given the ethical angle, the model is intended to help judges speed up the process, not to make final decisions. They found that the tool performs worse in district-wise settings, possibly because of large variation across districts. Overall, however, the summarization-based models performed better than existing models such as Doc2Vec and the simpler transformer-based models.

He said that the major learning from the research was that Indian legal documents are a rich source of domain-specific Indic-language corpora that are readily available online. However, multiple tasks still need attention, especially for Indian settings, such as citation prediction and citation networks, realism vs positivism, a dashboard for analytics, legal summarization and case recommendation.
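To make the binary formulation of bail prediction concrete, here is a minimal sketch of a text classifier that maps a document to "granted" (1) or "denied" (0). It is only an illustration under stated assumptions: it uses a simple Naive Bayes model rather than the summarization-based or transformer-based models mentioned in the talk, and the training sentences, labels and function names are invented English stand-ins, not material from the Hindi corpus.

```python
import math
from collections import Counter

# Hypothetical toy examples: invented English stand-ins, not real case text.
# Label 1 = bail granted, label 0 = bail denied.
TRAIN = [
    ("applicant has no prior record and cooperated with investigation", 1),
    ("applicant is first time offender with fixed residence", 1),
    ("applicant has prior record and threatened the witness", 0),
    ("applicant absconded and tampered with evidence", 0),
]


def train_naive_bayes(examples):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return 1 (bail granted) or 0 (bail denied) for a document."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        # Log prior plus Laplace-smoothed log likelihood of each known word.
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


model = train_naive_bayes(TRAIN)
prediction = predict(model, "applicant has fixed residence and cooperated")
print("granted" if prediction == 1 else "denied")
```

In keeping with the ethical point made in the talk, an output like this would serve only as an aid for the judge, not as a decision.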

Next, he talked about his research in the area of agricultural extension with an NGO called Digital Green. Digital Green has been showing farmers videos related to agricultural extension and gathering data on the adoption rate of the practices shown in these videos. Prof. Kumaraguru's team used this data to identify the factors that lead to a high adoption rate. Their analysis showed that language is one of the major factors: if the video is in the native language of the group of farmers viewing it, the adoption rate is higher. They also found that farmers are more likely to adopt an agricultural technique if the video features people from their own village. Other factors that affected the adoption rate included the number of videos and the adoption rate of the video.

Prof. Kumaraguru concluded the talk by briefly discussing some other research projects that have been carried out or are ongoing in his laboratory. These included the development of tools to detect fake tweets, such as WhatsFarzi, SpotFake, SpotFake+ and FactDrill. He also talked about a project to develop a model that can detect whether a person is taking a selfie at a dangerous location where they could come to harm. He further said that his team routinely predicts election results from social media posts and is also interested in studying the effect of popularity shocks on user behaviour. The talk was well received by the audience, who asked various questions about this emerging area and about Prof. Kumaraguru's research.

The video is available on our YouTube channel: Link.


Natural language processing, social network analysis, cybersecurity, privacy, legal AI