Scaling graph representation learning algorithms in an implementation agnostic fashion | Robert Bosch Centre for Data Science and Artificial Intelligence

--Prof. Srinivasan Parthasarathy--

A graph, which consists of nodes and edges, is a pictorial representation of the relationship between various objects. Analysis of graphs can help in inferring social, biological, road, finance and other networks as well. Prof. Srinivasan Parthasarathy who is Professor at Dept. of Computer Science and Engineering and Department of Biomedical Informatics at the Ohio State University and RBCDSAI Distinguished Fellow gave a talk on his research in the area of graph analysis through a talk titled “Scaling graph representation learning algorithms in an implementation agnostic fashion” which was organized as a part of Sixth RBCDSAI Latent View Colloquium on 23rd March 2022. Prof. Parthasarathy began his talk by mentioning various methods of graph analysis such as frequent graph mining molecular graphs, motif retrieval for drug discovery, representation learning, link prediction and node classification and pointed out that most of these methods suffer in terms of productivity considerations and performance considerations. He then elaborated that machine learning tasks on graphs can be node classification such as classifying protein function or link prediction such as recommendation system. He further said that graph representation learning (GRL) methods result in state-of-art performance on such machine learning tasks. Elaborating on the graph representation learning method, he said that this method takes an input graph and produces a vector representation of the nodes and these node representation acts as sort of features in downstream machine learning tasks. Next, he mentioned about two constraints of graph representation learning: similarity between nodes in graph space approximations and closeness between nodes in embedding space and said that there are a plethora of techniques available to solve the latter. He then said that there are many techniques in unsupervised embedding landscape, however, they are either not scalable or take a significant amount of time to learn embeddings.

Describing his research in this area he said that his research team asked two questions- Can we scale up the existing embedding techniques in an agnostic manner and can the quality of such embedding methods be improved in the process? They went on to develop MILE and DistMILE as an answer to the above question. Describing MILE- Multilevel embedding-, he said that it is a framework that takes existing algorithms that look at unsupervised graph representation learning techniques, scales them in an agnostic manner and also often improves the quality of embedding in the process. Going further into detail, he told about three phases of MILE- Graph coarsening, base embedding and embedding refinement in detail. He then discussed the experiments conducted by his group to evaluate MILE in terms of scalability and quality of generated embeddings where various embedding techniques like Deepwalk, Node2vec, GraREp, NETMF, Line etc. were used as base embedding for the study on tasks node classification and link prediction on YouTube and Yelp dataset. Stating the results of the study, he said that experiments showed that MILE does not result in much loss in accuracy but there is a tremendous improvement in terms of speed and also a substantive reduction in memory consumption.

Next, Prof. Parthasarathy talked about DistMILE which is a major modification of MILE. In DistMILE, he said, coarsening and refinement phases are parallelized to further increase the speed. Talking about the differences between the two frameworks, he said that during coarsening step MILE matches nodes sequentially whereas DistMILE leverages unprotected matching and corrects matching conflicts via post-processing. He also informed that in the case of DistMILE, the shared-memory parallelism for coarsening is approximately seven times faster than distributed-memory parallelism. Prof. Parthasarathy ended the talk by showing the results stating that MILE and DistMILE have comparable quality but DistMILE is significantly faster and has better scalability and specifically, the DistMILE framework showed 10X improvement in speed coarsening step and up to 58X improvement in speed in refinement step as compared to MILE The talk was well received by the audience and the session ended with a round of question and answers.

The video is available on our YouTube channel: Link.

Keywords

Graph Analysis, Embedding Techniques