Topic: Scaling Graph Representation Learning Algorithms in an Implementation Agnostic Fashion
Joint work with Jionqian Liang (Google Brain), S. Gurukar (OSU) and Yuntian He (OSU)
Recently there has been a surge of interest in designing graph embedding methods. Few, if any, can scale to a large-sized graph with millions of nodes due to both computational complexity and memory requirements. In this talk, I will present an approach to redress this limitation by introducing the MultI-Level Embedding (MILE) framework – a generic methodology allowing con-temporary graph embedding methods to scale to large graphs. MILE repeatedly coarsens the graph into smaller ones using a hybrid matching technique to maintain the backbone structure of the graph. It then applies existing embedding methods on the coarsest graph and refines the embeddings to the original graph through a graph convolution neural network that it learns. I will then describe an extension to MILE in a distributed setting (DistMILE) to further improve the scalability of graph embedding. DistMILE leverages a novel shared-memory parallel algorithm for graph coarsening and a distributed training paradigm for embedding refinement. With the advantage of high-performance computing techniques, Dist-MILE can smoothly scale different base embedding methods over large networks.
The proposed MILE and Dist-MILE frameworks are agnostic to the underlying graph embedding techniques and can be applied to many existing graph embedding methods without modifying them.
Experimental results on five large-scale datasets demonstrate that MILE significantly boosts the speed (order of magnitude) of graph embedding while generating embeddings of better quality, for the task of node classification. MILE can comfortably scale to a graph with 9 million nodes and 40 million edges, on which existing methods run out of memory or take too long to compute on a modern workstation. Our experiments demonstrate that DistMILE learns representations of similar quality with respect to other baselines while reducing the time of learning embeddings even further (up to 40 x speedup over MILE).
Srinivasan Parthasarathy is a Professor in the Dept. of Computer Science and Engineering and Department of Biomedical Informatics at Ohio State University and VAJRA - Visiting Faculty Member of IIT Madras. He works in the areas of Data Mining, Database Systems and High Performance Computing. His work in these areas have received a number of best paper awards or similar honours from conferences such as VLDB, SIGKDD, ICDM(2), SIAM DM(2), and ISMB. He leads the Data Mining Research laboratory at Ohio State and co-directs (along with a colleague in Statistics) a brand new undergraduate program in Data Analytics at Ohio State (among the first of its kind).