Wolfgang Gatterbauer's Talk on "Algebraic Amplification for Node Classification in Sparsely Labeled Networks" at 2 PM Eastern on 10/29/2021
Abstract: Node classification is an important problem in graph data. It is commonly solved by various label propagation methods (such as belief propagation or linearized belief propagation) that work iteratively starting from a few labeled seed nodes. For graphs with arbitrary potentials (also called compatibilities or attractions) between classes (i.e. going beyond the assortative mixing assumption and explicitly allowing heterophily like in general Markov random fields), these methods crucially depend on knowing the compatibilities between neighboring labels. This information must be provided by either domain experts or heuristics before any label algorithm to work. Can we instead directly estimate the compatibilities from extremely sparsely labeled graphs in a way that scales to networks with millions of edges? We answer this question affirmatively and suggest a method called distant compatibility estimation that works even on extremely sparsely labeled graphs (e.g., 1 in 10,000 nodes is labeled) in a fraction of the time it later takes to label the remaining nodes. Our approach first creates multiple factorized graph representations (with size independent of the graph) and then performs estimation on these smaller graph summaries. We refer to algebraic amplification as the more general idea of leveraging algebraic properties of an algorithm’s update equations to amplify sparse signals. Our experiments show that our estimator is by orders of magnitude faster than alternative textbook approaches and that the end-to-end classification accuracy is comparable to using gold standard compatibilities. This makes it a cheap preprocessing step for any existing label propagation method and removes the current dependence on heuristics. It remains to be seen whether any of the more recent GNN approaches can be extended to work in a similar sparse regime.

Bio: Wolfgang Gatterbauer is an Associate Professor in the Khoury College of Computer Sciences at Northeastern University. A major focus of his research is to extend the capabilities of modern data management systems in generic ways and to allow them to support novel functionalities that seem hard at first. He received an NSF Career award and, with his students and collaborators, a best paper award at EDBT 2021, best-of-conference selections for PODS 2021, SIGMOD 2017, WALCOM 2017, and VLDB 2015, and two SIGMOD 2020 reproducibility awards. For details, visit https://gatterbauer.name.
Sign in to Google to save your progress. Learn more
Name *
Email *
Institution *
Submit
Clear form
Never submit passwords through Google Forms.
This content is neither created nor endorsed by Google. Report Abuse - Terms of Service - Privacy Policy