Presented By: Department of Statistics
Statistics Department Graduating Student Speaker: Shihao Wu, PhD Candidate, Department of Statistics, University of Michigan
"Efficient Embedding and Generative Modeling for Hypergraphs"
Abstract: Data that represent relations and interactions are ubiquitous in science, engineering, business, and medicine. Traditional analytical methods for such data primarily focus on pairwise relations; however, real-world interactions often involve more than two entities and are inherently multi-way. In current practice, these multi-way interactions are typically projected into pairwise relations before analysis, which causes substantial information loss. Directly studying hypergraphs, which naturally encode general multi-way interactions, allows for more effective extraction of information from such relational data. In this talk, I will discuss our development of generative models for hypergraphs. The first part introduces a general latent embedding framework that overcomes key limitations of existing hypergraph modeling methods. We establish identifiability of the latent embedding space and develop a likelihood-based estimator for the latent embeddings. We further derive consistency guarantees and asymptotic distributions for the parameter estimates, enabling efficient inference from an observed hypergraph. Building on these results, the second part of the talk introduces Denoising Diffused Embeddings (DDE), a generative architecture for hypergraphs that produces new hyperlinks not seen in the observed data. DDE connects discrete hyperlinks to a continuous latent space through a conditional hyperlink likelihood model, and then reconstructs that space using a denoising diffusion process. Compared with existing generative models, DDE is computationally efficient to train and sample from, and it offers interpretability from the likelihood perspective. Our theoretical and empirical studies demonstrate its advantages as a general generative modeling framework. 
Together, these results address core challenges in modeling multi-way interactions in relational data and illustrate how rigorous statistical modeling can contribute to building more efficient and trustworthy generative AI.
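The information loss from pairwise projection mentioned in the abstract can be illustrated with a minimal sketch. This is not the speaker's method or code. The function name `clique_projection` and the toy node labels are assumptions made purely for illustration. It shows that a single three-way interaction and three separate pairwise interactions collapse to the same pairwise graph, so the projection cannot distinguish them.

```python
from itertools import combinations


def clique_projection(hyperedges):
    """Return the set of unordered node pairs that co-occur in at least one hyperedge.

    Hypothetical helper for illustration only; not taken from the talk.
    """
    pairs = set()
    for edge in hyperedges:
        # Every pair of nodes inside a hyperedge becomes an ordinary edge.
        pairs.update(frozenset(p) for p in combinations(sorted(edge), 2))
    return pairs


# One genuine three-way interaction among nodes a, b, c.
three_way = [{"a", "b", "c"}]
# Three separate pairwise interactions among the same nodes.
pairwise = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

# Both hypergraphs project to the same triangle, so the distinction is lost.
same_projection = clique_projection(three_way) == clique_projection(pairwise)
print(same_projection)
```

Working with the hyperedges directly keeps `three_way` and `pairwise` distinct, which is the motivation the abstract gives for modeling hypergraphs rather than their projections.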