Presented By: Department of Statistics Dissertation Defenses
Efficient Embedding and Generative Modeling of Hypergraphs
Shihao Wu
Data that represent relations and interactions are ubiquitous in science, engineering, business, and medicine. Traditional analytical methods for relational data primarily focus on pairwise relations; however, real-world interactions often involve more than two entities and are inherently multi-way. In current practice, these multi-way interactions are typically projected into pairwise relations before analysis, which causes substantial information loss. Directly studying hypergraphs, which naturally encode general multi-way interactions, allows for more effective information extraction from such relational data.
This thesis develops efficient embedding and generative modeling frameworks for hypergraph data. The first part of the thesis introduces a general latent embedding framework that overcomes key limitations of existing hypergraph modeling methods. We establish identifiability of the latent embedding space and develop a likelihood-based estimator for the latent embeddings. We further derive consistency guarantees and asymptotic distributions for the parameter estimates, enabling efficient inference from an observed hypergraph.
Building on these results, the second part of the thesis develops Denoising Diffused Embeddings (DDE), a generative architecture for hypergraphs that produces new hyperlinks not seen in the observed data. DDE connects discrete hyperlinks to a continuous latent space through a conditional hyperlink likelihood model and then reconstructs that space using a denoising diffusion process. Compared with existing generative models, DDE is computationally efficient to train and sample from, and it offers interpretability from the likelihood perspective. Our theoretical and empirical studies demonstrate its advantages as a general generative modeling framework.
The third part of the thesis extends this line of work to hypergraphs with hyperlink attributes. We propose ReLaSH, a generative framework that first learns a likelihood-based joint latent space for hyperlinks and their attributes and then reconstructs this space using a flexible distribution-free generator, enabling the generation of realistic synthetic attributed hypergraphs. We demonstrate the consistency and generalizability of ReLaSH through both theoretical analysis and simulation studies. Empirical results on real-world datasets from diverse domains further demonstrate the strong performance of ReLaSH relative to baseline methods, underscoring its utility in practical applications.
Together, these results address core challenges in modeling multi-way interactions in relational data and illustrate how rigorous statistical modeling can contribute to building more efficient and trustworthy generative AI.