Skip to Content

Sponsors

No results

Tags

No results

Types

No results

Search Results

Events

No results
Search events using: keywords, sponsors, locations or event type
When / Where
All occurrences of this event have passed.
This listing is displayed for historical purposes.

Presented By: Department of Statistics

Dissertation Defense: Statistical network analysis: beyond block models

Yuan Zhang

Abstract:

Network data represent​ ​ connections between units of analysis and lead to many interesting research questions​ with diverse applications​. In this thesis, we focus on inferring the structure underlying an observed network, which can be thought of as a noisy random realization of the unobserved true structure.  ​Different applications focus on different types of underlying structure;  one question of broad interest is finding a community structure, with communities typically defined as groups of nodes that share similar connectivity patterns.   ​One common and widely used model for describing​ a community structure​ in a network is the stochastic block model.   This model has attracted a lot of attention because of its tractable theoretical properties, but it is also well known to oversimplify the structure observed in real world networks and often does not fit the data well.   Thus there has been a recent push to expand the stochastic block model in various ways to make it closer to what we observe in the real world, and this thesis makes several contributions to this effort.  

We first study the problem of detecting communities ​in the​ presence of additional node features.​ Many​ existing methods detect communities based only on ​the ​observed edges between nodes, but in many networks, additional information on node features is available.​  Recent methods for community detection that​ incorporate node features​ typically either depend heavily on correct model specification, which is hard to verify, and/or do not attempt to perform feature selection.  Including features related to communities can improve community detection, but including unrelated features amounts to adding noise to the data and ​​can lead to ​substantial​ reductions in accuracy​.  In this thesis,​ we propose a mod​el​​l​-free joint criterion​ for community detection with node features, with the ability to select only relevant features. ​ ​We show that the underlying new community detection criterion has appropriate theoretical performance guarantees and the method is effective on both simulated and real networks.  ​

​Another direction we explore in this thesis ​is modeling and detecting overlapping communities. While community detection is commonly formulated as a partition problem, in practice communities in networks tend to overlap. Developing a good model for overlapping communities has been a challenge, due to identifiability issues and computational costs, although a number of special cases have been addressed.  We propose a novel overlapping model that generalizes the stochastic block model and includes many of the previously studied overlapping models as special cases.  The model is flexible and general but maintains identifiability and interpretability of parameters.  We propose a fast algorithm to fit this model, establish its consistency, and demonstrate the method outperforms a large number of benchmarks on both simulated and real data examples.

​The final contribution of this thesis is a novel method​  to estimate edge probabilities​ from a single observed network, a task closely related to the so-called graphon estimation problem.  The stochastic block model is able to infer this underlying edge probability matrix from a single observation by assuming the underlying probability function (the graphon) consists of constant blocks;  we deal with the much more general case of piecewise Lipschitz continuous functions.   Our estimator leverages a core technique of classical nonparametric statistics, neighborhood averaging, solving the challenge of defining suitable neighborhoods on networks.   The method is fast and accurate, and adapts to a large range of different graphon families.   We also show that it achieves the best theoretical error rate among currently known polynomial time methods for this problem.

Explore Similar Events

  •  Loading Similar Events...

Back to Main Content