Presented By: Department of Statistics Dissertation Defenses
Statistical Learning for Recurrent Event and Complex Network Data
Bo Meng
The development of modern technology has generated large-scale, complex datasets that pose significant challenges to traditional statistical methods. This dissertation presents novel statistical modeling frameworks and efficient computational methodologies tailored for large-scale recurrent event data and complex network data.
The first chapter focuses on recurrent event data and introduces a general framework that models the conditional mean function of the recurrent event process as the solution to an ordinary differential equation (ODE). This flexible approach covers a wide range of semi-parametric models, including both non-homogeneous Poisson processes (NHPPs) and non-Poisson processes. We develop a Sieve Maximum Pseudo-Likelihood Estimation (SMPLE) method and establish the consistency, asymptotic normality, and semi-parametric efficiency of the resulting estimator.
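One way to see how an ODE can define a conditional mean function: if μ(t | Z) solves dμ/dt = λ₀(t) exp(βZ) with μ(0) = 0, the solution is the familiar proportional means model μ(t | Z) = Λ₀(t) exp(βZ). The sketch below is a hypothetical illustration, not the dissertation's specification. It adds an assumed feedback factor (1 + γμ) to the right-hand side, solves the ODE numerically with classical fourth-order Runge-Kutta, and compares the result with the closed form (exp(γ exp(βZ) Λ₀(t)) - 1)/γ. The baseline λ₀(t) = 2t, the function names, and all parameter values are assumptions chosen for illustration.

```python
import math

def rk4_solve(f, mu0, t_end, n_steps=1000):
    """Integrate d(mu)/dt = f(t, mu) from t = 0 to t_end with classical fourth-order Runge-Kutta."""
    h = t_end / n_steps
    t, mu = 0.0, mu0
    for _ in range(n_steps):
        k1 = f(t, mu)
        k2 = f(t + h / 2, mu + h * k1 / 2)
        k3 = f(t + h / 2, mu + h * k2 / 2)
        k4 = f(t + h, mu + h * k3)
        mu += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return mu

def baseline_rate(t):
    """Hypothetical baseline rate lambda0(t) = 2t, so Lambda0(t) = t^2."""
    return 2.0 * t

def make_rhs(beta, z, gamma):
    """Hypothetical ODE right-hand side: lambda0(t) * exp(beta * z) * (1 + gamma * mu)."""
    c = math.exp(beta * z)
    return lambda t, mu: baseline_rate(t) * c * (1.0 + gamma * mu)

def closed_form(t, beta, z, gamma):
    """Exact solution with mu(0) = 0; gamma = 0 gives the proportional means model."""
    cum = math.exp(beta * z) * t ** 2  # exp(beta * z) * Lambda0(t)
    return cum if gamma == 0.0 else (math.exp(gamma * cum) - 1.0) / gamma

for gamma in (0.0, 0.3):
    numeric = rk4_solve(make_rhs(0.5, 1.0, gamma), 0.0, t_end=1.5)
    print(f"gamma={gamma}: RK4 {numeric:.6f}, exact {closed_form(1.5, 0.5, 1.0, gamma):.6f}")
```

With γ = 0 the numerical and exact values should agree up to rounding error. With γ = 0.3 the mean grows faster than the proportional means curve because the rate feeds back through μ; that feedback term is only an illustrative assumption.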
The second chapter addresses signed network data, in which relationships exhibit a complex interplay of positive (e.g., liking, alliances) and negative (e.g., disliking, conflicts) interactions. This chapter presents a novel latent space model that uses non-linear kernel functions to capture the sign-generating pattern of signed networks. Within this framework, we identify a new sufficient condition for population-level balance. We develop efficient projected gradient descent (PGD) algorithms to estimate the latent variables and establish non-asymptotic error rates for parameter estimation under both correctly specified and misspecified models.
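To make the PGD idea concrete, here is a minimal sketch under hypothetical choices that are not taken from the dissertation. Each node i has a latent vector z_i. The sign of an observed edge satisfies P(s_ij = +1) = σ(α + β κ(z_i, z_j)), using a Gaussian kernel κ(z_i, z_j) = exp(-‖z_i - z_j‖²). The parameters α and β are held fixed, and each z_i is constrained to a Euclidean ball of radius R. Every iteration takes a gradient step on the negative log-likelihood and then projects each latent vector back onto its ball. All function names and settings are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rbf(zi, zj):
    """Gaussian kernel exp(-||zi - zj||^2): a hypothetical non-linear kernel choice."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)))

def neg_log_lik(Z, edges, alpha, beta):
    """Negative log-likelihood of signs s in {+1, -1}, P(s = +1) = sigmoid(alpha + beta * k(zi, zj))."""
    total = 0.0
    for i, j, s in edges:
        p = sigmoid(alpha + beta * rbf(Z[i], Z[j]))
        total -= math.log(p if s == 1 else 1.0 - p)
    return total

def grad_neg_log_lik(Z, edges, alpha, beta):
    """Gradient of neg_log_lik with respect to every latent vector."""
    d = len(Z[0])
    grad = [[0.0] * d for _ in Z]
    for i, j, s in edges:
        k = rbf(Z[i], Z[j])
        # d(loss)/d(theta) = sigmoid(theta) - y, with theta = alpha + beta * k and y = 1{s = +1}
        r = sigmoid(alpha + beta * k) - (1.0 if s == 1 else 0.0)
        for t in range(d):
            dk_dzi = -2.0 * (Z[i][t] - Z[j][t]) * k  # dk/dzj = -dk/dzi
            grad[i][t] += r * beta * dk_dzi
            grad[j][t] -= r * beta * dk_dzi
    return grad

def project(z, radius):
    """Euclidean projection of one latent vector onto the ball {||z|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in z))
    return list(z) if norm <= radius else [v * radius / norm for v in z]

def pgd(Z, edges, alpha, beta, step=0.005, radius=3.0, n_iter=400):
    """Projected gradient descent: gradient step on all latent vectors, then project each one."""
    Z = [list(z) for z in Z]
    for _ in range(n_iter):
        grad = grad_neg_log_lik(Z, edges, alpha, beta)
        Z = [project([v - step * g for v, g in zip(z, gz)], radius) for z, gz in zip(Z, grad)]
    return Z

# Toy balanced network: two groups, positive signs within groups, negative across.
random.seed(0)
n, d, alpha, beta = 20, 2, -2.0, 4.0
group = [0 if i < n // 2 else 1 for i in range(n)]
edges = [(i, j, 1 if group[i] == group[j] else -1)
         for i in range(n) for j in range(i + 1, n) if random.random() < 0.5]
Z0 = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(n)]
Z_hat = pgd(Z0, edges, alpha, beta)
print(neg_log_lik(Z0, edges, alpha, beta), neg_log_lik(Z_hat, edges, alpha, beta))
```

On this toy network, the projected iterates should lower the negative log-likelihood relative to the starting point, pulling friends together and pushing the two hostile groups apart. The kernel, link, step size, and radius are all illustrative choices.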
The final chapter introduces a latent space model for analyzing longitudinal network data. This approach employs multivariate counting processes to model interaction sequences between node pairs, with intensity functions depending on static latent variables, time-varying baseline intensities, and time-varying edge covariates. We develop an efficient spline-based sieve estimation method and establish the non-asymptotic error rate of the corresponding PGD estimator under both parametric and nonparametric settings.
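The following is a minimal sketch of the kind of likelihood such a model leads to. It rests on hypothetical choices that are not taken from the dissertation. The intensity for pair (i, j) is λ_ij(t) = exp(α(t) + z_iᵀz_j + γ x_ij(t)), where the log-baseline α(t) is expanded in a piecewise-linear (degree-1) B-spline basis and γ is a constant coefficient. For one node pair observed on [0, T] with event times t_1, ..., t_m, the counting-process log-likelihood is Σ_k log λ_ij(t_k) - ∫_0^T λ_ij(t) dt, and the sketch approximates the integral with the trapezoidal rule. In a sieve scheme the number of knots would grow with the sample size, and a full estimator would sum this quantity over pairs and optimize it over the spline coefficients, latent vectors, and γ. The function names are illustrative.

```python
import math

def hat_basis(t, knots):
    """Degree-1 B-spline (hat) basis at t; one basis function per knot, knots increasing."""
    vals = []
    for k, c in enumerate(knots):
        left = knots[k - 1] if k > 0 else None
        right = knots[k + 1] if k + 1 < len(knots) else None
        if t == c:
            vals.append(1.0)
        elif left is not None and left < t < c:
            vals.append((t - left) / (c - left))
        elif right is not None and c < t < right:
            vals.append((right - t) / (right - c))
        else:
            vals.append(0.0)
    return vals

def baseline(t, coefs, knots):
    """Spline approximation alpha(t) = sum_k coefs[k] * B_k(t) of the log-baseline."""
    return sum(c * b for c, b in zip(coefs, hat_basis(t, knots)))

def intensity(t, zi, zj, coefs, knots, gamma, covariate):
    """Hypothetical pair intensity exp(alpha(t) + zi . zj + gamma * x_ij(t))."""
    latent = sum(a * b for a, b in zip(zi, zj))
    return math.exp(baseline(t, coefs, knots) + latent + gamma * covariate(t))

def pair_log_lik(event_times, horizon, zi, zj, coefs, knots, gamma, covariate, n_grid=2000):
    """Counting-process log-likelihood: sum_k log lambda(t_k) - integral_0^T lambda(t) dt."""
    def lam(t):
        return intensity(t, zi, zj, coefs, knots, gamma, covariate)
    log_term = sum(math.log(lam(t)) for t in event_times)
    h = horizon / n_grid
    # Composite trapezoidal rule for the compensator integral over [0, horizon].
    inner = sum(lam(k * h) for k in range(1, n_grid))
    integral = h * (0.5 * lam(0.0) + inner + 0.5 * lam(horizon))
    return log_term - integral

knots = [0.0, 2.5, 5.0, 7.5, 10.0]     # hypothetical uniform knots on [0, T]
coefs = [-1.0, -0.5, 0.0, -0.5, -1.0]  # hypothetical spline coefficients
events = [1.2, 3.4, 4.1, 6.8, 9.0]     # hypothetical event times for one pair
print(pair_log_lik(events, 10.0, [0.3, 0.1], [0.2, -0.4], coefs, knots,
                   gamma=0.5, covariate=math.sin))
```

The hat basis is used only because it keeps the sketch dependency-free. Cubic B-splines are a more common choice when a smooth baseline is wanted.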