Presented By: Department of Statistics Dissertation Defenses
Modeling Structure in Unstructured Data: Statistical and Causal Perspectives
Kevin Christian Wibisono
Modern machine learning systems are trained on massive amounts of unstructured data such as text, images, and sequences. Despite the apparent lack of explicit structure, these systems exhibit remarkable abilities to learn patterns, perform reasoning, and support decision-making. This paradox raises a central question: what structure do these models recover from unstructured data, and how can we understand and use it?
This dissertation investigates how language models (i) represent structure through their architectures, (ii) learn structure from unstructured data, and (iii) enable us to leverage this learned structure for principled causal inference with unstructured data.
The first part develops a statistical perspective on attention mechanisms, the core building block of modern language models. We show that attention can be interpreted as an adaptive mixture-of-experts model. This interpretation lets us extend attention to general exponential-family data, making it capable of modeling complex, heterogeneous data beyond text. In turn, this perspective reframes attention as a statistical model, explaining how it captures complex dependencies and latent structure, with guarantees on identifiability and generalization.
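To make the mixture-of-experts reading concrete, here is a minimal numpy sketch (illustrative only, not code from the dissertation; all names are hypothetical). It writes single-head softmax attention so that each output is a convex combination of per-token value vectors, with the softmax scores acting as input-dependent gating weights over "experts":

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_as_moe(X, Wq, Wk, Wv):
    """Single-head attention, read as an adaptive mixture of experts.

    Each value vector V[j] plays the role of an "expert"; the softmax
    scores form input-dependent gating weights that mix the experts.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    gates = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # each row sums to 1
    return gates @ V  # convex combination of expert outputs per token

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = attention_as_moe(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one mixed "expert" output per token
```

Under this reading, the exponential-family extension amounts to changing the distributional model attached to the experts while keeping the same adaptive gating, which is what allows the mechanism to handle data beyond text.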
The second part examines how such structure arises from unstructured training data. We show that many in-context learning behaviors can emerge directly from co-occurrence patterns in unstructured text, linking modern models to classical co-occurrence tools such as latent factor models. At the same time, we identify the limits of this mechanism: positional structure becomes essential for more complex reasoning tasks. We further demonstrate that training data composition plays a critical role in shaping model behavior and alignment, with example difficulty acting as a key factor.
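As a toy illustration of the classical co-occurrence tools referenced above (a hypothetical corpus and variable names, not the dissertation's experiments), the sketch below builds a word-word co-occurrence matrix and factors it with a truncated SVD, a standard latent factor model. Words with similar co-occurrence profiles land near each other in the latent space, the kind of structure the abstract argues can support in-context-learning-like behavior:

```python
import numpy as np

# Hypothetical toy corpus: count co-occurrence within a +/-1 word window.
corpus = [
    "cats chase mice", "dogs chase cats",
    "mice fear cats", "dogs fear nothing",
]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}

C = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    toks = s.split()
    for i, w in enumerate(toks):
        for j in (i - 1, i + 1):
            if 0 <= j < len(toks):
                C[idx[w], idx[toks[j]]] += 1

# Classical latent factor model: truncated SVD of the co-occurrence matrix.
U, S, Vt = np.linalg.svd(C)
k = 2
emb = U[:, :k] * S[:k]  # k-dimensional word embeddings

def nearest(word):
    """Words whose co-occurrence profiles are closest in the latent space."""
    v = emb[idx[word]]
    sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[:3]]

print(nearest("cats"))
```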
The final part studies how learned representations in language models can be leveraged for causal inference in high-dimensional, unstructured settings. Our approach identifies causal variables directly within the representation space, enabling well-defined estimation of causal effects when treatments or outcomes are themselves unstructured. In particular, we isolate representation directions corresponding to the most causally influential treatment components and the most salient treatment-induced outcome variations.
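One simple way such directions could be isolated, sketched below under strong simplifying assumptions (synthetic linear data; an illustrative partial-least-squares-style computation, not the dissertation's estimator), is to take the leading singular vectors of the cross-covariance between treatment and outcome representations:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_t, d_y = 500, 16, 12

# Hypothetical setup: treatment embeddings T; a single latent direction
# of T drives the outcome embeddings Y, plus noise.
T = rng.normal(size=(n, d_t))
beta = rng.normal(size=d_t)
beta /= np.linalg.norm(beta)          # causal treatment direction
load = rng.normal(size=d_y)
load /= np.linalg.norm(load)          # induced outcome direction
Y = np.outer(T @ beta, load) + 0.5 * rng.normal(size=(n, d_y))

# PLS-style first component: leading singular vectors of the
# cross-covariance between centered treatment and outcome embeddings.
Tc, Yc = T - T.mean(0), Y - Y.mean(0)
U, S, Vt = np.linalg.svd(Tc.T @ Yc / n)
t_dir, y_dir = U[:, 0], Vt[0]

print(abs(t_dir @ beta))  # near 1: recovers the influential treatment direction
print(abs(y_dir @ load))  # near 1: recovers the induced outcome variation
```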
Together, these results provide a unified perspective on how modern machine learning systems extract structure from unstructured data, and how that structure can be harnessed for rigorous statistical and causal analysis.