{
    "147391-21900960":
    {
        "datetime_modified":"20260403T135851",
        "datetime_start":"20260414T140000",
        "datetime_end":"20260414T160000",
        "has_end_time":1,
        "date_start":"2026-04-14",
        "date_end":"2026-04-14",
        "time_start":"14:00:00",
        "time_end":"16:00:00",
        "time_zone":"America\/Detroit",
        "event_title":"Transport-Based Methods for Inference and Generation with Graphical Structure",
        "occurrence_title":"",
        "combined_title":"Transport-Based Methods for Inference and Generation with Graphical Structure: Yidan Xu",
        "event_subtitle":"Yidan Xu",
        "event_type":"Lecture \/ Discussion",
        "event_type_id":"13",
        "description":"Statistical learning posits a structured relationship between observed data and unobserved quantities \u2014 latent components underlying a mixture, counterfactual outcomes unobserved under the realized treatment assignment, or low-dimensional representations encoding complex generative factors \u2014 and makes inference over the parameters that govern this relationship. In each case, a graphical model encodes the structural assumptions through conditional independence and factorization, but inference over the resulting distributional objects demands tools that are stable under the geometric irregularities \u2014 limited overlap, high dimensionality, unknown model complexity \u2014 that arise in practice. This thesis develops a distributional framework that pairs graphical model structure with optimal transport geometry to address this need. We apply the framework to causal inference under limited overlap, replacing density-ratio reweighting with geometrically stable transport maps and developing Wasserstein-based sensitivity analysis for partial identification; to structured generative modeling, introducing Structured Flow Autoencoders that combine conditional normalizing flows with latent graphical models via a novel flow matching objective; and to mixture model estimation, where Bayes fixed-point iteration and entropy-regularized semi-discrete optimal transport yield a geometry-driven approach to component recovery and model selection. Across all settings, the thesis demonstrates that replacing pointwise inference procedures with distributional, geometry-aware ones \u2014 anchored by graphical model structure \u2014 yields methods that are simultaneously more principled and more practically reliable.",
        "occurrence_notes":null,
        "guid":"147391-21900960@events.umich.edu",
        "permalink":"http:\/\/events.umich.edu\/event\/147391",
        "building_id":"1000167",
        "building_name":"West Hall",
        "campus_maps_id":"163",
        "room":"438",
        "location_name":"West Hall",
        "has_livestream":0,
        "cost":"",
        "tags":["Dissertation"],
        "website":"",
        "sponsors":[
            {
                "group_name":"Department of Statistics Dissertation Defenses",
                "group_id":"4045",
                "website":""
            }
        ],
        "image_url":"",
        "styled_images":{
            "event_thumb":"",
            "event_large":"",
            "event_large_2x":"",
            "event_large_lightbox":"",
            "group_thumb":"",
            "group_thumb_square":"",
            "group_large":"",
            "group_large_lightbox":"",
            "event_large_crop":"",
            "event_list":"",
            "event_list_2x":"",
            "event_grid":"",
            "event_grid_2x":"",
            "event_feature_large":"",
            "event_feature_thumb":""
        },
        "occurrence_count":1,
        "first_occurrence":21900960
    },
    "147382-21900950":
    {
        "datetime_modified":"20260403T112823",
        "datetime_start":"20260416T130000",
        "datetime_end":"20260416T150000",
        "has_end_time":1,
        "date_start":"2026-04-16",
        "date_end":"2026-04-16",
        "time_start":"13:00:00",
        "time_end":"15:00:00",
        "time_zone":"America\/Detroit",
        "event_title":"Modeling Structure in Unstructured Data: Statistical and Causal Perspectives",
        "occurrence_title":"",
        "combined_title":"Modeling Structure in Unstructured Data: Statistical and Causal Perspectives: Kevin Christian Wibisono",
        "event_subtitle":"Kevin Christian Wibisono",
        "event_type":"Lecture \/ Discussion",
        "event_type_id":"13",
        "description":"Modern machine learning systems are trained on massive amounts of unstructured data such as text, images, and sequences. Despite the apparent lack of explicit structure, they exhibit remarkable abilities to learn patterns, perform reasoning, and support decision-making. This apparent paradox raises a central question: what structure do these models recover from unstructured data, and how can we understand and use it?\n\nThis dissertation investigates how language models (i) represent structure through their architectures, (ii) learn structure from unstructured data, and (iii) enable us to leverage this learned structure for principled causal inference with unstructured data.\n\nThe first part develops a statistical perspective on attention mechanisms, the core building block of modern language models. We show that attention can be interpreted as adaptive mixture-of-experts models. This interpretation enables us to extend attention to model general exponential family-distributed data, making it capable of modeling complex, heterogeneous data beyond text. In turn, this perspective reframes attention as a statistical model, explaining how it captures complex dependencies and latent structure, with guarantees on identifiability and generalization.\n\nThe second part examines how such structure arises from unstructured training data. We show that many in-context learning behaviors can emerge directly from co-occurrence patterns in unstructured text, linking modern models to classical co-occurrence modeling tools like latent factor modeling. At the same time, we identify the limits of this mechanism: positional structure becomes essential for more complex reasoning tasks. We further demonstrate that training data composition plays a critical role in shaping model behavior and alignment, with example difficulty acting as a key factor.\n\nThe final part studies how learned representations in language models can be leveraged for causal inference in high-dimensional, unstructured settings. Our approach identifies causal variables directly within the representation space, enabling well-defined estimation of causal effects when treatments or outcomes are themselves unstructured. In particular, we isolate representation directions corresponding to the most causally influential treatment components and the most salient treatment-induced outcome variations.\n\nTogether, these results provide a unified perspective on how modern machine learning systems extract structure from unstructured data, and how that structure can be harnessed for rigorous statistical and causal analysis.",
        "occurrence_notes":null,
        "guid":"147382-21900950@events.umich.edu",
        "permalink":"http:\/\/events.umich.edu\/event\/147382",
        "building_id":"1000167",
        "building_name":"West Hall",
        "campus_maps_id":"163",
        "room":"438",
        "location_name":"West Hall",
        "has_livestream":0,
        "cost":"",
        "tags":["Dissertation"],
        "website":"",
        "sponsors":[
            {
                "group_name":"Department of Statistics Dissertation Defenses",
                "group_id":"4045",
                "website":""
            }
        ],
        "image_url":"",
        "styled_images":{
            "event_thumb":"",
            "event_large":"",
            "event_large_2x":"",
            "event_large_lightbox":"",
            "group_thumb":"",
            "group_thumb_square":"",
            "group_large":"",
            "group_large_lightbox":"",
            "event_large_crop":"",
            "event_list":"",
            "event_list_2x":"",
            "event_grid":"",
            "event_grid_2x":"",
            "event_feature_large":"",
            "event_feature_thumb":""
        },
        "occurrence_count":1,
        "first_occurrence":21900950
    },
    "147383-21900951":
    {
        "datetime_modified":"20260403T113109",
        "datetime_start":"20260429T090000",
        "datetime_end":"20260429T110000",
        "has_end_time":1,
        "date_start":"2026-04-29",
        "date_end":"2026-04-29",
        "time_start":"09:00:00",
        "time_end":"11:00:00",
        "time_zone":"America\/Detroit",
        "event_title":"Principled Evaluation of Large Language Models: A Statistical Perspective",
        "occurrence_title":"",
        "combined_title":"Principled Evaluation of Large Language Models: A Statistical Perspective: Felipe Maia Polo",
        "event_subtitle":"Felipe Maia Polo",
        "event_type":"Lecture \/ Discussion",
        "event_type_id":"13",
        "description":"The rapid progress of large language models has outpaced the development of principled methodologies for their evaluation. This dissertation draws on ideas from psychometrics and statistics to build rigorous, efficient, and interpretable evaluation frameworks for modern AI systems. In this talk, I focus on three contributions that address complementary challenges in LLM evaluation.\n\nFirst, I present PromptEval, a method that confronts the problem of prompt sensitivity \u2014 the phenomenon whereby minor rephrasing of benchmark questions can substantially alter measured model performance. By combining Item Response Theory with matrix completion, PromptEval efficiently approximates the full distribution of model performance across hundreds of prompt variations while requiring less than 5% of the total evaluations, replacing arbitrary single-prompt assessments with statistically robust characterizations of model behavior.\n\nSecond, I introduce skill-based scaling laws that model LLM performance through latent capabilities such as reasoning and instruction-following. Inspired by factor analysis, this approach exploits the correlation structure among benchmark tasks to produce scaling predictions that are both more accurate and more interpretable than existing laws, which typically focus on aggregate validation loss and fail to generalize across model families.\n\nThird, I present Bridge, a unified statistical framework that explicitly connects LLM-as-a-Judge evaluations to human assessments. Bridge models the systematic discrepancies between human and LLM judgments through a latent preference score and a linear transformation of divergence-capturing covariates, enabling principled recalibration of automated scores and formal statistical testing for human\u2013LLM gaps.\n\nTogether, these contributions advance a vision of AI evaluation as a scientific discipline in its own right \u2014 one that demands the same statistical care we expect from the systems being evaluated.",
        "occurrence_notes":null,
        "guid":"147383-21900951@events.umich.edu",
        "permalink":"http:\/\/events.umich.edu\/event\/147383",
        "building_id":"1000167",
        "building_name":"West Hall",
        "campus_maps_id":"163",
        "room":"470",
        "location_name":"West Hall",
        "has_livestream":0,
        "cost":"",
        "tags":["Dissertation"],
        "website":"",
        "sponsors":[
            {
                "group_name":"Department of Statistics Dissertation Defenses",
                "group_id":"4045",
                "website":""
            }
        ],
        "image_url":"",
        "styled_images":{
            "event_thumb":"",
            "event_large":"",
            "event_large_2x":"",
            "event_large_lightbox":"",
            "group_thumb":"",
            "group_thumb_square":"",
            "group_large":"",
            "group_large_lightbox":"",
            "event_large_crop":"",
            "event_list":"",
            "event_list_2x":"",
            "event_grid":"",
            "event_grid_2x":"",
            "event_feature_large":"",
            "event_feature_thumb":""
        },
        "occurrence_count":1,
        "first_occurrence":21900951
    }
}
