Presented By: Department of Statistics Dissertation Defenses

An Exploration of the Statistical Challenges and Fairness Implications of Transfer Learning

Subha Maity

Abstract: The main goal of transfer learning is to improve the efficiency of learning on a target task by transferring knowledge from similar, yet distinct, source tasks. These strategies are particularly useful when collecting an ample volume of training data in the target domain is excessively costly or impractical: by borrowing strength from the source, they can significantly reduce the sample requirement in the target domain. As a result, transfer learning has been applied across numerous fields whenever sufficient task-specific data are difficult to obtain, and it has emerged as a popular and promising area of research over the last decade. In contrast to the extensive body of existing literature, which focuses primarily on developing and assessing algorithms, the first part of this dissertation takes a rigorous statistical approach to understanding certain challenges in transfer learning. The latter part examines distribution shift models, which lie at the heart of transfer learning, as sources of bias in machine learning algorithms and studies their potential impact on algorithmic fairness.

The dissertation begins with an introductory overview of transfer learning concepts in the first chapter. The second chapter studies the theoretical limits of the "label shift" problem in nonparametric classification, examining minimax classification performance and highlighting the inherent challenges tied to various problem-specific parameters. The third chapter introduces a simple yet flexible linear adjustment model and method for the "posterior drift" problem when a modest volume of labeled data is accessible from the target domain. The chapter provides a minimax analysis of the model and illustrates it with a real data application: predicting mortality for a minority demographic within the UK Biobank dataset. The fourth chapter proposes an exponential tilt-based statistical framework for classification in transfer learning settings where, in contrast to the previous chapter, no labeled examples are available from the target domain, a setting that is especially challenging in the presence of "posterior drift". Alongside an importance-weighting approach, the chapter demonstrates the method's effectiveness both theoretically and empirically on the Waterbirds and Breeds image datasets.
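For readers unfamiliar with the terminology, the distribution shift models above can be contrasted compactly. The notation below is ours, not the dissertation's: p_S and p_T denote the source and target distributions of features X and labels Y, and the definitions are the simplest standard forms of each model.

\[
\text{label shift:} \quad p_S(x \mid y) = p_T(x \mid y), \qquad p_S(y) \neq p_T(y),
\]
\[
\text{posterior drift:} \quad p_S(x) = p_T(x), \qquad p_S(y \mid x) \neq p_T(y \mid x),
\]

and one common form of an exponential tilt links the class-conditional densities through a feature map T(x) and class-specific tilt parameters \theta_y:

\[
p_T(x \mid y) \;\propto\; e^{\theta_y^\top T(x)}\, p_S(x \mid y).
\]

Under such models, an importance-weighting approach reweights source examples by w(x, y) = p_T(x, y) / p_S(x, y), so that training on the reweighted source data approximates risk minimization in the target domain.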

The latter half of the thesis posits that biases in machine learning algorithms arise under a "subpopulation shift" model, a standard way to characterize the underrepresentation of minority groups, and examines its impact on several issues in algorithmic fairness. For such biases, the fifth chapter asks whether applying standard "group fairness" tools during training improves the trained model's performance on a target domain, and provides a necessary and sufficient condition for when it does. The final chapter studies how underrepresentation affects minority-group performance in a class of representation learning algorithms known as "contrastive learning". It shows that the representations of minority groups tend to collapse onto those of certain majority groups, an issue referred to as "representation harm" or "stereotyping", and that this harm can have detrimental effects on downstream predictive tasks.
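As a rough sketch (again in our notation, not the dissertation's), a subpopulation shift model writes the source and target distributions as mixtures over the same K subpopulations q_1, ..., q_K with different mixture weights:

\[
p_S = \sum_{k=1}^{K} \alpha_k\, q_k, \qquad p_T = \sum_{k=1}^{K} \beta_k\, q_k, \qquad \alpha_k \neq \beta_k \ \text{for some } k,
\]

so a group that is underrepresented during training (small \alpha_k) may carry a much larger weight \beta_k at deployment, which is how underrepresentation enters the fairness analyses described above.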
