Presented By: University Career Center
Salesforce Research Speaker Series: Robustness Gym: Evaluating NLP Models Using Data Slices
Join Nazneen Rajani in this week's speaker series focused on Evaluating NLP Models Using Data Slices.
Abstract:
Deep neural networks have proven powerful for various NLP tasks on (large) datasets, but are often not robust to (adversarial) data corruptions, distribution shifts, and other harmful data manipulations. In practice, this can lead to severe vulnerabilities and limited generalization to unseen data, and can hinder safe deployment. While practitioners are increasingly aware of these problems, the common paradigm of reporting model performance continues to rely on train-val-test i.i.d. splits.
In this talk, I will introduce the Robustness Gym: an easy-to-use and extendable framework to concisely evaluate robustness and diagnose model vulnerabilities using data slices, i.e., user-defined splits or transformations of evaluation data, on a wide range of NLP tasks in both classification and generation. Data slices are generic and can represent, for example, semantic sub-populations (gender, age splits), augmentations (sub-word substitutions), adversarial attacks, or statistical splits (e.g., by word frequency).
Our framework is flexible and enables users to programmatically define and evaluate on data slices.
The Robustness Gym provides an overview of model performance on data slices and supports a wide variety of evaluation metrics, including accuracy, precision-recall, and fairness metrics. Our framework can also evaluate model robustness under dataset manipulations, including a range of adversarial attacks, data augmentations, and different evaluation sets such as stress tests and contrast sets.
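To make the idea of a data slice concrete, here is a minimal sketch in plain Python. It does not use the Robustness Gym API; the names `toy_model`, `evaluate_slices`, `EXAMPLES`, and `SLICES`, along with the toy data, are assumptions invented for illustration. It covers only user-defined splits (not transformations) and reports accuracy per slice.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data: (text, gold_label) pairs, 1 = positive, 0 = negative.
Example = Tuple[str, int]

EXAMPLES: List[Example] = [
    ("good movie", 1),
    ("bad movie", 0),
    ("not good at all", 0),
    ("a surprisingly good and moving film overall", 1),
    ("great", 1),
]


def toy_model(text: str) -> int:
    """Stand-in classifier (an assumption): predicts positive if 'good' appears."""
    return 1 if "good" in text.lower() else 0


def accuracy(model: Callable[[str], int], examples: List[Example]) -> float:
    """Fraction of examples the model labels correctly (NaN for an empty slice)."""
    if not examples:
        return float("nan")
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)


def evaluate_slices(
    model: Callable[[str], int],
    examples: List[Example],
    slices: Dict[str, Callable[[Example], bool]],
) -> Dict[str, Tuple[int, float]]:
    """Return {slice_name: (slice_size, accuracy)} for each user-defined slice."""
    report = {}
    for name, predicate in slices.items():
        subset = [ex for ex in examples if predicate(ex)]
        report[name] = (len(subset), accuracy(model, subset))
    return report


# Each slice is a predicate over examples, i.e. a user-defined split.
SLICES: Dict[str, Callable[[Example], bool]] = {
    "all": lambda ex: True,
    "short": lambda ex: len(ex[0].split()) <= 2,
    "negation": lambda ex: "not" in ex[0].lower().split(),
}

REPORT = evaluate_slices(toy_model, EXAMPLES, SLICES)

if __name__ == "__main__":
    for name, (size, acc) in REPORT.items():
        print(f"{name}: n={size}, accuracy={acc:.2f}")
```

In this toy setup the aggregate accuracy hides the fact that the stand-in model fails on the negation slice, which is the kind of vulnerability that per-slice reporting is meant to surface.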
RSVP here: https://salesforce.recsolu.com/external/events/AhRCRs2BDJ8W7gVJLaUtdw