stereosets
Stereosets are a type of dataset designed to evaluate how well language models handle stereotyping. They consist of sentences that are intentionally ambiguous or contain subtle cues that could lead a model to make stereotypical assumptions about gender, race, religion, or other social groups. The goal of stereosets is to uncover and measure biases present in large language models.
The dataset is structured to present pairs of sentences. One sentence in the pair is considered stereotypical,
By analyzing the model's choices, researchers can quantify the extent to which it exhibits stereotypical biases.