MRPC
MRPC, the Microsoft Research Paraphrase Corpus, is a benchmark dataset for paraphrase identification in natural language processing. It consists of sentence pairs, each labeled to indicate whether the two sentences convey the same meaning. The corpus was released by researchers at Microsoft Research in 2005; the sentence pairs were drawn from online news sources and labeled by human annotators. Each data point contains two sentences and a binary label: 1 if the sentences are paraphrases (semantically equivalent) and 0 otherwise.
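The structure of a single data point can be sketched in a few lines of Python. This is only an illustration: the names `SentencePair` and `paraphrase_fraction` are hypothetical, the field names are assumptions rather than the corpus's own column names, and the two example pairs are invented, not taken from MRPC.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SentencePair:
    """One MRPC-style data point: two sentences plus a binary label."""
    sentence1: str
    sentence2: str
    label: int  # 1 = paraphrase, 0 = not a paraphrase

    def __post_init__(self) -> None:
        # Reject anything other than the two labels described above.
        if self.label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {self.label!r}")


def paraphrase_fraction(pairs: List[SentencePair]) -> float:
    """Return the share of pairs labeled 1 (paraphrase)."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    return sum(p.label for p in pairs) / len(pairs)


# Invented examples for illustration only; not taken from the corpus.
examples = [
    SentencePair(
        "The company reported higher profits on Tuesday.",
        "On Tuesday the firm said its profits had risen.",
        1,
    ),
    SentencePair(
        "The company reported higher profits on Tuesday.",
        "The company will release its earnings next month.",
        0,
    ),
]

print(paraphrase_fraction(examples))
```

A helper such as `paraphrase_fraction` is useful mainly as a sanity check on the label balance of whatever split is loaded, since a skewed balance affects how accuracy figures should be read.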
The MRPC dataset is widely used to train and evaluate models for sentence similarity and paraphrase detection,
Compared with some other paraphrase datasets, MRPC is relatively small, which has encouraged researchers to use