overgenerated
Overgenerated is a term used in linguistics and natural language processing (NLP) to describe outputs that exceed the bounds of acceptable content for a given task. In NLP, overgeneration refers to generated text that is irrelevant, self-contradictory, ungrammatical, or factually incorrect, including hallucinated facts and fabricated entities. It can arise from pairing powerful language models with broad prompts, from decoding methods that favor fluency over accuracy, or from training data that contains inconsistent information.
In computational linguistics, overgeneration describes a grammar, parser, or rule system that accepts strings not permitted in the language it is meant to model.
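To illustrate a rule system that accepts strings it should not, here is a minimal Python sketch. The toy lexicon, the single rule S -> Det N V, and all function names are assumptions made for this example, not taken from any particular grammar formalism or library. The loose recognizer ignores subject-verb number agreement, so it overgenerates relative to the constrained one.

```python
from itertools import product

# Toy lexicon (illustrative assumption): each word maps to its grammatical number.
DETERMINERS = ["the"]
NOUNS = {"dog": "sg", "dogs": "pl"}
VERBS = {"barks": "sg", "bark": "pl"}


def overgenerating_accepts(sentence):
    """Recognize S -> Det N V with no agreement check (too permissive)."""
    words = sentence.split()
    if len(words) != 3:
        return False
    det, noun, verb = words
    return det in DETERMINERS and noun in NOUNS and verb in VERBS


def constrained_accepts(sentence):
    """Recognize the same rule, but require noun and verb to agree in number."""
    if not overgenerating_accepts(sentence):
        return False
    _, noun, verb = sentence.split()
    return NOUNS[noun] == VERBS[verb]


def overgenerated_strings():
    """List strings the loose grammar accepts but the constrained one rejects."""
    candidates = (" ".join(p) for p in product(DETERMINERS, NOUNS, VERBS))
    return [s for s in candidates
            if overgenerating_accepts(s) and not constrained_accepts(s)]


if __name__ == "__main__":
    for s in overgenerated_strings():
        print("overgenerated:", s)
```

In this sketch the loose recognizer accepts "the dogs barks" and "the dog bark", both of which the agreement-aware recognizer rejects. Adding a constraint to the rule, rather than removing words from the lexicon, is what eliminates the overgeneration here.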
Mitigation strategies include careful data curation, instruction tuning, reinforcement learning from human feedback, and adding factual grounding.