NonMandarin
NonMandarin is a tag used in linguistics, data labeling, and language technology to mark content, speech varieties, or datasets that are not in Mandarin Chinese. It is not a formal linguistic category but a practical label whose meaning varies by context. In some studies and applications it denotes all varieties of Chinese other than Mandarin (for example, Cantonese, Wu, Min, and Hakka), while in data projects it may simply mean “not Mandarin” for filtering or classification purposes.
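The two readings of the label can be illustrated with a minimal Python sketch. It assumes that samples carry ISO 639-3 language codes (cmn for Mandarin, yue for Cantonese, wuu for Wu, nan for Min Nan, hak for Hakka). The function name non_mandarin_label, the sinitic_only flag, and the sample records are hypothetical and chosen only for illustration; they do not come from any particular dataset or tool.

```python
from typing import Optional

# ISO 639-3 codes for a few Chinese varieties (illustrative subset, not exhaustive).
MANDARIN = "cmn"
OTHER_SINITIC = {"yue", "wuu", "nan", "hak"}  # Cantonese, Wu, Min Nan, Hakka


def non_mandarin_label(lang_code: str, sinitic_only: bool = False) -> Optional[bool]:
    """Hypothetical helper: True if the sample counts as NonMandarin,
    False if it is Mandarin, None if it is out of scope under the narrow reading."""
    code = lang_code.strip().lower()
    if code == MANDARIN:
        return False
    if sinitic_only:
        # Narrow reading: only Chinese varieties other than Mandarin qualify.
        return True if code in OTHER_SINITIC else None
    # Broad reading: anything that is not Mandarin qualifies.
    return True


# Made-up sample records: (sample id, language code).
samples = [("s1", "cmn"), ("s2", "yue"), ("s3", "eng"), ("s4", "hak")]

broad = [sid for sid, code in samples if non_mandarin_label(code)]
narrow = [sid for sid, code in samples if non_mandarin_label(code, sinitic_only=True)]
```

Under the broad reading the English sample s3 is kept, while under the narrow reading it falls out of scope. This is why a project that uses the tag would normally need to state which reading it intends.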
Scope and interpretation can differ. In linguistic work, NonMandarin may refer to the constellation of Chinese
Applications and considerations. NonMandarin labeling is common in language identification, dataset curation, and model training when
See also: Mandarin Chinese, Chinese languages, Dialect continuum, Language identification.