Sourceform
Sourceform is a term used in linguistics and data processing for the original input representation of a word or data item, before any normalization, analysis, or transformation is applied. In natural language processing, the sourceform is typically the input string itself, and some authors use the term interchangeably with surface form or input form.
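As a rough illustration of the distinction, the following Python sketch keeps each token's sourceform next to a derived, normalized form. The names `Token`, `normalize`, and `tokenize` are hypothetical, and the normalization chosen here (Unicode NFKC followed by case folding) is only one possible assumption; real pipelines may normalize quite differently.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    sourceform: str  # the string exactly as it appeared in the input
    normalized: str  # a derived form; the sourceform is never modified


def normalize(text: str) -> str:
    """Illustrative normalization only: Unicode NFKC, then case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


def tokenize(text: str) -> List[Token]:
    """Split on whitespace and pair each sourceform with its normalized form."""
    return [Token(sourceform=word, normalized=normalize(word)) for word in text.split()]


tokens = tokenize("The Running DOGS")
for token in tokens:
    print(f"{token.sourceform!r} -> {token.normalized!r}")
```

Storing both fields means later stages can always recover what the input actually contained, even after normalization has discarded case or other distinctions.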
In linguistic analysis, the sourceform is the form of a word as it appears in the text,
In data pipelines, sourceform denotes the raw data state before transformation. For example, a source-form CSV
Origin and usage: The phrase does not have a single standard definition across all disciplines, and its
See also: surface form, lemma, stem, lemmatization, morphological analysis, data normalization.