Wordgram
Wordgram is a term used in information science and computational linguistics for a representation of a word based on the character n-grams it contains. A word is treated as a collection of overlapping character n-grams, which can be encoded as a vector for machine learning tasks. For example, with n = 3 the word "word" yields the trigrams "wor" and "ord". The term is not standardized, and its usage varies across studies.
Construction of a wordgram typically involves selecting a range of n-gram lengths, such as three to five
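As an illustration of this kind of construction, the following Python sketch collects the overlapping character n-grams of lengths three to five from a single word and folds their counts into a fixed-length vector. The function names `wordgram` and `to_vector`, the `<` and `>` boundary markers, the hashing step, and the vector size of 32 are assumptions made for this example; they are not taken from any particular study.

```python
import zlib
from collections import Counter


def wordgram(word, n_min=3, n_max=5, pad=True):
    """Count the overlapping character n-grams of one word."""
    # Assumption: wrap the word in "<" and ">" so that prefixes and
    # suffixes produce n-grams distinct from word-internal ones.
    token = f"<{word}>" if pad else word
    grams = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the token; if the token is
        # shorter than n, the range is empty and nothing is added.
        for i in range(len(token) - n + 1):
            grams[token[i:i + n]] += 1
    return grams


def to_vector(grams, dim=32):
    """Fold n-gram counts into a list of length dim (hashing trick)."""
    vec = [0] * dim
    for gram, count in grams.items():
        # zlib.crc32 is stable across runs, unlike the built-in hash().
        index = zlib.crc32(gram.encode("utf-8")) % dim
        vec[index] += count
    return vec


if __name__ == "__main__":
    grams = wordgram("cat")
    print(sorted(grams))
    print(to_vector(grams))
```

Hashing keeps the vector length fixed regardless of vocabulary size, at the cost of occasional collisions between unrelated n-grams. An explicit n-gram-to-index dictionary would avoid collisions but has to be built from a corpus first.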
Applications for wordgram representations include text classification, language identification, authorship attribution, and spelling or OCR error
Relation to and limitations of the approach: Wordgram features overlap with broader n-gram and subword modeling
See also: n-gram, character n-gram, subword model, word embedding, bag of n-grams.