Tyyppisana
Tyyppisana is a term used in Finnish linguistics for the underlying lexical item, or canonical form, of a word from which its inflected forms are derived. It is closely related to the lemma or perusmuoto (base form), and the terms are often used interchangeably. In corpus linguistics, a tyyppisana stands for one distinct lexical item, in contrast to a token, which is any single occurrence of a word form in running text.
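The type/token contrast can be made concrete with a short sketch. The Python snippet below counts tokens (every occurrence) and types (distinct lexical items) in a small text. The lookup table TOY_LEMMAS and the function count_types_and_tokens are hypothetical names introduced only for this illustration; a real application would replace the hand-written table with a proper Finnish lemmatizer.

```python
from collections import Counter

# Hand-written toy table mapping inflected forms to a base form.
# Illustrative assumption only: this is not the output of a real lemmatizer.
TOY_LEMMAS = {
    "talon": "talo",
    "taloa": "talo",
    "taloja": "talo",
}


def count_types_and_tokens(text: str) -> tuple[int, int, Counter]:
    """Return (token count, type count, occurrences per type)."""
    # Tokens: every whitespace-separated word form in the running text.
    tokens = text.lower().split()
    # Map each token to its base form; unknown forms are kept unchanged.
    lemmas = [TOY_LEMMAS.get(token, token) for token in tokens]
    # Types: the distinct base forms, each with its number of occurrences.
    counts = Counter(lemmas)
    return len(tokens), len(counts), counts


n_tokens, n_types, counts = count_types_and_tokens("talo talon taloa taloja talo")
print(n_tokens, n_types, dict(counts))
```

In this sketch the five word forms are five tokens but map to a single type, because every form is reduced to the same base form before counting.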
For example, the Finnish noun talo has the tyyppisana talo, while its inflected forms talon, taloa, taloja,
Determining the tyyppisana typically requires lemmatization or stemming, depending on the application. In dictionaries, the tyyppisana
The concept helps distinguish between vocabulary diversity (types) and the number of word occurrences (tokens). The