Word2Vec
Word2vec is a family of neural network models that produce dense vector representations of words from large text corpora. Developed by Tomas Mikolov and colleagues at Google and first described in 2013, the models learn embeddings that place semantically similar words closer in a continuous vector space. These embeddings can be used to measure similarity, perform arithmetic on meanings, and serve as features in downstream NLP tasks.
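As an illustration of how such embeddings are trained and queried in practice, the following is a minimal sketch using the gensim library (assuming gensim 4.x; the toy corpus and parameter values are purely illustrative):

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["a", "man", "walks", "in", "the", "city"],
    ["a", "woman", "walks", "in", "the", "city"],
]

# Train a small model: 50-dimensional vectors, context window of 2.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50, seed=1)

# Look up a word's embedding and compare two words by cosine similarity.
vec_king = model.wv["king"]                  # NumPy array of shape (50,)
print(model.wv.similarity("king", "queen"))  # cosine similarity in [-1, 1]
```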
Word2vec encompasses two main architectures: Continuous Bag of Words (CBOW) and Skip-gram. CBOW predicts a target word from its surrounding context words, whereas Skip-gram predicts the surrounding context words given the target word.
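A sketch of how the two architectures are typically selected when training with gensim (the `sg` flag is gensim's switch; the corpus here is an illustrative assumption):

```python
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"], ["jumps", "over", "the", "lazy", "dog"]]

# sg=0 selects CBOW (the default); sg=1 selects Skip-gram.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
```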
To train efficiently on large vocabularies, word2vec uses optimization techniques such as hierarchical softmax or negative sampling, which avoid computing a full softmax over the entire vocabulary at every training step.
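A minimal NumPy sketch of the skip-gram negative-sampling objective for a single training pair, following the standard formulation with k sampled noise words (the function and variable names are illustrative, not taken from any reference implementation; in gensim, the `negative` and `hs` parameters control this choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def skipgram_negative_sampling_loss(center, context, negatives):
    """Loss for one (center, context) pair with k sampled noise words.

    center:    (d,)   input vector of the center word
    context:   (d,)   output vector of the observed context word
    negatives: (k, d) output vectors of k randomly sampled noise words
    """
    # Positive term: make the true pair's dot product large.
    pos = -np.log(sigmoid(context @ center))
    # Negative terms: make the noise pairs' dot products small.
    neg = -np.sum(np.log(sigmoid(-negatives @ center)))
    return pos + neg

rng = np.random.default_rng(0)
d, k = 50, 5
print(skipgram_negative_sampling_loss(rng.normal(size=d),
                                      rng.normal(size=d),
                                      rng.normal(size=(k, d))))
```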
Word vectors capture many linguistic regularities, enabling simple vector operations such as king minus man plus woman, which yields a vector close to that of queen.
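The canonical analogy query can be reproduced with the pretrained Google News vectors distributed through gensim's downloader (a sketch assuming that package and dataset are available; the download is large):

```python
import gensim.downloader as api

# Pretrained vectors trained on the Google News corpus.
wv = api.load("word2vec-google-news-300")

# king - man + woman: most_similar adds the "positive" vectors, subtracts the
# "negative" ones, and ranks vocabulary words by cosine similarity to the result.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
# "queen" is expected to appear at or near the top of the list.
```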
Word2vec had a major impact on natural language processing by providing a scalable, effective way to learn word representations from unlabeled text, and its embeddings became widely used as input features across the field.