4.3.3. Techniques

4.3.3.1. Text Embeddings

Text embeddings allow deep learning to be effective on smaller datasets. They are often the first input to a deep learning architecture and the most popular way of doing transfer learning in NLP. Embeddings are simply vectors, or more generically, real-valued representations of strings. Word embeddings are considered a great starting point for most deep NLP tasks.
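To make this concrete, here is a minimal sketch of words mapped to real-valued vectors and compared with cosine similarity. The vectors below are made-up toy values for illustration only; trained embeddings typically have 100-300 dimensions:

```python
import numpy as np

# Toy 4-dimensional embeddings (hypothetical values, not from a trained model).
emb = {
    "king":  np.array([0.8, 0.1, 0.7, 0.3]),
    "queen": np.array([0.8, 0.1, 0.6, 0.9]),
    "apple": np.array([0.1, 0.9, 0.2, 0.4]),
}

def cosine(a, b):
    # Cosine similarity: dot product normalised by vector lengths.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # higher: related words
print(cosine(emb["king"], emb["apple"]))  # lower: unrelated words
```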

The most popular names in word embeddings are word2vec by Google (Mikolov) and GloVe by Stanford (Pennington, Socher, and Manning). fastText seems to be fairly popular for multilingual sub-word embeddings.

4.3.3.1.1. Word Embeddings

| Embedding | Paper | Organisation | gensim - Training Support | Blogs |
|---|---|---|---|---|
| word2vec | Official Implementation, T. Mikolov et al. 2013. Distributed Representations of Words and Phrases and their Compositionality. pdf | Google | Yes :heavy_check_mark: | Visual explanations by colah at Deep Learning, NLP, and Representations; gensim's Making Sense of word2vec |
| GloVe | Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. pdf | Stanford | No :negative_squared_cross_mark: | Morning Paper on GloVe by acoyler |
| fastText | Official Implementation, T. Mikolov et al. 2017. Enriching Word Vectors with Subword Information. pdf | Facebook | Yes :heavy_check_mark: | Fasttext: Under the Hood |

Notes for Beginners:

4.3.3.1.2. Sentence and Language Model Based Word Embeddings