Techniques
==========
Text Embeddings
---------------
Text embeddings allow deep learning to be effective on smaller datasets.
They are often the first inputs to a deep learning architecture and the
most popular way of doing transfer learning in NLP. Embeddings are simply
vectors, or more generically, real-valued representations of strings,
and they are considered a great starting point for most deep NLP tasks.
The most popular word embeddings are word2vec by Google (Mikolov) and
GloVe by Stanford (Pennington, Socher, and Manning). fastText seems to
be fairly popular for multilingual sub-word embeddings.
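Since embeddings are just real-valued vectors, "similarity" between words is usually measured with cosine similarity. A minimal sketch of the idea, with made-up 4-dimensional vectors (trained models such as word2vec typically use 100-300 dimensions learned from data):

```python
from math import sqrt

# Toy "embeddings" -- these values are invented for illustration only;
# a real model learns them from a large corpus.
embeddings = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.7, 0.2, 0.9],
    "apple": [0.1, 0.1, 0.9, 0.4],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["king"], embeddings["queen"]))
print(cosine_similarity(embeddings["king"], embeddings["apple"]))
```

With well-trained vectors, related words ("king", "queen") score much closer to 1.0 than unrelated ones.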
Word Embeddings
~~~~~~~~~~~~~~~
+-----------+---------------------------------------------+--------------+----------------------------------+---------------------------------------------+
| Embedding | Paper                                       | Organisation | gensim Training Support          | Blogs                                       |
+===========+=============================================+==============+==================================+=============================================+
| word2vec  | `Official Implementation `__,               | Google       | Yes :heavy_check_mark:           | Visual explanations: `Deep Learning, NLP,   |
|           | T. Mikolov et al. 2013. `pdf `__            |              |                                  | and Representations `__; `Making sense of   |
|           |                                             |              |                                  | word2vec `__                                |
+-----------+---------------------------------------------+--------------+----------------------------------+---------------------------------------------+
| GloVe     | Jeffrey Pennington, Richard Socher, and     | Stanford     | No :negative_squared_cross_mark: | `Morning Paper on GloVe `__ by acoyler      |
|           | Christopher D. Manning. `GloVe: Global      |              |                                  |                                             |
|           | Vectors for Word Representation `__         |              |                                  |                                             |
+-----------+---------------------------------------------+--------------+----------------------------------+---------------------------------------------+
| fastText  | `Official Implementation `__,               | Facebook     | Yes :heavy_check_mark:           | `Fasttext: Under the Hood `__               |
|           | T. Mikolov et al. 2017. Enriching Word      |              |                                  |                                             |
|           | Vectors with Subword Information.           |              |                                  |                                             |
|           | `pdf `__                                    |              |                                  |                                             |
+-----------+---------------------------------------------+--------------+----------------------------------+---------------------------------------------+
Notes for Beginners:

- Rule of thumb: **fastText >> GloVe > word2vec**
- You can find `pre-trained fastText
  vectors `__ in
  several languages
- If you are interested in the logic and intuition behind word2vec and
  GloVe, `The Amazing Power of Word
  Vectors `__
  introduces the topics well
- `arXiv: Bag of Tricks for Efficient Text
  Classification `__ and `arXiv:
  FastText.zip: Compressing text classification
  models `__ were released as part of
  fastText
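The subword idea behind fastText (from *Enriching Word Vectors with Subword Information*) is to represent a word by the character n-grams it contains, so rare and out-of-vocabulary words still get usable vectors as the sum of their n-gram vectors. A rough sketch of the n-gram extraction, using fastText's default range of 3 to 6 and its ``<``/``>`` word-boundary markers:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, padded with boundary markers the way
    fastText does. The full padded word is also kept as one unit."""
    padded = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    grams.add(padded)  # whole-word unit, e.g. "<where>"
    return grams

print(sorted(char_ngrams("where")))
```

This is only a sketch of the segmentation step; the actual library also hashes n-grams into a fixed-size table for memory efficiency.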
Sentence and Language Model Based Word Embeddings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- *ELMo* from `Deep Contextualized Word
  Representations `__ - `PyTorch
  implementation `__
- `TF Implementation `__
- *ULMFiT* aka `Universal Language Model Fine-tuning for Text
  Classification `__ by Jeremy Howard
  and Sebastian Ruder
- *InferSent* from `Supervised Learning of Universal Sentence
  Representations from Natural Language Inference
  Data `__ by Facebook
- *CoVe* from `Learned in Translation: Contextualized Word
  Vectors `__
- *Paragraph vectors* from `Distributed Representations of Sentences and
  Documents `__.
  See `doc2vec tutorial at
  gensim `__
- `sense2vec `__ - on word sense
  disambiguation
- `Skip Thought Vectors `__ - word
  representation method
- `Adaptive skip-gram `__ - similar
  approach, with adaptive properties
- `Sequence to Sequence
  Learning `__
  - word vectors for machine translation
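A useful baseline to compare the models above against is a sentence embedding built by simply averaging the word vectors of the sentence (mean pooling). A minimal sketch, with made-up vectors for illustration:

```python
def sentence_embedding(sentence, word_vectors, dim=4):
    """Mean-pool the word vectors of a sentence -- a common baseline that
    the contextual models listed above improve upon. Words missing from
    the vocabulary are skipped."""
    vecs = [word_vectors[w] for w in sentence.lower().split()
            if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Made-up 4-dimensional word vectors, for illustration only.
word_vectors = {
    "cats":  [0.8, 0.1, 0.0, 0.3],
    "chase": [0.2, 0.9, 0.1, 0.0],
    "mice":  [0.7, 0.2, 0.1, 0.4],
}

print(sentence_embedding("cats chase mice", word_vectors))
```

Despite ignoring word order entirely, this baseline is surprisingly hard to beat on many sentence-similarity tasks.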
Question Answering and Knowledge Extraction
-------------------------------------------
- `DrQA: Open Domain Question
  Answering `__ by Facebook
  on Wikipedia data
- DocQA: `Simple and Effective Multi-Paragraph Reading
Comprehension `__ by AllenAI
- `Markov Logic Networks for Natural Language Question
Answering `__
- `Template-Based Information Extraction without the
Templates `__
- `Relation extraction with matrix factorization and universal
schemas `__
- `Privee: An Architecture for Automatically Analyzing Web Privacy
Policies `__
- `Teaching Machines to Read and
Comprehend `__ - DeepMind paper
- `Towards a Formal Distributional Semantics: Simulating Logical
Calculi with Tensors `__
- `Presentation slides for MLN
tutorial `__
- `Presentation slides for QA applications of
MLNs `__
- `Presentation
slides `__