Word2Vec With TensorFlow

Transcription

Word2Vec with TensorFlow. Deep Learning, Department of Computer Science, National Tsing Hua University, Taiwan.

Word embedding. How to represent words in machine learning? "My favorite pet is cat."
- Treat words as discrete atomic symbols, so 'cat' may be represented as '1' and 'pet' as '2' (see the sketch after this list).
- However, this embedding carries no semantics, and models learning from such a representation are hard to train.
- Better solutions?
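To make the first bullet concrete, here is a minimal Python sketch of the discrete-symbol representation; the vocabulary and the specific IDs are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch: words as discrete atomic symbols (illustrative IDs, not from the slides).
word_to_id = {"my": 0, "favorite": 1, "pet": 2, "is": 3, "cat": 4}

sentence = ["my", "favorite", "pet", "is", "cat"]
ids = [word_to_id[w] for w in sentence]
print(ids)  # [0, 1, 2, 3, 4]

# The integer IDs are arbitrary: nothing about 4 ('cat') says it is more similar
# to 2 ('pet') than to 3 ('is'), so a model gets no semantic signal from them.
```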

Word embedding. Intuition: model the relationships between words, such as semantic similarity and grammar / syntax. Two models:
- CBOW (Continuous Bag-of-Words model)
- Skip-Gram

CBOW & Skip-Gram. Assumption in CBOW & Skip-Gram:
- Words with similar meaning / domain appear in similar contexts.
"My favorite pet is ___." could be filled by cat or dog.

CBOW & Skip-Gram. CBOW: predicting the target word from its context words. Skip-Gram: predicting the context words from the target word.

CBOW & Skip-Gram. For example, given the sentence "the quick brown fox jumped over the lazy dog":
- CBOW will be trained on the dataset: ([the, brown], quick), ([quick, fox], brown), ...
- Skip-Gram will be trained on the dataset: (quick, the), (quick, brown), (brown, quick), (brown, fox)
A data-generation sketch follows after this list.
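The sketch below builds Skip-Gram (target, context) pairs for the example sentence. Using tf.keras.preprocessing.sequence.skipgrams, the window size of 1, and the vocabulary indexing are my assumptions, not necessarily how the course material does it.

```python
import tensorflow as tf

# Sketch: build Skip-Gram (target, context) pairs for the example sentence.
sentence = "the quick brown fox jumped over the lazy dog".split()
vocab = sorted(set(sentence))
word_to_id = {w: i + 1 for i, w in enumerate(vocab)}   # index 0 reserved for padding
ids = [word_to_id[w] for w in sentence]

pairs, labels = tf.keras.preprocessing.sequence.skipgrams(
    ids,
    vocabulary_size=len(word_to_id) + 1,
    window_size=1,          # one context word on each side, matching the slide
    negative_samples=0.0,   # positive pairs only
    shuffle=False)

id_to_word = {i: w for w, i in word_to_id.items()}
print([(id_to_word[t], id_to_word[c]) for t, c in pairs[:4]])
# e.g. ('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')
```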

Picture from https://kavita-ganesan.com/

Skip-Gram. Picture from https://lilianweng.github.io/

Softmax & word classification. Recap the softmax function:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

While training the embedding, we need to sum over the entire vocabulary in the denominator, which is costly:

$$p(w_i \mid x) = \frac{\exp(h^\top v'_{w_i})}{\sum_{j \in V} \exp(h^\top v'_{w_j})}$$

where V (the vocabulary size) ranges from thousands to millions of words (see the sketch below).
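A small sketch of the bottleneck; the sizes below are assumptions for illustration. Computing the softmax denominator requires one score per vocabulary word at every training step.

```python
import tensorflow as tf

# Sketch of the full-softmax cost (sizes are illustrative assumptions).
V, D = 50_000, 128                      # vocabulary size, embedding dimension
h = tf.random.normal([1, D])            # hidden representation of the context
W_out = tf.random.normal([D, V])        # output vectors v'_w, one column per word

logits = tf.matmul(h, W_out)            # shape (1, V): scores for every word in V
probs = tf.nn.softmax(logits)           # the denominator sums over all V entries
print(probs.shape)                      # (1, 50000)
```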

Noise Contrastive Estimation. Intuition: estimate the parameters by learning to discriminate between the data $w_t$ and some artificially generated noise $\tilde{w}$ drawn from a noise distribution $Q$ (noise contrastive estimation in place of maximum likelihood estimation). Objective function:

$$\arg\max \; \log p(d = 1 \mid w_t, x) + \sum_{i=1}^{N} \log p(d = 0 \mid \tilde{w}_i, x)$$

Derived form:

$$\arg\max \; \log \frac{p(w_t \mid x)}{p(w_t \mid x) + N\,q(w_t)} + \sum_{i=1}^{N} \log \frac{N\,q(\tilde{w}_i)}{p(\tilde{w}_i \mid x) + N\,q(\tilde{w}_i)}$$

Noise Contrastive Estimation (continued). After training with the NCE objective, evaluate with the softmax function over the full vocabulary (a sketch follows below).
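A minimal sketch of this recipe in TensorFlow, training with tf.nn.nce_loss and using the full softmax only at evaluation time. The sizes, variable names, and the toy batch are my assumptions for illustration.

```python
import tensorflow as tf

V, D, num_sampled = 50_000, 128, 64     # illustrative sizes

embeddings = tf.Variable(tf.random.uniform([V, D], -1.0, 1.0))              # input vectors v_w
nce_weights = tf.Variable(tf.random.truncated_normal([V, D], stddev=0.1))   # output vectors v'_w
nce_biases = tf.Variable(tf.zeros([V]))

center_ids = tf.constant([7, 99])                         # toy batch of target words
context_ids = tf.constant([[12], [345]], dtype=tf.int64)  # true context words, shape (batch, 1)

h = tf.nn.embedding_lookup(embeddings, center_ids)        # (batch, D)

# Training: only the true word plus num_sampled noise words are scored per example.
loss = tf.reduce_mean(tf.nn.nce_loss(
    weights=nce_weights, biases=nce_biases,
    labels=context_ids, inputs=h,
    num_sampled=num_sampled, num_classes=V))

# Evaluation after training: the full softmax over the whole vocabulary.
logits = tf.matmul(h, nce_weights, transpose_b=True) + nce_biases  # (batch, V)
probs = tf.nn.softmax(logits)
```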

Homework. Functional API:
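The slide's code is not reproduced in the transcription; below is a hedged sketch of what a Skip-Gram-style model could look like with the Keras Functional API. The layer choices, names, and the negative-sampling-style binary objective are my assumptions, not the homework's reference solution.

```python
import tensorflow as tf

V, D = 50_000, 128                                   # illustrative sizes

target_in = tf.keras.Input(shape=(1,), name="target")
context_in = tf.keras.Input(shape=(1,), name="context")

# One shared embedding table; each input word is looked up and flattened to (batch, D).
embedding = tf.keras.layers.Embedding(V, D, name="word_embedding")
target_vec = tf.keras.layers.Flatten()(embedding(target_in))
context_vec = tf.keras.layers.Flatten()(embedding(context_in))

# Dot-product score of the (target, context) pair, trained as a binary classifier
# against sampled negative pairs (a negative-sampling-style stand-in for NCE).
score = tf.keras.layers.Dot(axes=1)([target_vec, context_vec])
output = tf.keras.layers.Activation("sigmoid")(score)

model = tf.keras.Model(inputs=[target_in, context_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```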

Homework. Model subclassing:
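And a hedged sketch of the same idea with model subclassing; the class name, the separate target/context embedding tables, and the logits-based loss are my assumptions.

```python
import tensorflow as tf

class SkipGramModel(tf.keras.Model):
    """Toy Skip-Gram model: scores whether a (target, context) pair is real."""

    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.target_embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
        self.context_embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)

    def call(self, inputs):
        target_ids, context_ids = inputs                  # each of shape (batch, 1)
        t = self.target_embedding(target_ids)             # (batch, 1, D)
        c = self.context_embedding(context_ids)           # (batch, 1, D)
        # Dot-product similarity as the logit for "real pair vs. sampled noise".
        return tf.reduce_sum(t * c, axis=-1)              # (batch, 1)

model = SkipGramModel(vocab_size=50_000, embed_dim=128)
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
```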

Reference
- Noise contrastive estimation
- https://lilianweng.github.io/ (learning-word-embedding.html)
- https://aegis4048.github.io/ (optimize computational efficiency of skip-gram with negative sampling)
- https://www.tensorflow.org/api_docs/python/tf/nn/nce_loss
