Why does word2vec use 2 representations for each word?


I am trying to understand why word2vec's skip-gram model has two representations for each word: the hidden representation (the word embedding) and the output representation (also called the context word embedding). Is this just for generality, so that the context can be anything (not only words), or is there a more fundamental reason?

I recommend you read this article about word2vec: http://arxiv.org/pdf/1402.3722v1.pdf

They give an intuition for why there are two representations in a footnote: it is unlikely that a word appears in its own context, so you would want to minimize the probability p(w|w). But if you use the same vector for w as a context word and for w as a center word, you cannot minimize p(w|w) (computed via the dot product) while keeping the word embeddings on the unit circle, because the dot product of a vector with itself is then as large as it can be.

But that is just an intuition; I don't know whether there is a clear justification for it... The sketch below illustrates the point numerically.
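Here is a minimal numeric sketch of that footnote argument, assuming unit-norm embeddings and scores given by dot products (the variable names `tied` / `untied_scores` are just for illustration). With a single, tied representation, the score of a word with itself is v·v = 1, the largest possible dot product between unit vectors, so the softmax cannot push p(w|w) down; with separate center and context vectors, the two can point away from each other.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    """Normalize a vector to unit length (the 'unit circle' constraint)."""
    return v / np.linalg.norm(v)

dim, vocab = 8, 5
# One unit vector per word, used for BOTH center and context roles (tied case).
tied = np.array([unit(rng.normal(size=dim)) for _ in range(vocab)])

w = 0                                   # index of the word w
tied_scores = tied @ tied[w]            # dot product of every word with w
print(tied_scores[w])                   # 1.0, the maximum possible score
print(np.argmax(tied_scores) == w)      # True: p(w|w) gets the largest softmax term

# Untied case: separate center and context vectors can be trained so that the
# context vector of w points away from its own center vector.
center = tied.copy()
context = tied.copy()
context[w] = unit(-center[w])           # e.g. the opposite direction
untied_scores = context @ center[w]
print(untied_scores[w])                 # -1.0, so p(w|w) can now be made small
```

Since the softmax is monotone in the score, the tied model is forced to give p(w|w) the largest probability, which is exactly what the footnote says you want to avoid.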

IMHO, the real reason you use different representations is that you are manipulating entities of a different nature: "dog" as a context is not to be treated the same as "dog" as a center word, because it is not the same thing. You are basically working with a big matrix of (word, context) co-occurrences, trying to maximize the probability of the pairs that actually occur. In theory you could use bigrams as contexts, trying to maximize, for instance, the probability of (word="for", context="to maximize"), and you would then assign a vector representation to "to maximize". We don't do that because there would be far too many representations to compute and the co-occurrence matrix would be extremely sparse, but I think the idea is there: the fact that we use unigrams as contexts is just a particular case of all the kinds of contexts we could use. The sketch below shows how the two embedding matrices enter the model.
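A minimal sketch of the skip-gram forward pass with two embedding matrices, assuming a full softmax for simplicity (real word2vec trains with negative sampling or hierarchical softmax instead); the names `W_in` / `W_out` and the toy sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 10, 16
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # center-word ("input") embeddings
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context ("output") embeddings

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def p_context_given_center(center_id):
    """p(c | w) for every candidate context c, as in plain skip-gram."""
    scores = W_out @ W_in[center_id]   # one dot product per candidate context
    return softmax(scores)

# Probability that word 3 appears in the context of word 0.
probs = p_context_given_center(0)
print(probs[3])

# Note that nothing forces the rows of W_out to index single words: if the
# contexts were bigrams, W_out would simply have one row per bigram instead.
```

This is why the two matrices are naturally separate objects: `W_in` is indexed by center words, while `W_out` is indexed by whatever you decide a context is.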

That's how I see it, and if it's wrong, please correct me!

